What is RAG?
RAG stands for Retrieval Augmented Generation. It is the process in which a Large Language Model's (LLM) input is enriched with contextual information retrieved from external sources based on the provided prompt.
What problems does it solve?
- Stale or outdated data. RAG retrieves relevant, up-to-date data from external sources.
- Invalid answers or hallucinations. RAG adds data that may be missing, new, or proprietary to the model's context.
What is the Process?
The RAG process consists of three main stages: Retrieval, Augmentation, and Generation.

Retrieval
- The orchestrator accepts the prompt in free-text form, e.g. "Where are libraries in my district?"
- It converts the prompt into a vector query. The prompt is turned into a vector embedding: a series of floating-point numbers (e.g. [1.34, 4.78, -0.35]) that represents the meaning of the text.
- The vector query is used to retrieve information from a vector database.
- The vector database matches stored vectors that are similar to the vector query and returns the top matching documents or text passages, as in the sketch below.
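As a minimal sketch of the retrieval step, the example below embeds the prompt and ranks stored passages by cosine similarity. The `embed_text` function, the sample passages, and the in-memory index are hypothetical placeholders; a real system would use an embedding model and a vector database.

```python
import numpy as np

def embed_text(text: str) -> np.ndarray:
    # Hypothetical placeholder: a real pipeline would call an embedding model
    # (e.g. a sentence-transformer or a hosted embeddings API) here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Similarity between two embeddings: values closer to 1.0 mean more similar.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pre-embedded passages standing in for documents stored in a vector database.
passages = [
    "Libraries in the user's district are located in X street, 1 km away.",
    "The city swimming pool is open from 8 am to 8 pm.",
]
index = [(passage, embed_text(passage)) for passage in passages]

def retrieve(prompt: str, top_k: int = 1) -> list[str]:
    # Embed the prompt, score every stored passage, and return the best matches.
    query = embed_text(prompt)
    ranked = sorted(index, key=lambda item: cosine_similarity(query, item[1]), reverse=True)
    return [passage for passage, _ in ranked[:top_k]]

context_passages = retrieve("Where are libraries in my district?")
```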
Augmentation
- The retrieved text passages are integrated into the original prompt, adding specific context. For example: "Context: Libraries in the user's district are located in X street, 1 km away from their location. Question: Where are libraries in my district?"
Generation
- The LLM generates a response based on the prompt and the retrieved context.
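The augmentation and generation steps can be sketched in a few lines: augmentation is plain string assembly, and generation is a single model call. The `llm_complete` function here is a hypothetical stand-in for whichever LLM client is actually used.

```python
def build_augmented_prompt(question: str, context_passages: list[str]) -> str:
    # Augmentation: prepend the retrieved passages as context to the original question.
    context = "\n".join(context_passages)
    return f"Context: {context}\nQuestion: {question}"

def llm_complete(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call (e.g. an HTTP request to a hosted model).
    raise NotImplementedError("Replace with an actual LLM client call.")

question = "Where are libraries in my district?"
augmented_prompt = build_augmented_prompt(
    question,
    ["Libraries in the user's district are located in X street, 1 km away."],
)
# Generation: the LLM answers using both the question and the retrieved context.
# answer = llm_complete(augmented_prompt)
```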
Is a vector database mandatory?
No, a vector database is not mandatory for the RAG process. However, it is highly recommended due to its efficiency, scalability, and high accuracy in matching semantically similar texts, significantly improving the quality and reliability of retrieval results.
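As an illustration of retrieval without a vector database, the sketch below ranks documents by naive keyword overlap with the prompt. This is only a toy example under that assumption; a real alternative would typically use full-text search (e.g. BM25), which still lacks the semantic matching a vector database provides.

```python
def keyword_retrieve(prompt: str, documents: list[str], top_k: int = 1) -> list[str]:
    # Naive keyword-overlap retrieval: no embeddings or vector database involved.
    prompt_words = set(prompt.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(prompt_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

documents = [
    "Libraries in the user's district are located in X street, 1 km away.",
    "The city swimming pool is open from 8 am to 8 pm.",
]
print(keyword_retrieve("Where are libraries in my district?", documents))
```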