RAG system - a two node pipeline

Goal: A two-node pipeline consisting of a databroker that takes one or more PDF documents and prepares them for RAG, and a model that answers questions about them.

RAG

Retrieval-augmented generation (RAG) is a technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources.

RAG combines two main components:

Retrieval Component: This part involves searching a large corpus of documents or knowledge base to find relevant information related to a given query.
Generation Component: This part involves generating a response or text based on the information retrieved by the first component.

How Does RAG Work?
1. Input Query: A user provides an input query or prompt.
2. Document Retrieval: The retrieval model (often a dense retriever such as DPR, the Dense Passage Retriever) searches a pre-indexed database for the documents or passages most relevant to the input query.
3. Contextualization: The retrieved documents or passages provide context and information for the query.
4. Text Generation: The generation model (typically a sequence-to-sequence model such as BART or T5, or a decoder-only model such as GPT) uses the retrieved documents to generate a coherent and contextually appropriate response to the query.
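The four steps above can be sketched in a few lines of standard-library Python. This is only an illustration under simplifying assumptions: the retriever ranks passages by cosine similarity of word counts instead of using a dense retriever, and `generate` is a placeholder rather than a real LLM call. All function names and the sample passages are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its word tokens (a crude bag-of-words vector)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Step 2: rank the indexed passages by similarity to the query."""
    q = tokenize(query)
    ranked = sorted(passages, key=lambda p: cosine(q, tokenize(p)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Step 3: place the retrieved passages in front of the question."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}\nAnswer:"


def generate(prompt: str) -> str:
    """Step 4: placeholder for the LLM call; it only reports the prompt size."""
    return f"[LLM would answer here, given a prompt of {len(prompt)} characters]"


passages = [
    "FAISS is a library for similarity search over dense vectors.",
    "Poppler is a PDF rendering library.",
    "Llama 3 is a family of open-weight language models.",
]
query = "Which library does similarity search over vectors?"  # Step 1
top = retrieve(query, passages, k=1)
print(generate(build_prompt(query, top)))
```

In a real pipeline the word-count vectors would be replaced by embeddings from a trained encoder, and `generate` by a call to the chosen language model.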
Applications of RAG
- Question Answering: Enhances the ability to answer questions with up-to-date and specific information by retrieving relevant documents.
- Content Creation: Aids in generating articles, reports, or stories by providing detailed information to support the generated content.
- Customer Support: Improves automated customer service by retrieving and providing precise information from a knowledge base.
Tasks
1. User input: PDF documents
  - PDF documents: Unstructured API (free version, 1000 pages per month)
  - Other LangChain loaders (UnstructuredPDFLoader, PyPdf, PyMuPDFLoader, PdfReader): these either failed because Poppler was missing or returned empty pages.
  - Online PDF documents: LangChain's OnlinePDFLoader (open source)
  In conclusion, OnlinePDFLoader is able to load and split the PDF and extract the data meaningfully. Since it builds on the Unstructured library, its output is then split with RecursiveCharacterTextSplitter using a chunk overlap.
2. Vector library and retriever:
  - FAISS index and a retriever with the similarity search type.
    
    Meta's Faiss is a vector library for efficient similarity search and clustering of dense vectors. There are alternatives such as Chroma.
3. Generator:
  - Llama-3-8B-Instruct (quantized model): requires a GPU or the FEC LLM hosting setup
  - OpenAI model with API access
  - Other open-source models for Graphene?
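To make the chunk overlap from task 1 concrete, the following sketch splits text into fixed-size character windows that share a few characters with their neighbours. It is a simplified stand-in, not the behaviour of RecursiveCharacterTextSplitter, which additionally prefers to break at separators such as paragraphs and sentences. The function name `split_with_overlap` and the sizes used are assumptions chosen for illustration.

```python
def split_with_overlap(text: str, chunk_size: int = 40, chunk_overlap: int = 10) -> list[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in the range [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghij"
for chunk in split_with_overlap(sample, chunk_size=4, chunk_overlap=2):
    print(chunk)
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, at the cost of storing some text twice in the vector index.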
Containerization:
- Create two containers (databroker and model) and import them into Graphene.
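One way to picture the split into two containers is as two classes that share nothing but serialized messages. The sketch below is an assumption made for illustration, not the project's actual node interface: the class names, method names, the `report.pdf` file name, and the use of JSON as the message format are all hypothetical, and the interface that Graphene requires between nodes is not shown here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Chunk:
    """One piece of extracted PDF text, as passed from node 1 to node 2."""
    source: str
    text: str


class DataBrokerNode:
    """Node 1 (hypothetical): turns already-extracted page text into chunks."""

    def prepare(self, source: str, pages: list[str]) -> str:
        # Drop pages that contain only whitespace, keep the rest as chunks.
        chunks = [Chunk(source=source, text=p.strip()) for p in pages if p.strip()]
        # Serialize, because the two containers exchange messages, not memory.
        return json.dumps([asdict(c) for c in chunks])


class ModelNode:
    """Node 2 (hypothetical): receives chunks and answers questions about them."""

    def __init__(self) -> None:
        self.chunks: list[Chunk] = []

    def load(self, message: str) -> int:
        """Deserialize the databroker's message and return the number of chunks."""
        self.chunks = [Chunk(**item) for item in json.loads(message)]
        return len(self.chunks)

    def answer(self, question: str) -> str:
        # Placeholder: a real node would retrieve relevant chunks and call an LLM.
        return f"{len(self.chunks)} chunk(s) available to answer: {question}"


broker = DataBrokerNode()
model = ModelNode()
message = broker.prepare("report.pdf", ["First page text.", "  ", "Second page text."])
model.load(message)
print(model.answer("What is on the first page?"))
```

Keeping the boundary to a single serialized message makes it easier to swap either node, for example a different PDF loader or a different generator, without touching the other container.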
Unification of RAG pipeline:
The server script and web UI are currently work in progress.

Conclusion

RAG represents a significant step forward by integrating retrieval mechanisms with generative models, thus enabling more accurate, informative, and contextually aware text generation. This hybrid approach leverages the strengths of both retrieval-based and generation-based methods to provide richer and more reliable responses.

Edited Jun 04, 2024 by Sangamithra Panneer Selvam
