RAG system - a two-node pipeline

Goal: A two-node pipeline with a databroker that takes one or more PDF documents and prepares them for RAG, and a model that allows Q&A.

RAG

Retrieval-augmented generation (RAG) is a technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources.

Retrieval-Augmented Generation (RAG) combines two main components:

  • Retrieval Component: This part involves searching a large corpus of documents or knowledge base to find relevant information related to a given query.

  • Generation Component: This part involves generating a response or text based on the information retrieved by the first component.

    How Does RAG Work?

    1. Input Query: A user provides an input query or prompt.
    2. Document Retrieval: The retrieval model (often a dense retriever such as DPR, the Dense Passage Retriever) searches a pre-indexed database for the documents or passages most relevant to the input query.
    3. Contextualization: The retrieved documents or passages provide context and information for the query.
    4. Text Generation: The generation model (typically a generative language model such as GPT, or a sequence-to-sequence model such as BART or T5) uses the retrieved documents to generate a coherent and contextually appropriate response to the query.
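
The four steps above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the passages are made-up sample data, retrieval is a simple word-overlap score rather than a dense retriever, and `generate` is a hypothetical placeholder for the call to a real generative model.

```python
import re

# Toy corpus standing in for a pre-indexed document store (made-up sample data).
PASSAGES = [
    "FAISS is a library for similarity search over dense vectors.",
    "RAG combines a retriever with a text generator.",
    "Poppler is a PDF rendering library.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, passages, k=2):
    """Step 2: rank passages by the number of words shared with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        passages,
        key=lambda p: len(query_words & tokenize(p)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, context_passages):
    """Step 3: place the retrieved passages in front of the question."""
    context = "\n".join(f"- {p}" for p in context_passages)
    return (
        f"Answer using only this context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


def generate(prompt):
    """Step 4: hypothetical placeholder for the call to a generative model."""
    return f"[model output for a prompt of {len(prompt)} characters]"


query = "What is FAISS used for?"  # Step 1: input query
top_passages = retrieve(query, PASSAGES, k=1)
prompt = build_prompt(query, top_passages)
answer = generate(prompt)
```

In a real pipeline the word-overlap scorer would be replaced by a vector search over embeddings, and `generate` by a call to the chosen model; the structure of the four steps stays the same.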

    Applications of RAG

    • Question Answering: Enhances the ability to answer questions with up-to-date and specific information by retrieving relevant documents.
    • Content Creation: Aids in generating articles, reports, or stories by providing detailed information to support the generated content.
    • Customer Support: Improves automated customer service by retrieving and providing precise information from a knowledge base.

    Tasks

    1. User Input: PDF documents

      • PDF documents: Unstructured API (free version, 1000 pages per month).
      • Other LangChain loaders and PDF libraries (UnstructuredPDFLoader, PyPdf, PyMuPDFLoader, PdfReader): these reported Poppler as missing or returned empty pages.
      • Online PDF documents: LangChain's OnlinePDFLoader (open-source).

      In conclusion, OnlinePDFLoader is able to load and split the PDF and extract the data meaningfully. Since it is an extension of the Unstructured library, it works with RecursiveCharacterTextSplitter and chunk overlap.

    2. Vector Library and Retriever:

      • FAISS index and a retriever with the similarity search type.

        Meta's FAISS is a vector library for efficient similarity search and clustering of dense vectors. There are alternatives such as Chroma.

    3. Generator:

      • Llama-3-8B-Instruct (quantized model): requires a GPU or the FEC LLM hosting setup.
      • OpenAI model with API access.
      • Other open-source models for Graphene?
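
The data preparation in tasks 1 and 2 (chunking with overlap, then similarity search over vectors) can be illustrated without any external library. The sketch below is a simplified stand-in, not the pipeline's actual code: `chunk_text` uses plain character windows, whereas RecursiveCharacterTextSplitter additionally prefers to split on separators such as paragraph and line breaks; `embed` is a hypothetical toy embedding (hashed word counts) in place of a real embedding model; and `search` does the exhaustive L2 comparison that a flat index performs conceptually. All function names and parameter values are assumptions chosen for illustration.

```python
import math
import re
import zlib


def chunk_text(text, chunk_size=40, chunk_overlap=10):
    """Split text into fixed-size character windows that overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text, dim=64):
    """Toy embedding (assumption): hashed bag-of-words counts, L2-normalised."""
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def search(query_vec, index_vecs, k=1):
    """Exhaustive nearest-neighbour search by squared L2 distance.

    Returns a list of (distance, index) pairs, closest first.
    """
    distances = [
        (sum((q - x) ** 2 for q, x in zip(query_vec, vec)), i)
        for i, vec in enumerate(index_vecs)
    ]
    return sorted(distances)[:k]


# Made-up sample text standing in for the content extracted from a PDF.
document = (
    "FAISS performs similarity search over dense vectors. "
    "The retriever returns the chunks closest to the query. "
    "The generator then answers using those chunks as context."
)
chunks = chunk_text(document, chunk_size=60, chunk_overlap=15)
index = [embed(c) for c in chunks]
hits = search(embed("similarity search over dense vectors"), index, k=2)
retrieved = [chunks[i] for _, i in hits]
```

The overlap means that a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, as long as it is shorter than the overlap. The retrieved chunks would then be passed to the generator as context.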

    Containerization:

    • Creation of two containers and import into Graphene.

    Unification of the RAG pipeline:

    • The server script and web UI are currently a work in progress.

    Conclusion

    RAG represents a significant step forward by integrating retrieval mechanisms with generative models, thus enabling more accurate, informative, and contextually aware text generation. This hybrid approach leverages the strengths of both retrieval-based and generation-based methods to provide richer and more reliable responses.

    References

    1. https://colab.research.google.com/drive/1BJYYyrPVe0_9EGyXqeNyzmVZDrCRZwsg?usp=sharing#scrollTo=Y2m2l-vt_RSp
    2. https://unstructured.io/blog/how-to-process-pdf-in-python
    3. https://python.langchain.com/v0.1/docs/modules/data_connection/document_loaders/pdf/
    4. https://weaviate.io/blog/vector-library-vs-vector-database
    5. https://ai.meta.com/tools/faiss/
Edited by Sangamithra Panneer Selvam