RAG system - a two-node pipeline
Goal: A two-node pipeline with a databroker that takes one or more PDF documents and prepares them for RAG, and a model that allows Q&A.
RAG
Retrieval-augmented generation (RAG) is a technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources.
Retrieval-Augmented Generation (RAG) combines two main components:
- Retrieval Component: This part involves searching a large corpus of documents or a knowledge base to find relevant information related to a given query.
- Generation Component: This part involves generating a response or text based on the information retrieved by the first component.
A typical RAG workflow:
- Input Query: A user provides an input query or prompt.
- Document Retrieval: The retrieval model (often a dense retriever such as DPR, the Dense Passage Retriever) searches a pre-indexed database for the documents or passages most relevant to the input query.
- Contextualization: The retrieved documents or passages provide context and information for the query.
- Text Generation: The generation model (typically a sequence-to-sequence or decoder-only language model such as BART, T5, or GPT) uses the retrieved documents to generate a coherent and contextually appropriate response to the query.
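The four steps above can be sketched end to end in a few lines. This is a minimal illustration, not the project's implementation: the function names (`retrieve`, `build_prompt`, `generate`, `answer`) are hypothetical, the retriever ranks passages by simple word overlap instead of dense embeddings, and `generate` is a stub standing in for a call to an LLM.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Document retrieval: return the k passages sharing the most words with the query.

    Assumption: word overlap is a toy stand-in for a dense retriever,
    which would compare embedding vectors instead.
    """
    query_counts = Counter(tokenize(query))

    def overlap(passage: str) -> int:
        passage_counts = Counter(tokenize(passage))
        return sum(min(n, passage_counts[t]) for t, n in query_counts.items())

    return sorted(passages, key=overlap, reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Contextualization: place the retrieved passages in front of the question."""
    context_block = "\n".join(f"- {p}" for p in context)
    return (
        f"Answer using only this context:\n{context_block}\n\n"
        f"Question: {query}\nAnswer:"
    )


def generate(prompt: str) -> str:
    """Text generation: hypothetical stub where a real LLM call would go."""
    return f"[LLM response to a prompt of {len(prompt)} characters]"


def answer(query: str, passages: list[str], k: int = 2) -> str:
    """Run the whole flow: query -> retrieval -> contextualization -> generation."""
    return generate(build_prompt(query, retrieve(query, passages, k)))


if __name__ == "__main__":
    docs = [
        "FAISS is a library for similarity search of dense vectors.",
        "Poppler is a PDF rendering library.",
        "Llama 3 is a language model.",
    ]
    print(answer("What is FAISS used for?", docs, k=1))
```

Keeping retrieval, prompt construction, and generation as separate functions means each one can be replaced independently, for example by an embedding-based retriever or a hosted model.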
Typical applications:
- Question Answering: Enhances the ability to answer questions with up-to-date and specific information by retrieving relevant documents.
- Content Creation: Aids in generating articles, reports, or stories by providing detailed information to support the generated content.
- Customer Support: Improves automated customer service by retrieving and providing precise information from a knowledge base.
User Input: PDF document
- PDF documents: Unstructured API (free version, 1000 pages per month)
- LangChain's other loaders (UnstructuredPDFLoader, PyPdf, PyMuPDFLoader, PdfReader): Poppler missing, returns empty pages
- Online PDF documents: LangChain's OnlinePDFLoader (open source)
In conclusion, OnlinePDFLoader is able to load and split the PDF and extract the data meaningfully. It is an extension of the Unstructured library; its output is then split with RecursiveCharacterTextSplitter, using a chunk overlap.
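To illustrate what chunk overlap means, here is a simplified sketch that is not LangChain's implementation: it cuts text into fixed-size character windows that share `chunk_overlap` characters with the previous window. RecursiveCharacterTextSplitter additionally tries to break on separators such as paragraphs, lines, and spaces before falling back to characters. The function name `split_with_overlap` and the default sizes are assumptions chosen for illustration.

```python
def split_with_overlap(
    text: str, chunk_size: int = 1000, chunk_overlap: int = 200
) -> list[str]:
    """Split text into character windows; neighbours share chunk_overlap characters.

    Hypothetical helper: a fixed-window simplification of a text splitter,
    shown only to make the overlap idea concrete.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2):
        print(chunk)
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.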
Vector Library and Retriever
- FAISS index and a retriever with the similarity search type. Meta's Faiss is a vector library for efficient similarity search and clustering of dense vectors. There are other alternatives, such as Chroma.
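Conceptually, similarity search over dense vectors returns the stored vectors closest to a query vector. The sketch below does this by exact brute-force comparison in plain Python, which is the idea behind an exact (flat) index; it does not use the FAISS API. The function names and the choice of Euclidean (L2) distance are assumptions for illustration.

```python
import heapq
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_search(
    query_vec: list[float], index_vecs: list[list[float]], k: int = 2
) -> list[tuple[int, float]]:
    """Return (position, distance) for the k stored vectors nearest to the query.

    Hypothetical helper: compares the query against every stored vector,
    so cost grows linearly with the number of vectors.
    """
    scored = ((l2_distance(query_vec, v), i) for i, v in enumerate(index_vecs))
    return [(i, d) for d, i in heapq.nsmallest(k, scored)]


if __name__ == "__main__":
    stored = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
    for position, distance in similarity_search([0.9, 0.1], stored, k=2):
        print(position, round(distance, 3))
```

In the pipeline, the stored vectors would be embeddings of the document chunks, and the returned positions would map back to the chunk texts passed to the generator.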
Generator
- Llama-3-8B-Instruct (quantized model): requires a GPU or the FEC LLM hosting setup
- OpenAI model with API access
- Other open-source models for Graphene?
Creation of 2 containers and import into Graphene
- The server script and WebUI are currently a work in progress.
RAG represents a significant step forward by integrating retrieval mechanisms with generative models, thus enabling more accurate, informative, and contextually aware text generation. This hybrid approach leverages the strengths of both retrieval-based and generation-based methods to provide richer and more reliable responses.
- https://colab.research.google.com/drive/1BJYYyrPVe0_9EGyXqeNyzmVZDrCRZwsg?usp=sharing#scrollTo=Y2m2l-vt_RSp
- https://unstructured.io/blog/how-to-process-pdf-in-python
- https://python.langchain.com/v0.1/docs/modules/data_connection/document_loaders/pdf/
- https://weaviate.io/blog/vector-library-vs-vector-database
- https://ai.meta.com/tools/faiss/