Grounding LLMs with a Knowledge Graph
The aim of this use case is to realize a future goal for Graphene, namely: 'We provide grounding capabilities in Graphene'. Users, with their own LLM or an LLM of their choice from the Langchain component modules, can ground the LLM's capabilities with the generated Knowledge Graph.
Approaches on LLM + KG
LLMs can be combined with Knowledge Graphs (KGs) using three approaches:
- KG-enhanced LLMs: These integrate KGs into LLMs during training and use them for better comprehension.
- LLM-augmented KGs: LLMs can improve various KG tasks like embedding, completion, and question answering.
- Synergized LLMs + KGs: LLMs and KGs work together, enhancing each other for two-way reasoning driven by data and knowledge.
Pros: Reduces hallucinations, gives better results, and augments the LLM with information it was not trained on for better output.
Challenges: Containerizing the components, query generation
KG vs. Vector DB: At the moment, Langchain and other tools or platforms use a vector DB with the RAG method, but a major limitation is that vector databases are more likely to provide incomplete or irrelevant results when returning an answer, because they rely on similarity scoring and a predefined result limit.
By contrast, knowledge graph entities are directly connected by explicit relationships (and the number of relationships differs for every entity), so a knowledge graph retrieves and returns the exact answer, and nothing more.
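The contrast above can be sketched with a toy example (illustrative only, not Graphene code): a vector store always returns a fixed top-k by similarity score, even when the extra hits are only loosely related, while a graph lookup follows named relationships and returns exactly the connected facts.

```python
# Toy contrast between vector-similarity retrieval and a graph lookup.
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b))
    return dot / norm

# Vector store: every document gets a score, and the top-k are returned
# regardless of how relevant the k-th result actually is.
docs = {"doc_a": [1.0, 0.0], "doc_b": [0.9, 0.1], "doc_c": [0.0, 1.0]}
query = [1.0, 0.05]
top_k = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)[:2]

# Knowledge graph: entities are linked by named relationships, so a
# lookup returns exactly the matching facts and nothing more.
graph = {("Graphene", "PROVIDES"): ["grounding capabilities"]}
answer = graph.get(("Graphene", "PROVIDES"), [])
```

Here `top_k` always contains two documents even if only one is truly relevant, whereas `answer` contains only what the relationship actually connects.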
Links: https://neo4j.com/blog/knowledge-graph-vs-vectordb-for-retrieval-augmented-generation/
Work Discussed in open forums
In open discussion forums, the most common approaches that research communities discuss follow the methodology proposed below, for both RAG with a vector DB and RAG with a KG.
Tools and methodologies
During the literature survey phase, different methodologies were found, executed, and compared. The following approaches have been experimented with, or are still under experimentation, to construct the Grounding LLM pipeline.
The work is still ongoing and will be updated when needed.
Work in progress
- Parsing unstructured data into structured data for extracting triples (entities and relations)
- Creation of a KG Python module and a KG in Neo4j with those triples
- Integration of Langchain with Neo4j
- Definition of suitable prompt messages for the OpenAI model to get output
- Literature study on OpenAI functions, the Langchain tutorial, how to build a KG so it works with Langchain, etc.
- Addressing the Langchain component issue from the literature study and the embedding idea
- Comparison study on ChatGPT models (gpt-4 vs. gpt-3.5-turbo)
- Adding fallbacks (embeddings for the KG, for the LLM?)
- Finalizing the components for the pipeline
- Container creation for individual components
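One of the steps above is defining suitable prompt messages for the OpenAI model. A minimal sketch of such a triple-extraction prompt is shown below; the wording and message structure are hypothetical examples for illustration, not the exact prompt used in the pipeline.

```python
# Hypothetical prompt asking a chat model to emit triples, one per line,
# in a fixed "subject | relation | object" format that is easy to parse.
TRIPLE_PROMPT = (
    "Extract knowledge-graph triples from the text below. "
    "Return one triple per line in the form: subject | relation | object.\n\n"
    "Text:\n{text}"
)

def build_messages(text: str) -> list:
    """Build the chat-completion message list for triple extraction."""
    return [
        {"role": "system", "content": "You extract structured triples from text."},
        {"role": "user", "content": TRIPLE_PROMPT.format(text=text)},
    ]

messages = build_messages("Graphene provides grounding capabilities.")
```

Fixing the output format in the prompt is what makes the downstream parsing step deterministic.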
Containers:
- Databroker: The databroker is responsible for acquiring data (the user's query and the use-case-specific document) from the user and passing it to the next docker container, the parser model.
  - First running version implemented
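The databroker's role can be sketched as follows; the class and function names are illustrative, not the actual Graphene interfaces. It simply bundles the user's query and the use-case document into one payload for the parser-model container.

```python
# Illustrative sketch of the databroker contract (names are assumptions).
from dataclasses import dataclass

@dataclass
class BrokerPayload:
    query: str      # the user's question
    document: str   # the use-case-specific document text

def acquire(query: str, document: str) -> BrokerPayload:
    """Validate the inputs and package them for the next container."""
    if not query.strip():
        raise ValueError("empty user query")
    return BrokerPayload(query=query.strip(), document=document)
```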
- Parser model: The parser model uses an OpenAI LLM to convert the unstructured use-case-specific data into structured data (i.e. triples of entities and relations) for constructing the Knowledge Graph. The structured data is then sent to the final docker container, GroundingLLM.
  - First running version implemented
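The post-processing side of the parser model can be sketched as below, assuming the LLM was prompted to answer with one "subject | relation | object" line per triple (that output format is an assumption for illustration; malformed lines are skipped).

```python
# Sketch: turn a raw LLM response into (entity, relation, entity) triples.
def parse_triples(llm_output: str) -> list:
    """Parse pipe-separated lines into 3-tuples, skipping malformed ones."""
    triples = []
    for line in llm_output.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.append((parts[0], parts[1], parts[2]))
    return triples

raw = (
    "Graphene | PROVIDES | grounding capabilities\n"
    "not a triple\n"
    "LLM | GROUNDED_BY | Knowledge Graph"
)
triples = parse_triples(raw)
```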
- GroundingLLM module: The GroundingLLM container uses neo4j as the base image and holds a Langchain component and an OpenAI LLM. The structured data is used to construct the KG in the Neo4j database, and the Langchain component accesses it to extract relevant information from the graph.
  - Langchain with Neo4j and LLM
  - Neo4j image readily available; container created with authentication disabled.
  - Checked transfer of data into and out of the Neo4j docker container
  - KG created in Neo4j, and Langchain can access the KG within the docker container
  - Langchain retrieves and sends the data to the LLM's prompt
  - The LLM generates a response based on the extracted graph and the user's question
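The KG-construction step inside this container can be sketched as generating one Cypher MERGE statement per triple. This is a simplified illustration: the `Entity` node label is an assumption, and a real pipeline would use Cypher query parameters rather than string interpolation before sending statements to Neo4j.

```python
# Sketch: map a (subject, relation, object) triple to a Cypher MERGE.
import re

def to_cypher(triple) -> str:
    """Build a MERGE statement creating both nodes and the relationship."""
    subj, rel, obj = triple
    # Relationship types cannot contain spaces/punctuation, so sanitize.
    rel_type = re.sub(r"\W+", "_", rel.strip()).upper()
    return (
        f"MERGE (a:Entity {{name: '{subj}'}}) "
        f"MERGE (b:Entity {{name: '{obj}'}}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )

stmt = to_cypher(("Graphene", "provides", "grounding capabilities"))
```

On the retrieval side, the graph-QA chains linked in the references (Langchain's `graph_qa` chains) take over: they translate the user's question into a graph query and feed the retrieved subgraph into the LLM's prompt.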
Future Scope
As future scope, one could consider:
- Conversational memory for LLM for interactive pipeline.
- Embeddings and Embedding store.
- Generic database node for graph and vector storage.
- Open-source models and OpenGPT-X
- Evaluating LLM via Metrics.
Refer to #36 (closed) for further improvements on the GroundingLLM pipeline
References
- https://neo4j.com/blog/knowledge-graph-vs-vectordb-for-retrieval-augmented-generation/
- https://arxiv.org/abs/2306.08302 - Unifying LLMs with KGs
- https://www.marktechpost.com/2023/09/19/llms-knowledge-graphs/
- https://towardsdatascience.com/integrate-llm-workflows-with-knowledge-graph-using-neo4j-and-apoc-27ef7e9900a2
- https://techcommunity.microsoft.com/t5/fasttrack-for-azure/grounding-llms/ba-p/3843857
- https://towardsdatascience.com/langchain-has-added-cypher-search-cb9d821120d5
- https://github.com/tomasonjo/blogs/blob/master/llm/langchain_neo4j.ipynb
- https://github.com/langchain-ai/langchain/tree/master/libs/langchain/langchain/chains/graph_qa
- https://blog.langchain.dev/constructing-knowledge-graphs-from-text-using-openai-functions/
- https://python.langchain.com/docs/modules/memory/types/kg
- https://python.langchain.com/docs/use_cases/graph/graph_networkx_qa
- https://github.com/langchain-ai/langchain/tree/master/libs/langchain/langchain/graphs