# LLM Containers in AI-Builder
<img src="images/Sequence-diagram.svg" alt="alt text" width="350"/>
## Introduction
The AI-Builder platform enables users to integrate and utilize various large language models (LLMs) by leveraging Fraunhofer IAIS's GenAI Serving Platform and a Unified Protobuf Interface.
The concept involves dynamically creating and managing an individual Docker container for each LLM available on the hosting platform. Once a container has been created and onboarded into AI-Builder, users can incorporate it into their workflows through a simple drag-and-drop interface, and application nodes can send queries directly to the relevant LLM container, which processes the input and returns a response.
This decoupled architecture abstracts the complexity of different LLM implementations, making it easy for applications to use multiple models without direct interaction with their underlying systems. Organizations can deploy modular AI systems where specific LLMs are designated for particular tasks, enhancing flexibility and efficiency. Containerization is particularly advantageous in research settings, where different LLMs must be compared or tested under varying conditions. Researchers can quickly spin up multiple containers, conduct experiments, and tear them down when finished, streamlining the experimentation process.
The accompanying diagram illustrates this workflow: the LLMs are hosted externally on Fraunhofer's platform and accessed through an SDK, while server-client communication with the dynamically created Docker containers is handled by the unified protobuf interface.
<p align="center">
<img src="images/Sequence-diagram.svg" alt="alt text" width="550"/>
</p>
# Table of Contents
- [Introduction](#introduction)
- [About the Unified Protobuf Interface](#about-the-unified-protobuf-interface)
- [LLM Containerization Explained](#llm-containerization-explained)
- [Usage and Applications](#usage-and-applications)
- [References](#references)
- [Considerate Use of LLM Containers](#considerate-use-of-llm-containers)
## About the Unified Protobuf Interface
The provided protobuf definition outlines a flexible, fully customizable framework for integrating various language models (LLMs) into the Eclipse Graphene platform, allowing users to access different models through a standardized interface. The protobuf file serves as the backbone of client-server communication and acts as the core interface of the LLM Docker container.
**Key Points:**
- *Standardization:* Provides a unified structure for interacting with different LLMs while maintaining flexibility regarding framework, configuration, and deployment.
- *Scalability:* The system can scale quickly because each LLM is encapsulated in its own Docker container.
- *Easy Integration:* Users interact with the LLMs through a standardized interface, allowing them to focus on their task without dealing with the intricacies of model selection, configuration, and framework dependencies.
- *Streamlined LLM Response and State Continuity:* Users receive consistent, traceable responses.
*Please note:* The provided protobuf definitions are not limited to the points above and offer scope for further expansion.
For more details, please refer to the explanation of the unified protobuf file below.
![](images/arch-node.png)
```protobuf
service LLMService {
  // Streaming RPC; the LLMQuery/LLMAnswer message definitions and further
  // RPCs are part of the full llms.proto.
  rpc instruct_llm_stream (stream LLMQuery) returns (stream LLMAnswer);
}
```
> `instruct_llm_stream`
> On the other hand, the `instruct_llm_stream` method supports streaming RPC, enabling the exchange of multiple `LLMQuery` and `LLMAnswer` messages in a continuous flow. This is particularly useful for applications requiring real-time or ongoing interaction, such as conversational agents or systems handling large volumes of data. It allows continuous input and output to be handled efficiently, supporting scenarios such as live chats, where responses must be streamed back as soon as they are available.
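As an illustration, a client could consume this streaming RPC roughly as sketched below. This is a minimal, hypothetical example: the generated module names (`llms_pb2`, `llms_pb2_grpc`), the container address and port, and the message field names (`prompt`, `text`) are assumptions based on standard gRPC Python code generation, not part of the definition shown here.

```python
# Hypothetical client for the streaming RPC described above.
# Assumptions: stubs generated from llms.proto as llms_pb2 / llms_pb2_grpc,
# an LLM container listening on localhost:8061, and LLMQuery/LLMAnswer
# messages carrying 'prompt' and 'text' fields.
import grpc

import llms_pb2
import llms_pb2_grpc


def stream_queries(prompts):
    with grpc.insecure_channel("localhost:8061") as channel:
        stub = llms_pb2_grpc.LLMServiceStub(channel)

        def request_iterator():
            # One LLMQuery per prompt; the server streams back LLMAnswer
            # messages as soon as they are available.
            for prompt in prompts:
                yield llms_pb2.LLMQuery(prompt=prompt)

        for answer in stub.instruct_llm_stream(request_iterator()):
            print(answer.text)


if __name__ == "__main__":
    stream_queries(["Hello!", "Summarize the previous answer."])
```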
## LLM Containerization Explained
Each LLM is encapsulated within its own Docker container, ensuring a consistent and isolated environment for each model. These containers can be generated on demand, depending on the models available on the LLM hosting platform, using a script specifically designed to facilitate this process. Through the LLM platform's SDK, each container is set up as a client object and configured with the specific model it should serve.
<details><summary>Creating LLM Containers - About the template generation</summary>
> In this setup, Jinja templates are used to dynamically generate files for a gRPC server-client LLM container. The `templates` dictionary maps Jinja template files (with a `.j2` extension) to their target output file paths. For example, `"llms.proto.j2"` will be rendered to create the `llms.proto` file, and `"app.py.j2"` will generate the `app.py` file.
>
> The `context` dictionary contains the data that will be passed to these templates, allowing for the customization of the generated files. Here, `model_name` and `modality` are the key-value pairs in the context, which the templates can use to insert appropriate content into the output files.
>
> Overall, this setup automates the creation of key components of the gRPC application by leveraging Jinja templates, making it easier to manage and customize the codebase.
>
> To generate the Docker containers, execute the following script:
`python generate_containers.py`
>
> To upload the Docker containers to the CI/CD registry, execute the following script:
`python generate_containers.py`
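
For illustration, the rendering step might look roughly like the sketch below. This is a minimal example under assumptions: the `templates/` directory layout, the function name, and the example argument values are hypothetical; only the `templates` and `context` structures (`model_name`, `modality`) come from the description above.

```python
# Minimal sketch of the template-rendering step described above.
# Assumptions: the .j2 templates live in a local "templates/" directory and
# the rendered files are written to the working directory; only the
# templates/context structure is taken from the documentation.
from jinja2 import Environment, FileSystemLoader


def render_container_files(model_name: str, modality: str) -> None:
    env = Environment(loader=FileSystemLoader("templates"))

    # Map each Jinja template to the file it should produce.
    templates = {
        "llms.proto.j2": "llms.proto",
        "app.py.j2": "app.py",
    }

    # Values the templates can use to customize the generated files.
    context = {"model_name": model_name, "modality": modality}

    for template_name, output_file in templates.items():
        rendered = env.get_template(template_name).render(**context)
        with open(output_file, "w", encoding="utf-8") as f:
            f.write(rendered)


if __name__ == "__main__":
    # "llama3:8b" is taken from the model list above; the modality value is assumed.
    render_container_files(model_name="llama3:8b", modality="text")
```
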
</details>
<br>
Once the Docker containers are created, they can be onboarded to the AI-Builder platform. A sample list of LLMs currently available in AI-Builder is shown below:
| **LLMs** | **Design Studio View** |
|-------------|-----------|
| - mixtral:8x22b <br> - Mistral-7B-Instruct-v0.3_t2t <br> - Mixtral-8x7B-Instruct-v0.1 <br> - llama3:8b <br> - llama3:70b <br> - codellama:7b <br> - Llama-3.1-8B-Instruct <br> - Llama-3.1-70B-Instruct <br> - OpenGPT-X-24EU-4T-Bactrian-ENDEFRITES <br> - OpenGPT-X-24EU-4T-Instruct-HONEY | <img src="images/dd-llms.png" alt="alt text" width="150"/> <img src="images/llms_import.png" alt="alt text" width="150"/>
## Usage and Applications
### Drag-and-Drop Feature
Users can drag and drop these LLM containers into their workflows, as shown below, which also allows them to switch easily between models.
<p align="center">
<img src="images/drag-drop.drawio.svg" alt="alt text" width="350"/>
</p>
### Serving as LLM Chatbots
Each Docker container can serve as an independent LLM chatbot, so users can deploy multiple chatbots, each powered by a different model, to handle different conversational needs. We are also enhancing the currently implemented session and history management for these chatbots; this work is still under development.
<p align="center">
<a href="images/chatbot-working.mp4">
<img src="images/chatbot.PNG" alt="Watch the video" width="300"/>
</a>
</p>
### Benchmarking and Experimentation
Containers are also ideal for benchmarking and experimentation. Since each model runs in its own isolated environment, it is easy to compare the performance of different models under identical conditions. This is particularly valuable in research and development settings, where various models can be evaluated for accuracy, speed, and resource consumption.
**Note: The RAG-Pipeline on AI-Builder is one such example of this.**
<p align="center">
<img src="images/rag-switch.PNG" alt="alt text" >
</p>
### Access to SharedFolder as a Workspace
The shared folder can be used as a storage location for chat history, embeddings, and other relevant data. Several workspaces can be created and accessed here.
<p align="center">
<img src="images/shared-folder.png" alt="alt text" >
</p>
<details><summary>Other Applications - To be Implemented</summary>
### Creating Custom Workflows
Users can design workflows by combining different LLM containers based on the task requirements. The drag-and-drop interface simplifies linking these models, making it accessible even to users with limited specialized expertise.
For instance, a workflow (pipeline) such as the one shown below may involve multimodal data processing, where different types of data (text, images, and possibly audio or video) are used together. This scenario can occur in a medical context, where containerized LLMs handle different modalities to create a comprehensive patient report.
By leveraging different LLMs for text, image, and audio processing, this workflow can handle the complexity of multimodal data, ensuring that all relevant information is extracted and employed.
<p align="center">
<img src="images/sample-example.png" width="300"/>
</p>
### LLM-Comparator-UI-Node
<p align="center">
<img src="images/LLM-Comparator-UI-Node.png" width="300"/>
</p>
Please refer to the following ticket for further updates - eclipse/graphene/tutorials#48
</details>
## References
This concept was developed based on the following tickets and resources. For details on ongoing changes and updates, please refer to them:
1. Unified Protobuf Definition and Container Development for Various Large Language Models - eclipse/graphene/tutorials#35
2. Implement Dynamic Docker Container Generation for Various LLMs in the AI-Builder - eclipse/graphene/tutorials#42
3. To learn more about Fraunhofer IAIS's LLM hosting platform (the GenAI Serving Platform), please refer to the following link: https://jira.iais.fraunhofer.de/wiki/pages/viewpage.action?spaceKey=LHI&title=GenAI+Gateway+Architecture
4. GenAI Serving Platform SDK: https://gitlab.cc-asp.fraunhofer.de/llm-hosting-iais/hosting-sdk.git
## Considerate Use of LLM Containers
Please use the LLM containers responsibly, as they are resource-intensive and require considerable maintenance effort. Limit usage to necessary tasks to ensure sustainability and availability for all users. For more information on best practices and usage guidelines, refer to the hosting platform [3] and SDK documentation [4]. Thank you for your cooperation.
If the container is not working as expected or you encounter any issues, please contact us for assistance.