Unified Protobuf Definition and Container Development for Various Large Language Models
Overview:
The goal is to provide a flexible and standardized way of defining and representing connectors for different large language models (LLMs) in the Eclipse Graphene platform. Each LLM connector remains free to choose its own deployment framework, Python library, and other configuration parameters. The resulting LLM definitions/methods are then exposed through a single interface (a Docker container), from which users can seamlessly select the specific LLM that suits their application requirements.
LLMs
Initially, the project envisions supporting four such LLMs, for instance:
- OpenAI [https://openai.com/]
- Mistral [https://huggingface.co/mistralai]
- Llama 2 [https://huggingface.co/blog/llama2]
- OpenGPT-X [https://opengpt-x.de/en/]
Sample Protobuf Definition
A sample representation of the unified protobuf file is given below. Please note that the messages in the protobuf are incomplete; they serve only as examples for now.
Explanation: The user creates different pipelines to use different LLMs. All LLM containers expose the same unified LLM service. The core functionality lives in the DocGenerator, which calls instruct_llm() several times to generate the various parts of the document, such as UML diagrams, information from Wikipedia, and other related content (a client-side sketch after the protobuf below illustrates this flow).
syntax = "proto3";
message PromptInput {
string system = 0;
string user = 1;
string context = 2;
}
message LLMConfig {
double temperature = 0;
int maxlength = 1;
}
message LLMQuery {
LLMConfig config = 0;
PromptInput input = 1;
}
message LLMAnswer {
string text = 0;
}
service LLMService {
rpc instruct_llm(LLMQuery) returns(LLMAnswer);
rpc instruct_llm_stream (stream LLMQuery) returns (stream LLMAnswer);
}
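To make the intended usage concrete, the following is a minimal Python client sketch of how the DocGenerator could call instruct_llm(). It assumes the protobuf above has been compiled with grpcio-tools into modules named llm_pb2 and llm_pb2_grpc and that an LLM container is reachable at localhost:50051; these module names, the prompts, and the address are illustrative placeholders, not part of the specification.

import grpc

import llm_pb2        # generated from the protobuf above (assumed module name)
import llm_pb2_grpc   # generated gRPC stubs (assumed module name)


def generate_section(stub, system_prompt, user_prompt, context=""):
    """Send one instruction to the selected LLM and return its text answer."""
    query = llm_pb2.LLMQuery(
        config=llm_pb2.LLMConfig(temperature=0.2, maxlength=1024),
        input=llm_pb2.PromptInput(system=system_prompt, user=user_prompt, context=context),
    )
    answer = stub.instruct_llm(query)
    return answer.text


if __name__ == "__main__":
    # The address is a placeholder; in Graphene it would come from the pipeline configuration.
    with grpc.insecure_channel("localhost:50051") as channel:
        stub = llm_pb2_grpc.LLMServiceStub(channel)
        # The DocGenerator calls instruct_llm() once per section of the document.
        uml_text = generate_section(stub, "You are a UML assistant.", "Generate a UML class diagram description.")
        wiki_text = generate_section(stub, "You are a research assistant.", "Summarize relevant background from Wikipedia.")
        print(uml_text, wiki_text, sep="\n\n")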

Streaming in gRPC:
rpc instruct_llm_stream (stream LLMQuery) returns (stream LLMAnswer);
In addition to the unary call, the interface is also required to support streaming in gRPC. The client (DocGenerator) establishes a streaming connection to the server and can then continuously send LLMQuery messages containing user questions over that connection. The server (LLM) continuously receives user questions from the stream.
The server processes each question using the same instruct_llm logic and streams back the corresponding answers. This allows for a dynamic and interactive session in which the server can respond to a user's questions as they arrive. The streaming nature of the RPC method enables a continuous exchange of questions and answers, making it suitable for use cases such as real-time Q&A, user feedback loops, or any scenario that requires ongoing interaction.
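As a rough sketch of the client side of such a session, assuming the same generated modules llm_pb2 and llm_pb2_grpc and a placeholder server address, the bidirectional stream can be driven with a Python generator:

import grpc

import llm_pb2        # assumed name of the generated message module
import llm_pb2_grpc   # assumed name of the generated stub module


def question_stream(questions):
    """Yield one LLMQuery per user question; gRPC sends them over the open stream."""
    for question in questions:
        yield llm_pb2.LLMQuery(
            config=llm_pb2.LLMConfig(temperature=0.7, maxlength=512),
            input=llm_pb2.PromptInput(system="You are a helpful assistant.", user=question),
        )


if __name__ == "__main__":
    questions = ["What is Eclipse Graphene?", "How does gRPC streaming work?"]
    with grpc.insecure_channel("localhost:50051") as channel:
        stub = llm_pb2_grpc.LLMServiceStub(channel)
        # Each LLMAnswer is consumed as soon as the server produces it.
        for answer in stub.instruct_llm_stream(question_stream(questions)):
            print(answer.text)

Because the responses are iterated as they arrive, answers can be displayed incrementally instead of waiting for the whole session to finish.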
Tasks
- Define and develop a protobuf message structure that encapsulates the core details of each LLM.
- Create a standard Python interface using the above top-level protobuf message (a minimal server-side sketch follows this list).
- Test the implementation with a simple application that allows users to choose the type of LLM on the fly.
- Containerize the implementation and onboard it to the Graphene platform.
- For LLMs that require GPU access, investigate and implement the configurations needed to ensure compatibility with GPU resources.
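As an illustrative sketch of such a standard Python interface (again assuming the generated modules llm_pb2 and llm_pb2_grpc; the LLM_BACKEND variable, the EchoBackend class, and the complete() method are hypothetical names used only for this example), a unified servicer could dispatch all calls to a pluggable backend:

import os
from concurrent import futures

import grpc

import llm_pb2
import llm_pb2_grpc


class LLMServicer(llm_pb2_grpc.LLMServiceServicer):
    """Unified servicer; the concrete backend is chosen at container start-up."""

    def __init__(self, backend):
        # `backend` is any object exposing complete(system, user, context, config) -> str;
        # concrete wrappers for OpenAI, Mistral, Llama 2, OpenGPT-X would implement it.
        self.backend = backend

    def instruct_llm(self, request, context):
        text = self.backend.complete(
            system=request.input.system,
            user=request.input.user,
            context=request.input.context,
            config=request.config,
        )
        return llm_pb2.LLMAnswer(text=text)

    def instruct_llm_stream(self, request_iterator, context):
        for request in request_iterator:
            yield self.instruct_llm(request, context)


class EchoBackend:
    """Stand-in backend used for testing the interface without any real model."""

    def complete(self, system, user, context, config):
        return f"[echo] {user}"


def serve():
    # LLM_BACKEND is a hypothetical environment variable used to pick the model wrapper.
    backend_name = os.environ.get("LLM_BACKEND", "echo")
    backend = EchoBackend()  # a real deployment would map backend_name to a wrapper class
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
    llm_pb2_grpc.add_LLMServiceServicer_to_server(LLMServicer(backend), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()


if __name__ == "__main__":
    serve()

In the containerized version, the actual backend wrappers (OpenAI, Mistral, Llama 2, OpenGPT-X) would be selected at start-up via configuration, so the same gRPC surface serves any of them.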
Future Development:
- Refinement/Improvement of Protobuf Definitions: Complete the protobuf definitions to include the full specifications for the various LLMs.
- Expansion of LLMs: Support additional LLMs with their unique capabilities and configurations.