Compare revisions

Changes are shown as if the source revision was being merged into the target revision.

Commits on Source (153)
Showing with 12500 additions and 49 deletions
# Graphene Container Format
## Table of Contents
...@@ -19,7 +17,7 @@
### [1. Introduction](#1-introduction)
This document specifies the docker container format for tools and models that can be onboarded on Graphene instances so they can be used in the visual composition editor as re-usable, highly interoperable building blocks for AI pipelines.
![image](src/images/intro_pipeline.PNG)
...@@ -98,17 +96,6 @@ https://docs.acumos.org/en/clio/submodules/license-manager/docs/user-guide-licen
There are several detailed tutorials on how to create a dockerized model in this
repository: https://github.com/ai4eu/tutorials
**Important recommendation:** For security reasons, the application in the container
should **not** run as root (which is the default). Instead, an unprivileged user should be
created to run the application. Here is an example snippet from a Dockerfile:

    RUN useradd app
    USER app
    CMD ["java", "-jar", "/app.jar"]

This will also allow the docker container to be converted into a Singularity container
for HPC deployment.
#### [4. Status and Error Codes](#4-status-and-error-codes)
The models should use gRPC status codes according to the spec:
...@@ -119,7 +106,7 @@ For example if no more data is available, the model should return status
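To make the example above concrete, here is a minimal illustrative Python sketch (not part of the specification) of a databroker-style node reporting a gRPC status code once its data is exhausted; the class and method names are placeholders, and `OUT_OF_RANGE` is shown as one plausible choice of status code:

```python
import grpc


class ExampleDatabrokerServicer:
    """Illustrative only; a real servicer derives from the class generated from the .proto file."""

    def __init__(self, samples):
        self._samples = iter(samples)

    def get_next(self, request, context):
        try:
            # Hand out the next pre-built response message.
            return next(self._samples)
        except StopIteration:
            # No more data: terminate the call with an explicit gRPC status code
            # so the orchestrator can react accordingly.
            context.abort(grpc.StatusCode.OUT_OF_RANGE, "all data has been processed")
```

On the client side, the generated stub surfaces this as a `grpc.RpcError` whose `code()` matches the status set by the server.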
### [5. Onboarding](#5-onboarding)
The final step is to onboard the model. There are several ways to onboard a model into
AI4EU Experiments but currently the only recommended way is to use **“On-boarding
dockerized model URI”:**
...@@ -135,6 +122,8 @@ dockerized model URI”:**
Generally speaking, the orchestrator dispatches the output of the previous node to the following node. A special case is the first node, where obviously no output from the previous node exists. In order to be able to implement a general orchestrator, the first node must define its services with an Empty message type. Typically this concerns nodes of type Databroker as the usual starting point of a pipeline.
**Important:** If there is more than one node with an empty message at the beginning of the pipeline, ALL these nodes will be called by the orchestrator.
```proto
syntax = "proto3";
...@@ -180,8 +169,8 @@ A sample representation of the output in the metadata file is as shown below,
],
"checksum": "docker-pullable://cicd.ai4eu-dev.eu:7444/training_pipeline/news_databroker@sha256:0ff4184389e7768cbf7bb8c989e91aa17e51b5cb39846a700868320386131dee",
"dataset_features": {
    "type": "aiod-dataset/v1",
    "datasetname": "The Reuters Dataset, TensorFlow Dataset(tfds)",
    "description": "http://kdd.ics.uci.edu/databases/reuters21578/README.txt",
    "size": "4MB",
    "DOI_ID": "Not available"
...@@ -194,7 +183,7 @@ In this way, expanding the container specification for the databroker's data inp
Note:
1. Please be aware that logging should be enabled in the python script (`logging.basicConfig(level=logging.INFO)`). For instance, logging needs to be enabled in the script containing the function/method that reads the dataset metadata.
1. Please refer to the [10. Metrics Aggregation](#8-metrics-aggregation) section to understand the changes implemented in the playground-app.
...@@ -334,7 +323,7 @@ The model provider should ensure the following additions in order to accomplish
1. Have the metrics collected and updated after the training process.
For this purpose, a gRPC routine/method, `get_metrics_metadata(self, request, context)`, is subsequently called after the training process concludes.
1. Python logging should be enabled in the script that contains the metrics aggregation in order to see all the emitted logs.
Please refer to the Additional Information section to further comprehend the topics mentioned above.
...@@ -344,7 +333,7 @@ When the Pipeline is started, the initial logs indicated in 2. are captured, all
A sample representation of the metrics logs for the news_training pipeline is as follows,
`INFO:root:{'metrics': {'date_time': '2023-09-15 08:17:22', 'type': 'classification-metrics/v1', 'more-is-better': {'accuracy': 0.9247897267341614}, 'less-is-better': {'validation_loss': 0.9067514538764954}, 'status_text': 'success'}}`
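For orientation, here is a minimal sketch of how a model node could collect its final metrics and expose them via `get_metrics_metadata(self, request, context)`, producing an INFO log line like the one above. The class name, the placeholder values, and the plain-dict return values are illustrative assumptions, not the tutorial's actual code:

```python
import logging

logging.basicConfig(level=logging.INFO)  # without this, the metrics log lines are never emitted


class TrainerNode:
    """Illustrative skeleton; the real class is a gRPC servicer generated from the .proto file."""

    def __init__(self):
        self.metrics = {}

    def startTraining(self, request=None, context=None):
        # ... run the actual training here ...
        # Afterwards, store the final metrics in the agreed structure (placeholder values).
        self.metrics = {
            "date_time": "2023-09-15 08:17:22",
            "type": "classification-metrics/v1",
            "more_is_better": {"accuracy": 0.92},
            "less_is_better": {"validation_loss": 0.91},
            "status_text": "success",
        }
        return self.metrics  # the real method returns a TrainingStatus protobuf message

    def get_metrics_metadata(self, request=None, context=None):
        # Called after training concludes; this INFO line is what the
        # playground-app captures for metrics aggregation.
        logging.info({"metrics": self.metrics})
        return self.metrics  # the real method returns the metrics message defined in the .proto
```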
A sample representation of the output in the metadata file is as shown below,
...@@ -369,9 +358,13 @@ A sample representation of the output in the metadata file is as shown below,
"checksum": "docker-pullable://cicd.ai4eu-dev.eu:7444/training_pipeline/news_classifier@sha256:3b4c88e571abbb0e536d8048ffb82ae4126d52dafa1e03eb13b6d8c22bf3e859",
"metrics": {
    "date_time": "2023-09-12 06:00:09",
    "type": "classification-metrics/v1",
    "more_is_better": {
        "accuracy": 0.9239237904548645
    },
    "less_is_better": {
        "validation_loss": 0.9049152731895447
    },
    "status_text": "success"
}
}
...@@ -416,15 +409,19 @@ A sample:
```json
"metrics": {
    "date_time": "2023-09-15 08:17:22",
    "type": "classification-metrics/v1",
    "more_is_better": {
        "accuracy": 0.9247897267341614
    },
    "less_is_better": {
        "validation_loss": 0.9067514538764954
    },
    "status_text": "success"
}
```
In conclusion, accuracy can be used as a guiding metric during model development and hyperparameter tuning. validation_loss is typically used during the training process to monitor whether the model is improving or overfitting. A decreasing validation loss indicates that the model is learning and improving its ability to make predictions. However, if the validation loss starts increasing, it may be a sign of overfitting.

Regressor model:
...@@ -469,15 +466,21 @@ A sample:
```json
"metrics": {
    "date_time": "2023-09-28 13:56:21",
    "type": "regression-metrics/v1",
    "less_is_better": {
        "mse": 0.0025680503998073007,
        "rmse": 0.05067593511527242
    },
    "more_is_better": {
        "r_squared": 0.831178119365146,
        "adjusted_r_squared": 0.8270170166734419
    },
    "status_text": "success"
}
```
In conclusion, RMSE and MSE measure the accuracy of predictions, R2 quantifies the goodness of fit, and Adjusted R2 adjusts R2 for the complexity of the model. These metrics are crucial for evaluating regression models and selecting the most appropriate model for a given dataset. <br /><br />
Note: Another field in the metadata is 'type', which represents the dataset and the model's metrics type. This field is critical not only for model evaluation but also for aligning the model's performance with the precise goals and requirements of the application. Adding this field allows the metrics to be grouped into a primary group or sub-groups of metrics, and recording the dataset type helps discern the source of the dataset.
**Dataset Features:**
...@@ -491,23 +494,28 @@ In conclusion, RMSE and MSE measure the accuracy of predictions, R2 quantifies t
    </tr>
  </thead>
  <tbody>
    <tr>
      <td align="left">1</td>
      <td align="left">Type</td>
      <td align="left">Specifies the type or category information that the dataset belongs to.</td>
    </tr>
    <tr>
      <td align="left">2</td>
      <td align="left">Dataset name</td>
      <td align="left">This is the name or label given to a specific dataset, often indicating its content, source, or purpose.</td>
    </tr>
    <tr>
      <td align="left">3</td>
      <td align="left">Description</td>
      <td align="left">A brief or detailed explanation of what the dataset contains, its origin, format, and any other relevant information for potential users.</td>
    </tr>
    <tr>
      <td align="left">4</td>
      <td align="left">Size</td>
      <td align="left">The size of the dataset, typically measured in terms of the number of records, rows, columns, or the total file size in bytes or other appropriate units.</td>
    </tr>
    <tr>
      <td align="left">5</td>
      <td align="left">DOI (Digital Object Identifier) or ID</td>
      <td align="left">A unique and persistent identifier assigned to the dataset, often used for citation and reference purposes, ensuring its accessibility and traceability.</td>
    </tr>
...@@ -521,6 +529,7 @@ In conclusion, RMSE and MSE measure the accuracy of predictions, R2 quantifies t
A sample:
```json
"dataset_features": {
    "type": "aiod-dataset/v1(Kaggle dataset)",
    "datasetname": "House Prices dataset",
    "description": "https://www.kaggle.com/datasets/lespin/house-prices-dataset",
    "size": "204 kB",
...@@ -533,10 +542,10 @@ In conclusion, dataset features helps to get an overview about the dataset that
<details>
<summary>Additional Information - Future scope for the 'type' of metrics</summary>

Further, we can categorize the type of metrics into training and testing phases. This can help us assess model performance by detecting the ML model's over/underfitting, drift, and generalization behaviour. We can thus make informed decisions about model selection and hyperparameter tuning, and monitor model performance in our intended applications. The categorization of the 'type' parameter into training and testing phases can be seamlessly introduced as a future extension to the metrics metadata for comprehensive metrics tracking.<br /><br />
A sample extension of the metadata for the metrics into training and testing phases using the "type" parameter is shown below. <br />
```json
{
...@@ -544,16 +553,24 @@ A sample further extension of the metadata is as shown below,
{
    "type": "classification-training-metrics/v1",
    "date_time": "2023-09-07 07:28:51",
    "more_is_better": {
        "training_accuracy": 0.9257,
        "validation_accuracy": 0.8051
    },
    "less_is_better": {
        "training_loss": 0.3606,
        "validation_loss": 0.8978
    },
    "status_text": "success"
},
{
    "type": "classification-testing-metrics/v1",
    "date_time": "2023-09-07 07:28:51",
    "more_is_better": {
        "F1 Score": 0.8512,
        "Specificity": 0.9036,
        "ROC-AUC": 0.9205
    },
    "status_text": "success"
}
]
...@@ -565,7 +582,7 @@ A sample further extension of the metadata is as shown below,
<details>
<summary>Additional Information</summary>
Please refer to the news-training pipeline and house price prediction tutorials to understand the metrics aggregation.
##### **Changes in tutorials - news_training**
...@@ -606,13 +623,15 @@ service NewsClassifier {
}
```
3. Please add the function definition (`get_metrics_metadata(...)`) in the respective script that collects the final metrics. In the news-training pipeline, the metrics are collected after the training process and the result is then passed to `get_metrics_metadata(...)`, which exposes the metrics in a log format.
Please adapt all the above changes to the respective nodes to achieve the metrics aggregation in the ML model or tutorial example.
##### **Changes in playground-app - Already available**
Changes in Pipeline.py, NodeManager.py and ExecutionRun.py
Please refer to the issues eclipse/graphene/playground-app#34, eclipse/graphene/tutorials#19 to understand the implementation.
> Pipeline.py\
> `_get_starting_nodes(path)`
...
# Table of Contents
- [Overview](#overview)
- [Steps in Client-Server Communication](#steps-in-client-server-communication)
- [Insights into UML Diagram](#insights-into-uml-diagram)
- [Project Dependencies](#project-dependencies)
- [Docker Commands](#docker-commands)
- [Generic Information](#generic-information)
## Overview
The provided code defines a Protocol Buffers (protobuf) schema for a machine learning model service called "FlairModel." It includes messages for text input, embeddings, and training configuration, specifying parameters such as data filenames, epochs, learning rate, batch size, and model filename. The service offers two RPC methods: `startTraining`, which initiates training with a given configuration and returns the training status, and `extract_entities_from_text`, which processes text to extract entities.

A second Protocol Buffers (proto3) schema describes a training configuration service. It includes an empty message, an embedding message containing a string, and a training configuration message that specifies various parameters such as file names for training, validation, and test data, a list of embeddings, epochs, learning rate, batch size, and model filename. The service, named `TrainingConfiguration`, has a single RPC method `startTraining` that takes an empty message and returns a `TrainingConfig` message.

A third Protocol Buffers (proto3) schema defines a service called "Tensorboard." It includes an empty message type and a message type "LoggingStatus" that contains a single integer field for status. The service has one RPC method, "logging," which takes an "Empty" message as input and returns a "LoggingStatus" message.
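To make the call pattern concrete, a minimal illustrative Python client for the "FlairModel" service could look like the sketch below. The generated module names (`model_pb2`, `model_pb2_grpc`), the request message names (`TrainingConfig`, `TextInput`), and the port are assumptions, not the project's actual identifiers:

```python
import grpc

# Placeholder names: substitute the modules generated from the tutorial's .proto file.
import model_pb2
import model_pb2_grpc


def run(host="localhost", port=8061):
    # The port is an assumption; use whichever gRPC port the container exposes.
    with grpc.insecure_channel(f"{host}:{port}") as channel:
        stub = model_pb2_grpc.FlairModelStub(channel)

        # Kick off training with an (assumed) configuration message.
        status = stub.startTraining(model_pb2.TrainingConfig())
        print("training status:", status)

        # Ask the trained model to extract entities from a piece of text.
        result = stub.extract_entities_from_text(model_pb2.TextInput(text="Berlin is in Germany."))
        print("entities:", result)


if __name__ == "__main__":
    run()
```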
## Steps in Client-Server Communication
- **Initialization**: The client (data broker) initializes the communication process by establishing a connection to the server hosting the Protocol Buffers services.
- **Service Request**: The client sends a request to the desired service (e.g., "FlairModel," "TrainingConfiguration," or "Tensorboard") using the defined RPC methods.
- **Message Preparation**: The client prepares the appropriate message format as defined in the Protocol Buffers schema, including necessary parameters (e.g., text input, training configuration).
- **Data Transmission**: The client transmits the serialized message over the network to the server.
- **Server Reception**: The server receives the incoming request and deserializes the message to interpret the client's request.
- **Processing**: The server processes the request according to the specified service method (e.g., starting training, extracting entities, logging status).
- **Response Generation**: After processing, the server generates a response message, which may include results, status updates, or confirmation of actions taken.
- **Response Transmission**: The server serializes the response message and sends it back to the client.
- **Client Reception**: The client receives the response from the server and deserializes it to access the information.
- **Finalization**: The client processes the response (e.g., updating the UI, logging results) and may initiate further requests or terminate the communication as needed.
## Insights into UML Diagram
The UML diagram contains several classes including `NerInputForm`, `FlairModelServicer`, `FlairModelStub`, `FlairModel`, `TrainingConfiguration`, `TrainingConfigurationStub`, `TrainingConfigurationServicer`, `MultiCheckboxField`, `SelectMultipleField`, `TrainerInputForm`, `TensorboardStub`, `TensorboardServicer`, and `Tensorboard`. The relationships include dashed associations indicating inheritance or extensions from `FlaskForm` for `NerInputForm` and `TrainerInputForm`, and a similar relationship between `SelectMultipleField` and `MultiCheckboxField`. There are also dashed associations from `object` to various classes, suggesting a service or stub pattern. Notably, the design patterns observed include the Service Layer pattern and the Form Object pattern.
## Project Dependencies
Here is a detailed explanation and comprehensive list of all project dependencies and libraries, including their versions, purposes, and any relevant configurations:
1. **torch==1.7.1**
- **Purpose**: A popular deep learning framework used for building and training neural networks.
2. **transformers==3.5.1**
- **Purpose**: A library for natural language processing (NLP) that provides pre-trained models and tools for working with transformer architectures.
3. **protobuf==3.16.0**
- **Purpose**: A language-neutral, platform-neutral extensible mechanism for serializing structured data, often used in gRPC.
4. **flair==0.7**
- **Purpose**: A simple NLP library that allows you to apply state-of-the-art natural language processing (NLP) models.
5. **packaging~=21.3**
- **Purpose**: A library for dealing with Python package versions and dependencies.
6. **grpcio==1.38.0**
- **Purpose**: A high-performance RPC framework that can run in any environment, used for communication between services.
7. **grpcio-tools==1.38.0**
- **Purpose**: Provides tools for generating gRPC client and server code from .proto files.
8. **googleapis-common-protos==1.53.0**
- **Purpose**: Common protocol buffers used by Google APIs.
9. **Bootstrap-Flask==1.5.2**
- **Purpose**: A Flask extension that integrates Bootstrap for building web applications.
10. **Flask==1.1.2**
- **Purpose**: A lightweight WSGI web application framework in Python.
11. **Flask-SQLAlchemy==2.5.1**
- **Purpose**: An extension for Flask that adds support for SQLAlchemy, a SQL toolkit and Object-Relational Mapping (ORM) system.
12. **Flask-WTF==0.14.3**
- **Purpose**: An extension that integrates Flask with WTForms, providing form handling capabilities.
13. **google==3.0.0**
- **Purpose**: A library for accessing Google APIs.
14. **WTForms==3.0.1**
- **Purpose**: A flexible forms validation and rendering library for Python.
15. **Jinja2==2.11.3**
- **Purpose**: A templating engine for Python, used by Flask for rendering templates.
16. **markupsafe==2.0.1**
- **Purpose**: A library for safe string handling, used by Jinja2.
17. **itsdangerous==2.0.1**
- **Purpose**: A library for securely signing data, used by Flask for session management.
18. **werkzeug==2.0.3**
- **Purpose**: A comprehensive WSGI web application library that Flask is built on.
19. **tensorboard==2.11.0**
- **Purpose**: A visualization tool for TensorFlow, useful for tracking and visualizing metrics during model training.
This list covers the dependencies declared for the project, along with their purposes. Any additional dependencies in the project's `requirements.txt` files should be reviewed directly for a complete overview.
## Docker Commands
To build and run the Docker image for the Entity Recognition project, you can use the following commands:
1. Build the Docker image:
```bash
docker build -t entity_recognition_image /data/shared/Entity_Recognition
```
2. Run the Docker container:
```bash
docker run -d -p 8062:8062 -e SHARED_FOLDER_PATH=/path/to/logdir entity_recognition_image
```
Make sure to replace `/path/to/logdir` with the actual path to your log directory.
## Generic Information
Page: GRPC
Summary: gRPC (gRPC Remote Procedure Calls) is a cross-platform high-performance remote procedure call (RPC) framework. gRPC was initially created by Google, but is open source and is used in many organizations. Use cases range from microservices to the "last mile" of computing (mobile, web, and Internet of Things). gRPC uses HTTP/2 for transport, Protocol Buffers as the interface description language, and provides features such as authentication, bidirectional streaming and flow control, blocking or nonblocking bindings, and cancellation and timeouts. It generates cross-platform client and server bindings for many languages. Most common usage scenarios include connecting services in a microservices style architecture, or connecting mobile device clients to backend services.
gRPC's use of HTTP/2 is considered complex. It makes it impossible to implement a gRPC client in the browser, instead requiring a proxy.

Page: Protocol Buffers
Summary: Protocol Buffers (Protobuf) is a free and open-source cross-platform data format used to serialize structured data. It is useful in developing programs that communicate with each other over a network or for storing data. The method involves an interface description language that describes the structure of some data and a program that generates source code from that description for generating or parsing a stream of bytes that represents the structured data.

Page: Docker (software)
Summary: Docker is a set of platform as a service (PaaS) products that use OS-level virtualization to deliver software in packages called containers.
The service has both free and premium tiers. The software that hosts the containers is called Docker Engine. It was first released in 2013 and is developed by Docker, Inc.
Docker is a tool that is used to automate the deployment of applications in lightweight containers so that applications can work efficiently in different environments in isolation.
![alt text](UML_Entity_Recognition.png)
# Table of Contents
- [Overview](#overview)
- [Steps in Client-Server Communication](#steps-in-client-server-communication)
- [Insights into UML Diagram](#insights-into-uml-diagram)
- [Project Dependencies](#project-dependencies)
- [Docker Commands](#docker-commands)
- [Generic Information](#generic-information)
## Overview
The provided code defines a Protocol Buffers (proto3) schema for a service called "Databroker." It includes two message types: "Features," which holds various property attributes (e.g., MSSubClass, LotArea, YearBuilt), and "DatasetFeatures," which contains metadata about datasets (e.g., type, name, description, size, DOI_ID). The service offers two remote procedure calls (RPCs): "hppdatabroker," which returns property features, and "get_dataset_metadata," which returns dataset metadata.

A second Protocol Buffers (proto3) schema describes a machine learning service that predicts house sale prices. It includes a `Features` message to capture various property attributes (e.g., MSSubClass, LotArea, YearBuilt) and a `Prediction` message for the predicted sale price. Additionally, a `TrainingStatus` message is defined, and the `Predict` service includes three RPC methods: `predict_sale_price` for price prediction, `regressor_metrics` for retrieving model metrics, and `get_metrics_metadata` for accessing metadata related to the training status.
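As an illustration of this flow, a minimal Python client combining the "Databroker" and "Predict" services might look like the sketch below; the generated module names (`hpp_pb2`, `hpp_pb2_grpc`), the `Empty` request, and the single shared channel are simplifying assumptions rather than the tutorial's actual code:

```python
import grpc

# Placeholder module names; use the modules generated from the project's .proto files.
import hpp_pb2
import hpp_pb2_grpc


def run(host="localhost", port=8061):
    # In a deployed pipeline the databroker and the predictor run in separate containers
    # and the orchestrator forwards messages between them; one channel is used here only
    # to keep the illustration short.
    with grpc.insecure_channel(f"{host}:{port}") as channel:
        databroker = hpp_pb2_grpc.DatabrokerStub(channel)
        predictor = hpp_pb2_grpc.PredictStub(channel)

        # Fetch one set of property features from the databroker.
        features = databroker.hppdatabroker(hpp_pb2.Empty())

        # Ask the regressor for the predicted sale price of that property.
        prediction = predictor.predict_sale_price(features)
        print("predicted sale price:", prediction)


if __name__ == "__main__":
    run()
```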
## Steps in Client-Server Communication
### Communication Process Summary for the "Databroker" Protocol Buffers Service
- **Initial Node: Data Broker**
- Acts as the central communication hub for client-server interactions.
- Receives requests from clients and routes them to the appropriate service methods.
- **Client Interaction**
- The client initiates communication by sending a request to the Data Broker.
- Requests can be for property features or dataset metadata, depending on the RPC method invoked.
- **RPC Methods Overview**
- **hppdatabroker**
- Purpose: To retrieve property features.
- Client sends a request containing property identifiers or criteria.
- Data Broker processes the request and forwards it to the relevant service handling property features.
- **get_dataset_metadata**
- Purpose: To obtain metadata about datasets.
- Client sends a request specifying the dataset of interest.
- Data Broker forwards the request to the dataset metadata service.
- **Machine Learning Service Interaction**
- The Data Broker also facilitates communication with the machine learning service for predicting house sale prices.
- **Predict Service RPC Methods**
- **predict_sale_price**
- Client sends a request with property details encapsulated in the "Features" message.
- Data Broker forwards the request to the prediction service.
- The prediction service processes the input and returns a "Prediction" message with the predicted sale price.
- **regressor_metrics**
- Client requests performance metrics of the regression model.
- Data Broker forwards this request to the metrics service.
- The metrics service responds with relevant metrics data.
- **get_metrics_metadata**
- Client requests metadata about the metrics available.
- Data Broker forwards this request to the metrics metadata service.
- The service returns metadata information to the client.
- **Response Handling**
- After processing the requests, the respective services send responses back to the Data Broker.
- The Data Broker consolidates the responses and sends them back to the client.
- **Error Handling**
- If any errors occur during the request processing, the Data Broker captures these errors and communicates them back to the client in a structured format.
- **Conclusion**
- The communication process involves a clear flow of requests and responses between the client, Data Broker, and various services, ensuring efficient data retrieval and processing for property features and machine learning predictions.
## Insights into UML Diagram
The UML diagram contains several classes related to house price prediction, including `HppInputForm`, `DatabrokerStub`, `DatabrokerServicer`, `PredictStub`, and `PredictServicer`. `HppInputForm` extends `FlaskForm`, indicating a specialized form for input handling. There are dependencies between service classes and their respective stubs, suggesting the use of the Stub Pattern for testing and the Service Layer Pattern for encapsulating business logic.
## Project Dependencies
Here is a detailed explanation and comprehensive list of all project dependencies and libraries, including their versions, purposes, and any relevant configurations:
1. **Bootstrap-Flask==1.5.2**
- **Purpose**: Integrates Bootstrap with Flask to facilitate responsive web design.
2. **Flask==1.1.2**
- **Purpose**: A lightweight WSGI web application framework for Python.
3. **Flask-SQLAlchemy==2.5.1**
- **Purpose**: Adds SQLAlchemy support to Flask applications, simplifying database interactions.
4. **Flask-WTF==0.14.3**
- **Purpose**: Integrates WTForms with Flask, providing form handling capabilities.
5. **google==3.0.0**
- **Purpose**: A library for accessing Google APIs.
6. **googleapis-common-protos==1.53.0**
- **Purpose**: Common protocol buffers for Google APIs.
7. **grpcio==1.38.0**
- **Purpose**: A high-performance RPC framework that can run in any environment.
8. **grpcio-tools==1.38.0**
- **Purpose**: Provides tools for generating gRPC code from protocol buffer definitions.
9. **Jinja2==2.11.3**
- **Purpose**: A templating engine for Python, used by Flask for rendering templates.
10. **pandas==1.1.5**
- **Purpose**: A data manipulation and analysis library for Python, providing data structures like DataFrames.
11. **protobuf==3.16.0**
- **Purpose**: A library for serializing structured data, used with gRPC.
12. **PyYAML==5.4.1**
- **Purpose**: A YAML parser and emitter for Python, useful for configuration files.
13. **requests==2.25.1**
- **Purpose**: A simple HTTP library for Python, used for making API calls.
14. **scikit-learn==0.24.2**
- **Purpose**: A machine learning library for Python, providing tools for data mining and data analysis.
15. **sklearn==0.0**
- **Purpose**: A placeholder for scikit-learn, typically used for compatibility.
16. **SQLAlchemy==1.4.7**
- **Purpose**: A SQL toolkit and Object-Relational Mapping (ORM) system for Python.
17. **threadpoolctl==2.2.0**
- **Purpose**: A library for controlling the number of threads used by native libraries.
18. **urllib3==1.26.5**
- **Purpose**: A powerful HTTP library for Python, used for making requests.
19. **Werkzeug==1.0.1**
- **Purpose**: A comprehensive WSGI web application library, used by Flask.
20. **WTForms==2.3.3**
- **Purpose**: A flexible forms validation and rendering library for Python.
This list covers the project's dependencies along with their purposes. The configurations for these libraries are typically found in the respective `requirements.txt` files or in the application code itself.
## Docker Commands
To build and run the Docker image for the House Price Prediction application, you can use the following commands:
1. **Build the Docker image:**
```bash
docker build -t house_price_prediction /data/shared/House_Price_Prediction
```
2. **Run the Docker container:**
```bash
docker run -p 8061:8061 -p 8062:8062 house_price_prediction
```
These commands will build the image using the Dockerfile in the specified directory and run the container while mapping the necessary ports.
## Generic Information
Page: GRPC
Summary: gRPC (gRPC Remote Procedure Calls) is a cross-platform high-performance remote procedure call (RPC) framework. gRPC was initially created by Google, but is open source and is used in many organizations. Use cases range from microservices to the "last mile" of computing (mobile, web, and Internet of Things). gRPC uses HTTP/2 for transport, Protocol Buffers as the interface description language, and provides features such as authentication, bidirectional streaming and flow control, blocking or nonblocking bindings, and cancellation and timeouts. It generates cross-platform client and server bindings for many languages. Most common usage scenarios include connecting services in a microservices style architecture, or connecting mobile device clients to backend services.
gRPC's use of HTTP/2 is considered complex. It makes it impossible to implement a gRPC client in the browser, instead requiring a proxy.

Page: Protocol Buffers
Summary: Protocol Buffers (Protobuf) is a free and open-source cross-platform data format used to serialize structured data. It is useful in developing programs that communicate with each other over a network or for storing data. The method involves an interface description language that describes the structure of some data and a program that generates source code from that description for generating or parsing a stream of bytes that represents the structured data.

Page: Docker (software)
Summary: Docker is a set of platform as a service (PaaS) products that use OS-level virtualization to deliver software in packages called containers.
The service has both free and premium tiers. The software that hosts the containers is called Docker Engine. It was first released in 2013 and is developed by Docker, Inc.
Docker is a tool that is used to automate the deployment of applications in lightweight containers so that applications can work efficiently in different environments in isolation.
![alt text](UML_House_Price_Prediction-1.png)
# Table of Contents
- [Overview](#overview)
- [Steps in Client-Server Communication](#steps-in-client-server-communication)
- [Insights into UML Diagram](#insights-into-uml-diagram)
- [Project Dependencies](#project-dependencies)
- [Docker Commands](#docker-commands)
- [Generic Information](#generic-information)
## Overview
The provided code defines a Protocol Buffers (protobuf) schema for a news classification system. It includes messages for configuring training (`TrainingConfig`), reporting training status (`TrainingStatus`), and handling news text and categories (`NewsText`, `NewsCategory`). The `TrainingConfig` message specifies parameters like data filenames, epochs, batch size, and model filename. The `TrainingStatus` message indicates the type of status, with sub-messages for metrics that are better when lower (`Lessisbetter`) or higher (`Moreisbetter`). The `NewsClassifier` service offers three RPC methods: `startTraining` to initiate training, `classify` to categorize news text, and `get_metrics_metadata` to retrieve training metrics.

A second protobuf schema (syntax version 3) includes two message types: `Empty`, which has no fields, and `NewsText`, which contains a single string field for text. Additionally, it defines a `DatasetFeatures` message with fields for type, dataset name, description, size, and DOI ID. The `NewsDatabroker` service includes two RPC methods: `get_next`, which returns a `NewsText` message, and `get_dataset_metadata`, which returns a `DatasetFeatures` message, both taking an `Empty` message as input.

A third (proto3) schema is for a service called "Tensorboard." It includes an empty message type called "Empty" and a message type "LoggingStatus" that contains a single integer field "status." The "Tensorboard" service has a remote procedure call (RPC) named "loggging" that takes an "Empty" message as input and returns a "LoggingStatus" message.

A fourth protobuf schema (syntax version 3) includes an empty message type called `Empty` and a message type `TrainingConfig` that specifies parameters for training a model, such as filenames for training data and labels, number of epochs, batch size, validation ratio, and model filename. Additionally, it defines a service `NewsTrainer` with a remote procedure call (RPC) method `startTraining`, which takes an `Empty` message as input and returns a `TrainingConfig` message.
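A compact, illustrative client for the databroker and classifier services described above is sketched below; the generated module names (`news_pb2`, `news_pb2_grpc`), the port, and the single shared channel are simplifying assumptions (in the deployed pipeline each node runs in its own container and the orchestrator handles the dispatching):

```python
import grpc

# Placeholder module names; in the tutorial each node has its own generated *_pb2 modules.
import news_pb2
import news_pb2_grpc


def run(host="localhost", port=8061):
    with grpc.insecure_channel(f"{host}:{port}") as channel:
        databroker = news_pb2_grpc.NewsDatabrokerStub(channel)
        classifier = news_pb2_grpc.NewsClassifierStub(channel)

        # Pull the next news text from the databroker (Empty request, NewsText response).
        news_text = databroker.get_next(news_pb2.Empty())

        # Classify the text; the classifier returns a NewsCategory message.
        category = classifier.classify(news_text)
        print("predicted category:", category)


if __name__ == "__main__":
    run()
```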
## Steps in Client-Server Communication
- **Initialization**: The client (e.g., a news classification application) initializes the communication process by establishing a connection to the `NewsDatabroker` service.
- **Data Request**: The client sends a request to the `NewsDatabroker` service to fetch news text and dataset metadata using the appropriate RPC method.
- **Data Retrieval**: The `NewsDatabroker` service processes the request, retrieves the necessary news text and metadata from its database, and sends the data back to the client.
- **Training Configuration**: The client prepares a `TrainingConfig` message with the desired parameters for training the news classification model.
- **Start Training**: The client sends the `TrainingConfig` to the `NewsClassifier` service, invoking the RPC method to start the training process.
- **Training Status Monitoring**: The client periodically requests the `TrainingStatus` from the `NewsClassifier` service to monitor the progress of the training.
- **Classification Request**: Once training is complete, the client sends a `NewsText` message to the `NewsClassifier` service to classify new news articles.
- **Classification Response**: The `NewsClassifier` service processes the classification request and returns the classification results to the client.
- **Metrics Retrieval**: The client can request performance metrics from the `NewsClassifier` service to evaluate the model's effectiveness.
- **Logging**: Throughout the process, the client may log relevant information using the `Tensorboard` service for visualization and analysis of training and classification performance.
- **Completion**: The communication process concludes when the client has successfully classified news articles and retrieved all necessary metrics, or when the client decides to terminate the session.
## Insights into UML Diagram
The UML diagram contains several classes including `ClassifierInputForm`, `NewsClassifier`, `NewsDatabroker`, `Tensorboard`, and `NewsTrainer`, among others. The relationships include associations with `FlaskForm` for input forms and dependencies on `object` for various stubs and servicers. There are no explicit inheritance relationships. The presence of `Stub` and `Servicer` classes suggests the use of the Proxy design pattern.
## Project Dependencies
The project dependencies and libraries, along with their versions and purposes, are as follows:
1. **grpcio==1.38.0**: A high-performance, open-source universal RPC framework that is used for communication between services.
2. **grpcio-tools==1.38.0**: Tools for generating gRPC client and server code from protocol buffer definitions.
3. **grpc-interceptor**: A library for intercepting gRPC calls, allowing for additional processing such as logging or authentication.
4. **multithreading**: A module that allows for concurrent execution of code, useful for improving performance in I/O-bound applications.
5. **protobuf==3.16.0**: A language-agnostic binary serialization format used for defining data structures and services in gRPC.
6. **numpy**: A library for numerical computing in Python, providing support for arrays and matrices, along with a collection of mathematical functions.
7. **tensorflow**: An open-source machine learning framework used for building and training machine learning models.
8. **keras2onnx**: A library for converting Keras models to the ONNX (Open Neural Network Exchange) format.
9. **tf2onnx**: A tool for converting TensorFlow models to the ONNX format.
Flask-related dependencies:
10. **Bootstrap-Flask==1.8.0**: A Flask extension that integrates Bootstrap with Flask applications.
11. **Flask==2.0.2**: A lightweight WSGI web application framework for Python.
12. **idna==3.3**: A library for handling Internationalized Domain Names (IDN).
13. **Jinja2==3.0.3**: A templating engine for Python, used by Flask for rendering templates.
14. **MarkupSafe==2.0.1**: A library for safe handling of HTML and XML strings.
15. **python-dateutil==2.8.2**: A powerful extension to the standard datetime module, providing additional features for date and time manipulation.
16. **Werkzeug==2.0.2**: A comprehensive WSGI web application library that Flask is built on.
17. **Flask-WTF==0.14.3**: An extension for Flask that integrates WTForms, providing form handling capabilities.
18. **WTForms==2.3.3**: A flexible forms validation and rendering library for Python.
Note: Additional dependencies listed in the `requirements.txt` files of the `classifier`, `databroker`, and `trainer` directories may not be included in this summary.
## Docker Commands
1. Build the image:
```bash
docker build -t news_trainer /data/shared/news_training
```
2. Run the container:
```bash
docker run --name news_trainer_container news_trainer
```
## Generic Information
Page: GRPC
Summary: gRPC (gRPC Remote Procedure Calls) is a cross-platform high-performance remote procedure call (RPC) framework. gRPC was initially created by Google, but is open source and is used in many organizations. Use cases range from microservices to the "last mile" of computing (mobile, web, and Internet of Things). gRPC uses HTTP/2 for transport, Protocol Buffers as the interface description language, and provides features such as authentication, bidirectional streaming and flow control, blocking or nonblocking bindings, and cancellation and timeouts. It generates cross-platform client and server bindings for many languages. Most common usage scenarios include connecting services in a microservices style architecture, or connecting mobile device clients to backend services.
gRPC's use of HTTP/2 is considered complex. It makes it impossible to implement a gRPC client in the browser, instead requiring a proxy.

Page: Protocol Buffers
Summary: Protocol Buffers (Protobuf) is a free and open-source cross-platform data format used to serialize structured data. It is useful in developing programs that communicate with each other over a network or for storing data. The method involves an interface description language that describes the structure of some data and a program that generates source code from that description for generating or parsing a stream of bytes that represents the structured data.
![alt text](UML_news_training.png)
Excerpts-Readme/UML_Entity_Recognition.png

47.6 KiB

Excerpts-Readme/UML_House_Price_Prediction-1.png

23.9 KiB

Excerpts-Readme/UML_news_training.png

55.5 KiB

cicd.ai4eu-dev.eu:7444/tutorials/groundingllms/gllmdatabroker:v1
FROM python:3.8
RUN apt-get update -y
RUN apt-get install -y python3-pip python3-dev
RUN pip3 install --upgrade pip
COPY requirements.txt .
RUN pip3 install -r requirements.txt
RUN mkdir /GLMMdatabroker
COPY . /GLMMdatabroker
WORKDIR /GLMMdatabroker
RUN python3 -m grpc_tools.protoc --python_out=. --proto_path=. --grpc_python_out=. GLLM_databroker.proto
EXPOSE 8061 8062
ENTRYPOINT python3 -u GLLMserver.py
syntax = "proto3";
message Empty {
// Empty message
}
message UserInputs {
string organization_id = 1;
string api_key = 2;
string user_query = 3;
string usecase_data = 4;
}
service Databroker {
rpc GLMMdatabroker(Empty) returns (UserInputs);
}
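The diff includes this proto definition, the generated `GLLM_databroker_pb2*` modules, and a test client further below, but not the server entrypoint (`GLLMserver.py`). As a rough, assumed sketch of what such a server could look like (the returned field values are placeholders, not the actual implementation):

```python
from concurrent import futures
import logging

import grpc

import GLLM_databroker_pb2
import GLLM_databroker_pb2_grpc


class GLLMDatabrokerServicer(GLLM_databroker_pb2_grpc.DatabrokerServicer):
    """Illustrative servicer; the real databroker collects these values from user input."""

    def GLMMdatabroker(self, request, context):
        return GLLM_databroker_pb2.UserInputs(
            organization_id="org-placeholder",
            api_key="key-placeholder",
            user_query="example question",
            usecase_data="example use case text",
        )


def serve(port=8061):
    # Port 8061 matches the first gRPC port exposed in the Dockerfile above.
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    GLLM_databroker_pb2_grpc.add_DatabrokerServicer_to_server(GLLMDatabrokerServicer(), server)
    server.add_insecure_port(f"[::]:{port}")
    server.start()
    logging.info("GLLM databroker listening on port %d", port)
    server.wait_for_termination()


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    serve()
```

The second exposed port, 8062, is typically used for an additional web UI in these tutorials and is not handled by this gRPC sketch.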
# -*- coding: utf-8 -*-
# Generated by the protocol buffer compiler. DO NOT EDIT!
# source: GLLM_databroker.proto
"""Generated protocol buffer code."""
from google.protobuf import descriptor as _descriptor
from google.protobuf import message as _message
from google.protobuf import reflection as _reflection
from google.protobuf import symbol_database as _symbol_database
# @@protoc_insertion_point(imports)
_sym_db = _symbol_database.Default()
DESCRIPTOR = _descriptor.FileDescriptor(
name='GLLM_databroker.proto',
package='',
syntax='proto3',
serialized_options=None,
create_key=_descriptor._internal_create_key,
serialized_pb=b'\n\x15GLLM_databroker.proto\"\x07\n\x05\x45mpty\"`\n\nUserInputs\x12\x17\n\x0forganization_id\x18\x01 \x01(\t\x12\x0f\n\x07\x61pi_key\x18\x02 \x01(\t\x12\x12\n\nuser_query\x18\x03 \x01(\t\x12\x14\n\x0cusecase_data\x18\x04 \x01(\t23\n\nDatabroker\x12%\n\x0eGLMMdatabroker\x12\x06.Empty\x1a\x0b.UserInputsb\x06proto3'
)
_EMPTY = _descriptor.Descriptor(
name='Empty',
full_name='Empty',
filename=None,
file=DESCRIPTOR,
containing_type=None,
create_key=_descriptor._internal_create_key,
fields=[
],
extensions=[
],
nested_types=[],
enum_types=[
],
serialized_options=None,
is_extendable=False,
syntax='proto3',
extension_ranges=[],
oneofs=[
],
serialized_start=25,
serialized_end=32,
)
_USERINPUTS = _descriptor.Descriptor(
name='UserInputs',
full_name='UserInputs',
filename=None,
file=DESCRIPTOR,
containing_type=None,
create_key=_descriptor._internal_create_key,
fields=[
_descriptor.FieldDescriptor(
name='organization_id', full_name='UserInputs.organization_id', index=0,
number=1, type=9, cpp_type=9, label=1,
has_default_value=False, default_value=b"".decode('utf-8'),
message_type=None, enum_type=None, containing_type=None,
is_extension=False, extension_scope=None,
serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key),
_descriptor.FieldDescriptor(
name='api_key', full_name='UserInputs.api_key', index=1,
number=2, type=9, cpp_type=9, label=1,
has_default_value=False, default_value=b"".decode('utf-8'),
message_type=None, enum_type=None, containing_type=None,
is_extension=False, extension_scope=None,
serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key),
_descriptor.FieldDescriptor(
name='user_query', full_name='UserInputs.user_query', index=2,
number=3, type=9, cpp_type=9, label=1,
has_default_value=False, default_value=b"".decode('utf-8'),
message_type=None, enum_type=None, containing_type=None,
is_extension=False, extension_scope=None,
serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key),
_descriptor.FieldDescriptor(
name='usecase_data', full_name='UserInputs.usecase_data', index=3,
number=4, type=9, cpp_type=9, label=1,
has_default_value=False, default_value=b"".decode('utf-8'),
message_type=None, enum_type=None, containing_type=None,
is_extension=False, extension_scope=None,
serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key),
],
extensions=[
],
nested_types=[],
enum_types=[
],
serialized_options=None,
is_extendable=False,
syntax='proto3',
extension_ranges=[],
oneofs=[
],
serialized_start=34,
serialized_end=130,
)
DESCRIPTOR.message_types_by_name['Empty'] = _EMPTY
DESCRIPTOR.message_types_by_name['UserInputs'] = _USERINPUTS
_sym_db.RegisterFileDescriptor(DESCRIPTOR)
Empty = _reflection.GeneratedProtocolMessageType('Empty', (_message.Message,), {
'DESCRIPTOR' : _EMPTY,
'__module__' : 'GLLM_databroker_pb2'
# @@protoc_insertion_point(class_scope:Empty)
})
_sym_db.RegisterMessage(Empty)
UserInputs = _reflection.GeneratedProtocolMessageType('UserInputs', (_message.Message,), {
'DESCRIPTOR' : _USERINPUTS,
'__module__' : 'GLLM_databroker_pb2'
# @@protoc_insertion_point(class_scope:UserInputs)
})
_sym_db.RegisterMessage(UserInputs)
_DATABROKER = _descriptor.ServiceDescriptor(
name='Databroker',
full_name='Databroker',
file=DESCRIPTOR,
index=0,
serialized_options=None,
create_key=_descriptor._internal_create_key,
serialized_start=132,
serialized_end=183,
methods=[
_descriptor.MethodDescriptor(
name='GLMMdatabroker',
full_name='Databroker.GLMMdatabroker',
index=0,
containing_service=None,
input_type=_EMPTY,
output_type=_USERINPUTS,
serialized_options=None,
create_key=_descriptor._internal_create_key,
),
])
_sym_db.RegisterServiceDescriptor(_DATABROKER)
DESCRIPTOR.services_by_name['Databroker'] = _DATABROKER
# @@protoc_insertion_point(module_scope)
# Generated by the gRPC Python protocol compiler plugin. DO NOT EDIT!
"""Client and server classes corresponding to protobuf-defined services."""
import grpc
import GLLM_databroker_pb2 as GLLM__databroker__pb2
class DatabrokerStub(object):
"""Missing associated documentation comment in .proto file."""
def __init__(self, channel):
"""Constructor.
Args:
channel: A grpc.Channel.
"""
self.GLMMdatabroker = channel.unary_unary(
'/Databroker/GLMMdatabroker',
request_serializer=GLLM__databroker__pb2.Empty.SerializeToString,
response_deserializer=GLLM__databroker__pb2.UserInputs.FromString,
)
class DatabrokerServicer(object):
"""Missing associated documentation comment in .proto file."""
def GLMMdatabroker(self, request, context):
"""Missing associated documentation comment in .proto file."""
context.set_code(grpc.StatusCode.UNIMPLEMENTED)
context.set_details('Method not implemented!')
raise NotImplementedError('Method not implemented!')
def add_DatabrokerServicer_to_server(servicer, server):
rpc_method_handlers = {
'GLMMdatabroker': grpc.unary_unary_rpc_method_handler(
servicer.GLMMdatabroker,
request_deserializer=GLLM__databroker__pb2.Empty.FromString,
response_serializer=GLLM__databroker__pb2.UserInputs.SerializeToString,
),
}
generic_handler = grpc.method_handlers_generic_handler(
'Databroker', rpc_method_handlers)
server.add_generic_rpc_handlers((generic_handler,))
# This class is part of an EXPERIMENTAL API.
class Databroker(object):
"""Missing associated documentation comment in .proto file."""
@staticmethod
def GLMMdatabroker(request,
target,
options=(),
channel_credentials=None,
call_credentials=None,
insecure=False,
compression=None,
wait_for_ready=None,
timeout=None,
metadata=None):
return grpc.experimental.unary_unary(request, target, '/Databroker/GLMMdatabroker',
GLLM__databroker__pb2.Empty.SerializeToString,
GLLM__databroker__pb2.UserInputs.FromString,
options, channel_credentials,
insecure, call_credentials, compression, wait_for_ready, timeout, metadata)
import grpc
from timeit import default_timer as timer
import logging
# import the generated classes
import GLLM_databroker_pb2_grpc
import GLLM_databroker_pb2
port = 8061
def run():
print("Calling GLLM_Stub..")
with grpc.insecure_channel('localhost:{}'.format(port)) as channel:
stub = GLLM_databroker_pb2_grpc.DatabrokerStub(channel)
ui_request = GLLM_databroker_pb2.Empty()
response = stub.GLMMdatabroker(ui_request)
print("Greeter client received: ")
print(response)
if __name__ == '__main__':
logging.basicConfig()
run()
# -*- coding: utf-8 -*-
# Generated by the protocol buffer compiler. DO NOT EDIT!
# source: GLLMdatabroker.proto
"""Generated protocol buffer code."""
from google.protobuf import descriptor as _descriptor
from google.protobuf import message as _message
from google.protobuf import reflection as _reflection
from google.protobuf import symbol_database as _symbol_database
# @@protoc_insertion_point(imports)
_sym_db = _symbol_database.Default()
DESCRIPTOR = _descriptor.FileDescriptor(
name='GLLMdatabroker.proto',
package='',
syntax='proto3',
serialized_options=None,
create_key=_descriptor._internal_create_key,
serialized_pb=b'\n\x14GLLMdatabroker.proto\"\x07\n\x05\x45mpty\"`\n\nUserInputs\x12\x17\n\x0forganization_id\x18\x01 \x01(\t\x12\x0f\n\x07\x61pi_key\x18\x02 \x01(\t\x12\x12\n\nuser_query\x18\x03 \x01(\t\x12\x14\n\x0cusecase_data\x18\x04 \x01(\t23\n\nDatabroker\x12%\n\x0eGLMMdatabroker\x12\x06.Empty\x1a\x0b.UserInputsb\x06proto3'
)
_EMPTY = _descriptor.Descriptor(
name='Empty',
full_name='Empty',
filename=None,
file=DESCRIPTOR,
containing_type=None,
create_key=_descriptor._internal_create_key,
fields=[
],
extensions=[
],
nested_types=[],
enum_types=[
],
serialized_options=None,
is_extendable=False,
syntax='proto3',
extension_ranges=[],
oneofs=[
],
serialized_start=24,
serialized_end=31,
)
_USERINPUTS = _descriptor.Descriptor(
name='UserInputs',
full_name='UserInputs',
filename=None,
file=DESCRIPTOR,
containing_type=None,
create_key=_descriptor._internal_create_key,
fields=[
_descriptor.FieldDescriptor(
name='organization_id', full_name='UserInputs.organization_id', index=0,
number=1, type=9, cpp_type=9, label=1,
has_default_value=False, default_value=b"".decode('utf-8'),
message_type=None, enum_type=None, containing_type=None,
is_extension=False, extension_scope=None,
serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key),
_descriptor.FieldDescriptor(
name='api_key', full_name='UserInputs.api_key', index=1,
number=2, type=9, cpp_type=9, label=1,
has_default_value=False, default_value=b"".decode('utf-8'),
message_type=None, enum_type=None, containing_type=None,
is_extension=False, extension_scope=None,
serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key),
_descriptor.FieldDescriptor(
name='user_query', full_name='UserInputs.user_query', index=2,
number=3, type=9, cpp_type=9, label=1,
has_default_value=False, default_value=b"".decode('utf-8'),
message_type=None, enum_type=None, containing_type=None,
is_extension=False, extension_scope=None,
serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key),
_descriptor.FieldDescriptor(
name='usecase_data', full_name='UserInputs.usecase_data', index=3,
number=4, type=9, cpp_type=9, label=1,
has_default_value=False, default_value=b"".decode('utf-8'),
message_type=None, enum_type=None, containing_type=None,
is_extension=False, extension_scope=None,
serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key),
],
extensions=[
],
nested_types=[],
enum_types=[
],
serialized_options=None,
is_extendable=False,
syntax='proto3',
extension_ranges=[],
oneofs=[
],
serialized_start=33,
serialized_end=129,
)
DESCRIPTOR.message_types_by_name['Empty'] = _EMPTY
DESCRIPTOR.message_types_by_name['UserInputs'] = _USERINPUTS
_sym_db.RegisterFileDescriptor(DESCRIPTOR)
Empty = _reflection.GeneratedProtocolMessageType('Empty', (_message.Message,), {
'DESCRIPTOR' : _EMPTY,
'__module__' : 'GLLMdatabroker_pb2'
# @@protoc_insertion_point(class_scope:Empty)
})
_sym_db.RegisterMessage(Empty)
UserInputs = _reflection.GeneratedProtocolMessageType('UserInputs', (_message.Message,), {
'DESCRIPTOR' : _USERINPUTS,
'__module__' : 'GLLMdatabroker_pb2'
# @@protoc_insertion_point(class_scope:UserInputs)
})
_sym_db.RegisterMessage(UserInputs)
_DATABROKER = _descriptor.ServiceDescriptor(
name='Databroker',
full_name='Databroker',
file=DESCRIPTOR,
index=0,
serialized_options=None,
create_key=_descriptor._internal_create_key,
serialized_start=131,
serialized_end=182,
methods=[
_descriptor.MethodDescriptor(
name='GLMMdatabroker',
full_name='Databroker.GLMMdatabroker',
index=0,
containing_service=None,
input_type=_EMPTY,
output_type=_USERINPUTS,
serialized_options=None,
create_key=_descriptor._internal_create_key,
),
])
_sym_db.RegisterServiceDescriptor(_DATABROKER)
DESCRIPTOR.services_by_name['Databroker'] = _DATABROKER
# @@protoc_insertion_point(module_scope)
# Generated by the gRPC Python protocol compiler plugin. DO NOT EDIT!
"""Client and server classes corresponding to protobuf-defined services."""
import grpc
import GLLMdatabroker_pb2 as GLLMdatabroker__pb2
class DatabrokerStub(object):
"""Missing associated documentation comment in .proto file."""
def __init__(self, channel):
"""Constructor.
Args:
channel: A grpc.Channel.
"""
self.GLMMdatabroker = channel.unary_unary(
'/Databroker/GLMMdatabroker',
request_serializer=GLLMdatabroker__pb2.Empty.SerializeToString,
response_deserializer=GLLMdatabroker__pb2.UserInputs.FromString,
)
class DatabrokerServicer(object):
"""Missing associated documentation comment in .proto file."""
def GLMMdatabroker(self, request, context):
"""Missing associated documentation comment in .proto file."""
context.set_code(grpc.StatusCode.UNIMPLEMENTED)
context.set_details('Method not implemented!')
raise NotImplementedError('Method not implemented!')
def add_DatabrokerServicer_to_server(servicer, server):
rpc_method_handlers = {
'GLMMdatabroker': grpc.unary_unary_rpc_method_handler(
servicer.GLMMdatabroker,
request_deserializer=GLLMdatabroker__pb2.Empty.FromString,
response_serializer=GLLMdatabroker__pb2.UserInputs.SerializeToString,
),
}
generic_handler = grpc.method_handlers_generic_handler(
'Databroker', rpc_method_handlers)
server.add_generic_rpc_handlers((generic_handler,))
# This class is part of an EXPERIMENTAL API.
class Databroker(object):
"""Missing associated documentation comment in .proto file."""
@staticmethod
def GLMMdatabroker(request,
target,
options=(),
channel_credentials=None,
call_credentials=None,
insecure=False,
compression=None,
wait_for_ready=None,
timeout=None,
metadata=None):
return grpc.experimental.unary_unary(request, target, '/Databroker/GLMMdatabroker',
GLLMdatabroker__pb2.Empty.SerializeToString,
GLLMdatabroker__pb2.UserInputs.FromString,
options, channel_credentials,
insecure, call_credentials, compression, wait_for_ready, timeout, metadata)
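# gRPC server for the databroker: starts the Databroker service on port 8061
# and launches the Flask web UI in a background thread.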
from concurrent import futures
import grpc
from app import app_run, get_parameters
import sys
import threading
import GLLM_databroker_pb2
import GLLM_databroker_pb2_grpc
import openai
from openai import OpenAI
import os
import logging
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
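# gRPC servicer for the Databroker node: exposes the form input collected by
# the Flask web UI to the rest of the pipeline as a UserInputs message.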
class DatabrokerServicer(GLLM_databroker_pb2_grpc.DatabrokerServicer):
    def __init__(self):
        super().__init__()
        # Toggled on every call: deliver the collected inputs first, then
        # signal on the following call that no more data is available.
        self.send_data = True

    def GLMMdatabroker(self, request, context):
        """Return the parameters collected by the Flask web UI as a UserInputs message."""
        parameters = get_parameters()
        print('parameters from server side:', parameters)
        logger.info(f'parameters from server side: {parameters}')
        response = GLLM_databroker_pb2.UserInputs(
            organization_id=str(parameters[0]),
            api_key=str(parameters[1]),
            user_query=str(parameters[2]),
            usecase_data=str(parameters[3]))
        print('Response')
        print(response)
        logger.info(f'response: {response}')
        logger.debug(response)
        if not self.send_data:
            # The inputs have already been delivered; report this via the
            # gRPC status code so the pipeline stops calling the databroker.
            context.set_code(grpc.StatusCode.NOT_FOUND)
            context.set_details("all data has been processed")
        self.send_data = not self.send_data
        return response
def serve(port):
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    GLLM_databroker_pb2_grpc.add_DatabrokerServicer_to_server(DatabrokerServicer(), server)
    server.add_insecure_port("[::]:{}".format(port))
    print("Starting server. Listening on port : " + str(port))
    server.start()
    # Run the Flask UI in a background thread. Pass the function itself, not
    # app_run(): calling it here would block before the thread is created.
    threading.Thread(target=app_run).start()
    server.wait_for_termination()

if __name__ == "__main__":
    logging.basicConfig()
    port = 8061
    serve(port)
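# Flask web UI (imported by the gRPC server as "app"): collects organization id,
# API key, user query and use-case data through a form on port 8062 and hands
# them to the databroker via get_parameters().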
from flask import Flask, render_template, redirect, url_for
from flask_bootstrap import Bootstrap
from flask_wtf import FlaskForm
from wtforms.fields import StringField, SubmitField, PasswordField
import logging
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
app = Flask(__name__)
parameters = []
port = 8062
class GllmInputForm(FlaskForm):
    organization_id = PasswordField('Organisation ID')
    api_key = PasswordField('Your API-Key')
    user_query = StringField('Enter your question for the GPT model')
    usecase_data = StringField('Submit the usecase specific data')
    submit_button = SubmitField('Submit')

# GLLM_databroker_Page
@app.route('/', methods=['GET', 'POST'])
def hello():
    form = GllmInputForm()
    if form.validate_on_submit():
        parameters.clear()
        parameters.append(form.organization_id.data)
        parameters.append(form.api_key.data)
        parameters.append(form.user_query.data)
        parameters.append(form.usecase_data.data)
        logger.info('Parameters')
        print(parameters)
        logger.info(f'parameters from app.py: {parameters}')
        return render_template("display_prediction.html")
    return render_template("index.html", example_form=form)

def get_parameters():
    logger.debug("Return databroker parameters")
    return parameters
def app_run():
    app.secret_key = 'gllm'
    bootstrap = Bootstrap(app)
    # Serve the form UI on the web-UI port defined above (8062).
    app.run(host="0.0.0.0", port=port)
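# Python dependencies for the databroker container image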
grpcio==1.38.0
grpcio-tools==1.38.0
grpc-interceptor
multithreading
protobuf==3.16.0
openai
python-dotenv
Bootstrap-Flask==1.5.2
Flask==1.1.2
Flask-SQLAlchemy==2.5.1
Flask-WTF==0.14.3
google==3.0.0
googleapis-common-protos==1.53.0
Jinja2==2.11.3
pandas==1.1.5
PyYAML==5.4.1
requests==2.25.1
scikit-learn==0.24.2
sklearn==0.0
SQLAlchemy==1.4.7
threadpoolctl==2.2.0
urllib3==1.26.5
Werkzeug==1.0.1
WTForms==2.3.3
MarkupSafe==1.1.1
ItsDangerous==1.1.0
Flask-Bootstrap==3.3.7.1
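/* Stylesheet for the web UI form pages: a simple two-column grid layout */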
.grid-container {
display: grid;
grid-column-gap: 50px;
grid-row-gap: 50px;
grid-template-columns: auto auto;
padding: 10px;
}
.grid-item {
padding: 20px;
font-size: 30px;
text-align: left;
}