### [8. Shared Folders for Pipeline Nodes](#8-shared-folders-for-pipeline-nodes)
### [9. Streaming Support for gRPC Services](#9-streaming-support-for-grpc-services)
<hr style="border:1px solid #000">
### [1. Introduction](#1-introduction)
https://developers.google.com/protocol-buffers/docs/overview
https://developers.google.com/protocol-buffers/docs/proto3
https://www.grpc.io/docs/
Because the goal is to have re-usable building blocks to compose pipelines, the main reason for choosing the above technology stack is to achieve the highest level of interoperability:
- Docker is today the de facto standard for server-side software distribution including all dependencies. It is possible to onboard containers for different architectures (x86_64, GPU, ARM, HPC/Singularity)
- gRPC together with protobuf is a proven specification and implementation for remote procedure calls. It supports a broad range of programming languages and is optimized for performance and high throughput.
**Please note that the tools and models are not limited to deep learning models.** Any AI tool from any AI area like reasoning, semantic web, symbolic AI and of course deep learning can be used for pipelines as long as it exposes a set of public methods via gRPC.
```proto
......
service Predict {
......
}
```
**Important:** The parser for .proto files inside AI4EU Experiments is much less flexible than the original protobuf compiler, so here are some rules. If the rules are not followed, the model cannot be used inside the visual editor AcuCompose. **Moreover, the enum keyword is not yet supported!**
![image](src/images/table_of_contents.PNG)
### [3. Create the gRPC docker container](#3-create-the-grpc-docker-container)
Based on model.proto, you can generate the necessary gRPC stubs and skeletons for the programming language of your choice using the protobuf compiler **protoc** and the respective protoc plugins. Then create a short main executable that reads and initializes the model or tool and starts the gRPC server. This executable will be the entrypoint for the docker container.
**The gRPC server must listen on port 8061.**
If the model also exposes a **Web-UI** for human interaction, which is optional, it must listen on **port 8062**.
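For orientation, a minimal sketch of such an entrypoint in Python might look as follows. It assumes the stubs were generated from model.proto with the gRPC Python plugin (which yields model_pb2 and model_pb2_grpc); the rpc name `predict` and the message type `Prediction` are placeholders, not part of the original tutorial.

```python
# Minimal sketch of a gRPC server entrypoint, assuming stubs generated with:
#   python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. model.proto
from concurrent import futures

import grpc

import model_pb2
import model_pb2_grpc


class PredictServicer(model_pb2_grpc.PredictServicer):
    # 'predict' and the message types are placeholders; use the names
    # declared in your model.proto
    def predict(self, request, context):
        response = model_pb2.Prediction()
        # ... run the initialized model on the request and fill the response ...
        return response


def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    model_pb2_grpc.add_PredictServicer_to_server(PredictServicer(), server)
    server.add_insecure_port("[::]:8061")  # the gRPC server must listen on 8061
    server.start()
    server.wait_for_termination()


if __name__ == "__main__":
    serve()
```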
The file tree of the docker container should look like the one below: the top-level folder of the container should contain the files, as well as the folders for the microservice, like app and data, and any supplementary files.
![image](src/images/docker_container.PNG)
The license file is not mandatory and can be generated after onboarding with the License Profile Editor in the AI4EU Experiments Web-UI:
https://docs.acumos.org/en/clio/submodules/license-manager/docs/user-guide-license-profile-editor.html
### [6. First Node Parameters (e.g. for Databrokers)](#6-first-node-parameters-eg-for-databrokers)
Generally speaking, the orchestrator dispatches the output of the previous node to the following node. A special case is the first node, where obviously no output from the previous node exists. In order to implement a general orchestrator, the first node must therefore define its services with an Empty message type. Typically this concerns nodes of type Databroker as the usual starting point of a pipeline.
```proto
......
service NewsDatabroker {
......
}
```
To indicate the end of data, a Databroker should return gRPC status code 5 (NOT_FOUND) or 11 (OUT_OF_RANGE); see chapter 4.
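In Python, signalling the end of data could look like the sketch below; status code 11 corresponds to grpc.StatusCode.OUT_OF_RANGE and 5 to grpc.StatusCode.NOT_FOUND. The rpc name `pull_data` and the response message `NewsText` are illustrative assumptions, not names from the tutorial code.

```python
# Sketch of an end-of-data response in a Databroker servicer; the rpc name
# and message types are illustrative, not taken from the tutorial code.
import grpc

import databroker_pb2
import databroker_pb2_grpc


class NewsDatabrokerServicer(databroker_pb2_grpc.NewsDatabrokerServicer):
    def __init__(self, samples):
        self.samples = iter(samples)

    def pull_data(self, request, context):
        try:
            return next(self.samples)
        except StopIteration:
            # Status code 11 (OUT_OF_RANGE) signals the orchestrator that
            # the data is exhausted; code 5 (NOT_FOUND) works as well.
            context.set_code(grpc.StatusCode.OUT_OF_RANGE)
            context.set_details("all data has been processed")
            return databroker_pb2.NewsText()
```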
**Leveraging the Databroker for enhanced metadata with dataset attributes**

The dataset features can be incorporated into the databroker. Information about the dataset name, associated description, size, DOI ID, etc. may be found in the logs generated by this node.
When the pipeline is launched, these initial logs are recorded. The implemented method reads and finds the logs with the meta key **dataset_features**. The extracted logs are then added to the metadata file (execution_run.json) in the form of a Python dictionary.
A sample representation of the logs for the news_training pipeline is as follows:
`INFO:root:{'dataset_features': {'datasetname': 'The Reuters Dataset', 'description': 'http://kdd.ics.uci.edu/databases/reuters21578/README.txt', 'size': '4MB', 'DOI_ID': 'Not available'}}`
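As an illustration of this mechanism (not the actual orchestrator code), such a log line could be parsed and merged into execution_run.json roughly as follows; the function names and the placement of the key inside the file are assumptions.

```python
# Sketch: pull the 'dataset_features' dict out of a captured log line and
# store it in execution_run.json (parsing details are illustrative).
import ast
import json


def extract_dataset_features(log_line):
    # A matching line looks like: INFO:root:{'dataset_features': {...}}
    payload = log_line.split(":", 2)[2]     # strip the 'INFO:root:' prefix
    record = ast.literal_eval(payload)      # safely evaluate the dict literal
    return record.get("dataset_features")


def append_to_metadata(features, path="execution_run.json"):
    with open(path) as f:
        metadata = json.load(f)
    metadata["dataset_features"] = features  # placement inside the file is assumed
    with open(path, "w") as f:
        json.dump(metadata, f, indent=4)
```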
A sample representation of the output in the metadata file is shown below:
```json
......
    "dataset_features": {
        "datasetname": "The Reuters Dataset",
        "description": "http://kdd.ics.uci.edu/databases/reuters21578/README.txt",
        "size": "4MB",
        "DOI_ID": "Not available"
    }
},
```
The added dataset features are shown along with the other information in the databroker's node. In this way, it is possible to expand the container specification for the databroker's data input and output.
<details>
<summary>Additional Information</summary>
The get_dataset_metadata() method in the databroker script can be used to implement the mechanism described above. Each Databroker may contain a file called dataset_features.txt; currently, we make sure that it complies with the message fields in the databroker.proto file. The method reads the data from this text file and then emits the log data in a form that the orchestrator can read. To follow this example, please refer to the news-training pipeline tutorial.
```proto
syntax = "proto3";

message Empty {
}

......

message DatasetFeatues {
    string datasetname = 1;
    string description = 2;
    string size = 3;
    string DOI_ID = 4;
}

service NewsDatabroker {
    ......
    rpc get_dataset_metadata(Empty) returns (DatasetFeatues);
}
```
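A possible Python implementation of this method is sketched below. The "key: value" line format of dataset_features.txt is an assumption for illustration; the actual file layout is defined by the news-training tutorial.

```python
# Sketch of the get_dataset_metadata() servicer method; the 'key: value'
# format of dataset_features.txt is an assumption for illustration.
import logging

import databroker_pb2  # generated from the databroker.proto above


class NewsDatabrokerServicer:  # in practice derived from the generated servicer base
    def get_dataset_metadata(self, request, context):
        features = {}
        with open("dataset_features.txt") as f:
            for line in f:
                if ":" in line:
                    key, value = line.split(":", 1)
                    features[key.strip()] = value.strip()
        # Log under the meta key 'dataset_features' so the orchestrator can
        # find it (emits the INFO:root:{...} line shown above when logging
        # is configured at INFO level)
        logging.info({"dataset_features": features})
        return databroker_pb2.DatasetFeatues(
            datasetname=features.get("datasetname", ""),
            description=features.get("description", ""),
            size=features.get("size", ""),
            DOI_ID=features.get("DOI_ID", ""),
        )
```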
Note: Although this functionality (get_dataset_metadata()) could be implemented independently of the gRPC communication, which remains future scope, it is currently implemented as part of the Databroker's protobuf definition.
</details>
### [7. Scalability, GPU Support and Training](#7-scalability-gpu-support-and-training)
The potential execution environments range from Minikube on a laptop over small
......