Datamite Infrastructure Project setup
Introduction
This README provides the setup instructions, configuration details, and commands needed to deploy and manage the infrastructure services for the Datamite Framework. It covers configuring a Docker Swarm cluster, deploying the services, and managing and monitoring them.
Cloning the Project from GitLab
Prerequisites
- Git: Ensure Git is installed (git --version).
- Access: Confirm you have permission to access the repository.
Steps
- Open a terminal and navigate to your desired directory.
- Clone the repository:
git clone https://gitlab.eclipse.org/eclipse-research-labs/datamite-project/data-support-tools/data_ingestion_and_storage/infrastructure.git
Deployment configuration
Configure the Docker Swarm Cluster
Create the file docker/cluster.json based on the following template:
{
  "ssh_key_path": "SSH_KEY_PATH",
  "ssh_user": "SSH_USER",
  "external_ip": "EXTERNAL_IP",
  "manager": "MANAGER",
  "workers": [
    "WORKER"
  ],
  "source_env": "ENV_FILE",
  "subnet": "SUBNET",
  "gateway": "GATEWAY",
  "quality_node_hostname": "QUALITY_NODE_HOSTNAME"
}
The attributes are described below:
Attribute | Description | Mandatory / Default Value |
---|---|---|
ssh_key_path | Path to the SSH key for connecting to worker nodes. | Optional (depends on workers) |
ssh_user | SSH user for connecting to each node. | Optional (depends on workers) |
external_ip | External IP for accessing the cluster. | Mandatory |
manager | Internal IP address of the Docker Swarm manager node. | Mandatory |
workers | List of internal IP addresses of worker nodes in the cluster. | Optional, can be an empty array (Cluster can operate with only the manager node) |
source_env | Path to the environment file (e.g., dev.env). | Mandatory |
subnet | Docker network subnet mask in CIDR notation. | Optional (Default: Automatically assigned by Docker if not provided) |
gateway | Docker network gateway IP address within the subnet. | Optional (Default: Automatically assigned by Docker if not provided) |
quality_node_hostname | (Only for multi-node clusters) Hostname of the node/VM for deploying the quality evaluator. | Optional (Required only in multi-node clusters) |
Example:
{
  "external_ip": "203.0.113.1",
  "manager": "10.0.0.1",
  "workers": [],
  "source_env": "dev.env"
}
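The mandatory and optional rules from the attribute table can be checked before running any deployment script. The following Python sketch is only an illustration: the function names validate_cluster_config and load_and_validate are hypothetical and not part of this repository, and the project's own scripts may apply different or additional checks. It verifies that the mandatory attributes are present, that workers is a list, and that gateway lies inside subnet when both are given.

```python
import ipaddress
import json

# Attributes marked "Mandatory" in the table above.
MANDATORY_KEYS = ("external_ip", "manager", "source_env")


def validate_cluster_config(config: dict) -> list:
    """Return a list of problems found in a cluster.json-style dict (empty if none)."""
    problems = []

    for key in MANDATORY_KEYS:
        if not config.get(key):
            problems.append(f"missing mandatory attribute: {key}")

    # 'workers' is optional, but when present it must be a list (possibly empty).
    if not isinstance(config.get("workers", []), list):
        problems.append("workers must be a list (it may be empty)")

    # When both are provided, the gateway has to be an address inside the subnet.
    subnet, gateway = config.get("subnet"), config.get("gateway")
    if subnet and gateway:
        try:
            network = ipaddress.ip_network(subnet, strict=False)
            if ipaddress.ip_address(gateway) not in network:
                problems.append(f"gateway {gateway} is outside subnet {subnet}")
        except ValueError as exc:
            problems.append(f"invalid subnet or gateway: {exc}")

    return problems


def load_and_validate(path: str) -> list:
    """Parse the JSON file at 'path' and validate its content."""
    with open(path, encoding="utf-8") as handle:
        return validate_cluster_config(json.load(handle))
```

Calling load_and_validate("docker/cluster.json") returns an empty list when nothing is wrong. Note that json.load is strict: a trailing comma after the last attribute makes the file fail to parse.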
Configure the environment
The table below lists environment variables from the common.env file. You can adjust these values as needed to configure service ports.
Env Var | Default Port | Service |
---|---|---|
MINIO_EXT_PORT | 9000 | MinIO Object Storage |
GOVERNANCE_EXT_PORT | 8091 | Governance Backend |
STREAMING_EXT_PORT | 8000 | Streaming Service |
STORAGE_EXT_PORT | 8089 | Storage Service |
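When these values are changed, it is easy to assign the same port to two services by accident. The Python sketch below shows one possible sanity check. It assumes that common.env uses plain KEY=VALUE lines, which is not confirmed here, and the helper names parse_env and check_ports are hypothetical.

```python
# Port variables listed in the table above.
PORT_VARS = (
    "MINIO_EXT_PORT",
    "GOVERNANCE_EXT_PORT",
    "STREAMING_EXT_PORT",
    "STORAGE_EXT_PORT",
)


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_ports(env: dict) -> list:
    """Report invalid port values and ports assigned to more than one variable."""
    problems = []
    seen = {}  # port number -> first variable that used it
    for name in PORT_VARS:
        raw = env.get(name)
        if raw is None:
            continue  # variable not set in this file
        try:
            port = int(raw)
        except ValueError:
            port = -1
        if not 1 <= port <= 65535:
            problems.append(f"{name}={raw} is not a valid TCP port")
            continue
        if port in seen:
            problems.append(f"{name} and {seen[port]} both use port {port}")
        seen.setdefault(port, name)
    return problems
```

A typical use would be check_ports(parse_env(open("common.env").read())), where the path to common.env depends on where the file lives in your checkout.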
Deploy and Start all Services
Step 1: Set Up the Cluster and install dependencies
Run the following command to set up the Docker Swarm cluster and install necessary dependencies on the manager node:
sudo ./scripts/install_docker_manager.sh
Step 2: Deploy services
To deploy all services within the Docker Swarm cluster, execute:
./scripts/deploy.sh
Step 3: Verify Running Services
After deployment, check the status of all services and make sure each one has at least one replica running (for example, the REPLICAS column shows 1/1).
docker service ls
Sample output:
ID | NAME | MODE | REPLICAS | IMAGE | PORTS |
---|---|---|---|---|---|
5z5mnwvra8mj | datamite-demo_frontend | replicated | 1/1 | datamite/frontend:latest | *:3000->3000/tcp |
5z5mnwvra8mj | datamite-demo_governance-service | replicated | 1/1 | datamite/data-governance-backend:latest | *:8091->8091/tcp |
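With around twenty services in a stack, scanning the REPLICAS column by eye is error-prone. The Python sketch below automates the check. It relies on the --format option of docker service ls with the .Name and .Replicas template placeholders; the function names are hypothetical, and treating a service as not ready when it has zero running replicas or fewer than desired is an assumption about what counts as healthy here.

```python
import subprocess


def services_not_ready(listing: str) -> list:
    """Return names of services with no running replica or fewer than desired.

    Expects one "<name> <running>/<desired>" pair per line.
    """
    lagging = []
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) < 2 or "/" not in parts[1]:
            continue  # skip lines that do not look like "<name> <n>/<m>"
        running, _, desired = parts[1].partition("/")
        if int(running) < 1 or int(running) < int(desired):
            lagging.append(parts[0])
    return lagging


def list_services() -> str:
    """Run 'docker service ls' and return "<name> <replicas>" lines."""
    result = subprocess.run(
        ["docker", "service", "ls", "--format", "{{.Name}} {{.Replicas}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On the manager node, services_not_ready(list_services()) returns an empty list once every service is up. Services may need some time after deployment to pull their images, so a non-empty result shortly after deploying is not necessarily an error.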
Stop services
Step 1: List the Stacks
To find the stack you wish to stop, list all stacks and locate the datamite-demo stack (or any other stack name you’re working with):
docker stack ls
Sample output:
NAME | SERVICES |
---|---|
datamite-demo | 20 |
Step 2: Remove the Stack
To stop all services within the specified stack, use:
docker stack rm <stack-name>
Step 3: Confirm the Stack removal
After removing the stack, you can confirm it has been removed by running:
docker stack ls
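In an automated script it can be convenient to wait until the stack no longer appears in the listing instead of checking by hand. The Python sketch below polls for that. The helper names are hypothetical; docker_stack_names uses the --format option of docker stack ls with the .Name placeholder, and the timeout and polling interval are arbitrary example values.

```python
import subprocess
import time


def docker_stack_names() -> list:
    """Return the names of all stacks reported by 'docker stack ls'."""
    result = subprocess.run(
        ["docker", "stack", "ls", "--format", "{{.Name}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


def wait_for_stack_removal(stack, list_stacks=docker_stack_names,
                           timeout=60.0, interval=2.0) -> bool:
    """Poll until 'stack' is absent from list_stacks() or the timeout expires.

    Returns True if the stack disappeared, False if it was still listed
    when the timeout was reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        if stack not in list_stacks():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, wait_for_stack_removal("datamite-demo") returns True as soon as the stack is gone. The list_stacks parameter exists so the polling logic can be exercised without a Docker daemon.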
Remove Docker Swarm cluster
TODO: Implement a script to automate the deletion of the entire cluster dynamically
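Until such a script exists, the commands for a manual teardown can be derived from docker/cluster.json. The Python sketch below is not the project's script and makes several assumptions: all stacks have already been removed, the commands are run on the manager node, each worker is reachable over SSH with the configured user and key, and the function name teardown_commands is hypothetical. It only builds the command lists: docker swarm leave on every worker, followed by docker swarm leave --force on the manager.

```python
def teardown_commands(config: dict) -> list:
    """Build the commands that dissolve the swarm described by a cluster.json dict.

    Each command is a list of arguments suitable for subprocess.run().
    Nothing is executed here.
    """
    commands = []

    # Workers leave first, each over SSH.
    for worker in config.get("workers", []):
        ssh = ["ssh"]
        if config.get("ssh_key_path"):
            ssh += ["-i", config["ssh_key_path"]]
        user = config.get("ssh_user")
        target = f"{user}@{worker}" if user else worker
        commands.append(ssh + [target, "docker", "swarm", "leave"])

    # The manager leaves last; --force is required for a manager node.
    commands.append(["docker", "swarm", "leave", "--force"])
    return commands
```

Review the generated commands before running any of them, for example by printing each one with " ".join(command). Leaving the swarm on the last manager discards the swarm state.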
Services and Technology ports
Default ports used by the Datamite services:
Service | Default Port(s) |
---|---|
Atlas | 21000 |
Frontend | 3000 |
MageAI | 6789 |
MinIO | 9000, 9001 |
Quality Evaluator | 8001 |
Governance Service | 8091 |
Spark | 6080, 7077, 8080, 8081 |
Streaming Service | 8000 |
Storage Service | 8089 |
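To see which of these ports actually accept connections after a deployment, a simple TCP probe can be used. The Python sketch below is an illustration only: the function names are hypothetical, and whether a given port is published on the external IP depends on your deployment and firewall rules, so a closed port is not necessarily a fault.

```python
import socket

# Default ports from the table above.
DEFAULT_PORTS = {
    "Atlas": [21000],
    "Frontend": [3000],
    "MageAI": [6789],
    "MinIO": [9000, 9001],
    "Quality Evaluator": [8001],
    "Governance Service": [8091],
    "Spark": [6080, 7077, 8080, 8081],
    "Streaming Service": [8000],
    "Storage Service": [8089],
}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def closed_ports(host: str, ports: dict = DEFAULT_PORTS, timeout: float = 2.0) -> dict:
    """Map each service to the list of its ports that did not accept a connection."""
    report = {}
    for service, port_list in ports.items():
        unreachable = [p for p in port_list if not is_port_open(host, p, timeout)]
        if unreachable:
            report[service] = unreachable
    return report
```

For example, closed_ports("203.0.113.1") probes every default port on that address and returns only the services with unreachable ports. An open port shows that something is listening, not that the service behind it is healthy.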