Datamite Infrastructure Project setup

Introduction

This README provides the setup instructions, configuration details, and commands needed to deploy and manage the infrastructure services of the Datamite Framework. It covers configuring a Docker Swarm cluster, deploying the services, and managing and monitoring them.

Cloning the Project from GitLab

Prerequisites

  • Git: Ensure Git is installed (git --version).
  • Access: Confirm you have permission to access the repository.

Steps

  1. Open a terminal and navigate to your desired directory.
  2. Clone the repository:
    git clone https://gitlab.eclipse.org/eclipse-research-labs/datamite-project/data-support-tools/data_ingestion_and_storage/infrastructure.git
  3. Change into the cloned directory; the relative paths used below (docker/, scripts/) assume the repository root:
    cd infrastructure

Deployment configuration

Configure the Docker Swarm Cluster

Create the file docker/cluster.json based on the following template:

{
    "ssh_key_path": "SSH_KEY_PATH",
    "ssh_user": "SSH_USER",
    "external_ip": "EXTERNAL_IP",
    "manager": "MANAGER",
    "workers": [
        "WORKER"
    ],
    "source_env": "ENV_FILE",
    "subnet": "SUBNET",
    "gateway": "GATEWAY",
    "quality_node_hostname": "QUALITY_NODE_HOSTNAME"
}

The attributes have the following meaning:

| Attribute | Description | Mandatory / Default Value |
|---|---|---|
| ssh_key_path | Path to the SSH key for connecting to worker nodes. | Optional (depends on workers) |
| ssh_user | SSH user for connecting to each node. | Optional (depends on workers) |
| external_ip | External IP for accessing the cluster. | Mandatory |
| manager | Internal IP address of the Docker Swarm manager node. | Mandatory |
| workers | List of internal IP addresses of the worker nodes in the cluster. | Optional; can be an empty array (the cluster can operate with only the manager node) |
| source_env | Path to the environment file (e.g., dev.env). | Mandatory |
| subnet | Docker network subnet in CIDR notation. | Optional (default: assigned automatically by Docker if not provided) |
| gateway | Docker network gateway IP address within the subnet. | Optional (default: assigned automatically by Docker if not provided) |
| quality_node_hostname | Hostname of the node/VM on which the quality evaluator is deployed. | Optional (required only in multi-node clusters) |

Example (manager-only cluster, no workers):

{
    "external_ip": "203.0.113.1",
    "manager": "10.0.0.1",
    "workers": [],
    "source_env": "dev.env"
}
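
For a multi-node cluster, every attribute from the template comes into play. The sketch below writes an illustrative multi-node configuration to a scratch file and then checks that the three mandatory attributes are present. All values (IP addresses, SSH user, key path, subnet, gateway, hostname) are placeholders, and missing_mandatory_keys is a hypothetical helper, not part of the project scripts. It only searches for the key names as text and does not verify that the file is valid JSON.

```shell
# Hypothetical helper: print the mandatory cluster.json keys that are missing.
# Plain-text search only; it does not parse or validate the JSON.
missing_mandatory_keys() {
    for key in external_ip manager source_env; do
        grep -q "\"$key\"[[:space:]]*:" "$1" || echo "$key"
    done
}

# Illustrative multi-node configuration written to a scratch file.
# Every value below is a placeholder, not a project default.
example=$(mktemp)
cat > "$example" <<'EOF'
{
    "ssh_key_path": "/home/deploy/.ssh/id_ed25519",
    "ssh_user": "deploy",
    "external_ip": "203.0.113.1",
    "manager": "10.0.0.1",
    "workers": [
        "10.0.0.2",
        "10.0.0.3"
    ],
    "source_env": "dev.env",
    "subnet": "10.10.0.0/24",
    "gateway": "10.10.0.1",
    "quality_node_hostname": "worker-1"
}
EOF

# Prints nothing here, because all mandatory keys are present.
missing_mandatory_keys "$example"
```

To check a real configuration, pass its path instead, for example: missing_mandatory_keys docker/cluster.json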

Configure the environment

The table below lists environment variables from the common.env file. You can adjust these values as needed to configure service ports.

| Env Var | Default Port | Service |
|---|---|---|
| MINIO_EXT_PORT | 9000 | MinIO Object Storage |
| GOVERNANCE_EXT_PORT | 8091 | Governance Backend |
| STREAMING_EXT_PORT | 8000 | Streaming Service |
| STORAGE_EXT_PORT | 8089 | Storage Service |
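
When changing these ports, it is easy to assign the same value to two services by accident. The following sketch reports any port used by more than one *_EXT_PORT variable. It assumes that common.env contains plain NAME=value lines without quotes or trailing comments; duplicate_ext_ports is a hypothetical helper name, not part of the project scripts.

```shell
# Hypothetical helper: print each port value that is assigned to more than
# one *_EXT_PORT variable in the given env file.
# Assumes plain NAME=value lines (no quotes, no inline comments).
duplicate_ext_ports() {
    grep -E '^[A-Z_]+_EXT_PORT=' "$1" | cut -d= -f2 | sort | uniq -d
}

# Usage (prints nothing when every port is unique):
#   duplicate_ext_ports common.env
```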

Deploy and Start all Services

Step 1: Set Up the Cluster and install dependencies

Run the following command to set up the Docker Swarm cluster and install necessary dependencies on the manager node:

sudo ./scripts/install_docker_manager.sh

Step 2: Deploy services

To deploy all services within the Docker Swarm cluster, execute:

./scripts/deploy.sh

Step 3: Verify Running Services

After deployment, check the status of the services and make sure that each one has at least one replica running (e.g., 1/1 in the REPLICAS column):

docker service ls

Sample output:

ID             NAME                               MODE         REPLICAS   IMAGE                                     PORTS
5z5mnwvra8mj   datamite-demo_frontend             replicated   1/1        datamite/frontend:latest                  *:3000->3000/tcp
5z5mnwvra8mj   datamite-demo_governance-service   replicated   1/1        datamite/data-governance-backend:latest   *:8091->8091/tcp
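
With around twenty services in the stack, scanning the REPLICAS column by eye is error-prone. As a minimal sketch, the helper below reads lines of the form "name running/desired" and prints only the services whose running count differs from the desired count. The function name unready_services is hypothetical; the --format placeholders shown in the usage comment are the standard ones of docker service ls.

```shell
# Hypothetical helper: print the names of services whose running replica
# count differs from the desired count.
# Input (stdin): one "<name> <running>/<desired>" entry per line.
unready_services() {
    awk '{ split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Usage (prints nothing when every service is fully up):
#   docker service ls --format '{{.Name}} {{.Replicas}}' | unready_services
```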

Stop services

Step 1: List the Stacks

To find the stack you wish to stop, list all stacks and locate the datamite-demo stack (or any other stack name you’re working with):

docker stack ls

Sample output:

NAME SERVICES
datamite-demo 20

Step 2: Remove the Stack

To stop all services within the specified stack, use:

docker stack rm <stack-name>

Step 3: Confirm the Stack removal

Confirm that the stack no longer appears in the list:

docker stack ls
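
The same check can be scripted, for instance to wait until the stack has disappeared before redeploying. This is a minimal sketch: stack_is_listed is a hypothetical helper that looks for an exact stack name on standard input, and datamite-demo is the stack name from the sample output above.

```shell
# Hypothetical helper: succeed if the stack name given as $1 appears on
# stdin as a whole line (one stack name per line).
stack_is_listed() {
    grep -Fqx -- "$1"
}

# Usage: poll until the stack is no longer listed.
#   while docker stack ls --format '{{.Name}}' | stack_is_listed datamite-demo; do
#       sleep 2
#   done
```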

Remove Docker Swarm cluster

TODO: Implement a script to automate the deletion of the entire cluster dynamically

Services and Technology ports

Default ports used by the Datamite services:

| Service | Default Port(s) |
|---|---|
| Atlas | 21000 |
| Frontend | 3000 |
| MageAI | 6789 |
| MinIO | 9000, 9001 |
| Quality Evaluator | 8001 |
| Governance Service | 8091 |
| Spark | 6080, 7077, 8080, 8081 |
| Streaming Service | 8000 |
| Storage Service | 8089 |
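
Before deploying, it can help to check whether any of these ports is already taken on the host. The sketch below filters a list of listening ports down to those that collide with the defaults in the table. busy_default_ports is a hypothetical helper, the port list is copied from the table and has to be adjusted if ports are overridden in common.env, and the usage line assumes a Linux host where the ss utility is available.

```shell
# Hypothetical helper: read port numbers from stdin (one per line) and print
# those that collide with the default Datamite ports listed in the table.
busy_default_ports() {
    defaults=" 21000 3000 6789 9000 9001 8001 8091 6080 7077 8080 8081 8000 8089 "
    sort -u | while read -r port; do
        case "$defaults" in
            *" $port "*) echo "$port" ;;
        esac
    done
}

# Usage on Linux (assumes ss is installed); prints nothing when no default
# port is in use:
#   ss -ltnH | awk '{print $4}' | sed 's/.*://' | busy_default_ports
```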