Telemetry Data Collector and Monitoring Engine (TDCME)

TDCME is an open-source solution capable of collecting metrics and events from multiple locations and multiple layers. The tool is designed to monitor multi-node virtualized applications.
This project has received funding from the European Union’s “Horizon Europe” Programme Under Grant Agreement No 101135423 (ENACT).

Dependencies

TDCME builds on the following open-source frameworks and tools:

  • Kubernetes
  • Cilium
  • Prometheus
  • Kepler

Deployment

All subcomponents and services are deployed within the enact namespace. To deploy them in a different namespace, please update the Helm Charts accordingly.

Install Cilium (optional)

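Specify cluster.name, cluster.id and ipam.operator.clusterPoolIPv4PodCIDRList. In the command below, 1-255 denotes the allowed range for cluster.id; replace it with a single integer.

Note: All three values must be unique for each cluster.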

cilium install --version="1.16.4" \
--set cluster.name="cloud1" \
--set cluster.id=1-255 \
--set ipam.operator.clusterPoolIPv4PodCIDRList="10.10.0.0/16" \
--set sctp.enabled=true \
--set hubble.enabled=true \
--set hubble.tls.enabled=false \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true \
--set hubble.metrics.enableOpenMetrics=true \
--set hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,httpV2:exemplars=true;labelsContext=source_ip\,source_namespace\,source_workload\,destination_ip\,destination_namespace\,destination_workload\,traffic_direction}" \
--set global.hubble.enabled=true \
--set global.hubble.listenAddress=":4244" \
--set global.hubble.ui.enabled=true \
--set envoy.enabled=true \
--set prometheus.enabled=true \
--set operator.prometheus.enabled=true \
--set cni.chainingMode="none" 

Install Prometheus stack

Within this project, the Prometheus stack is deployed in the enact namespace. To install the stack:

helm install infra prom-stack --namespace enact --create-namespace

Note: Change the type of the infra-prometheus-prometheus service to NodePort.
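One way to apply the note above is to patch the service in place with kubectl. The following is a minimal sketch, not the project's own procedure: the function name expose_prometheus is hypothetical, and it assumes the service lives in the enact namespace and that Prometheus is served on its default port 9090.

```shell
# Hypothetical helper: switch the Prometheus service to NodePort and
# print the node port that Kubernetes assigned to it.
# Assumption: port 9090 (the Prometheus default) is the one to expose.
expose_prometheus() {
  ns="${1:-enact}"
  svc="${2:-infra-prometheus-prometheus}"
  # Send the patch confirmation to stderr so that only the port is captured.
  kubectl patch svc "$svc" -n "$ns" -p '{"spec": {"type": "NodePort"}}' >&2
  kubectl get svc "$svc" -n "$ns" \
    -o jsonpath='{.spec.ports[?(@.port==9090)].nodePort}'
}

# Usage (requires access to the cluster):
#   PROM_NODEPORT="$(expose_prometheus)"
```

The printed port is the value that the cluster registration request expects as <prometheus-service-NodePort>.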

Deploy Kepler

Add the Kepler Helm repository:

helm repo add kepler https://sustainable-computing-io.github.io/kepler-helm-chart
helm repo update

Find the label to use in the Kepler deployment:

kubectl get prometheus --all-namespaces -o yaml | grep -A5 "serviceMonitorSelector"

The label is the release value under matchLabels in the output:

matchLabels:
  release: <label>
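As an alternative to reading the YAML by eye, the label can be extracted with a JSONPath query. This is a sketch under stated assumptions: the function name prometheus_release_label is hypothetical, and it assumes the first Prometheus resource in the given namespace is the one of interest and that its selector uses a release key as shown above.

```shell
# Hypothetical helper: print the "release" label matched by the
# serviceMonitorSelector of the first Prometheus resource in a namespace.
prometheus_release_label() {
  ns="${1:-enact}"
  kubectl get prometheus -n "$ns" \
    -o jsonpath='{.items[0].spec.serviceMonitorSelector.matchLabels.release}'
}

# Usage (requires access to the cluster):
#   LABEL="$(prometheus_release_label)"
#   echo "Kepler should use: serviceMonitor.labels.release=$LABEL"
```

If the command prints nothing, the selector probably uses a different key, and the grep output is the more reliable source.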

Deploy Kepler using the label above:

helm install kepler kepler/kepler --namespace enact --create-namespace \
    --set serviceMonitor.enabled=true \
    --set serviceMonitor.labels.release=<label>

Install Telemetry services

cd infrastructure-monitor/monitor-api/
helm upgrade --install tdcme-api ./api/chart/ --namespace enact --create-namespace
helm upgrade --install tdcme-api-topology ./topology_exporter/chart/ --namespace enact --create-namespace

Uninstall all Telemetry services

helm uninstall -n enact infra kepler tdcme-api tdcme-api-topology

Cluster Registration/Removal

To register a remote cluster with the API, use the following curl command:

curl -X PUT http://<api-node-ip>:32554/clusters \
  -H "Content-Type: application/json" \
  -d '{
    "cluster_name": "<cluster-name>",
    "host":         "http://<remote-node-ip>:<prometheus-service-NodePort>"
  }'

To remove a registered cluster:

curl -X DELETE "http://<api-node-ip>:32554/clusters/<cluster-name>"
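When several clusters have to be registered or removed, the two requests above can be wrapped in small shell functions. The sketch below is illustrative only: the function names are hypothetical, and the payload builder does no JSON escaping, so it assumes cluster names and URLs contain no quotes or backslashes.

```shell
# Build the JSON body sent with PUT /clusters.
# Assumption: neither argument contains characters that need JSON escaping.
cluster_payload() {
  printf '{"cluster_name": "%s", "host": "%s"}\n' "$1" "$2"
}

# register_cluster <api-base-url> <cluster-name> <prometheus-url>
register_cluster() {
  curl --fail --silent --show-error -X PUT "$1/clusters" \
    -H "Content-Type: application/json" \
    -d "$(cluster_payload "$2" "$3")"
}

# remove_cluster <api-base-url> <cluster-name>
remove_cluster() {
  curl --fail --silent --show-error -X DELETE "$1/clusters/$2"
}

# Usage (replace the placeholders with real values):
#   register_cluster "http://<api-node-ip>:32554" "<cluster-name>" \
#     "http://<remote-node-ip>:<prometheus-service-NodePort>"
#   remove_cluster "http://<api-node-ip>:32554" "<cluster-name>"
```

The --fail flag makes curl return a non-zero exit status on HTTP errors, which lets a calling script detect a failed registration.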

API Endpoints

| Endpoint | Description |
|---|---|
| http://{endpoint}:32554/pod/{pod}/cpu | Returns the CPU utilisation of the {pod} Pod with respect to total node CPU resources. |
| http://{endpoint}:32554/pod/{pod}/memory | Returns the memory utilisation of the {pod} Pod with respect to total node memory. |
| http://{endpoint}:32554/node/{node}/cpu | Returns the CPU utilisation of the {node} node with respect to total node CPU resources. |
| http://{endpoint}:32554/node/{node}/memory | Returns the memory utilisation of the {node} node. |
| http://{endpoint}:32554/node/{node}/fs | Returns the filesystem space utilisation of the {node} node. |
| http://{endpoint}:32554/node/{node}/network | Returns the total TX/RX network metrics of the {node} node. |
| http://{endpoint}:32554/namespaces/{namespace}/pods | Returns all pods in the {namespace} namespace. |
| http://{endpoint}:32554/nodes | Returns all registered nodes. |
| http://{endpoint}:32553/topology | Returns the Pod topology graph. |
| http://{endpoint}:32554/pod/{pod_name}/{container_name}/metrics | Returns CPU and memory metrics of a container inside the specified pod. |
| http://{endpoint}:32554/pod/{pod_name}/energy | Returns the energy consumption of the specified pod. |
| http://{endpoint}:32554/latency/{source_node}/{destination_node} | Returns the latency between two Kubernetes nodes across the Edge-to-Cloud continuum. |
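The endpoints above can be queried with curl. The sketch below is a minimal example, not part of the project: the function names tdcme_url and tdcme_get are hypothetical, it assumes the endpoints answer plain GET requests, and it makes no assumption about the response format, which is not specified here.

```shell
# tdcme_url <host> <path>
# Print the full URL for an endpoint path. The topology endpoint is
# served on port 32553; every other endpoint listed above uses 32554.
tdcme_url() {
  case "$2" in
    topology*) port=32553 ;;
    *)         port=32554 ;;
  esac
  printf 'http://%s:%s/%s\n' "$1" "$port" "$2"
}

# tdcme_get <host> <path>
# Assumption: the endpoint accepts an HTTP GET request.
tdcme_get() {
  curl --fail --silent --show-error "$(tdcme_url "$1" "$2")"
}

# Usage (replace <api-node-ip> and the resource names with real values):
#   tdcme_get "<api-node-ip>" "nodes"
#   tdcme_get "<api-node-ip>" "node/<node>/cpu"
#   tdcme_get "<api-node-ip>" "topology"
```

Keeping the port selection in one function avoids repeating the 32553/32554 distinction in every call.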