Telemetry Data Collector and Monitoring Engine (TDCME)
TDCME is an open-source solution that collects metrics and events from multiple locations and multiple layers. The tool is designed to monitor multi-node virtualized applications.
This project has received funding from the European Union’s “Horizon Europe” programme under Grant Agreement No. 101135423 (ENACT).
Dependencies
TDCME builds on the following open-source frameworks and tools:
- Kubernetes
- Cilium
- Prometheus
- Kepler
Deployment
All subcomponents and services are deployed within the enact namespace. To deploy them in a different namespace, update the Helm charts accordingly.
Install Cilium (optional)
Specify cluster.name, cluster.id and ipam.operator.clusterPoolIPv4PodCIDRList.
Note: All of the above must be unique for each cluster, and cluster.id must be an integer between 1 and 255.
cilium install --version="1.16.4" \
--set cluster.name="cloud1" \
--set cluster.id=<1-255> \
--set ipam.operator.clusterPoolIPv4PodCIDRList="10.10.0.0/16" \
--set sctp.enabled=true \
--set hubble.enabled=true \
--set hubble.tls.enabled=false \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true \
--set hubble.metrics.enableOpenMetrics=true \
--set hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,httpV2:exemplars=true;labelsContext=source_ip\,source_namespace\,source_workload\,destination_ip\,destination_namespace\,destination_workload\,traffic_direction}" \
--set global.hubble.enabled=true \
--set global.hubble.listenAddress=":4244" \
--set global.hubble.ui.enabled=true \
--set envoy.enabled=true \
--set prometheus.enabled=true \
--set operator.prometheus.enabled=true \
--set cni.chainingMode="none"
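Since cluster.id has to be a single integer from the 1-255 range, a small pre-flight check can catch a bad value before cilium install runs. The following is a minimal sketch for a POSIX shell; the function name check_cluster_id is hypothetical and is not part of TDCME or Cilium.

```shell
# Hypothetical helper (not part of TDCME or Cilium):
# succeeds only if $1 is an integer in the range 1-255.
check_cluster_id() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 255 ]
}

# Example use before installing:
#   check_cluster_id "$CLUSTER_ID" || { echo "cluster.id must be 1-255" >&2; exit 1; }
```

The check only validates the range; making sure the value is unique across clusters remains a manual step.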
Install Prometheus stack
In this project, the Prometheus stack is installed in the enact namespace. To install the stack:
helm install infra prom-stack --namespace enact --create-namespace
Note: Change the type of the infra-prometheus-prometheus service to NodePort so that Prometheus can be reached from outside the cluster.
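One possible way to switch the service type is kubectl patch. The sketch below wraps it in a function; the function name is hypothetical, and reading the first entry of spec.ports is an assumption about how the service is laid out in your installation.

```shell
# Service and namespace names as used in this README.
PROM_NAMESPACE="enact"
PROM_SERVICE="infra-prometheus-prometheus"
# Strategic merge patch that only changes the service type.
NODEPORT_PATCH='{"spec":{"type":"NodePort"}}'

# Hypothetical helper: patch the service, then print the assigned NodePort.
set_prometheus_nodeport() {
  kubectl patch svc "$PROM_SERVICE" -n "$PROM_NAMESPACE" -p "$NODEPORT_PATCH"
  # Assumption: the Prometheus web port is the first entry in spec.ports.
  kubectl get svc "$PROM_SERVICE" -n "$PROM_NAMESPACE" \
    -o jsonpath='{.spec.ports[0].nodePort}'
}

# Example use (requires access to the cluster):
#   set_prometheus_nodeport
```

The printed port is the value to use as the Prometheus service NodePort when registering the cluster with the API.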
Deploy Kepler
Add Kepler Helm repository:
helm repo add kepler https://sustainable-computing-io.github.io/kepler-helm-chart
helm repo update
Find the label to use in the Kepler deployment:
kubectl get prometheus --all-namespaces -o yaml | grep -A5 "serviceMonitorSelector"
In the output, the label is the value of release under matchLabels:
matchLabels:
  release: <label>
Deploy Kepler using the label above:
helm install kepler kepler/kepler --namespace enact --create-namespace \
--set serviceMonitor.enabled=true \
--set serviceMonitor.labels.release=<label>
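The label lookup can also be scripted instead of read by eye. The sketch below is a text-based filter, not a YAML parser: it assumes the release key appears on its own line after serviceMonitorSelector, as in the output shown above. The function name extract_release_label is hypothetical.

```shell
# Hypothetical helper: read Prometheus resource YAML on stdin and print the
# first "release:" value that follows "serviceMonitorSelector:".
extract_release_label() {
  awk '/serviceMonitorSelector:/ { in_selector = 1; next }
       in_selector && /release:/ { print $2; exit }'
}

# Example use (requires access to the cluster):
#   LABEL="$(kubectl get prometheus --all-namespaces -o yaml | extract_release_label)"
#   helm install kepler kepler/kepler --namespace enact --create-namespace \
#     --set serviceMonitor.enabled=true \
#     --set serviceMonitor.labels.release="$LABEL"
```

If the selector is empty or uses a key other than release, the filter prints nothing or an unrelated value, so check the result before passing it to helm.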
Install Telemetry services
cd infrastructure-monitor/monitor-api/
helm upgrade --install tdcme-api ./api/chart/ --namespace enact --create-namespace
helm upgrade --install tdcme-api-topology ./topology_exporter/chart/ --namespace enact --create-namespace
Uninstall all Telemetry services
helm uninstall -n enact infra kepler tdcme-api tdcme-api-topology
Cluster Registration/Removal
To register a remote cluster with the API, use the following curl command:
curl -X PUT http://<api-node-ip>:32554/clusters \
-H "Content-Type: application/json" \
-d '{
"cluster_name": "<cluster-name>",
"host": "http://<remote-node-ip>:<prometheus-service-NodePort>"
}'
To remove a registered cluster:
curl -X DELETE "http://<api-node-ip>:32554/clusters/<cluster-name>"
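When several clusters need to be registered, the two calls above can be wrapped in small shell functions. This is a sketch under stated assumptions: the function names are hypothetical, the payload is built with printf and does no JSON escaping (so cluster names must not contain quotes or backslashes), and the API response format is not specified here, so the output is passed through unchanged.

```shell
# NodePort of the TDCME API, as used throughout this README.
API_PORT=32554

# Build the registration body.
# $1 = cluster name, $2 = remote node IP, $3 = Prometheus service NodePort
registration_payload() {
  printf '{"cluster_name": "%s", "host": "http://%s:%s"}' "$1" "$2" "$3"
}

# Register a remote cluster.
# $1 = API node IP, $2 = cluster name, $3 = remote node IP, $4 = Prometheus NodePort
register_cluster() {
  curl -fsS -X PUT "http://$1:${API_PORT}/clusters" \
    -H "Content-Type: application/json" \
    -d "$(registration_payload "$2" "$3" "$4")"
}

# Remove a registered cluster.
# $1 = API node IP, $2 = cluster name
remove_cluster() {
  curl -fsS -X DELETE "http://$1:${API_PORT}/clusters/$2"
}

# Example use (addresses and port are placeholders):
#   register_cluster 192.0.2.5 cloud2 192.0.2.10 30090
#   remove_cluster 192.0.2.5 cloud2
```

The -f flag makes curl return a non-zero exit status on HTTP errors, which lets a calling script stop on a failed registration.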
API Endpoints
Endpoint | Description |
---|---|
http://{endpoint}:32554/pod/{pod}/cpu | returns the CPU utilisation of {pod} Pod with respect to total node CPU resources. |
http://{endpoint}:32554/pod/{pod}/memory | returns the Memory utilisation of {pod} Pod with respect to total node memory. |
http://{endpoint}:32554/node/{node}/cpu | returns the CPU utilisation of {node} node with respect to total node CPU resources. |
http://{endpoint}:32554/node/{node}/memory | returns the Memory utilisation of {node} node. |
http://{endpoint}:32554/node/{node}/fs | returns the Filesystem space utilisation of {node} node. |
http://{endpoint}:32554/node/{node}/network | returns the total TX/RX network metrics of {node} node. |
http://{endpoint}:32554/namespaces/{namespace}/pods | returns all pods under the {namespace}. |
http://{endpoint}:32554/nodes | returns all registered nodes. |
http://{endpoint}:32553/topology | returns the Pod topology graph. |
http://{endpoint}:32554/pod/{pod_name}/{container_name}/metrics | returns CPU and memory metrics of a container inside the specified pod. |
http://{endpoint}:32554/pod/{pod_name}/energy | returns the energy consumption of the specified pod. |
http://{endpoint}:32554/latency/{source_node}/{destination-node} | returns the latency between two Kubernetes nodes across the Edge-to-Cloud continuum. |
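The endpoints in the table share one URL pattern, with the topology endpoint on port 32553 and all others on 32554. The sketch below builds those URLs from path segments and queries them with curl. The function names tdcme_url and tdcme_get are hypothetical, and the response format is not documented here, so the output is printed as returned.

```shell
# Hypothetical helper: build an endpoint URL from the table above.
# $1 = API node address, remaining arguments = path segments.
tdcme_url() {
  tdcme_host="$1"
  shift
  # The topology exporter listens on 32553; everything else on 32554.
  if [ "$1" = "topology" ]; then
    tdcme_port=32553
  else
    tdcme_port=32554
  fi
  tdcme_path=""
  for tdcme_seg in "$@"; do
    tdcme_path="$tdcme_path/$tdcme_seg"
  done
  printf 'http://%s:%s%s\n' "$tdcme_host" "$tdcme_port" "$tdcme_path"
}

# Hypothetical helper: query an endpoint and print the raw response.
tdcme_get() {
  curl -fsS "$(tdcme_url "$@")"
}

# Example use (address, node and namespace names are placeholders):
#   tdcme_get 192.0.2.5 node worker1 cpu
#   tdcme_get 192.0.2.5 namespaces enact pods
#   tdcme_get 192.0.2.5 topology
```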