[k3s] acm-operator CrashLoopBackOff on installation
When deploying on the following environment:
- K3s with a single master node: a Raspberry Pi 4 (ARM64) running Raspberry Pi OS.
the ACM Operator crashes.
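For reference, the environment can be confirmed with standard commands on the node (a minimal sketch; the exact version output is not included in this report):

```bash
# Kubernetes/K3s version and node details (single master on the Raspberry Pi 4)
kubectl version
kubectl get nodes -o wide

# Architecture and available memory on the Pi
uname -m
free -h
```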
Deployment Logs:
make deploy IMG=$DOCKERHUB_USER/codecoapp-operator:2.0.0
test -s /home/pi/acm/bin/controller-gen && /home/pi/acm/bin/controller-gen --version | grep -q v0.16.4 || \
GOBIN=/home/pi/acm/bin go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.16.4
/home/pi/acm/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./api/v1alpha1" output:crd:artifacts:config=config/crd/bases
./scripts/pre_deploy.sh
Directory qos-scheduler already exists. Skipping cloning.
Directory mdm-api already exists. Skipping cloning.
Directory mdm-connectors already exists. Skipping cloning.
Directory pdlc-integration already exists. Skipping cloning.
Directory secure-connectivity already exists. Skipping cloning.
Directory network-exposure already exists. Skipping cloning.
Directory network-state-management already exists. Skipping cloning.
Directory multus-cni already exists. Skipping cloning.
Directory kube-prometheus already exists. Skipping cloning.
Directory kepler already exists. Skipping cloning.
cd config/manager && /home/pi/acm/bin/kustomize edit set image controller=danieluliedi2cat/codecoapp-operator:2.0.0
/home/pi/acm/bin/kustomize build config/default | kubectl create -f -
namespace/he-codeco-acm created
customresourcedefinition.apiextensions.k8s.io/codecoapps.codeco.he-codeco.eu created
serviceaccount/acm-operator-controller-manager created
role.rbac.authorization.k8s.io/acm-operator-leader-election-role created
clusterrole.rbac.authorization.k8s.io/acm-operator-codecoapp-editor-role created
clusterrole.rbac.authorization.k8s.io/acm-operator-codecoapp-operator-role created
clusterrole.rbac.authorization.k8s.io/acm-operator-codecoapp-viewer-role created
clusterrole.rbac.authorization.k8s.io/acm-operator-manager-role created
clusterrole.rbac.authorization.k8s.io/acm-operator-metrics-reader created
clusterrole.rbac.authorization.k8s.io/acm-operator-prometheus-role created
clusterrole.rbac.authorization.k8s.io/acm-operator-proxy-role created
clusterrole.rbac.authorization.k8s.io/acm-operator-swm-manager-role created
rolebinding.rbac.authorization.k8s.io/acm-operator-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/acm-operator-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/acm-operator-prometheus-role-binding created
clusterrolebinding.rbac.authorization.k8s.io/acm-operator-proxy-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/acm-operator-swm-manager-rolebinding created
service/acm-operator-controller-manager-metrics-service created
deployment.apps/acm-operator-controller-manager created
./scripts/post_deploy.sh
Executing post deployment tasks...
........................................ GETTING CLUSTER INFORMATION ...............................................
node/bilbo labeled
........................................Kustomize Installing...............................................
/home/pi/acm/kustomize exists. Remove it first.
........................................Kustomize Install Finished...............................................
........................................Prometheus Installing...............................................
customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/prometheusagents.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/scrapeconfigs.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com serverside-applied
namespace/monitoring serverside-applied
customresourcedefinition.apiextensions.k8s.io/addons.k3s.cattle.io condition met
customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/codecoapps.codeco.he-codeco.eu condition met
customresourcedefinition.apiextensions.k8s.io/etcdsnapshotfiles.k3s.cattle.io condition met
customresourcedefinition.apiextensions.k8s.io/helmchartconfigs.helm.cattle.io condition met
customresourcedefinition.apiextensions.k8s.io/helmcharts.helm.cattle.io condition met
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/prometheusagents.monitoring.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/scrapeconfigs.monitoring.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com condition met
alertmanager.monitoring.coreos.com/main created
networkpolicy.networking.k8s.io/alertmanager-main created
poddisruptionbudget.policy/alertmanager-main created
prometheusrule.monitoring.coreos.com/alertmanager-main-rules created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
servicemonitor.monitoring.coreos.com/alertmanager-main created
clusterrole.rbac.authorization.k8s.io/blackbox-exporter created
clusterrolebinding.rbac.authorization.k8s.io/blackbox-exporter created
configmap/blackbox-exporter-configuration created
deployment.apps/blackbox-exporter created
networkpolicy.networking.k8s.io/blackbox-exporter created
service/blackbox-exporter created
serviceaccount/blackbox-exporter created
servicemonitor.monitoring.coreos.com/blackbox-exporter created
secret/grafana-config created
secret/grafana-datasources created
configmap/grafana-dashboard-alertmanager-overview created
configmap/grafana-dashboard-apiserver created
configmap/grafana-dashboard-cluster-total created
configmap/grafana-dashboard-controller-manager created
configmap/grafana-dashboard-grafana-overview created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-multicluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-node created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-windows-cluster created
configmap/grafana-dashboard-k8s-resources-windows-namespace created
configmap/grafana-dashboard-k8s-resources-windows-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-k8s-windows-cluster-rsrc-use created
configmap/grafana-dashboard-k8s-windows-node-rsrc-use created
configmap/grafana-dashboard-kubelet created
configmap/grafana-dashboard-namespace-by-pod created
configmap/grafana-dashboard-namespace-by-workload created
configmap/grafana-dashboard-node-cluster-rsrc-use created
configmap/grafana-dashboard-node-rsrc-use created
configmap/grafana-dashboard-nodes-aix created
configmap/grafana-dashboard-nodes-darwin created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pod-total created
configmap/grafana-dashboard-prometheus-remote-write created
configmap/grafana-dashboard-prometheus created
configmap/grafana-dashboard-proxy created
configmap/grafana-dashboard-scheduler created
configmap/grafana-dashboard-workload-total created
configmap/grafana-dashboards created
deployment.apps/grafana created
networkpolicy.networking.k8s.io/grafana created
prometheusrule.monitoring.coreos.com/grafana-rules created
service/grafana created
serviceaccount/grafana created
servicemonitor.monitoring.coreos.com/grafana created
prometheusrule.monitoring.coreos.com/kube-prometheus-rules created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
networkpolicy.networking.k8s.io/kube-state-metrics created
prometheusrule.monitoring.coreos.com/kube-state-metrics-rules created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
prometheusrule.monitoring.coreos.com/kubernetes-monitoring-rules created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/coredns created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kubelet created
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
networkpolicy.networking.k8s.io/node-exporter created
prometheusrule.monitoring.coreos.com/node-exporter-rules created
service/node-exporter created
serviceaccount/node-exporter created
servicemonitor.monitoring.coreos.com/node-exporter created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
networkpolicy.networking.k8s.io/prometheus-k8s created
poddisruptionbudget.policy/prometheus-k8s created
prometheus.monitoring.coreos.com/k8s created
prometheusrule.monitoring.coreos.com/prometheus-k8s-prometheus-rules created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus-k8s created
Warning: resource apiservices/v1beta1.metrics.k8s.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io configured
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
Warning: resource clusterroles/system:aggregated-metrics-reader is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader configured
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
networkpolicy.networking.k8s.io/prometheus-adapter created
poddisruptionbudget.policy/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
servicemonitor.monitoring.coreos.com/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
error when creating "manifests/prometheusOperator-clusterRoleBinding.yaml": Post "https://192.168.59.58:6443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings?fieldManager=kubectl-client-side-apply&fieldValidation=Strict": unexpected EOF
error when retrieving current configuration of:
Resource: "apps/v1, Resource=deployments", GroupVersionKind: "apps/v1, Kind=Deployment"
Name: "prometheus-operator", Namespace: "monitoring"
from server for: "manifests/prometheusOperator-deployment.yaml": Get "https://192.168.59.58:6443/apis/apps/v1/namespaces/monitoring/deployments/prometheus-operator": dial tcp 192.168.59.58:6443: connect: connection refused - error from a previous attempt: read tcp 192.168.59.58:44256->192.168.59.58:6443: read: connection reset by peer
error when retrieving current configuration of:
Resource: "networking.k8s.io/v1, Resource=networkpolicies", GroupVersionKind: "networking.k8s.io/v1, Kind=NetworkPolicy"
Name: "prometheus-operator", Namespace: "monitoring"
from server for: "manifests/prometheusOperator-networkPolicy.yaml": Get "https://192.168.59.58:6443/apis/networking.k8s.io/v1/namespaces/monitoring/networkpolicies/prometheus-operator": dial tcp 192.168.59.58:6443: connect: connection refused
error when retrieving current configuration of:
Resource: "monitoring.coreos.com/v1, Resource=prometheusrules", GroupVersionKind: "monitoring.coreos.com/v1, Kind=PrometheusRule"
Name: "prometheus-operator-rules", Namespace: "monitoring"
from server for: "manifests/prometheusOperator-prometheusRule.yaml": Get "https://192.168.59.58:6443/apis/monitoring.coreos.com/v1/namespaces/monitoring/prometheusrules/prometheus-operator-rules": dial tcp 192.168.59.58:6443: connect: connection refused
error when retrieving current configuration of:
Resource: "/v1, Resource=services", GroupVersionKind: "/v1, Kind=Service"
Name: "prometheus-operator", Namespace: "monitoring"
from server for: "manifests/prometheusOperator-service.yaml": Get "https://192.168.59.58:6443/api/v1/namespaces/monitoring/services/prometheus-operator": dial tcp 192.168.59.58:6443: connect: connection refused
error when retrieving current configuration of:
Resource: "/v1, Resource=serviceaccounts", GroupVersionKind: "/v1, Kind=ServiceAccount"
Name: "prometheus-operator", Namespace: "monitoring"
from server for: "manifests/prometheusOperator-serviceAccount.yaml": Get "https://192.168.59.58:6443/api/v1/namespaces/monitoring/serviceaccounts/prometheus-operator": dial tcp 192.168.59.58:6443: connect: connection refused
error when retrieving current configuration of:
Resource: "monitoring.coreos.com/v1, Resource=servicemonitors", GroupVersionKind: "monitoring.coreos.com/v1, Kind=ServiceMonitor"
Name: "prometheus-operator", Namespace: "monitoring"
from server for: "manifests/prometheusOperator-serviceMonitor.yaml": Get "https://192.168.59.58:6443/apis/monitoring.coreos.com/v1/namespaces/monitoring/servicemonitors/prometheus-operator": dial tcp 192.168.59.58:6443: connect: connection refused
........................................Prometheus Installed...............................................
........................................Installing Primary CNI: Flannel...............................................
error: error validating "https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml": error validating data: failed to download openapi: Get "https://192.168.59.58:6443/openapi/v2?timeout=32s": dial tcp 192.168.59.58:6443: connect: connection refused; if you choose to ignore these errors, turn validation off with --validate=false
........................................Installing NetMA...............................................
error: error validating "https://github.com/cert-manager/cert-manager/releases/download/v1.15.3/cert-manager.yaml": error validating data: failed to download openapi: the server is currently unable to handle the request; if you choose to ignore these errors, turn validation off with --validate=false
error: error validating "https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/master/deployments/multus-daemonset-thick.yml": error validating data: failed to download openapi: the server is currently unable to handle the request; if you choose to ignore these errors, turn validation off with --validate=false
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods)
Error from server (ServiceUnavailable): apiserver not ready
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods)
namespace/he-codeco-netma created
serviceaccount/l2sm-operator created
clusterrolebinding.rbac.authorization.k8s.io/l2sm-operator created
clusterrolebinding.rbac.authorization.k8s.io/l2sm-operator-netma created
deployment.apps/l2sm-controller created
service/l2sm-controller-service created
customresourcedefinition.apiextensions.k8s.io/l2networks.l2sm.l2sm.k8s.local created
customresourcedefinition.apiextensions.k8s.io/networkedgedevices.l2sm.l2sm.k8s.local created
customresourcedefinition.apiextensions.k8s.io/overlays.l2sm.l2sm.k8s.local created
serviceaccount/l2sm-controller-manager created
role.rbac.authorization.k8s.io/l2sm-leader-election-role created
clusterrole.rbac.authorization.k8s.io/l2sm-manager-role created
clusterrole.rbac.authorization.k8s.io/l2sm-metrics-reader created
clusterrole.rbac.authorization.k8s.io/l2sm-proxy-role created
rolebinding.rbac.authorization.k8s.io/l2sm-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/l2sm-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/l2sm-proxy-rolebinding created
service/l2sm-controller-manager-metrics-service created
service/l2sm-webhook-service created
deployment.apps/l2sm-controller-manager created
mutatingwebhookconfiguration.admissionregistration.k8s.io/l2sm-mutating-webhook-configuration created
deployment.apps/l2sm-operator created
service/l2sm-operator-service created
daemonset.apps/l2sm-switch created
configmap/mysql-schema created
service/mysql-service created
secret/mysql-secret created
persistentvolumeclaim/mysql-pv-claim created
persistentvolume/mysql-pv created
Error from server (NotFound): error when creating "./deployments/l2sm-deployment.yaml": the server could not find the requested resource (post certificates.cert-manager.io)
Error from server (NotFound): error when creating "./deployments/l2sm-deployment.yaml": the server could not find the requested resource (post issuers.cert-manager.io)
Error from server (NotFound): error when creating "./deployments/l2sm-deployment.yaml": the server could not find the requested resource (post network-attachment-definitions.k8s.cni.cncf.io)
Error from server (NotFound): error when creating "./deployments/l2sm-deployment.yaml": the server could not find the requested resource (post network-attachment-definitions.k8s.cni.cncf.io)
Error from server (NotFound): error when creating "./deployments/l2sm-deployment.yaml": the server could not find the requested resource (post network-attachment-definitions.k8s.cni.cncf.io)
Error from server (NotFound): error when creating "./deployments/l2sm-deployment.yaml": the server could not find the requested resource (post network-attachment-definitions.k8s.cni.cncf.io)
Error from server (NotFound): error when creating "./deployments/l2sm-deployment.yaml": the server could not find the requested resource (post network-attachment-definitions.k8s.cni.cncf.io)
Error from server (NotFound): error when creating "./deployments/l2sm-deployment.yaml": the server could not find the requested resource (post network-attachment-definitions.k8s.cni.cncf.io)
Error from server (NotFound): error when creating "./deployments/l2sm-deployment.yaml": the server could not find the requested resource (post network-attachment-definitions.k8s.cni.cncf.io)
Error from server (NotFound): error when creating "./deployments/l2sm-deployment.yaml": the server could not find the requested resource (post network-attachment-definitions.k8s.cni.cncf.io)
Error from server (NotFound): error when creating "./deployments/l2sm-deployment.yaml": the server could not find the requested resource (post network-attachment-definitions.k8s.cni.cncf.io)
Error from server (NotFound): error when creating "./deployments/l2sm-deployment.yaml": the server could not find the requested resource (post network-attachment-definitions.k8s.cni.cncf.io)
Error from server (Forbidden): error when creating "./deployments/l2sm-deployment.yaml": pods "mysql-pod" is forbidden: error looking up service account he-codeco-netma/default: serviceaccount "default" not found
E0522 10:30:16.658933 2663323 reflector.go:166] "Unhandled Error" err="k8s.io/client-go@v1.32.4-k3s1/tools/cache/reflector.go:251: Failed to watch *unstructured.Unstructured: apiserver not ready"
W0522 10:32:17.225271 2663323 reflector.go:492] k8s.io/client-go@v1.32.4-k3s1/tools/cache/reflector.go:251: watch of *unstructured.Unstructured ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0522 10:32:28.289786 2663323 reflector.go:569] k8s.io/client-go@v1.32.4-k3s1/tools/cache/reflector.go:251: failed to list *unstructured.Unstructured: Get "https://192.168.59.58:6443/api/v1/namespaces/he-codeco-acm/pods?fieldSelector=metadata.name%3Dacm-operator-controller-manager-b6d575d9c-8kgm8&resourceVersion=1334": net/http: TLS handshake timeout
E0522 10:32:28.289870 2663323 reflector.go:166] "Unhandled Error" err="k8s.io/client-go@v1.32.4-k3s1/tools/cache/reflector.go:251: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get \"https://192.168.59.58:6443/api/v1/namespaces/he-codeco-acm/pods?fieldSelector=metadata.name%3Dacm-operator-controller-manager-b6d575d9c-8kgm8&resourceVersion=1334\": net/http: TLS handshake timeout"
W0522 10:32:30.775177 2663323 reflector.go:569] k8s.io/client-go@v1.32.4-k3s1/tools/cache/reflector.go:251: failed to list *unstructured.Unstructured: Get "https://192.168.59.58:6443/api/v1/namespaces/he-codeco-acm/pods?fieldSelector=metadata.name%3Dacm-operator-controller-manager-b6d575d9c-8kgm8&resourceVersion=1334": dial tcp 192.168.59.58:6443: connect: connection refused
E0522 10:32:30.775259 2663323 reflector.go:166] "Unhandled Error" err="k8s.io/client-go@v1.32.4-k3s1/tools/cache/reflector.go:251: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get \"https://192.168.59.58:6443/api/v1/namespaces/he-codeco-acm/pods?fieldSelector=metadata.name%3Dacm-operator-controller-manager-b6d575d9c-8kgm8&resourceVersion=1334\": dial tcp 192.168.59.58:6443: connect: connection refused"
W0522 10:32:35.908600 2663323 reflector.go:569] k8s.io/client-go@v1.32.4-k3s1/tools/cache/reflector.go:251: failed to list *unstructured.Unstructured: Get "https://192.168.59.58:6443/api/v1/namespaces/he-codeco-acm/pods?fieldSelector=metadata.name%3Dacm-operator-controller-manager-b6d575d9c-8kgm8&resourceVersion=1334": dial tcp 192.168.59.58:6443: connect: connection refused
E0522 10:32:35.908659 2663323 reflector.go:166] "Unhandled Error" err="k8s.io/client-go@v1.32.4-k3s1/tools/cache/reflector.go:251: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get \"https://192.168.59.58:6443/api/v1/namespaces/he-codeco-acm/pods?fieldSelector=metadata.name%3Dacm-operator-controller-manager-b6d575d9c-8kgm8&resourceVersion=1334\": dial tcp 192.168.59.58:6443: connect: connection refused"
W0522 10:32:42.422891 2663323 reflector.go:569] k8s.io/client-go@v1.32.4-k3s1/tools/cache/reflector.go:251: failed to list *unstructured.Unstructured: Get "https://192.168.59.58:6443/api/v1/namespaces/he-codeco-acm/pods?fieldSelector=metadata.name%3Dacm-operator-controller-manager-b6d575d9c-8kgm8&resourceVersion=1334": dial tcp 192.168.59.58:6443: connect: connection refused
E0522 10:32:42.422974 2663323 reflector.go:166] "Unhandled Error" err="k8s.io/client-go@v1.32.4-k3s1/tools/cache/reflector.go:251: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: Get \"https://192.168.59.58:6443/api/v1/namespaces/he-codeco-acm/pods?fieldSelector=metadata.name%3Dacm-operator-controller-manager-b6d575d9c-8kgm8&resourceVersion=1334\": dial tcp 192.168.59.58:6443: connect: connection refused"
W0522 10:32:56.177615 2663323 reflector.go:569] k8s.io/client-go@v1.32.4-k3s1/tools/cache/reflector.go:251: failed to list *unstructured.Unstructured: apiserver not ready
E0522 10:32:56.177689 2663323 reflector.go:166] "Unhandled Error" err="k8s.io/client-go@v1.32.4-k3s1/tools/cache/reflector.go:251: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: apiserver not ready"
timed out waiting for the condition on pods/acm-operator-controller-manager-b6d575d9c-8kgm8
timed out waiting for the condition on pods/coredns-697968c856-wvl8p
timed out waiting for the condition on pods/local-path-provisioner-774c6665dc-zcxl5
timed out waiting for the condition on pods/metrics-server-6f4c6675d5-lvm76
timed out waiting for the condition on pods/blackbox-exporter-d989f64d9-k2tp9
timed out waiting for the condition on pods/grafana-6b4fbf6649-xp485
timed out waiting for the condition on pods/kube-state-metrics-76ddfbb447-9cscp
timed out waiting for the condition on pods/node-exporter-ldr2n
Warning: resource namespaces/he-codeco-netma is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
namespace/he-codeco-netma configured
Network topology configuration has been generated at ./configs/switchConfig.json.
l2sm-switch-4lvdv
Copying config file to l2sm-switch-4lvdv...
Defaulted container "l2sm-switch" out of: l2sm-switch, wait-for-l2sm-operator (init)
error: Internal error occurred: unable to upgrade connection: container not found ("l2sm-switch")
Executing configuration script on l2sm-switch-4lvdv...
Defaulted container "l2sm-switch" out of: l2sm-switch, wait-for-l2sm-operator (init)
error: Internal error occurred: unable to upgrade connection: container not found ("l2sm-switch")
Configuration deployment completed.
Applying non-deployment resources...
serviceaccount/lpm-network-sa created
configmap/bilbo-config created
configmap/prometheus-config-lpm-network created
clusterrole.rbac.authorization.k8s.io/lpm-network-topology-manager created
clusterrolebinding.rbac.authorization.k8s.io/lpm-network-topology-manager-binding created
service/bilbo-lpm created
service/prometheus-lpm-network created
l2network.l2sm.l2sm.k8s.local/lpm-network created
Applying deployments individually with a 5-second delay...
Applying deployments_01.yaml...
deployment.apps/bilbo-lpm created
Applying deployments_02.yaml...
deployment.apps/prometheus-lpm-network created
Deployment process completed.
namespace/he-codeco-netma unchanged
pod/bilbo-lpm-7b7dbd59dd-gwqth condition met
timed out waiting for the condition on pods/acm-operator-controller-manager-b6d575d9c-8kgm8
timed out waiting for the condition on pods/l2sm-controller-6677d5c67c-w5mq9
timed out waiting for the condition on pods/l2sm-controller-manager-5865f9659f-9ldk8
timed out waiting for the condition on pods/l2sm-operator-5dc54875f4-69xn5
timed out waiting for the condition on pods/l2sm-switch-4lvdv
timed out waiting for the condition on pods/coredns-697968c856-wvl8p
timed out waiting for the condition on pods/local-path-provisioner-774c6665dc-zcxl5
timed out waiting for the condition on pods/metrics-server-6f4c6675d5-lvm76
timed out waiting for the condition on pods/blackbox-exporter-d989f64d9-k2tp9
timed out waiting for the condition on pods/grafana-6b4fbf6649-xp485
timed out waiting for the condition on pods/kube-state-metrics-76ddfbb447-9cscp
timed out waiting for the condition on pods/node-exporter-ldr2n
timed out waiting for the condition on pods/prometheus-adapter-599c88b6c4-xc4dq
timed out waiting for the condition on pods/prometheus-adapter-599c88b6c4-zs9d6
customresourcedefinition.apiextensions.k8s.io/netma-topologies.codeco.com created
deployment.apps/netperf-pod created
service/netperf-server created
daemonset.apps/netperf-pod created
daemonset.apps/netperf-host created
deployment.apps/nemesys created
serviceaccount/nemesys-sa created
role.rbac.authorization.k8s.io/nemesys-netma-manager created
rolebinding.rbac.authorization.k8s.io/nemesys-netma-manager-binding created
role.rbac.authorization.k8s.io/nemesys-monitoring-role created
rolebinding.rbac.authorization.k8s.io/nemesys-monitoring-role-binding created
timed out waiting for the condition on pods/l2sm-controller-6677d5c67c-w5mq9
timed out waiting for the condition on pods/l2sm-controller-manager-5865f9659f-9ldk8
timed out waiting for the condition on pods/l2sm-operator-5dc54875f4-69xn5
timed out waiting for the condition on pods/l2sm-switch-4lvdv
timed out waiting for the condition on pods/nemesys-7b7588dccf-dh27x
........................................Finished installing NetMA...............................................
.....................Installing MDM.....................................
namespace/he-codeco-mdm created
"bitnami" already exists with the same configuration, skipping
"neo4j" already exists with the same configuration, skipping
NAME: mdm-zookeeper
LAST DEPLOYED: Thu May 22 11:29:21 2025
NAMESPACE: he-codeco-mdm
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
CHART NAME: zookeeper
CHART VERSION: 13.8.2
APP VERSION: 3.9.3
Did you know there are enterprise versions of the Bitnami catalog? For enhanced secure software supply chain features, unlimited pulls from Docker, LTS support, or application customization, see Bitnami Premium or Tanzu Application Catalog. See https://www.arrow.com/globalecs/na/vendors/bitnami for more information.
** Please be patient while the chart is being deployed **
ZooKeeper can be accessed via port 2181 on the following DNS name from within your cluster:
mdm-zookeeper.he-codeco-mdm.svc.cluster.local
To connect to your ZooKeeper server run the following commands:
export POD_NAME=$(kubectl get pods --namespace he-codeco-mdm -l "app.kubernetes.io/name=zookeeper,app.kubernetes.io/instance=mdm-zookeeper,app.kubernetes.io/component=zookeeper" -o jsonpath="{.items[0].metadata.name}")
kubectl exec -it $POD_NAME -- zkCli.sh
To connect to your ZooKeeper server from outside the cluster execute the following commands:
kubectl port-forward --namespace he-codeco-mdm svc/mdm-zookeeper 2181:2181 &
zkCli.sh 127.0.0.1:2181
WARNING: There are "resources" sections in the chart not set. Using "resourcesPreset" is not recommended for production. For production installations, please set the following values according to your workload needs:
- resources
- tls.resources
+info https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
E0522 11:31:23.906036 2669976 reflector.go:166] "Unhandled Error" err="k8s.io/client-go@v0.32.2/tools/cache/reflector.go:251: Failed to watch *unstructured.Unstructured: the server is currently unable to handle the request (get jobs.batch)" logger="UnhandledError"
W0522 11:33:03.165571 2669976 reflector.go:492] k8s.io/client-go@v0.32.2/tools/cache/reflector.go:251: watch of *unstructured.Unstructured ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
Error: INSTALLATION FAILED: failed post-install: 1 error occurred:
* timed out waiting for the condition
Error: INSTALLATION FAILED: Kubernetes cluster unreachable: Get "https://192.168.59.58:6443/version": net/http: TLS handshake timeout
Unable to connect to the server: net/http: TLS handshake timeout
Error: INSTALLATION FAILED: Kubernetes cluster unreachable: Get "https://192.168.59.58:6443/version": dial tcp 192.168.59.58:6443: connect: connection refused
Error: INSTALLATION FAILED: Kubernetes cluster unreachable: Get "https://192.168.59.58:6443/version": dial tcp 192.168.59.58:6443: connect: connection refused
Error: INSTALLATION FAILED: Kubernetes cluster unreachable: Get "https://192.168.59.58:6443/version": dial tcp 192.168.59.58:6443: connect: connection refused
Error: INSTALLATION FAILED: Kubernetes cluster unreachable: Get "https://192.168.59.58:6443/version": dial tcp 192.168.59.58:6443: connect: connection refused
Error: INSTALLATION FAILED: Kubernetes cluster unreachable: Get "https://192.168.59.58:6443/version": dial tcp 192.168.59.58:6443: connect: connection refused
The connection to the server 192.168.59.58:6443 was refused - did you specify the right host or port?
........................................Finished installing MDM...............................................
.....................Installing PDLC.....................................
Selected WORKER_NODE_1:
The connection to the server 192.168.59.58:6443 was refused - did you specify the right host or port?
error: error validating "./volume/storage-class.yaml": error validating data: failed to download openapi: Get "https://192.168.59.58:6443/openapi/v2?timeout=32s": dial tcp 192.168.59.58:6443: connect: connection refused; if you choose to ignore these errors, turn validation off with --validate=false
error: error validating "./volume/shared-pvc-pdlc.yaml": error validating data: failed to download openapi: Get "https://192.168.59.58:6443/openapi/v2?timeout=32s": dial tcp 192.168.59.58:6443: connect: connection refused; if you choose to ignore these errors, turn validation off with --validate=false
error: error validating "./volume/shared-pv-pdlc.yaml": error validating data: failed to download openapi: Get "https://192.168.59.58:6443/openapi/v2?timeout=32s": dial tcp 192.168.59.58:6443: connect: connection refused; if you choose to ignore these errors, turn validation off with --validate=false
error: error validating "./data_preprocessing/pdlc-dp-role.yaml": error validating data: failed to download openapi: Get "https://192.168.59.58:6443/openapi/v2?timeout=32s": dial tcp 192.168.59.58:6443: connect: connection refused; if you choose to ignore these errors, turn validation off with --validate=false
error: error validating "./data_preprocessing/pdlc-dp-role-binding.yaml": error validating data: failed to download openapi: Get "https://192.168.59.58:6443/openapi/v2?timeout=32s": dial tcp 192.168.59.58:6443: connect: connection refused; if you choose to ignore these errors, turn validation off with --validate=false
error: error validating "./data_preprocessing/pdlc-dp-deployment.yaml": error validating data: failed to download openapi: Get "https://192.168.59.58:6443/openapi/v2?timeout=32s": dial tcp 192.168.59.58:6443: connect: connection refused; if you choose to ignore these errors, turn validation off with --validate=false
The connection to the server 192.168.59.58:6443 was refused - did you specify the right host or port?
error: error validating "./context_awareness/pdlc-ca-deployment.yaml": error validating data: failed to download openapi: Get "https://192.168.59.58:6443/openapi/v2?timeout=32s": dial tcp 192.168.59.58:6443: connect: connection refused; if you choose to ignore these errors, turn validation off with --validate=false
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods)
error: error validating "./gnn_model/gnn-role.yaml": error validating data: failed to download openapi: the server is currently unable to handle the request; if you choose to ignore these errors, turn validation off with --validate=false
error: error validating "./gnn_model/gnn-role-binding.yaml": error validating data: failed to download openapi: the server is currently unable to handle the request; if you choose to ignore these errors, turn validation off with --validate=false
error: error validating "./gnn_model/gnn_inference.yaml": error validating data: failed to download openapi: the server is currently unable to handle the request; if you choose to ignore these errors, turn validation off with --validate=false
error: error validating "./gnn_model/gnn_controller.yaml": error validating data: failed to download openapi: the server is currently unable to handle the request; if you choose to ignore these errors, turn validation off with --validate=false
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods)
error: error validating "./rl_model/rl-role.yaml": error validating data: failed to download openapi: the server is currently unable to handle the request; if you choose to ignore these errors, turn validation off with --validate=false
error: error validating "./rl_model/rl-role-binding.yaml": error validating data: failed to download openapi: the server is currently unable to handle the request; if you choose to ignore these errors, turn validation off with --validate=false
error: error validating "./rl_model/rl-model-deployment.yaml": error validating data: failed to download openapi: the server is currently unable to handle the request; if you choose to ignore these errors, turn validation off with --validate=false
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods)
........................................Finished installing PDLC...............................................
.....................Installing SWM.....................................
Error: INSTALLATION FAILED: Kubernetes cluster unreachable: the server is currently unable to handle the request
......................................Finished installing SWM..................................
.....................Installing Kepler.....................................
make[1]: Entering directory '/home/pi/kepler'
./hack/tools.sh kustomize
✅ kustomize v4.5.2 is already installed
./hack/build-manifest.sh "PROMETHEUS_DEPLOY"
🔆🔆🔆 Ensuring all tools are installed 🔆🔆🔆
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔔 installing all tools ...
✅ bpf2go version v0.15.0 was installed successfully
🔔 installing golang.org/x/vuln/cmd/govulncheck version: latest
✅ golang.org/x/vuln/cmd/govulncheck - latest was installed successfully
✅ jq installed successfully
✅ kubectl installed successfully
✅ kustomize v4.5.2 is already installed
✅ yq installed successfully
🔔 Setting PROMETHEUS_DEPLOY as True
🔔 move to untrack workspace _output/generated-manifest
❯ rm -rf _output/generated-manifest
❯ mkdir -p _output/generated-manifest
❯ cp -r manifests/k8s/config/base manifests/k8s/config/dashboard manifests/k8s/config/exporter manifests/k8s/config/model-server manifests/k8s/config/rbac _output/generated-manifest/
🔔 Building manifests ...
🔆🔆🔆 Baremetal Deployment 🔆🔆🔆
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🙈 SKIP: skipping baremetal deployment
🔆🔆🔆 CI Deployment 🔆🔆🔆
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🙈 SKIP: skipping ci deployment
🔆🔆🔆 DCGM Deployment 🔆🔆🔆
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🙈 SKIP: skipping dcgm deployment
🔆🔆🔆 Debug Deployment 🔆🔆🔆
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🙈 SKIP: skipping debug deployment
🔆🔆🔆 Estimator Sidecar Deployment 🔆🔆🔆
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🙈 SKIP: skipping estimator with sidecar deployment
🔆🔆🔆 Habana Deployment 🔆🔆🔆
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🙈 SKIP: skipping habana deployment
🔆🔆🔆 Machine Spec Deployment 🔆🔆🔆
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🙈 SKIP: skipping machine spec deployment
🔆🔆🔆 Model Server Deployment 🔆🔆🔆
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🙈 SKIP: skipping model server deployment
🔆🔆🔆 OpenShift Deployment 🔆🔆🔆
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🙈 SKIP: skipping openshift deployment
🔆🔆🔆 Prometheus Deployment 🔆🔆🔆
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔔 deploying prometheus
✅ Prometheus deployment configured
🙈 SKIP: skipping prometheus deployment with high granularity
🔆🔆🔆 QAT Deployment 🔆🔆🔆
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🙈 SKIP: skipping qat deployment
🔆🔆🔆 Rootless Deployment 🔆🔆🔆
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🙈 SKIP: skipping rootless deployment
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
~/kepler/_output/generated-manifest/exporter ~/kepler
❯ kustomize edit set image kepler=quay.io/sustainable_computing_io/kepler:release-0.7.8
❯ kustomize edit set image kepler_model_server=quay.io/sustainable_computing_io/kepler_model_server:latest
~/kepler
~/kepler/_output/generated-manifest/model-server ~/kepler
❯ kustomize edit set image kepler_model_server=quay.io/sustainable_computing_io/kepler_model_server:latest
~/kepler
🔔 kustomize manifests...
✅ Manifests build successfully.
🔔 run kubectl create -f _output/generated-manifest/deployment.yaml to deploy
make[1]: Leaving directory '/home/pi/kepler'
namespace/kepler created
serviceaccount/kepler-sa created
role.rbac.authorization.k8s.io/prometheus-k8s created
clusterrole.rbac.authorization.k8s.io/kepler-clusterrole created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/kepler-clusterrole-binding created
configmap/kepler-cfm created
secret/redfish-4kh9d7bc7m created
service/kepler-exporter created
daemonset.apps/kepler-exporter created
prometheusrule.monitoring.coreos.com/kepler-common-rules created
servicemonitor.monitoring.coreos.com/kepler-exporter created
......................................Finished installing Kepler..................
All of this installation ends up with the following pods deployed:
k get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
default bilbo-lpm-7b7dbd59dd-gwqth 1/1 Running 0 109m
default netperf-host-j6shh 1/1 Running 0 89m
default netperf-pod-8f78745df-v5bhn 1/1 Running 0 89m
default netperf-pod-8f78745df-vw5f7 1/1 Running 0 89m
default netperf-pod-rv8lw 1/1 Running 0 89m
he-codeco-acm acm-operator-controller-manager-b6d575d9c-8kgm8 1/2 CrashLoopBackOff 46 (2m5s ago) 130m
he-codeco-mdm mdm-kafka-0 0/1 Pending 0 68m
he-codeco-mdm mdm-zookeeper-0 0/1 Pending 0 68m
he-codeco-netma l2sm-controller-6677d5c67c-w5mq9 0/1 Pending 0 124m
he-codeco-netma l2sm-controller-manager-5865f9659f-9ldk8 0/2 Pending 0 124m
he-codeco-netma l2sm-operator-5dc54875f4-69xn5 0/1 Pending 0 124m
he-codeco-netma l2sm-switch-4lvdv 0/1 Init:0/1 0 124m
he-codeco-netma nemesys-7b7588dccf-dh27x 0/1 Pending 0 88m
kepler kepler-exporter-72pfr 1/1 Running 0 62m
kube-system coredns-697968c856-wvl8p 1/1 Running 0 132m
kube-system local-path-provisioner-774c6665dc-zcxl5 1/1 Running 0 132m
kube-system metrics-server-6f4c6675d5-lvm76 1/1 Running 0 132m
monitoring blackbox-exporter-d989f64d9-k2tp9 3/3 Running 0 130m
monitoring grafana-6b4fbf6649-xp485 1/1 Running 0 130m
monitoring kube-state-metrics-76ddfbb447-9cscp 3/3 Running 0 130m
monitoring node-exporter-ldr2n 2/2 Running 0 130m
monitoring prometheus-adapter-599c88b6c4-xc4dq 1/1 Running 0 124m
monitoring prometheus-adapter-599c88b6c4-zs9d6 1/1 Running 0 124m
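Since the API server at 192.168.59.58:6443 repeatedly refuses connections and times out on TLS handshakes during the installation, the following commands can be used to collect more detail on the crashing operator and on the k3s server itself. This is a sketch only: the `manager` container name is an assumption based on the typical two-container operator layout behind the 1/2 READY count, and it assumes k3s runs as the systemd unit `k3s`.

```bash
# Logs from the current and the previously crashed operator container
kubectl -n he-codeco-acm logs acm-operator-controller-manager-b6d575d9c-8kgm8 -c manager
kubectl -n he-codeco-acm logs acm-operator-controller-manager-b6d575d9c-8kgm8 -c manager --previous

# Exit code, restart reason (e.g. OOMKilled) and recent events for the pod
kubectl -n he-codeco-acm describe pod acm-operator-controller-manager-b6d575d9c-8kgm8

# k3s server logs and memory pressure on the single node, since the
# API server itself keeps becoming unreachable during the deployment
sudo journalctl -u k3s --since "2 hours ago" | tail -n 200
free -h
kubectl top nodes
```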