Ingress Traffic from CODECO monitoring components to Prometheus: network policies need revision (ACM)
Hello,
We (at FOR) are deploying CODECO on our local bare-metal K3s cluster and hit an issue with the freshness connector pod of the MDM component: its connection to the Prometheus service was blocked. The default kube-prometheus stack does not allow ingress traffic (HTTP requests) to its service (prometheus-k8s) from arbitrary pods unless access is explicitly granted in its network policy manifest (prometheus-networkPolicy.yaml), presumably for security reasons. (The issue is not reproducible in a KinD environment, which has no network policies or isolation.)
Logs of the freshness connector pod, showing the error:
kubectl logs freshness-connector-check-connection-68fdc47456-p7ch7 -n he-codeco-mdm
{'CAcerts': [], 'crawler': {'k8s': {'api_url': 'https://kubernetes.default.svc'}, 'prometheus': {'entity': 'Freshness_state', 'metric': 'CODECO_freshness', 'server': {'port': 9090, 'url': 'http://prometheus-k8s.monitoring.svc.cluster.local'}}, 'schedule': 300}, 'cronSchedule': '{{ cat (untilStep (mod (randNumeric 3 | atoi) (.Values.frequency | int) | int) 59 (.Values.frequency | int) | join ",") "* * * *" }}', 'frequency': 5, 'image': {'name': 'mdm-connector-prometheus', 'pullPolicy': 'Always', 'repository': 'hecodeco'}, 'k8sauth': {'enabled': False}, 'oidc': {'authServerUrl': 'https://local.oidc.server', 'clientId': 'pf-connector', 'clientSecret': 'xxxxxxxxxxxxxxx', 'enabled': False}, 'pathfinder': {'connector': {'id': 'prometheus-connector-default', 'state': {'access-key': 'XXX', 'bucket-name': 'pathfinder-test-cos-connector-status', 'secret-key': 'YYY', 'service-endpoint': 's3.us-south.cloud-object-storage.appdomain.cloud', 'signing-region': 'us-south', 'type': 'local'}}, 'kubernetesUrl': 'https://kubernetes.default.svc', 'url': 'http://mdm-api.he-codeco-mdm:8090/mdm/api/v1'}, 'podAnnotations': {}, 'podSecurityContext': {}, 'resources': {}, 'securityContext': {}, 'serviceAccount': {'annotations': {}, 'automount': True, 'create': True, 'name': ''}, 'stopMode': 'stop', 'suspended': False}
{'CAcerts': [], 'crawler': {'k8s': {'api_url': 'https://kubernetes.default.svc'}, 'prometheus': {'entity': 'Freshness_state', 'metric': 'CODECO_freshness', 'server': {'port': 9090, 'url': 'http://prometheus-k8s.monitoring.svc.cluster.local'}}, 'schedule': 300}, 'cronSchedule': '{{ cat (untilStep (mod (randNumeric 3 | atoi) (.Values.frequency | int) | int) 59 (.Values.frequency | int) | join ",") "* * * *" }}', 'frequency': 5, 'image': {'name': 'mdm-connector-prometheus', 'pullPolicy': 'Always', 'repository': 'hecodeco'}, 'k8sauth': {'enabled': False}, 'oidc': {'authServerUrl': 'https://local.oidc.server', 'clientId': 'pf-connector', 'clientSecret': 'xxxxxxxxxxxxxxx', 'enabled': False}, 'pathfinder': {'connector': {'id': 'prometheus-connector-default', 'state': {'access-key': 'XXX', 'bucket-name': 'pathfinder-test-cos-connector-status', 'secret-key': 'YYY', 'service-endpoint': 's3.us-south.cloud-object-storage.appdomain.cloud', 'signing-region': 'us-south', 'type': 'local'}}, 'kubernetesUrl': 'https://kubernetes.default.svc', 'url': 'http://mdm-api.he-codeco-mdm:8090/mdm/api/v1'}, 'podAnnotations': {}, 'podSecurityContext': {}, 'resources': {}, 'securityContext': {}, 'serviceAccount': {'annotations': {}, 'automount': True, 'create': True, 'name': ''}, 'stopMode': 'stop', 'suspended': False}
2024-12-04:17:54:46,536 INFO [eventpublisher.py:114] Load last connector state from json file
2024-12-04:17:54:46,537 INFO [eventpublisher.py:185] Pf-model-registry url: http://mdm-api.he-codeco-mdm:8090/mdm/api/v1
loop
2024-12-04:17:54:46,539 DEBUG [connectionpool.py:243] Starting new HTTP connection (1): prometheus-k8s.monitoring.svc.cluster.local:9090
Traceback (most recent call last):
File "/opt/app-root/lib64/python3.11/site-packages/urllib3/connection.py", line 199, in _new_conn
sock = connection.create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
raise err
File "/opt/app-root/lib64/python3.11/site-packages/urllib3/util/connection.py", line 73, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/app-root/lib64/python3.11/site-packages/urllib3/connectionpool.py", line 789, in urlopen
response = self._make_request(
^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/urllib3/connectionpool.py", line 495, in _make_request
conn.request(
File "/opt/app-root/lib64/python3.11/site-packages/urllib3/connection.py", line 441, in request
self.endheaders()
File "/usr/lib64/python3.11/http/client.py", line 1298, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib64/python3.11/http/client.py", line 1058, in _send_output
self.send(msg)
File "/usr/lib64/python3.11/http/client.py", line 996, in send
self.connect()
File "/opt/app-root/lib64/python3.11/site-packages/urllib3/connection.py", line 279, in connect
self.sock = self._new_conn()
^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/urllib3/connection.py", line 214, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x76503feb1950>: Failed to establish a new connection: [Errno 111] Connection refused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/app-root/lib64/python3.11/site-packages/requests/adapters.py", line 667, in send
resp = conn.urlopen(
^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/urllib3/connectionpool.py", line 843, in urlopen
retries = retries.increment(
^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/urllib3/util/retry.py", line 519, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='prometheus-k8s.monitoring.svc.cluster.local', port=9090): Max retries exceeded with url: /api/v1/query?query=CODECO_freshness%7Bkubernetes_pod_name%21%3D%27%27%7D (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x76503feb1950>: Failed to establish a new connection: [Errno 111] Connection refused'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/app-root/src/python/connector.py", line 99, in <module>
main()
File "/opt/app-root/src/python/connector.py", line 62, in main
response =requests.get(config.PROMETHEUS_URL +
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/requests/api.py", line 73, in get
return request("get", url, params=params, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/requests/adapters.py", line 700, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='prometheus-k8s.monitoring.svc.cluster.local', port=9090): Max retries exceeded with url: /api/v1/query?query=CODECO_freshness%7Bkubernetes_pod_name%21%3D%27%27%7D (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x76503feb1950>: Failed to establish a new connection: [Errno 111] Connection refused'))
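For reference, the connector's failing call boils down to a plain HTTP GET against the Prometheus query API. Below is a minimal sketch (not the actual connector code; the URL and metric name are taken from the config dump above, the function name is ours) that performs the same query and surfaces the Errno 111 failure with a clearer hint:

```python
# Minimal sketch of the query the freshness connector performs.
# Hypothetical helper, not part of the CODECO codebase; uses only the
# standard library so it has no dependency on requests/urllib3.
import urllib.error
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://prometheus-k8s.monitoring.svc.cluster.local:9090"


def query_freshness(base_url: str, metric: str = "CODECO_freshness",
                    timeout: float = 5.0):
    """Query Prometheus; return the response body, or None if unreachable."""
    # Same PromQL selector as in the traceback above.
    query = urllib.parse.urlencode({"query": f"{metric}{{kubernetes_pod_name!=''}}"})
    url = f"{base_url}/api/v1/query?{query}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode()
    except (urllib.error.URLError, OSError) as exc:
        # [Errno 111] Connection refused from inside the cluster often means
        # the traffic is rejected by a NetworkPolicy rather than that
        # Prometheus itself is down.
        print(f"Cannot reach Prometheus at {base_url}: {exc}")
        return None
```

Running this from a pod in he-codeco-mdm before and after the network policy change is a quick way to confirm the fix.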
If this is not how the pod should behave and it should have been able to connect to the Prometheus service, please let us know. Otherwise, here is how we fixed the issue manually:
A possible solution that worked for us is to label the he-codeco-mdm namespace, then use that label together with the freshness connector deployment's pod label in a new network policy rule that allows access (https://kubernetes.io/docs/concepts/services-networking/network-policies/#allow-all-ingress-traffic):
- kubectl label namespace he-codeco-mdm namespace=he-codeco-mdm
- edit kube-prometheus/manifests/prometheus-networkPolicy.yaml:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 3.0.0
  name: prometheus-k8s
  namespace: monitoring
spec:
  egress:
  - {}
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/name: prometheus
    ports:
    - port: 9090
      protocol: TCP
    - port: 8080
      protocol: TCP
  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/name: prometheus-adapter
    ports:
    - port: 9090
      protocol: TCP
  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/name: grafana
    ports:
    - port: 9090
      protocol: TCP
  - from:                                                  ##Changed
    - podSelector:                                         ##Changed
        matchLabels:                                       ##Changed
          app.kubernetes.io/instance: freshness-connector  ##Changed
      namespaceSelector:                                   ##Changed
        matchExpressions:                                  ##Changed
        - key: namespace                                   ##Changed
          operator: In                                     ##Changed
          values: ["he-codeco-mdm"]                        ##Changed
    ports:
    - port: 9090
      protocol: TCP
    - port: 8080
      protocol: TCP
  podSelector:
    matchLabels:
      app.kubernetes.io/component: prometheus
      app.kubernetes.io/instance: k8s
      app.kubernetes.io/name: prometheus
      app.kubernetes.io/part-of: kube-prometheus
  policyTypes:
  - Egress
  - Ingress
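An alternative that avoids editing the upstream kube-prometheus manifest: since NetworkPolicies are additive (a pod's allowed traffic is the union of all policies that select it), a separate policy in the monitoring namespace can grant the same access. A sketch (the policy name is ours; it relies on the kubernetes.io/metadata.name label, which recent Kubernetes versions set on namespaces automatically, so the manual namespace-labeling step would not be needed):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-codeco-mdm-to-prometheus  # hypothetical name
  namespace: monitoring
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: prometheus
  policyTypes:
  - Ingress
  ingress:
  - from:
    # Both selectors in one 'from' element: pods with this label
    # AND in the he-codeco-mdm namespace.
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: he-codeco-mdm
      podSelector:
        matchLabels:
          app.kubernetes.io/instance: freshness-connector
    ports:
    - port: 9090
      protocol: TCP
```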
- after deleting the freshness-connector-check-connection pod and redeploying, the pod runs, with the following logs:
kubectl logs freshness-connector-check-connection-68fdc47456-m6bql -n he-codeco-mdm
{'CAcerts': [], 'crawler': {'k8s': {'api_url': 'https://kubernetes.default.svc'}, 'prometheus': {'entity': 'Freshness_state', 'metric': 'CODECO_freshness', 'server': {'port': 9090, 'url': 'http://prometheus-k8s.monitoring.svc.cluster.local'}}, 'schedule': 300}, 'cronSchedule': '{{ cat (untilStep (mod (randNumeric 3 | atoi) (.Values.frequency | int) | int) 59 (.Values.frequency | int) | join ",") "* * * *" }}', 'frequency': 5, 'image': {'name': 'mdm-connector-prometheus', 'pullPolicy': 'Always', 'repository': 'hecodeco'}, 'k8sauth': {'enabled': False}, 'oidc': {'authServerUrl': 'https://local.oidc.server', 'clientId': 'pf-connector', 'clientSecret': 'xxxxxxxxxxxxxxx', 'enabled': False}, 'pathfinder': {'connector': {'id': 'prometheus-connector-default', 'state': {'access-key': 'XXX', 'bucket-name': 'pathfinder-test-cos-connector-status', 'secret-key': 'YYY', 'service-endpoint': 's3.us-south.cloud-object-storage.appdomain.cloud', 'signing-region': 'us-south', 'type': 'local'}}, 'kubernetesUrl': 'https://kubernetes.default.svc', 'url': 'http://mdm-api.he-codeco-mdm:8090/mdm/api/v1'}, 'podAnnotations': {}, 'podSecurityContext': {}, 'resources': {}, 'securityContext': {}, 'serviceAccount': {'annotations': {}, 'automount': True, 'create': True, 'name': ''}, 'stopMode': 'stop', 'suspended': False}
{'CAcerts': [], 'crawler': {'k8s': {'api_url': 'https://kubernetes.default.svc'}, 'prometheus': {'entity': 'Freshness_state', 'metric': 'CODECO_freshness', 'server': {'port': 9090, 'url': 'http://prometheus-k8s.monitoring.svc.cluster.local'}}, 'schedule': 300}, 'cronSchedule': '{{ cat (untilStep (mod (randNumeric 3 | atoi) (.Values.frequency | int) | int) 59 (.Values.frequency | int) | join ",") "* * * *" }}', 'frequency': 5, 'image': {'name': 'mdm-connector-prometheus', 'pullPolicy': 'Always', 'repository': 'hecodeco'}, 'k8sauth': {'enabled': False}, 'oidc': {'authServerUrl': 'https://local.oidc.server', 'clientId': 'pf-connector', 'clientSecret': 'xxxxxxxxxxxxxxx', 'enabled': False}, 'pathfinder': {'connector': {'id': 'prometheus-connector-default', 'state': {'access-key': 'XXX', 'bucket-name': 'pathfinder-test-cos-connector-status', 'secret-key': 'YYY', 'service-endpoint': 's3.us-south.cloud-object-storage.appdomain.cloud', 'signing-region': 'us-south', 'type': 'local'}}, 'kubernetesUrl': 'https://kubernetes.default.svc', 'url': 'http://mdm-api.he-codeco-mdm:8090/mdm/api/v1'}, 'podAnnotations': {}, 'podSecurityContext': {}, 'resources': {}, 'securityContext': {}, 'serviceAccount': {'annotations': {}, 'automount': True, 'create': True, 'name': ''}, 'stopMode': 'stop', 'suspended': False}
2024-12-04:18:11:11,697 INFO [eventpublisher.py:114] Load last connector state from json file
2024-12-04:18:11:11,697 INFO [eventpublisher.py:185] Pf-model-registry url: http://mdm-api.he-codeco-mdm:8090/mdm/api/v1
loop
2024-12-04:18:11:11,700 DEBUG [connectionpool.py:243] Starting new HTTP connection (1): prometheus-k8s.monitoring.svc.cluster.local:9090
2024-12-04:18:11:12,726 DEBUG [connectionpool.py:546] http://prometheus-k8s.monitoring.svc.cluster.local:9090 "GET /api/v1/query?query=CODECO_freshness%7Bkubernetes_pod_name%21%3D%27%27%7D HTTP/11" 200 93
We hope this can be fixed in the main deployment script so that other deployments do not hit this error.