
Scheduling stuck in a loop when deploying LPM on a NotReady node

During deployment of LPM, the YAML currently targets every node without any scheduling constraints.

This causes problems on nodes that are tainted or NotReady.

One of our nodes looks like this:

$ kubectl get nodes
NAME                STATUS     ROLES           AGE    VERSION
stonevk             NotReady   agent           8m8s   v1.31.4-vk-N/A

$ kubectl describe node stonevk
Taints:             kubernetes.io/arch=Armv7:NoSchedule
                    kubernetes.io/os=Bluenet:NoSchedule
                    node.kubernetes.io/not-ready:NoSchedule
                    virtual-kubelet.io/provider=sphere:NoSchedule
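
The LPM pod spec carries no tolerations for these taints and no node affinity or nodeSelector that would keep it away from such a node. A minimal sketch of the kind of constraint that could exclude it (the label key/value type=virtual-kubelet is an assumption — virtual-kubelet providers commonly set it, but check the labels actually present on stonevk):

spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  # assumed label; verify with: kubectl get node stonevk --show-labels
                  - key: type
                    operator: NotIn
                    values:
                      - virtual-kubelet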

As things stand, this results in an endless loop where a pod is created and immediately marked for deletion:

$ kubectl get events 
   SuccessfulCreate          replicaset/stonevk-lpm-67b967b94              Created pod: stonevk-lpm-67b967b94-mbndm
   TaintManagerEviction      pod/stonevk-lpm-67b967b94-mbndm               Marking for deletion Pod default/stonevk-lpm-67b967b94-mbndm
   SuccessfulCreate          replicaset/stonevk-lpm-67b967b94              Created pod: stonevk-lpm-67b967b94-h6q92
   TaintManagerEviction      pod/stonevk-lpm-67b967b94-h6q92               Marking for deletion Pod default/stonevk-lpm-67b967b94-h6q92
   SuccessfulCreate          replicaset/stonevk-lpm-67b967b94              Created pod: stonevk-lpm-67b967b94-4bvz7
   TaintManagerEviction      pod/stonevk-lpm-67b967b94-4bvz7               Marking for deletion Pod default/stonevk-lpm-67b967b94-4bvz7
   ...

After some time (at around 200 pods stuck in the Terminating phase) pod creation starts to get throttled, but the loop never stops.
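
As a stopgap to stop the churn while the manifest is being fixed, the Deployment can be scaled to zero. The name stonevk-lpm is inferred from the ReplicaSet name in the events above, and the label selector below is a placeholder — substitute the real ones:

$ kubectl scale deployment stonevk-lpm --replicas=0
$ kubectl delete pods -l app=stonevk-lpm --force --grace-period=0    # clears the backlog of Terminating pods; label is a placeholder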

cc: @karamolegkos