Support for rescheduling of pods after changes to the SWM Application CR
When pods are added to or removed from the SWM Application CR, or the `nodeRecommendations` are changed, while the SWM Application is in the `Running` state, the SWM QoS Scheduler does not adapt to these changes, nor are they reflected in the infrastructure.
The CODECO Deliverable D9 v1.0 from 28.06.2023 also lists the following items under main functionality:
- Dynamic placement of application workloads (re-scheduling). SWM makes a new decision on placement of workloads whenever conditions change (e.g., new applications/components added, applications/components removed or changed, infrastructure conditions changed, change of QoS requirements by the user).
- [...]
- Application workload migration. SWM manages moving a workload already running on a compute node to another compute node. This will be implemented in different qualities and steps:
  - Migration of stateless workloads.
  - Migration of stateful workloads (with service disruption).
  - Zero-downtime migration of stateful workloads: with no (or only a small, defined) service disruption.
However, this does not appear to be implemented at present: if a SWM Application is in the `Running` state, the `Reconcile` method calls the `processRunningState` function in `scheduler/controllers/application_controller.go`. That function, however, ignores pods that were removed from the Application, ignores changes to the `nodeRecommendations`, and ignores every other change to the SWM Application CR, with the exception of missing pods and changes to the channels.
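To make the gap concrete, the following sketch shows the kind of drift check one would expect at the start of the Running-state handling. This is an illustration only, written for this report: the function name, the pod-name slice, and the recommendation/plan maps are assumptions and do not correspond to the actual SWM types.

```go
package controllers

import "reflect"

// needsRescheduling reports whether the Running-state reconcile should re-run
// the Workload Placement Solver: a pod was added to or removed from the CR,
// or the nodeRecommendations differ from those the last plan was based on.
// All parameter names/shapes are hypothetical, for illustration only.
func needsRescheduling(
	specPods []string, // pod names currently requested in the Application CR
	currentRecs map[string]string, // current nodeRecommendations (pod -> node)
	lastRecs map[string]string, // nodeRecommendations used for the last plan
	lastPlan map[string]string, // last assignment plan (pod -> node)
) bool {
	// A pod was added or removed relative to the last assignment plan.
	if len(specPods) != len(lastPlan) {
		return true
	}
	for _, p := range specPods {
		if _, planned := lastPlan[p]; !planned {
			return true
		}
	}
	// The nodeRecommendations changed since the last solver run.
	return !reflect.DeepEqual(currentRecs, lastRecs)
}
```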
We would have expected that, if the `nodeRecommendations` change or a pod is added or removed, SWM would invoke the Workload Placement Solver again to construct a new assignment plan, and would then migrate pods by recreating them on the nodes chosen by that plan.
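A minimal sketch of that expected flow is shown below, under the assumption that the solver and the pod operations can be called through small interfaces. `PlacementSolver`, `PodMover`, and `reschedule` are hypothetical names introduced here for illustration; they are not part of the SWM code base.

```go
package controllers

import "context"

// PlacementSolver stands in for the Workload Placement Solver call
// (hypothetical interface; the real solver API in SWM may differ).
type PlacementSolver interface {
	// Solve returns a new assignment plan: pod name -> target node.
	Solve(ctx context.Context, pods []string, recommendations map[string]string) (map[string]string, error)
}

// PodMover stands in for the Kubernetes operations a stateless migration
// needs: delete the pod on its old node, recreate it on the new one.
type PodMover interface {
	DeletePod(ctx context.Context, name, node string) error
	CreatePodOnNode(ctx context.Context, name, node string) error
}

// reschedule re-runs the solver and migrates every pod whose placement
// changed; pods dropped from the CR spec are deleted. lastPlan and the
// returned map are pod name -> node name.
func reschedule(ctx context.Context, solver PlacementSolver, mover PodMover,
	specPods []string, recommendations, lastPlan map[string]string) (map[string]string, error) {
	newPlan, err := solver.Solve(ctx, specPods, recommendations)
	if err != nil {
		return nil, err
	}
	for pod, newNode := range newPlan {
		oldNode, existed := lastPlan[pod]
		if existed && oldNode == newNode {
			continue // placement unchanged, leave the pod where it is
		}
		if existed {
			if err := mover.DeletePod(ctx, pod, oldNode); err != nil {
				return nil, err
			}
		}
		if err := mover.CreatePodOnNode(ctx, pod, newNode); err != nil {
			return nil, err
		}
	}
	// Pods removed from the CR spec no longer appear in the new plan: delete them.
	for pod, oldNode := range lastPlan {
		if _, stillWanted := newPlan[pod]; !stillWanted {
			if err := mover.DeletePod(ctx, pod, oldNode); err != nil {
				return nil, err
			}
		}
	}
	return newPlan, nil
}
```

Note that this sketch only covers stateless migration (delete and recreate); the stateful and zero-downtime variants listed in Deliverable D9 would require additional state hand-over steps.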
Without this critical functionality, the PDLC component is effectively useless, because changes to the `nodeRecommendations` do not trigger any rescheduling in SWM.