K8s Workload Simulator
A modular and extensible simulator for evaluating diverse pod scheduling strategies in Kubernetes-like environments. Supports customizable clusters, workloads, and schedulers — including rule-based and learning-based approaches.
Installation
Requires Python 3.10+. For external use, install the simulator as a regular Python package:
pip install git+https://gitlab.eclipse.org/eclipse-research-labs/hyper-ai-project/k8s-workload-simulator.git
Alternatively, for local development, run the following from the root of a cloned repository:
pip install -e .
For internal use, just clone the repository:
git clone https://gitlab.eclipse.org/eclipse-research-labs/hyper-ai-project/k8s-workload-simulator.git
The KWOK binary is optional and needed only for KWOK-based clusters. In that case:
- Install KWOK
- Ensure kubectl is configured
Standalone Simulation
To run a one-time simulation:
python3 scripts/simulation_controller.py configs/config.yaml
This uses the YAML configuration (configs/config.yaml) to:
- Deploy a synthetic cluster (KWOK or Python)
- Generate pod workloads based on task/pod distributions
- Apply the selected scheduler (e.g., ROUNDROBIN, DEFAULT, or DAROTRAIN)
- Save simulation traces and rewards (if enabled)
YAML Parameters for Simulation include:
🔹 Cluster Parameters
- cluster_type, cluster_reset
- cluster_nodes_cloud, cluster_nodes_edge, cluster_nodes_iot
- cluster_node_cloud_cpu_dist, cluster_node_cloud_mem_dist, cluster_node_cloud_max_pods
- cluster_node_edge_cpu_dist, cluster_node_edge_mem_dist, cluster_node_edge_max_pods
- cluster_node_iot_cpu_dist, cluster_node_iot_mem_dist, cluster_node_iot_max_pods
🔹 Workload Parameters
- workload_tasks
- workload_pods_number_dist, workload_pods_cpu_dist, workload_pods_mem_dist
- workload_pods_interarrival_dist, workload_pods_duration_dist, workload_pods_max_restarts
🔹 Scheduler Parameters
- scheduler_type
🔹 Simulation Settings
- simulation_speedup
- simulation_save_trace, simulation_save_basic_stats, simulation_save_detail_stats
🔹 Training Parameters
- training_episodes
- training_cloud_nodes_per_episode_min, training_cloud_nodes_per_episode_max
- training_edge_nodes_per_episode_min, training_edge_nodes_per_episode_max
- training_iot_nodes_per_episode_min, training_iot_nodes_per_episode_max
- training_tasks_per_episode_min, training_tasks_per_episode_max
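To make the flattened naming scheme concrete, here is a minimal Python sketch of a configuration held as a plain dictionary. The key names are taken from the parameter lists in this section; every value (including the spelling "kwok" and the types of cluster_reset and workload_tasks) is an assumed placeholder, not a project default. The group_by_prefix helper is hypothetical and only illustrates how the leading word of each key identifies its parameter group.

```python
# Illustrative only: key names follow the README's parameter lists,
# while all values are assumed placeholders rather than project defaults.
example_config = {
    "cluster_type": "kwok",  # assumed spelling of the KWOK option
    "cluster_reset": True,   # assumed to be a boolean toggle
    "cluster_nodes_cloud": 2,
    "cluster_nodes_edge": 4,
    "cluster_nodes_iot": 8,
    "cluster_node_cloud_cpu_dist": {"type": "fixed", "value": 8000},   # millicores
    "cluster_node_cloud_mem_dist": {"type": "fixed", "value": 16384},  # Mi
    "cluster_node_cloud_max_pods": 110,
    "workload_tasks": 10,    # assumed to be an integer count
    "workload_pods_number_dist": {"type": "uniform", "min": 2, "max": 8, "round": 0},
    "workload_pods_interarrival_dist": {"type": "poisson", "mean": 6, "min": 2, "max": 8, "round": 1},
    "scheduler_type": "ROUNDROBIN",
    "simulation_speedup": 10,
    "simulation_save_trace": True,
}


def group_by_prefix(config):
    """Group flattened keys by their leading word (cluster, workload, ...)."""
    groups = {}
    for key, value in config.items():
        prefix = key.split("_", 1)[0]
        groups.setdefault(prefix, {})[key] = value
    return groups


if __name__ == "__main__":
    for prefix, entries in group_by_prefix(example_config).items():
        print(prefix, sorted(entries))
```

Keeping every setting at the top level means a single lookup such as example_config["scheduler_type"] is enough to read any parameter, at the cost of longer key names.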
Multi-Episode Training
To launch multi-agent reinforcement learning (MARL) training with the DAROTRAIN scheduler:
python3 scripts/training_controller.py configs/config.yaml
The training process will:
- Randomize cluster size and workload per episode
- Schedule pods using the DAROTRAIN (QMIX) agent
- Train and update the agent using reward feedback
- Save model weights (qmix_latest.pth) and logs
Additional YAML Parameters for Training:
- All scheduler_daro_* hyperparameters (learning rate, gamma, etc.)
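The per-episode randomization of cluster size and workload can be pictured with the following Python sketch. It assumes that each training_*_per_episode_min/max pair is an inclusive integer range sampled uniformly; the README does not state the sampling rule, and sample_episode_shape is a hypothetical helper, not a function of the simulator.

```python
import random


def sample_episode_shape(config, rng):
    """Draw node and task counts for one episode from the min/max ranges.

    Assumption: each range is inclusive and sampled uniformly.
    """
    shape = {}
    for name in ("cloud_nodes", "edge_nodes", "iot_nodes", "tasks"):
        low = config[f"training_{name}_per_episode_min"]
        high = config[f"training_{name}_per_episode_max"]
        shape[name] = rng.randint(low, high)  # randint includes both endpoints
    return shape


# Placeholder ranges; the values are assumptions for illustration.
training_ranges = {
    "training_cloud_nodes_per_episode_min": 1,
    "training_cloud_nodes_per_episode_max": 3,
    "training_edge_nodes_per_episode_min": 2,
    "training_edge_nodes_per_episode_max": 6,
    "training_iot_nodes_per_episode_min": 4,
    "training_iot_nodes_per_episode_max": 10,
    "training_tasks_per_episode_min": 5,
    "training_tasks_per_episode_max": 20,
}

if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs draw the same episodes
    for episode in range(3):
        print(episode, sample_episode_shape(training_ranges, rng))
```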
Output Artifacts
File | Description |
---|---|
simulation_trace.csv | Trace with deployment and termination events |
simulation_basic_stats.csv | Deployment and termination events |
simulation_detail_stats.csv | Deployment and termination events |
reward_trace.csv | Reward values per pod and node (only for DAROTRAIN) |
qmix_latest.pth | Trained QMIX model (only for DAROTRAIN) |
Configurable Components
All settings are defined in a single flattened YAML file (configs/config.yaml):
- Cluster type, size, and node resource distributions
- Workload task structure and pod arrival/duration/resource distributions
- Scheduler type and parameters (including DAROTRAIN hyperparameters)
- Simulation toggles and speed
- Training episode counts and node/task ranges
Supported Schedulers
Scheduler | Description |
---|---|
DEFAULT | Native Kubernetes (KWOK) scheduler |
ROUNDROBIN | Simple round-robin node selection |
DAROTRAIN | Decentralized RL scheduler using QMIX |
MOSTAVAILABLE | Schedules on the node with the most available CPU and memory |
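The two rule-based strategies in the table can be sketched in a few lines of Python. This is an illustration of the general techniques, not the simulator's implementation: the Node class, the RoundRobinScheduler class, and the most_available function are hypothetical names, and the way CPU and memory are combined into one score (sum of free fractions) is an assumption, since the README does not define it.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_capacity: int  # millicores
    mem_capacity: int  # Mi
    cpu_used: int = 0
    mem_used: int = 0

    def fits(self, cpu, mem):
        """Return True if the pod's requests fit into the remaining capacity."""
        return (self.cpu_used + cpu <= self.cpu_capacity
                and self.mem_used + mem <= self.mem_capacity)

    def free_fraction(self):
        """Assumed score: free CPU fraction plus free memory fraction."""
        cpu_free = (self.cpu_capacity - self.cpu_used) / self.cpu_capacity
        mem_free = (self.mem_capacity - self.mem_used) / self.mem_capacity
        return cpu_free + mem_free


class RoundRobinScheduler:
    """Cycle through nodes in order, skipping nodes the pod does not fit on."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.next_index = 0

    def select(self, cpu, mem):
        for offset in range(len(self.nodes)):
            index = (self.next_index + offset) % len(self.nodes)
            if self.nodes[index].fits(cpu, mem):
                # Continue after the chosen node on the next call.
                self.next_index = (index + 1) % len(self.nodes)
                return self.nodes[index]
        return None  # no node can host the pod


def most_available(nodes, cpu, mem):
    """Pick the feasible node with the highest free-resource score."""
    candidates = [node for node in nodes if node.fits(cpu, mem)]
    return max(candidates, key=Node.free_fraction, default=None)


if __name__ == "__main__":
    cluster = [Node("cloud-0", 8000, 16384), Node("edge-0", 4000, 8192)]
    scheduler = RoundRobinScheduler(cluster)
    print(scheduler.select(500, 256).name)
    print(most_available(cluster, 500, 256).name)
```

Neither selector updates cpu_used or mem_used; in this sketch the caller is expected to record the placement after a node is chosen.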
Supported Distributions
You can configure the following statistical distributions:
- fixed, normal, poisson, uniform, pareto
- Fields: CPU, memory, pod interarrival, duration, number of pods per task
Type | Format Example |
---|---|
Fixed | {type: fixed, value: 4} |
Normal | {type: normal, mean: 6, stdev: 2, min: 2, max: 8, round: 1} |
Poisson | {type: poisson, mean: 6, min: 2, max: 8, round: 1} |
Uniform | {type: uniform, min: 2, max: 8, round: 1} |
Pareto | {type: pareto, alpha: 2, min: 2, max: 8, round: 1} |
Units:
- CPU: millicores
- Memory: Mi (Kubernetes expects integer memory values for pods)
- Time (Interarrival/Duration): seconds
round (optional): rounds the output to the given number of decimal places.
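As a rough illustration of how such a distribution spec could be turned into a sample, consider the Python sketch below. It is written under several assumptions that the README does not confirm: min and max are applied by clamping (rather than resampling), round is passed to Python's round() as a number of decimal places, and the Pareto draw is scaled by min. The sample function and its _poisson helper are hypothetical names.

```python
import math
import random


def _poisson(mean, rng):
    """Knuth's multiplication method; adequate for small means."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample(spec, rng):
    """Draw one value from a spec such as {type: normal, mean: 6, stdev: 2, ...}."""
    kind = spec["type"]
    if kind == "fixed":
        return spec["value"]
    if kind == "normal":
        value = rng.gauss(spec["mean"], spec["stdev"])
    elif kind == "poisson":
        value = _poisson(spec["mean"], rng)
    elif kind == "uniform":
        value = rng.uniform(spec["min"], spec["max"])
    elif kind == "pareto":
        # Assumption: min acts as the scale; paretovariate returns values >= 1.
        value = spec["min"] * rng.paretovariate(spec["alpha"])
    else:
        raise ValueError(f"unknown distribution type: {kind}")
    # Assumption: out-of-range draws are clamped to [min, max].
    if "min" in spec:
        value = max(spec["min"], value)
    if "max" in spec:
        value = min(spec["max"], value)
    if "round" in spec:
        value = round(value, spec["round"])
    return value


if __name__ == "__main__":
    rng = random.Random(0)
    spec = {"type": "normal", "mean": 6, "stdev": 2, "min": 2, "max": 8, "round": 1}
    print([sample(spec, rng) for _ in range(5)])
```

Clamping piles probability mass onto the bounds, whereas resampling would preserve the shape of the distribution inside the range; which of the two the simulator uses should be checked against its source.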
Contact
Developed and maintained by the CUT.
For issues or contributions, please contact us or submit a pull request.