Adapting PDLC-RL for UC-P1 Smartcity: Train a model to minimize latency, balance GPU usage across nodes, and maximize throughput
What steps would we, as developers, need to take for use case P1 (Smartcity) to train and run inference with a model whose task is to minimize latency, minimize the imbalance of GPU utilization across nodes (GPU usage should be roughly even across all nodes, rather than a few nodes with very high utilization and many nodes with low utilization), and maximize throughput? All of these custom metrics are pulled via Prometheus in our current implementation and would be provided by ACM to PDLC (eclipse-research-labs/codeco-project/acm#28 (moved)).
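To make the objective concrete, here is a rough sketch of what we imagine the scalar reward could look like. All names, weights, and reference values are placeholders rather than part of the current implementation; one common way to express the balance goal is to reward the normalized entropy of the per-node utilization shares (perfectly even usage gives maximum entropy), which is equivalent in spirit to penalizing dispersion:

```python
import numpy as np

# Placeholder weights for the three objectives; these would need tuning for UC-P1.
W_LATENCY, W_BALANCE, W_THROUGHPUT = 1.0, 1.0, 1.0

def gpu_balance_score(gpu_utils):
    """Normalized entropy of per-node GPU utilization shares.

    Returns 1.0 when utilization is perfectly even across nodes and
    approaches 0.0 when a single node carries almost all of the load.
    """
    utils = np.asarray(gpu_utils, dtype=float)
    total = utils.sum()
    if total <= 0 or len(utils) < 2:
        return 1.0  # no load, or only one node: treat as balanced
    shares = utils / total
    entropy = -np.sum(shares * np.log(shares + 1e-12))
    return float(entropy / np.log(len(utils)))  # scale to [0, 1]

def reward(latency_ms, gpu_utils, throughput_rps,
           latency_ref_ms=100.0, throughput_ref_rps=1000.0):
    """Combine the three UC-P1 objectives into a single scalar reward.

    latency_ms, gpu_utils and throughput_rps are assumed to be the custom
    metrics that ACM would provide to PDLC; the reference values used for
    normalization are placeholders.
    """
    latency_term = -W_LATENCY * (latency_ms / latency_ref_ms)               # lower is better
    balance_term = W_BALANCE * gpu_balance_score(gpu_utils)                 # higher is better
    throughput_term = W_THROUGHPUT * (throughput_rps / throughput_ref_rps)  # higher is better
    return latency_term + balance_term + throughput_term
```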
How can we model this goal in PDLC-RL and what changes would we need to make to implement it?
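For the observation side, the sketch below shows one way the metrics could be pulled from Prometheus over its standard HTTP query API and fed into the `reward()` sketch above. The endpoint URL, PromQL metric names, and query shapes are assumptions and would have to match whatever ACM actually exposes; how PDLC-RL then consumes the resulting reward depends on its environment/training interface:

```python
import requests

PROM_URL = "http://prometheus.monitoring:9090"  # placeholder endpoint

def prom_query(query):
    """Run an instant PromQL query and return the numeric sample values."""
    resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": query}, timeout=5)
    resp.raise_for_status()
    results = resp.json()["data"]["result"]
    return [float(r["value"][1]) for r in results]

def observe_metrics():
    """Fetch the UC-P1 metrics; the metric names are placeholders."""
    latency_ms = prom_query("avg(app_request_latency_ms)")[0]
    gpu_utils = prom_query("gpu_utilization_percent")  # one sample per node
    throughput_rps = prom_query("sum(app_requests_per_second)")[0]
    return latency_ms, gpu_utils, throughput_rps

# Example: compute the reward for the current cluster state.
# latency, gpus, tput = observe_metrics()
# r = reward(latency, gpus, tput)
```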