Deploy Single Model as JupyterConnect to HPC Jülich
The goal is to create a file, hpc-solution-jupyter.zip, that contains all files and scripts needed to submit a model together with a JupyterLab node as an sbatch job via UNICORE to the Jülich HPC system, with GPU use and web UI connections.
Preconditions:
- the user has an account on the Jülich system and a sufficient amount of CPU/GPU-hours available.
- the example model is still to be defined; candidates are a DETR object detection model or an LLM, in either case with GPU usage
Flow:
- the user extracts hpc-solution-jupyter.zip into their home folder
- the zip contains the protobuf file of the model
- hpc-solution-jupyter.zip contains a script (Python or Bash), e.g. "submit-slurm-jupyter-job.py", that the user must execute on the command line
- the script can ask for user credentials
- the script creates the necessary sbatch file(s)
- the script uses UNICORE to submit the job and connect the web UIs (Jupyter and model)
- the JupyterConnect setup includes a shared folder that is mounted into both Apptainer containers
- finally, the script prints the Slurm job ID and the connection endpoints (gRPC and HTTP) of the nodes
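The sbatch-generation step of the flow above could look roughly like the following sketch. It is only an illustration under several assumptions: the function names, the account "my-project", the partition "gpus", the image names model.sif and jupyter.sif, the mount point /shared and port 8888 are all hypothetical placeholders, not values defined in this issue. Credential handling, the UNICORE submission and endpoint discovery are deliberately left out.

```python
from pathlib import Path


def render_sbatch(account, partition, shared_dir, model_image, jupyter_image,
                  gpus=1, time_limit="01:00:00", jupyter_port=8888):
    """Return the text of an sbatch file that starts both containers on one node.

    All parameter values are placeholders chosen for illustration.
    """
    # The same host folder is bound into both containers (the shared folder).
    bind = f"{shared_dir}:/shared"
    lines = [
        "#!/bin/bash",
        "#SBATCH --job-name=jupyterconnect",
        f"#SBATCH --account={account}",
        f"#SBATCH --partition={partition}",
        "#SBATCH --nodes=1",
        f"#SBATCH --gres=gpu:{gpus}",
        f"#SBATCH --time={time_limit}",
        "#SBATCH --output=jupyterconnect-%j.out",
        "",
        f"mkdir -p {shared_dir}",
        "# Model container, with GPU access via --nv",
        f"apptainer run --nv --bind {bind} {model_image} &",
        "# JupyterLab container, mounting the same shared folder",
        f"apptainer exec --bind {bind} {jupyter_image} "
        f"jupyter lab --ip=0.0.0.0 --port={jupyter_port} --no-browser &",
        "# Keep the job alive while both background processes run",
        "wait",
        "",
    ]
    return "\n".join(lines)


def write_sbatch(path, text):
    """Write the generated sbatch text to disk and return the path."""
    target = Path(path)
    target.write_text(text)
    return target


if __name__ == "__main__":
    # Hypothetical values; replace with the real project, partition and images.
    print(render_sbatch("my-project", "gpus", "$HOME/jupyterconnect-shared",
                        "model.sif", "jupyter.sif"))
```

Keeping the rendering separate from writing and submitting makes it possible to inspect the generated file before any compute time is spent, and lets the submission step be swapped without touching the template.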
Reference: the existing JupyterConnect script for Kubernetes: https://gitlab.eclipse.org/eclipse/graphene/kubernetes-client/-/blob/main/deploy/private/jupyter-deployment-script.py?ref_type=heads
Edited by Martin Welss