[backend_cuda] How can I obtain the full CUDA kernel source code for an entire model?

I’m currently working on optimizing and analyzing deep learning model execution on GPUs, and I’d like to better understand how to obtain or generate the actual CUDA kernel source code (in .cu form or similar) corresponding to an entire model's execution pipeline.

Specifically, is there a recommended way to generate or extract the complete CUDA source code for a model from a given framework, covering the whole model rather than just per-op or fused fragments? The LeNet model from the Backend_cuda.ipynb tutorial would be fine as an example.

Thanks in advance.