Compiler & Kernel Optimizations
Plan an optimization pass covering Triton/CUDA kernels, fused ops, tensor-parallel chunking, and activation checkpointing. Provide profiling snapshots and the measured gains for each change.
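As one illustration of the tensor-parallel chunking item, here is a minimal pure-Python sketch, assuming a column-wise split of the weight matrix. The helper names (`matmul`, `split_columns`, `column_parallel_matmul`) are hypothetical and not taken from any library. A real pass would place each shard on its own device and gather the results with a collective, which this sketch does not attempt.

```python
# Pure-Python sketch of column-wise tensor-parallel chunking (no GPU needed).
# All function names are hypothetical, chosen for illustration only.

def matmul(x, w):
    """Multiply x (m x k) by w (k x n), both given as lists of rows."""
    cols = list(zip(*w))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]

def split_columns(w, num_chunks):
    """Split w (k x n) into num_chunks column blocks of near-equal width."""
    n = len(w[0])
    base, extra = divmod(n, num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        width = base + (1 if i < extra else 0)
        chunks.append([row[start:start + width] for row in w])
        start += width
    return chunks

def column_parallel_matmul(x, w, num_chunks):
    """Compute x @ w one column block at a time, then concatenate the blocks."""
    # Each shard stands in for the slice of w that one device would hold.
    partials = [matmul(x, shard) for shard in split_columns(w, num_chunks)]
    # Concatenate row i of every partial output to rebuild row i of x @ w.
    return [sum((p[i] for p in partials), []) for i in range(len(x))]

x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
full = matmul(x, w)
sharded = column_parallel_matmul(x, w, num_chunks=2)
print(sharded == full)
```

Each output column depends on only one column of `w`, so the column blocks can be computed independently and the sharded result matches the unsharded one exactly. That makes a check like `sharded == full` a reasonable correctness test to run before measuring any speedup.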