Compiler & Kernel Optimizations

Plan an optimization pass covering Triton/CUDA kernels, fused ops, tensor-parallel chunking, and activation checkpointing. Provide profiling snapshots taken before and after each change, and report the measured gains.
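As a rough illustration of what a before/after profiling snapshot for a fused op could look like, here is a minimal sketch in pure Python. It is only a CPU stand-in for the idea, not a Triton or CUDA kernel. The names `unfused`, `fused`, `profile`, and `speedup` are hypothetical and chosen for this example, and the printed timings will vary by machine.

```python
import timeit


def unfused(xs, scale, bias):
    # Two passes with an intermediate list, analogous to two separate kernels.
    scaled = [x * scale for x in xs]
    return [s + bias for s in scaled]


def fused(xs, scale, bias):
    # One pass with no intermediate buffer, analogous to a single fused kernel.
    return [x * scale + bias for x in xs]


def profile(fn, *args, repeats=5, number=20):
    # Best-of-N wall-clock time per call, in seconds.
    timings = timeit.repeat(lambda: fn(*args), repeat=repeats, number=number)
    return min(timings) / number


def speedup(baseline_s, optimized_s):
    # Ratio above 1.0 means the optimized version was faster.
    return baseline_s / optimized_s


if __name__ == "__main__":
    xs = [float(i) for i in range(100_000)]
    # Check correctness before comparing speed.
    assert unfused(xs, 2.0, 1.0) == fused(xs, 2.0, 1.0)
    t_base = profile(unfused, xs, 2.0, 1.0)
    t_opt = profile(fused, xs, 2.0, 1.0)
    print(f"unfused: {t_base * 1e3:.3f} ms per call")
    print(f"fused:   {t_opt * 1e3:.3f} ms per call")
    print(f"speedup: {speedup(t_base, t_opt):.2f}x")
```

The same shape (verify that outputs match, time the baseline, time the optimized path, report the ratio) would carry over to GPU kernels, where timing would additionally need device synchronization.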

Author: Assistant

Model: gpt-4o

Category: systems-acceleration-LLM

Tags: LLM, kernels, Triton, CUDA, fused-ops, profiling
