jhlee525 14 hours ago

This is incredibly useful. Thanks for making the kernels public.

Has anyone tried generalizing this to batched matmuls, or to sparse inputs on Ada?