Implementing a Fast Tensor Core Matmul on the Ada Architecture (spatters.ca)
2 points by skidrow 14 hours ago
jhlee525 14 hours ago
This is incredibly useful. Thanks for making the kernels public.
I'm curious if anyone has tried generalizing this to batched matmuls or to sparse inputs on Ada?