NVIDIA explores optimizing GPU performance by reducing instruction cache misses, focusing on a genomics workload using the Smith-Waterman algorithm.
TEAL offers a training-free approach to activation sparsity, significantly enhancing the efficiency of large language models (LLMs) with minimal degradation....