R&D
SRS Labs publishes AI infrastructure research
Our R&D team has published findings on efficient LLM inference optimization, achieving a 3x throughput improvement on commodity hardware.
Published: November 18, 2024
SRS Labs has released a research paper detailing novel approaches to LLM inference optimization. By combining speculative decoding with custom attention kernels, the team achieved a 3x throughput improvement on standard GPU hardware with no loss in output quality.
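The paper's specific kernels and models are not reproduced here, but the core idea of speculative decoding can be illustrated with a minimal toy sketch: a cheap draft model proposes several tokens ahead, and the expensive target model verifies them, accepting the longest agreeing prefix plus one corrected token. The `draft_model` and `target_model` functions below are hypothetical stand-ins, not the paper's models.

```python
def draft_model(context, k):
    # Hypothetical cheap model: greedily proposes the next k tokens.
    out = []
    for _ in range(k):
        nxt = (context[-1] + 1) % 10 if context else 0
        out.append(nxt)
        context = context + [nxt]
    return out

def target_model(context):
    # Hypothetical expensive model: the "ground-truth" next token.
    # It agrees with the draft except when the last token is 7.
    last = context[-1] if context else -1
    if last == 7:
        return 0          # the target diverges from the draft here
    return (last + 1) % 10

def speculative_step(context, k=4):
    proposal = draft_model(list(context), k)
    accepted = []
    for tok in proposal:
        # One target check per proposed position; in a real system all k
        # positions are scored in a single batched forward pass, which is
        # where the throughput gain comes from.
        if target_model(context + accepted) == tok:
            accepted.append(tok)
        else:
            # First mismatch: take the target's token instead and stop.
            accepted.append(target_model(context + accepted))
            break
    return accepted

context = [3]
for _ in range(3):
    context += speculative_step(context)
print(context)  # → [3, 4, 5, 6, 7, 0, 1, 2, 3, 4]
```

Because the target model only needs one (batched) verification pass per burst of draft tokens instead of one full pass per token, throughput rises whenever the draft model's proposals are usually accepted.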