Search Results - dimitrios nikolopoulos

1 Result
APEX: Efficient LLM Inference on Low-Memory GPUs
THE CHALLENGE Deploying large language models efficiently has become a major business obstacle as GPU memory limitations increasingly drive up infrastructure costs and restrict scalability. During inference, the key-value cache grows rapidly and consumes GPU memory, limiting batch sizes and output lengths and reducing overall system throughput....
Published: 3/31/2026   |   Updated: 3/31/2026   |   Inventor(s): Jiakun Fan, Dimitrios Nikolopoulos
Category(s): Technology Classifications > Computer Science