APEX: Efficient LLM Inference on Low-Memory GPUs

THE CHALLENGE
Deploying large language models efficiently has become a major business obstacle, as GPU memory limits increasingly drive up infrastructure costs and restrict scalability. During inference, the key-value (KV) cache grows with every generated token and quickly exhausts GPU memory, which limits batch sizes and output lengths and reduces overall system throughput....
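To make this memory pressure concrete, here is a back-of-the-envelope estimate of KV cache size using the standard formula: one key and one value vector per KV head, per layer, per token. This is a generic illustration, not part of APEX; the helper names, the 7B-class model dimensions, and the 10 GiB free-memory budget are hypothetical values chosen only to show the trend.

```python
# Back-of-the-envelope KV cache sizing (generic formula, not APEX-specific).

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for a batch of sequences.

    The leading factor of 2 covers storing both a key and a value vector
    per KV head, per layer, per token. bytes_per_elem=2 assumes fp16/bf16.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


def max_batch_size(free_bytes: int, num_layers: int, num_kv_heads: int,
                   head_dim: int, seq_len: int, bytes_per_elem: int = 2) -> int:
    """Largest batch whose KV cache fits in free_bytes at the given sequence length."""
    per_sequence = kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                                  seq_len, 1, bytes_per_elem)
    return free_bytes // per_sequence


if __name__ == "__main__":
    GiB = 1024 ** 3
    # Hypothetical 7B-class configuration: 32 layers, 32 KV heads, head_dim 128.
    cfg = dict(num_layers=32, num_kv_heads=32, head_dim=128)
    free = 10 * GiB  # hypothetical memory left after loading model weights

    for seq_len in (2048, 4096, 8192):
        per_seq = kv_cache_bytes(seq_len=seq_len, batch_size=1, **cfg)
        print(f"seq_len={seq_len}: {per_seq / GiB:.1f} GiB per sequence, "
              f"max batch {max_batch_size(free, seq_len=seq_len, **cfg)}")
```

With these assumed numbers each token costs 0.5 MiB of cache, so the script reports 1.0, 2.0, and 4.0 GiB per sequence and maximum batch sizes of 10, 5, and 2. Each doubling of sequence length roughly halves how many requests fit in memory, which is the batch-size versus output-length trade-off described above.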

Published: 3/31/2026 | Updated: 3/31/2026 | Inventor(s): Jiakun Fan, Dimitrios Nikolopoulos

Category(s): Technology Classifications > Computer Science