Processing Model Memory

1don MSN

Google unveils TurboQuant to reduce AI model memory usage

Google introduces TurboQuant, a compression method that reduces memory usage and increases speed ...

13d

Nvidia says it can shrink LLM memory 20x without changing model weights

Nvidia's KV Cache Transform Coding (KVTC) compresses LLM key-value cache by 20x without model changes, cutting GPU memory ...

Semiconductor Engineering

Developing ReRAM As Next Generation On-Chip Memory For Machine Learning, Image Processing And Other Advanced CPU Applications

In modern CPU device operation, 80% to 90% of energy consumption and timing delays are caused by the movement of data between the CPU and off-chip memory. To alleviate this performance concern, ...

Semiconductor Engineering

Difficult Memory Choices In AI Systems

The number of memory choices and architectures is exploding, driven by the rapid evolution in AI and machine learning chips being designed for a wide range of very different end markets and systems.

Some results have been hidden because they may be inaccessible to you

Show inaccessible results