Turn Memory Bandwidth into Sustained Throughput

Cut KV cache and attention traffic by ~80% at the memory interface — before it hits the bus.

THE PROBLEM

AI accelerators are not limited by peak HBM bandwidth.
They are limited by what workloads actually sustain under attention and KV cache traffic.

KV cache traffic turns a bandwidth limit into a throughput problem.

As models scale and context windows grow, attention-driven data movement dominates — creating a bottleneck that additional compute cannot overcome.

The gap between peak and sustained bandwidth is where real performance is lost.
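To make the scaling concrete, here is a back-of-envelope model of KV cache traffic per decoded token in a decoder-only transformer. The model dimensions are hypothetical, chosen only to illustrate how context length drives data movement; they are not SoftChip figures.

```python
# Back-of-envelope KV cache traffic per generated token for a decoder-only
# transformer. All dimensions below are hypothetical illustration values.

def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, context_len,
                       bytes_per_elem=2):
    """Bytes of K and V read from memory to decode one token.

    Each layer reads its full K and V caches once per decoded token,
    so traffic grows linearly with context length.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

# A 70B-class configuration (assumed): 80 layers, 8 KV heads, head_dim 128,
# fp16 cache entries.
for context in (4_096, 32_768, 131_072):
    gib = kv_bytes_per_token(80, 8, 128, context) / 2**30
    print(f"context {context:>7}: {gib:6.2f} GiB of KV traffic per token")
```

At 4K context this toy configuration already moves over a gigabyte of KV data per token; growing the window to 128K multiplies that by 32, while the compute per token barely changes — which is why added FLOPs cannot close the gap.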

SOFT-NMC SOLUTION

Soft-NMC: Near-Memory Compute for AI Systems
Soft-NMC reduces attention and KV cache traffic at the memory interface before it traverses the interconnect.
This converts a bandwidth-limited system into a throughput-scaled system — increasing effective tokens per second without requiring changes to the programming model.
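The system-level effect of removing traffic at the interface can be sketched with a simple Amdahl-style model. The traffic fractions below are assumptions for illustration, not measured Soft-NMC results or SoftChip's published methodology.

```python
# Amdahl-style estimate of sustained speedup for a bandwidth-bound workload
# when part of its memory traffic never crosses the bus. Input fractions
# are illustrative assumptions, not measured Soft-NMC data.

def sustained_speedup(traffic_fraction, reduction):
    """Speedup when `reduction` of the `traffic_fraction` share of total
    bus traffic is eliminated at the memory interface."""
    remaining = (1 - traffic_fraction) + traffic_fraction * (1 - reduction)
    return 1 / remaining

# If KV/attention traffic is 90% of total bus traffic and 80% of it is
# removed at the interface:
print(f"{sustained_speedup(0.9, 0.8):.2f}x")   # -> 3.57x
```

Under these assumed fractions the model lands in the 3–4× range, consistent with the system-level figure quoted below.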

IMPACT GRID

~80% — Reduction in KV and attention data movement
12–13× — Throughput improvement at the HBM interface
3–4× — Improvement in sustained tokens/sec at the system level
~100% — Utilization of available HBM bandwidth

How It Works

Soft-NMC preprocesses KV cache and attention traffic at the memory interface — before it hits the bus.
By reducing data movement at the source, it eliminates the dominant bottleneck in modern AI inference workloads.
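The source does not specify which operation Soft-NMC applies near memory, but one illustrative form of reduction-at-the-source is quantizing KV entries before they cross the bus. The sketch below only demonstrates the principle that shrinking data at the interface shrinks bus traffic; it is not SoftChip's actual mechanism.

```python
import numpy as np

# Illustrative only: int8 quantization as one example of reducing KV bytes
# at the source. Soft-NMC's actual near-memory operation is not specified
# in the source text.

def quantize_int8(kv_fp16):
    """Symmetric per-tensor int8 quantization of an fp16 KV block."""
    scale = float(np.abs(kv_fp16).max()) / 127.0
    q = np.clip(np.round(kv_fp16 / scale), -127, 127).astype(np.int8)
    return q, scale

kv = np.random.randn(1024, 128).astype(np.float16)   # a toy KV block
q, scale = quantize_int8(kv)
print(f"bus bytes: {kv.nbytes} -> {q.nbytes} "
      f"({1 - q.nbytes / kv.nbytes:.0%} less)")
```

Halving the element width halves the bytes crossing the interconnect; combining such techniques is how large aggregate reductions are typically reached, whatever the specific operation.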

DRDCL

Soft-NMC is built on DRDCL (Dynamically Reconfigurable Differential Cascode Logic), enabling dense, low-energy compute near memory.



STATUS

Early-stage development, focused on silicon validation and real-workload characterization.

Evaluate Soft-NMC in Your Architecture

We’re working with architecture teams to explore integration paths and quantify real workload impact.

ABOUT

SoftChip is a semiconductor IP company developing DRDCL-based architectures that reduce data movement and timing inefficiencies.
Founded by experienced semiconductor engineers, SoftChip focuses on integration into existing architectures.