Cut roughly 80% of KV cache and attention traffic at the memory interface, before it ever reaches the bus.
AI accelerators are not limited by peak HBM bandwidth.
They are limited by what workloads actually sustain under attention and KV cache traffic.
The KV cache turns bandwidth into a throughput problem: every generated token must re-read the cached keys and values for the entire context. As models scale and context windows grow, this attention-driven data movement dominates, creating a bottleneck that additional compute cannot overcome.
The gap between peak and sustained bandwidth is where real performance is lost.
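As a rough illustration of why this gap matters, the sketch below models per-step KV cache read traffic for a hypothetical 70B-class decoder. All parameters (layer count, head counts, context length, peak bandwidth) are assumptions for the sake of the arithmetic, not measurements of any specific accelerator or of Soft-NMC itself:

```python
# Back-of-envelope KV cache read-traffic model (illustrative assumptions only).
# Parameters loosely resemble a 70B-class decoder with grouped-query attention.
LAYERS = 80          # transformer layers (assumed)
KV_HEADS = 8         # key/value heads per layer (assumed, GQA)
HEAD_DIM = 128       # dimension per head (assumed)
BYTES_PER_ELEM = 2   # fp16/bf16 KV cache

# K and V each store (layers * kv_heads * head_dim) elements per token.
kv_bytes_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_ELEM

CONTEXT_LEN = 8192   # tokens already in context (assumed)
# Each decode step re-reads the whole cache to compute attention.
bytes_per_decode_step = CONTEXT_LEN * kv_bytes_per_token

PEAK_HBM_BYTES_PER_S = 3e12  # ~3 TB/s, HBM3-class peak (assumed)
# Upper bound on decode rate if KV reads alone consumed all bandwidth.
max_tokens_per_s = PEAK_HBM_BYTES_PER_S / bytes_per_decode_step

print(f"KV bytes per token:      {kv_bytes_per_token / 1024:.0f} KiB")
print(f"Bytes per decode step:   {bytes_per_decode_step / 2**30:.2f} GiB")
print(f"Bandwidth-bound ceiling: {max_tokens_per_s:.0f} tokens/s per sequence")
```

Under these assumptions, a single long-context sequence reads about 2.5 GiB of KV data per generated token, capping decode throughput at roughly 1,100 tokens/s even if KV reads were the only traffic on the bus. In this model, removing ~80% of that traffic before it crosses the interface raises the same ceiling fivefold.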
Soft-NMC: Near-Memory Compute for AI Systems
Soft-NMC reduces attention and KV cache traffic at the memory interface before it traverses the interconnect.
This converts a bandwidth-limited system into a throughput-scaled one, increasing effective tokens per second without requiring changes to the programming model.
By preprocessing KV cache and attention traffic at the source, it removes the dominant bottleneck in modern AI inference workloads.
Soft-NMC is built on DRDCL (Dynamically Reconfigurable Differential Cascode Logic), enabling dense, low-energy compute near memory.
Development is at an early stage, focused on silicon validation and real-workload characterization.
We’re working with architecture teams to explore integration paths and quantify real workload impact.
SoftChip is a semiconductor IP company developing DRDCL-based architectures to reduce data movement and timing inefficiencies.
Founded by experienced semiconductor engineers, SoftChip focuses on integration into existing architectures.