Abstract: Near-memory processing (NMP), which places lightweight processing units near DRAM, has been actively studied to speed up the execution of memory-intensive applications by reducing ...
Abstract: Matrix-vector multiplication (MVM) underpins modern AI workloads, yet scaling it on electronic hardware is increasingly constrained by energy, bandwidth, and latency bottlenecks. Photonic ...
* Program re-ordering for improved L2 cache hit rate.
* Automatic performance tuning.

# Motivations #

Matrix multiplications are a key building block of most modern high-performance computing systems.