Inference Performance Optimization for Large Language Models on CPUs Paper • 2407.07304 • Published Jul 10, 2024
VPU-EM: An Event-based Modeling Framework to Evaluate NPU Performance and Power Efficiency at Scale Paper • 2303.10271 • Published Mar 17, 2023