GPT-OSS-20B out-of-memory on g4dn.12xlarge (4x16GB VRAM)
#231
by Filipblahof - opened
Hello, I am trying to run this model (gpt-oss-20b) locally on an AWS g4dn.12xlarge instance (4 GPUs × 16 GB VRAM each), but when I use the code from the model description, I get an out-of-memory error. Why does this happen when the model is supposed to fit within 16 GB? Can anyone who has solved this share working code?
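One way to sanity-check the "it should fit in 16 GB" assumption is to do the weight-memory arithmetic. The numbers below are assumptions (roughly 21B parameters per the model card, 4-bit MXFP4 storage vs. 16-bit bf16), not measured values; the point of the sketch is that the quantized checkpoint is near the single-GPU budget, while a dequantized bf16 copy is far larger than any one 16 GB card:

```python
# Rough weight-memory arithmetic for gpt-oss-20b (a sketch; the parameter
# count and dtype sizes are assumptions, not measured allocations).
params = 21e9        # ~21B parameters (approximate)
mxfp4_bytes = 0.5    # ~4 bits per weight under MXFP4 quantization
bf16_bytes = 2.0     # 16 bits per weight if dequantized to bf16

mxfp4_gb = params * mxfp4_bytes / 1e9
bf16_gb = params * bf16_bytes / 1e9

# MXFP4 weights alone are already close to a 16 GB budget, before the
# KV cache, activations, and CUDA context overhead are counted.
print(f"MXFP4 weights: ~{mxfp4_gb:.1f} GB")
# A bf16 copy does not fit on any single 16 GB GPU, and even split four
# ways it leaves little headroom per card.
print(f"bf16 weights:  ~{bf16_gb:.1f} GB total, ~{bf16_gb / 4:.1f} GB per GPU if split 4-way")
```

This is only weight memory; runtime overhead (KV cache, activations, per-process CUDA context) comes on top, which is why "fits within 16 GB" on paper can still OOM in practice.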