GPT-OSS-20B out-of-memory on g4dn.12xlarge (4x16GB VRAM)

#231
by Filipblahof - opened

Hello, I am trying to run this model (gpt-oss-20b) locally on an AWS g4dn.12xlarge instance (4 GPUs × 16 GB VRAM each), but the example code from the description gives me an out-of-memory error. Why does this happen when the model is supposed to fit in 16 GB? Has anyone solved this and can share working code?
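For context, a rough weights-only memory estimate (assuming ~21B parameters, the commonly reported size for gpt-oss-20b; activations and KV cache add more on top) shows how much the load precision matters. If the quantized checkpoint gets dequantized to bf16 at load time, the weights alone exceed a single 16 GB card:

```python
# Back-of-the-envelope weight-memory estimate for a ~21B-parameter model.
# The 21e9 parameter count is an assumption for illustration only.
def weights_gib(n_params: float, bytes_per_param: float) -> float:
    """Weights-only footprint in GiB at a given bytes-per-parameter."""
    return n_params * bytes_per_param / 2**30

N_PARAMS = 21e9
for name, bpp in [("bf16/fp16", 2.0), ("int8", 1.0), ("4-bit (e.g. MXFP4)", 0.5)]:
    print(f"{name}: ~{weights_gib(N_PARAMS, bpp):.1f} GiB")
```

At bf16 the weights come to roughly 39 GiB, which does not fit on one 16 GB GPU and is tight even when sharded across four of them, whereas at ~4 bits per weight they fit comfortably on a single card. That difference is worth checking when diagnosing the OOM.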
