Instructions to use bigscience/bloom with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use bigscience/bloom with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="bigscience/bloom")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use bigscience/bloom with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "bigscience/bloom"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "bigscience/bloom",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker
```shell
docker model run hf.co/bigscience/bloom
```
- SGLang
How to use bigscience/bloom with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "bigscience/bloom" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "bigscience/bloom",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "bigscience/bloom" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "bigscience/bloom",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Docker Model Runner
How to use bigscience/bloom with Docker Model Runner:
```shell
docker model run hf.co/bigscience/bloom
```
Suggest a cloud GPU service to fine-tune BLOOM.
Could you explain more clearly what your issue is? Specifically, what problem are you facing? Basically, any provider with enough A100s would work. In our case we used 384 80GB A100s to train it, with 48 GPUs per replica. Depending on how you want to fine-tune, you might need fewer than 48 A100s.
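The numbers above imply a simple data-parallel layout, which can be checked with back-of-the-envelope arithmetic (the split of each replica into tensor/pipeline parallelism is not specified here):

```python
# Training layout described above: 384 A100s total, 48 GPUs per model replica.
total_gpus = 384
gpus_per_replica = 48

# Each replica holds one full copy of the model, so the remaining
# dimension is data parallelism across replicas.
replicas = total_gpus // gpus_per_replica
print(replicas)  # 8 data-parallel replicas
```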
Also, you can batch your questions into a single discussion.
Well, I would like to fine-tune BLOOM for two things: one is paraphrasing and two is summarization. However, the cost of A100 rental is a lot, because I assume it would take at least a month to fine-tune for each purpose. What is the cheapest approach? How long does one epoch take on a reasonable number of GPUs? My calculations come to at least $25k per month per purpose from Lambda, the cheapest cloud GPU service.
Not sure what the cheapest approach is. I do think you first need to figure out how to optimize your software before worrying about the hardware. Typically we use Megatron-DeepSpeed to train, as it allowed us to get the fastest training we could at the time.
> How long does one epoch take on a reasonable number of GPUs?
That question isn't very well formulated, as an epoch is dataset dependent. During pretraining we only do one epoch, but it lasted 3 months on 382 GPUs. I'm guessing your fine-tuning setup will probably be shorter.
I would usually advise against fine-tuning such big models, as you could just use prompting to solve your task. The only fine-tuning I would really consider is T0-style, where we fine-tune on a bunch of tasks.
ty
What is T0-style?
Is prompting just giving it one example and then starting the sequence for the next, and having BLOOM complete it?
Yes, you are right, it was badly formulated. If the original BLOOM training set was 1.6 TB and all my training data is 1 GB, will it be 1600 times faster, require 1600 times less GPU power, or some combination of both?
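As a rough sanity check, not a definitive answer: in the crude approximation that total compute scales linearly with the amount of data processed (ignoring warm-up, evaluation, and efficiency differences), 1600x less data means roughly 1600x less total GPU-time, which you can trade between fewer GPUs and less wall-clock time. Using the figures quoted earlier in the thread (~3 months on ~384 GPUs):

```python
# Back-of-the-envelope only: assumes compute is proportional to data size.
pretrain_gpu_months = 384 * 3        # ~3 months on ~384 GPUs (figures above)
data_ratio = 1.6e12 / 1e9            # 1.6 TB pretraining data vs 1 GB

finetune_gpu_months = pretrain_gpu_months / data_ratio
# On, say, 8 GPUs the wall-clock estimate would be:
months_on_8_gpus = finetune_gpu_months / 8
print(finetune_gpu_months, months_on_8_gpus)
```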
What is pretraining? Does it mean before validation?
How many words is 2048 tokens in BLOOM, approximately?
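As a rough rule of thumb for English text, subword tokenizers average somewhere around 0.75 words per token; this is a common approximation, not a measured BLOOM statistic, and BLOOM's multilingual tokenizer can deviate from it significantly depending on the language:

```python
# Rough estimate only: ~0.75 words per token is a common approximation
# for English text, not a measured figure for BLOOM's tokenizer.
tokens = 2048
words_per_token = 0.75

approx_words = int(tokens * words_per_token)
print(approx_words)  # roughly 1536 words
```

For an exact count on your own text, tokenize it with `AutoTokenizer.from_pretrained("bigscience/bloom")` and compare `len(tokenizer(text)["input_ids"])` against the word count.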