8-bit optimizer crashes when fine-tuning gpt2-large

Using the `bnb.optim.Adam8bit` optimizer in place of `torch.optim.Adam` causes a crash after about a dozen batches:

```
12it [00:22,  1.82s/it]Error an illegal memory access was encountered at line 198 in file /home/alyssa/gpt_math/bitsandbytes/csrc/ops.cu
```

I am fine-tuning Hugging Face's version of the gpt2-large model on an Ampere RTX 3090 GPU with CUDA 11.6 and NVIDIA driver 510.73.05. I have tried compiling bitsandbytes from source on my machine, and I have tried the `set_optim_to_run_embedding_in_fp32` trick from https://github.com/huggingface/transformers/issues/14819; neither changed the behavior. Training with the standard PyTorch Adam optimizer works fine. `nvidia-smi` shows 16 GB of memory in use on a 24 GB GPU, so it shouldn't be running out of GPU memory or anywhere close to that.
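For anyone trying to reproduce this, here is a minimal sketch of the setup described above, under some assumptions: `build_optimizer` is a hypothetical helper name, the learning rate is a placeholder, and the embedding helper only approximates the workaround from the linked transformers issue by calling bitsandbytes' `GlobalOptimManager.register_module_override` to keep 32-bit optimizer state for every `nn.Embedding` weight. bitsandbytes is imported lazily so the plain PyTorch path (the one that works) runs without it.

```python
import torch
import torch.nn as nn


def set_optim_to_run_embedding_in_fp32(model: nn.Module) -> None:
    """Approximation of the workaround: keep 32-bit Adam state for embedding weights."""
    import bitsandbytes as bnb  # lazy import: only needed on the 8-bit path

    manager = bnb.optim.GlobalOptimManager.get_instance()
    for module in model.modules():
        if isinstance(module, nn.Embedding):
            # Override registered before the first optimizer step is taken.
            manager.register_module_override(module, "weight", {"optim_bits": 32})


def build_optimizer(model: nn.Module, use_8bit: bool, lr: float = 5e-5) -> torch.optim.Optimizer:
    """Hypothetical helper: standard Adam (works) or bitsandbytes Adam8bit (crashes)."""
    if not use_8bit:
        return torch.optim.Adam(model.parameters(), lr=lr)
    import bitsandbytes as bnb

    set_optim_to_run_embedding_in_fp32(model)
    return bnb.optim.Adam8bit(model.parameters(), lr=lr)
```

In the failing run, `model` would be the gpt2-large model moved to the GPU and `use_8bit=True`; swapping to `use_8bit=False` is the configuration that trains without error.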
