This repository was archived by the owner on Nov 17, 2023. It is now read-only.

CUDA: Check failed: e == cudaSuccess: misaligned address with 3-layer BERT pretraining  #19155

@szhengac

Description


When I pretrained a 3-layer BERT model using GluonNLP 0.10 on one p3dn.24xlarge instance with 32 GB of GPU memory, I received the error "CUDA: Check failed: e == cudaSuccess: misaligned address". With a total batch size of 128, training uses 11 GB of GPU memory and no error occurs. But when I slightly increased the total batch size to 176, or doubled it to 256, I received the error. I have already cherry-picked #17767.

@sxjscience, you may want to try this setting with the NumPy version.
