[FEA] Add support for huggingface datasets #28

@ayushdg

Is your feature request related to a problem? Please describe.
NeMo Curator currently represents document datasets as dataframes and includes some helpers to read them from JSON/Parquet files. There is no direct way to load a Hugging Face dataset.

Describe the solution you'd like
Support for reading in and working with Hugging Face datasets.

Describe alternatives you've considered
Dumping Hugging Face datasets to JSON/Parquet before reading them with Curator.
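
For reference, the dump step of this workaround could look roughly like the sketch below. It is only an illustration, not Curator code: `dump_records_to_jsonl` and the `hf_dump/train.jsonl` path are hypothetical names, and it assumes every column value is JSON-serializable. It uses only the standard library and accepts any iterable of dict-like rows, which is what iterating over a Hugging Face `datasets.Dataset` yields.

```python
import json
from pathlib import Path
from typing import Any, Iterable, Mapping


def dump_records_to_jsonl(records: Iterable[Mapping[str, Any]], path: str) -> int:
    """Write one JSON object per line (JSON Lines) and return the row count.

    Hypothetical helper for illustration; assumes JSON-serializable values.
    """
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with out.open("w", encoding="utf-8") as f:
        for row in records:
            # dict(row) copies the mapping so json.dumps gets a plain dict.
            f.write(json.dumps(dict(row), ensure_ascii=False) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Stand-in rows so the sketch runs without extra dependencies. With the
    # `datasets` package installed, a loaded dataset split could be passed
    # in place of this list.
    rows = [{"id": 0, "text": "hello"}, {"id": 1, "text": "world"}]
    n = dump_records_to_jsonl(rows, "hf_dump/train.jsonl")
    print(f"wrote {n} rows")
```

The resulting file can then be read with Curator's existing JSON helpers. The extra write and read of the full dataset is the overhead this request aims to remove.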

Additional context
N/A
