Is your feature request related to a problem? Please describe.
NeMo Curator currently represents document datasets as dataframes and includes helpers for reading JSON and Parquet files, but it has no direct way to load Hugging Face datasets.
Describe the solution you'd like
Support for reading in and working with Hugging Face datasets directly.
Describe alternatives you've considered
Dumping Hugging Face datasets to JSON or Parquet first, then reading those files with Curator.
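For reference, the workaround looks roughly like the sketch below. It is an illustration only: `dump_rows_to_jsonl` and `example_roundtrip` are hypothetical helper names, and the sketch assumes the `datasets` and `nemo_curator` packages are installed and that every column is JSON-serializable (text and numbers, no images or audio).

```python
import json
from typing import Any, Iterable, Mapping


def dump_rows_to_jsonl(rows: Iterable[Mapping[str, Any]], path: str) -> int:
    """Write one JSON object per line and return the number of rows written.

    Hypothetical helper: accepts any iterable of dict-like rows, which
    includes a Hugging Face `datasets.Dataset` (iterating it yields dicts).
    """
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(dict(row), ensure_ascii=False) + "\n")
            count += 1
    return count


def example_roundtrip(hf_name: str, split: str, out_path: str):
    """Load a Hugging Face dataset, dump it to JSONL, and read it with Curator.

    Assumption: `datasets` and `nemo_curator` are installed. They are imported
    here so that the helper above works without them.
    """
    from datasets import load_dataset
    from nemo_curator.datasets import DocumentDataset

    hf_dataset = load_dataset(hf_name, split=split)
    dump_rows_to_jsonl(hf_dataset, out_path)
    # Read the intermediate file back through Curator's existing JSON helper.
    return DocumentDataset.read_json(out_path)
```

The intermediate file is the cost this request would remove: the whole dataset is written to disk once and then read again before any curation step runs.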
Additional context
N/A