Is your feature request related to a problem or challenge?
Currently, DataFusion doesn't appear to support reading CSV files that use a non-UTF-8 encoding, such as the common ISO-8859-1.
While CSV may be a terrible data format, it's also ubiquitous in the wild, and many CSV files use other character encodings. It would be useful to have an option for reading CSV files that use an encoding other than UTF-8.
Describe the solution you'd like
Add an option to `CsvOptions` or elsewhere to specify the encoding used by the input file, defaulting to `UTF-8`. DataFusion could then use `encoding_rs` internally to decode chunks of incoming data into UTF-8 before they reach the CSV parser.
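To make the shape of the option concrete, here is a rough, std-only sketch. The `CsvEncoding` enum and the `decode` function are hypothetical names invented for this issue; nothing like them exists in DataFusion today. The hand-written Latin-1 branch only stands in for what `encoding_rs` would do in a real implementation.

```rust
use std::borrow::Cow;

/// Hypothetical encoding option (an assumption for illustration,
/// not an existing DataFusion type). UTF-8 stays the default.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub enum CsvEncoding {
    #[default]
    Utf8,
    /// ISO-8859-1: every byte maps to the code point of the same value.
    Latin1,
}

/// Decode a complete buffer into UTF-8 text before CSV parsing.
/// A streaming version would also need to handle multi-byte sequences
/// split across chunk boundaries, which this sketch ignores.
pub fn decode(
    encoding: CsvEncoding,
    bytes: &[u8],
) -> Result<Cow<'_, str>, std::str::Utf8Error> {
    match encoding {
        // Already UTF-8: validate and borrow, no copy needed.
        CsvEncoding::Utf8 => std::str::from_utf8(bytes).map(Cow::Borrowed),
        // Latin-1: widen each byte to the matching `char` (U+0000..=U+00FF).
        CsvEncoding::Latin1 => {
            Ok(Cow::Owned(bytes.iter().map(|&b| b as char).collect()))
        }
    }
}
```

With this, the bytes `[0x63, 0x61, 0x66, 0xE9]` decode to `café` under `CsvEncoding::Latin1`, while the same bytes are rejected as invalid under the default `CsvEncoding::Utf8`. One caveat: `encoding_rs` follows the WHATWG Encoding Standard, where the `ISO-8859-1` label resolves to windows-1252, so its output differs from this sketch for bytes in the 0x80..=0x9F range.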
Describe alternatives you've considered
An alternative to depending on `encoding_rs` directly would be to expose an option that allows users to provide their own decoding logic, which they would then likely delegate to `encoding_rs`. This might be desirable if the added dependency is deemed too heavy (though it could easily be put behind a feature flag).
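The user-supplied variant might look roughly like the following. The `ChunkDecoder` trait, `Latin1Decoder`, and `decode_all` are all hypothetical names used only to illustrate the idea; the actual extension point, its error type, and how it is wired into the CSV reader would be up to the maintainers.

```rust
/// Hypothetical extension point (an assumption, not a DataFusion API):
/// called on each raw chunk before the CSV parser sees it.
pub trait ChunkDecoder: Send + Sync {
    /// Append the UTF-8 decoding of `input` to `out`.
    /// `last` marks the final chunk so stateful decoders can flush
    /// any bytes they buffered across a chunk boundary.
    fn decode_chunk(
        &mut self,
        input: &[u8],
        last: bool,
        out: &mut String,
    ) -> Result<(), String>;
}

/// Example user implementation for ISO-8859-1. A real one would
/// probably wrap an `encoding_rs` decoder instead.
pub struct Latin1Decoder;

impl ChunkDecoder for Latin1Decoder {
    fn decode_chunk(
        &mut self,
        input: &[u8],
        _last: bool,
        out: &mut String,
    ) -> Result<(), String> {
        // Latin-1 is stateless: each byte is one code point.
        out.extend(input.iter().map(|&b| b as char));
        Ok(())
    }
}

/// Drive any decoder over a sequence of chunks, as the reader might.
pub fn decode_all(
    decoder: &mut dyn ChunkDecoder,
    chunks: &[&[u8]],
) -> Result<String, String> {
    let mut out = String::new();
    for (i, chunk) in chunks.iter().enumerate() {
        decoder.decode_chunk(chunk, i + 1 == chunks.len(), &mut out)?;
    }
    Ok(out)
}
```

Passing the `last` flag keeps the interface usable for multi-byte encodings, where a character can straddle two chunks and the decoder has to carry state between calls.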
Additional context
No response