Describe the bug
I have a data set written by Apache Spark and tried to query it from the DataFusion CLI. The query failed with an error saying that a Parquet file was corrupt.
CREATE EXTERNAL TABLE store_sales STORED AS PARQUET LOCATION 'store_sales.dat';
0 rows in set. Query took 0.002 seconds.
❯ select count(*) from store_sales;
Parquet reader thread terminated due to error: ParquetError(General("Invalid Parquet file. Corrupt footer"))
I added some debug logging and found that it was actually trying to read the following file, which is not a Parquet file but a .crc checksum file written alongside the data file:
store_sales.dat/.part-00005-5142b177-bacb-499d-b14f-12de4b94d9d9-c000.snappy.parquet.crc
To Reproduce
Create a non-Parquet file with a non-Parquet extension and put it in a directory along with some valid Parquet files. Then register that directory as an external Parquet table and query it.
Expected behavior
DataFusion should only try to read files with the .parquet file extension.
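A minimal sketch of the kind of filtering I have in mind, using only the Rust standard library. The function name list_files_with_extension is hypothetical and is not DataFusion's actual API; it only illustrates skipping files such as the .crc checksum file above when listing a table directory.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper (an assumption, not DataFusion's real code):
/// returns the regular files directly inside `dir` whose extension
/// equals `ext` (given without the leading dot, e.g. "parquet").
fn list_files_with_extension(dir: &Path, ext: &str) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // `extension()` is the part after the last dot, so
        // ".part-0.snappy.parquet.crc" yields "crc" and is skipped,
        // while "part-0.snappy.parquet" yields "parquet" and is kept.
        let matches = path.is_file()
            && path.extension().and_then(|e| e.to_str()) == Some(ext);
        if matches {
            files.push(path);
        }
    }
    // Sort so the result does not depend on directory iteration order.
    files.sort();
    Ok(files)
}
```

Calling list_files_with_extension(Path::new("store_sales.dat"), "parquet") would then return only the part files and ignore the checksum files. This sketch does not recurse into subdirectories, which a real implementation might need for partitioned data.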
Additional context
None