What is your issue?
Summary
The netcdf4-python API docs say the following:
If the optional keyword argument compression is set, the data will be compressed in the netCDF file using the specified compression algorithm. Currently zlib,szip,zstd,bzip2,blosc_lz,blosc_lz4,blosc_lz4hc, blosc_zlib and blosc_zstd are supported. Default is None (no compression). All of the compressors except zlib and szip use the HDF5 plugin architecture.
If the optional keyword zlib is True, the data will be compressed in the netCDF file using zlib compression (default False). The use of this option is deprecated in favor of compression='zlib'.
Although compression is considered a valid encoding option by Xarray (xarray/xarray/backends/netCDF4_.py, lines 232 to 242 in bbe63ab):
valid_encodings = {
    "zlib",
    "complevel",
    "fletcher32",
    "contiguous",
    "chunksizes",
    "shuffle",
    "_FillValue",
    "dtype",
    "compression",
}
...it appears that we silently ignore the compression option when creating new netCDF4 variables (xarray/xarray/backends/netCDF4_.py, lines 488 to 501 in bbe63ab):
nc4_var = self.ds.createVariable(
    varname=name,
    datatype=datatype,
    dimensions=variable.dims,
    zlib=encoding.get("zlib", False),
    complevel=encoding.get("complevel", 4),
    shuffle=encoding.get("shuffle", True),
    fletcher32=encoding.get("fletcher32", False),
    contiguous=encoding.get("contiguous", False),
    chunksizes=encoding.get("chunksizes"),
    endian="native",
    least_significant_digit=encoding.get("least_significant_digit"),
    fill_value=fill_value,
)
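One possible way to forward the option is sketched below. This is only an illustration, not the actual xarray implementation: `build_create_variable_kwargs` is a hypothetical helper name, and the sketch assumes the installed netCDF4-python accepts a `compression` keyword in `createVariable`, as the docs quoted above describe.

```python
def build_create_variable_kwargs(encoding):
    """Hypothetical helper: map an xarray-style encoding dict to
    keyword arguments for netCDF4's Dataset.createVariable."""
    kwargs = {
        # Same defaults as the createVariable call shown in the issue.
        "zlib": encoding.get("zlib", False),
        "complevel": encoding.get("complevel", 4),
        "shuffle": encoding.get("shuffle", True),
        "fletcher32": encoding.get("fletcher32", False),
        "contiguous": encoding.get("contiguous", False),
        "chunksizes": encoding.get("chunksizes"),
    }
    # Forward `compression` only when the user set it, so calls that
    # never mention it are left exactly as they are today.
    if encoding.get("compression") is not None:
        kwargs["compression"] = encoding["compression"]
    return kwargs


kwargs = build_create_variable_kwargs({"compression": "zlib", "complevel": 8})
print(kwargs["compression"], kwargs["complevel"])
```

The resulting dict could then be splatted into the call (`self.ds.createVariable(varname=name, ..., **kwargs)`), which keeps the existing defaults in one place.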
Code example
import numpy as np
import xarray as xr

shape = (10, 20)
chunksizes = (1, 10)
encoding = {
    'compression': 'zlib',
    'shuffle': True,
    'complevel': 8,
    'fletcher32': False,
    'contiguous': False,
    'chunksizes': chunksizes
}
da = xr.DataArray(
    data=np.random.rand(*shape),
    dims=['y', 'x'],
    name="foo",
    attrs={"bar": "baz"}
)
da.encoding = encoding
ds = da.to_dataset()
fname = "test.nc"
# Write the file, then read it back to inspect the encoding actually applied.
ds.to_netcdf(fname, engine="netcdf4", mode="w")
with xr.open_dataset(fname, engine="netcdf4") as ds1:
    display(ds1.foo.encoding)  # notebook built-in; use print() in a plain script
{'zlib': False,
'szip': False,
'zstd': False,
'bzip2': False,
'blosc': False,
'shuffle': False,
'complevel': 0,
'fletcher32': False,
'contiguous': False,
'chunksizes': (1, 10),
'source': 'test.nc',
'original_shape': (10, 20),
'dtype': dtype('float64'),
'_FillValue': nan}
In addition to showing that compression is ignored, this also reveals several other encoding options that are not available when writing data from xarray (szip, zstd, bzip2, blosc).
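To make the mismatch explicit, the requested and read-back encodings from the example can be compared key by key. The snippet below is a small sketch using only the values shown above; `encoding_mismatches` is a hypothetical helper, not part of xarray.

```python
# Encoding requested in the example, and the relevant part of what was read back.
requested = {
    "compression": "zlib",
    "shuffle": True,
    "complevel": 8,
    "fletcher32": False,
    "contiguous": False,
    "chunksizes": (1, 10),
}
read_back = {
    "zlib": False,
    "shuffle": False,
    "complevel": 0,
    "fletcher32": False,
    "contiguous": False,
    "chunksizes": (1, 10),
}


def encoding_mismatches(requested, read_back):
    """Return {key: (requested, read_back)} for every requested option
    that is absent from, or different in, the read-back encoding."""
    missing = "<missing>"
    return {
        key: (value, read_back.get(key, missing))
        for key, value in requested.items()
        if read_back.get(key, missing) != value
    }


mismatches = encoding_mismatches(requested, read_back)
print(mismatches)
```

With the values from this issue, the options that differ are `compression`, `shuffle`, and `complevel`; the chunking options round-trip unchanged.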
Proposal
We should align with the recommendation from the netcdf4 docs and support compression= style encoding in NetCDF. We should deprecate zlib=True syntax.
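A deprecation path for zlib=True could look roughly like the sketch below. It is an assumption about how this might be done, not a settled design: `normalize_compression` is a hypothetical name, and the choice of `DeprecationWarning` and of raising on conflicting options is illustrative.

```python
import warnings


def normalize_compression(encoding):
    """Hypothetical shim: translate the deprecated zlib=True option
    into compression='zlib', warning the caller along the way."""
    encoding = dict(encoding)  # work on a copy; leave the caller's dict alone
    zlib = encoding.pop("zlib", None)
    compression = encoding.get("compression")
    if zlib:
        if compression not in (None, "zlib"):
            raise ValueError(
                f"zlib=True conflicts with compression={compression!r}"
            )
        warnings.warn(
            "zlib=True is deprecated; use compression='zlib' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        encoding["compression"] = "zlib"
    return encoding


# Usage: capture the warning so it can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = normalize_compression({"zlib": True, "complevel": 8})
print(result)
```

Existing code that sets zlib=True would keep working and get a warning, while new code can pass compression= directly.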