Xarray does not support the full range of netcdf4-python compression options

### What is your issue?

### Summary

The [netcdf4-python API docs](https://unidata.github.io/netcdf4-python/#Dataset.createVariable) say the following:

> If the optional keyword argument compression is set, the data will be compressed in the netCDF file using the specified compression algorithm. Currently `zlib`,`szip`,`zstd`,`bzip2`,`blosc_lz`,`blosc_lz4`,`blosc_lz4hc`, `blosc_zlib` and `blosc_zstd` are supported. Default is None (no compression). All of the compressors except `zlib` and `szip` use the HDF5 plugin architecture.
>
> If the optional keyword `zlib` is True, the data will be compressed in the netCDF file using zlib compression (default False). The use of this option is deprecated in favor of `compression='zlib'`.

Although `compression` is considered a valid encoding option by Xarray

https://github.com/pydata/xarray/blob/bbe63ab657e9cb16a7cbbf6338a8606676ddd7b0/xarray/backends/netCDF4_.py#L232-L242

...it appears that we silently ignore the `compression` option when creating new netCDF4 variables:

https://github.com/pydata/xarray/blob/bbe63ab657e9cb16a7cbbf6338a8606676ddd7b0/xarray/backends/netCDF4_.py#L488-L501

### Code example

```python
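import numpy as np
import xarray as xr
from IPython.display import display  # already available in Jupyter; imported so the script also runs standalone

shape = (10, 20)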
chunksizes = (1, 10)

encoding = {
    'compression': 'zlib',
    'shuffle': True,
    'complevel': 8,
    'fletcher32': False,
    'contiguous': False,
    'chunksizes': chunksizes
}

da = xr.DataArray(
    data=np.random.rand(*shape),
    dims=['y', 'x'],
    name="foo",
    attrs={"bar": "baz"}
)
da.encoding = encoding
ds = da.to_dataset()

fname = "test.nc"
ds.to_netcdf(fname, engine="netcdf4", mode="w")

with xr.open_dataset(fname, engine="netcdf4") as ds1:
    display(ds1.foo.encoding)
```

```
{'zlib': False,
 'szip': False,
 'zstd': False,
 'bzip2': False,
 'blosc': False,
 'shuffle': False,
 'complevel': 0,
 'fletcher32': False,
 'contiguous': False,
 'chunksizes': (1, 10),
 'source': 'test.nc',
 'original_shape': (10, 20),
 'dtype': dtype('float64'),
 '_FillValue': nan}
```

In addition to showing that `compression` is ignored, this also reveals several other encoding options that are not available when writing data from xarray (`szip`, `zstd`, `bzip2`, `blosc`).
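
A quick way to surface silently dropped options is to compare the requested encoding with what comes back on read. The sketch below is illustrative only: `diff_encoding` is a hypothetical helper, not part of xarray, and the two dictionaries are copied from the example and output above (limited to the filter-related keys).

```python
def diff_encoding(requested, actual):
    """Return {key: (requested_value, actual_value)} for every requested
    option that is missing from, or different in, the read-back encoding.
    A missing key is reported with an actual value of None."""
    missing = object()  # sentinel: distinguishes "absent" from a stored None
    diffs = {}
    for key, wanted in requested.items():
        got = actual.get(key, missing)
        if got is missing:
            diffs[key] = (wanted, None)
        elif got != wanted:
            diffs[key] = (wanted, got)
    return diffs


# Encoding set on the DataArray before writing (from the code example).
requested = {
    "compression": "zlib",
    "shuffle": True,
    "complevel": 8,
    "fletcher32": False,
    "contiguous": False,
    "chunksizes": (1, 10),
}

# Filter-related entries of the encoding read back from test.nc.
read_back = {
    "zlib": False,
    "szip": False,
    "zstd": False,
    "bzip2": False,
    "blosc": False,
    "shuffle": False,
    "complevel": 0,
    "fletcher32": False,
    "contiguous": False,
    "chunksizes": (1, 10),
}

dropped = diff_encoding(requested, read_back)
for key, (wanted, got) in dropped.items():
    print(f"{key}: requested {wanted!r}, file has {got!r}")
```

For these dictionaries the helper flags `compression`, `shuffle`, and `complevel`: none of the requested compression settings reached the file.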

### Proposal

We should align with the recommendation in the netcdf4-python docs and support `compression=`-style encoding when writing netCDF files. We should also deprecate the `zlib=True` syntax.

For reference, the two snippets linked in the Summary are the set of accepted encodings:

```python
valid_encodings = {
    "zlib",
    "complevel",
    "fletcher32",
    "contiguous",
    "chunksizes",
    "shuffle",
    "_FillValue",
    "dtype",
    "compression",
}
```

and the `createVariable` call, which does not forward `compression`:

```python
nc4_var = self.ds.createVariable(
    varname=name,
    datatype=datatype,
    dimensions=variable.dims,
    zlib=encoding.get("zlib", False),
    complevel=encoding.get("complevel", 4),
    shuffle=encoding.get("shuffle", True),
    fletcher32=encoding.get("fletcher32", False),
    contiguous=encoding.get("contiguous", False),
    chunksizes=encoding.get("chunksizes"),
    endian="native",
    least_significant_digit=encoding.get("least_significant_digit"),
    fill_value=fill_value,
)
```
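
One possible shape for the proposal, as a minimal sketch rather than an actual xarray patch: resolve a single `compression` value from the encoding, fall back to the legacy `zlib=True` flag with a deprecation warning, and forward the result to `createVariable`. The helpers `resolve_compression` and `create_variable_kwargs` are hypothetical names invented for this illustration; the list of compression names is taken from the netcdf4-python docs quoted in the Summary, and the defaults mirror the `createVariable` call above.

```python
import warnings

# Compression names listed in the netcdf4-python docs quoted in the Summary.
SUPPORTED_COMPRESSION = {
    "zlib", "szip", "zstd", "bzip2",
    "blosc_lz", "blosc_lz4", "blosc_lz4hc", "blosc_zlib", "blosc_zstd",
}


def resolve_compression(encoding):
    """Hypothetical helper: choose the value to pass as `compression=`.

    An explicit `compression` entry wins. The legacy `zlib=True` flag is
    still honoured but emits a DeprecationWarning. Returns None when no
    compression was requested.
    """
    compression = encoding.get("compression")
    legacy_zlib = encoding.get("zlib", False)

    if compression is not None:
        if compression not in SUPPORTED_COMPRESSION:
            raise ValueError(f"unsupported compression: {compression!r}")
        if legacy_zlib and compression != "zlib":
            raise ValueError(
                f"conflicting encoding: zlib=True with compression={compression!r}"
            )
        return compression

    if legacy_zlib:
        warnings.warn(
            "encoding zlib=True is deprecated; use compression='zlib' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return "zlib"

    return None


def create_variable_kwargs(encoding):
    """Hypothetical helper: compression-related keyword arguments that
    could be forwarded to createVariable in place of `zlib=`."""
    return {
        "compression": resolve_compression(encoding),
        "complevel": encoding.get("complevel", 4),
        "shuffle": encoding.get("shuffle", True),
        "fletcher32": encoding.get("fletcher32", False),
        "contiguous": encoding.get("contiguous", False),
        "chunksizes": encoding.get("chunksizes"),
    }


kwargs = create_variable_kwargs(
    {"compression": "zlib", "complevel": 8, "chunksizes": (1, 10)}
)
print(kwargs)
```

Raising on an unknown or conflicting value, instead of ignoring it, is a design choice in this sketch: it avoids the silent behaviour reported above. Whether xarray should raise or warn there is left open.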
