What is the issue?
According to the netCDF User Guide's attribute conventions: "This attribute [`missing_value`] is not treated in any special way by the library or conforming generic applications, but is often useful documentation and may be used by specific applications."
However, it is currently handled by xarray very similarly to _FillValue (see code) when reading a netCDF file.
I find this to be an issue when dealing with datasets containing both:
- float variables with `_FillValue`/`scale_factor`/`add_offset` in encoding, and
- non-float variables with `missing_value` in attrs (and/or in encoding).

Indeed, in that case, loading the dataset from disk results in either:
- all variables being cast to float if `mask_and_scale=True`, or
- float variables not being scaled at all if `mask_and_scale=False`.
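The first case happens because masking with NaN needs a dtype that can hold NaN, so an integer variable carrying `missing_value` must be promoted. Below is a simplified, standalone sketch of that kind of promotion rule, not xarray's actual code; the function names are hypothetical, and the "integers of up to 2 bytes go to float32" choice is an assumption picked to match the `int8` to `float32` change visible in Output 1.

```python
import numpy as np


def promote_for_nan(dtype) -> np.dtype:
    """Pick a dtype able to hold NaN (simplified sketch, not xarray's code)."""
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype, np.floating):
        return dtype  # floats already support NaN
    if np.issubdtype(dtype, np.integer):
        # Integers of up to 2 bytes are exactly representable in float32;
        # wider integers are sent to float64.
        return np.dtype(np.float32 if dtype.itemsize <= 2 else np.float64)
    raise TypeError(f"this sketch does not handle dtype {dtype}")


def apply_mask(data: np.ndarray, missing_value) -> np.ndarray:
    """Replace the sentinel with NaN, promoting the dtype when needed."""
    masked = data.astype(promote_for_nan(data.dtype))  # astype returns a copy
    masked[data == missing_value] = np.nan
    return masked


int_var = np.array([[1, 2], [-128, 4]], dtype=np.int8)
masked = apply_mask(int_var, -128)
print(masked.dtype)  # float32
```

The original `int8` array is left untouched; only the returned copy changes dtype, which is exactly the information loss the issue is about.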
I think it would be better to simply ignore the `missing_value` attribute when loading netCDF files with xarray (instead of treating it as an alias of `_FillValue`), so that people have a way to properly indicate missing data for data types that do not support NaNs in CF-compliant datasets.
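Until xarray's behavior changes, a user-side approximation of "ignore `missing_value`" might look like the sketch below: open the file without CF decoding, move `missing_value` out of the way, then run `xr.decode_cf` so that `_FillValue`, `scale_factor` and `add_offset` are still applied. The function name and the `missing_value_raw` attribute name are hypothetical choices, and this sketch renames the attribute on every variable, so any float variable that relies on `missing_value` for masking would lose that masking too.

```python
import numpy as np
import xarray as xr


def decode_ignoring_missing_value(raw: xr.Dataset) -> xr.Dataset:
    """Decode a raw (decode_cf=False) dataset without honoring missing_value.

    Hypothetical helper: `missing_value` is renamed to `missing_value_raw`
    (an arbitrary name) so xr.decode_cf only masks on `_FillValue`.
    """
    raw = raw.copy()  # shallow copy; variable attrs dicts are copied too
    for var in raw.variables.values():
        if "missing_value" in var.attrs:
            var.attrs["missing_value_raw"] = var.attrs.pop("missing_value")
    return xr.decode_cf(raw, mask_and_scale=True)
```

With the file from the example below, usage would be `decode_ignoring_missing_value(xr.open_dataset('test_missing_value.nc', decode_cf=False))`, which should keep `my_int_var` as `int8` while still unpacking `my_float_var`.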
Another solution would be to not promote dtypes that cannot hold NaN to float when decoding a variable, but instead to keep the original dtype, set `fill_value` (in `maybe_promote`) to `missing_value` (if it exists), and write it to the attrs of the newly created DataArray so it can be tracked.
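The behavior proposed in the second solution can be illustrated with plain NumPy. This is a hypothetical standalone function, not a patch to xarray's `maybe_promote`, and it assumes the variable is passed around as a NumPy array plus an attrs dict: float data is masked as today, while NaN-incompatible dtypes keep their dtype and their `missing_value` attribute.

```python
import numpy as np


def decode_keep_dtype(data: np.ndarray, attrs: dict) -> tuple[np.ndarray, dict]:
    """Hypothetical decoder: mask with NaN only when the dtype can hold it."""
    attrs = dict(attrs)  # work on a copy; never mutate the caller's attrs
    missing = attrs.get("missing_value")
    if missing is None:
        return data, attrs
    if np.issubdtype(data.dtype, np.floating):
        # NaN is representable: replace the sentinel without promotion,
        # and drop the attribute since it has now been applied.
        masked = data.copy()
        masked[data == missing] = np.nan
        del attrs["missing_value"]
        return masked, attrs
    # Integer/bool dtypes cannot hold NaN: keep the original dtype and values,
    # and record the sentinel (cast to that dtype) so it can still be tracked.
    attrs["missing_value"] = data.dtype.type(missing)
    return data, attrs


ints, int_attrs = decode_keep_dtype(
    np.array([[1, 2], [-128, 4]], dtype=np.int8), {"missing_value": -128}
)
print(ints.dtype, int_attrs["missing_value"])  # int8 -128
```

Downstream code that needs a mask for such a variable can still build one with `data == attrs["missing_value"]`.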
Note: This issue is loosely linked to #8359, which is about the other netCDF encoding attributes.
Example
```python
import numpy as np
import xarray as xr

# A float variable that will be packed to uint16 on disk, and an int8
# variable that flags missing data with a `missing_value` attribute.
ds = xr.Dataset({
    'my_float_var': xr.DataArray(np.array([[1.01, 2.01], [np.nan, 4.01]])),
    'my_int_var': xr.DataArray(np.array([[1, 2], [-128, 4]]).astype(np.int8), attrs={'missing_value': -128}),
})
ds.my_float_var.encoding = {'dtype': np.uint16, '_FillValue': 65535, 'scale_factor': 100, 'add_offset': 0}
ds.to_netcdf('test_missing_value.nc')
print(ds)

# Read back with default decoding (mask_and_scale=True).
ds1 = xr.load_dataset('test_missing_value.nc')
print(ds1)

# Read back without masking or scaling.
ds2 = xr.load_dataset('test_missing_value.nc', mask_and_scale=False)
print(ds2)
```

Current datasets
Input
```
<xarray.Dataset> Size: 36B
Dimensions:  (dim_0: 2, dim_1: 2)
Dimensions without coordinates: dim_0, dim_1
Data variables:
    my_float_var  (dim_0, dim_1) float64 32B 1.01 2.01 nan 4.01
    my_int_var    (dim_0, dim_1) int8 4B 1 2 -128 4
```
Output 1 (with mask_and_scale=True)
```
<xarray.Dataset> Size: 48B
Dimensions:  (dim_0: 2, dim_1: 2)
Dimensions without coordinates: dim_0, dim_1
Data variables:
    my_float_var  (dim_0, dim_1) float64 32B 0.0 0.0 nan 0.0
    my_int_var    (dim_0, dim_1) float32 16B 1.0 2.0 nan 4.0
```
Output 2 (with mask_and_scale=False)
```
<xarray.Dataset> Size: 12B
Dimensions:  (dim_0: 2, dim_1: 2)
Dimensions without coordinates: dim_0, dim_1
Data variables:
    my_float_var  (dim_0, dim_1) uint16 8B 0 0 65535 0
    my_int_var    (dim_0, dim_1) int8 4B 1 2 -128 4
```
Expected output 1
```
<xarray.Dataset> Size: 36B
Dimensions:  (dim_0: 2, dim_1: 2)
Dimensions without coordinates: dim_0, dim_1
Data variables:
    my_float_var  (dim_0, dim_1) float64 32B 0.0 0.0 nan 0.0
    my_int_var    (dim_0, dim_1) int8 4B 1 2 -128 4
```