Closed
87 commits
b395d3b
devs
davidhassell Jan 19, 2021
0a89b57
devs
davidhassell Jan 19, 2021
bb0bba1
devs
davidhassell Jan 20, 2021
8a8a5de
data utils
davidhassell Jan 21, 2021
2a765a7
working read
davidhassell Jan 21, 2021
44653c3
working read - all compression
davidhassell Jan 22, 2021
9bf7771
getitem, major reorganisation
davidhassell Jan 26, 2021
66d654c
setitem
davidhassell Jan 27, 2021
9025316
tidy
davidhassell Jan 27, 2021
bcdc676
tidy
davidhassell Jan 28, 2021
3d9fd2e
lock/asarray
davidhassell Jan 28, 2021
1a33bd2
dask_array -> property
davidhassell Jan 28, 2021
6d51bdf
hard/soft mask
davidhassell Jan 29, 2021
72ec46d
hard/soft mask; _map_blockscd
davidhassell Jan 30, 2021
555a3e1
hard/soft mask
davidhassell Jan 31, 2021
92a6648
hard/soft mask; setitem test
davidhassell Feb 1, 2021
9acdf90
hard/soft mask; setitem test
davidhassell Feb 1, 2021
210b380
hard/soft mask; setitem test
davidhassell Feb 1, 2021
aa7eaf2
hard/soft mask; setitem test
davidhassell Feb 2, 2021
b21a129
bug fixes to __getitem__
davidhassell Feb 2, 2021
0868dfd
Set version to ultimate branch goal of 4.0.0
sadielbartholomew Feb 19, 2021
9d7d2ea
Fix mistake in rebase conflict resolution, now test_Data passes
sadielbartholomew Feb 24, 2021
e8dc885
Re-apply black and docformatter to cf/ dir post-rebase w/ 'dask'
sadielbartholomew Feb 24, 2021
dd3fe38
Update branch via merging master (w/ --no-verify, needs a few fixes)
sadielbartholomew Aug 12, 2021
d09c5f0
Fix post-merge import, aliasing & undefined/unused variable issues
sadielbartholomew Aug 12, 2021
1817f28
Remove final lingering references to mpi_* functions
sadielbartholomew Aug 12, 2021
68a52a5
Remove dead code from data.data module
sadielbartholomew Aug 12, 2021
a1cdf20
Add test_Data skips to monitor progress of LAMA to Dask migration
sadielbartholomew Aug 13, 2021
2cc1757
Add new Actions workflow to test cf.Data only for lama-to-dask branch
sadielbartholomew Aug 13, 2021
97964a5
Remove Python 3.6 job from new dask migration workflow
sadielbartholomew Aug 16, 2021
9fea9ad
Merge pull request #244 from sadielbartholomew/lama-to-dask-1
sadielbartholomew Aug 16, 2021
16cae6a
Create & apply temporary decorator to mark & log 'daskified' methods
sadielbartholomew Aug 19, 2021
68e8a73
Merge pull request #245 from sadielbartholomew/lama-to-dask-2
sadielbartholomew Aug 19, 2021
4166100
Isolate newly-deprecated methods ready to move all deprecations
sadielbartholomew Aug 19, 2021
c85eebc
Move deprecated objects out of data module into dedicated mixin
sadielbartholomew Aug 19, 2021
b18f653
Address DeprecationWarning raised due to backslashes in docstring
sadielbartholomew Aug 19, 2021
b137867
Merge pull request #246 from sadielbartholomew/lama-to-dask-3
sadielbartholomew Aug 20, 2021
ee716e6
Remove all test_Data loops over chunksize & update immediate errors
sadielbartholomew Aug 23, 2021
665314c
Reinstate some test variables still required for now
sadielbartholomew Aug 23, 2021
219343b
Merge pull request #248 from sadielbartholomew/lama-to-dask-5
sadielbartholomew Aug 23, 2021
3acffce
cyclic axes
davidhassell Sep 1, 2021
900832e
getitem: cyclic axes, unit tests
davidhassell Sep 6, 2021
336bd68
remove debugging print statements
davidhassell Sep 6, 2021
1bfe9fe
setitem
davidhassell Sep 6, 2021
8f07e1f
hardmask, unknown shape
davidhassell Sep 8, 2021
021e068
information
davidhassell Sep 9, 2021
9c1a4c4
tidy
davidhassell Sep 9, 2021
9d3f5a7
hardmask
davidhassell Sep 9, 2021
599ca1a
where
davidhassell Sep 10, 2021
97c1cdd
where
davidhassell Sep 11, 2021
a4ef294
fix setting of non-dask array in __init__
davidhassell Sep 11, 2021
865caed
re-instated select parameter to _read_a_file
davidhassell Sep 12, 2021
9314241
comment
davidhassell Sep 13, 2021
a5a20ec
dask where
davidhassell Sep 13, 2021
742c3e2
dask reset_mask_hardness, docs
davidhassell Sep 13, 2021
a091728
dask reset_mask_hardness
davidhassell Sep 13, 2021
d8a9ad1
tighten broadcasting
davidhassell Sep 15, 2021
7fbcb27
force can_compute to return True (for now ...)
davidhassell Sep 30, 2021
8a63f9b
Merge pull request #265 from davidhassell/dask-can-compute
sadielbartholomew Sep 30, 2021
59bc506
Typos
davidhassell Oct 4, 2021
142556d
Typos
davidhassell Oct 4, 2021
aecd89b
Correct docstring
davidhassell Oct 4, 2021
73704a5
Merge pull request #257 from davidhassell/dask-getitem
sadielbartholomew Oct 4, 2021
dfc111e
fix doc string examples
davidhassell Oct 5, 2021
2d788e9
move dask utils to new file
davidhassell Oct 5, 2021
4be9098
Merge pull request #269 from davidhassell/dask-chunk-utils
sadielbartholomew Oct 5, 2021
24bb2d4
Merge branch 'lama-to-dask' into dask-where
davidhassell Oct 5, 2021
6ddc30a
_map_blocks
davidhassell Oct 5, 2021
cff3d79
Merge pull request #260 from davidhassell/dask-where
davidhassell Oct 5, 2021
5da6fde
move _cf_where to dask_utils
davidhassell Oct 6, 2021
fde1894
Merge pull request #271 from davidhassell/dask-where-2
sadielbartholomew Oct 6, 2021
10c3001
Merge branch 'lama-to-dask' into dask-init
sadielbartholomew Oct 6, 2021
fc11e97
Merge pull request #262 from davidhassell/dask-init
sadielbartholomew Oct 6, 2021
b9c4897
Address outstanding TODO from #262
sadielbartholomew Oct 7, 2021
1b8933c
Migrate Data.transpose method from LAMA to Dask
sadielbartholomew Aug 20, 2021
abab556
Update cf/data/data.py in light of DH feedback on #247
sadielbartholomew Oct 7, 2021
df6836e
Address trivial-to-apply feedback from DH review in #247
sadielbartholomew Oct 7, 2021
dd19fa3
Address DH feedback on #247 RE logic to check axes validity
sadielbartholomew Oct 7, 2021
55fad7c
Update cf/data/data.py
sadielbartholomew Oct 22, 2021
b7a9021
Correct typo in data.data module ValueError message
sadielbartholomew Oct 22, 2021
c3164e7
Merge pull request #247 from sadielbartholomew/lama-to-dask-4
sadielbartholomew Oct 22, 2021
30f8a29
dev
davidhassell Jan 11, 2022
3b1b9ff
dev
davidhassell Jan 11, 2022
f489088
dev
davidhassell Jan 12, 2022
fc3e827
dev
davidhassell Jan 13, 2022
74de1dd
no idea, all sorts!
davidhassell Jan 14, 2022
6595ed4
tidy
davidhassell Jan 17, 2022
113 changes: 113 additions & 0 deletions .github/workflows/dask-migration-testing.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,113 @@
# A GitHub Action to run test_Data.py only for the 'lama-to-dask' branch
name: Test `cf.Data` during the replacement of LAMA with Dask

on:
push:
branches:
- lama-to-dask
pull_request:
types: [opened, reopened, ready_for_review]
branches:
- lama-to-dask

jobs:
test-suite-job-0:

# Set-up the build matrix. We run on different distros and Python versions.
strategy:
matrix:
# Skip older ubuntu-16.04 & macos-10.15 to save usage resource
os: [ubuntu-latest, macos-latest]
python-version: [3.7, 3.8, 3.9]

# Run on new and old(er) versions of the distros we support (Linux, Mac OS)
runs-on: ${{ matrix.os }}

# The sequence of tasks that will be executed as part of this job:
steps:

- name: Checkout cf-python
uses: actions/checkout@v2
with:
path: main

# Provide a notification message
- name: Notify about setup
run: echo Now setting up the environment for the cf-python test suite...

- name: Checkout the current cfdm master to use as the dependency
uses: actions/checkout@v2
with:
repository: NCAS-CMS/cfdm
path: cfdm

# Prepare to run the test-suite on different versions of Python 3:
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v1
with:
python-version: ${{ matrix.python-version }}

# Setup conda, which is the simplest way to access all dependencies,
# especially as some are C-based so otherwise difficult to setup.
- name: Setup Miniconda
uses: conda-incubator/setup-miniconda@v2
with:
auto-update-conda: true
miniconda-version: 'latest'
activate-environment: cf-latest
python-version: ${{ matrix.python-version }}
channels: ncas, conda-forge

# Ensure shell is configured with conda activated:
- name: Check conda config
shell: bash -l {0}
run: |
conda info
conda list
conda config --show-sources
conda config --show
# Install cf-python dependencies, excluding cfdm, pre-testing
# We do so with conda which was setup in a previous step.
- name: Install dependencies
shell: bash -l {0}
run: |
conda install -c ncas -c conda-forge udunits2=2.2.25
conda install -c conda-forge mpich esmpy
conda install scipy matplotlib dask
pip install pycodestyle
# Install cfdm from master branch, then the cf-python development version
# We do so with conda which was setup in a previous step.
- name: Install development cfdm and cf-python
shell: bash -l {0}
run: |
cd ${{ github.workspace }}/cfdm
pip install -e .
cd ${{ github.workspace }}/main
pip install -e .
# Make UMRead library
- name: Make UMRead
shell: bash -l {0}
run: |
cd ${{ github.workspace }}/main/cf/umread_lib/c-lib
make
# Install the coverage library
# We do so with conda which was setup in a previous step.
- name: Install coverage
shell: bash -l {0}
run: |
conda install coverage
# Provide another notification message
- name: Notify about starting testing
run: echo Setup complete. Now starting to run the cf-python test suite...

# Finally run test_Data.py!
- name: Run the test_Data test module
shell: bash -l {0}
run: |
cd ${{ github.workspace }}/main/cf/test
python test_Data.py

# End with a message indicating the suite has completed its run
- name: Notify about a completed run
run: |
echo The test_Data module has run and you can inspect the results.
21 changes: 8 additions & 13 deletions cf/__init__.py
@@ -81,17 +81,11 @@
"""

__Conventions__ = "CF-1.8"
__date__ = "2021-06-10"
__version__ = "3.10.0"

_requires = (
"numpy",
"netCDF4",
"cftime",
"cfunits",
"cfdm",
"psutil",
)
__author__ = "David Hassell"
__date__ = "2021-??-??"
__version__ = "4.0.0"

_requires = ("numpy", "netCDF4", "cftime", "cfunits", "cfdm", "psutil")

x = ", ".join(_requires)
_error0 = f"cf v{ __version__} requires the modules {x}. "
@@ -193,8 +187,8 @@
)

# Check the version of cfdm
_minimum_vn = "1.8.9.0"
_maximum_vn = "1.8.10.0"
_minimum_vn = "1.9.0.1"
_maximum_vn = "1.9.1.0"
_cfdm_version = LooseVersion(cfdm.__version__)
if not LooseVersion(_minimum_vn) <= _cfdm_version < LooseVersion(_maximum_vn):
raise RuntimeError(
@@ -243,6 +237,7 @@
RaggedContiguousArray,
RaggedIndexedArray,
RaggedIndexedContiguousArray,
SubsampledArray,
)

from .aggregate import aggregate
2 changes: 1 addition & 1 deletion cf/aggregate.py
@@ -2510,7 +2510,7 @@ def _get_hfl(
if d._pmsize == 1:
partition = d.partitions.matrix.item()
if not partition.part:
key = getattr(partition.subarray, "file_pointer", None)
key = getattr(partition.subarray, "file_address", None)
if key is not None:
hash_value = hfl_cache.hash.get(key, None)
create_hash = hash_value is None
4 changes: 1 addition & 3 deletions cf/bounds.py
@@ -212,9 +212,7 @@ def contiguous(self, overlap=True, direction=None, period=None, verbose=1):
else:
if direction is None:
b = data[(0,) * ndim].array
direction = b.item(0,) < b.item(
1,
)
direction = b.item(0) < b.item(1)

if direction:
return (data[1:, 0] <= data[:-1, 1]).all()
10 changes: 1 addition & 9 deletions cf/cfdatetime.py
@@ -385,15 +385,7 @@ def st2elements(date_string):
if utc_offset:
raise ValueError("Can't specify a time offset from UTC")

return (
year,
month,
day,
hour,
minute,
second,
microsecond,
)
return (year, month, day, hour, minute, second, microsecond)


def rt2dt(array, units_in, units_out=None, dummy1=None):
53 changes: 13 additions & 40 deletions cf/constants.py
@@ -537,60 +537,37 @@
"orog": {
"surface_altitude": "altitude",
"surface_height_above_geopotential_datum": "height_above_geopotential_datum",
},
}
},
"atmosphere_sleve_coordinate": {
"ztop": {
"altitude_at_top_of_atmosphere_model": "altitude",
"height_above_geopotential_datum_at_top_of_atmosphere_model": "height_above_geopotential_datum",
},
},
"ocean_sigma_coordinate": {
"depth": _D1_depth_mapping,
},
"ocean_s_coordinate": {
"depth": _D1_depth_mapping,
},
"ocean_s_coordinate_g1": {
"depth": _D1_depth_mapping,
},
"ocean_s_coordinate_g2": {
"depth": _D1_depth_mapping,
},
"ocean_sigma_z_coordinate": {
"depth": _D1_depth_mapping,
},
"ocean_double_sigma_coordinate": {
"depth": _D1_depth_mapping,
},
}
},
"ocean_sigma_coordinate": {"depth": _D1_depth_mapping},
"ocean_s_coordinate": {"depth": _D1_depth_mapping},
"ocean_s_coordinate_g1": {"depth": _D1_depth_mapping},
"ocean_s_coordinate_g2": {"depth": _D1_depth_mapping},
"ocean_sigma_z_coordinate": {"depth": _D1_depth_mapping},
"ocean_double_sigma_coordinate": {"depth": _D1_depth_mapping},
}

# --------------------------------------------------------------------
# Define the canonical units of formula terms, as described in
# Appendix D: Parametric Vertical Coordinates of the CF conventions.
# --------------------------------------------------------------------
formula_terms_units = {
"atmosphere_ln_pressure_coordinate": {
"p0": "Pa",
"lev": "",
},
"atmosphere_sigma_coordinate": {
"sigma": "",
"ptop": "Pa",
"ps": "Pa",
},
"atmosphere_ln_pressure_coordinate": {"p0": "Pa", "lev": ""},
"atmosphere_sigma_coordinate": {"sigma": "", "ptop": "Pa", "ps": "Pa"},
"atmosphere_hybrid_sigma_pressure_coordinate": {
"p0": "Pa",
"ps": "Pa",
"ap": "Pa",
"a": "",
"b": "",
},
"atmosphere_hybrid_height_coordinate": {
"a": "m",
"b": "",
"orog": "m",
},
"atmosphere_hybrid_height_coordinate": {"a": "m", "b": "", "orog": "m"},
"atmosphere_sleve_coordinate": {
"ztop": "m",
"a": "",
@@ -599,11 +576,7 @@
"zsurf1": "m",
"zsurf2": "m",
},
"ocean_sigma_coordinate": {
"eta": "m",
"depth": "m",
"sigma": "",
},
"ocean_sigma_coordinate": {"eta": "m", "depth": "m", "sigma": ""},
"ocean_s_coordinate": {
"eta": "m",
"depth": "m",
9 changes: 1 addition & 8 deletions cf/constructlist.py
@@ -96,14 +96,7 @@ def __docstring_method_exclusions__(self):
See `_docstring_method_exclusions` for details.

"""
return (
"append",
"extend",
"insert",
"pop",
"reverse",
"clear",
)
return ("append", "extend", "insert", "pop", "reverse", "clear")

# ----------------------------------------------------------------
# Overloaded list methods
25 changes: 25 additions & 0 deletions cf/data/QUESTIONS.rst
@@ -0,0 +1,25 @@
Questions and answers
=====================

A place to record random thoughts about the daskification of
`cf.Data`, possibly prior to starting an issue on GitHub.

----

Q. When we run something that executes all of the lazy operations
(like `cf.Data.is_masked`), should/could we replace the dask array
with a "persisted" version of the computed data? If we did this, we
would want to have the ability to cache persisted chunks to disk,
as they came into being on each thread (see, for instance,
`chest`). Whether or not to do this could be controlled by a
configuration setting.

A. ?

----

Q.

A. ?

----
24 changes: 24 additions & 0 deletions cf/data/README.rst
@@ -0,0 +1,24 @@
`cf.Data` developer notes
=========================

Hardness of the mask
--------------------

Any `cf.Data` method that changes the dask array should consider
whether or not the mask hardness needs resetting before
returning. This will be necessary if there is the possibility that the
operation being applied to the dask array could lose the "memory" on
its chunks of whether or not the mask is hard.

A common situation that causes a chunk to lose its memory of whether
or not the mask is hard is when a chunk could have contained an
unmasked `numpy` array prior to the operation, but the operation could
convert it to a masked `numpy` array. The new masked array will always
have the `numpy` default hardness (i.e. soft), which may be
incorrect.

The mask hardness is most easily reset with the
`cf.Data._reset_mask_hardness` method.

`cf.Data.__setitem__` and `cf.Data.where` are examples of methods that
need to reset the mask in this manner.
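
The loss of hardness described above can be reproduced with plain
`numpy`. The following is only a minimal sketch: the helper
`reset_chunk_hardness` is a hypothetical name used for illustration and
is not the implementation of `cf.Data._reset_mask_hardness`. It assumes
that a chunk is either a `numpy` array or a `numpy` masked array.

```python
import numpy as np


def reset_chunk_hardness(chunk, hardmask=True):
    """Hypothetical per-chunk helper (not the cf-python implementation).

    Restore the intended mask hardness on a chunk if it is a numpy
    masked array. Unmasked chunks are returned unchanged.
    """
    if np.ma.isMA(chunk):
        if hardmask:
            chunk.harden_mask()
        else:
            chunk.soften_mask()
    return chunk


# A chunk that starts life as an unmasked numpy array ...
chunk = np.arange(4)

# ... and is converted to a masked array by some operation. The new
# masked array has numpy's default hardness, i.e. a soft mask.
masked = np.ma.masked_where(chunk > 1, chunk)
hardness_after_operation = bool(masked.hardmask)

# Reset the hardness, then try to overwrite a masked element. With a
# hard mask the assignment must not unmask the element.
masked = reset_chunk_hardness(masked, hardmask=True)
masked[3] = 99
still_masked = bool(np.ma.getmaskarray(masked)[3])
```

A function of this shape could, in principle, be applied to every chunk
of a dask array (for instance with `map_blocks`), although how
`cf.Data` does this internally is not shown here.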
12 changes: 7 additions & 5 deletions cf/data/__init__.py
@@ -1,16 +1,18 @@
from .cachedarray import CachedArray
from .netcdfarray import NetCDFArray
from .filledarray import FilledArray
from .umarray import UMArray

from .filledarray import FilledArray

from .gatheredarray import GatheredArray
from .raggedcontiguousarray import RaggedContiguousArray
from .raggedindexedarray import RaggedIndexedArray
from .raggedindexedcontiguousarray import RaggedIndexedContiguousArray
from .subsampledarray import SubsampledArray

from .gatheredsubarray import GatheredSubarray
from .raggedcontiguoussubarray import RaggedContiguousSubarray
from .raggedindexedsubarray import RaggedIndexedSubarray
from .raggedindexedcontiguoussubarray import RaggedIndexedContiguousSubarray
# from .gatheredsubarray import GatheredSubarray
# from .raggedcontiguoussubarray import RaggedContiguousSubarray
# from .raggedindexedsubarray import RaggedIndexedSubarray
# from .raggedindexedcontiguoussubarray import RaggedIndexedContiguousSubarray

from .data import Data
1 change: 0 additions & 1 deletion cf/data/abstract/__init__.py
@@ -1,3 +1,2 @@
from .array import Array
from .compressedsubarray import CompressedSubarray
from .filearray import FileArray