Skip to content

Refactoring wavelet organisation indices (woi)#32

Merged
martinjanssens merged 7 commits intocloudsci:masterfrom
martinjanssens:refactor-woi
Sep 14, 2021
Merged

Refactoring wavelet organisation indices (woi)#32
martinjanssens merged 7 commits intocloudsci:masterfrom
martinjanssens:refactor-woi

Conversation

@martinjanssens
Copy link
Collaborator

Begun refactoring this by using a single function that computes both the wavelet transformation and the metrics. We could make this more similar to the Fourier metrics or to the object metrics, as mentioned in #28 (comment), and we should probably make a decision on that before I continue with this.

@martinjanssens martinjanssens changed the title Started refactoring woi Refactoring wavelet organisation indices (woi) Sep 7, 2021
Copy link
Collaborator

@leifdenby leifdenby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for starting work on this!

"""
Computes the three Wavelet Organisation Indices WOI1, WOI2, WOI3 proposed by
Brune et al. (2018) from the stationary/undecimated Direct Wavelet Transform
of a scalar field.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you include the DOI or URL to the paper (https://doi.org/10.1002/qj.3409)?

# Compute wavelet coefficients
scale_max = int(np.log(cloud_scalar.shape[0]) / np.log(2))
coeffs = pywt.swt2(cloud_scalar, wavelet, scale_max, norm=True, trim_approx=True)
# Bug in pywt -> trim_approx=False does opposite of its intention
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you log this on the pywt repo? Sounds like they might want to know that :)

separation_scale : int, optional
Which power of 2 to use as a cutoff scale that separates 'small' scales
from 'large' scales. The default is 5; i.e. energy contained in scales
larger than 2^5=32 pixles is considered 'large-scale energy'.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pixels

@martinjanssens
Copy link
Collaborator Author

Nice job finding that R-package, I don't think that was around when I wrote this script the first time! Consequently, my implementation is just an attempt to match what I could get from the original paper's text. At a glance, the two approaches have some methodological differences, though I don't think they're major. The R-package does handle non-periodic BCs very nicely by mirroring the fields, and chooses to zoom to an appropriate level (the field needs to be of a shape that is a power of 2), while we're padding periodically to get there. I think I'd like to include these two aspects, at least, so I'll try to rewrite accordingly. Also, do you have an opinion on whether we want this to be a single function that returns three indices or three functions returning one index each?

@leifdenby
Copy link
Collaborator

At a glance, the two approaches have some methodological differences, though I don't think they're major. The R-package does handle non-periodic BCs very nicely by mirroring the fields, and chooses to zoom to an appropriate level (the field needs to be of a shape that is a power of 2), while we're padding periodically to get there. I think I'd like to include these two aspects, at least, so I'll try to rewrite accordingly.

That sounds great!

Also, do you have an opinion on whether we want this to be a single function that returns three indices or three functions returning one index each?

I'm about unsure what to do here. It's quite nice to only have one number returned by default since it makes analysis further down the pipeline simpler. I would probably like to have a dataset with woi1, woi2, etc in it together with other variables I was studying. I can see two ways of dealing with this:

  1. The convention in numpy seems to be return single numbers by default and then more information can be requested (for example the full argument in https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html). Are one of the three woi coefficients generally more interesting/meaningful? If so we could return one of them and have parameter called coefficient with call options like .woi(cloud_mask, coefficient=1), .woi(cloud_mask, coefficient=2) and .woi(cloud_mask, coefficient='all').

  2. Another option would be to have functions called woi1(...), woi2(...), woi3(...) following the pattern of object geometry properties the calculations could be cached.

What do you think?

@martinjanssens
Copy link
Collaborator Author

I'd be curious to hear what you think of this when you have time, it's basically trying to do your second option (I need to improve the tests still). :)

_CACHED_VALUES = dict()


def _get_swt(cloud_scalar, pad_method, wavelet, separation_scale):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:) We can put this kind of thing into a decorator at some point. I would use functools.lru_cache but that doesn't work with just numpy arrays because they can't always be serialized.

Copy link
Collaborator

@leifdenby leifdenby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is simply awesome, nice work! I think we should get this merged in. Remember to add an entry to the changelog

@martinjanssens martinjanssens merged commit c924528 into cloudsci:master Sep 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants