Skip to content

Refactor Simple Convective Aggregation Index (SCAI)#42

Merged
leifdenby merged 12 commits intocloudsci:masterfrom
leifdenby:scai
Feb 28, 2022
Merged

Refactor Simple Convective Aggregation Index (SCAI)#42
leifdenby merged 12 commits intocloudsci:masterfrom
leifdenby:scai

Conversation

@leifdenby
Copy link
Collaborator

@leifdenby leifdenby commented Jan 19, 2022

  • refactor code to make cloudmetrics.objects.metrics.scai.scai(object_labels=...)
  • implement all tests for figure 2 in Tobin et al
    • 2a implemented but test not passing, incorrect value
    • 2b implemented but test not passing, incorrect value
    • 2c
    • 2d

NOTE: example from Figure 4a in White et all (COP) is different from example from Figure 2a - looks like the authors copied the example incorrectly.

Currently have two implementations that aren't consistent. One based on
cloudmetrics original code and one based on implementation in typhon
package. Neither give the values published in Tobin et al
…g a pixel size of 55. Added support for empty input files (returns nan), completed docstring, and removed prints. scai2 still needs fixing (can it be removed?).
@martinjanssens
Copy link
Collaborator

I think I have gotten the testing working here too, it is in this branch. It needed the correct pixel size (in km) to function properly. The examples still don't correspond to Tobin et al. (2012), but they do correspond to White et al. (2018) now. There is still a second function in scai.py, named scai2; does that function serve a purpose? And if not, should we remove it before merging this and just have a single scai function?

@martinjanssens
Copy link
Collaborator

On a semi-related note: There are several metrics whose results assume a unit of size in "pixels" at the moment (e.g. most object-based metrics). We might need to consider offering the option of giving those physical units (like km). I had already started thinking about that a while ago (see #1), but never got around to fixing it. I'd be curious to hear what you think, @leifdenby!

@leifdenby
Copy link
Collaborator Author

leifdenby commented Jan 25, 2022

I think I have gotten the testing working here too, it is in this branch. It needed the correct pixel size (in km) to function properly. The examples still don't correspond to Tobin et al. (2012), but they do correspond to White et al. (2018) now. There is still a second function in scai.py, named scai2; does that function serve a purpose? And if not, should we remove it before merging this and just have a single scai function?

Thank you!
Yes, absolutely! I'll do that. Sorry for the mess. That as me trying to debug what as going wrong...

There are several metrics whose results assume a unit of size in "pixels" at the moment (e.g. most object-based metrics). We might need to consider offering the option of giving those physical units (like km). I had already started thinking about that a while ago (see #1), but never got around to fixing it. I'd be curious to hear what you think, @leifdenby!

Yes, I've been thinking about that too. I couldn't work out from the definition of SCAI if it's possible for the pixel resolution to somehow be factored out (so that SCAI for different choices of resolution are simply related), but it doesn't seem to be (even though the number itself is non-dimensional). I like the way you've made the default be dx=1. I guess the main concern would be that if the value of dx and L are of very different magnitude then the range of SCAI values is reduced (and thereby reducing the resolution in the measurement with SCAI)?

People tend to just look at relative SCAI values no and not absolute values, right? Or rather: do you know how one should compare SCAI values for two masks with different resolution but the same data?

I.e. how would we get the following masks to have the same SCAI values? (at least my intuition says that the should have the same SCAI value, but maybe I'm wrong...)

00000
00110
10000

0000000000
0000000000
0000111100
0000111100
1100000000
1100000000

I'm going to add a test for this and see what it gives :D

@leifdenby
Copy link
Collaborator Author

@martinjanssens I've added a test for resolution-doubling and I'm confused because to the same SCAI value it seems to be opposite to what I'd intuitively expect. In the example where I've created a zoomed copy of the mask (so that each pixel in the zoomed example is half the size of that in the original) I appear to have to double the resolution, not halve it, to get the same SCAI value. 3d14538#diff-82f87aac44e8909eeb356deac9900ee3788adb946231ccaa4a46a53e8f14f8c3R41 I'm a I being an idiot?

@martinjanssens
Copy link
Collaborator

Great, I'm really happy SCAI has you confused, it has had me confused since I first looked at it! Thanks a bunch for adding this resolution sensitivity test.

I think what's going on with the resolution doubling is the following. If I understand the authors' thought process correctly, SCAI has two components, N/Nmax and D0/L, which are multiplied.

  • D0/L tries to measure the typical cluster size relative to a typical domain scale L. In your resolution doubling test, D0 stays the same for dx -> dx/2 and n -> 2n (with n representing field.shape[0] = field.shape[1]), as you would expect.
  • N/Nmax tries to measure the amount of separate cloud clusters in the scene (N) relative to the max number of possible clusters (Nmax). Nmax would be proportional to n*dx. When doubling the size of the field, Nmax will thus increase by a factor 4, independent of dx or L, and require D0 to increase by the same factor for SCAI to remain constant. I think this is what has us confused.

To me, it seems like the issue here is with the formulation of SCAI itself. It just doesn't seem to be designed to be consistent w.r.t. resolution. I think that doesn't matter in many practical applications, because many datasets have the same L, dx and n over all scenes in the dataset and then the metric is consistent across that dataset (as you mention). But the resolution sensitivity is definitely a drawback. We could mention that in the description, or adapt SCAI in our formulation in such a way that it no longer relies on n, diverging from the original proposal. I'm fine either way, I think. What's your opinion, @leifdenby ?

@leifdenby
Copy link
Collaborator Author

To me, it seems like the issue here is with the formulation of SCAI itself. It just doesn't seem to be designed to be consistent w.r.t. resolution. I think that doesn't matter in many practical applications, because many datasets have the same L, dx and n over all scenes in the dataset and then the metric is consistent across that dataset (as you mention). But the resolution sensitivity is definitely a drawback. We could mention that in the description, or adapt SCAI in our formulation in such a way that it no longer relies on n, diverging from the original proposal. I'm fine either way, I think. What's your opinion, @leifdenby ?

Thanks for this detailed explanation @martinjanssens !

My gut feel is that we should keep our implementation as close to what was published as possible, and add a note to indicate that SCAI isn't resolution independent. That's what I've done and I've also added the same note in the test. Could you have a look and let me know what you think? Thank you!

@leifdenby leifdenby mentioned this pull request Feb 1, 2022
4 tasks
Copy link
Collaborator

@martinjanssens martinjanssens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this way of solving it very much! I think leaving dx=1 as an optional input is a good solution too. I'm happy to have this merged - thanks for the critical thinking on this one :)

@leifdenby
Copy link
Collaborator Author

leifdenby commented Feb 28, 2022

TODO

  • add changelog, indicating that we have added a test for checking the changes of SCAI with resolution

@leifdenby leifdenby merged commit 1760a6f into cloudsci:master Feb 28, 2022
@martinjanssens
Copy link
Collaborator

🥳

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants