Data resolution and completeness Check #5

Open
Yassminaa opened this issue Dec 5, 2022 · 6 comments

Comments

@Yassminaa

Hi @malmans2

I have used the tool on ERA5 data (single levels and vertical levels). It appears to have worked on the complete hourly file, but when one time step in between is missing:

  • The check failed in the temporal resolution, not the completeness check (see the attached log file log.test-completeness.txt).

  • For monthly data, I got the following error regarding the temporal resolution: ("ValueError: invalid unit abbreviation: month"),
    where I set the temporal resolution to 1 month in the configuration file.

For vertical coverage: in the generated template, I couldn't see any check for the completeness of the vertical coverage.
When some levels are missing from the file, the test still passes.
I assume I should define the min/max level, or the level values, somewhere for the coverage check, but I couldn't find this in the template. Could you please guide me on how to do this, if the check is included in the tool?

Thank you,

@malmans2
Member

malmans2 commented Dec 5, 2022

Hi @Yassminaa:

The check failed in the temporal resolution, not the completeness check (see the attached log file log.test-completeness.txt).

This looks correct to me. If one time step is missing, the temporal check fails. The completeness check only looks for NaNs (i.e., NaNs anywhere, or only in unmasked regions when a mask is provided).
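
The difference between the two checks can be illustrated with a minimal sketch. It is not the tool's implementation: the time axis and values are made up, and only the Python standard library is used. A gap in the time axis changes the spacing between steps, while the values that are present contain no NaN.

```python
import math
from datetime import datetime, timedelta

# Hypothetical hourly time axis with the 02:00 step removed.
start = datetime(2012, 12, 2)
times = [start + timedelta(hours=h) for h in range(6) if h != 2]
# Hypothetical data values, one per remaining time step, none of them NaN.
values = [271.3, 271.1, 270.6, 270.4, 270.1]

# Resolution-style check: every consecutive pair must be exactly one hour apart.
deltas = {later - earlier for earlier, later in zip(times, times[1:])}
resolution_ok = deltas == {timedelta(hours=1)}

# Completeness-style check: no NaN among the values that are present.
completeness_ok = not any(math.isnan(v) for v in values)

print("deltas:", sorted(deltas))
print("resolution_ok:", resolution_ok)
print("completeness_ok:", completeness_ok)
```

Here the set of deltas contains both one hour and two hours, so the resolution-style check fails even though the completeness-style check passes.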

For monthly data, I got the following error regarding the temporal resolution: ("ValueError: invalid unit abbreviation: month"),
where I set the temporal resolution to 1 month in the configuration file.

Good catch, thanks! The checker currently only works with regular time deltas (e.g., 30 days). I'll make a couple of changes to allow you to specify monthly, yearly, ...

For vertical coverage: in the generated template, I couldn't see any check for the completeness of the vertical coverage.
When some levels are missing from the file, the test still passes.
I assume I should define the min/max level, or the level values, somewhere for the coverage check, but I couldn't find this in the template. Could you please guide me on how to do this, if the check is included in the tool?

The vertical check is done through cdo -zaxisdes. In the [vertical_resolution] section, you can place any of the attributes that you expect from cdo -zaxisdes. You can run cdo -zaxisdes from the terminal to see all the attributes inferred by cdo. For example, I get this from a single-level ERA5 file:

cdo -zaxisdes test_2012-12-02.grib
#
# zaxisID 1
#
zaxistype = surface
size      = 1
name      = sfc
longname  = "surface"
levels    = 0
cdo    zaxisdes: Processed 1 variable [0.00s 16MB]

The test passes if I set any subset of the following in the config file:

[vertical_resolution]
zaxistype = "surface"
size = 1
name = "sfc"
longname  = "surface"
levels = 0

From the terminal, run cdo -zaxisdes on your multi-level file and update the config file according to your needs.
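
The "any subset" comparison described above can be sketched as follows. This is an illustration only, not the tool's code: the helper names parse_zaxisdes and matches are hypothetical, and the sample text is the single-level output shown earlier in this comment.

```python
def parse_zaxisdes(text):
    """Collect `key = value` pairs from cdo -zaxisdes style output."""
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comment lines, and cdo's trailing status line.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        attrs[key.strip()] = value.strip().strip('"')
    return attrs


def matches(expected, actual):
    """True if every expected attribute equals the inferred one (compared as text)."""
    return all(str(v) == actual.get(k) for k, v in expected.items())


output = '''#
# zaxisID 1
#
zaxistype = surface
size      = 1
name      = sfc
longname  = "surface"
levels    = 0
cdo    zaxisdes: Processed 1 variable [0.00s 16MB]
'''
inferred = parse_zaxisdes(output)
print(matches({"zaxistype": "surface", "size": 1}, inferred))
print(matches({"size": 2}, inferred))
```

Only the attributes listed in the expected dictionary are compared, so leaving an attribute out of the config means it is not checked. For a multi-level file, listing size or levels is what would catch missing levels in this sketch.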

@malmans2 malmans2 mentioned this issue Dec 5, 2022
@malmans2
Member

malmans2 commented Dec 5, 2022

@Yassminaa the issue with monthly data should be fixed now. Could you please test the latest version and let me know?
Note that the config file has changed slightly (resolution is now named frequency):

[temporal_resolution]
# Check temporal resolution.
#
# See pandas frequency aliases:
# https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases
#
# Arguments:
#   * min: first time (optional)
#   * max: last time (optional)
#   * frequency: time frequency (optional)
#   * name: name of time dimension (optional, default: "time")
#
# Example:
min = 1900-01-01
max = 1900-01-02
frequency = "1D"
name = "time"

@Yassminaa
Author

@malmans2

I have tested the updated code.

  • For the monthly data, the check gives the following error if I set frequency = "1M" in the configuration file:
    " ERROR temporal_resolution
    ERROR frequency: {'2678400000000000 nanoseconds'}
    INFO Checking variable_attributes "
    But if I set frequency = "31D" for the data file from Jan to Feb, it passes. The problem with this setting is that the number of days per month varies over the year (31, 30, 28, ...), so it won't be easy to proceed this way.

  • For the vertical resolution, I followed your explanation, and it worked well.
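
The number in the error message and the varying month lengths can be checked with a short standard-library sketch. The year 1900 is an arbitrary choice for illustration, not taken from the data file.

```python
from datetime import date, timedelta

# Value reported in the error log, in nanoseconds.
reported_ns = 2678400000000000
# timedelta has microsecond resolution, so convert nanoseconds to microseconds.
reported = timedelta(microseconds=reported_ns // 1000)
print(reported.days)  # whole days in the reported step

# Days between consecutive month starts in 1900 (an arbitrary example year).
starts = [date(1900, m, 1) for m in range(1, 13)] + [date(1901, 1, 1)]
month_lengths = [(b - a).days for a, b in zip(starts, starts[1:])]
print(month_lengths)
```

The reported step is exactly 31 days, which is why "31D" passes for a January to February file, and the list of month lengths shows why no single fixed number of days can describe a full year of monthly data.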

@malmans2
Member

malmans2 commented Dec 6, 2022

Good news!
For the monthly data, I think you are just using the wrong alias. Check: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases
For example, if your time steps fall on Jan 1st and Feb 1st, you need to use "1MS".
Let me know how it goes...
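
In the pandas alias table linked above, MS denotes month-start frequency, whereas M denoted month-end frequency in the pandas releases of that time. As a rough sketch of what a month-start series means, using only the standard library (the helper name is_month_start_series is hypothetical, and this is not how the checker or pandas implements it):

```python
from datetime import date


def is_month_start_series(dates):
    """True if dates are consecutive first-of-month dates with no month skipped."""
    if any(d.day != 1 for d in dates):
        return False
    # Count months on a single scale so that December -> January is a step of 1.
    index = [d.year * 12 + d.month for d in dates]
    return all(b - a == 1 for a, b in zip(index, index[1:]))


print(is_month_start_series([date(1900, 1, 1), date(1900, 2, 1), date(1900, 3, 1)]))
print(is_month_start_series([date(1900, 1, 1), date(1900, 3, 1)]))    # February missing
print(is_month_start_series([date(1900, 1, 31), date(1900, 2, 28)]))  # month-end stamps
```

Only the first call returns True. The check counts calendar months rather than days, so it is unaffected by months having different lengths.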

@Yassminaa
Author

Yes, it passes now. Thanks @malmans2

It would also be great if you could put an example of the time-frequency alias in the configuration file.

@malmans2
Member

malmans2 commented Dec 6, 2022

The link to pandas aliases was already there, but I also added an example for monthly data as they are quite common and the alias is somewhat tricky:

[temporal_resolution]
# Check temporal resolution.
#
# See pandas frequency aliases:
# https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases
#
# Arguments:
#   * min: first time (optional)
#   * max: last time (optional)
#   * frequency: time frequency (optional)
#   * name: name of time dimension (optional, default: "time")
#
# Example:
min = 1900-01-01
max = 1900-02-01
frequency = "1MS"
name = "time"
