Refactor Clean and Init tasks, move linking jobs to run in the background #303

Merged (4 commits) on Apr 30, 2024

ibanos90 (Collaborator) commented Apr 29, 2024

Description

This PR refactors some of the tasks. The goal is to reduce the number of jobs submitted and, hence, the computing and waiting time for each cycle. Clean tasks are removed and their content is moved to the associated application tasks (e.g., CleanHofX content is moved to HofX.csh). The InitVariationals.csh and InitEnKF.csh scripts for the YAML and model preparation stage are also removed and their content is moved to PrepJEDI.csh (only one preparation task is submitted). For RTPP, the Clean and Init tasks are both removed and merged into RTPP.csh. In addition, the LinkExternalAnalysis and LinkWarmStartBackgrounds tasks are set to run in the background instead of on Casper. These tasks use very little memory and wall-time, so running them in the background should be fine (but let's monitor that).

Issue closed

None

Tests completed

Tier 1:

  • 3dvar_OIE120km_WarmStart
  • 3denvar_OIE120km_IAU_WarmStart
  • 3dvar_OIE120km_ColdStart
  • 3dvar_O30kmIE60km_ColdStart
  • 3denvar_O30kmIE60km_WarmStart
  • eda_OIE120km_WarmStart
  • getkf_OIE120km_WarmStart
  • 4dhybrid_OIE120km_WarmStart
  • ForecastFromGFSAnalysesMPT

Tier 2:

  • 3dhybrid-allsky_O30kmIE60km_SpecifiedEnsemble_VarBC

mos3r3n (Collaborator) commented Apr 29, 2024

good job! @ibanos90

ibanos90 (Collaborator, Author) commented

> good job! @ibanos90

Thanks @mos3r3n!

junmeiban (Collaborator) left a comment

Yes. It's faster than before. Thank you so much @ibanos90 !

junmeiban (Collaborator) commented

Now it works for scenarios/3dhybrid-allsky_O30kmIE60km_SpecifiedEnsemble_VarBC.yaml. Many thanks @ibanos90 !

ibanos90 (Collaborator, Author) commented

> Now it works for scenarios/3dhybrid-allsky_O30kmIE60km_SpecifiedEnsemble_VarBC.yaml. Many thanks @ibanos90 !

Thanks a lot for the thorough review @junmeiban!

@ibanos90 ibanos90 merged commit 7c02ee3 into develop Apr 30, 2024
@ibanos90 ibanos90 deleted the maintanance/removeCleansomeInit branch April 30, 2024 14:51
ibanos90 added a commit that referenced this pull request May 7, 2024
…ound (#303)

This PR refactors some of the tasks. The goal is to reduce the number of jobs submitted and, hence, the computing and waiting time for each cycle. `Clean` tasks are removed and their content is moved to the associated application tasks (e.g., `CleanHofX` content is moved to `HofX.csh`). The `InitVariationals.csh` and `InitEnKF.csh` scripts for the YAML and model preparation stage are also removed and their content is moved to `PrepJEDI.csh` (only one preparation task is submitted). For RTPP, the `Clean` and `Init` tasks are both removed and merged into `RTPP.csh`. In addition, the `LinkExternalAnalysis` and `LinkWarmStartBackgrounds` tasks are set to run in the background instead of on Casper. These tasks use very little memory and wall-time, so running them in the background should be fine (but let's monitor that).

None

 - [ ] 3dvar_OIE120km_WarmStart
 - [ ] 3denvar_OIE120km_IAU_WarmStart
 - [x] 3dvar_OIE120km_ColdStart
 - [ ] 3dvar_O30kmIE60km_ColdStart
 - [x] 3denvar_O30kmIE60km_WarmStart
 - [x] eda_OIE120km_WarmStart
 - [x] getkf_OIE120km_WarmStart
 - [x] 4dhybrid_OIE120km_WarmStart
 - [ ] ForecastFromGFSAnalysesMPT

- [x] 3dhybrid-allsky_O30kmIE60km_SpecifiedEnsemble_VarBC
# ================================
if ("$retainObsFeedback" != True) then
set member = 1
while ( $member <= ${nMembers} )
Collaborator

I think this looks incorrect. I'm seeing Variational[1-5] fail intermittently, while it is trying to write output files.
I see this fail most often when running test/testinput/eda_OIE120km_WarmStart.yaml

When this is running for any given member, it will remove the output for all the other members, so output files for other members can be deleted while they are being written.

Instead of looping through all of the members, it should only remove files for its own member number, ArgMember.
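
The suggestion above can be illustrated with a minimal sketch. It is written in Python rather than csh, and it is not the code from Variational.csh: the `mem001`-style directory layout, the `*.h5` file pattern, and the function names `member_dir` and `remove_own_feedback` are all assumptions made for illustration. The point is only that each job touches its own member's files and never those of another member that may still be writing.

```python
from pathlib import Path


def member_dir(root: Path, member: int) -> Path:
    # Hypothetical layout: one subdirectory per ensemble member (mem001, mem002, ...).
    return root / f"mem{member:03d}"


def remove_own_feedback(root: Path, member: int) -> int:
    """Delete feedback files for a single member and return how many were removed.

    There is deliberately no loop over all members here: a job that cleans
    only its own directory cannot delete files another job is still writing.
    """
    removed = 0
    for path in sorted(member_dir(root, member).glob("*.h5")):
        path.unlink()
        removed += 1
    return removed
```

In this sketch the job for member 2 would call `remove_own_feedback(root, 2)`, which leaves the directories of every other member untouched.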

Collaborator Author

Hi @jim-p-w, thanks for reporting this issue. When I tested it for this PR it was working correctly, but I will take a look.

Collaborator Author

Indeed, this will fail for the eda task when the member jobs end at different times. The files should be removed for each member independently, without the loop. Thanks again for reporting this!

Collaborator

See PR #310 for fixes to Variational.csh and EnKF.csh
See issue #309 for what needs to be done for EnsembleOfVariational.csh (I haven't yet gotten a working test that runs EnsembleOfVariational.csh)
