Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistencies in schedules API #835

Open
3 tasks
fabianp opened this issue Feb 29, 2024 · 5 comments
Open
3 tasks

Inconsistencies in schedules API #835

fabianp opened this issue Feb 29, 2024 · 5 comments
Labels
good first issue Good for newcomers

Comments

@fabianp
Copy link
Member

fabianp commented Feb 29, 2024

  • For most schedules, the total number of steps is specified through the transition_steps parameter, but in some cases (e.g., optax.cosine_decay_schedule, optax.warmup_cosine_decay_schedule but confusingly not optax.cosine_onecycle_schedule) it's called decay_steps instead.
  • The name sgdr_schedule is not descriptive of what the schedule actually does.
  • Most warm-up learning rates like linear_onecycle_schedule and cosine_onecycle_schedule specify the length of the warm-up phrase using parameter pct_start , but warmup_cosine_decay_schedule instead specifies it through a parameter warmup_steps

In the documentation:
5. In the API reference https://optax.readthedocs.io/en/latest/api/optimizer_schedules.html there's a section "Schedules with warm-up". I would consider optax.cosine_onecycle_schedule to have warm-up, yet it's not in this section. My recommendation would be to remove the section ""Schedules with warm-up" and put optax.warmup_cosine_decay_schedule in the Cosine decay schedule section and optax.warmup_exponential_decay_schedule in the exponential decay section

@Abhinavcode13
Copy link
Contributor

I'm ready to pick this up! @fabianp

@fabianp
Copy link
Member Author

fabianp commented Jul 15, 2024

excellent @Abhinavcode13 ! which task would you like to pick up? I would suggest to pick one of the remaining 3 and start with that. Smaller PR are better for everyone :-)

@Abhinavcode13
Copy link
Contributor

Sure

@Abhinavcode13
Copy link
Contributor

  • For most schedules, the total number of steps is specified through the transition_steps parameter, but in some cases (e.g., optax.cosine_decay_schedule, optax.warmup_cosine_decay_schedule but confusingly not optax.cosine_onecycle_schedule) it's called decay_steps instead.
  • The name sgdr_schedule is not descriptive of what the schedule actually does.
  • Most warm-up learning rates like linear_onecycle_schedule and cosine_onecycle_schedule specify the length of the warm-up phrase using parameter pct_start , but warmup_cosine_decay_schedule instead specifies it through a parameter warmup_steps

In the documentation: 5. In the API reference https://optax.readthedocs.io/en/latest/api/optimizer_schedules.html there's a section "Schedules with warm-up". I would consider optax.cosine_onecycle_schedule to have warm-up, yet it's not in this section. My recommendation would be to remove the section ""Schedules with warm-up" and put optax.warmup_cosine_decay_schedule in the Cosine decay schedule section and optax.warmup_exponential_decay_schedule in the exponential decay section

FYI: I would look up the second one first.

@fabianp fabianp added the good first issue Good for newcomers label Sep 25, 2024
bhargavyagnik added a commit to bhargavyagnik/optax that referenced this issue Sep 28, 2024
@bhargavyagnik
Copy link
Contributor

bhargavyagnik commented Sep 28, 2024

I can take this up, If @Abhinavcode13 is busy. I have some questions:

  • We need to ensure backwards compatibility too, right? I mean after changing transition_steps for cosine_decay and other functions, we need to keep decay_steps too right? Or raise an error/warning to update the change?
  • While reviewing the codebase, I observed that for warmup_exponential_decay_schedule, the exponential decay has transition_steps = transition_steps and while warmup_cosine_decay_schedule has transition_steps = transition_steps - warmup_steps. Just wanted to ensure that transition steps passed after warmup should be as current or is there something wrong in any one?

The warmup_exponential_decay_schedule has a structure like :

 schedules = [
      linear_schedule(
          init_value=init_value,
          end_value=peak_value,
          transition_steps=warmup_steps,
      ),
      exponential_decay(
          init_value=peak_value,
          transition_steps=transition_steps,
          decay_rate=decay_rate,
          transition_begin=transition_begin,
          staircase=staircase,
          end_value=end_value,
      )

While warmup_cosine_decay_schedule would go from

  schedules = [
      linear_schedule(
          init_value=init_value,
          end_value=peak_value,
          transition_steps=warmup_steps,
      ),
      cosine_decay_schedule(
          init_value=peak_value,
          transition_steps=transition_steps - warmup_steps,
          alpha=alpha,
          exponent=exponent,
      )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

3 participants