Prometheus: unexport unavailable metrics #125492
Hey there @knyar, mind taking a look at this pull request as it has been labeled with an integration ( Code owner commandsCode owners of
This looks great, thank you.
Any thoughts on whether this should eventually be the default? Might be something best to decide & announce now, and change the default value from `true` to `false` a few releases down the line.
Yes, I think it would be good for this to become the new default. How would we announce this change?
Yes, I think it would be good for this to become the new default. How would we announce this change?
I've looked at the developer site but have not been able to find a recommended process for changing the default values of configuration variables.
My suggestion would be to leave a comment mentioning future change of default in the code and in the docs (as part of home-assistant/home-assistant.io#34632). We'll remove the comment in the same PR that will flip the default value, and will mark it as a breaking change.
Thanks, I've updated both PRs.
Thank you! Hopefully one of the Home Assistant maintainers will be able to review & merge this soon.
95b5180 to 75dc1c4
Nice!
I think there's a good argument to be made that an interim config option isn't necessary. Anyone making queries against entities which could become unavailable has to handle both cases right now, in case the entity is ever unavailable at start, so I think that not emitting metrics for unavailable entities is the correct behaviour -- put another way, I can't think of a case where you'd want to emit a NaN.
That said, giving an opportunity for people to test this in "production" for a while with an easy path to revert is a good thing. I'm just not sure it outweighs the drawbacks :)
Thanks! Yes, let me take a look at removing the option entirely. That would definitely simplify things.
I've updated both PRs.
I don't have any objections to just doing this with no transition period, but maybe it reaches the level of a "Breaking change" now? I can imagine use cases for which the current behavior is useful - for example, if you have a sensor or another device that is only intermittently available, having its last-known state reported to Prometheus might be more helpful than having metrics regularly disappear and reappear. In the new world, one would need to apply one of the |
I'm going to move this to draft because I want to think a little more about a couple of things.
The problem with the current PR is that going unavailable will unexport ALL the metrics, but I think we want to keep most around, especially I remembered I have a dashboard that shows which entities are unavailable, using |
Sorry, I did not intend to create churn with my comment 🙈. I'm happy to see this proceed in either direction. @knyar my point was that someone writing a "proper" query against an intermittent metric would have to handle missing metrics anyway, as those might occur during restart. I absolutely might be missing something here, though.
No problem! I just realized that the original PR had some unintended effects, based on a few misunderstandings I had.
Ok, I think it's good now. The |
When sensors go offline, this component would continue to report its last value, until Home Assistant itself restarts, or the sensor returns. The `entity_available` metric can be used to filter out unavailable metrics, but this is slow with current versions of prometheus (see prometheus/prometheus#9577). Now, the component will automatically withdraw metrics when the entity becomes unavailable, which matches the behavior on restart and makes it easier to see missing metrics without using an `unless`.
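The withdrawal behavior described above can be illustrated with a small, self-contained sketch. This is not the integration's code: `MetricStore`, `handle_state`, `scrape`, the `temperature_c` metric name, and the `sensor.porch` entity are all hypothetical, and a plain dictionary stands in for a real Prometheus registry. It also assumes that the `entity_available` flag itself stays exported (as 0) while value samples are withdrawn, which is one possible reading of the discussion rather than a confirmed detail.

```python
from dataclasses import dataclass, field

# Hypothetical names throughout: this models the idea, not the integration.
UNAVAILABLE = "unavailable"
AVAILABILITY_METRIC = "entity_available"


@dataclass
class MetricStore:
    """Toy stand-in for an exporter registry: {(metric, entity_id): value}."""

    samples: dict = field(default_factory=dict)

    def handle_state(self, entity_id: str, metric: str, state: str) -> None:
        """Record a new state, withdrawing stale samples when unavailable."""
        if state == UNAVAILABLE:
            # Drop every sample for this entity so a scrape no longer
            # returns its last-known reading.
            for key in [k for k in self.samples if k[1] == entity_id]:
                del self.samples[key]
            # Assumption: the availability flag stays exported as 0.
            self.samples[(AVAILABILITY_METRIC, entity_id)] = 0.0
            return
        # Toy simplification: only numeric states are handled here.
        self.samples[(metric, entity_id)] = float(state)
        self.samples[(AVAILABILITY_METRIC, entity_id)] = 1.0

    def scrape(self) -> dict:
        """Return a copy of what a scrape would see right now."""
        return dict(self.samples)


store = MetricStore()
store.handle_state("sensor.porch", "temperature_c", "21.5")
store.handle_state("sensor.porch", "temperature_c", UNAVAILABLE)
after_outage = store.scrape()
```

In this sketch the stale reading disappears from the scrape as soon as the entity reports `unavailable`, so a query over `temperature_c` sees a gap rather than a frozen value, and no `unless` clause is needed to hide it.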
This introduces a new option to the prometheus integration to automatically unexport metrics for unavailable entities.
Proposed change
When an entity becomes unavailable, this component will continue to report the entity's last value, until Home Assistant itself restarts, or the entity returns. These stale metrics can be hard to notice, especially when the particular metric rarely changes (or changes slowly).
The `entity_available` metric is provided to let queries filter out unavailable metrics, but this is slow with current versions of prometheus (see prometheus/prometheus#9577). And regardless of performance issues, including `entity_available` increases the complexity of promql expressions and is easy to forget.
Now this component will automatically withdraw metrics when the entity becomes unavailable, which matches the behavior on restart and makes it easier to see missing metrics without using an `unless`.