Exponential backoff #146

Open
jacobbednarz opened this issue May 29, 2017 · 2 comments

Comments

@jacobbednarz

Something I've been looking into lately is how we can combat the stampeding herd effect we occasionally incur once a system has recovered and is able to receive traffic again. One approach I've explored is exponential backoff, and I was looking to find out if this is something you'd consider adding to semian? I think semian is a sensible place to put this because it already has knowledge of the tickets/quotas and error rates, and could use its already available data to decide how far to push out the backoff without needing to query another resource.
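To make the idea concrete, here is a minimal sketch of exponential backoff applied to how long a circuit stays open. It is not Semian's API: `backoff_timeout`, `base_timeout`, `failed_probes` and `max_timeout` are hypothetical names chosen for illustration, and the doubling factor and the cap are assumptions rather than anything proposed in this thread.

```python
def backoff_timeout(base_timeout: float,
                    failed_probes: int,
                    max_timeout: float = 300.0) -> float:
    """Seconds to keep the circuit open before the next recovery attempt.

    base_timeout  -- the normal open-window length, in seconds (hypothetical)
    failed_probes -- consecutive recovery attempts that have failed so far
    max_timeout   -- upper bound so the window cannot grow without limit
    """
    if failed_probes < 0:
        raise ValueError("failed_probes must be non-negative")
    # Cap the exponent so very long outages cannot produce huge numbers.
    exponent = min(failed_probes, 32)
    # Double the window for each failed probe, then clamp to the maximum.
    return min(max_timeout, base_timeout * (2 ** exponent))


if __name__ == "__main__":
    for probes in range(6):
        print(probes, backoff_timeout(10.0, probes))
```

The window would reset to `base_timeout` once a probe succeeds. Backoff alone does not desynchronise clients that all tripped at the same moment, so it would normally be paired with some randomness.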

Also open to hearing about how you've addressed this at Shopify if you've got a good handle on it in other ways 😄

@sirupsen
Contributor

Do you mean for the circuit breakers? I would assume that increasing the size of the window (timeout) would cause this to decrease since there should be some randomness involved in when the windows would open. If it's a heavily queried resource, perhaps adding some randomness for jitter to the window could work?
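As a rough illustration of adding jitter to the window, the sketch below spreads a fixed timeout over a random range so that clients which tripped together do not all retry together. It is not Semian code: `jittered_timeout`, `jitter_fraction` and the ±20% default are assumptions made for the example.

```python
import random
from typing import Optional


def jittered_timeout(timeout: float,
                     jitter_fraction: float = 0.2,
                     rng: Optional[random.Random] = None) -> float:
    """Return `timeout` shifted by a random amount within +/- jitter_fraction.

    timeout         -- the configured window length, in seconds
    jitter_fraction -- maximum relative deviation (0.2 means +/- 20%)
    rng             -- optional random.Random, useful for reproducible tests
    """
    if not 0.0 <= jitter_fraction <= 1.0:
        raise ValueError("jitter_fraction must be between 0 and 1")
    rng = rng if rng is not None else random.Random()
    spread = timeout * jitter_fraction
    # Uniform jitter centred on the configured timeout.
    return timeout + rng.uniform(-spread, spread)


if __name__ == "__main__":
    # Ten clients with the same 10 s window now reopen at different times.
    print([round(jittered_timeout(10.0), 2) for _ in range(10)])
```

Each process would compute its own jittered value every time its circuit opens, so the reopen times drift apart instead of lining up.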

By exponential backoff, are you referring to the size of the circuit breaker window, or something different? Is this a problem for your datastore due to the sudden throughput (would it ever be larger than steady state?) or the connections established per second? Which datastore are you running into trouble with?

BTW Jacob, did you roll out the new Semian with quota for bulkheads? We've used it in production for weeks now, and it's 👌

cc @jpittis

@jacobbednarz jacobbednarz changed the title Expotential backoff Exponential backoff May 30, 2017
@jacobbednarz
Author

Yep, for the circuit breakers. I haven't tried adding any jitter to the window but could definitely trial some ideas on that instead. The issue we have is that when we bring MySQL back online there are a bunch of services waiting on it and, with cold caches, it is a bit sluggish to respond. We've got a few things in the pipeline to mitigate it but I'm sure we'll hit it eventually with another datastore. I don't think it's getting overloaded with connections, just that everything will be hitting a cold cache that needs rebuilding.

No quotas yet but it's on my list to look at in the coming weeks 😄
