Conditional check activated or not, depending on "master" being up or down.
R the Company
We have situations where there is no need to check services B, C and D if service A is down, as they depend on A being reachable in the first place. This produces quite a few notifications that are not really needed, and it would be good to prevent them, if only for clarity's sake.
Suppose you ping a web server and monitor 5 sites on that server. If the ping fails, checking the sites is no use, as they will necessarily be unavailable too.
I hope I explained that correctly.
An old check solution we used had that, and we have been missing that ever since. Would love to get this.
Adrien Rey-Jarthon
R the Company Well, it's not as simple as you may think unfortunately :) Do you want the "children" checks to be considered up or down during the parent check's downtime? Should that show up in their uptime metrics and downtime history? And do all the other users who want this feature want the same behavior? If I just "do not do the check", all children services will be considered 100% up, which would be pretty deceiving, especially if shown on status pages (or down, if you're unlucky and the children checks failed first). This behavior would be erratic and inconsistent.
Muting only the alerts would indeed be simpler and more dependable: the checks would still run and record downtime on all children checks, but no alerts would be sent for them.
R the Company
Adrien Rey-Jarthon Ok, never mind.
I know that our old solution used to work like that without a hitch (it just wouldn't do the child checks if the master was down and they would show as unverified, and restart the child checks once the master was online again) and it saved many notifications and stress.
But I'll retract the request, if it's difficult.
Adrien Rey-Jarthon
R the Company for sure it can, but depending on what numbers they present and what status pages they offer, the impact may be more or less important. I wouldn't feel comfortable showing a 100% uptime status page for a site which was down for a couple of hours, for example. The devil is in the details here, namely how to handle the "unverified" time as well as possible. That's why it's not a "simple" suggestion: there's no single answer on the "best" way to do it, only various options and choices, each with its drawbacks.
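To make the trade-off concrete, here is a rough sketch of how the same child check could report different uptime depending on how "unverified" time is treated. The function name and the three policies are illustrative assumptions, not how updown.io computes uptime.

```python
def uptime_percent(total_s, down_s, unverified_s, policy):
    """Uptime of a child check for a given handling of unverified time.

    policy (hypothetical names):
      "up"      - skipped time is counted as up
      "down"    - skipped time is counted as down
      "exclude" - skipped time is removed from the measured period
    """
    if policy == "up":
        measured, down = total_s, down_s
    elif policy == "down":
        measured, down = total_s, down_s + unverified_s
    elif policy == "exclude":
        measured, down = total_s - unverified_s, down_s
    else:
        raise ValueError(f"unknown policy: {policy}")
    if measured <= 0:
        return None  # nothing was actually measured
    return round(100.0 * (measured - down) / measured, 2)


DAY = 24 * 3600
# A child check skipped for 2 hours in a day because its parent was down.
for policy in ("up", "down", "exclude"):
    print(policy, uptime_percent(DAY, 0, 2 * 3600, policy))
```

With these inputs, both "up" and "exclude" report 100.0 while "down" reports 91.67, which is the kind of discrepancy a status page would have to explain.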
No need to retract the request, it's here to gather interest and refine the idea. If enough people want this and are OK with accepting the same drawbacks, it may be implemented some day.
For example, would the option of just muting alerts, instead of disabling monitoring entirely, work for you? (So still sending pings and recording downtimes for all sites, but not sending alerts.)
Wynd Labs
Adrien Rey-Jarthon for my janky 2010s in-house monitoring system that had this feature, it only suppressed the alert.
So a "child" check would not alert if its "parent" check was already alerting.
e.g. I know the web sites will be down if the core routers aren't pinging.
The interface is hard though. Also not making it a footgun is hard.
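The "child does not alert while its parent is alerting" rule can be sketched roughly as below. The names (`should_alert`, `parent_of`, the check names) are made up for illustration; the loop guard is one small way to avoid the footgun of a dependency cycle.

```python
def should_alert(check, is_down, parent_of):
    """Return True if a down `check` should send a notification.

    is_down:   dict of check name -> bool (current state)
    parent_of: dict of child name -> parent name (hypothetical config)
    """
    if not is_down.get(check, False):
        return False  # check is up, nothing to alert about
    seen = {check}
    parent = parent_of.get(check)
    # Walk up the dependency chain; stop on a cycle instead of looping forever.
    while parent is not None and parent not in seen:
        if is_down.get(parent, False):
            return False  # an ancestor is already alerting
        seen.add(parent)
        parent = parent_of.get(parent)
    return True


parent_of = {"site-1": "web-server", "web-server": "core-router"}
is_down = {"core-router": True, "web-server": True, "site-1": True}
print([c for c in is_down if should_alert(c, is_down, parent_of)])
```

Here only the core router alerts; the checks still run and record their own downtime.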
Unifex
Good idea !
Adrien Rey-Jarthon
Thanks for this suggestion. I understand the need, but given the complexity of this feature and the low number of people who would understand and use it (properly), it may not be worth implementing.
Especially as there are a lot of edge cases to handle, and no single behavior will suit every use case (e.g. what downtime to consider for the "paused" checks, what status/error to show on the status page, what if the first check recovers but not all the children, what if the first check is flapping but the children are always up, or always down, etc.).
I'll keep the suggestion open to see if more people are interested but I believe the best course of action would be for you to implement this on your end using the alert webhooks to pause or mute the monitoring of other checks: https://updown.io/doc/how-to-automate-check-disable
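As a starting point for such a do-it-yourself setup, the decision logic of a webhook receiver might look like the sketch below. It assumes the webhook body is a JSON array of events, each with an `event` name such as `check.down` or `check.up` and the check's `token`; that shape, the tokens and the `CHILDREN` mapping are assumptions to verify against the page linked above. Applying the resulting actions through the API is deliberately left out.

```python
import json

# Hypothetical mapping: parent check token -> tokens of dependent checks.
CHILDREN = {"parent-token": ["child-a", "child-b"]}


def plan_actions(payload, children=CHILDREN):
    """Turn a webhook body into a list of (action, child_token) pairs."""
    actions = []
    for event in json.loads(payload):
        token = event.get("check", {}).get("token")
        kind = event.get("event")
        if kind == "check.down":
            action = "mute"      # parent went down: silence its children
        elif kind == "check.up":
            action = "unmute"    # parent recovered: restore its children
        else:
            continue             # ignore event types we do not handle
        actions.extend((action, child) for child in children.get(token, []))
    return actions


body = json.dumps([{"event": "check.down", "check": {"token": "parent-token"}}])
print(plan_actions(body))
```

Keeping the planning step separate from the API calls makes it easy to test the dependency mapping before it can mute anything for real.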
R the Company
Adrien Rey-Jarthon Hey :-)
I understand it possibly sounded more difficult than I meant it.
All it means is that a check can be dependent on another check: if that other check is up, do the check; if it is down, do not do the check.
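In rough terms, one monitoring cycle under that rule could look like this sketch. The names (`run_cycle`, `depends_on`, `probe`) and the "unverified" status label are illustrative assumptions, and checks are assumed to be listed with masters before their dependents.

```python
def run_cycle(checks, depends_on, probe):
    """Run one monitoring cycle and return a dict of check name -> status.

    checks:     check names, masters listed before their dependents
    depends_on: dict of dependent name -> master name
    probe:      callable taking a name and returning True if reachable
    """
    status = {}
    for name in checks:
        master = depends_on.get(name)
        if master is not None and status.get(master) != "up":
            # Master is down (or itself unverified): skip the probe entirely.
            status[name] = "unverified"
        else:
            status[name] = "up" if probe(name) else "down"
    return status


depends_on = {"B": "A", "C": "A", "D": "A"}
# Service A is unreachable, so B, C and D are never probed.
print(run_cycle(["A", "B", "C", "D"], depends_on, lambda name: name != "A"))
```

Only A is reported as down; B, C and D come back as "unverified" and are probed again on the next cycle once A is up.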
So, it really is a pretty much black and white scenario and not that complex, I would think. Servers Alive does/did this and this saved many unnecessary alerts ;-)
Reach out to me if you want to discuss but I hope this cleared up how simple it really should be ;-) and I'll start rallying support ;-)
Unifex
Adrien Rey-Jarthon I would think it's only about the alerts, not about the checks, so you would only mute alerts depending on a master alert.
R the Company
Unifex Could be the case too, but I think not even doing the check may save lots of traffic too ;-) But either would work, I guess :-)