Or, "How to stay calm through a storm-like drizzle"
I'm currently expanding our monitoring and notification systems. For a long time now, you've been able to set up DevicePilot to automatically monitor your devices' data stream and trigger notifications (emails, Slack messages, Zendesk tickets and others) when something abnormal happens.
Say, your device has stopped sending data back to the cloud or its internal temperature has gone above a certain threshold. Those are things you probably want to be made aware of.
Or... maybe not? This simple approach works when you have hundreds of devices. But now, say, you have thousands of devices deployed all over the world for every member of your operations team. How do you make sure you don't lose your Zen (🙏) whilst keeping up your service delivery standards? Let's enumerate a few down-to-earth guiding principles.
First line support is to be done by machines
If a problem can be solved automatically by simply notifying another computer, we will build the integration for you (if we don't support it yet!).
Never notify a human if nothing can be done...
Alarm fatigue will decrease your reactivity to actual problems. DevicePilot silences those 2am notifications for sites that aren't open at night. Did you know that you can now define business hours?
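To make the idea concrete, here is a minimal sketch of a business-hours check. It is an illustration only, not DevicePilot's configuration format or API: the function name `should_notify`, the 08:00-18:00 weekday window and the assumption that timestamps are already in the site's local time are all made up for the example.

```python
from datetime import datetime, time

# Hypothetical opening hours for one site (assumption, not a DevicePilot setting).
OPENS = time(8, 0)
CLOSES = time(18, 0)


def should_notify(event_time: datetime, opens: time = OPENS, closes: time = CLOSES) -> bool:
    """Return True only if the event falls on a weekday within opening hours.

    event_time is assumed to be expressed in the site's local time.
    """
    is_weekday = event_time.weekday() < 5  # Monday is 0, Friday is 4
    return is_weekday and opens <= event_time.time() < closes


if __name__ == "__main__":
    # A 2am event on a Wednesday is held back; a mid-afternoon one goes out.
    print(should_notify(datetime(2024, 1, 10, 2, 0)))
    print(should_notify(datetime(2024, 1, 10, 14, 30)))
```

A real deployment would also need per-site time zones and a decision about what happens to the held-back events once the site opens again.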
...And only notify the people that matter
No need to involve anyone who can't solve the problem, right? Filters have always been a strong concept in our application. You can use them to narrow down the scope of devices and alert only the person in charge of that smaller set.
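As a rough sketch of what filter-based routing can look like, the snippet below pairs each recipient with a predicate over devices. Everything in it is hypothetical: the `Device` fields, the `Route` type, the example addresses and `recipients_for` are invented for illustration and do not reflect how DevicePilot filters are actually defined.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Device:
    id: str
    region: str  # hypothetical property used for filtering
    model: str   # hypothetical property used for filtering


@dataclass
class Route:
    recipient: str
    matches: Callable[[Device], bool]  # the "filter": which devices this person owns


# Hypothetical routing table: each person only hears about their own set.
ROUTES = [
    Route("alice@example.com", lambda d: d.region == "EU"),
    Route("bob@example.com", lambda d: d.region == "US" and d.model == "kiosk"),
]


def recipients_for(device: Device, routes: list[Route] = ROUTES) -> list[str]:
    """Return the recipients whose filter matches the device."""
    return [route.recipient for route in routes if route.matches(device)]


if __name__ == "__main__":
    print(recipients_for(Device("d1", "EU", "kiosk")))
    print(recipients_for(Device("d2", "US", "sensor")))
```

A device that matches no filter produces an empty list, so nobody is paged for it. Whether that is what you want, or whether unmatched devices should fall back to a default owner, is a design choice worth making explicitly.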
If you want, don't notify unless it's really bad
You can set notifications to only be triggered when your service level has fallen below the agreed acceptable level. If a site has four devices and one is broken, does that really warrant a first-class callout? Well, it depends - if the other three devices are currently in use then maybe yes.
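Here is one way to picture a service-level check, sketched under simplifying assumptions: the service level is taken to be the fraction of working devices at each site, and the input is a plain list of `(site, is_working)` pairs. The function name `sites_below_service_level` and the thresholds are hypothetical, and the sketch ignores whether the remaining devices are in use.

```python
from collections import defaultdict
from typing import Iterable


def sites_below_service_level(
    devices: Iterable[tuple[str, bool]], min_working_fraction: float
) -> list[str]:
    """Return the sites whose fraction of working devices is below the threshold.

    devices is an iterable of (site, is_working) pairs.
    """
    total: dict[str, int] = defaultdict(int)
    working: dict[str, int] = defaultdict(int)
    for site, is_working in devices:
        total[site] += 1
        if is_working:
            working[site] += 1
    return sorted(
        site for site in total if working[site] / total[site] < min_working_fraction
    )


if __name__ == "__main__":
    fleet = [
        ("A", True), ("A", True), ("A", True), ("A", False),  # 3 of 4 working
        ("B", True), ("B", False),                            # 1 of 2 working
    ]
    # With an agreed level of 70%, only site B needs attention.
    print(sites_below_service_level(fleet, 0.7))
```

With the four-device site from the example above, one broken device leaves 75% working: quiet at a 70% agreed level, a callout at 80%.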
The emerging trend is that no connected device is an island. It's often part of a bigger deployment of similar co-located devices. As such, monitoring the data stream of individual devices in isolation breaks down at scale.
We've got you.
What the world needs is for DevicePilot to extend its advanced cohort analysis tool with an automated wing. (Don't worry, reader - I'm on it.) We will run cohort analysis for you at an appropriate cadence and send out a notification when your agreed service level is in danger of being breached.
Not when a single petty device has decided to stop working in the middle of the night in the desolate countryside. Stay tuned.