Service Assurance for a device estate using streams and tables

By Pilgrim - October 22, 2019

The challenge

Perfect Printing sells industrial printers which it deploys onto several customer sites across the UK. The printers are sold “as a service” by the day, which means that Perfect is responsible for keeping them working.

Any customer with at least two printers on-site is defined as a VIP customer (so Oxford below doesn’t qualify). VIP customers receive a signed Service Level Agreement (SLA) promising that:

 80% of our printers on your site will have an uptime of >90% on any given day

Each printer sends a status report to the Cloud every 5 minutes.

printers

The Operations Manager at Perfect wants to see a simple table showing the number of days over the past week that each VIP site has not met its SLA - information which can allow action in time to take remedial action to avoid breaching the monthly metric on which her salary depends!

The structure we’ll need is:

RELATIONAL: Count how many VIP sites have how many days with <80% good printers

RELATIONAL: Identify good printers as those with >90% uptime

STREAMING: Find the up-time ← we’ll look at this first

Finding up-time

First, let’s do a streaming query to find the up-time of each printer. We define “up” for a printer as having heard from it in the last 15 minutes (900 seconds), without any faults reported so our metric of uptime is:

      DURATION(`$ts > ago(900) && fault == FALSE`) / DURATION()

If we (GROUP BY 1d, site, ID) we get:

$ts
site
$id
uptime
0d
Edinburgh
Printer 1
1.00
0d
Edinburgh
Printer 2
1.00
0d
Edinburgh
Printer 3
1.00
0d
Oxford
Printer 4
1.00
...
     
1d
Edinburgh
Printer 1
0.74
1d
Edinburgh
Printer 2
1.00
1d
Edinburgh
Printer 3
1.00
1d
Oxford
Printer 4
1.00
...
     

i.e. a table with a row for every unique combination of day/site/printer, reporting the uptime of that printer on that day.

Thresholding, counting

We then take the output of this streaming query and apply a couple of SQL queries to it.

  1. First relational query:
    1. Threshold the uptime into good/bad
      1. Anything greater than 90% is up
      2. Anything lesser is down
    2. GROUP BY timestamp ts (i.e. by day) and by site, then COUNT the number of rows on each day and site, to get the number of “good printers” on that site.
  2. Second relational query:
    1. Select only sites with at least two printers
    2. Identify when the number of good printers on the site is less than 80% - we call that a “bad day” for that site.

Voila, here’s the table the COO wanted. A live “list of shame” for the operations team’s dashboard as they work towards achieving their goal of “zero bad days on all sites”:

  bad_days_in_past_week
Edinburgh 1
Cambridge 0
London 3

Conclusion

Combining streaming queries with relational queries makes it easy to track and manage up-time of a connected device estate. Next, read about using it to optimise utilisation of mobile pallets across a supply chain.

Request access to private alpha

 

Comments

See how DevicePilot can make the difference

 

Industry leaders trust DevicePilot to help them improve the quality of the service they deliver at scale.

  • Eliminate revenue loss
  • Deliver a better service with the same human resource
  • Focus on growth and not firefighting
  • Get customer satisfaction through the roof

Book your personalised demo now and discover how DevicePilot can help you scale your connected business

Erik in a circle-1

Erik Fairbairn, CEO at POD Point:
Achieved 99% uptime across device estate

"We're totally data driven at POD Point, and if we can answer a question using data then we think that’s the best way - there’s no guesswork and you can use the facts.

Our DevicePilot dashboards have really let us get that actionable insight out of our devices."