Pilgrim
24th October 2016

Introduction

There are many challenges to address in the early stages of delivering a new connected product:

1) First you must design and build all the parts of your solution: the hardware, the software, the service and the application – and integrate them.

2) Then in early user trials you’ll find and address bugs in your technology, and flaws in your proposition. These first two stages are all-absorbing. But today I’d like to convince you of the vital importance of thinking ahead to the third and final stage, which is key to the success of most IoT propositions:

3) Scaling!

The wonderful day arrives when your boss – or your customer – is finally convinced that your connected application does work, and that users will pay for it. Job done! It’s just a matter of scaling-up now – right? In mountain-climbing, sometimes you can’t see the next peak until you surmount the current one. Likewise, in connected devices it is often only at this point of apparently having achieved readiness-to-scale that the truth dawns on you: scaling-up brings a whole new set of challenges … which perhaps didn’t even feature in your original plan.

From IoT R&D to IoT Production

During the earlier stages, pretty-much everything could be done manually – indeed that’s the best solution. The adaptability of humans to cope with unforeseen problems is just what is needed in these early stages where you’re doing lots of learning, and close human supervision of early users not only coaxes them through teething problems, but also provides you with invaluable feedback to adjust the technology and proposition.

But manual doesn’t scale. You can’t hire 10x more people every year as your estate of deployed devices grows by 10x a year. That’s why scaling-up is fundamentally different from the previous challenges.

With only 10 or 100 early users, it is entirely feasible to:

  • Run your devices on a flaky set of servers which need constant care-and-feeding
  • Visit the customer yourself, to install, maintain and diagnose equipment
  • Deal with customer problems reactively, e.g. sending a new set of batteries or a repeater
  • Test new code by loading it onto devices and testing them manually
  • Log-in to individual devices one by one, to monitor and upgrade them

If any of this sounds familiar, then it’s time to gird your loins and consider how you’re going to make the transition from IoT R&D to IoT Production.

Five processes you’ll have to change, to scale

Above we listed five manual processes which won’t scale. Whilst it’s by no means an exhaustive list, it’s a good place to start, so let’s see how we can make these processes scalable.

1. Servers

Server scaling is an art in its own right. As well as architecting so that you can “scale-out” (Google it), you also need to look at your human processes, because you need to achieve much higher availability (reliability) as well as greater scale. You should avoid a 1990s “Ops/IT” mindset. A modern CI/CD workflow with immutable servers and a “devops” mindset is probably the right way to go – servers are now cheap, disposable, commodity items. People-time is the precious resource, so sweat the machines not the people.

2. Site visits

If your kit needs on-site installation, and perhaps repair, then to scale you’ll need to out-source this to one or more third-parties. So you’ll need to build the processes to interact with those third-parties efficiently. On a site visit, an installer should not be able to claim they’ve installed or fixed it unless your central system confirms this, enforcing the correct process, and therefore quality.  And by remotely diagnosing problems before the repairer is called, you can ensure that they arrive with the right parts to fix the problem, perhaps before the customer even knows there is a problem.  Fantastic customer service, delivered cost-effectively.
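As a minimal sketch of that "central system confirms it" rule: the job can only be closed if the device itself has reported in since the installer started work, and its self-test passed. The function name, the fields and the idea of a device self-test are all assumptions made for illustration, not a description of any particular product.

```python
from datetime import datetime, timedelta
from typing import Optional


def installation_confirmed(job_started: datetime,
                           last_report: Optional[datetime],
                           self_test_ok: bool) -> bool:
    """Decide centrally whether an installer may close a job.

    Hypothetical rule: the device must have reported to the service
    after the job started, and its self-test must have passed.
    """
    if last_report is None:
        return False  # the device has never been seen by the service
    if last_report < job_started:
        return False  # only stale data, from before this visit
    return self_test_ok
```

The point of the design is that the evidence comes from the device, not from the installer's say-so, so the correct process is enforced rather than requested.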

3. Customer support

There is always a role for a person at the end of a chat process to support the customer and catch the long tail of infrequent, unanticipated problems. However, it’s essential that you don’t try to use support staff to catch the frequent, anticipated problems too – or they’ll drown in a sea of customer ill-will. An example of how to get this right, from my previous company AlertMe, is wireless repeaters. A certain percentage of customers (e.g. 1%) experience wireless connectivity problems in their homes – that’s radio for you, and you can’t pretend it won’t happen. Trying to deal with this reactively with customer support would have been a disaster. So we built a proactive system to automatically spot the problems and dispatch a wireless repeater, turning a bad customer experience into a delightful one (“wow, they just automatically fixed my problem, immediately!”), at lowest possible cost.
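To make the repeater example concrete, here is a rough sketch of what the "spot the problem" half of such a system might look like. It is not AlertMe's implementation: the dropout metric, the threshold of 20 and all the names are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Assumed threshold: weekly radio dropouts above which a home is
# considered to have a connectivity problem.
DROPOUT_THRESHOLD = 20


@dataclass
class LinkStats:
    customer_id: str
    dropouts_last_week: int      # hypothetical metric reported by the hub
    repeater_sent: bool = False  # avoid dispatching a second repeater


def customers_needing_repeater(stats: Iterable[LinkStats],
                               threshold: int = DROPOUT_THRESHOLD) -> List[str]:
    """Return the customers who should be sent a repeater proactively."""
    return [s.customer_id
            for s in stats
            if s.dropouts_last_week >= threshold and not s.repeater_sent]
```

Run on a schedule across the whole estate, the output of a function like this would feed the dispatch process, so support staff never need to hear about the problem at all.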

4. Testing

In an ideal modern workflow, every developer will probably be producing multiple user-visible features or fixes every day. Once you’ve got a bunch of developers, they produce a constant stream of new code which needs testing. Of course developers will write unit tests when they add features, and regression tests when they fix bugs, and in a CI/CD process these are automatically run by your build robot whenever new code is checked in, to try to preserve quality.

But the big problem is that everything interacts with everything else, compounded in IoT-land by interactions of your connected device with the infinitely complex real world in which it is being deployed, which will test your product more thoroughly, and in ways you could never imagine. So a vital part of your process is functional/integration testing of the whole product. Attempting to do this manually rapidly becomes impossible – it can easily take tens of person-days to exercise even a simple connected device through all its states (e.g. what happens if there’s a radio interruption in the middle of a code upgrade?), which really limits how often, and how quickly, you can release.

The key to solving this problem is to make it possible to instantiate your devices virtually. In other words, the software embedded in your products should also run in the cloud (it “stubs out” its hardware when it realises it’s not running on a real product). That means you can exercise all its features completely automatically, you can fast-forward time – and you can spin-up thousands or millions of virtual devices and throw them against your service to prove that it still works at scale – all automatically. This is a big effort – but, like most testing, it’s much cheaper in the end than the alternative of leaving your customers to test your product at scale.
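The ideas above (stubbed hardware, fast-forwarded time, many virtual devices) can be sketched in a few lines. This is an illustration under stated assumptions, not a real firmware: the Device class, the hourly temperature report and every name here are hypothetical, and for simplicity the stub is passed in by the test harness rather than detected by the device itself.

```python
class SimClock:
    """A clock the test harness controls, so time can be fast-forwarded."""
    def __init__(self):
        self.now = 0.0

    def advance(self, seconds):
        self.now += seconds


class StubSensor:
    """Stands in for the real hardware when running in the cloud."""
    def read_temperature(self):
        return 21.0


class Device:
    """Hypothetical device logic: report a reading once an hour."""
    REPORT_INTERVAL = 3600

    def __init__(self, device_id, sensor, clock, send):
        self.device_id = device_id
        self.sensor = sensor
        self.clock = clock
        self.send = send
        self.next_report = clock.now + self.REPORT_INTERVAL

    def tick(self):
        # Send every report that has fallen due since the last tick.
        while self.clock.now >= self.next_report:
            self.send(self.device_id, self.next_report,
                      self.sensor.read_temperature())
            self.next_report += self.REPORT_INTERVAL


def run_fleet(n_devices, simulated_hours):
    """Run a virtual fleet against an in-memory 'service' and return its inbox."""
    clock = SimClock()
    received = []
    devices = [Device(f"dev-{i}", StubSensor(), clock,
                      lambda *msg: received.append(msg))
               for i in range(n_devices)]
    for _ in range(simulated_hours):
        clock.advance(3600)  # an hour passes instantly
        for device in devices:
            device.tick()
    return received
```

Calling run_fleet(1000, 24) simulates a day in the life of a thousand devices in well under a second and yields 24,000 messages. In a real test the send function would post to your actual service, and the stub would also inject faults such as radio dropouts mid-upgrade.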

5. Upgrading

Upgrading firmware is the totemic in-field device process, though by no means the only one. You are fooling yourself if you think you won’t have to do it. I’d be very happy to take a bet with you that your device will contain not only multiple functional bugs, but also security holes – and you’ll also need to track evolving standards. So upgrading is a requirement, yet it’s fraught with peril.

You can’t upgrade if a battery is flat or a device is offline or in-use. And when you do start an upgrade, there are many ways it can fail, possibly “bricking” your devices. Through all of this you need to maintain a clear “wood for the trees” view of whether a particular upgrade is working (and whether the new code is actually better than the old). Therefore it’s vital to have a centralised process for triggering and monitoring upgrades, and probably for throttling the process so that you discover any systemic problems before you’ve touched too many devices.
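One common way to throttle is to upgrade in waves of increasing size and stop as soon as a wave fails too often. The sketch below assumes a hypothetical upgrade_one callback that returns True on success; the wave sizes and the 2% failure limit are illustrative numbers, not recommendations.

```python
from typing import Callable, Iterable, Sequence


def staged_rollout(device_ids: Iterable[str],
                   upgrade_one: Callable[[str], bool],
                   wave_sizes: Sequence[int] = (10, 100, 1000),
                   max_failure_rate: float = 0.02) -> dict:
    """Upgrade devices in growing waves, halting if a wave fails too often.

    The last wave size is reused until every device has been attempted.
    """
    remaining = list(device_ids)
    upgraded, failed = [], []
    wave_index = 0
    while remaining:
        size = wave_sizes[min(wave_index, len(wave_sizes) - 1)]
        wave, remaining = remaining[:size], remaining[size:]
        wave_failures = 0
        for device_id in wave:
            if upgrade_one(device_id):
                upgraded.append(device_id)
            else:
                failed.append(device_id)
                wave_failures += 1
        if wave_failures / len(wave) > max_failure_rate:
            # Systemic problem suspected: leave the rest of the estate alone.
            return {"halted": True, "upgraded": upgraded,
                    "failed": failed, "untouched": remaining}
        wave_index += 1
    return {"halted": False, "upgraded": upgraded,
            "failed": failed, "untouched": []}
```

With a bad firmware image, a rollout like this would damage at most the first ten devices rather than the whole estate. A production version would also skip devices that are offline, in use or low on battery, and would judge "success" by the device's behaviour after the upgrade, not just by the transfer completing.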

Identify your systemic failure modes

A theme has emerged from the above. It is common during the trial stage to find that much of your kit isn’t working properly – between 10% and 50% not working at any given time is not at all unusual. Some of this is because you have various generations of prototype hardware and software in the field. You will be tempted to believe that once all your customers are running the latest versions, everything will be fine.

You’d be wrong.

In our experience, once those users have proper production hardware and software, you might get down to say 5%-10% unhappy customers at any moment in time. If you have one million customers, that’s 50,000 to 100,000 unhappy customers – screaming at you on the phone, abusing you publicly on Twitter, returning your product to the shop. Your business and your brand won’t survive that, and you can’t just hope it won’t happen.

So as well as all the tweaking and bug-fixing, the other thing you must be doing during trials is identifying all the predictable failure modes of your product – and deciding how to address each at scale.

Sometimes this can be dealt with purely in software (e.g. making devices retry connections, or fall back to a boot-loader if a firmware upgrade fails). But other problems require processes to resolve them, and it’s those processes that need to be made scalable.
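As an example of the pure-software kind of fix, here is a minimal sketch of connection retry using exponential backoff with random jitter. The connect and sleep callbacks, the attempt count and the delay limits are all assumptions for illustration. The jitter matters at scale: without it, every device that lost its connection in the same outage would retry at the same instants.

```python
import random
from typing import Callable


def connect_with_retry(connect: Callable[[], bool],
                       sleep: Callable[[float], None],
                       max_attempts: int = 6,
                       base: float = 1.0,
                       cap: float = 300.0) -> bool:
    """Try to connect, waiting a random, growing delay between attempts.

    The delay ceiling doubles after each failure (1s, 2s, 4s, ...) up to
    `cap`; the actual wait is drawn uniformly between 0 and that ceiling.
    """
    for attempt in range(max_attempts):
        if connect():
            return True
        if attempt < max_attempts - 1:  # no point waiting after the last try
            ceiling = min(cap, base * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
    return False
```

On a real device, sleep would be time.sleep or a low-power timer. Passing it in as a parameter also means the retry logic can be tested without actually waiting.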

What is scalability?

We’ve said that manual processes don’t scale. But some processes are inherently manual – for example, battery-replacement – so how can we make those scalable? The key is to recognise the parts of each process where you (the vendor company) are the bottleneck. Yes, battery replacement requires people to do it, but the number of those people (your users) will grow with the number of devices you ship. So the problem is scalable as long as every centralised part of the process is automated.

So if for example you have an automated process which spots batteries that are low, and sends reminder emails to users (and/or even dispatches new batteries to them from an automated warehouse), then you’ve addressed the problem in a scalable way. We used both these approaches for battery-replacement at my previous startup AlertMe which scaled to millions of devices.
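The centralised part of that battery process could be as simple as the sketch below, which maps each device's reported battery level to an action. The percentage thresholds and the action names are assumptions for illustration, not the values AlertMe used.

```python
from typing import Dict


def battery_actions(levels: Dict[str, float],
                    remind_below: float = 20.0,
                    dispatch_below: float = 10.0) -> Dict[str, str]:
    """Map device id -> action, given battery levels in percent.

    Devices with healthy batteries are omitted from the result.
    """
    actions = {}
    for device_id, percent in levels.items():
        if percent < dispatch_below:
            actions[device_id] = "dispatch_batteries"
        elif percent < remind_below:
            actions[device_id] = "send_reminder_email"
    return actions
```

The user still swaps the batteries by hand, but nobody at the vendor has to look at a single device: the work that grows with the estate is done either by software or by the users themselves.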

Conclusion

There is much more to delivering a connected product than initially meets the eye. Connecting your product turns you into a service provider – for life – and it’s essential to think ahead to the scaling implications of this.

Here I’ve considered how to address scaling successfully, with five specific examples. From this we also saw how a vital outcome of your trials must be identification of the systemic issues that your product will have in the field, and a plan to address them. That plan will likely involve automating any centralised processes.

About DevicePilot

DevicePilot is cloud-based software that allows you to easily locate, monitor and manage your connected devices automatically at scale, with proactive management throughout your entire device lifecycle. Additionally it can provide critical insights – through your IoT connected product you can better understand your customers and how the products are being used, while increasing loyalty and creating new revenue opportunities. DevicePilot can manage any device, on any platform, with easy-to-use RESTful APIs for rapid integration with your existing systems.

The company draws on the significant experience of its founders who successfully scaled their previous connected-device businesses to 1 million+ end-customers in areas as diverse as mobile phones, IPTV set-top-boxes and the connected home.