Historically, Streams and Tables were distinct
Companies deploying connected products want to derive business value from the continuous streams of live telemetry that these products produce. However, streams are a very different type of data from the static tables that classic relational databases were designed for, and this can lead to some challenges.
Analytical tools - and the databases that power them - have historically fallen into two distinct categories:
- Streaming tools include time-series visualisation such as Grafana on top of a time-series database (TSDB) such as InfluxDB. They take homogeneous streams of telemetry and do maths and statistics on them, e.g. taking the temperature from a hundred sensors and producing hourly averages.
- Relational tools include Business Intelligence (BI) solutions such as Tableau, Looker and Qlik on top of SQL databases. They generally take static data tables and “join” across them to produce a summary. For example, joining a list of sales transactions with a table of customer accounts to work out which region produced the most sales in the last quarter.
Each of these tools is very good at its own job … but not so good at the other’s. Streaming tools are great at doing maths on columns, but they struggle to combine a stream with other data (in SQL terminology they can’t do “joins”). SQL databases, in turn, struggle to cope with streaming data, because it is continuous and live and we want to ask time-based questions of it.
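To make the two styles of analysis concrete, here is a minimal Python sketch of each, using only the standard library. All of the data and the function names (`hourly_averages`, `sales_by_region`) are invented for illustration; real tools would do this over a time-series database or with SQL rather than in-memory lists.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical telemetry stream: (sensor_id, timestamp, temperature in deg C)
readings = [
    ("s1", datetime(2024, 1, 1, 9, 5), 20.0),
    ("s2", datetime(2024, 1, 1, 9, 40), 22.0),
    ("s1", datetime(2024, 1, 1, 10, 15), 21.0),
    ("s2", datetime(2024, 1, 1, 10, 50), 25.0),
]


def hourly_averages(readings):
    """Streaming-style analysis: bucket readings by hour, then average each bucket."""
    buckets = defaultdict(list)
    for _sensor_id, timestamp, temperature in readings:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(temperature)
    return {hour: mean(temps) for hour, temps in sorted(buckets.items())}


# Hypothetical static tables: customer accounts and sales transactions
customers = {"c1": "EMEA", "c2": "APAC"}               # customer_id -> region
sales = [("c1", 100.0), ("c2", 250.0), ("c1", 75.0)]   # (customer_id, amount)


def sales_by_region(sales, customers):
    """Relational-style analysis: join sales to customers, then sum per region."""
    totals = defaultdict(float)
    for customer_id, amount in sales:
        region = customers[customer_id]  # the "join" on customer_id
        totals[region] += amount
    return dict(totals)


hourly = hourly_averages(readings)
by_region = sales_by_region(sales, customers)
top_region = max(by_region, key=by_region.get)
```

The first function only ever looks at one homogeneous stream and a time window; the second only ever looks up keys across static tables. Neither does the other's job.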
Combining streams and tables
Companies deploying connected products often find they need to combine both types of analysis, because they need to take large amounts of technical, streaming data but then produce small amounts of highly valuable business data that humans can understand - the so-called perishable insights on which every modern digital business thrives.
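As a rough illustration of what "combining" means here, the sketch below joins each telemetry event against a static device table as it arrives, then aggregates per site and per hour. The table, the event layout and the function name `energy_per_site_per_hour` are assumptions made for this example, not part of any particular product.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical static table: which site each device is installed at
device_site = {"d1": "London", "d2": "London", "d3": "Paris"}

# Hypothetical telemetry stream: (device_id, timestamp, energy in kWh)
telemetry = [
    ("d1", datetime(2024, 1, 1, 9, 10), 1.5),
    ("d2", datetime(2024, 1, 1, 9, 30), 2.0),
    ("d3", datetime(2024, 1, 1, 9, 45), 0.5),
    ("d1", datetime(2024, 1, 1, 10, 5), 1.0),
]


def energy_per_site_per_hour(events, device_site):
    """Join each event to its site (relational), then total per hour (streaming)."""
    totals = defaultdict(float)
    for device_id, timestamp, kwh in events:
        site = device_site.get(device_id)
        if site is None:
            continue  # skip devices missing from the static table
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        totals[(site, hour)] += kwh
    return dict(totals)


summary = energy_per_site_per_hour(telemetry, device_site)
```

Many technical events go in, and a handful of per-site figures come out. Because `events` can be any iterable, the same loop would also work over a generator that yields events as they arrive.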
Solutions to combining streaming and relational analytics are starting to appear from a plethora of open source projects such as Apache Spark, Flink, Calcite, Kafka Streams and Beam as well as from commercial SaaS enterprises such as PipelineDB and now DevicePilot. Meanwhile the RDBMS folks, confident that 5-letter acronyms with no vowels will catch on real soon now, have coined the equally unwieldy term RDSMS (for Relational Data Stream Management System).
In each paper of this series we describe a real-world example requiring a combination of streaming and relational analytics to solve a business challenge, and outline how to create the analysis:
- Measuring building energy and occupancy to uncover waste
- Measuring device uptime across multiple sites to deliver service assurance
- Optimising the utilisation of mobile pallets
- Optimal stock replenishment for vending machines