How to build a universal smart IoT gateway by reducing the data sampling rate

【51CTO.com Quick Translation】There are many ways to architect an IoT data deployment, and what works for one enterprise may not work for another. The components vary with the size and complexity of the IoT project, but most deployments share a similar architecture: sensors report to a collector or IoT gateway device, which gathers data from multiple sensor nodes and forwards it upstream to the enterprise's data collection system.

These gateway or collector devices typically let ZWave devices reach the Internet to upload data, or bridge Bluetooth devices to WiFi or another network connection.

However, most of these gateway or collector devices are "dumb" gateways: they do nothing but forward data to the upstream collector. Can we turn the IoT gateway into a smart device, one that performs local analysis and data processing on the collector before sending the data on? That would be very useful.

Building a Gateway

Before deciding to build (another) smart IoT gateway, I had already (sort of) built one: an ARTIK-520 running InfluxDB. However, the ARTIK-520 is not the most economical choice, and when building IoT devices, cheaper is usually better. That is not always true, but once you are building more and more gateways, cost becomes a factor.

I dug out a Pine-64 I bought a few years ago and started experimenting. You may ask: why a Pine-64 instead of a Raspberry Pi? Because the Pine-64 costs roughly half as much: $15 instead of $35 for the Raspberry Pi. It's that simple.

My Pine-64 has the same quad-core 1.2GHz ARM A53 processor as the Raspberry Pi, but 2GB of RAM instead of the Raspberry Pi's 1GB, plus a more powerful GPU. It also has built-in WiFi, so no dongle is needed. I added a ZWave board so that I can communicate with sub-GHz IoT devices.

One benefit of using a device like this as an IoT gateway is that storage is limited only by the size of the microSD card you use. I used a 16GB card, but the Pine-64 supports cards of up to 256GB.

How do you get the TICK stack up and running on the Pine-64? I recommend the Xenial image. It is the "official" Ubuntu version for the Pine-64, and it works well with InfluxDB. Once the board is up and running, don't forget to upgrade so that all components are up to date:

    apt-get update    # refresh the package index first
    apt-get upgrade

Next, add the InfluxData repository to apt:

    curl -sL https://repos.influxdata.com/influxdb.key | apt-key add -
    source /etc/lsb-release
    echo "deb https://repos.influxdata.com/${DISTRIB_ID,,} ${DISTRIB_CODENAME} stable" | tee -a /etc/apt/sources.list

You will probably need to run these with sudo. I took a shortcut and ran "sudo bash" first, so everything after that ran as root.

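Next, install a package that is required to access the InfluxData repository over HTTPS:

    apt-get install apt-transport-https

Then refresh the package index so that apt sees the new repository, and install the TICK stack:

    apt-get update
    apt-get install influxdb chronograf telegraf kapacitor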

Now we are ready to move on to the next step!

Load testing the device

My original idea was just to see how such a small device would handle the added load, so I downloaded “influx-stress” from GitHub (https://github.com/influxdata/influx-stress) and ran it on the device.

    Using batch size of 10000 line(s)
    Spreading writes across 100000 series
    Throttling output to ~200000 points/sec
    Using 20 concurrent writer(s)
    Running until ~18446744073709551615 points sent or until ~2562047h47m16.854775807s has elapsed

Wow, that's 200,000 points per second! It turns out it really puts some serious stress on the Pine-64!

Under this load the device quickly approaches the limit of its 2GB of memory, and CPU usage sits at 100%. Of course, a real gateway is very unlikely to see such a load, since it generally collects data from only dozens to hundreds of sensors.
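To put the stress test in perspective, a rough back-of-the-envelope calculation helps. The sensor count and reporting interval below are illustrative assumptions, not measurements from this gateway; only the 200,000 points/sec figure comes from the influx-stress output.

```python
# Rough comparison of a plausible gateway load with the stress-test rate.
STRESS_RATE = 200_000  # points/sec, taken from the influx-stress output


def gateway_rate(sensors: int, interval_s: float) -> float:
    """Points per second if each sensor reports one point every interval_s seconds."""
    return sensors / interval_s


# Hypothetical deployment: 300 sensors, each reporting once per second.
rate = gateway_rate(sensors=300, interval_s=1.0)
share = rate / STRESS_RATE
print(f"{rate:.0f} points/sec, {share:.2%} of the stress-test rate")
```

Even with a few hundred chatty sensors, the gateway would see well under one percent of the load that saturated it in the test.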

Local analysis

As the dashboard shows, I was able to run analysis locally on the Pine-64 without trouble. It also has an onboard HDMI port and a full GPU, which makes it easy to view the dashboard locally and monitor in real time. As I said earlier, though, the device would be far more useful if it could do more than that.

Ideally, you might want to collect all the data on the gateway device and do all the analysis and alerting there. In the real world, however, that is not what a gateway or collector is for. Its job is to pass the processing along, that is, to forward data upstream.

Reduce the sampling rate of IoT data

It would be easy to have the gateway simply forward all data upstream. But if you have to deal with network connectivity problems, or want to save money and bandwidth, you will want to downsample the data before forwarding it. Fortunately, a practical IoT gateway can perform local analysis, handle local alerts, and downsample data before forwarding it upstream, and this is not difficult to implement!
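To make the idea of downsampling concrete, here is a minimal sketch in Python. It is not how the gateway does it (Kapacitor handles that part); it only illustrates the technique of averaging points over fixed, aligned time windows. The function name and the sample readings are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def downsample_mean(points, window_s=60):
    """Average (timestamp, value) pairs over fixed, aligned windows.

    `points` is an iterable of (unix_seconds, value) pairs. Returns a sorted
    list of (window_start, mean_value) pairs, one per non-empty window.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each point to the start of the window it falls in.
        window_start = int(ts // window_s) * window_s
        buckets[window_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical readings every 10 seconds for two minutes.
raw = [(t, float(t % 60)) for t in range(0, 120, 10)]
minute_means = downsample_mean(raw)
print(len(raw), "points in,", len(minute_means), "points out")
print(minute_means)
```

Twelve raw points collapse into two one-minute averages, which is the same reduction the gateway applies before sending data upstream.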

First, let's set up the gateway device to forward data upstream to another InfluxDB instance. There are several ways to do this, but since we will be downsampling the data with Kapacitor, we will do it directly in the kapacitor.conf file. That file already contains an [[influxdb]] section for "localhost", so we just need to add a second [[influxdb]] section for the upstream instance. It should look like this:

    [[influxdb]]
      enabled = true
      name = "mycluster"
      default = false
      urls = ["http://192.168.1.121:8086"]
      username = ""
      password = ""
      ssl-ca = ""
      ssl-cert = ""
      ssl-key = ""
      insecure-skip-verify = false
      timeout = "0s"
      disable-subscriptions = false
      subscription-protocol = "http"
      subscription-mode = "cluster"
      kapacitor-hostname = ""
      http-port = 0
      udp-bind = ""
      udp-buffer = 1000
      udp-read-buffer = 0
      startup-timeout = "5m0s"
      subscriptions-sync-interval = "1m0s"
      [influxdb.excluded-subscriptions]
        _kapacitor = ["autogen"]

This only solves part of the problem. Now we need to actually downsample the data and send it. I am using Chronograf v1.3.10, which has a built-in TICKscript editor, so I went to the "Alerting" tab in Chronograf, created a new TICKscript, and selected the telegraf.autogen database as my data source:

Since I am not actually collecting sensor data on this device, I am using CPU usage as the data to downsample. Below is a very basic TICKscript that downsamples the CPU data and forwards it upstream:

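    // Read the cpu measurement from the local telegraf database.
    stream
        |from()
            .database('telegraf')
            .measurement('cpu')
            .groupBy(*)
        // Keep only points that carry the usage_system field.
        |where(lambda: isPresent("usage_system"))
        // Collect one-minute windows aligned to the minute.
        |window()
            .period(1m)
            .every(1m)
            .align()
        // Reduce each window to a single mean value.
        |mean('usage_system')
            .as('mean_usage_system')
        // Write the result to the upstream instance named in kapacitor.conf.
        |influxDBOut()
            .cluster('mycluster')
            .create()
            .database('downsample')
            .retentionPolicy('autogen')
            .measurement('mean_cpu_idle')
            .precision('s')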

The script takes the "usage_system" field of the cpu measurement, computes its mean over each one-minute window, and writes that value to my upstream InfluxDB instance. On this gateway device, the CPU data looks like this:

In the upstream instance, the downsampled data is as follows:

You can see that the data is basically the same, just at a coarser granularity. Finally, I set the data retention policy on the gateway device to 1 day. This way I can keep some historical data locally without "filling up" the device.
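The article does not show how the retention policy was changed. One way to do it on InfluxDB 1.x is an InfluxQL ALTER RETENTION POLICY statement sent to the /query HTTP endpoint. The sketch below only builds the statement and the request; it does not send anything. The helper names and the localhost URL are assumptions, the database and policy names come from the telegraf.autogen data source used above, and the 1h shard duration is an assumed value, added because the shard group duration cannot be longer than the policy's duration.

```python
from urllib.parse import urlencode


def retention_statement(database: str, policy: str, duration: str,
                        shard_duration: str) -> str:
    """Build an InfluxQL statement that changes how long a policy keeps data."""
    return (
        f'ALTER RETENTION POLICY "{policy}" ON "{database}" '
        f"DURATION {duration} SHARD DURATION {shard_duration}"
    )


def query_request(base_url: str, statement: str):
    """Return the (url, form_body) pair for an InfluxDB 1.x POST to /query."""
    return f"{base_url}/query", urlencode({"q": statement})


# Assumed local InfluxDB address; adjust to your gateway.
stmt = retention_statement("telegraf", "autogen", "1d", "1h")
url, body = query_request("http://localhost:8086", stmt)
print(stmt)
print(url)
print(body)
```

The same statement can be pasted into the influx command-line shell instead of being sent over HTTP.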

Now I have an IoT gateway device that collects data from local sensors, presents analytics to local users, issues local alerts (once I enable Kapacitor alerts), and downsamples the local data and sends it upstream to my enterprise InfluxDB instance for further analysis and processing. On the gateway device I have fine-grained millisecond data, while my upstream instance receives coarser minute-level data. That gives me enough insight into what is happening at each local sensor without paying the bandwidth cost of uploading all the data.

Using this approach, I can also store the minute-level data in a regional InfluxDB instance and forward further-downsampled data to an InfluxDB instance that aggregates sensor data from across the enterprise.

I could send all the data along the entire chain to the final enterprise aggregation point, but if I were aggregating data from thousands of sensors that way, the storage and bandwidth costs would largely be spent on fine-grained data of little use.
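A quick calculation shows how much each tier saves. The fleet size and the per-tier intervals below are hypothetical values chosen for illustration; the article gives no such figures.

```python
SECONDS_PER_DAY = 86_400


def points_per_day(sensors: int, interval_s: int) -> int:
    """Points written per day if every sensor yields one point per interval."""
    return sensors * SECONDS_PER_DAY // interval_s


# Hypothetical fleet of 5,000 sensors and hypothetical intervals per tier.
SENSORS = 5_000
tiers = {
    "gateway (raw, 1 s)": 1,
    "regional (1 min)": 60,
    "enterprise (1 h)": 3600,
}

for name, interval in tiers.items():
    print(f"{name:20s} {points_per_day(SENSORS, interval):>12,} points/day")
```

Under these assumptions the enterprise tier stores 3,600 times fewer points than the raw stream, while the fine-grained data stays available on the gateways for as long as their retention policy allows.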

Conclusion

Here, I want to emphasize again: IoT data is truly useful only when it is timely, accurate, and actionable. The older your data is, the less actionable it is; and the less actionable it is, the less granular it needs to be. By downsampling the data and setting retention policies that grow longer the further the data is downsampled, you keep real-time data highly actionable and accurate while still preserving long-term trends for analysis.

Original title: Architecting IoT Gateway Devices for Data Downsampling, Author: David G. Simmons

[Translated by 51CTO. Please indicate the original translator and source as 51CTO.com when reprinting on partner sites]
