Culture of Innovation: Data Management on the IoT Edge

Over the past decade, there’s been a trend of bringing compute into centralized data centers, which in practice looks like public cloud. This has been driven by the economies of scale available at centralized data centers: centralized power, centralized cooling, and the ability to locate data centers in places that are less expensive.

But at the same time, we’re seeing a counter trend where compute and data are moving outside of the data center. And that falls under the general category of edge computing. You’re moving compute or data resources closer to where that data is being produced, or to where that data or compute is being consumed. 

Use cases on the edge

I don’t necessarily mean things like light bulbs and thermostats in your home, but industrial use cases like factory automation, back-end data processing for autonomous driving, or data management from sensor networks for business-critical applications in the field.

We see a lot of this in the energy sector, where they have very valuable applications that bring in data from thousands, tens of thousands, or millions of data sources, and make decisions and take actions based on that data.

How open hybrid cloud meets the edge

Our current platform strategy is that you have these devices that are generating data or can act on instructions, and they all go to your application running in Red Hat OpenShift in either your private or public cloud. 

As a company, we’ve moved to this concept of open hybrid cloud, meaning computers running in multiple locations, possibly a public cloud and a private cloud at the same time, where you’re sharing resources. 

The question is: as these application platforms get more complex, you might have an application running in your central data center, in individual factory or manufacturing locations, in thousands of retail locations, or in small remote offices. How do you build one single application that runs across this entire infrastructure and can keep operating when a network connection goes down, so you can still manage inventory, or manage what the process flow looks like, at a single location?

It’s “how do we make it easier for our customers to build more complicated applications or build applications that have to run in multiple locations at the same time?”

The advantage of edge

You’re able to deploy new types of applications that you haven’t been able to before, and get advantages like better response times, lower latency, or the ability to handle higher bandwidth. You’ll be able to reduce costs and have greater resiliency in the application.

If customers are using OpenShift as a platform for factory automation, they won’t have to depend on extremely reliable network connections to ensure that there’s no interruption of service.
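One common way to keep a site working through a network outage is store-and-forward: record data locally and push it upstream when the link returns. The sketch below is only an illustration of that idea, not how OpenShift does it; the class name, the `send` callback, and the buffer size are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict


class StoreAndForward:
    """Buffer readings locally and forward them when the uplink is available."""

    def __init__(self, send: Callable[[Dict], None], max_buffered: int = 1000) -> None:
        # `send` is a hypothetical uplink function that raises ConnectionError
        # when the network is down.
        self._send = send
        # Bounded buffer: when it is full, the oldest reading is dropped first.
        self._pending: Deque[Dict] = deque(maxlen=max_buffered)

    def record(self, reading: Dict) -> None:
        """Store a reading locally; this works regardless of network state."""
        self._pending.append(reading)

    def flush(self) -> int:
        """Forward buffered readings in order and stop at the first failure."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # uplink is down; keep the rest for the next attempt
            self._pending.popleft()
            sent += 1
        return sent

    @property
    def pending(self) -> int:
        """Number of readings still waiting to be forwarded."""
        return len(self._pending)
```

The local site keeps calling `record` no matter what, and a periodic `flush` drains the backlog once connectivity is back. A bounded buffer trades completeness for predictable memory use, which may or may not suit a given application.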

Real world examples

Consider traffic cameras, for example. You have tens of thousands of cameras around the city that are looking for traffic congestion. And you might be using machine learning techniques to identify these cases. 

The way you might have built this in the past is to stream video from all these cameras back to a central location, do all the processing there, and then take whatever actions are needed.

One very simple example of edge computing is moving any of that image processing out to the camera itself. Or one step in, so it’s connected to, let’s say, a cell phone base station. That open base station could be running a small OpenShift cluster in the future. 

So you could do all the image recognition, just one hop from the camera — or on the camera itself if it’s an edge device. Then you’d only have to send a very small amount of information back to your central data center. You’d only have to send information about what you had found, or only if you discovered something that was actually a concern.
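As a rough sketch of that last step, assume some local model has already turned video frames into labeled detections. The edge node then forwards only the detections that matter. The `Detection` type, the label names, and the confidence threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Detection:
    camera_id: str
    label: str         # hypothetical label from a local model, e.g. "congestion"
    confidence: float  # 0.0 to 1.0


def events_to_report(detections: Iterable[Detection],
                     labels_of_concern: frozenset = frozenset({"congestion"}),
                     min_confidence: float = 0.8) -> List[dict]:
    """Keep only detections worth sending upstream, as small dict payloads."""
    return [
        {"camera": d.camera_id, "label": d.label, "confidence": d.confidence}
        for d in detections
        if d.label in labels_of_concern and d.confidence >= min_confidence
    ]
```

Each payload is a few dozen bytes rather than a continuous video stream, which is where the bandwidth saving comes from.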

Overcoming latency for augmented reality

There are some great demos using augmented reality headsets, where you might use a headset to overlay maintenance schematics on a piece of equipment that you’re looking at. One way you might do that is to have all the necessary data and compute power within the headset itself. But that would increase the cost and weight of the device, and it would really reduce the battery life.

Instead, you could potentially have that work go back to a computer that’s in the same facility or, again, one hop away at a network location, close enough that you would still have interactivity. Trying to coordinate any sort of visual rendering with a data center thousands of miles away would just be too slow to be effective.
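A back-of-the-envelope calculation shows why distance matters. Light in optical fiber travels at roughly two-thirds of its speed in a vacuum, or about 200 km per millisecond. The sketch below computes only that physical lower bound on round-trip time. It ignores routing, queueing, and processing, so real latencies will be higher. The function name and the example distances are illustrative.

```python
FIBER_KM_PER_MS = 200.0  # approximate speed of light in optical fiber


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over fiber, from propagation delay alone."""
    return 2 * distance_km / FIBER_KM_PER_MS


# A data center 3,000 km away versus a server 1 km away in the same facility.
far = min_round_trip_ms(3000)
near = min_round_trip_ms(1)
```

With these numbers, the distant data center costs at least 30 ms per round trip before any rendering happens, while the nearby server costs about 0.01 ms.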

Those are the sort of use cases that these models really enable.

The challenge of multiple clouds

The challenge is in managing all the separate locations. 

I have an application that needs to run in a public cloud location, and potentially in a private cloud location. It needs to run at a regional data center or a network provider’s data center. It needs to interact with edge devices that may be directly in a single facility, or a small cluster at a single facility. There is an almost uncountable number of challenges there. How do you manage data reliability? Do you just have caches of data at different locations, or do you have to try to synchronize that data across all of them? How do you manage software updates to these nodes, which may be mostly disconnected? How do you manage the reliability of updates to these devices?

You have a computer that’s on an oil rig, and you need to be able to do software updates to it, and you need to be able to secure that device. But if you push a bad software update and it crashes, you have to rent a helicopter and send someone out there to hit the reset button.
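One general way to avoid the helicopter trip is to have the device check its own health after an update and roll back automatically if the check fails. The sketch below shows only that decision logic, with hypothetical `install` and `health_check` callbacks. Real systems usually do this below the application, for example with two boot slots and a watchdog, and nothing here describes a specific product.

```python
from typing import Callable


def apply_update(current_version: str,
                 new_version: str,
                 install: Callable[[str], None],
                 health_check: Callable[[], bool]) -> str:
    """Install new_version and fall back to current_version if it is unhealthy.

    Returns the version that is left running.
    """
    try:
        install(new_version)
        if health_check():
            return new_version
    except Exception:
        pass  # treat any install or health-check error as a failed update
    install(current_version)  # roll back to the last known-good version
    return current_version
```

This assumes the previous version is still available locally, so the rollback itself needs no network connection.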

You don’t have trained engineers, or even untrained staff, available to do any sort of maintenance, and you are deploying servers that are in a locked closet, but otherwise not particularly controlled. They’re just on a shelf somewhere.

Security is one of the top challenges

So suddenly you have much greater security concerns about the entire platform, beyond servers that happen to be in the data center, where you have at least some assurance of physical security, some assurance that no one’s going to be able to tamper with the device.

There is a whole host of challenges that building these more complicated architectures creates. I think security is definitely one of the top challenges we face.

It depends on how distributed this architecture is for any given application. But if you’re depending on these edge devices, which may have keys that allow them to access your corporate network because they have to be able to send data back to your centralized system, security is a huge risk there. These are devices that are out in the field and have less physical security.

A use case for one of our customers is that they have computers running our software on every train locomotive in North America. At every switch and every train crossing there are these shacks at the side of the railroad tracks, hundreds of miles from civilization. Maintenance is an issue and security is an issue because potentially someone could walk up and tamper with these systems. So you need to make sure that the platform is secure from the CPU up to avoid any sort of potential security risk.

The benefits of working with hybrid clouds

The benefits get back to the flexibility of the applications that you can deliver, the reduction of operational costs by reducing bandwidth or server requirements, and the reliability and resiliency of the application.

The responsibility of IoT

I think that what’s changing is ubiquitous network connectivity. Right now there are at least three companies that are planning to launch thousands of satellites, so that you’ll have constant connectivity no matter where you are on the globe. 

Devices that used to be disconnected are now connected. Before, it didn’t matter if they weren’t up to date, and it didn’t matter if they had security vulnerabilities, because there was no way to compromise them and no way to change the way they were behaving. Everything that was once an embedded use case is now an edge computing use case.

So, suddenly all the value we provide as a company becomes critically relevant to these devices. Already there are situations where cheap video cameras that are connected to the internet get compromised, and then act as part of botnets to take down important services. So anyone who’s building anything that’s connected to the internet, no matter how small, now has a much greater responsibility to make sure that, for the lifetime of that device, it is up to date and secure.

That’s something that we can fundamentally help with.