
Is distributed analytics the solution to a connected world of data?

22 Aug 16

It’s hardly breaking news to suggest that monolithic approaches to data management and analytics are not fit for purpose in a connected, big data world. The interesting question is how to extend the data and content architecture, and the accompanying analytics, to encompass new and unfamiliar sources without throwing away millions of dollars of existing investment. Step forward distributed analytics: a new way of thinking about how to extend capabilities out into the data landscape by providing appropriate data management and analytics “in the moment”, while transporting back to the core only the data and insights that are necessary.

A concert of machine learning, analytical agents, and the cloud

I am not going to claim that distributed analytics will happen overnight, nor that all the technology required to achieve the vision is available yet, at least not in an enterprise-friendly package. However, I believe that, as a collection of design principles and approaches, it offers some of the most solid foundations for how enterprises across industries should start planning for a much more connected, data-dependent world.

Briefly, the idea that already difficult-to-manage data warehouses and inflexible data governance processes can be adapted to incorporate much greater volumes of data, at much higher speeds, is largely discredited. While it might be theoretically possible, the costs of applying a legacy approach to a relatively new and growing problem would be too high. The conundrum is that legacy has cost enterprises a small fortune in investment, a fortune that is still far from paying its returns, and core data-related business processes such as financial management and regulatory reporting require a heavily governed approach to ensure accuracy (if not always timeliness). Some argue that it is time to throw away these technologies and start afresh with data lakes and the cloud, among other options. I argue it is possible to have both: a locked-down core of technology paired with much more flexible technologies that augment it, something I referred to as the elastic architecture in our research agenda for this year.

But what does that mean? The slightly disappointing answer is that no two distributed analytics solutions will look the same, but they will share characteristics. In the short term, the data lake will become standard among most organisations. By its very nature, it should be a mix of data landing zone and longer-term data storage, as well as, increasingly, home to some pretty complex data science-led analysis. Likely best located in the cloud (public, private, or, much more likely, a hybrid deployment that bridges the gap between on- and off-premises), the data lake can be thought of as a buffer between a relatively well-organised internal data architecture and the more chaotic world beyond. This would, in essence, augment existing investments in information management technologies and provide a route to “bring the data home” if deemed necessary.

Longer term, I see a future in spreading analytic capabilities outside that core, beyond the warehouse and the data lake and out toward the machines and devices generating the vast amounts of data that are causing the “problem.” I talk in terms of analytical agents: packages of software loaded locally that do not rely on heavy compute or memory. Using local resources, backed up by machine learning-driven optimisation running at the centre, these analytical agents would send back to the core only the data needed for things such as exceptional events, improving the machine learning algorithms, and regulatory reporting. Immediate proximity means that optimisation of physical processes could happen in near real time, without relying on the transport of vast quantities of data to and from the edge.
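To make the idea concrete, here is a minimal sketch in Python of how such an analytical agent might behave. It is purely illustrative and uses hypothetical names (EdgeAgent, process, threshold, batch_size) rather than any vendor’s API: readings are scored against a lightweight local baseline, and only exceptions and periodic summaries are returned for transport back to the core.

```python
# Illustrative sketch only: a hypothetical edge "analytical agent" that keeps
# raw readings local and emits messages for the core only when warranted.
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional


@dataclass
class EdgeAgent:
    threshold: float = 3.0      # hypothetical anomaly cut-off, tuned centrally
    batch_size: int = 100       # summarise once this many readings accumulate
    _buffer: List[float] = field(default_factory=list)

    def _score(self, reading: float) -> float:
        # Stand-in for a compact model pushed down from the core;
        # here simply the distance from the running local mean.
        baseline = mean(self._buffer) if self._buffer else reading
        return abs(reading - baseline)

    def process(self, reading: float) -> Optional[dict]:
        """Return a message for the core, or None if the data stays at the edge."""
        is_exception = self._score(reading) > self.threshold
        self._buffer.append(reading)
        if is_exception:
            return {"type": "exception", "value": reading}
        if len(self._buffer) >= self.batch_size:
            summary = {"type": "summary", "mean": mean(self._buffer),
                       "count": len(self._buffer)}
            self._buffer.clear()
            return summary
        return None


# Only the exceptional reading (9.7) and periodic summaries would cross the network.
agent = EdgeAgent()
for value in [1.0, 1.1, 0.9, 9.7, 1.0]:
    message = agent.process(value)
    if message:
        print("send to core:", message)
```

In a real deployment, the local scoring model would be trained and refreshed by the machine learning running at the centre, which is exactly the feedback loop described above.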

Orchestrating these capabilities is only the beginning, and the technology is still emerging and maturing, but as an approach that looks to the future without abandoning existing requirements and the investments that support them, distributed analytics offers practical steps forward for the enterprise.

Article by Tom Pringle, Ovum analyst
