
Data quality and the democratisation of AI

29 Oct 2018

Constructing machine learning and artificial intelligence models is no easy task. Despite the noise around AI, applying this technology well requires a depth of mathematical understanding, as well as the technical know-how to develop meaningful AI models and algorithms.

Expanding volumes of raw data about people, places and things, together with increasing computing power and real-time processing speeds, are making immediate AI applicability and tangible business benefits a viable reality.

But before IT leaders attempt to deploy an enterprise-wide AI strategy, they must be able to bring large datasets together from disparate and varied sources into a secure, centralised, scalable and governed data repository.

Because machine learning models are trained by feeding systems information in an organised and structured manner, AI is only as intelligent as the data behind it.

An example of this is Microsoft's Tay – a Twitter AI chatbot that was supposed to engage in casual and playful conversation with its followers but instead tweeted inappropriate and racist comments. The chatbot did so because it absorbed the negative sentiments of Twitter trolls, and preventative filtering was abandoned once it launched – a clear example of AI failing because of a bad dataset.

This demonstrates the need for greater emphasis on data quality and governance. Data is not perfect. Even highly educated, talented data scientists struggle with disparate data, so why would machines fare any better? The solution is to let data governance and quality pave the way to AI democratisation.

Data quality and governance for your “crown jewels” 

Democratising AI is not easy, but companies are eager to speed up the process. According to one survey, 81% of IT leaders are currently investing in or planning to invest in AI, and CIOs have mandated that AI be integrated across their entire technology stack.

But before businesses can get to an AI proof of concept or invest in operational AI applications, they need to have a data quality and governance strategy in place. 

With data quality, you ensure that your "jewels" are cleansed and in good condition. Quality is not a one-and-done process, and data can come from anywhere. Data management should be continuous, so that the quality and integrity of the data remain high and you can make smarter business decisions.

If data quality issues are addressed within an organisation, the business can gain a competitive edge. But to deliver that quality, data needs to be accurate, complete and consistent.
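As a rough illustration of those three dimensions, a record-level check might look like the sketch below. The field names and validation rules are hypothetical examples, not any particular product's schema.

```python
# Minimal sketch of record-level quality checks for the three dimensions
# named in the text: completeness, accuracy and consistency.
# All field names and rules here are illustrative assumptions.
import re

def check_record(record, required_fields):
    """Return a list of quality issues found in a single record."""
    issues = []

    # Completeness: every required field must be present and non-empty.
    for field in required_fields:
        if not record.get(field):
            issues.append(f"missing:{field}")

    # Accuracy: values must match an expected format (a simple email
    # pattern stands in here for domain-specific validation).
    email = record.get("email", "")
    if email and not re.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", email):
        issues.append("invalid:email")

    # Consistency: related fields must agree with each other.
    if record.get("country") == "AU" and not str(record.get("postcode", "")).isdigit():
        issues.append("inconsistent:postcode")

    return issues

record = {"name": "Ada", "email": "ada@example", "country": "AU", "postcode": "2000"}
print(check_record(record, ["name", "email", "country"]))  # → ['invalid:email']
```

In practice these rules would run continuously as part of the data pipeline, not as a one-off script, in line with the point above that quality is not a one-and-done process.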

Data governance, on the other hand, requires a team armed with the responsibility and the right tools to manage the system that protects those sacred jewels to enable strategic planning. A well-planned data governance framework covers strategic, tactical, and operational roles and responsibilities. It defines who can take action, upon what data, in what situations, using what methods.

A sound data governance approach can and should involve more than one platform or project. It should also contain a set of rules and standards for data related matters. 
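The "who can take action, upon what data" rule set described above can be made concrete as an explicit, default-deny policy. The roles, datasets and actions below are illustrative assumptions, not any specific framework's vocabulary.

```python
# Toy sketch of a governance policy: which role may take which action
# on which dataset. Everything not explicitly granted is denied.
# Role, dataset and action names are hypothetical examples.
POLICY = {
    ("data_steward", "customer_records"): {"read", "update", "mask"},
    ("analyst", "customer_records"): {"read"},
    ("analyst", "clickstream"): {"read", "export"},
}

def is_allowed(role, dataset, action):
    """Check an action against the governance policy; default is deny."""
    return action in POLICY.get((role, dataset), set())

print(is_allowed("analyst", "customer_records", "update"))    # → False
print(is_allowed("data_steward", "customer_records", "mask"))  # → True
```

Keeping the rules in one declarative structure like this makes them auditable, which is the point of a governance framework spanning more than one platform or project.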

Data democratisation, dirty data, and the data champions in charge of it all 

Data governance comes with great responsibility. It's no surprise that companies are rushing to become data-driven, but haste can lead to incomplete, inaccurate "dirty data" riddled with errors and missing values. Studies show that dirty data is the most common problem for workers in the data science field.

It is, therefore, imperative to get a sense of how dirty the data is. Whether you need to update date formats, capitalisation or punctuation, it’s important to get a quick understanding of what you’re dealing with.
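The kinds of clean-up mentioned above – unifying date formats, capitalisation and stray punctuation – can be sketched as small normalisation helpers. The accepted formats and rules below are illustrative assumptions.

```python
# Sketch of the normalisation steps the text describes: parsing dates
# written in several formats into one canonical form, and tidying
# capitalisation and punctuation. Formats chosen here are assumptions.
from datetime import datetime

DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d", "%d %b %Y")

def normalise_date(value):
    """Parse a date written in any known format into ISO 8601."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None  # unparseable: flag for manual review rather than guess

def normalise_name(value):
    """Trim punctuation and whitespace, then apply title case."""
    return value.strip(" .,;").title()

print(normalise_date("29/10/2018"))        # → 2018-10-29
print(normalise_name("  steve SINGER. "))  # → Steve Singer
```

Returning `None` for unparseable values, rather than guessing, is one way to "get a sense of how dirty the data is": counting the failures gives a quick profile of the dataset before any model is trained on it.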

Systems infused with AI capabilities are smart, but they are still computer programs. As noted with Microsoft’s Tay chatbot, you can’t feed the system dirty data and expect to train a model or build a foolproof platform. An AI model can’t be trained using the wrong type of data. Like the saying goes, “garbage in, garbage out.” 

While data-literate professionals and scientists typically own the keys to the data kingdom, the proliferation of new data streams coming from sensors, social media, the cloud, IoT, and so on, is uncontrollable. 

This is why we are seeing new data-focused roles emerge within enterprises, such as data analysts, data scientists and data stewards. These new roles are blurring the lines between enterprise data and its consumers, presenting a challenge around corporate data quality, reliability and trust that must be addressed by IT organisations.

There is a solution, though. While we need data experts helping to maintain data integrity, it's critical to democratise data in order to distribute information across all teams. Instead of having business units go through IT teams to get the data they need, we can empower all units (marketing, business analysts, IT, sales) to act on business insights.

For example, the marketing department can analyse click streams from the website, or finance teams can retrieve vendor billing details. Business users can access data confidently, take an active role and feel engaged.

An AI strategy is a data strategy 

It’s hard to dispute that data is the “new oil” of today’s fast-paced, digital society. But when dealing with machines, the quality of the analysis and the outcomes that emerge from it depend on the quality of the data you feed into the algorithm.

Businesses cannot and should not begin creating and applying their own AI models or algorithms without secure, clean, democratised data integrated into mission-critical systems – the results can be disastrous and extremely costly.

Enterprises need a vision for data governance in their organisation that evolves over time and provides value to the business.

Fully considered data strategies are key to implementing the right AI strategies, and we need those who understand data best to maintain the data quality and integrity necessary to fuel the types of automated, intelligent insights that AI can provide.

Then, and only then, can organisations fulfil their AI dreams. 

Article by Steve Singer, A/NZ Country Manager at Talend
