IT Brief New Zealand
Technology news for New Zealand's largest enterprises
Partner content

Scaling AI: Making machine-learning models more effective and efficient

By Contributor
Mon 29 Nov 2021

Article by Infosys SVP & industry head, global markets, Raja Shah.

High-quality, clean and appropriately labelled data is undeniably crucial in today's world. 

Companies are increasingly dependent on the ability of AI and machine learning (ML) models to provide real-time insights that drive business and customer engagement outcomes. 

With data volumes growing exponentially, AI and ML algorithms are integral to using that data effectively. They are key to enabling everything from self-driving cars and cashier-less shopping services to cancer detection.

In telecoms specifically, AI and ML are applied in many ways that enhance customers' experience of solutions and services. These include speech recognition and voice-activated commands, which have become almost must-have smart features in today's fast-paced world.

As reliance on these models grows, data quality matters even more, as do data models that minimise the unconscious bias introduced by human data labellers. With customer behaviour and genome analysis increasingly used for customer mapping, telecoms can confidently hyper-personalise their offerings when data has been effectively cleansed.

Because data is so crucial, the design and testing process, including data cleansing and labelling, must be thorough to minimise bias. The industry is awash with new, dedicated data labelling providers, such as San Francisco-based start-ups Scale AI and Sama.

Google and Amazon also take on gargantuan manual labelling tasks, especially in the legal and healthcare industries, but often charge businesses particularly high fees.

Across all these data labelling services, there is no guarantee that the output will be comprehensive, unbiased or free from noise, which adds the risk of flawed outcomes and inefficiencies. The time required to clean and label data successfully is also often too long for agile companies.

At Infosys, we understand that 25-60% of ML project costs come from the manual labelling and validation of data. Spending on these tasks appears to be increasing, with little guarantee of quality. AI consultancy Cognilytica estimates that enterprises will collectively spend US$4.1 billion on data labelling by 2024.

So, what's a faster and more effective way to reduce bias and deliver clean data for hungry ML algorithms? 

The answer is an approach that combines intelligent learners with programmatic data creation. By letting AI do the heavy lifting of low-skill data labelling, organisations can reduce overall bias and ultimately boost efficiency and effectiveness. Here are some of the ways this transformation can take place:

Active learning

During active learning, an intelligent learner examines unlabelled data and selects the parts of it that would most benefit from human labelling. A classifier can help control which data is selected, directing labelling effort towards areas the model has not yet learned well. This makes the labelling process active rather than passive and, in turn, increases data quality.
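To make the selection step concrete, here is a minimal sketch of pool-based active learning with least-confidence sampling. It is an illustration under stated assumptions, not Infosys's method: the nearest-centroid "classifier", the 2-D points and the `oracle` function (standing in for a human labeller) are all hypothetical. Each round, the model is refitted on the labelled set and the points it is least sure about are sent for labelling.

```python
# Minimal sketch of pool-based active learning with least-confidence sampling.
# The nearest-centroid classifier and the oracle are hypothetical stand-ins.
import math
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]


def centroids(labelled: Dict[Point, int]) -> Dict[int, Point]:
    """Mean feature vector of each class in the labelled set."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for (x, y), label in labelled.items():
        s = sums.setdefault(label, [0.0, 0.0])
        s[0] += x
        s[1] += y
        counts[label] = counts.get(label, 0) + 1
    return {c: (s[0] / counts[c], s[1] / counts[c]) for c, s in sums.items()}


def predict_proba(p: Point, cents: Dict[int, Point]) -> Dict[int, float]:
    """Class probabilities: softmax over negative distances to each centroid."""
    scores = {c: -math.dist(p, m) for c, m in cents.items()}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def least_confident(pool: List[Point], cents: Dict[int, Point], k: int) -> List[Point]:
    """The k unlabelled points whose most likely class has the lowest probability."""
    return sorted(pool, key=lambda p: max(predict_proba(p, cents).values()))[:k]


def active_learning(
    labelled: Dict[Point, int],
    pool: List[Point],
    oracle: Callable[[Point], int],
    rounds: int,
    k: int = 1,
) -> Dict[Point, int]:
    """Each round: refit, then ask the oracle to label the k most uncertain points."""
    labelled = dict(labelled)
    pool = list(pool)
    for _ in range(rounds):
        if not pool:
            break
        cents = centroids(labelled)
        for p in least_confident(pool, cents, k):
            labelled[p] = oracle(p)  # the human labelling step
            pool.remove(p)
    return labelled


seed = {(0.0, 0.0): 0, (10.0, 10.0): 1}
pool = [(1.0, 1.0), (5.0, 5.0), (9.0, 9.0), (4.8, 5.3)]
result = active_learning(seed, pool, lambda p: int(p[0] + p[1] > 10), rounds=1)
```

In this toy run the first query is `(5.0, 5.0)`, the point equidistant from both class centroids, while the easy points near each centroid are never sent for labelling. That is the source of the cost savings: human effort goes only where the model is unsure.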

Active learning was recently used in the legal industry to label contractual clauses. Through the process, accuracy increased from 66% to 80%, even with fewer data points, while cost and time were also significantly lower.

When an AI-based decision appears biased, it is easier to interrogate it and find out why. A Netflix recommendation, for example, is based on a set of rules driven by user data. If those rules appear to produce biased results, the machine learning model, while complicated, can be investigated to find the cause and corrected to remove the perceived bias.

Distant supervision 

Using distant or weak supervision to create data sets programmatically is the best way to use AI at scale. In both approaches, labelling functions are programmed to create labels from input datasets. Because labels come from these functions rather than from manual annotation, distant or weak supervision can combine noisy signals and resolve conflicting labels without reference to a hand-labelled "ground truth".
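As a rough illustration of labelling functions, the sketch below labels customer messages for churn intent. Everything in it is hypothetical: the keyword rules, the label values and the messages. Each function either votes or abstains, and a plain majority vote combines the noisy votes, abstaining when they conflict evenly.

```python
# Minimal sketch of weak supervision: hypothetical keyword labelling functions
# vote on customer messages, and a simple majority vote resolves conflicts.
from collections import Counter
from typing import Callable, List

ABSTAIN, STAY, CHURN = -1, 0, 1


def lf_cancel(text: str) -> int:
    return CHURN if "cancel" in text.lower() else ABSTAIN


def lf_switch(text: str) -> int:
    return CHURN if "switch to" in text.lower() else ABSTAIN


def lf_thanks(text: str) -> int:
    return STAY if "thank" in text.lower() else ABSTAIN


LABELLING_FUNCTIONS: List[Callable[[str], int]] = [lf_cancel, lf_switch, lf_thanks]


def majority_vote(text: str, lfs: List[Callable[[str], int]] = LABELLING_FUNCTIONS) -> int:
    """Combine noisy votes; abstain if no function fires or the vote is tied."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # unresolved conflict: leave the message unlabelled
    return ranked[0][0]


messages = [
    "I want to cancel and switch to another provider",
    "Thank you for fixing my connection",
    "Thanks, but I may still cancel",
    "What is my data balance?",
]
labels = [majority_vote(m) for m in messages]  # [1, 0, -1, -1]
```

Majority voting is the simplest way to combine labelling functions. Frameworks such as Snorkel instead fit a label model that estimates how accurate each function is and weights its votes accordingly.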

Distant supervision produces training data by drawing labels from distant knowledge bases. By looking across multiple data sources and databases, it can map labels onto the data used to train machine learning models.

The process can reach 98% accuracy, but there may still be noise in the labels, depending on the type and number of knowledge bases available for the training data. One challenge with this approach is that suitable distant knowledge bases can be difficult to find, and ML engineers need domain experts to help them uncover the appropriate information.
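A minimal sketch of distant supervision, using the classic relation-extraction setting: any sentence that mentions both entities of a known fact is labelled with that fact's relation. The knowledge base entries (`AcmeTel`, `FibreMax` and so on) and the sentences are hypothetical. The second sentence also shows where label noise comes from, since it mentions both entities without actually expressing the relation.

```python
# Minimal sketch of distant supervision for relation extraction: any sentence
# mentioning both entities of a known fact is labelled with that fact's relation.
# The knowledge base entries below are hypothetical.
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

KNOWLEDGE_BASE: Dict[Tuple[str, str], str] = {
    ("AcmeTel", "Wellington"): "headquartered_in",
    ("AcmeTel", "FibreMax"): "offers_plan",
}


def distant_label(
    sentence: str, kb: Dict[Tuple[str, str], str] = KNOWLEDGE_BASE
) -> Optional[Triple]:
    """Return the first KB fact whose subject and object both appear, else None."""
    for (subject, obj), relation in kb.items():
        if subject in sentence and obj in sentence:
            return (subject, relation, obj)
    return None


def build_training_set(
    sentences: List[str], kb: Dict[Tuple[str, str], str] = KNOWLEDGE_BASE
) -> List[Tuple[str, Triple]]:
    """Keep only the sentences the knowledge base can label."""
    labelled = []
    for s in sentences:
        label = distant_label(s, kb)
        if label is not None:
            labelled.append((s, label))
    return labelled


corpus = [
    "AcmeTel is headquartered in Wellington.",
    "AcmeTel opened a new store in Wellington.",  # also labelled: a noisy label
    "FibreMax is AcmeTel's fastest plan.",
    "The weather was mild today.",
]
training_set = build_training_set(corpus)
```

The quality of the labels depends entirely on the coverage and accuracy of the knowledge base, which is why finding suitable knowledge bases, and the domain experts who know them, is the hard part.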

When data needs to be sourced from unreliable sources, it is best to use weak supervision.  

Synthetic data generation 

When neither data nor labelling functions exist yet, there is the option of generating the data artificially.

Amazon took this approach for its Amazon Go stores, small convenience stores where no checkout is required. Amazon created virtual shoppers using graphics software, which were then used to train computer vision algorithms to recognise what real-world shoppers take off the shelves.

NASA's Perseverance mission to Mars also used synthetic data generation to recreate the entire Martian landscape.

Like the virtual shoppers, synthetic data should have the same representative characteristics as the real-world data from which it is derived. It must also cover diverse use cases and outliers to reduce uncertainty and ensure the resulting models are fair, safe, reliable and inclusive.
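The sketch below shows, in deliberately simplified form, what "the same representative characteristics" plus "outliers" can mean in practice. It fits each column's mean and standard deviation on a few hypothetical customer records, samples new records from matching normal distributions, and pushes a configurable share of records 3-5 standard deviations from the mean to cover edge cases. Column names, values and the outlier scheme are assumptions for illustration only.

```python
# Minimal sketch: generate synthetic customer records whose columns match the
# mean and standard deviation of (hypothetical) real records, plus a controlled
# share of outliers. Columns are sampled independently, so correlations between
# columns are NOT preserved.
import random
import statistics
from typing import Dict, List, Tuple

Record = Dict[str, float]


def fit_columns(real: List[Record]) -> Dict[str, Tuple[float, float]]:
    """Per-column (mean, sample standard deviation) of the real data."""
    return {
        col: (
            statistics.mean([r[col] for r in real]),
            statistics.stdev([r[col] for r in real]),
        )
        for col in real[0]
    }


def generate(
    real: List[Record], n: int, outlier_rate: float = 0.05, seed: int = 0
) -> List[Record]:
    """Sample n synthetic records; roughly outlier_rate of them are outliers."""
    rng = random.Random(seed)
    params = fit_columns(real)
    synthetic: List[Record] = []
    for _ in range(n):
        is_outlier = rng.random() < outlier_rate
        record: Record = {}
        for col, (mu, sigma) in params.items():
            if is_outlier:
                # Place the value 3-5 standard deviations from the mean.
                record[col] = mu + rng.choice((-1, 1)) * rng.uniform(3, 5) * sigma
            else:
                record[col] = rng.gauss(mu, sigma)
        synthetic.append(record)
    return synthetic


real_customers = [
    {"monthly_spend": 40.0, "tenure_months": 6.0},
    {"monthly_spend": 50.0, "tenure_months": 18.0},
    {"monthly_spend": 60.0, "tenure_months": 30.0},
    {"monthly_spend": 70.0, "tenure_months": 42.0},
]
synthetic_customers = generate(real_customers, n=1000)
```

Sampling columns independently keeps the sketch short but loses relationships between fields, and it can produce impossible values such as negative tenure. Production generators model the joint distribution and enforce domain constraints.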

This can be seen in churn prediction, which involves analysing relevant data to identify factors indicating that a given customer is a flight risk. If you know which customers are about to cancel their subscription or terminate their contract, you can take proactive measures to prevent them from leaving. Synthetic data allows such models to be built without generating data through customer calls, which may annoy customers who have already been contacted about other services by the same provider.
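To show what a churn model does with such data, here is a minimal logistic-regression churn scorer trained by batch gradient descent. The two features (recent support complaints and months until the contract ends) and the six training rows are hypothetical. As the article suggests, rows like these could themselves be synthetic rather than gathered through customer calls. The model outputs a probability that a customer will leave.

```python
# Minimal sketch: a logistic-regression churn scorer trained by batch gradient
# descent. Feature names and the toy training data are hypothetical.
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(
    X: List[Sequence[float]], y: List[int], lr: float = 0.05, epochs: int = 2000
) -> Tuple[List[float], float]:
    """Fit weights and bias by minimising mean log loss with batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def churn_risk(x: Sequence[float], w: List[float], b: float) -> float:
    """Estimated probability that a customer with features x will churn."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Features: (support complaints in last 90 days, months until contract ends)
X = [(3, 1), (4, 0), (5, 2), (0, 10), (1, 12), (0, 8)]
y = [1, 1, 1, 0, 0, 0]  # 1 = churned, 0 = stayed
w, b = train_logistic(X, y)
```

A customer scoring above a chosen threshold (0.5 here, for example) would be flagged for proactive retention measures. The learned weights also indicate which factors push risk up or down, which supports the article's point about interrogating model decisions.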

AI projects require high-quality data labelling delivered in a timely manner. At present, about one-quarter of the time devoted to a machine learning task is spent on labelling, well above the 3% devoted to developing algorithms.

As large corporations seek to scale AI into every part of their business, they will likely struggle with the trade-off between making the process effective and making it efficient. But active learning, distant supervision and synthetic data generation can do the heavy lifting: they can significantly reduce costs and increase the efficiency of low-skill data labelling, while also delivering the data quality needed to build powerful AI models in the future.

For more information on Infosys, visit:
