How big data is enhancing the world's food chain
Mapping the human genome has been one of humanity's most complex challenges. The feat was first achieved in 2001. Stored, a human genome amounts to about 90 gigabytes of data.
By comparison, mapping the genomes of the plant world is a much more difficult challenge, because plants have much larger genomes. The wheat genome alone, for example, is seven times larger than a human's.
In humans, researchers would generally map both parents and then the resulting child. In the plant kingdom, by contrast, an agricultural breeder may want to analyse hundreds of seeds or potential parents across many varieties for a single project.
This adds up to a lot more data and complexity.
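To give a rough sense of how quickly the options multiply, the short sketch below counts the distinct two-parent crosses possible among a pool of candidate parents, which grows as n(n-1)/2. It is a simplification for illustration only: the pool sizes are hypothetical, and it ignores backcrosses, self-pollination, repeated trials and the genotyping of every resulting seed, all of which add further data.

```python
from math import comb


def possible_crosses(num_parents: int) -> int:
    """Number of distinct two-parent crosses (unordered pairs, no selfing)."""
    return comb(num_parents, 2)


# Hypothetical pool sizes, chosen only for illustration.
# A human trio (two parents) corresponds to a single cross.
for n in (2, 50, 200, 500):
    print(f"{n:>4} candidate parents -> {possible_crosses(n):>7,} possible crosses")
```

Going from 2 to 200 candidate parents takes the count from a single cross to 19,900 possible pairings, before any plant has been sequenced.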
Enter Computomics, a German data analysis company that offers data services to the world's plant breeders.
Traditionally, breeders would select the best plants in a field based on their physical attributes and crossbreed varieties until they eventually settled on a new variety. According to the US government's FDA, this process takes 13 years on average.
The growing world population's demand for food, combined with the challenges of climate change, makes 13 years far too slow.
In genomics, these observable attributes of a plant are called phenotypes. Examples include colour, mineral content, yield, moisture content, days to maturity and many other measurable traits.
Computomics speeds up this traditional development process by giving each phenotype a score and rolling these up into an overall score for each plant or seed. They can then work with the breeder to analyse all the potential breeding and crossbreeding opportunities and create a breeding strategy.
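The article does not describe how Computomics actually calculates its scores, so the following is only a hypothetical sketch of the general idea of rolling trait scores up into one number per plant. It assumes a simple weighted sum: each trait is min-max scaled to a 0 to 1 range across the candidates (inverted where a lower value is better, such as days to maturity), then multiplied by a weight. The trait names, weights and plant data are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trait:
    name: str
    weight: float           # relative importance (hypothetical value)
    higher_is_better: bool  # False for traits like days to maturity


# Hypothetical traits and weights, for illustration only.
TRAITS = [
    Trait("yield_t_per_ha", 0.5, True),
    Trait("mineral_content_mg_per_kg", 0.2, True),
    Trait("days_to_maturity", 0.3, False),
]


def trait_scores(plants: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Min-max scale each trait to 0..1 across all candidate plants."""
    scores: dict[str, dict[str, float]] = {name: {} for name in plants}
    for trait in TRAITS:
        values = [p[trait.name] for p in plants.values()]
        lo, hi = min(values), max(values)
        span = hi - lo
        for name, p in plants.items():
            # If every plant has the same value, the trait cannot separate them.
            s = 0.5 if span == 0 else (p[trait.name] - lo) / span
            scores[name][trait.name] = s if trait.higher_is_better else 1.0 - s
    return scores


def overall_scores(plants: dict[str, dict[str, float]]) -> dict[str, float]:
    """Roll the per-trait scores up into one weighted score per plant."""
    total_weight = sum(t.weight for t in TRAITS)
    per_trait = trait_scores(plants)
    return {
        name: sum(t.weight * per_trait[name][t.name] for t in TRAITS) / total_weight
        for name in plants
    }


# Hypothetical breeding lines with made-up measurements.
plants = {
    "line_A": {"yield_t_per_ha": 8.0, "mineral_content_mg_per_kg": 30.0, "days_to_maturity": 120},
    "line_B": {"yield_t_per_ha": 6.0, "mineral_content_mg_per_kg": 40.0, "days_to_maturity": 100},
    "line_C": {"yield_t_per_ha": 7.5, "mineral_content_mg_per_kg": 35.0, "days_to_maturity": 110},
}

scores = overall_scores(plants)
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: {scores[name]:.3f}")
```

Here line_C ranks first (0.625) despite not leading on any single trait, because it is reasonably good on all of them. A weighted sum is the simplest possible roll-up; a real service would likely use far richer models.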
To make this vastly complex challenge even more complicated, Computomics also needs to track soil, weather and geographic information, as well as other factors, across the breeding programmes.
With two-thirds of the world's calories coming from just rice, wheat and corn, the potential for this data analysis to impact world food production is massive.
The challenge for Computomics has been computing power and storage. The initial mapping of the human genome cost in excess of US$3 billion, a price point that would never work for plant breeders.
In August of last year, they trialled an HPE Superdome Flex server. At the time, no cloud-based solution could give them the terabytes of memory they needed. Superdome Flex allows multiple servers to be connected to work as one system at an affordable price.
In 2015, Computomics raised €1.1 million in funding. A significant chunk of this was later invested in turning the successful trial of the HPE equipment into a reality.
They can now assemble the massive datasets needed and offer this unique data analysis as a service to clients.
The next challenge is to extend this analysis to the soil microbes that affect their plant breeder clients' crops, which will add yet another order of magnitude of complexity.