Picture: THINKSTOCK

USING the systems available today, it would take the equivalent of two nuclear power stations, about seven gigawatts, to power the processing of the data that the Square Kilometre Array (SKA) will generate, says Ton Engbersen, DOME project leader at IBM Research.

South Africa — through SKA SA, which is a business component of the National Research Foundation — this week joined Dutch radio astronomy institute Astron and IBM in the €34m DOME project, which aims to create an IT road map for the SKA.

The SKA will be the world’s largest radio telescope, using thousands of antennae to receive signals from the depths of space.

"The SKA generates so much data, in the order of 14 exabytes (equivalent to 14-billion 1GB iPods) per day," says Dr Engbersen. "We need to focus on building an efficient computer system behind this so we can deal with this amount of data, without using power that no one can afford."

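The iPod comparison in Dr Engbersen's figure can be checked with simple arithmetic. The sketch below is illustrative only: it assumes decimal units (1GB is 10^9 bytes and one exabyte is 10^18 bytes), and the function names are invented for this example. It also works out the average rate at which data would arrive if 14 exabytes were spread evenly over a day, which is a simplification.

```python
# Assumption: decimal (SI) units, so 1 GB = 10**9 bytes and 1 EB = 10**18 bytes.
GB = 10**9
TB = 10**12
EB = 10**18
SECONDS_PER_DAY = 24 * 60 * 60


def devices_needed(total_bytes: int, device_bytes: int) -> int:
    """Return how many devices of a given capacity are needed to hold total_bytes."""
    return -(-total_bytes // device_bytes)  # ceiling division on integers


def average_rate(bytes_per_day: int) -> float:
    """Return the average data rate in bytes per second, assuming an even spread."""
    return bytes_per_day / SECONDS_PER_DAY


daily_bytes = 14 * EB  # the figure quoted for the SKA
ipods = devices_needed(daily_bytes, 1 * GB)
rate_tb_per_s = average_rate(daily_bytes) / TB

print(f"1GB devices needed per day: {ipods:,}")
print(f"Average data rate: {rate_tb_per_s:.0f} TB per second")
```

Under these assumptions, 14 exabytes a day does equal 14-billion 1GB devices, or an average of roughly 162 terabytes every second.
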
According to IBM, the four-year partnership would "research extremely fast, but low-power exa-scale computer systems aimed at developing advanced technologies for handling the massive amounts of data … produced by the SKA".

The South African component will focus on visualisation, desert-proof technology and software analytics.

Dr Engbersen explains: "Visualisation (is) how to deal with this data so astronomers can make sense of it." The large quantities of data will be processed, analysed and converted into images, which are easier for astronomers to understand, rather than an amorphous series of numbers.

Western Australia and South Africa’s Karoo, where the SKA will be built, are "extreme" environments, and the microservers used to process the data will have to be "desert proof" to withstand these conditions, IBM says.

In terms of software analytics, South Africa’s MeerKAT, a 64-dish SKA precursor, will be "used for the testing and development of a sophisticated software programme", IBM says.

Dr Engbersen is more specific, saying the research will try to develop "software that can automatically make its own decisions about what data are important and not important".

However, the research will not only benefit radio astronomy, says DOME SA technical co-ordinator Simon Ratcliffe. "This project lays the foundation to help the scientific community solve other data challenges such as climate change, genetic information and personal medical data."

SKA South Africa director Bernie Fanaroff has previously said that the SKA will help position South Africa so that it will have the skills and expertise to play a role in the developing field of big data. "I’m trying to get big data started in this country," he said. "It is one of the major spin-offs of the SKA, which will set the foundations for a big data."

Says IBM: "Every day, we create 2.5-quintillion bytes of data — so much that 90% of the data in the world today has been created in the past two years alone.

"This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cellphone GPS signals, to name a few. This data is big data."

And the volume of data is expected to keep increasing, say experts.

"Practically everything we do, that data is stored somewhere by someone," says Alan Smeaton, a professor of computing at Dublin City University who spoke at the European Union Science Conference in Brussels last week. "The act of collecting this personal data leads to the possibility that the secondary data will be more useful (than) the primary data."

The act of mining this data is called analytics. The information and patterns extracted in this way can be more useful to industries than the primary data.

"Big data has the ability to transform industries," Dr Smeaton says, citing health, financial services, advertising and education.

Christina Peters, chief privacy officer at IBM, says: "Why is big data so important right now? Organisations are generating, collecting, using far more data than they can make sense of using traditional methods."

One example is tracking information and GPS: if your daily route information, obtained by GPS, can be collated with that of thousands, if not millions, of other people, it would be possible for authorities to work out optimal ways to deliver services such as public transport, Dr Smeaton says.
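
A rough sketch of the kind of collation Dr Smeaton describes might look like the following. It is an illustration, not a description of any real system: the routes are made-up lists of stop names rather than GPS coordinates, and the function `busiest_segments` is hypothetical. It counts how many people travel each leg of a journey so that the most heavily used legs stand out.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Segment = Tuple[str, str]


def busiest_segments(
    routes: Iterable[Sequence[str]], top_n: int = 3
) -> List[Tuple[Segment, int]]:
    """Count how many travellers use each leg (pair of consecutive stops)."""
    counts: Counter = Counter()
    for route in routes:
        # Count each leg once per traveller, even if their route repeats it.
        legs = set(zip(route, route[1:]))
        counts.update(legs)
    return counts.most_common(top_n)


# Hypothetical daily routes for three people, simplified to named stops.
sample_routes = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["E", "B", "C"],
]

top = busiest_segments(sample_routes, top_n=2)
for (start, end), travellers in top:
    print(f"{start} to {end}: {travellers} travellers")
```

In this toy data the legs from A to B and from B to C are each used by two people, which is the sort of pattern a transport planner would look for at a far larger scale.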

However, there is a great deal of tension in the big data space — aside from how to compute it all.

A session at the EU Science Conference called "Data Protection in an Era of Big Data" looked at privacy concerns around big data. The EU is trying to legislate the use of private information by companies.

Dr Smeaton argues against using legislation, which he describes as a "blunt instrument". While he acknowledges there are "natural threats" such as profiling, tracking and discrimination, he says "analytics can have enormous benefits".

But the reality is that the technology is not there yet. The SKA, and projects such as DOME, will go a long way towards developing the computer systems necessary to deal with huge quantities of data.

• Wild was a guest of Intelligence in Science at the EU Science Conference in Brussels.