“Machine learning will help companies and administrations make better decisions”
Jordi Vitrià is one of the authors of a new book on big data, deep learning, and artificial intelligence published as part of a series by National Geographic (first in Spanish, then in English and Italian). He makes the case to Penta readers that “Every human being must have the right to protect from other people’s eyes some aspects of his or her life that are perfectly legal.”
“In God we trust; all others must bring data.” These words, attributed to US statistician and lecturer William Edwards Deming, aptly open the new book, “The Power of Data: From big data to deep learning” (El poder de los datos: Del big data al aprendizaje profundo), in the National Geographic series “Our mathematical world” (El mundo es matemático).
UB professor and BGSMath Faculty member Jordi Vitrià is one of the authors of the volume. Big data, deep learning, artificial intelligence: these are all buzzwords that we hear every day. Vitrià believes there is a good amount of myth about them. “Data can be used to lie, but at the same time there is no truth without data,” he writes in chapter 4 of the book. But “you always have to have a skeptical attitude towards data,” because data don’t speak for themselves, as he and his coauthors, Oriol Pujol Vila and Santi Seguí, explain in the book.
Tell us what “big data” is.
Technically, it means being able to process large amounts of data. But the reality is that very few companies are truly able to do that. The majority of companies use a smaller amount of data. Take banks, for example. A big bank has, say, a million clients. Each client produces, say, 5,000 data points. Is this big data? No. The traditional data that used to have value for a bank were not ‘big data’. Now, if you have all the clicks a client makes on your page, or the path their mouse follows, that is big data. That can also have value for a company.
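As a rough back-of-envelope check on the bank example, the sketch below multiplies out the figures quoted in the answer. The 8 bytes per data point is an assumption made purely for illustration, not a number from the interview.

```python
# Figures quoted in the interview.
clients = 1_000_000
points_per_client = 5_000

# Assumption (not from the interview): each data point takes 8 bytes.
bytes_per_point = 8

total_points = clients * points_per_client
size_gb = total_points * bytes_per_point / 1e9  # decimal gigabytes

print(f"{total_points:,} data points, roughly {size_gb:.0f} GB")
```

Under that assumption the whole data set would fit on a single ordinary disk, which is consistent with the point that this is not yet ‘big data’.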
So, who has ‘big data’ then?
Big companies: Google, Facebook, big supermarket chains, big insurance companies. In Spain, those working in a digital environment include, for example, big tourism companies that manage many different websites, or public administrations with distributed sensors. These are all data that are cheap to collect. But what really matters is not whether you have a lot of data or a little; it’s the analysis and its output that count. This is what we call ‘data science’, which is a more interesting concept than the generic ‘big data’.
In the book you also talk about deep learning.
In 2012, when ‘deep learning’ became popular, it seemed it would be more revolutionary than it actually is. Essentially it is a type of machine learning based on learning data representations. But the truth is that there are many problems where traditional statistics still work much better.
Are there fields where deep learning really makes a difference?
Deep learning has been revolutionary mainly in three fields. Image analysis – without it, self-driving cars, iPhone face recognition or many applications in medical imaging would not exist; natural language analysis – we now have fantastic tools that can automatically and naturally summarise texts; and finally, time series forecasting, that is, predictions based on a sequence of data points taken over time. All of this has to do with ‘learning’, but it’s not all the learning there is.
What about artificial intelligence?
The state of the art in this technology is that we have algorithms capable of predicting what is going to happen to a specific phenomenon based on the data that represent it. This is a very interesting tool, but it’s very far from a thorough definition of “intelligence”.
Are you saying that what we call “artificial intelligence” really isn’t?
We are not even capable of formulating the big questions about what “intelligence” really is. We know certain things. “Learning” is part of it, but not all of it. Also, we take for granted that for a robot to be “intelligent” or to be “like us”, it needs to have some characteristics that I am not sure machines need to have, such as diseases or death. What we do have, and I think this is going to be a game-changer, is predictive analytics.
What is predictive analytics?
Systems that are capable of predicting something based on a series of data. What you are really doing is automating some of the decisions a company or an administration has to make. This is a huge change. So far, machines have changed first manual labor, then blue-collar work, then accounting work, then white-collar work, so to speak. But now the change has reached the management level: we have systems that can tell us whether a given decision can work or not. This is not “intelligence”, it is a small part of it. But it can change the way companies make decisions, and this is very relevant. It’s a decision-making process based on mathematics, statistics, probability, and computer science models. This is a true revolution.
As an expert in data, are you worried about the recent scandals involving Facebook data?
I think the human factor is key to understanding why big data have come to public attention. Big data worry us when they affect people’s lives. We should all be aware that when we accept a cookie, that cookie is estimating our sex, age, what our political views may be, our favorite sports… And the reason is that they want to offer us the best ad. They don’t really “know” these things. They estimate them. Many companies do the same. Privacy rules have to be very clear, and the European Union is promoting some legal changes in that respect with the General Data Protection Regulation, which becomes enforceable in May 2018. What worries me is people’s lack of awareness of this issue. It is naïve to expect companies to be ethical. There must be a more widespread awareness of our right to privacy. Many people say, if I haven’t done anything wrong, why should I worry? It is the wrong approach. Every human being must have the right to protect from other people’s eyes some aspects of his or her life that are perfectly legal. Privacy is the right I have to show the image of myself I want, both in my real and in my digital life. Total transparency in a society is impracticable.