Philosopher, economist and data scientist Seth Stephens-Davidowitz on why big data isn’t good or bad, the power of the tech giants and why your data is probably safer with Google.
It’s difficult to acknowledge the speed at which things are changing when one is living through a revolution of the magnitude we are currently experiencing.
Since the emergence of the Internet as a worldwide phenomenon, and after five billion mobile devices were put in people’s pockets, the nature of our relationship with the world has changed dramatically. Its impact stretches into every aspect of our lives.
Seth Stephens-Davidowitz was one of the first people to really begin to study the power and the impact of the Internet from within. His 2017 bestseller, Everybody Lies: Big Data, New Data, And What The Internet Can Tell Us About Who We Really Are, serves as an introduction to a new world of analysis and research that we are only beginning to fathom.
A philosopher, economist, and former Google data scientist, Stephens-Davidowitz is helping to shape our understanding of humanity’s relationship with arguably the most pervasive and powerful human creation ever: the world wide web.
Thinking about the Cambridge Analytica scandal – how massive data sets of personal information were used to manipulate people to a certain extent – and how that marked the end of a sort of utopian Silicon Valley Dream, how powerful is data at this level?
I think Cambridge Analytica was a bit of an overblown scandal. I think they weren’t quite as sophisticated as they claimed to be. They were better marketers than data scientists.
People freaked out about Cambridge Analytica over the idea that people are using political advertising to manipulate votes. But that’s what political advertising has always tried to do: manipulate voters. Television manipulates voters, speeches manipulate voters, everything manipulates voters. People freaked out about Cambridge Analytica because it’s new and different, and because they used these huge words like “psychographic profiling” and “harvesting data,” but it wasn’t all that different from traditional market research.
[Donald] Trump won for a lot of reasons. I don’t think Cambridge Analytica was a major factor in that. Brexit happened for a lot of reasons. I don’t think Cambridge Analytica was a major factor in that either. I also think that people are so paranoid about privacy that they forget the good uses of data.
My book focuses more on all the amazing things we can do with this data. There’s a study showing that, based on the symptoms people search for, we can predict whether they are likely to get a pancreatic cancer diagnosis in the future. People don’t talk about that. I’m doing work on what people search for before they search for suicide. In surveys, people don’t want to admit to mental health problems and suicidal thoughts, but on Google they type all these things, and we can analyze this data and find unique insights that can help people.
There’s an element in which we are all trying to manipulate people; we do it in the news to get more readers. There’s a famous TED Talk that showed how YouTube would generate a filter bubble that was more aggressive with white supremacists. That’s the same thing: they are trying to optimise their ad revenue, but by mistake they made you a white supremacist. There’s still an issue about the power of this type of data.
Big data isn’t good or bad, it’s powerful. Even in an election. You can be a [Mauricio] Macri fan, you can be an [Alberto] Fernández fan. If you are a Macri fan, you say it’s really scary that Fernández uses data to get more people to support him. If you are a Fernández fan, you can say it’s really creepy that Macri used data to get people to support him.
Trump could use Cambridge Analytica or run a test on Facebook, but Hillary Clinton could do the same thing. She didn’t do it as much as she probably should have. [Barack] Obama used a lot of sophisticated digital strategy to get elected, and a lot of people thought that was an excellent thing. People freak out when it’s not their candidate who uses the tools. I don’t see any reason why this type of analysis would give an advantage to bad politicians.
Traditional polling and surveying is in crisis, as we have seen a series of big upsets including Trump, Brexit, and even Macri’s first election. Here, ahead of the primaries, the figures indicated something completely different to what happened, leading to a run on the currency. What is your take on traditional polls?
One of the major crises is that fewer people answer polls. Apple is thinking of changing its iPhone settings so that if you get a call from a number you don’t know, it automatically goes to voicemail. That will be the death of phone polls. Because so few people answer polls now, pollsters try to weight them to make them more representative.
There’s a lot of wiggle room in how you choose to weight a poll. What tends to happen is that pollsters use a weighting that gives a result that makes sense to them. They don’t want to put their necks on the line, so if everyone says Macri is going to win, and then the pollsters get their data set and need to weight it, they don’t want to be the only idiots who said Fernández was going to win. It’s becoming less of a legitimate science as fewer people answer.
There’s also the issue of lying in polls. Even knowing the poll is anonymous, you don’t have an incentive to tell the truth, and a lot of people don’t feel like telling a random organisation who they are voting for.
Today, through tools like Google Trends and others based on analysing web search data, anyone can create their own projections on people’s intentions, right?
Some people just think we don’t need polls anymore: we can just see whether Fernández is searched more than Macri, and whoever is searched more will win the election. That’s not true. Some people search for Fernández because they like him, others because they hate him. The clues are more subtle.
The top Google search about Alberto Fernández now is about his son and whether he is a drag queen. That’s something Alberto Fernández’s campaign probably wants to take more seriously, as it tends to be searched a lot in conservative parts of Argentina.
Knowing what’s on people’s minds is more difficult than knowing what they say is on their minds. In the last three months there has been a 60 percent rise in searches related to the stock market, which is not surprising. But what are people searching for when they search about the stock market? Do they search Fernández or Macri? Who are they blaming for what happened? It seems it’s much more Macri they connect to the stock market. Macri’s campaign might want to say: can we somehow shift this issue and pin it on Fernández?
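The kind of naive candidate comparison dismissed above can be sketched in a few lines. This is a minimal illustration with invented numbers on Google Trends’ 0–100 relative-interest scale, not anything from the interview; in practice the series could come from a Trends export or, as an assumption, an unofficial client such as pytrends.

```python
def average_interest(series):
    """Mean of a weekly search-interest series (Trends' 0-100 scale)."""
    return sum(series) / len(series)

def compare_terms(interest_by_term):
    """Return terms ordered by average search interest, highest first.

    Caveat from the interview: higher search volume does not mean more
    support -- people also search for candidates they dislike.
    """
    return sorted(interest_by_term,
                  key=lambda term: average_interest(interest_by_term[term]),
                  reverse=True)

# Invented example data: four weeks of relative interest per candidate.
mock = {
    "Fernandez": [55, 60, 72, 80],
    "Macri": [70, 65, 58, 50],
}
ranking = compare_terms(mock)
print(ranking)  # ['Fernandez', 'Macri'] -- Fernandez averages higher
```

As the interview stresses, a ranking like this only shows who is searched more, not who is supported more; the subtler clues come from what is searched alongside each name.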
From an ethical standpoint, Google and other platforms have become the de facto guardians of truth. On one side you have the ranking algorithm that determines the way people get informed. There is power in that. Google, for example, openly says they want to favour Wikipedia because they think it’s a good way of people getting informed. In Argentina’s version of Wikipedia, certain types of profiles have a very strong pro-Kirchnerite bias. And at the same time there are cases of people being attacked on Wikipedia because of the way the community is structured.
My question is whether we have a dichotomy between the ethics of this being the way we validate truths and the fact that it’s a trillion-dollar commercial secret for Google. How do you understand that conflict?
It’s a huge problem. We haven’t reconciled it as a society: who gets to decide what people see? Google initially waved their hands and said it’s just an algorithm, so don’t blame us, it’s all machine learning and artificial intelligence, based on what people click on and what people search for. Over time they realised they had to make some adjustments, getting rid of lower-quality news sources and promoting higher-quality ones.
In the US there was a big story that Hillary Clinton was involved in a paedophile case in a pizza parlour, and most of us would say that Facebook should not show that story to people because it’s so obviously untrue. But that sets a dangerous precedent: who gets to decide which stories are untrue? Someone could say Mark Zuckerberg has abused his power, and he could just declare that story fake. There are all kinds of ethical questions.
No human being could explain now how Google’s search algorithm determines what shows up first: many lines of code driven by artificial intelligence and machine learning. It’s a black box that nobody totally understands. In the old news business we knew how the news was chosen; it’s more difficult to regulate a process like this one.
I’m going to debate you on that one. Google and all the platforms understand that this is the way they generate their power. They are pushing to improve the user experience, but at the same time they are pushing to make more money. They are making decisions that affect how people live their lives.
But that’s not new for media companies. There has always been a conflict of interest for media companies between informing the people and making money. There are examples of media companies not doing investigative reporting on companies that pay their bills or on politicians who were favourable to them. What is new is the opacity of the process, which makes it difficult to regulate.
And the level of power?
Is it new, though? In the US, for example, Walter Cronkite used to be considered the most powerful person in the world. Everybody watched his news show. He largely ended the Vietnam War by turning against it. Though I’m not totally sure I agree with myself there: that was limited to one country, while Facebook and Google are powerful around the world. But there have always been powerful single entities with incredible influence over mass media.
It’s very clear, if you analyze Google search data, that conspiracy theories are widespread after major horrific events. It’s really troubling for a society that, after a child gets slaughtered, so many people think it’s a fake story. If Google had existed during the moon landing, the next day the top search would have been “moon landing hoax.” If it had existed after JFK was shot, the top search would have been “JFK shooting hoax.” I don’t know if there’s anything new about conspiracy theories, but now we are more aware of how common they are.
Google, Facebook and the major platforms are in the middle of a public debate given their size and power, both in terms of business and in terms of being the channel through which people get information, even in terms of mental health. What should governments be doing in terms of regulation with Big Tech?
It’s possible that they have become too powerful. I don’t know if there’s an obvious way to do this. Nationalising them in some way would not be a crazy idea, but it would be pretty far down the road.
Some of your concerns are a little overblown. I’m not sure that innovation in social media has been too crippled by Facebook. People are pretty happy to create an app knowing that Facebook might buy it for a billion dollars; people don’t view that as a huge loss. Even Google Search has a competitor: Bing. Google has a strong incentive to keep that product good and I think, despite a lot of people’s criticism, it has gotten better over time.
It’s not clear that breaking the companies up, or having them run as utilities controlled by the government, would bring more innovation.
What about the loss of privacy?
Even on privacy, I always feel your data is safer with Google and Facebook than with anybody else, because Google has billions of dollars of incentive not to leak data. If they leaked data, everyone would just stop using the service. And they can hire some of the best computer scientists in the world to figure out how to best protect your data. Your data is more vulnerable with smaller sites. If you sign up for a site like Ashley Madison, which had the slogan “life is short, have an affair,” it’s such a small company that it can’t devote large resources to protecting privacy. No employee of Google can just look at individual search data and leak it; they have systems in place to prevent that.