Injecting noise: students build tools to protect individual privacy from artificial intelligence
Artificial intelligence is giving companies unprecedented insights into our personal information. Can individuals protect their privacy by using applications that add fake data to their online behavior?
Recently, the credit-reporting agency Equifax leaked the Social Security numbers of almost half of all Americans, causing widespread anxiety about privacy. But an even larger breach of individual privacy may be happening every day.
With the help of the latest artificial intelligence algorithms, many individuals and companies may already know far more about you than your Social Security number. In some cases, a company may know something about you that you didn’t even know yourself; in others, it may merely think it knows something about you and be completely wrong.
With this in mind, students at the UC Berkeley Sutardja Center for Entrepreneurship & Technology are building tools designed to protect individual privacy in the always-connected Internet age. The challenge is this: Can students create applications that cloud our online behavior by adding noise (fake clicks, mouse movements, scrolls, searches, GPS locations, and so on) to prevent tech giants and others from using this data to learn personal information about us?
“Data Injection is extremely important when it comes to data security,” said Taj Shaik, a computer science student whose team is working on noise injection tools for Amazon. “Some of these companies are collecting potentially valuable data, and if this data falls in the wrong hands, it becomes particularly hard to control.”
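In concrete terms, a noise-injection tool might look something like the sketch below, which mixes randomly timed decoy search queries into a user’s normal activity. This is a minimal illustration only, not the students’ actual implementation: the decoy list, the exponential timing model, and the function names are assumptions made for the example.

```python
import random
import time

# Hypothetical pool of decoy queries; a real tool would need a much larger and
# more realistic corpus so the noise is hard to distinguish from genuine searches.
DECOY_QUERIES = [
    "best hiking trails near me",
    "how to fix a leaky faucet",
    "used bicycles for sale",
    "vegetarian lasagna recipe",
]

def schedule_decoy_searches(n_queries: int, mean_gap_seconds: float = 600.0):
    """Yield (delay, query) pairs that loosely mimic sporadic human search behavior."""
    for _ in range(n_queries):
        # Exponentially distributed gaps roughly approximate bursty human activity.
        delay = random.expovariate(1.0 / mean_gap_seconds)
        yield delay, random.choice(DECOY_QUERIES)

if __name__ == "__main__":
    for delay, query in schedule_decoy_searches(3, mean_gap_seconds=5.0):
        time.sleep(delay)  # wait a randomized interval before acting
        # A real browser extension would issue the query here; this sketch only
        # prints what it would do.
        print(f"[decoy] searching for: {query!r}")
```

The harder problem, presumably, is not generating noise at all but generating noise that a profiling algorithm cannot trivially filter out.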
Large for-profit companies have been harvesting our data for years. While many are happy for Google or Facebook to use that data to serve them a personalized ad, some might be more alarmed to learn, for example, that they were denied a loan because they used caps lock in an online loan application.
And it is small details like this — so-called “weak data” — that companies are gleaning to make generalizations about every one of us. For example, small behaviors like switching between apps more frequently than most or over-checking email may indicate one is depressed, whereas posting bright, colorful photos on social media may indicate that one is happy.
“Today, all of us have two brains: the one in our heads, and the mobile phones in our palms. That second brain is a two-way source of information. Not only are our actions guided by input we receive from our mobile devices, but at the same time, these devices are reporting our further actions back to profit-driven enterprises 60x60x24x7,” says Shomit Ghose, partner at ONSET Ventures and industry advisor for the project. “These profit-driven enterprises then gain an even keener insight about us and are able to even more effectively guide our subsequent actions. It’s a danger to surrender our cognition to corporate algorithms in this way.”
Recently, researchers at Stanford announced that they had designed a machine learning system that could identify, with 81% accuracy, whether men in photos taken from online dating websites were gay or straight. Until now, there hasn’t been a way for computers or humans to determine this information about anyone without it being explicitly volunteered. This is more than just a cool demonstration of the abilities of A.I.: it may, for example, pose a risk to individuals living in countries where homosexuality is punished.
Many have pointed out an important potential flaw in this particular research. While engineers and researchers are good at using artificial intelligence and machine-learning algorithms to uncover patterns in all types of data, they aren’t always able to understand the causes of those patterns. While this research may show that facial features can be used to predict sexual orientation, studies like this are notoriously hard to replicate because there could be some other feature of the data that is actually causing the pattern. For example, in this case, the researchers point out that they noticed that individuals identifying as homosexual tended to upload higher resolution photos than those who were heterosexual — meaning the pattern may have had nothing to do with faces at all.
This illustrates potentially larger problems with using artificial intelligence systems to understand personal information: bias and lack of validation. At a recent conference, Google’s head of AI, John Giannandrea, made the case that bias in training data is the greatest safety threat for artificial intelligence. For example, COMPAS, a machine-learning tool designed to predict the likelihood of a prisoner re-offending, was found to be biased against minorities. Giannandrea mentioned that machine-learning algorithms are making “millions of decisions every minute.” It’s easy to imagine scenarios where humans are discriminated against because of biased training data or simply because they are outliers in an overall trend.
Even if artificial intelligence algorithms were able to collect our data in a fair, unbiased way, there are still concerns with for-profit companies using that data to exploit us.
How? Enter behavioral economics. In 2002, Daniel Kahneman won the Nobel Prize in Economics for his work on prospect theory, which shows scientifically that for most people, losses really do hurt more than gains. This, along with much other evidence, revolutionized economic theory by showing that people do not always make completely rational, self-interested decisions. Just a few weeks ago, one of Kahneman’s disciples, Richard Thaler, also won the Nobel Prize in Economics for further work in this area.
From behavioral economics, we know that people make many irrational “mistakes” due to cognitive biases — and companies have learned how to take advantage of them.
One classic cognitive bias, which has been used by salespeople for much longer than it has had a name, is anchoring: people’s tendency to give more weight to the first information they learn. This bias can be exploited in many ways, such as at a discount sale where the original price is crossed out and a sale price is shown next to it. Another example of anchoring is the political tactic of asking for much more than you actually want, so that when a compromise happens, it is more likely to be in your favor. These tactics often work well, and the list of cognitive biases that companies can exploit is long.
Beyond concerns about biased data and companies exploiting flaws in human nature to manipulate customers, there is additional concern about companies having access to all of this data in the first place. Allowing companies, and the individuals who work there, this kind of access to our personal information gives them an extraordinary amount of power.
What happens when the next breach is at Google or Facebook? Hackers may learn far more about us than just our social security numbers.
For students in the collider, the project is challenging on many levels. Besides working on the technical challenge of building software that can inject noise in a way that actually protects privacy, many would also like to preserve the effectiveness of the services that they use.
“Say we inject noise into Google searches (even Google searches are used to locate you); if we inject noise like that, then Google will get worse,” said Anna Leskova, a computer science and mathematics student who is working on tools to spoof geolocation. “It won’t be the optimal search engine that it is now. So, it is an interesting problem.”
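To make the geolocation example concrete, here is a rough sketch of one way a tool might add noise to reported coordinates. It is a hypothetical illustration, not the students’ tool: the function name, the uniform-disk noise, and the 500-meter default radius are assumptions chosen for the example (more principled schemes, such as geo-indistinguishability, draw noise from a planar Laplace distribution).

```python
import math
import random

def jitter_location(lat: float, lon: float, radius_m: float = 500.0):
    """Return a fake coordinate within roughly radius_m meters of the real one."""
    # Pick a uniformly random point in a disk of radius radius_m.
    r = radius_m * math.sqrt(random.random())
    theta = random.uniform(0.0, 2.0 * math.pi)
    d_north = r * math.cos(theta)  # offset in meters, north-south
    d_east = r * math.sin(theta)   # offset in meters, east-west

    # Convert metric offsets to degrees (about 111,320 m per degree of latitude).
    new_lat = lat + d_north / 111_320.0
    new_lon = lon + d_east / (111_320.0 * math.cos(math.radians(lat)))
    return new_lat, new_lon

if __name__ == "__main__":
    # Example: report a noisy position near the Berkeley campus instead of the true one.
    print(jitter_location(37.8719, -122.2585, radius_m=500.0))
```

The trade-off Leskova describes shows up directly in the radius parameter: a wider jitter hides more about where you actually are, and in exchange location-based results get correspondingly worse.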
Students will present their projects at the Sutardja Center for Entrepreneurship & Technology at 4:30 on 11/27.