Big Data: Principles and Examples Vol. 4
In this volume, we conclude with Privacy and Security.
Privacy and Security
For our final examples I want to dig into the notions of privacy and security in big data settings. These are and always will be critical concerns.
We begin with financial information and e-commerce. In the early days of Amazon, there were a significant number of customers who were very concerned about the security implications of entering their credit card numbers online. The specific concerns varied, but they almost always involved the possibility of a gang of nefarious hackers gaining access to credit card numbers and using them to make fraudulent purchases.
At the same time, however, it was quite common for many of the same people who hesitated to use their cards online to dine in a restaurant, then hand their credit card to a waiter who would disappear with it for several minutes. When the waiter returned, he would ask for their signature on a form with three carbon copies. At least one of the signed carbon copies would usually end up in the dumpster behind the restaurant, just waiting for an ordinary thief with no hacking skills to find.
Why were the same people who feared the Internet hackers willing to risk the dumpster divers? Mainly it was trust and economics, the latter of which rarely exists without the former. Most of these people had been to hundreds of restaurants in their lives, and while a few had been the victims of fraud, it was not particularly common. Indeed, the restaurant credit card ritual had been around so long that many of them had experienced it since childhood, when they saw their parents use credit cards at restaurants.
Online, on the other hand, nobody had much credit card experience, so most people had not yet developed the same trust they had at restaurants that their accounts were safe from hacking. Over time, of course, this trust developed. Part of what drove it was the fact that fraud rates were low. But the more important part was that by giving their credit card numbers online, shoppers got real, tangible value. They didn’t have to go visit a bookstore only to find that the book they wanted was out of stock. They could be guaranteed to get the book with just a few clicks.
At the far end of the privacy and security spectrum, there are certain kinds of data that are far more sensitive than credit cards. One such example is medical records. People are justifiably interested in keeping them as private and secure as possible, lest some information they contain leak in a way that could have very severe consequences, such as the loss of, or inability to obtain, insurance or employment.
However, there are certain circumstances where the value of those same medical records is essentially unmatched. If I were to come down with a serious medical condition, I would want to share every possible element of my medical history with a team of specialists whose ability to diagnose and treat my condition might depend on the data those records contain. My life could literally be at stake.
The key here is that when we look at the economic consequences of data entering either the right or the wrong hands, massive value or cost can be created. This is a direct consequence of Principle 4, that data have economic value. When we think in these terms we have a much better chance of reasoning effectively about what uses of data will or will not be tolerated by participants in a data economy.
A final variation to be aware of is that, as in the restaurant and online credit card case above, economic actors very commonly make decisions based on their perception of value or risk rather than on the actual underlying value or risk of loss. A contemporary argument that follows this path is the pro-NSA-data-gathering argument, which tries to create the perception that the risk of loss from terrorism is so great that the costs to citizens of whatever loss of privacy occurs pale in comparison.
I sincerely hope that these examples have illustrated some of the key principles behind how we look at big data. There are many more issues to discuss, ranging from how and where we store the data, which is luckily getting easier and cheaper day by day, to how we deal with incomplete or even contradictory data sets, to the nuts and bolts of how we design good cost-effective experiments. All of these are ripe for deeper conversations, but in almost all cases major parts of those conversations can be phrased in terms of the principles above.