Sue Halpern, a writer and scholar-in-residence at Middlebury College in Vermont, wrote recently in The New York Review of Books about the extraordinary reach of data mining. Her article is ostensibly a review of two books, but is actually her summary of the reach of various internet sites into our so-called private lives.
You would be astonished at what the Internet knows about you. Every click on Facebook, Google, or Amazon is added to your profile.

She writes:
“A few months ago The Washington Post reported that Facebook collects ninety-eight data points on each of its nearly two billion users. Among these ninety-eight are ethnicity, income, net worth, home value, if you are a mom, if you are a soccer mom, if you are married, the number of lines of credit you have, if you are interested in Ramadan, when you bought your car, and on and on and on.
“How and where does Facebook acquire these bits and pieces of one’s personal life and identity? First, from information users volunteer, like relationship status, age, and university affiliation.
“They also come from Facebook posts of vacation pictures and baby pictures and graduation pictures. These do not have to be photos one posts oneself: Facebook’s facial recognition software can pick you out of a crowd.
“Facebook also follows users across the Internet, disregarding their “do not track” settings as it stalks them. It knows every time a user visits a website that has a Facebook “like” button, for example, which most websites do.
“The company also buys personal information from some of the five thousand data brokers worldwide, who collect information from store loyalty cards, warranties, pharmacy records, pay stubs, and some of the ten million public data sets available for harvest.
“Municipalities also sell data—voter registrations and motor vehicle information, for example, and death notices, foreclosure declarations, and business registrations, to name a few.
“In theory, all these data points are being collected by Facebook in order to tailor ads to sell us stuff we want, but in fact they are being sold by Facebook to advertisers for the simple reason that the company can make a lot of money doing so….
“In fact, the datafication of everything is reductive. For a start, it leaves behind whatever can’t be quantified. And as Cathy O’Neil points out in her insightful and disturbing book Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, datafication often relies on proxies—stand-ins that can be enumerated—that bear little or no relation to the things they are supposed to represent: credit scores as a proxy for the likelihood of being a good employee, for example, or “big five” personality tests like the ones used by the Cambridge Psychometrics Centre, even though, as O’Neil reports, “research suggests that personality tests are poor predictors of job performance.”
“There is a tendency to assume that data is neutral, that it does not reflect inherent biases. Most people, for instance, believe that Facebook does not mediate what appears in one’s “news feed,” even though Facebook’s proprietary algorithm does just that.
“Someone—a person or a group of people—decides what information should be included in an algorithm, and how it should be weighted, just as a person or group of people decides what to include in a data set, or what data sets to include in an analysis.
“That person or group of people come to their task with all the biases and cultural encumbrances that make us who we are. Someone at the Cambridge Psychometrics Centre decided that people who read The New York Review of Books are feminine and people who read tech blogs are masculine. This is not science, it is presumption. And it is baked right into the algorithm.
“We need to recognize that the fallibility of human beings is written into the algorithms that humans write.
“While this may be obvious when we’re looking at something like the Cambridge Psychometrics analysis, it is less obvious when we’re dealing with algorithms that “predict” who will commit a crime in the future, for example—which in some jurisdictions is now factored into sentencing and parole decisions—or the algorithms that deem a prospective employee too inquisitive and thus less likely to be a loyal employee, or the algorithms that determine credit ratings, which, as we’ve seen, are used for much more than determining creditworthiness.
“(Facebook is developing its own credit-rating algorithm based on whom one associates with on Facebook. This might benefit poor people whose friends work in finance yet penalize those whose friends are struggling artists—or just struggling.)…
“If it is true, as Mark Zuckerberg has said, that privacy is no longer a social norm, at what point does it also cease to be a political norm? At what point does the primacy of the individual over the state, or civil liberties, or limited government also slip away?
“Because it would be naive to think that governments are not interested in our buying habits, or where we were at 4 PM yesterday, or who our friends are. Intelligence agencies and the police buy data from brokers, too. They do it to bypass laws that restrict their own ability to collect personal data; they do it because it is cheap; and they do it because commercial databases are multifaceted, powerful, and robust.
“Moreover, the enormous data trail that we leave when we use Gmail, post pictures to the Internet, store our work on Google Drive, and employ Uber is available to be subpoenaed by law enforcement.
“Sometimes, though, private information is simply handed over by tech companies, no questions asked, as we learned not long ago when we found out that Yahoo was monitoring all incoming e-mail on behalf of the United States government.
“And then there is an app called Geofeedia, which has enabled the police, among others, to triangulate the openly shared personal information from about a dozen social media sites in order to spy on activists and shut down protests in real time.
“Or there is the secretive Silicon Valley data analysis firm Palantir, funded by the Central Intelligence Agency and used by the NSA, the CIA, the FBI, numerous police forces, American Express, and hundreds of other corporations, intelligence agencies, and financial institutions.
“Its algorithms allow for rapid analysis of enormous amounts of data from a vast array of sources like traffic cameras, online purchases, social media posts, friendships, and e-mail exchanges—the everyday activities of innocent people—to enable police officers, for example, to assess whether someone they have pulled over for a broken headlight is possibly a criminal. Or someday may be a criminal.
“It would be naive to think that there is a firewall between commercial surveillance and government surveillance. There is not.
“Many of us have been concerned about digital overreach by our governments, especially after the Snowden revelations. But the consumerist impulse that feeds the promiscuous divulgence of personal information similarly threatens our rights as individuals and our collective welfare.
“Indeed, it may be more threatening, as we mindlessly trade ninety-eight degrees of freedom for a bunch of stuff we have been mesmerized into thinking costs us nothing.”