What is Data Harvesting, and Why is it an ‘Attack’?
by the DSA National Tech Committee
If you know anything about the business models of technology companies over the last decade, you’re probably already aware of data harvesting. It’s when an entity collects information about people without gaining those people’s informed consent to do so. It’s also a type of cyber-attack, one that every computer user is constantly subjected to, one that has been normalized to the point of being ‘just how the internet works’.
If you’re like I was a few years ago, you’re probably thinking: sure, data harvesting is shady, but it’s a bit much to call it an ‘attack’. When a hacker collects my banking information to defraud me, that’s an attack, and data harvesting is part of the process. But isn’t most data harvesting just Amazon learning to better advertise its USB cables? I mean, I dislike advertising as much as the next guy, gal, or en-pal,* but advertising’s not exactly an attack, right?
What I’ve since learned is that even these more mundane cases of data harvesting are always troubling, and often actively harmful. I’ll briefly mention a few anecdotes that have stuck with me, but they’re just the tip of the iceberg.
In 2012, Target’s predictive advertising used a teenager’s data to send her a coupon booklet. Harmless enough so far. However, her dad happened to open it and see coupons for baby clothes and cribs. That’s how he found out his daughter was pregnant.
In 2015, Ashley Madison, the dating site focused on extramarital affairs, was hacked and its users’ data was posted publicly online. Many marriages were (probably rightfully) ended as a result.
Between July and October 2019, the Firefox web browser blocked 450 billion trackers from collecting data on its users’ browsing history. The math came out to 175 trackers blocked per running installation of Firefox per day. If a user was running Firefox on their laptop, desktop, and smartphone, that would add up to 525 trackers blocked for that user every day.
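If you want to check that arithmetic yourself, here is a minimal Python sketch. The constant and function names are my own (hypothetical), and the only inputs are the two figures quoted above; the ‘install-days’ number is simply one figure divided by the other, not something Firefox reported.

```python
# Figures quoted in the article (Firefox, July to October 2019).
TOTAL_TRACKERS_BLOCKED = 450_000_000_000
TRACKERS_PER_INSTALL_PER_DAY = 175


def daily_trackers_blocked(num_devices: int) -> int:
    """Trackers blocked per day for one person running Firefox on num_devices devices."""
    return TRACKERS_PER_INSTALL_PER_DAY * num_devices


def implied_install_days() -> float:
    """Hypothetical helper: install-days implied by dividing the total by the daily rate."""
    return TOTAL_TRACKERS_BLOCKED / TRACKERS_PER_INSTALL_PER_DAY


if __name__ == "__main__":
    # Laptop + desktop + smartphone, as in the example above.
    print(daily_trackers_blocked(3))  # 525
    # Rough scale of how many installation-days the total represents.
    print(f"{implied_install_days():,.0f} install-days")
```

Swap in your own device count to see your personal daily figure under the same assumption of 175 trackers per installation.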
As of 2020, only four U.S. companies have had their market capitalization exceed $1 trillion: Alphabet (Google’s parent company), Amazon, Apple, and Microsoft. Data harvesting is a key revenue source for all of these companies.
From the mid-2010s until today, Myanmar has been engaged in a genocide against its Rohingya Muslim population. Facebook was warned as early as 2013 that its platform was being used for anti-Rohingya hate speech, and since then Facebook has been a primary platform through which genocidal ideas spread. Facebook itself admitted it was too slow to respond.
You know those annoying tests you take to prove you’re a human, where it asks you to select all of the traffic lights or boats or whatnot? Websites store your response data and use it to train image-recognition AIs. Such AIs will be useful anywhere someone has the interest and money to have computers autonomously surveil their surroundings, such as the burgeoning smart car industry or the police-state industrial complex of ICE, the NSA, etc.
Everyone knows Amazon these days, but few outside of the tech industry know about Amazon Web Services (or AWS for short). This is a branch of Amazon that rents out fractions of its servers to whoever can pay, and some surprising people have ended up paying. Most of Netflix, Spotify, Pinterest, and BuzzFeed are run through AWS on Amazon hardware. Similarly, the CIA’s computer infrastructure, a prominent police body-cam database, and the US Navy’s logistics infrastructure are run on AWS. Amazon has market dominance in this industry, so if you have one computer that you want to communicate with others, and you don’t have the expertise or money to make it happen yourself, AWS is almost unavoidable.
The point of these anecdotes is not to give you anything approaching an overview of the true scale and stakes of data harvesting. Trust me, take what you assume it is and double it, at least.
Instead, my hope is that these anecdotes have left a germ of discomfort that might, now or in the future, grow into an awareness of what data harvesting really is. Mundane data harvesting, even when done the ‘proper’ way, reaches into the most intimate aspects of our lives (such as a teenager’s pregnancy). When this data is poorly secured, it can expose any number of secrets (just ask Ashley Madison’s users). Moreover, data harvesting is practically ubiquitous across all levels of the internet (see Firefox’s 450 billion trackers blocked, and AWS’s monopoly on internet infrastructure). Our data can be put to the most disparate purposes, many of which we have no idea about (both smart cars and a ‘smart’ police state). And given the capitalist hellscape we’re living in, the companies overseeing this mundane data harvesting are actively incentivized to be as indiscriminate as possible with our data (that’s how they make their trillions of dollars).
Instead of Vanderbilts or Morgans or Rockefellers, today we have Bezoses, Gateses, Pages and Brins, Jobses and Cooks. At the turn of the 20th century, the robber barons were the point of the capitalist spear subverting true democracy at home and abroad. If we’re not careful, these new data barons will do the same for the turn of the 21st century.
I mean, just think about this for a moment. Facebook was given a multi-year heads-up on a brewing genocide, and Zuckerberg, Sheryl Sandberg, and their ilk couldn’t be assed to do a thing about it. All for what? To save a few bucks by not hiring the translators they’d need to properly oversee their platform. These are the sorts of people we as individuals, as organizations, and as nations are entrusting terabyte upon terabyte of data to. We need only lean in to see the blood on their hands.
If you’re worried that some of this blood might be yours, or that some of it might have gotten onto your own hands, I hope you will join me in my follow-up article, where I go over some practical handwashing advice (so to speak). These basic sanitary measures won’t cure us of data harvesting, but they can mitigate some of its harmful effects.
* I got the phrase ‘guys, gals, and en-pals’ from the YouTube channel Gutian. The channel-runner recently deleted the channel, but she may be coming back with a new channel soon.