
The Scope Of Big Data

Photo by koon boh Goh
Combining massive data warehouses, powerful supercomputers, and a global computing network infrastructure, big data delivers a more tailored experience as we migrate ever more of our lives onto the internet. Big data is what adds value to our connected lives: data is fed to algorithms, which in turn make an increasing number of decisions on our behalf. But at what cost does this freedom from decision-making come?

It is common knowledge that Facebook extracts a fee from its users, but this fee is paid in data rather than in dollars. Despite widespread abuse of customer data, people continue using the social networking behemoth. Recently, Facebook entered into negotiations with the FTC over a multi-billion dollar fine and the conditions to be imposed on the company. These negotiations follow another multi-billion dollar fine levied against Facebook in Europe. Yet Facebook usage has not declined, because the platform provides value to its users. Value is, of course, subjective, but the consensus is that value is being provided.

Some use Facebook for worthwhile purposes, like staying connected to the people who matter in their lives. Others use it for less healthy reasons involving misinformation, violence, or worse. Yet even those who use it in unhealthy ways still enjoy the value it adds to their lives. In exchange for this value, Facebook knows who you talk to, where you are, where you go, what your interests and preferences are, which political beliefs you are likely to support, and much more.

Technology is supposed to be neutral, with its relative good or evil dependent on how it is used. Technologists often use this trope to absolve themselves of the consequences of their creations. The creators of big data would prefer to remain oblivious to its negative effects, but they are increasingly being called to accept responsibility. In the United States, politicians on both sides of the aisle are crying foul at the growing influence of big data companies. Recently, presidential candidate Elizabeth Warren called for the breakup of several large tech companies, citing their potential monopoly power.

The government crying foul over big data is ironic, because the government itself is one of its biggest users. The National Security Agency (NSA) exists almost entirely to sift through the vast amounts of data it collects on citizens. Data from the NSA is used by US Cyber Command to wage information warfare, and the two organizations operate hand-in-hand. Before you wage warfare you must have a target, and that is the role of big data: it separates friend from foe. Then you must act on that data, which the government does with aplomb.

Police in Canada, as recently reported by Motherboard, have been tracking citizens' behavior to identify negative and at-risk behavior among people deemed vulnerable. The tracking is based on conversations that people, including children, have with police, social workers, health services, and others. Unfortunately, the scandal repeats many that have come before: this tracking is done without oversight or consent.

The consequences of this data gathering are serious, as the data is used to predict whether or not a particular individual will become a criminal. According to the article, "Data from the RTD (Risk-driven Tracking Database) is analyzed to identify trends—for example, a spike in drug use in a particular area—with the goal of producing planning data to deploy resources effectively, and create “community profiles” that could accelerate interventions under the Hub model, according to a 2015 Public Safety Canada report." This could still be seen as a positive use of big data, since it aims to curb problematic behavior that is detrimental to both the individual and society. However, the intended result is not at issue; the method by which it is achieved is. Interventions with affected persons are the main use of the data, and these interventions have led to outcomes such as jail time or involuntary hospitalization. With such potentially dire consequences, it is not unreasonable to demand oversight of the process.

Nefarious uses for the vast amounts of data being collected about our lives are all too plentiful. These databases often operate in secret and without oversight, and when they are discovered, their creators espouse how they are defending the public good, for example by fighting terrorists. One source of consternation has been the No-Fly list, a list of people who are not permitted to fly into, out of, or within the United States. The prevailing concerns are how one is placed on this list, what can be done to be removed from it, and even whether or not a particular person is on it. All of this is classified information.

Very recently, there has been some movement toward proper due process for the No-Fly list. Reuters reports that three plaintiffs were allowed to proceed with a lawsuit seeking monetary damages for being included on the list. The lawsuit alleges that the plaintiffs are being denied their religious freedom, in addition to having their livelihoods put in jeopardy and being stigmatized in their communities. The case, Tanvir v. Tanzin et al., in the 2nd U.S. Circuit Court of Appeals, is ongoing.

Financial transactions are inherently sensitive and are hardly immune from the reach of big data. In the wake of 9/11, the USA PATRIOT Act introduced many provisions, one of which required financial institutions to "Know Your Customer," or KYC. KYC is intended to stem money laundering and to prevent money from reaching terrorist organizations. For most people, KYC was little more than a one-time minor inconvenience requiring them to provide proof of identity to their financial institutions. KYC is not big data, per se. However, the outgrowth of KYC definitely falls within its purview.

Beyond knowing their customers, financial institutions have taken it upon themselves to profile them. Banks have long profiled their customers, but in the past this was for business purposes. For example, someone with a lower income may be more interested in debt consolidation, while a high net-worth individual is more likely to be interested in private banking.

In the post-9/11 era, a new type of profiling has emerged, and it strongly resembles what we know is happening up in Canada. TechCrunch recently reported that a watch list compiled by Dow Jones had leaked because the AWS-hosted Elasticsearch database storing it was not properly secured. The exposed data outlines "...high-risk clients with detailed, up-to-date profiles on any individual or company in the database." In 2010, the database was known to hold approximately 650,000 records; the recent leak exposed 2.4 million. According to TechCrunch, "Many of the individual records were sourced from Dow Jones’ Factiva news archive, which ingests data from many news sources — including the Dow Jones-owned The Wall Street Journal. But the very inclusion of a person or company’s name, or the reason why a name exists in the database, is proprietary and closely guarded." The parallels between the Dow Jones watch list and other big data repositories are obvious.

Interestingly, the data in the Dow Jones database was reportedly gathered from public sources, such as Factiva, a vast news ingestion service. Data in this database is used in decisions to approve or deny funding and could result in shuttered bank accounts. Reportedly, weak or sparse evidence can be enough to land a person in the database. Dow Jones defended the database, saying it is "part of our risk and compliance feed product, which is entirely derived from publicly available sources," and then proceeded to blame the breach on an authorized third party.

There are several common threads in the discussion of the scope of big data. The lack of oversight, both of the data collected and of the mechanisms used to collect it, is certainly paramount. Lack of visibility into what data is being collected is another concern. Additionally, there is much concern about how data is used and reused by third parties. The state of data security, or rather its ineffectiveness, is alarming, as data breaches grow in both frequency and severity. There is an overarching lack of accountability as big data brokers look to become as big as possible as quickly as possible, without regard for the harm they inflict on their users and on society. Facebook has been a popular punching bag of late because it is the big data broker most people are familiar with. Many more lurk in the shadows, and most people don't even know they exist.

Data regulation looms on the horizon, with the EU's GDPR leading the way. First, however, people must be made aware of the value of their data and the ramifications of its collection. We must embrace the notion that our data is our personal property, over which we should have control. Until we do, we will remain mired in confusion, and real resolution of the issues surrounding big data will elude us.

--Jay E. blogging for

