The Scope Of Big Data

Photo by koon boh Goh
Combining massive data warehouses, powerful supercomputers, and a global computing network infrastructure, big data delivers a more tailored experience as we migrate ever more of our lives onto the internet. Big data is what adds value to our connected lives; data is fed to algorithms, which in turn make an increasing number of decisions on our behalf. But at what cost does this freedom from decision come?

It is common knowledge that Facebook extracts a fee from its users, but this fee is paid in data rather than in dollars. Despite its overwhelming abuse of customer data, people continue using the social networking behemoth. Recently, Facebook entered into negotiations with the FTC regarding a multi-billion dollar fine and its conditions for the company. These negotiations follow another multi-billion dollar fine against Facebook in Europe. Facebook usage has not declined because it provides value to its users. Value is, of course, a subjective term, but the consensus is that value is being provided.

Some use Facebook for quality endeavors like staying connected to people important in their lives. Others use it for less healthy reasons, involving misinformation, violence, or worse. Yet, even when it is used in unhealthy ways, its users still enjoy the value it adds to their lives. In exchange for this value, Facebook knows who you talk to, where you are, where you go, what your interests and preferences are, which political beliefs you are likely to support, and much more.

Technology is supposed to be neutral, with the relative good or evil dependent on how it is used. Technologists often use this trope to absolve themselves of the consequences of their creations. The creators of big data would prefer to remain oblivious to its negative effects, but they are increasingly being called to accept responsibility. In the United States, politicians on both sides are crying foul at the growing influence of big data companies. Recently, presidential candidate Elizabeth Warren called for the breakup of several large tech companies, citing their potential monopoly powers.

The government crying foul over big data is ironic because the government itself is one of its biggest users. The National Security Agency (NSA) exists almost entirely to sift through the vast amounts of data that it collects on citizens. Data from the NSA is used by the US Cyber Command to wage information warfare. The NSA and the US Cyber Command operate hand-in-hand. Before you wage warfare you must have a target, which is the role of big data, as it separates friend from foe. Then, you must act on that data, which the government does with aplomb.

Police in Canada, as recently reported by Motherboard, have been tracking citizens' behavior to identify negative and at-risk behavior of people deemed vulnerable. Behavior tracking is based on conversations between people, including children, and police, social workers, health services, and others. Unfortunately, the scandal is a repetition of many that have come before: this tracking is done without oversight or consent.

The consequences of this data gathering are serious, as it is used to predict whether or not a particular individual will become a criminal. According to the article, "Data from the RTD (Risk-driven Tracking Database) is analyzed to identify trends—for example, a spike in drug use in a particular area—with the goal of producing planning data to deploy resources effectively, and create “community profiles” that could accelerate interventions under the Hub model, according to a 2015 Public Safety Canada report." This action could still be seen as a positive use for big data, as it aims to curb problematic behavior that is detrimental to both the individual and society. However, the intended result is not at issue; the method by which it is achieved is the problem. Interventions with affected persons are the main use, but these interventions have led to jail time or involuntary hospitalization, for example. With such potentially dire consequences, it is not unreasonable to demand oversight of the process.

Nefarious uses for the vast amounts of data being collected about our lives are all too plentiful. The creators of these databases often operate in secret and without oversight; when discovered, they will espouse how they are defending the public good by fighting terrorists, for example. A source of consternation has been the No-Fly list. This is a list of people who are not permitted to fly into, out of, or within the United States. The prevailing concerns are how one is placed on this list, what can be done to be removed from it, and even whether or not a person is on it. All of this is classified information.

Very recently, there has been some movement towards proper due process for the No-Fly list. Reuters reports that three plaintiffs were allowed to proceed with a lawsuit seeking monetary damages for being included on the list. The lawsuit alleges that the plaintiffs are being denied their religious freedom, in addition to having their livelihoods put in jeopardy and being stigmatized in their communities. The case, Tanvir v. Tanzin et al., in the 2nd U.S. Circuit Court of Appeals, is ongoing.

Financial transactions are inherently sensitive and are hardly immune from the realm of big data. In the wake of 9/11, the USA PATRIOT Act introduced many provisions, one of which required financial institutions to "Know Your Customer," or KYC. Knowing your customer is intended to stem money laundering and, obviously, to prevent money from going to terrorist organizations. For most people, KYC was little more than a one-time minor inconvenience requiring proof of identity to be provided to their financial institutions. KYC is not big data, per se. However, the outgrowth of KYC definitely falls within its purview.

Beyond knowing their customers, financial institutions have taken it upon themselves to profile their customers. Financial institutions have long profiled their customers, but in the past this was for business purposes. For example, someone with a lower income may be more interested in debt consolidation, or a high net-worth individual is more likely to be interested in private banking.

In the post-9/11 era, a new type of profiling has emerged, and it strongly resembles what we know is happening up in Canada. Recently, TechCrunch reported that a watch list created by Dow Jones had been leaked due to a security lapse in its AWS Elasticsearch database. The exposed data outlines "...high-risk clients with detailed, up-to-date profiles on any individual or company in the database." It is known that in 2010 this database had approximately 650,000 records. The recent leak exposed 2.4 million records. According to TechCrunch, "Many of the individual records were sourced from Dow Jones’ Factiva news archive, which ingests data from many news sources — including the Dow Jones-owned The Wall Street Journal. But the very inclusion of a person or company’s name, or the reason why a name exists in the database, is proprietary and closely guarded." The parallels between the Dow Jones watch list and other big data repositories are obvious.

Interestingly, the data in the Dow Jones database was reportedly gathered from public sources, such as Factiva, a vast news ingestion service. Data in this database is used in decisions to approve or deny funding, and could result in shuttered bank accounts. It is reported that weak or sparse evidence can land a person in the watch list database. Dow Jones defended this database, saying it is "part of our risk and compliance feed product, which is entirely derived from publicly available sources," and then proceeded to blame the breach on an authorized third party.

There are several common threads in the discussion of the scope of big data. Chief among them is that there is little, if any, oversight of the data collected and its collection mechanisms. Lack of visibility into what data is being collected is another concern. Additionally, there is much concern about how data is used and re-used by third parties. Data security, or rather its lack of effectiveness, is alarming, as data breaches grow in both frequency and severity. There is an over-arching lack of accountability as big data brokers look to become as big as possible as quickly as possible, without regard for the harms they are inflicting on their users and society. Facebook is a popular punching bag of late, because it is the big data broker that most people are familiar with. There are so many more lurking in the shadows that most people don't even know exist.

Data regulation looms on the horizon, with GDPR from the EU leading the way. However, people must be made aware of the value of their data and the ramifications of its collection. We must embrace the notion that our data is our personal property, which we should have control over. Until we do, we will be mired in confusion, and real resolution on the issues of big data will elude us.

--Jay E. blogging for

