Big Data is a nebulous and sometimes scary monster. As a student funded under a “Big Data” National Science Foundation initiative, I have yet to find what I feel is a legitimate answer to the question, “What is Big Data?” Having fallen into computer science by chance and not by trade, I have the sneaking suspicion that, like many other CS terms, “Big Data” belongs to the category of “intimidation” terms. I certainly remember walking into my first computer science class, surely laden with children of engineers and scientists speaking in acronyms I couldn’t comprehend. As if calculus weren’t enough, here I was, thrown to the sharks of gamers and super-hacker-pros.
The difficulty in defining Big Data comes from the size of the effort required to rein in new data sources: collecting the data, organizing it, analyzing it, and protecting and governing it. Big Data is hard to define, and it is hard for people to admit that, especially when they are part of the field. Still, the most pressing issue I see with Big Data is not defining the term. It is protecting and securing the data.
In the field of cryptography, the strongest notions of security come from the idea that certain things are too hard for the average “adversary” (computer) to compute. These are the kinds of guarantees sought in securing the world’s most valuable information. We should also think hard about how we protect our personal sensitive data day to day. Chances are you’re using cutting-edge cryptographic technology directly without even knowing it.
I refer to cryptography, privacy, and security in the same breath. In short, security practices are how you achieve privacy or the correct use of data, and cryptography is used in security schemes and protocols. How often do you read the terms of service of an app you download? It is likely breaching your privacy in at least a handful of ways, depending on your personal definition of privacy.
Here we reach an even more fundamental issue: privacy means many different things, both informally and formally. In continuing installments of this blog, I will explore formal definitions of privacy and security in the field of databases and how the cryptographic lens is creeping in on the field. A lot of academic scientists are becoming involved in the blending of databases and security.
Data can be scary, especially the handling of sensitive data. I hope to reach beyond explanations of theories and definitions, to explain real-world applications of these concepts as I see them, and to take some of the “intimidation” away from the subject. I welcome discussion and comments on how these concepts are already applied. Some of the first topics I will touch on include, but are not limited to: differential privacy, blockchain technologies, smart contracts, smart co-processors, and workflow provenance privacy.