New Data Scientist: Data is a Monster

Big Data is a nebulous and sometimes scary monster. As a student funded under a “Big Data” National Science Foundation initiative, I have yet to find any (what I feel are) legitimate answers to the question, “What is Big Data?” Falling into computer science by chance and not by trade, I have the sneaking suspicion that like many other CS terms, “Big Data” falls into an “intimidation” term category. I certainly remember walking into my first computer science class, surely laden with children of engineers and scientists speaking in acronyms I couldn’t comprehend. As if calculus wasn’t enough, here I was, thrown to the sharks of gamers and super-hacker-pros.

The difficulty in defining Big Data comes from the sheer effort required to rein in new data sources: collecting the data, organizing it, analyzing it, and protecting and governing it. Big Data is hard to define, and it is hard for people to admit that it’s hard to define, especially when they’re part of the field. The most pervasive issue I see with Big Data, though, is not defining the term. It is protecting and securing the data itself.

In the field of cryptography, the strongest notions of security come from the idea that certain things are too hard for the average “adversary” (computer) to compute. These are the kinds of guarantees sought in securing the world’s most valuable information. We should also think hard about how we protect our personal sensitive data day-to-day. Chances are you’re using cutting-edge cryptographic technology directly without even knowing it.

I refer to cryptography, privacy, and security in the same breath. In short, security practices are how you achieve privacy, or the correct use of data, and cryptography is used in security schemes and protocols. How often do you read the terms of service of an app you download? It is likely breaching your privacy in at least a handful of ways, depending on your personal definition of privacy.

Here we reach an even more fundamental issue: privacy means many different things, both informally and formally. In continuing installments of this blog, I will explore formal definitions of privacy and security in the field of databases and how the cryptographic lens is creeping into the field. A lot of academic scientists are becoming involved in the blending of databases and security.

Data can be scary, especially the handling of sensitive data. I hope to reach beyond explanations of theories and definitions, explain real-world applications of these concepts as I see them, and take some of the “intimidation” away from the subject. I welcome discussion and comments as to how these concepts are already applied. Some of the first topics I will touch on include, but are not limited to: differential privacy, blockchain technologies, smart contracts, smart co-processors, and workflow provenance privacy.



About Haleigh Wright

Haleigh Wright is the Assistant to the Publisher of The Data Administration Newsletter. Haleigh graduated from The University of Pittsburgh with a degree in Scientific Computing in 2015, and she is now pursuing her PhD in Computer Science at the University of California, Santa Barbara, where she is a National Science Foundation IGERT Network Science Trainee. Her research interests include computational sustainability, data analysis, data security and privacy, network science, and cryptography. After Haleigh concludes her studies, she plans to enter industry, with particular interest in entrepreneurial ventures.

  • Jay Kang

    I’m looking forward to reading your other postings. Very interesting! :)