We are living in an age of information that, even thirty years ago, people could not possibly have imagined. Type up a document at work, and you can access it at home with ease. Enter a restaurant, and your smartphone can recommend what to order based on your past behavior. The power of big data means that everything we do is becoming interconnected, a far cry from the days when conveying information between individuals took considerable time and effort. Such widespread availability of data is not without its caveats, though, given that a single mistake can harm millions of people. This is precisely what happened in the 2012 breach of Dropbox, a popular cloud storage service, in which the credentials of 68 million users were leaked. Because major incidents of this kind will inevitably recur, society must familiarize itself with the nature of data in the modern world and the consequences of a mishap, be it a malicious attack or a hardware failure.
Cloud Computing and Big Data
The internet, as a vehicle to store, transfer and process information, is growing at an incredible rate with no sign of slowing down; its size already exceeds one zettabyte, or 1,000,000,000,000,000,000,000 (10²¹) bytes (Titcomb, 2016). For comparison, all the words ever spoken by humans, the sum of humanity’s ingenuity and creativity, are estimated to amount to only 0.005 zettabytes (Klinkenborg, 2003). Applications of cloud computing range from private file storage to hosting network infrastructure and facilitating collaborative data analytics for scientific or commercial purposes. All of these, however, need to be secured so that the privacy of their content can be maintained.
Revolutionary for data security was the invention of public key encryption, which enables convenient transfer of information between two parties without sacrificing safety. Before public keys, people used secret key encryption, which can be compared to a locked chest. A message or a file (essentially a long sequence of 1s and 0s) is shut inside the chest, and anyone with the ‘key’ can access it. The algorithm is secure, but its downside is that, for two parties to communicate, they must have exchanged the ‘key’ beforehand, and that exchange can take a considerable amount of time.
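To make the locked-chest analogy concrete, here is a minimal Python sketch of secret key encryption using a toy one-time pad: the message is XORed with a random shared key, and XORing again with the same key recovers it. The function name `xor_bytes` and the sample message are illustrative assumptions; real systems use a vetted cipher such as AES rather than anything hand-rolled.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the key; the same call encrypts and decrypts."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))

message = b"meet at noon"
# Both parties must already hold this key: the 'key to the chest'.
# A one-time pad key must be random, as long as the message, and never reused.
shared_key = secrets.token_bytes(len(message))

ciphertext = xor_bytes(message, shared_key)    # sender locks the chest
recovered = xor_bytes(ciphertext, shared_key)  # receiver opens it with the same key
print(recovered)  # b'meet at noon'
```

The sketch makes the key-exchange problem visible: without `shared_key`, the receiver has no way to open the chest.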
Public key encryption was developed to resolve this dilemma. With this method, each device broadcasts a ‘public key’ that anyone can access and use to encrypt messages, while a ‘private key’, kept by that device alone, is used to decrypt them (Menezes et al., 1996). Returning to the analogy of the locked chest, anyone can put something in, but only the owner can open it and access what is inside. Thus, communication can be carried out without the parties having to exchange secret keys beforehand. Given this explanation of public key cryptography, the obvious questions are whether such a one-way encryption algorithm even exists, and whether it might be a point of vulnerability for an attacker.
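The public/private split can be sketched with textbook RSA on deliberately tiny numbers (the classic p = 61, q = 53 example). This is an illustrative assumption, not production code: real RSA uses enormous primes and padding schemes, and the helper names `encrypt` and `decrypt` are hypothetical. Note that computing the private exponent requires knowing the two primes, which is where prime factorization enters the picture.

```python
# Toy textbook RSA (no padding, insecure as-is). Requires Python 3.9+.
p, q = 61, 53              # secret primes; real keys use primes hundreds of digits long
n = p * q                  # public modulus: 3233
phi = (p - 1) * (q - 1)    # 3120, easy to compute only if you know p and q
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent: modular inverse of e (2753)

public_key = (n, e)        # broadcast to everyone
private_key = (n, d)       # kept by the owner

def encrypt(m: int, key: tuple[int, int]) -> int:
    """Anyone holding the public key can put a message into the chest."""
    modulus, exponent = key
    return pow(m, exponent, modulus)

def decrypt(c: int, key: tuple[int, int]) -> int:
    """Only the holder of the private key can take it out again."""
    modulus, exponent = key
    return pow(c, exponent, modulus)

ciphertext = encrypt(65, public_key)
plaintext = decrypt(ciphertext, private_key)
print(ciphertext, plaintext)  # 2790 65
```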
The mathematics behind the current algorithm is complicated, but its core can be simplified to prime factorization. The public key contains a large number, and the private key is derived from the unique prime factors of that number. Decomposing a small integer into its prime factors is straightforward: one repeatedly divides it by smaller primes until the answer is obtained. However, this method does not scale well. For example, if such a brute force attack were carried out against a 1024-bit RSA key, then even assuming every atom in the observable universe could serve as a CPU performing one calculation every millisecond, a guaranteed crack would take over 10²¹¹ years, approximately 10²⁰¹ times the age of the universe (“StackExchange,” 2012). Other industry-standard algorithms are similarly difficult to crack by brute force.
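The brute force approach described above is trial division. A minimal sketch follows; it only needs to test divisors up to the square root of the number (a standard shortcut), and `trial_division` is a hypothetical helper name. It factors the toy modulus 3233 instantly, but the number of loop iterations grows with the square root of n, so for a 1024-bit modulus it could need on the order of 2^511 steps.

```python
def trial_division(n: int) -> list[int]:
    """Return the prime factors of n (in ascending order) by repeated division."""
    factors = []
    d = 2
    while d * d <= n:            # no factor pair can have both members above sqrt(n)
        while n % d == 0:        # divide out d as many times as it goes
            factors.append(d)
            n //= d
        d += 1 if d == 2 else 2  # after 2, only odd candidates are tried
    if n > 1:                    # whatever remains is itself prime
        factors.append(n)
    return factors

print(trial_division(3233))  # [53, 61]
print(trial_division(360))   # [2, 2, 2, 3, 3, 5]
```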
Sophisticated attackers, however, need not resort to brute force. They can exploit algorithmic weaknesses, as seen when vulnerabilities were revealed in MD5 and SHA-1, two hash functions previously thought to be secure (Black et al., 2006). Furthermore, the advent of quantum computing could render much of today’s public key cryptography useless. Shor’s algorithm (Shor, 1997), for instance, can factorize integers in polynomial time on a sufficiently large quantum computer. If quantum computing becomes mainstream, society can only hope that a new form of encryption is invented before then.
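Shor’s algorithm works by reducing factorization to finding the period r of a^x mod n. The sketch below, an illustrative assumption rather than a quantum implementation, performs that reduction classically: the period is found by brute force, which is exponentially slow and is precisely the step a quantum computer would speed up. It assumes n is an odd composite that is not a prime power (for a prime n the loop would never terminate), and the function names are hypothetical.

```python
import math
import random

def find_period(a: int, n: int) -> int:
    """Smallest r > 0 with a**r % n == 1, found by brute force (requires gcd(a, n) == 1).
    This is the step that Shor's algorithm performs efficiently on a quantum computer."""
    r, value = 1, a % n
    while value != 1:
        value = (value * a) % n
        r += 1
    return r

def shor_classical(n: int, seed: int = 0) -> tuple[int, int]:
    """Factor an odd composite n (not a prime power) via Shor's reduction,
    with the period found classically. Requires Python 3.9+."""
    rng = random.Random(seed)
    while True:
        a = rng.randrange(2, n - 1)
        g = math.gcd(a, n)
        if g > 1:                    # lucky guess: a already shares a factor with n
            return g, n // g
        r = find_period(a, n)
        if r % 2 == 1:               # need an even period; try another a
            continue
        y = pow(a, r // 2, n)        # a square root of 1 mod n, and y != 1 since r is minimal
        if y == n - 1:               # trivial root (-1 mod n); try another a
            continue
        f = math.gcd(y - 1, n)       # guaranteed to be a non-trivial factor here
        return f, n // f

print(sorted(shor_classical(15)))    # [3, 5]
print(sorted(shor_classical(3233)))  # [53, 61]
```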
Despite the potential for malicious security breaches, however, the vast majority of incidents are accidental. That is to say, most digital misfortunes, especially the loss of files, can be blamed on human error. In fact, since much of humanity’s information is stored on the internet, and the proportion is only growing (“Cisco,” 2016), the real misfortune is often the permanent loss of innovation and creativity through careless digital storage or technological illiteracy.
Economic Consequences for Individuals & Businesses
There is a certain irony when the backup of your data fails, but as anybody who has faced such a scenario would know, it is no laughing matter. Horror stories that call into question the safety of online cloud services abound: Yahoo’s data breach scandals of 2013 and 2014, which purportedly affected upwards of 32 million users, and the Amazon cloud crash that wiped out 11 hours of data for many users are just two examples. Interestingly, usage of cloud computing has only increased over the years, and global spending is expected to reach $173 billion by 2026 (Columbus, 2017). This trend is reflected in the workplace: in Australia alone, statistics indicate that 86% of businesses have embraced cloud computing (Porter, 2017).
The benefits of cloud computing are irrefutable. Storing information “in the cloud,” as the term goes, allows greater and more convenient access to data across various platforms. For example, users of team-oriented services such as Google Docs, Basecamp or any of the plethora of other cloud-based tools can surely vouch for the convenience and efficiency they provide (“Benefits of cloud computing,” 2017). A glaring issue arises, however, from the aforementioned risk of data loss, whose repercussions are difficult to quantify precisely (Whitaker, 2016). To begin with, users have to contend with the sheer frustration of losing work that may have taken hours to complete. Worse still are the instances when something irreplaceable, say the only copy of a photograph, disappears into the void.
For businesses, the risks are even greater. Unsurprisingly, financial loss forms a large part of the equation. According to a survey conducted for the EMC Global Data Protection Index, data loss and downtime cost Australian businesses approximately $65.5 billion in 2014 alone (Connolly, 2014). Beyond the direct costs of repair and restoration, businesses have to contend with productivity losses and reputational damage in the aftermath of a data loss incident (Smith, 2017). If these figures weren’t sobering enough, a separate study by The Diffusion Group shows that 60% of companies that suffer data loss face closure within six months, and 72% within 24 months (Hardoon, 2017). The possibility of unintentionally breaching a contract or being blacklisted within a particular industry adds to these worries (“The True Cost of Data Loss for Businesses,” 2017).
In large part, the problem of data loss stems not from an inherent vulnerability in cloud technology but from users not taking sufficient precautions with their data. In the Diffusion Group study mentioned above, a startling 40% of the businesses surveyed did not use any backups, and even among those that did, the backups made were not fully recoverable (Hardoon, 2017). At a glance, such findings might seem puzzling, given the colossal amount businesses stand to lose in the event of data loss. However, considering that basic tasks such as accessing email or shopping online still cause problems for 66% and 59% of Australians respectively, perhaps it shouldn’t be surprising (“Infographic: Australians and digital literacy, a consumer snapshot – Stories,” 2017). Digital literacy is an increasingly important skill in the current era of cutting-edge technological advancement, yet the demand for technologically savvy individuals far outstrips supply (Swan, 2015). This seems to be one of the main issues faced by businesses that adopt the cloud: users are either incapable of taking, or unaware of, the precautions required to mitigate the risk of data loss (Smith, 2017). As such, while it’s all well and good to encourage businesses to hop onto the cloud revolution bandwagon, closer attention ought to be paid to how a particular cloud service fits into, and can be integrated with, a business’s workflow.
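One basic precaution against unrecoverable backups is to check that a copy actually matches the original. The following sketch, assuming local directories stand in for a backup location and using hypothetical function and file names, copies a file and compares SHA-256 checksums. Matching checksums only show that the copy was intact when it was made; a proper recovery test restores data from the backup itself.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in 64 KiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def backup_and_verify(source: Path, backup_dir: Path) -> bool:
    """Copy source into backup_dir, then confirm the copy is byte-identical."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    copy = backup_dir / source.name
    shutil.copy2(source, copy)   # copies contents plus metadata such as timestamps
    return sha256_of(copy) == sha256_of(source)

if __name__ == "__main__":
    # Demo in a throwaway directory; "report.txt" and "backups" are hypothetical names.
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "report.txt"
        original.write_text("Quarterly figures")
        ok = backup_and_verify(original, Path(tmp) / "backups")
        print("backup verified" if ok else "backup FAILED verification")
```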
The Future of Data
Big data has generated a tsunami of news headlines, captivating public opinion through privacy debates and fear while making consumerism easier than ever before. In a recent TED Talk, Stuart Lacey highlighted two real-life anecdotes illustrating current issues with data. The first concerns Facebook’s shadow profiles: profiles that Facebook creates for potential users, built from data it has stored about people who have never accessed Facebook or given it permission to use their data. The second is the story of Samsung shipping its new TVs with the camera and microphone already enabled, which allowed data to be tracked and, in theory, allowed someone to access both without the owner’s knowledge. Both stories share one central problem: privacy. As big data applications grow at an exponential rate, how are we supposed to protect our privacy? How do we decide which data companies can access and which they cannot? Lacey argues that such decisions will not be necessary in the future, because the future of data lies in cutting out the middleman (“The Future of,” 2015).
In the case of big data, these middlemen are companies like Google and Facebook, which provide us with a service for free but in return store and sell our data without our real consent. Goods and service providers buy this data and use it to create targeted advertisements to sell us products. How will Facebook and Google be cut out of the equation? Lacey argues that customers will have the option to provide their metadata to companies that they believe have products they may want. These companies can then offer goods and services at a discount instead of passing data storage and advertising fees on to customers through their prices. However, this is a long-term solution. It will take a revolution similar to what Uber did to the taxi industry to disrupt the current marketplace. Since a reality without giant corporations controlling our data is still a long way off, it’s important to ask: in the short term, what control do users have over their data? Several companies have emerged to help users regain control, offering services called “data lockers” that enable users to store and protect personal data and give them greater control over who can analyze it (Palmer, 2013). Such services are a necessary building block for the future that Lacey describes.
It’s also important to consider another implication of data: its interaction with other emerging technologies such as Bitcoin. According to Andreas Antonopoulos, a leading Bitcoin expert, Bitcoin is the financial manifestation of big data (Jeff, 2015). Bitcoin transactions are themselves big data, as every transaction on the blockchain can be viewed publicly. If Bitcoin, the blockchain, or other cryptocurrencies become mainstream, they will enable analytics on a scale never seen before. In turn, this data could enable more precise revenue streams, such as charging you for watching thirty minutes of a Netflix show or reading one-third of an article rather than charging a subscription. Regardless of future big data scenarios or information technologies, a few things seem certain: the volume of data will grow, data analysis will improve, machine learning will become more widespread, and privacy will remain an issue.
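The sense in which a blockchain is public, linked data can be illustrated with a simplified hash-chained ledger. This is a sketch under stated assumptions: it omits proof-of-work, digital signatures and Merkle trees, does not reflect Bitcoin’s actual data format, and every name in it is hypothetical. Each block stores the hash of the previous one, so anyone can both analyze the full history and detect tampering.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block's contents deterministically (keys sorted before hashing)."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append_block(chain: list[dict], transactions: list[dict]) -> None:
    """Add a block that records the hash of the block before it."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "prev_hash": prev, "transactions": transactions})

def is_valid(chain: list[dict]) -> bool:
    """Every block must point at the current hash of its predecessor."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

ledger: list[dict] = []
append_block(ledger, [{"from": "alice", "to": "bob", "amount": 5}])
append_block(ledger, [{"from": "bob", "to": "carol", "amount": 2}])

# Because the ledger is public, anyone can run analytics over it, e.g. total volume.
total = sum(tx["amount"] for block in ledger for tx in block["transactions"])
print(total, is_valid(ledger))  # 7 True

ledger[0]["transactions"][0]["amount"] = 500  # rewrite history in the first block
print(is_valid(ledger))  # False
```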
The CAINZ Digest is published by CAINZ, a student society affiliated with the Faculty of Business at the University of Melbourne. Opinions published are not necessarily those of the publishers, printers or editors. CAINZ and the University of Melbourne do not accept any responsibility for the accuracy of information contained in the publication.
Bachelor of Commerce student majoring in Accounting and Finance. Inclined towards topics that explore the intersection between current affairs, technology and the business world.
Master of International Business student at Melbourne Business School. Prior to his time at MBS, he worked at the Institute of Social and Economic Research-Universitas Indonesia and has published several working papers on the implications of economic and social developments for the perceived level of corruption in both developing and developed countries.