Humanity has passed the industrial age and is well into the information age. Data has become the lifeblood of our society and our economy. Like it or not, we are all interconnected by a vast network of information arteries that allows instantaneous communication. In recent years, self-organized mass social events such as the Occupy Movement, the Arab Spring, the Yellow Jackets, and the Zimbabwe uprising have seen citizens use social media as a powerful democratic organizational tool. It is so powerful that governments sometimes respond by shutting down the internet or social media. Such is the power of the internet and real-time mass information flow.
Today the internet is about to enter a new era. With the dawn of the IoT (Internet of Things), blockchain, and AI, machines are going to join this network in a way that will result in a massive step change in the internet and data landscape. The number of IoT devices is expected to reach 75.44 billion units worldwide by 2025 (Statista.com), eclipsing computer sales. The micro-scale implications of this seismic macro-scale shift will be profound. As explained before, technology that moves this rapidly brings huge socio-economic-ecological changes, both beneficial and harmful. It is a power that is transforming lives, but it comes at a price.
If information is the currency of modernity, then decision-making is how we spend it. So, the question is: are we spending wisely? The explosion of technology is generating reams of data, but all the information in the world is of no use to us if we cannot sort through it, make sense of it, or trust it. These massive mounds of unusable information are the digital equivalent of a hoarder's clutter. How many of us have emails that have never been pruned, with useless messages, spam, or outdated information sitting somewhere in the cloud? How much unused data are companies collecting through social media analytics and machine-to-machine systems? Here are some sobering facts about our growing data mountains, published by Forbes magazine in 2015:
- 90% of the world’s data has been generated in the past two years (Sintef, 2013)
- By 2020, we will have 6.1 billion smartphone users globally and 50 billion smart connected devices
- Google uses 1,000 computers to answer a single search query, taking no longer than 0.2 seconds to complete (with 3.5 billion searches a day in 2019, this is a major reason why IT is becoming a serious power hog)
- A typical Fortune 1000 company will generate $65 million of additional revenue by increasing data accessibility by 10%
- Retailers who leverage big data can increase operating margins by up to 60%
- 73% of organizations have already invested or plan to invest in big data in 2016
- Only 0.5% of all data has been analyzed and used
The first problem we will have to contend with is the sheer volume of data. The unintended consequences of this mountain of data reach into many dimensions. For instance, the findings of Swedish researcher Anders Andrae show that all of this data traffic could have a profound impact on total electricity usage and, consequently, on carbon emissions.
If there are no interventions, this amount of data usage could consume a fifth of humanity's electricity supply by as early as 2025. For organizations, inexpensive digital technology, high-bandwidth internet, and the coming of IoT, AI, and blockchain will create more data than we can deal with. How will we manage and make sense of all this data? This is important because it won't be of much value to us if we can't. As the above graph shows, we could be wasting vast physical resources if that data is not used effectively. We have to develop super-efficient hardware, data-miserly software, and disciplined data habits, and apply whole new fields such as big data science, data analytics, and machine learning efficiently. This will produce valuable policy, business, and personal insights to support effective decision-making.
The second problem is information quality. With so much data coming from so many sources, data quality is rapidly becoming a significant issue. One of the most apparent data quality issues is the rapid emergence of the phenomenon of fake news, a term which has become part of the lexicon of modernity. It reflects the ease with which anyone can use commonly available digital tools to create false information. The power of digital media tools now allows anyone with a bit of skill to manufacture any news, image, or video, and distribute it through a fake social media account. While anyone can fabricate a deception on the internet, it becomes especially problematic when the lie is state-sponsored. What is even more disturbing is that, when exposed, state actors simply deny the accusations, hiding behind plausible deniability. As a result, bad actors lurking in the shadowy corners of the dark web are actively shaping the information that vulnerable consumers digest, advancing narratives that align with their ulterior political motives.
Fake news has been thrust into the limelight by the US investigation of Russian interference in the 2016 US elections. An indication of the seriousness of the problem is the growing number of fact-checking tools that have become available. The Duke Reporters’ Lab shows that from 2014 to 2018, the number of fact-checking programs nearly quadrupled, from 40 to 156. These tools may, unfortunately, become necessary parts of the future web. Still, they treat only the symptoms, not the problem itself.
All these recent failures of our political decision-making process may be indicators that the very form of democracy we have been practicing is fast becoming outdated. It is a wake-up call to adapt to the rapidly changing digital information landscape with new systems, or else risk a broken democracy.
These attacks on information quality are not limited to governments, and politics is not the only arena where they have profound effects. In business, the spoils will go to those who learn how to employ big data effectively, along with the analytics and AI engines that decipher what it all means. There are useful patterns hidden in all that data which can help businesses increase their bottom line. Acting on good data can allow a company to engage with customers more effectively, or to tune machines and system operations for performance gains that increase profits substantially, often at little or no extra cost.
Conversely, acting on bad data can have serious negative consequences. An Experian data quality study found that bad data had a direct impact on the revenue of almost 90% of American businesses. IBM research showed that US organizations believe 32% of their data is of poor quality, accounting for an average revenue decline of 12%. IBM’s Big Data & Analytics Hub estimates that poor information quality costs US companies $3.1 trillion annually (2016). A Gartner study found a similar figure: 27% of the data in Fortune 1000 companies was reported to be of poor quality. The tool developed in this book will help ensure that these issues are avoided.
The term “fake news” may be new, but the idea certainly isn’t. It can be argued that it has been around ever since humans began making general claims about the nature of reality. More than two millennia ago, Aristotle observed that maggots seemed to generate spontaneously on dead animal carcasses and that barnacles would form on the hulls of boats, giving rise to the theory of the spontaneous generation of life. Even as late as the 1700s, Aristotle’s philosophy was upheld as truth. It took the invention of the microscope and scientists like Louis Pasteur to disprove the long-held theory.
Other theories, taken as credible at the time, have long since vanished. The phlogiston theory, proposed by Johann Becher in 1667, held that any combustible substance contained a material without any detectable properties called, you guessed it, phlogiston. The luminiferous aether was another mysterious substance thought to pervade the entire universe, even a vacuum, serving as the medium that allowed light and electromagnetism to travel. It was impossible for scientists of the time to conceive of vibrations happening without a material medium.
If one gets the impression that scientific theories seem to be wrong quite often, it’s actually an accurate one. This should come as no surprise to anyone versed in science, for the veracity of scientific models is constantly being tested by new observations. The noted quantum physicist Richard Feynman said, “We are trying to prove ourselves wrong as quickly as possible, because only in that way can we find progress.” Theories are best guesses. They are predictive models constructed from a set of general assumptions, which can be wrong. Empirical scientists are working around the clock to unearth new observations in every nook and cranny of science, and it comes as no surprise that some of those observations will contradict the predictions of the current model. This highlights the inherently risky business of science. With each new prediction, the chance of the model being proven wrong grows. If we are to trust the history of science, many of today’s accepted theories will be consigned to the garbage heap in a century’s time. Given this built-in transient nature of scientific knowledge, we can make the reasonable but counterintuitive claim that all science is ultimately wrong. We can guess that all scientific knowledge has a shelf life; we just don’t know what the expiry date is.

Or maybe we do. In his book The Half-Life of Facts: Why Everything We Know Has an Expiration Date, Harvard mathematician Samuel Arbesman argues that all so-called “facts”, including scientific ones, behave like radioactive substances and have a measurable half-life. Arbesman provides some intriguing evidence to support his claim. “Facts”, Arbesman claims, are changing all the time. His unique contribution is that he has uncovered a predictable pattern to the way facts change, grow, and decay. Arbesman is part of a new field of quantitative meta-study of scientific ideas called scientometrics, which grew out of the field of quantitative library science called bibliometrics. There, the unit of measurement is the research paper. Back in the 1970s, when digital memory was not yet widely available, librarians noticed the rapid growth of scientific knowledge and were concerned about the limited space on their shelves, so they began to measure which scientific research papers and fields were growing the most rapidly.

Arbesman investigated medical research and found that, for the literature on hepatitis and cirrhosis, scientometric studies in the 1960s had already estimated the half-life of the field to be approximately 45 years. In other fields such as the social sciences, the half-life is even shorter, owing to the uncertainties of studying human behavior. In some physical science fields, meanwhile, the half-life can be much longer because the knowledge is very quantitative and well defined. Arbesman also looked at the growth of knowledge and cites figures for doubling times in various fields: medicine – 87 years, mathematics – 63 years, chemistry – 35 years, genetics – 32 years. Because there is so much to know, the way we deal with it is by specializing in niche areas.
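To make the radioactive-decay analogy concrete, here is a minimal sketch of the decay model Arbesman describes (the notation is mine, not Arbesman’s): if N_0 facts are accepted today and the field’s half-life is T_1/2 years, the number still standing after t years is

$$ N(t) = N_0 \left(\tfrac{1}{2}\right)^{t/T_{1/2}} $$

With the 45-year half-life cited above for hepatitis and cirrhosis research, N(45) = N_0/2 and N(90) = N_0/4: roughly half of what was accepted in 1960 would have been overturned or superseded by about 2005, and three quarters by about 2050.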
Arbesman cites the 1960 paper “The Dollars and Sense of Continuing Education,” in which Thomas Jones calculated the effort it took an engineer to stay up to date, assuming a 10-year half-life of knowledge. He calculated five hours a week, 48 weeks a year, to stay current. A typical degree requires 4,800 hours of work; within 10 years, 2,400 hours of that would have become obsolete. A 40-year career therefore requires 9,600 hours of additional study to keep current, as the sketch below shows. Modern estimates put the half-life at half that or less, implying even more hours of study. This is impractical, and the leading technology firms know it. Hence, tech firms like Google, Facebook, and Amazon are biased toward hiring recent graduates rather than retraining older tech workers.
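As a check on Jones’s arithmetic, here is a short back-of-envelope sketch in Python. The figures (a 4,800-hour degree, a 10-year half-life, five hours a week for 48 weeks a year, a 40-year career) come from the text above; the variable names and the script itself are mine, not Jones’s.

```python
# Back-of-envelope reconstruction of Thomas Jones's 1960 calculation,
# using the figures cited in the text (assumed, not Jones's original notation).

degree_hours = 4800      # hours of study in a typical engineering degree
half_life_years = 10     # assumed half-life of engineering knowledge
hours_per_week = 5       # weekly study needed to stay current
weeks_per_year = 48      # study weeks per year
career_years = 40        # length of a working career

# Degree knowledge remaining after one half-life (10 years), via N(t) = N0 * (1/2)^(t/T)
remaining_after_10y = degree_hours * 0.5 ** (10 / half_life_years)
obsolete_after_10y = degree_hours - remaining_after_10y   # -> 2400 hours

# Lifetime continuing-education load at 5 h/week, 48 weeks/year, over 40 years
annual_study_hours = hours_per_week * weeks_per_year      # -> 240 hours/year
career_study_hours = annual_study_hours * career_years    # -> 9600 hours

print(f"Degree hours obsolete within 10 years: {obsolete_after_10y:.0f}")
print(f"Extra study over a 40-year career:     {career_study_hours} hours")
```

Halving the assumed half-life, as modern estimates suggest, roughly doubles the continuing-education load, which is the impracticality the text points to.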
What scientometric studies like these show is that scientific truth is always provisional, but that doesn’t make current scientific knowledge useless. On the contrary, it always has pragmatic utility in the present. The unavoidable cautionary tale is that there is always a price attached to it, and nature may recall her debt of the unknown at any moment through unintended consequences and progress traps.