Innovation in Data Management and Cloud Computing is a welcome disruption of the Database industry. Yet, not all that glitters is gold. Increasing complexity and chaos of Engineered Data Systems may also be disrupting productivity.
As we binge-watched Sons of Anarchy, celebrating the not so easy-rider world of outlaw bikers with light beer and Cheetos, one of the show's characters lamented:
" ..Einstein once said that any intelligent fool can make things bigger and more complex.. But it takes a touch of genius - and a lot of courage to move in the opposite direction.. "
The show was instantly paused and a lively debate ensued. Turns out that data geeks and solution architects, all one-percenters in our own right, have a lot more in common with the culture of motorcycle clubs than we knew. We meet in groups to consume large quantities of cheap beer. We are a brotherhood of data enthusiasts, united by the problems we solve, and we even have club colors and uniforms (khakis, blue oxfords and card-keys). We may not ride together but we surf (the web) in packs. We fight passionately over topics most people don't care about. Opinions and architectures get squashed by rival clubs. New members are patched-in if they have the technical chops to go toe-to-toe with seasoned technology hard-liners; our own men of mayhem. And we endlessly tinker with hardware or software just for the sake of technological elegance. But this debate was not about that.
For starters, the show misquoted Einstein. He never said this. The actual quote comes from E. F. Schumacher's 1973 book Small is Beautiful: A Study of Economics As If People Mattered. It's an interesting read for those interested in humanism and so-called Buddhist Economics. But his main observations have turned out to be quite prophetic. Among other things, Schumacher argues that technological "solutions" and innovations that degrade social structure are useless. He also points out that cultivation of intelligence results in an increased reliance on external resources that you cannot control.
In modern terms, technological innovation often empowers the individual but not the team, thereby destroying the social structure of a business (its #1 value). Isolated employee groups tend to form rival factions with common goals that require their own process and data controls, resulting in multiple copies of data and multiple versions of 'truth'. As more knowledge and decision-making information is required, reliance on external data sources increases and control goes to zero. Thanks to individual empowerment through innovation, data gets bigger while data systems become more complex and unreliable.
So there we were, a brotherhood of intelligent fools, arguing that bigger is better and trying to understand where all this data flow chaos had come from. How was it possible that small and relatively cheap devices like smart-phones and tablets offered a broad choice of applications giving us a real-time, 360-degree view of our lives, while at the office, the same knowledge workers struggled to develop simple analytic applications that handled just a fraction of the data? Why were we drowning in process, faulty documentation and bad data, having spent millions to get it right? What happened to productivity?
This was turning into a four-beer discussion and we had clearly hit the Ballmer Peak. We needed pizza and white boards, and the room was already awash in the soft mescaline glow of a Python console downloading the latest Pandas and SciPy extensions. Jax Teller would have to sit this one out. I sent the prospect out for 2 large pies and called for a Mayhem Vote.
Emergence of Dataflow Chaos
To understand how we got here, I had to go back to the early days of IT, a simpler time when streams just meant running data for the business - moving information between Systems of Record and On-Line Analytical Processing (OLAP) data stores. Data management was somewhat centralized back then, and focused on relational databases. As such, the database became the primary tool for data integration. Need to prep data for analysis? Build a new database and load all your stuff into one schema. The practice became so common that an entire industry of data warehousing, data marts and ETL tools emerged to assist the process. Great for short-term delivery dates, but probably not so good for long-term business value. Data copying works well for individuals but not for the larger group.
Over time, as the number of data copies increased, cost of ownership became astronomical. The advent of messaging technologies and data replication tools set a lot of data in-motion. Broad adoption of Service Oriented Architecture helped rationalize sprawling, distributed systems, and complex data flows became the new normal. Lack of adequate tools for managing and modeling data in-flight made chaos, redundancy and waste a part of every data integration project.
Enterprise Data Systems began to suffer from three common afflictions:
As centralized data management gave way to engineered 'big data' systems assembled from a complex web of open source and commercial products, the so-called Data Sprawl set in. This was the result of data copies and syndicated information sources scattered across many locations within an enterprise. As Schumacher predicted, the desire for more accurate analytics created a need for more diverse, external data.
Business intelligence solutions no longer relied on data that was mostly internal, often co-opting useful information from disparate sources. While freely available to users, such data often contained sensitive, non-vetted information. As such, its availability was not openly advertised within an organization. This resulted in a so-called Dark Data phenomenon, wherein useful information critical to decision making was simply not well-advertised. Such 'security through obscurity' practices are quite common and contribute to other forms of data chaos.
Data Drift is the gap in data completeness, accuracy and consistency. It's what happens when copies of data independently change over time, and it has become a fact of modern life. Versions and variety of data flows from new digital systems, IoT sensors, enterprise sources and mobile app communications all contribute to Data Drift. Unexpected alterations in schema and semantics that occur across these systems undermine data quality, often leading to a Tower of Babel effect. Data Drift is the silent project killer. It often surfaces after a system goes into production or worse, when it is discovered by customers.
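For the tinkerers in the club, here is a minimal sketch of what catching schema drift between two copies of the same data might look like. Everything in it is an assumption made for illustration: the helper names `schema_of` and `detect_drift`, the sample records, and the idea that comparing field names and value types is enough. Real drift also includes semantic changes that a check like this would not see.

```python
def schema_of(records):
    """Map each field name to the set of Python type names seen for it."""
    schema = {}
    for record in records:
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


def detect_drift(source, copy):
    """Report fields that were dropped, added, or changed type in the copy."""
    src, cpy = schema_of(source), schema_of(copy)
    return {
        "missing_in_copy": sorted(src.keys() - cpy.keys()),
        "added_in_copy": sorted(cpy.keys() - src.keys()),
        "type_changed": sorted(
            field for field in src.keys() & cpy.keys() if src[field] != cpy[field]
        ),
    }


# Hypothetical records: the downstream copy renamed one field
# and quietly turned a numeric id into a string.
system_of_record = [{"customer_id": 42, "zip": "02139", "balance": 10.5}]
downstream_copy = [{"customer_id": "42", "postal_code": "02139", "balance": 10.5}]

report = detect_drift(system_of_record, downstream_copy)
print(report)
```

Run against every hop in a data flow, a check like this would at least surface drift before a customer does.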
In the real-time world of instant digital gratification, Data Latency is the delay in information delivery that often causes critical data to lose value. Imagine looking up driving directions while speeding down the freeway. The right turn-off is often in the rear-view mirror. You're trying to take the right exit but you're stuck in a Recalculating loop. New directions become useless as fast as you look them up. Data Latency results in stale information and hindsight analysis. From missing game scores or financial market opportunities, to real-time fraud detection, equipment and resource management, Data Latency can have a huge impact on the value of things we're trying to measure, manage or use. If it ain't real-time, it's history.
Together, data sprawl, drift and latency create a nightmare for managing and visualizing data movement across the enterprise. With complexity added at each architecture layer, data anarchy is the new normal. And when it comes to streaming data, a lack of tools and best practices can make an architecture ugly, fast. It may in fact result in every component stopping simultaneously and crashing at the speed of light. Yes, crossing the streams is still bad.
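As a rough illustration of "if it ain't real-time, it's history", the sketch below sorts incoming events into fresh and stale piles based on how long they took to arrive. The function name `split_by_freshness`, the `event_time` field, the sample events and the 30-second freshness budget are all hypothetical choices made for the example, not a recommendation.

```python
from datetime import datetime, timedelta, timezone


def split_by_freshness(events, now, max_age):
    """Separate events still worth acting on from those that arrived too late."""
    fresh, stale = [], []
    for event in events:
        latency = now - event["event_time"]  # delay between creation and delivery
        (stale if latency > max_age else fresh).append(event)
    return fresh, stale


# Hypothetical events, timestamped in UTC relative to a fixed "now".
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
events = [
    {"id": "next-exit", "event_time": now - timedelta(seconds=2)},
    {"id": "last-exit", "event_time": now - timedelta(minutes=5)},
]

fresh, stale = split_by_freshness(events, now, max_age=timedelta(seconds=30))
print([e["id"] for e in fresh], [e["id"] for e in stale])
```

The right freshness budget depends entirely on the use case: a fraud check and a weekly report tolerate very different delays.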
Small is Beautiful
So, at the end of our detour, we discovered that the rise of Big Data sources, multi-cloud architectures and real-time applications created multiple dimensions of growing complexity. These days, critical information doesn't stay in one place for too long. Either it's coming in from some data stream, being enriched and refined or being prepared for send-off to another system. When it comes to data in-motion, it seems that streams of anarchy are here to stay.
Okay. So Einstein never said making stuff smaller was a touch of genius. But he did say that light is both a wave and a particle. And your critical data is a lot like that: it is both at-rest and in-motion. Working with information as it flows through an organization requires a new form of technical elegance.
Any technology that can handle dual-state data would reduce the number of moving parts and reliance on data engineering experts to get the job done. It would enable collaboration between data architects, business users and operations teams, enhancing the social structure of an organization and thereby improving group productivity. Developing such tools may indeed require bold moves and a lot of smarts. But in the end, simplified architecture and a unified approach can have far-reaching benefits for businesses and their customers.