We started Twing Data because we believe that many of today's data challenges can be solved by focusing on the fundamental building blocks of our data systems: the actual SQL queries being run. Regardless of how they are generated, these queries are the source of truth, and they power our applications, businesses, and processes. By developing a deep understanding of the intent behind them, we can address those challenges at their root.
The industry has already invested half a century in SQL; it's time to go even further. Countless products exist to make the data we have more accessible and usable. The companies that succeed will be the ones that take advantage of their data by going deeper down the stack, to where the work actually happens, rather than sitting above it.
Take a concrete example: data and semantic modeling. Nearly everyone has seen two reports show contradictory numbers for the same metric because two teams calculated it two different ways. To handle this, companies have embraced the semantic layer: metric definitions live in code, version-controlled and audited, as a single source of truth. Unfortunately, the semantic layer captures only part of actual data usage, and even if definitions are reconciled at one point in time, there's no guarantee they won't diverge again. By watching the metrics being calculated in live queries, we can catch these discrepancies as they appear.
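To make this tangible, here is a minimal sketch using the open-source sqlglot parser: it pulls the aliased metric expressions out of two queries and flags any alias that is defined two different ways. The queries and metric names are hypothetical.

```python
import sqlglot
from sqlglot import exp

def metric_definitions(sql: str) -> dict[str, str]:
    """Map each aliased SELECT expression to its rendered SQL text."""
    defs = {}
    for select in sqlglot.parse_one(sql).find_all(exp.Select):
        for e in select.expressions:
            if isinstance(e, exp.Alias):
                defs[e.alias] = e.this.sql()
    return defs

# Two hypothetical queries that both claim to compute "revenue".
team_a = metric_definitions("SELECT SUM(amount) AS revenue FROM orders")
team_b = metric_definitions("SELECT SUM(amount - discount) AS revenue FROM orders")

for metric in team_a.keys() & team_b.keys():
    if team_a[metric] != team_b[metric]:
        print(f"'{metric}' diverges: {team_a[metric]} vs. {team_b[metric]}")
```

A production version would normalize identifiers and dialects before comparing, but the point stands: the queries themselves already contain the metric definitions.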
Similarly, modern data pipelines are complex, with data arriving from multiple applications, vendors, and technologies. Data teams need ways to orchestrate this work and define the relationships between sources in order to transform the data and make it useful. Countless tools aim to solve both of these problems, but the solution already exists: SQL. Engineers, analysts, and data scientists were running queries against databases long before any of these tools existed, and by understanding the logic behind those queries, we can automatically build up the definitions and relationships of the data and semantic layers.
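As a sketch of that idea, again using sqlglot and hypothetical table names, the function below derives lineage edges directly from a write statement:

```python
import sqlglot
from sqlglot import exp

def lineage_edges(sql: str) -> list[tuple[str, str]]:
    """Return (source_table, target_table) edges implied by a write statement."""
    stmt = sqlglot.parse_one(sql)
    if not isinstance(stmt, (exp.Create, exp.Insert)):
        return []  # plain reads don't add lineage edges
    target = stmt.this.find(exp.Table)  # CREATE TABLE ... AS or INSERT INTO ...
    return [(src.name, target.name) for src in stmt.expression.find_all(exp.Table)]

edges = lineage_edges(
    "CREATE TABLE daily_revenue AS "
    "SELECT o.order_date, SUM(p.amount) AS revenue "
    "FROM orders AS o JOIN payments AS p ON o.id = p.order_id "
    "GROUP BY o.order_date"
)
print(edges)  # e.g. [('orders', 'daily_revenue'), ('payments', 'daily_revenue')]
```

Fold every statement in the query log through a function like this and a lineage graph falls out, built from what actually runs rather than from hand-maintained configuration.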
Now think of cost and performance optimization. We all want to reduce cost while improving performance, and we can do so by examining the queries that are actually running. It's very likely that a database has a handful of query patterns that account for the bulk of the usage and cost. The real value comes from understanding what these workloads are and what data they need. Some fields may be entirely unused, or there may be high-cardinality fields that are barely touched. If so, it might make sense to create multiple fact tables designed for specific use cases. But we can only weigh these tradeoffs with a deeper understanding of the underlying queries.
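As an illustration, the sketch below tallies column usage across a query log, weighting each column by how often its query runs. The log structure and counts are made up.

```python
from collections import Counter

import sqlglot
from sqlglot import exp

# Hypothetical query log: (sql_text, execution_count) pairs.
query_log = [
    ("SELECT user_id, SUM(amount) AS total FROM orders GROUP BY user_id", 5000),
    ("SELECT user_id, status FROM orders WHERE status = 'open'", 12),
]

usage = Counter()
for sql, runs in query_log:
    for col in sqlglot.parse_one(sql).find_all(exp.Column):
        usage[col.name] += runs  # each reference counts once per execution

print(usage.most_common())  # [('user_id', 10012), ('amount', 5000), ('status', 24)]
```

Schema columns that never appear in the tally are pruning candidates; heavily used subsets suggest where a purpose-built fact table would pay off.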
Lastly, we need to understand workflows. By examining the full set of queries, we can understand the relationships among them. We may discover that some tables are updated hourly but only used weekly to generate a report. Additionally, we can start to understand how customers use the data. If a sales rep looks at a particular dashboard, what do they do with those insights? Do they do any additional slicing and dicing? Who are the internal stakeholders, and how do they access the data? These insights let data and analytics teams think more like product managers and solve more meaningful problems.
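Here is a sketch of the first observation, write cadence versus read cadence, assuming a hypothetical log of (timestamp, sql_text) rows:

```python
from collections import defaultdict

import sqlglot
from sqlglot import exp

def table_activity(log):
    """Split a query log into per-table write and read timestamps."""
    writes, reads = defaultdict(list), defaultdict(list)
    for ts, sql in log:
        stmt = sqlglot.parse_one(sql)
        if isinstance(stmt, (exp.Insert, exp.Update, exp.Delete, exp.Merge)):
            # Count only the write target; tables read inside a write
            # statement feed the refresh rather than an end user.
            writes[stmt.this.find(exp.Table).name].append(ts)
        else:
            for t in stmt.find_all(exp.Table):
                reads[t.name].append(ts)
    return writes, reads

# Hypothetical log: daily_stats is rebuilt every hour but read once a week.
log = [(f"2024-01-01T{h:02d}:00", "INSERT INTO daily_stats SELECT * FROM events")
       for h in range(24)]
log.append(("2024-01-07T09:00", "SELECT * FROM daily_stats"))

writes, reads = table_activity(log)
for table, w in writes.items():
    r = reads.get(table, [])
    if len(w) > 10 * max(len(r), 1):
        print(f"{table}: {len(w)} writes vs. {len(r)} reads -- over-refreshed?")
```

The same per-table read log, joined with user metadata, answers the stakeholder questions: who touches which tables, from which dashboards, and how often.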
We believe that over time, the data warehouse and BI tools will converge. While the glue connecting them will still need to exist, it will become increasingly automated. And it will be a welcome change: analytics and data teams will spend less time stringing systems and vendors together, and more time building valuable products and insights for the business. Our goal is to accelerate this trend by creating solutions that integrate the data warehouse with BI tools.
Thanks for keeping up with Twing Data. We're excited to build better tools for data teams, and we hope you'll stay tuned.