In our first instalment, we journeyed through the evolution of data storage from traditional data warehouses to data lakes, leading to the emergence of the Lakehouse architecture. In the second instalment, we unpacked the inner workings of the Data Lakehouse, exploring the architecture principles, data ingestion and data storage components of a modern Lakehouse. In this third instalment, we will learn about generating insights from a Lakehouse.

Data Insights from a Lakehouse

Data analytics is like a treasure hunt: we traverse a forest of data to uncover valuable insights, with domain knowledge as the river that helps us make sense of them. This journey is especially interesting in the context of a Lakehouse solution, which caters to two main groups of people:

    • Technical Folks: These are the tech wizards in your team who love diving deep with on-demand queries, exploring data, and even writing complex machine learning algorithms.
    • Functional Team Members: Think of them as the guides of the treasure hunt, the ones who understand what the forest is whispering; they rely on standard reports and self-service business intelligence tools to make sense of things.

    Now, let’s learn about the two flavors of analytics you’ll typically find in a Lakehouse setup:

    1. Descriptive Analytics

     This type of analytics examines historical data by providing different points of view on it. These views are created by aggregating and filtering quantitative data (measures), and slicing it across attributes of functional dimensions, such as profit centers, legal entities, customers, vendors and products. The results are delivered through Excel files or visualization tools. Some examples of descriptive analytics are:

      • Imagine it’s the end-of-quarter chaos in the finance department. The team is scrambling to reconcile accounts and generate accurate financial statements. They’re staring at transactions in their ERP system, trying to figure out what’s still open. This is where descriptive analytics comes to their rescue! A technical analyst runs an ad-hoc SQL query and voilà – they have a clear picture of the status, helping the finance team close their books in record time.
      • Then there’s the routine stuff – generating reports for various teams and regulatory authorities. These reports are like clockwork, often with fixed formats that rarely change.

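A minimal sketch of the kind of ad-hoc query the analyst above might run, using Python's standard library; the open_items table, its columns and the figures are hypothetical stand-ins for ERP data:

```python
import sqlite3

# In-memory stand-in for the ERP's open-items table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE open_items (document_id TEXT, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO open_items VALUES (?, ?, ?)",
    [("INV-001", 1200.0, "open"),
     ("INV-002", 450.0, "cleared"),
     ("INV-003", 980.0, "open")],
)

# Ad-hoc descriptive query: how many items are still open, and for how much?
row = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM open_items WHERE status = 'open'"
).fetchone()
print(row)  # (2, 2180.0)
```

In a real Lakehouse the same SQL would run against the Publish Layer tables rather than SQLite, but the shape of the question is identical.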
      2. Advanced Analytics

       This is where things get a bit futuristic. Think of advanced analytics as a crystal ball: it deploys machine learning methods to predict what may happen based on historical data, or to extract complex mathematical relationships from data to generate insights. The process is quite a journey:

       a. Exploratory Data Analysis: It’s like mapping the terrain before the treasure hunt begins.

      b. Data Preparation: Here, we’re crafting the tools (features) for the journey.

      c. Model Development: This is the experimental phase, trying different paths (algorithms) to see which one leads to treasure (best-fit model).

      d. Model Evaluation: We’re checking our treasure map (model) against an unexplored area (unseen data).

      e. Model Deployment: Finally, we put our map (model) in the hands of the explorers (users).
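The five steps above can be sketched end to end with a toy example; the synthetic data and the one-feature least-squares model are purely illustrative, not a prescribed stack:

```python
import random
import statistics

random.seed(42)

# a. Exploratory data analysis: inspect a synthetic historical dataset.
data = [(x, 3.0 * x + random.uniform(-1, 1)) for x in range(50)]
xs, ys = zip(*data)
print(statistics.mean(xs), statistics.mean(ys))

# b. Data preparation: split into training data and unseen evaluation data.
train, test = data[:40], data[40:]

# c. Model development: fit a one-feature linear model by least squares.
mx = statistics.mean(x for x, _ in train)
my = statistics.mean(y for _, y in train)
slope = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
intercept = my - slope * mx

# d. Model evaluation: mean absolute error on the unseen data.
mae = statistics.mean(abs((slope * x + intercept) - y) for x, y in test)

# e. Model deployment: expose the fitted model as a callable for users.
def predict(x):
    return slope * x + intercept

print(round(slope, 2), round(mae, 2))
```

Real pipelines would swap in richer features, an ML library and a model registry, but the a-to-e flow stays the same.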

       Some examples of predictive analytics are:

        • Predicting cash flow trends or identifying unusual expense patterns by analyzing troves of data.
        • Machine Learning models also help in identifying the next best action for routine tasks such as cash collection and vendor payments.

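As a hedged sketch of the “unusual expense patterns” idea, a simple z-score check flags values far from the historical mean; the monthly figures are invented, and real deployments would use richer models:

```python
import statistics

# Hypothetical monthly expense figures; the last one is deliberately anomalous.
expenses = [10_200, 9_800, 10_500, 10_100, 9_900, 10_300, 25_000]

mean = statistics.mean(expenses)
stdev = statistics.stdev(expenses)

# Flag any expense more than two standard deviations from the mean.
anomalies = [e for e in expenses if abs(e - mean) / stdev > 2]
print(anomalies)  # [25000]
```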
        Enabling Analytics Capabilities in a Data Lakehouse

        Setting up analytics capabilities within a data lakehouse can be quite the journey. Let me walk you through how to support the analytics use-cases within the Publish Layer, which interacts with the batch processing and stream processing layers. The two components of the data analytics layer are:

            1. Analytical Sandbox: This layer is a virtual playground for data wizards, the data scientists and analysts who love to explore data to uncover insights and run ad-hoc queries. It should be built upon on-demand compute, such as Apache Spark or SQL clusters that can be spun up as needed. Once the data is available in the Analytical Sandbox, we need to give users the power to query billions of data points in seconds, all from their laptops, using Jupyter notebooks or similar tools. It’s like giving a detective a magnifying glass, but for data. For advanced AI/ML use-cases such as data exploration and ML model development, a GPU-enabled cluster can be instantiated.

          Now, let’s talk about a real-life business scenario. Picture a Finance & Accounting team working on paying their suppliers on time. Using this Analytical Sandbox, they can pull in vast amounts of supplier, supplies and contract data, and even email feeds, to build predictive models. This is how teams start to anticipate the supplies delivered, ongoing disputes and exceptions, clearing the path to on-time supplier payments and higher supplier satisfaction. It’s data science magic at work, and it’s all happening in the sandbox.

              2. Business Intelligence Layer: This is the layer where data becomes digestible for everyone else in the company, not just the tech gurus. In this layer we turn complex datasets into visually appealing reports and dashboards that are easy to read, so that critical decisions can be made based on the data. Tools such as Power BI, Tableau, or Qlik shine here, allowing users to slice and dice the data however they want.

            Consider the finance department at a manufacturing company, for example. They use this layer to track the cost of raw materials over time, monitor inventory levels across global warehouses, and even predict future costs. By having this information at their fingertips, they can make informed decisions about where to invest or cut costs, ensuring the company remains competitive.
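Behind such a dashboard there is usually a plain aggregation; a stdlib-only sketch (the purchase records and column names are invented for illustration) groups raw-material spend by month, the kind of dataset a Power BI or Tableau visual would consume:

```python
from collections import defaultdict

# Hypothetical raw-material purchase records: (month, material, cost).
purchases = [
    ("2024-01", "steel", 12_000), ("2024-01", "copper", 4_500),
    ("2024-02", "steel", 13_500), ("2024-02", "copper", 4_200),
]

# Aggregate cost per month, the shape a BI dashboard would chart over time.
monthly_cost = defaultdict(float)
for month, _material, cost in purchases:
    monthly_cost[month] += cost

print(dict(monthly_cost))  # {'2024-01': 16500.0, '2024-02': 17700.0}
```

In practice the BI tool issues an equivalent GROUP BY against the Publish Layer, but the aggregation logic is the same.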

            But how does this all tie back to our lakehouse? Well, the beauty of a data lakehouse is that it’s designed to handle both the heavy lifting of data processing and the ease of performing data analysis and reporting.

            In the realm of Business Operations, this setup enables scenarios where, at the end of the quarter, financial analysts can quickly reconcile accounts, spot any discrepancies, and ensure that financial statements are accurate and timely. Or consider the procurement team, which can now predict supply chain disruptions before they cause issues, thanks to real-time data analysis.

            In essence, by setting up these various layers within a data lakehouse, we’re not just storing data; we’re unlocking its potential to transform how businesses operate, make decisions, and even predict the future.

             

            Conclusion: The Technical Symphony of Lakehouses

            The Lakehouse architecture is a symphony of various technical components, each playing a crucial role in managing and processing data effectively. For organizations looking to harness their data’s full potential, understanding these components is key to leveraging the power of a Lakehouse. Midoffice Data stands at the forefront of this technological revolution, simplifying the complexity of Lakehouse architecture for businesses seeking agility, efficiency, and data-driven decision-making. We provide an end-to-end suite, from data ingestion to storage and serving capabilities, to meet enterprises’ complex data needs.

            Stay tuned for the final part of this series to understand Data Security and Data Governance in a Lakehouse architecture. Meanwhile, if you are interested in learning more about how astRai can play a pivotal part in your data strategy, reach out to us.

