The disruptive potential of open data lakehouse architectures and IBM watsonx.data

There’s no debate that the volume and variety of data are exploding and that the associated costs are rising rapidly. The proliferation of data silos also inhibits the unification and enrichment of data, which is essential to unlocking new insights. Moreover, increased regulatory requirements make it harder for enterprises to democratize data access and scale the adoption of analytics and artificial intelligence (AI). Against this challenging backdrop, the sense of urgency has never been higher for businesses to leverage AI for competitive advantage.

The open data lakehouse solution

Previous attempts at addressing some of these challenges have failed to meet their promise. Enter the open data lakehouse. It is composed of commodity cloud object storage, open data and open table formats, and high-performance open-source query engines. The data lakehouse architecture combines the flexibility, scalability and cost advantages of data lakes with the performance, functionality and usability of data warehouses to deliver optimal price-performance for a variety of data, analytics and AI workloads.

To help organizations scale AI workloads, we recently announced IBM watsonx.data, a data store built on an open data lakehouse architecture and part of the watsonx AI and data platform.

Let’s dive into the analytics landscape and what makes watsonx.data unique.


The analytics repositories market landscape

Currently, we see the lakehouse as an augmentation, not a replacement, of existing data stores, whether on-premises or in the cloud. A lakehouse should make it easy to combine new data from a variety of different sources with the mission-critical customer and transaction data that resides in existing repositories. New insights are found in the combination of new data with existing data, and in the identification of new relationships. And AI, both supervised and unsupervised machine learning, is the best and sometimes only way to unlock these new insights at scale.


Many of our customers have analytics repositories such as on-premises analytics appliances, cloud data warehouses and data lakes. Two major technology trends have driven investments in analytics repositories recently: one, a move from on-premises to SaaS, and two, a growing preference for open-source technologies over proprietary ones. As the performance and functionality gap between open data lakehouses and proprietary data warehouses continues to close, the lakehouse starts to compete with the warehouse for more workloads, while providing choice of tooling and optimal price-performance.

How does watsonx.data bring disruptive innovation to data management?

watsonx.data is truly open and interoperable

The solution leverages not just open-source technologies, but those with open-source project governance and diverse communities of users and contributors, such as Apache Iceberg and Presto, hosted by the Linux Foundation.

watsonx.data supports a variety of query engines

Starting with Presto and Spark, watsonx.data provides broad workload coverage, ranging from big-data exploration and data transformation to AI model training and tuning and interactive querying. IBM Db2 Warehouse and Netezza have also been enhanced to support the Iceberg open table format, so they can coexist seamlessly as part of the lakehouse.

watsonx.data is truly hybrid

It supports both SaaS and self-managed software deployment models, or a combination of the two, which provides further opportunities for cost optimization.

watsonx.data has built-in governance and automation

It facilitates self-service accessibility while ensuring security and regulatory compliance. Through its integration with IBM Cloud Pak for Data and IBM Knowledge Catalog, it fits seamlessly into a data fabric architecture, enabling centralized data governance with automated local execution.

watsonx.data is easy to deploy and use

Last but certainly not least, watsonx.data easily connects to existing data repositories, wherever they reside. It will leverage foundation models to power data exploration and enrichment from a conversational user interface, so any user can become more data-driven in their work.

watsonx.data put to work

Many of our customers have on-premises analytics appliances, and they’re interested in migrating some or all of those workloads to SaaS. The easiest and most cost-effective way to do that is to leverage the compatibility of our cloud data warehouses. Scalable, elastic, on-demand infrastructure and fully managed services deliver more value, so the run rate of a SaaS solution can be higher than that of an on-premises appliance, and customers are looking for ways to reduce costs. By augmenting a cloud data warehouse with watsonx.data, customers can convert or tier down some of the historical data in the warehouse to the Iceberg open table format while preserving all the existing queries and workloads. This simultaneously reduces the cost of storage and makes that data accessible to new AI workloads in the lakehouse.
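The tier-down pattern described above can be sketched in plain Python. This is a minimal illustration, not watsonx.data code: the `warehouse` and `lake` lists, the record layout and the cutoff policy are hypothetical stand-ins for a warehouse table and Iceberg tables on object storage.

```python
from datetime import date

# Hypothetical stand-ins: a "warehouse" table (higher-cost storage) and a
# "lake" (low-cost object storage holding Iceberg-format tables).
warehouse = [
    {"order_id": 1, "order_date": date(2020, 3, 1), "amount": 120.0},
    {"order_id": 2, "order_date": date(2023, 6, 15), "amount": 80.0},
    {"order_id": 3, "order_date": date(2021, 11, 9), "amount": 42.5},
]
lake = []

def tier_down(cutoff: date) -> None:
    """Move records older than `cutoff` from the warehouse to the lake.

    In a real lakehouse this would rewrite the rows as Iceberg data files
    on object storage; existing queries keep working because both stores
    remain visible to the query engines.
    """
    global warehouse
    cold = [r for r in warehouse if r["order_date"] < cutoff]
    lake.extend(cold)
    warehouse = [r for r in warehouse if r["order_date"] >= cutoff]

tier_down(date(2022, 1, 1))
print(len(warehouse), len(lake))  # → 1 2 (hot rows stay, cold rows tiered down)
```

The point of the sketch is the split itself: recent, SLA-bound data stays where queries are fastest, while historical data moves to cheaper storage without disappearing from view.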

Going in the opposite direction, raw data can be landed in the lakehouse, cleansed and enriched cost-effectively, and then promoted to the warehouse for high-performance queries that exceed the SLAs of today’s lakehouse engines.
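The land, cleanse and enrich, then promote flow can likewise be sketched in a few lines. The stores, field names and segmentation rule here are hypothetical; in practice the landing zone would be Iceberg tables in the lakehouse and the target a data warehouse.

```python
# Raw records landed in the lakehouse (hypothetical layout).
raw_lake = [
    {"customer": " Alice ", "spend": "120.50"},
    {"customer": "", "spend": "80"},  # bad record: no customer name
    {"customer": "Bob", "spend": "42.5"},
]

def cleanse(records):
    """Drop invalid rows and normalize fields."""
    out = []
    for r in records:
        name = r["customer"].strip()
        if not name:
            continue  # discard rows that fail basic quality checks
        out.append({"customer": name, "spend": float(r["spend"])})
    return out

def enrich(records):
    """Add a derived attribute (illustrative segmentation rule)."""
    for r in records:
        r["segment"] = "high" if r["spend"] >= 100 else "standard"
    return records

# Promote the curated result to the warehouse for SLA-bound queries.
warehouse = enrich(cleanse(raw_lake))
```

The cheap, flexible lakehouse engines do the bulk transformation work; only the curated output lands in the warehouse, where query performance matters most.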

The decision is not whether to use a warehouse or a lakehouse. The best approach is to use both a warehouse and a lakehouse, ideally a multi-engine lakehouse, to optimize the price-performance of all your workloads in a single, integrated solution. Add the ability to optimize deployment models across hybrid cloud environments, and you have a foundational data management architecture for years to come.

In closing, I want to use an analogy to illustrate some of these key concepts. Imagine that a lakehouse architecture is like a network of highways: some have tolls, and others are free. If there is traffic and you’re in a hurry, you’re happy to pay the toll to shorten your drive time. Think of this as workloads with strict SLAs, like customer-facing applications or executive dashboards. But if you’re not in a hurry, you can take the freeway and save money. Think of this as all your other workloads, where performance is not necessarily the deciding factor and you can reduce your costs by up to 50% by using a lakehouse engine instead of defaulting to a data warehouse.

I hope you are now as convinced as I am that the future of data management is the lakehouse architecture. We hope you will join us at watsonx Day to explore the new watsonx.data solution and how it can optimize your AI efforts.

Learn more about our active beta program

