Over the years, data-driven companies have followed four primary principles to democratize data access: 1) migration to the cloud, 2) scalability, 3) flexibility and agility, and 4) availability.

These four principles have shaped the evolution of data analytics and data infrastructure. Before 2015, storage and compute were co-located in monolithic architectures. Between 2015 and 2020, cloud data warehouses separated compute from storage. Since 2020, data architectures have gone a step further and separated data from compute.

Data now exists as its own separate tier, built on open source standards and formats. The data tier brings data warehousing capabilities to the data lake and enables net-new capabilities that data warehouses cannot provide. It makes data highly accessible across multiple tools and engines. More broadly, it makes data analytics and data architectures much more effective at meeting business requirements and delivering business value, while remaining cost-effective and operationally efficient.


Read this eBook to learn:

  • How the modern data lake architecture consists of multiple layers of open source technologies that let the data lake provide data warehousing capabilities, as well as net-new capabilities that data warehouses can’t provide.
  • How these open source technologies let users pick the tools and engines best suited to each job, both today and in the future.
  • How industry-standard open table formats (such as Apache Iceberg) and open metastores (such as Project Nessie) provide a consistent view of data across different engines at all times.
  • How the new data tier uses open source technologies to keep different tools and engines synchronized and to make them operationally efficient.
  • How new open metastores enable functionality that builds on the capabilities of the data lake architecture.