How Apache Iceberg transforms data integration and analysis


Are you able, today, to make decisions based on facts, based on data? Do you have an IT infrastructure that, as soon as data is produced in your operational systems (CRM, ERP, the management solution that supports your production, and so on), transports and consolidates it in near real time? Some would even speak of Industry 4.0. In other words, are you equipped with a data platform capable of monitoring your activity over time?

Apache Iceberg

Over time, Apache Iceberg has become the de facto standard for storing analytical data.

Born at Netflix, with early contributions from Apple, Apache Iceberg has won over analytics vendors Snowflake, Google Cloud, AWS and Confluent step by step. Even Databricks, which promotes a competing format, has indicated that it is adopting Iceberg through an abstraction layer intended to be universal.

What value does Apache Iceberg provide?

When it comes to data management, being able to modify a table schema seems like it should be a given. Yet while database management systems are designed to let data model structures evolve, few organizations actually put this kind of process in place, even though authors like Scott Ambler, in Refactoring Databases, describe in great detail the techniques that make a database "agile".

This ability to evolve, and to scale along with your data, is what Apache Iceberg brings you.
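As a sketch of what that evolution looks like in practice, here is Iceberg schema evolution expressed through Spark SQL; the catalog, table and column names are hypothetical:

```sql
-- Hypothetical table; assumes a Spark session with the Iceberg extensions enabled.
-- Each statement is a metadata-only change: no existing data files are rewritten.
ALTER TABLE catalog.sales.orders ADD COLUMN loyalty_tier string;
ALTER TABLE catalog.sales.orders RENAME COLUMN customer TO customer_name;
ALTER TABLE catalog.sales.orders ALTER COLUMN order_id TYPE bigint;  -- safe widening from int
```

Because Iceberg tracks columns by ID rather than by name or position, such changes are safe and do not require rewriting the data already stored.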

Here are some of Apache Iceberg's main features:

Reliability: Apache Iceberg manages write operations with the necessary isolation and provides the mechanisms that guarantee ACID transactions, as in a traditional database. Even in the event of a write failure, your data remains intact.
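To make that guarantee concrete, here is a minimal Python sketch, not the real Iceberg client, of the idea behind its commits: a writer prepares new metadata, then atomically swaps a single pointer to it, retrying from fresh state if another writer got there first. All names are hypothetical:

```python
# Illustrative sketch of Iceberg-style optimistic commits (hypothetical names).
# A table is "committed" by swapping one pointer; readers never see partial writes.

class CommitConflict(Exception):
    pass

class TablePointer:
    """Stands in for the catalog entry pointing at the current metadata file."""
    def __init__(self):
        self.version = 0
        self.metadata = {"files": []}

    def commit(self, expected_version, new_metadata):
        # Compare-and-swap: succeeds only if nobody committed since the
        # writer read the table; otherwise the writer must retry.
        if self.version != expected_version:
            raise CommitConflict("table changed underneath the writer")
        self.version += 1
        self.metadata = new_metadata

def append_files(table, new_files):
    # Optimistic write: read current state, build candidate metadata,
    # try to swap, and start over on conflict.
    while True:
        seen = table.version
        candidate = {"files": table.metadata["files"] + new_files}
        try:
            table.commit(seen, candidate)
            return
        except CommitConflict:
            continue

table = TablePointer()
append_files(table, ["data-0001.parquet"])
append_files(table, ["data-0002.parquet"])
print(table.metadata["files"])  # both files visible together; no partial state
```

A failed writer leaves the pointer untouched, which is why, as noted above, a write failure never corrupts the table.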

Scalability: Apache Iceberg offers flexibility in defining data structures, much as MongoDB or Elasticsearch do, while relying on the modeling rigor of the largest data warehouses. Iceberg writes this data to distributed storage, so it benefits from all the advances made in Big Data and data lake architectures.
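For example, creating such a table on distributed storage takes a single statement: the definition is as rigorous as a warehouse table, while the files land in a data lake (catalog and table names here are hypothetical):

```sql
-- Hypothetical names; Spark SQL with Iceberg configured.
CREATE TABLE catalog.sales.orders (
    order_id    bigint,
    customer    string,
    amount      decimal(10, 2),
    ordered_at  timestamp
)
USING iceberg
PARTITIONED BY (days(ordered_at));
```

The `days(ordered_at)` transform is Iceberg's hidden partitioning: queries simply filter on `ordered_at`, and the engine maps that filter onto the partition layout for you.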

Performance: In computing, opening a file just to read it is expensive. Opening one file is no problem; but at scale, when the number of files explodes, that number has to be kept under control. Iceberg addresses this with a metadata layer (manifests and file-level statistics) that lets query engines prune irrelevant files before reading them.
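Another concrete answer is table maintenance: Iceberg ships Spark procedures that compact many small files into fewer large ones and clean up obsolete snapshots (catalog and table names below are hypothetical):

```sql
-- Rewrites small data files into larger ones, so queries open fewer files.
CALL catalog.system.rewrite_data_files(table => 'sales.orders');
-- Removes old snapshots and the data files only they reference.
CALL catalog.system.expire_snapshots(table => 'sales.orders');
```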

Openness: Of course, Apache Iceberg is open source! What matters above all is that this openness eases adoption: Apache Iceberg interfaces with virtually every development framework and analysis solution. In other words: no vendor lock-in.

Real-time: thanks to its performance on the one hand and its scalability on the other, Apache Iceberg fits perfectly into near-real-time architectures. And so here we are: Iceberg has established itself and become a standard.

In summary

The new tooling for building real-time processing is reviving this architecture and making it accessible to as many people as possible. Indeed, today we know how to feed it directly from databases.

If at Synaltic we have often referred to the new Iceberg format in our communications, it is because the Dremio lakehouse is natively designed for Apache Iceberg! So we have the technology and an open-source tool that is ready to use. Come and talk about it with us at our stand during these two days.