Confluent Goes On Prem with Apache Flink Stream Processing

Organizations that want to take advantage of Apache Flink’s class-leading stream processing capabilities but don’t want to run the setup in the cloud were given another option today when Confluent announced the general availability of its on-prem version of Flink, called Confluent Platform for Apache Flink. The company also announced an addition to WarpStream, the Kafka protocol-compatible streaming data system it bought three months ago.

Apache Flink has emerged as one of the leading frameworks for building distributed stream processing applications. The open source project, which just won a 2024 BigDATAwire Readers’ Choice Award, builds upon data streaming platforms, such as Apache Kafka, by letting users apply operators to the streaming data, thereby creating dataflows that are executed as a directed acyclic graph (DAG).
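
As a rough illustration of what chaining operators into a dataflow means, here is a minimal sketch in plain Python. It does not use Flink's API. The function names (`filter_op`, `map_op`, `keyed_sum`, `run_pipeline`) and the sample events are hypothetical, and the linear chain shown is only the simplest possible DAG, with an in-memory list standing in for a Kafka topic.

```python
from collections import defaultdict

# Hypothetical payment events standing in for records read from a Kafka topic.
EVENTS = [
    {"user": "a", "amount": 20.0},
    {"user": "b", "amount": 950.0},
    {"user": "a", "amount": 700.0},
    {"user": "b", "amount": 5.0},
    {"user": "b", "amount": 150.0},
]

def filter_op(predicate, stream):
    """Stateless operator: pass through only the records that match."""
    return (record for record in stream if predicate(record))

def map_op(fn, stream):
    """Stateless operator: transform each record independently."""
    return (fn(record) for record in stream)

def keyed_sum(stream):
    """Stateful operator: emit the running total per key after each record."""
    totals = defaultdict(float)
    for key, value in stream:
        totals[key] += value
        yield key, totals[key]

def run_pipeline(events):
    """Wire the operators into a chain: source -> filter -> map -> keyed sum."""
    large = filter_op(lambda e: e["amount"] >= 100.0, events)
    pairs = map_op(lambda e: (e["user"], e["amount"]), large)
    return list(keyed_sum(pairs))

if __name__ == "__main__":
    for key, total in run_pipeline(EVENTS):
        print(key, total)
```

Each record flows through the operators one at a time, and only `keyed_sum` holds state between records. A real stream processor additionally distributes such operators across machines and checkpoints their state, which this sketch does not attempt.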

Thanks to its high performance, scalability, robust state management, and accessible APIs, companies like Uber, eBay, and Netflix rely on Flink to develop powerful real-time applications, such as fraud detection, personalized customer experience, anomaly detection, and live analytics. It’s not the only stream processing system, as Apache Spark has a larger share of the market, according to 6sense. But Flink’s backers, including project co-creator Robert Metzger, say Flink retains certain technical advantages over its competitors.

Those technical advantages certainly played a role in Confluent’s January 2023 decision to acquire Immerok, one of the leading developers of Flink. In March 2024, Confluent officially announced the addition of Flink support to Confluent Cloud, its fully managed cloud service for running Apache Kafka. This September, it added support for the Flink Table API, giving Java and Python developers more ways to tap into Flink’s capabilities. Today’s announcement of the general availability of Confluent Platform for Apache Flink completes Confluent’s Flink lineup.

Confluent Platform for Apache Flink brings Flink support to Confluent Platform, Confluent’s on-prem offering for enterprise Apache Kafka. The main advantage of Confluent Platform over Confluent Cloud, the company’s fully managed offering for Kafka and Flink, is that customers can run it on their own hardware or in a virtual private cloud (VPC).

Confluent also announced that it has added Confluent Manager for Apache Flink (CMF) to the offering. CMF is designed to make deploying, updating, and scaling Flink as easy on-premises as it is in the cloud, Confluent says. It builds upon Kubernetes for scaling, provides role-based access control (RBAC) and encryption for tighter security, and ensures consistent configurations across the Confluent ecosystem, the company says in its release notes for Confluent Platform version 7.8.

The new Flink software will help to lower the bar for building real-time experiences, says Confluent Chief Product Officer Shaun Clowes.

“Stream processing is where the magic happens. It transforms real-time data into experiences and operations that drive modern businesses forward,” Clowes says in a press release. “With our latest announcement, any organization can take advantage of Apache Flink–scaling, securing, and managing it with ease–unlocking innovation without limits.”

WarpStream Orbit

Confluent also announced an addition to its WarpStream line. WarpStream was a competing, Kafka protocol-compatible streaming data platform until Confluent acquired it in September. The new product, WarpStream Orbit, will simplify migrations, the company says.

“Traditionally, it can be a challenging and manual process to migrate from open source Kafka to a BYOC [bring your own cloud] model because it involves navigating different Kafka environments and building custom solutions that increase time, costs, and data quality issues,” Confluent says in its press release. “WarpStream Orbit makes it easier than ever to move existing workloads from open source Kafka, or any Kafka-compatible service, to WarpStream clusters.”

While WarpStream is Kafka protocol compatible, its innards differ dramatically from those of the popular pub-sub system that Jay Kreps and company built for LinkedIn. The software is delivered as a single stateless Go library and sits atop S3 storage, eliminating the need to manage disks as well as brokers (no ZooKeeper, either). The creators of WarpStream claim that by streaming data directly into S3, it’s 5-10x cheaper than Kafka to operate in the cloud, as it avoids the east-west movement of data, or inter-zone networking, in AWS data centers. The downside to WarpStream is that it adds latency to streaming data systems compared to Kafka.

“Kafka was designed to run in LinkedIn’s data centers, where the network engineers didn’t charge their application developers for moving data around,” WarpStream CEO and co-founder Richard Artoul wrote in his introductory blog. “But today, most Kafka users are running it on a public cloud, an environment with completely different constraints and cost models. Unfortunately, unless your organization can commit to 10s or 100s of millions of dollars per year in cloud spend, there is no escaping the physics of this problem.”

Confluent says WarpStream Orbit can be used either to simplify Kafka migrations or to create optimized Kafka clusters with tiered storage, which will reduce cost. It can also be used to bolster disaster recovery for existing Kafka clusters, the company says.

The post Confluent Goes On Prem with Apache Flink Stream Processing appeared first on BigDATAwire.