Integrating Apache Flink with Kafka and PostgreSQL Using Docker

05 Jul 2024

Integrating PyFlink, Kafka, and PostgreSQL with Docker is a practical way into real-time data processing. The hard part is less about connecting the technologies than about getting them to work together reliably. Here’s a detailed look at how this integration can be achieved, along with some practical insights and solutions to common issues.

Setting Up the Scene

The goal is to integrate Apache Flink with Kafka and PostgreSQL using Docker, with the job written in PyFlink, the Python API for Flink. The setup handles real-time data processing and storage. The infrastructure includes a publisher module that simulates IoT sensor messages. Inside the Docker container, two Kafka topics are created:

  • sensors: Stores incoming messages from IoT devices in real time.
  • alerts: Receives filtered messages with temperatures above 30°C.

A Flink application consumes messages from the sensors topic, filters those with temperatures above 30°C, and publishes them to the alerts topic. Additionally, the Flink application inserts the consumed messages into a PostgreSQL table, allowing for structured data storage and further analysis. Visualization tools like Tableau or Power BI can connect to this data for real-time plotting and dashboards. The alerts topic can also be consumed by other clients to initiate actions based on the messages it holds, such as activating air conditioning systems or triggering fire safety protocols.
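The routing rule described above can be sketched in plain Python, independent of Flink. This is a minimal illustration, not the application's actual code: the JSON message layout and the field names `device_id` and `temperature` are assumptions, since the article does not show the message schema. In a PyFlink job, a predicate like `is_alert` could be passed to a DataStream `filter` call.

```python
import json

ALERT_THRESHOLD_C = 30.0  # threshold stated in the article


def parse_reading(raw: str) -> dict:
    """Decode one JSON message from the sensors topic (assumed format)."""
    reading = json.loads(raw)
    reading["temperature"] = float(reading["temperature"])
    return reading


def is_alert(reading: dict) -> bool:
    """True when the reading should be published to the alerts topic."""
    return reading["temperature"] > ALERT_THRESHOLD_C


def route(raw_messages):
    """Split raw messages into (rows for PostgreSQL, alerts for Kafka)."""
    rows, alerts = [], []
    for raw in raw_messages:
        reading = parse_reading(raw)
        rows.append(reading)        # every consumed message is stored
        if is_alert(reading):
            alerts.append(reading)  # only readings above the threshold
    return rows, alerts
```

Note that the comparison is strict, so a reading of exactly 30°C is stored but does not raise an alert.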

Issues With Kafka Ports in docker-compose.yml

Initially, I ran into problems with Kafka’s port configuration when using the confluentinc Kafka Docker image. The issue only became apparent in the logs, which is a good reason not to run docker-compose up in detached mode (-d) during initial setup and troubleshooting. The failure occurred because the internal and external listeners were advertised on the same port, which led to connectivity problems. I resolved it by moving the internal listener to port 19092:

KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:19092,PLAINTEXT_HOST://localhost:9092

With this change, containers on the Docker network reach the broker at kafka:19092, while clients on the host machine connect through localhost:9092.
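To make the split between the two listeners concrete, here is a small sketch that parses the advertised-listeners value shown above and picks the address a client should use. The helper names `parse_advertised_listeners` and `bootstrap_server` are hypothetical and exist only for illustration; they are not part of Kafka or any client library.

```python
# Value taken from the docker-compose.yml setting shown in the article.
ADVERTISED = "PLAINTEXT://kafka:19092,PLAINTEXT_HOST://localhost:9092"


def parse_advertised_listeners(value: str) -> dict:
    """Map each listener name to its advertised host:port."""
    listeners = {}
    for entry in value.split(","):
        name, _, address = entry.strip().partition("://")
        listeners[name] = address
    return listeners


def bootstrap_server(inside_docker: bool) -> str:
    """Pick the address a Kafka client should use as its bootstrap server."""
    listeners = parse_advertised_listeners(ADVERTISED)
    return listeners["PLAINTEXT" if inside_docker else "PLAINTEXT_HOST"]
```

Under these assumptions, the Flink job running in a container would use `bootstrap_server(True)`, and a publisher running on the host would use `bootstrap_server(False)`.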

Configuring Flink in Session Mode

Running Flink in session mode allows multiple jobs to run in a single cluster. The following directives in the docker-compose.yml file were used to achieve this:


flink:
  image: custom-flink-image
  ports:
    - "8081:8081"
  environment:
    - JOB_MANAGER_RPC_ADDRESS=jobmanager
    - TASK_MANAGER_NUMBER_OF_TASK_SLOTS=2
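Because a session cluster shares its task slots among all submitted jobs, it can help to check how many slots are free before submitting another one. The sketch below assumes the JobManager REST API is reachable on the published port 8081 and that its `/overview` response includes a `slots-available` field; the helper names are hypothetical, and the response shape should be checked against the Flink version in use.

```python
import json
from urllib.request import urlopen

REST_URL = "http://localhost:8081"  # port published in the compose snippet


def fetch_overview(base_url: str = REST_URL) -> dict:
    """Fetch the cluster overview from the JobManager REST API."""
    with urlopen(f"{base_url}/overview", timeout=5) as resp:
        return json.load(resp)


def can_accept_job(overview: dict, parallelism: int = 1) -> bool:
    """True when enough free task slots exist for a job of this parallelism."""
    return overview.get("slots-available", 0) >= parallelism
```

With the cluster running, `can_accept_job(fetch_overview(), parallelism=2)` would indicate whether a second job with parallelism 2 fits alongside whatever is already running.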

Custom Docker Image for PyFlink

The default Flink Docker image does not ship with Python or PyFlink, so I created a custom Docker image. This image bundles the dependencies and configuration that PyFlink applications need, such as a Python interpreter, the PyFlink package, and the connector JARs the job relies on. Building everything into one image keeps the components of the streaming stack on compatible versions.

By following these steps, you can build and experiment with this streaming pipeline yourself. For a complete setup, clone the provided repository and refer to the detailed instructions in the README file. The guide should be useful both to beginners and to experienced developers who want a working reference for their own streaming stack.
