Joining two streams of data with Flink SQL
There was a useful question on the Apache Flink Slack recently about joining data in Flink SQL:
How can I join two streams of data by id in Flink, to get a combined view of the latest data?
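One way to approach this, sketched below in Flink SQL, is to declare both streams as upsert tables keyed on the id and join them with a regular join; because each side is an upsert stream, the join continuously emits the latest combined row as either side changes. Note that the table names, schemas, and connector settings here are my own illustrative assumptions, not from the original question.

```sql
-- Two upsert streams, both keyed on id. Topic names, schemas, and the
-- broker address are assumptions for this sketch.
CREATE TABLE customers (
  id   STRING,
  name STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'customers',
  'properties.bootstrap.servers' = 'broker:9092',
  'key.format' = 'raw',
  'value.format' = 'json'
);

CREATE TABLE addresses (
  id   STRING,
  city STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'addresses',
  'properties.bootstrap.servers' = 'broker:9092',
  'key.format' = 'raw',
  'value.format' = 'json'
);

-- A regular join on the key. When a new version of a row arrives on
-- either side, Flink retracts the old combined row and emits the new
-- one, so downstream always sees the latest view per id.
SELECT c.id, c.name, a.city
FROM customers c
JOIN addresses a ON a.id = c.id;
```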
Let’s imagine we’ve got a source of data with a nested array of multiple values. The data is from IoT devices: each device has multiple sensors, and each sensor provides a reading.
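To make that concrete, here’s a minimal Flink SQL sketch (the schema and connector settings are illustrative assumptions): the nested array of readings can be flattened into one row per sensor reading using CROSS JOIN UNNEST.

```sql
-- One event per device, carrying a nested array of per-sensor readings.
-- Schema and connector settings are assumptions for this sketch.
CREATE TABLE device_readings (
  device_id STRING,
  readings  ARRAY<ROW<sensor_id STRING, reading DOUBLE>>
) WITH (
  'connector' = 'kafka',
  'topic' = 'device_readings',
  'properties.bootstrap.servers' = 'broker:9092',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);

-- Flatten the array: one output row per sensor reading.
SELECT d.device_id, r.sensor_id, r.reading
FROM device_readings AS d
CROSS JOIN UNNEST(d.readings) AS r (sensor_id, reading);
```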
The UK Environment Agency publishes a feed of data relating to rainfall and river levels. As a prelude to building a streaming pipeline with this data, I wanted to understand its data model first.
I was exploring some new data, joining across multiple tables, and doing a simple SELECT * as I hadn’t yet worked out which columns I actually wanted.
The issue was that the same field name existed in more than one table.
This meant that it wasn’t clear in the query results which field came from which table.
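The fix is to qualify the columns and alias the ambiguous ones. A minimal sketch, with made-up table and column names:

```sql
-- Both tables have an id and a status column; aliasing makes the source
-- of each column explicit in the results. All names are illustrative.
SELECT o.id     AS order_id,
       c.id     AS customer_id,
       o.status AS order_status,
       c.status AS customer_status
FROM orders o
JOIN customers c
  ON o.customer_id = c.id;
```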
Here’s a bunch of interesting links and articles about data that I’ve come across recently.
Note: This post originally appeared on the Decodable blog.
Welcome to the Checkpoint Chronicle, a monthly roundup of interesting stuff in the data and streaming space. Your hosts and esteemed curators of said content are Gunnar Morling, Robin Moffatt (your editor-in-chief for this edition), and Hans-Peter Grahsl. Feel free to send our way any choice nuggets that you think we should feature in future editions.
I’m a HUGE fan of Docs as Code in general, and specifically tools like Vale that lint your prose for adherence to style rules.
One thing that had been bugging me, though, was how to selectively disable Vale for particular sections of a document. Usually linting issues should be addressed at the root: either fix the prose, or update the style rule. Either it’s a rule, or it’s not, right?
Sometimes, though, I’ve found a need to make a particular exception to a rule, or simply needed to skip linting for a particular file. I was struggling with how to do this in Asciidoc. Despite the documentation showing how, I could never get it to work reliably. Now that I’ve taken some time to dig into it, I think I’ve finally understood :)
Note: This post originally appeared on the Decodable blog.
Flink CDC is an interesting part of Apache Flink that I’ve been meaning to take a proper look at for some time now. Originally created by Ververica in 2021 and called “CDC Connectors for Apache Flink”, it was donated to live under the Apache Flink project in April 2024.
Note: This post originally appeared on the Decodable blog.
In my last blog post I looked at why you might need CDC. In this post I’m going to put it into practice with probably the most common use case—extracting data from an operational transactional database to store somewhere else for analytics. I’m going to show Postgres to Snowflake, but the pattern is the same for pretty much any combination, such as MySQL to BigQuery, SQL Server to Redshift, and so on.
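As a flavour of the source side of that pattern, here’s a hedged sketch of a Flink CDC table over Postgres; the connection details and schema are illustrative assumptions, and the post itself walks through the full pipeline.

```sql
-- Illustrative postgres-cdc source table. Hostname, credentials, and
-- the table being captured are assumptions for this sketch.
CREATE TABLE orders_cdc (
  order_id INT,
  amount   DECIMAL(10, 2),
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector'            = 'postgres-cdc',
  'hostname'             = 'postgres',
  'port'                 = '5432',
  'username'             = 'flink',
  'password'             = 'secret',
  'database-name'        = 'shop',
  'schema-name'          = 'public',
  'table-name'           = 'orders',
  'slot.name'            = 'flink_orders',
  -- pgoutput is the logical decoding plugin built in to modern Postgres
  'decoding.plugin.name' = 'pgoutput'
);
```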
Note: This post originally appeared on the Decodable blog.
Welcome to the Checkpoint Chronicle, a monthly roundup of interesting stuff in the data and streaming space. Your hosts and esteemed curators of said content are Gunnar Morling, Robin Moffatt (your editor-in-chief for this edition), and Hans-Peter Grahsl. Feel free to send our way any choice nuggets that you think we should feature in future editions.
Note: This post originally appeared on the Decodable blog.
Whether it’s “Understanding CDC” or “CDC Explained” or even “Five things about CDC; number four will shock you!”, the internet is awash with articles about Change Data Capture (CDC). CDC is the process of incrementally extracting data change events as they occur within a database. In this post I’d like to take a step back from these and look at the reasons why you might even want to consider CDC in the first place. From there we’ll build on the requirements we identify to come up with a proposed solution.
Note: This post originally appeared on the Decodable blog.
Welcome to the Checkpoint Chronicle, a monthly roundup of interesting stuff in the data and streaming space. Your hosts and esteemed curators of said content are Gunnar Morling, Robin Moffatt (your editor-in-chief for this edition), and Hans-Peter Grahsl. Feel free to send our way any choice nuggets that you think we should feature in future editions.
Note: This post originally appeared on the Decodable blog.
Launched in 2022, the Current conference hosted by Confluent has established itself as one of the leading conferences in the data streaming space. Originally stemming from Kafka Summit, it has broadened to reflect Confluent’s product portfolio, most notably including Apache Flink.
At Current 24 a few of us will be going for an early run (or walk) on Tuesday morning. Everyone is very welcome!
Note: This post originally appeared on the Decodable blog.
Delta Lake (or Delta, as it’s often shortened to) is an open-source project from the Linux Foundation that’s primarily backed by Databricks. It’s an open table format (OTF) similar in concept to Apache Iceberg and Apache Hudi. Having previously dug into using Iceberg with both Apache Flink and Decodable, I wanted to see what it was like to use Delta with Flink—and specifically, Flink SQL.
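To give a flavour of what that looks like, here’s a hedged sketch using the Delta connector’s catalog support in Flink SQL; the catalog options and storage path are illustrative assumptions, and the post covers the details (and the gotchas).

```sql
-- The delta-flink connector provides a Delta catalog under which Flink
-- SQL tables are created. Options and paths are assumptions.
CREATE CATALOG delta_catalog WITH (
  'type'         = 'delta-catalog',
  'catalog-type' = 'in-memory'
);
USE CATALOG delta_catalog;

CREATE TABLE readings (
  sensor_id STRING,
  reading   DOUBLE
) WITH (
  'connector'  = 'delta',
  'table-path' = 's3a://my-bucket/delta/readings'
);

INSERT INTO readings VALUES ('s1', 42.0);
```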
Note: This post originally appeared on the Decodable blog.
So you’ve built your first real-time ETL pipeline with Decodable: congratulations! Now what?
Note: This post originally appeared on the Decodable blog.
You’d think once was enough. Having already written about the trouble that I had getting Flink SQL to write to S3 (including on MinIO), this should now be a moot issue for me. Right? RIGHT?!
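For reference, the happy path looks deceptively simple; here’s a minimal filesystem-sink sketch (the bucket, the source table, and the S3 credential/endpoint configuration on the cluster are all assumed):

```sql
-- Illustrative S3 sink. Assumes the S3 filesystem plugin is installed
-- and credentials/endpoint are configured; with MinIO you'd also point
-- the s3a endpoint at the MinIO server.
CREATE TABLE readings_s3 (
  sensor_id STRING,
  reading   DOUBLE
) WITH (
  'connector' = 'filesystem',
  'path'      = 's3a://my-bucket/readings/',
  'format'    = 'parquet'
);

-- `readings` is a hypothetical source table defined elsewhere. Note
-- that streaming Parquet writes only commit files on checkpoints, so
-- checkpointing must be enabled.
INSERT INTO readings_s3
SELECT sensor_id, reading FROM readings;
```

It’s everything around a snippet like this—dependencies, plugins, and configuration—where the trouble lies.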
Note: This post originally appeared on the Decodable blog.
Amazon Managed Service for Apache Flink (MSF) is one of several hosted Flink offerings. As my colleague Gunnar Morling described in his recent article, it can be used to run a Flink job that you’ve written in Java or Python (PyFlink). But did you know that this isn’t the only way—or perhaps even the best way—to have your Flink jobs run for you?
Note: This post originally appeared on the Decodable blog.
Sometimes it’s not possible to have too much of a good thing, and whilst this blog post may look at first glance rather similar to the one that I published just recently, today we’re looking at a 100% pure Apache solution. Because who knows, maybe you prefer rolling your own tech stacks instead of letting Decodable do it for you 😉.
Note: This post originally appeared on the Decodable blog.
One of the things that I love about SQL is the power that it gives you to work with data in a declarative manner. I want this thing…go do it. How should it do it? Well, that’s a problem for the particular engine, not me. As a language with a pedigree of multiple decades and no sign of waning (despite a wobbly patch for some whilst NoSQL figured out it actually wanted to be NewSQL 😉), it’s the lingua franca of data systems.