
PG_LOGICAL_SLOT_PEEK_CHANGES query crashes #546

@carlos-pereira-costoya

Description


PGSync version: 3.0.0
Postgres version: PostgreSQL 15.5 (AWS Aurora db.r7g.xlarge, with writer and reader nodes)
Elasticsearch/OpenSearch version: 2.1.1 (OpenSearch)
Redis version: 5.0.6 (ElastiCache)
Python version: 3.9

Error Message (if any):

[screenshot of the error message]

Problem description:

Hi @toluaina,

We are experiencing issues in the following context:

  • An initial migration of all 5 million records from PostgreSQL to OpenSearch completes from scratch without any problems, with no pre-existing replication slots or OpenSearch indexes.
  • Another block of 5 million records is then added to the relational database in a single commit.
  • The pgsync process is started and fails when trying to retrieve the changes associated with the replication slot.

We tried executing the query manually and encountered the same behavior:

SELECT count(*) AS count_1 
FROM PG_LOGICAL_SLOT_PEEK_CHANGES('coverage_street', NULL, NULL) 
WHERE CAST(CAST(xid AS TEXT) AS BIGINT) >= 10319409 AND CAST(CAST(xid AS TEXT) AS BIGINT) < 10725529

The query never completes: PostgreSQL drops the connection after 15-20 minutes. During execution, it uses up to 40 GB of disk space in our case.

[screenshot]

The only way I have gotten it to finish is by passing the upto_nchanges limit as a parameter (I tried with 100); see the sketch below.
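
For reference, a sketch of the capped form, assuming the limit is passed as the function's third argument (upto_nchanges); 100 is just the value I tried:

SELECT count(*) AS count_1
FROM PG_LOGICAL_SLOT_PEEK_CHANGES('coverage_street', NULL, 100) -- upto_nchanges = 100
WHERE CAST(CAST(xid AS TEXT) AS BIGINT) >= 10319409 AND CAST(CAST(xid AS TEXT) AS BIGINT) < 10725529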

In PostgreSQL there are two replication slots, each holding 68 GB of data.
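
For context, a sketch of how the retained data per slot can be checked, assuming the 68 GB figures refer to WAL held back behind each slot's restart_lsn:

-- Size of WAL retained behind each replication slot
SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;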

From what I've researched, this happens because PG_LOGICAL_SLOT_PEEK_CHANGES has to decode the changes retained in files on disk, which consumes temporary storage on the database instance.

https://pgpedia.info/p/pg_logical_slot_peek_changes.html
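
If the temporary disk usage comes from decoded changes spilling to disk, the relevant setting in stock PostgreSQL appears to be logical_decoding_work_mem (64 MB by default); I am not sure how Aurora exposes it, so this is only a sketch:

-- Check the memory threshold at which logical decoding spills changes to disk
SHOW logical_decoding_work_mem;
-- Optionally raise it for the current session before peeking the slot (value is illustrative)
SET logical_decoding_work_mem = '1GB';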

I have also seen past work in pgsync aimed at solving this:

#267

However, I have not found the PG_LOGICAL_SLOT_UPTO_NCHANGES parameter in either the version I use (3.0.0) or the most recent version of pgsync, so I have not been able to test it. In the versions where the parameter does exist, it does not work either.

  1. Has functionality been incorporated to read the changes from the slots in batches?
  2. Is this behavior expected, or does pgsync simply not support commits with a large number of records (> 5M)?
  3. Could running PostgreSQL as a cluster (writer and reader nodes) affect how the data is retrieved and processed in temporary storage?

Thank you very much!
