
PG_LOGICAL_SLOT_PEEK_CHANGES query crashes #546

@carlos-pereira-costoya

Description


PGSync version: 3.0.0
Postgres version: PostgreSQL 15.5 (AWS Aurora db.r7g.xlarge, with writer and reader nodes)
Elasticsearch/OpenSearch version: 2.1.1 (OpenSearch)
Redis version: 5.0.6 (ElastiCache)
Python version: 3.9

Error Message (if any):

[screenshot of the error message]

Problem description:

Hi @toluaina,

We are experiencing issues in the following context:

  • An initial migration of all 5 million records from PostgreSQL to OpenSearch completes from scratch without any problems, with no pre-existing replication slots or OpenSearch indexes.
  • Another block of 5 million records is then added to the relational database in a single commit.
  • The pgsync process is started and fails when trying to retrieve the changes associated with the replication slot.

We tried executing the query manually and encountered the same behavior:

SELECT count(*) AS count_1 
FROM PG_LOGICAL_SLOT_PEEK_CHANGES('coverage_street', NULL, NULL) 
WHERE CAST(CAST(xid AS TEXT) AS BIGINT) >= 10319409 AND CAST(CAST(xid AS TEXT) AS BIGINT) < 10725529

The query never completes: PostgreSQL drops the connection after 15-20 minutes. During execution, it uses up to 40 GB of disk space in our case.

[screenshot]

The only way I have gotten it to finish is by passing the upto_nchanges limit as a parameter (I tried with 100); see the sketch below.
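
For reference, a sketch of the capped form, assuming the limit is passed as the function's third argument (upto_nchanges); 100 is just the value I tried:

SELECT count(*) AS count_1
FROM PG_LOGICAL_SLOT_PEEK_CHANGES('coverage_street', NULL, 100) -- upto_nchanges = 100
WHERE CAST(CAST(xid AS TEXT) AS BIGINT) >= 10319409 AND CAST(CAST(xid AS TEXT) AS BIGINT) < 10725529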

In PostgreSQL there are two replication slots, each holding 68 GB of data.
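
For context, a sketch of how the retained data per slot can be checked, assuming the 68 GB figures refer to WAL held back behind each slot's restart_lsn:

-- Size of WAL retained behind each replication slot
SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;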

From what I've researched, this happens because PG_LOGICAL_SLOT_PEEK_CHANGES has to decode the changes retained in files on disk, which consumes temporary storage on the database instance.

https://pgpedia.info/p/pg_logical_slot_peek_changes.html
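
If the temporary disk usage comes from decoded changes spilling to disk, the relevant setting in stock PostgreSQL appears to be logical_decoding_work_mem (64 MB by default); I am not sure how Aurora exposes it, so this is only a sketch:

-- Check the memory threshold at which logical decoding spills changes to disk
SHOW logical_decoding_work_mem;
-- Optionally raise it for the current session before peeking the slot (value is illustrative)
SET logical_decoding_work_mem = '1GB';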

I have also seen past work in pgsync aimed at solving this:

#267

However, I have not found the PG_LOGICAL_SLOT_UPTO_NCHANGES parameter in either the version I use (3.0.0) or the most recent version of pgsync, so I have not been able to test it. In the versions where the parameter does exist, it does not work either.

  1. Has functionality been incorporated to read the changes from the slots in batches?
  2. Is this behavior expected, or does pgsync simply not support commits with a large number of records (> 5M)?
  3. Could running PostgreSQL as a cluster (writer and reader nodes) affect how the data is retrieved and processed in temporary storage?

Thank you very much!
