Segfault on EstimateSnapshotSpace #106

@yazun

It looks like there is another outstanding bug related to the parallel path and aggregation. It happens only occasionally, most likely due to a race, and it also seems to be data dependent, so it is hard to create a repeatable example. The crash is at:

size = add_size(sizeof(SerializedSnapshotData),
                mul_size(snap->xcnt, sizeof(TransactionId)));

where the snap pointer is NULL, despite the asserts at the beginning of the function.
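One possible reason the asserts do not catch this: assertions are typically compiled out of non-debug builds. Whether this server was built with assertion checking is an assumption here, since the build options are not stated above. The sketch below is not PostgreSQL code; `FakeSnapshot` and `estimate_space` are hypothetical stand-ins. It only shows an explicit NULL check, which stays active in every build, next to a standard C `assert`, which disappears under `NDEBUG`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a snapshot; NOT the PostgreSQL struct. */
typedef struct FakeSnapshot
{
    uint32_t xcnt;              /* number of transaction ids */
} FakeSnapshot;

/*
 * Hypothetical helper: computes header + xcnt * sizeof(uint32_t).
 * Returns 0 and stores the size in *out on success, or -1 when snap
 * is NULL or the arithmetic would overflow size_t.
 */
static int
estimate_space(const FakeSnapshot *snap, size_t header, size_t *out)
{
    /* Vanishes when compiled with -DNDEBUG, so it cannot be relied on. */
    assert(out != NULL);

    /* Explicit check: still present in a release build. */
    if (snap == NULL)
        return -1;

    /* Overflow guard for header + xcnt * sizeof(uint32_t). */
    if (snap->xcnt != 0 &&
        sizeof(uint32_t) > (SIZE_MAX - header) / snap->xcnt)
        return -1;

    *out = header + (size_t) snap->xcnt * sizeof(uint32_t);
    return 0;
}
```

With a guard like this, a caller receives an error code for a NULL snapshot instead of dereferencing it. That would only mask the symptom, though: the question of why the snapshot is NULL when `InitializeParallelDSM` runs would still be open.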

Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: distefan_local surveys 192.168.169.251(44486) SELECT              '.
Program terminated with signal 11, Segmentation fault.
#0  EstimateSnapshotSpace (snap=0x0) at snapmgr.c:2314
2314    snapmgr.c: No such file or directory.
Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.26-23.el7.x86_64 glibc-2.17-307.el7.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-46.el7.x86_64 libcom_err-1.42.9-17.el7.x86_64 libselinux-2.5-15.el7.x86_64 libxml2-2.9.1-6.el7.4.x86_64 nspr-4.21.0-1.el7.x86_64 nss-3.44.0-7.el7_7.x86_64 nss-softokn-freebl-3.44.0-8.el7_7.x86_64 nss-util-3.44.0-4.el7_7.x86_64 openldap-2.4.44-21.el7_6.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0  EstimateSnapshotSpace (snap=0x0) at snapmgr.c:2314
#1  0x0000000000b9c0c2 in InitializeParallelDSM () at parallel.c:241
#2  0x00000000009f7508 in ExecInitParallelPlan () at execParallel.c:756
#3  0x00000000009d351c in ExecGather (pstate=0x1ebb6e0) at nodeGather.c:183
#4  0x00000000009dc1fc in ExecProcNode (node=0x1ebb6e0) at ../../../src/include/executor/executor.h:275
#5  fetch_input_tuple (aggstate=aggstate@entry=0x1ebb038) at nodeAgg.c:739
#6  0x00000000009e30c8 in agg_fill_hash_table (aggstate=0x1ebb038) at nodeAgg.c:3487
#7  ExecAgg (pstate=0x1ebb038) at nodeAgg.c:3029
#8  0x00000000009fdcc6 in ExecProcNode (node=0x1ebb038) at ../../../src/include/executor/executor.h:275
#9  ExecutePlan (execute_once=<optimized out>, dest=0x7f52f580a090, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x1ebb038, estate=<optimized out>) at execMain.c:2061
#10 standard_ExecutorRun (queryDesc=<optimized out>, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:471
#11 0x000000000076218c in ExecutorRun (execute_once=<optimized out>, count=0, direction=ForwardScanDirection, queryDesc=0x1eba768) at execMain.c:414
#12 PortalRunSelect () at pquery.c:1715
#13 0x0000000000762be1 in PortalRun (portal=0x1ca62e8, count=9223372036854775807, isTopLevel=<optimized out>, run_once=<optimized out>, dest=0x7f52f580a090, altdest=0x7f52f580a090, completionTag=0x7ffceedd6020 "") at pquery.c:1356
#14 0x000000000076bf03 in exec_simple_query.lto_priv.0 () at postgres.c:1511
#15 0x0000000000765cc5 in PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>, username=<optimized out>) at postgres.c:5456
#16 0x0000000000829f54 in BackendRun (port=0x1bc1200) at postmaster.c:4982
#17 BackendStartup (port=0x1bc1200) at postmaster.c:4654
#18 ServerLoop () at postmaster.c:1959
#19 0x000000000082b999 in PostmasterMain () at postmaster.c:1567
#20 0x00000000004f4a1d in main (argc=5, argv=0x1b96690) at main.c:233
