koredump enables easy access to core dumps in a Kubernetes cluster. It provides a REST API and command line tools that let users get information on core dumps in a cluster and download the core dump files.
Core dumps are captured and stored on disk by the infra platform. koredump supports Red Hat OCP, which uses the systemd-coredump service.
- In-cluster http://koreapi.koredump.svc.cluster.local:80 REST API (Kubernetes Service). One container per cluster, application listening port 5000.
- One REST API server per node in k8s cluster (Kubernetes DaemonSet), listening port 5001.
- No changes to platform core_pattern kernel config, use default systemd-coredump in OCP.
- Access coredump files from /var/lib/systemd/coredump, and (optionally) read journal logs for full coredump metadata written by systemd-coredump. DAC_OVERRIDE capability is used in container to access core dump files and journal logs.
- Command line utility koredumpctl that uses the REST API. Automatically installed in OCP to /usr/local/bin/koredumpctl with Kubernetes init container.
- Note that in OCP core dumps are deleted by default after 3 days (see systemd-tmpfiles --cat-config | grep core).
- Collect all coredumps in cluster by default. Limit to predefined namespaces by setting filter.namespaceRegex variable when installing with Helm charts.
- Token authentication for REST API. Server uses TokenReview to verify the token.
- Red Hat OCP privileged Security Context Constraint (SCC) is needed.
- In-cluster traffic is unencrypted HTTP.
- Simple implementation with python3.
- Hard requirement on systemd-coredump: core files are processed from the /var/lib/systemd/coredump directory only. Note that if core_pattern is set to e.g. /tmp/core or similar, the cores are written to the container filesystem and are not visible via this tool.
- Core file deletion is not (yet) possible. (Host paths are mounted read-only into the containers.)
- REST API can return errors during installation and upgrade, when the koredump PODs are being terminated or created.
- systemd-coredump by default limits core size to maximum 2GB, larger core files are truncated. Increase the limit by setting for example ExternalSizeMax=32G in /etc/systemd/coredump.conf (or add conf file in /etc/systemd/coredump.conf.d/).
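The token authentication item in the list above can be pictured with the Kubernetes TokenReview API (authentication.k8s.io/v1). The following is a minimal sketch, not koredump's actual code: it only builds the TokenReview object a server would submit and interprets the response. The helper names token_review_body, is_authenticated and bearer_token are hypothetical, and how koredump sends the request to the API server is not covered in this document.

```python
import json
from typing import Optional


def token_review_body(token: str) -> dict:
    """Build a TokenReview object for the authentication.k8s.io/v1 API."""
    return {
        "apiVersion": "authentication.k8s.io/v1",
        "kind": "TokenReview",
        "spec": {"token": token},
    }


def is_authenticated(review: dict) -> bool:
    """Return True only if the response marks the token as authenticated."""
    return bool(review.get("status", {}).get("authenticated", False))


def bearer_token(header_value: str) -> Optional[str]:
    """Extract the token from an 'Authorization: Bearer <token>' header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


if __name__ == "__main__":
    # Hypothetical token value, for illustration only.
    print(json.dumps(token_review_body("example-token"), indent=2))
    # A response shaped like the one returned for a valid token.
    print(is_authenticated({"status": {"authenticated": True}}))
```

A missing or false status.authenticated field is treated as a rejected token, so an error response from the API server never grants access.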
JSON list of cores (metadata) available in cluster.
Example
bash-5.1$ curl -fsS -H "Authorization: Bearer $token" koreapi/apiv1/cores | jq
[
{
"ARCH": "x86_64",
"COREDUMP_CMDLINE": "/usr/bin/example -a -b -c",
"COREDUMP_COMM": "example",
...
"COREDUMP_SIGNAL": 24,
"COREDUMP_SIGNAL_NAME": "SIGXCPU",
"container": "ctr-ns1-example",
"id": "core.example.9999.f1c1b6957ac9436d9113a86c8c905508.141241.1642081018000000.lz4",
"node": "ocp-example",
"pod": "pod-ns1-example-86b5c54447-lrbz2"
},
{
...
}
]
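The same listing can be fetched from a script. Below is a minimal sketch using only the Python standard library. The base URL and the /apiv1/cores path are taken from this document; the function names build_request, list_cores and summarize are illustrative assumptions, and the token is expected to be supplied by the caller.

```python
import json
import urllib.request

# Base URL of the in-cluster service, as given in this document.
BASE_URL = "http://koreapi.koredump.svc.cluster.local:80"


def build_request(path: str, token: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Create a GET request that carries the bearer token."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def list_cores(token: str, base_url: str = BASE_URL) -> list:
    """Fetch /apiv1/cores and decode the JSON list of core metadata."""
    req = build_request("/apiv1/cores", token, base_url)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def summarize(cores: list) -> list:
    """Reduce each metadata entry to a (node, pod, signal name, id) tuple."""
    return [
        (c.get("node"), c.get("pod"), c.get("COREDUMP_SIGNAL_NAME"), c.get("id"))
        for c in cores
    ]
```

Calling summarize(list_cores(token)) from a pod inside the cluster would give one tuple per core. The node and id values are the two identifiers that the metadata and download endpoints expect.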
JSON metadata of a single core file, identified by Kubernetes node name and core file ID.
Example
bash-5.1$ curl -fsS -H "Authorization: Bearer $token" koreapi/apiv1/cores/metadata/ocp-example/core.example.9999.f1c1b6957ac9436d9113a86c8c905508.141241.1642081018000000.lz4 | jq
{
"ARCH": "x86_64",
"COREDUMP_CMDLINE": "/usr/bin/example -a -b -c",
"COREDUMP_COMM": "example",
...
"COREDUMP_SIGNAL": 24,
"COREDUMP_SIGNAL_NAME": "SIGXCPU",
"container": "ctr-ns1-example",
"id": "core.example.9999.f1c1b6957ac9436d9113a86c8c905508.141241.1642081018000000.lz4",
"node": "ocp-example",
"pod": "pod-ns1-example-86b5c54447-lrbz2"
}
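The core file ID used in these requests is the file name written by systemd-coredump, so part of the metadata can be read from the ID itself. The sketch below assumes the layout core.<comm>.<uid>.<boot-id>.<pid>.<timestamp in microseconds>.<compression>, which matches the examples in this document. It also assumes that a compression suffix such as lz4 is always present. The CoreId type and the parse_core_id function are hypothetical names.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CoreId(NamedTuple):
    comm: str
    uid: int
    boot_id: str
    pid: int
    timestamp: datetime
    compression: str


def parse_core_id(core_id: str) -> CoreId:
    """Split a core file ID into its fields (assumed layout, see lead-in)."""
    prefix, rest = core_id.split(".", 1)
    if prefix != "core":
        raise ValueError(f"not a core file ID: {core_id!r}")
    # Split from the right so that dots inside the command name survive.
    comm, uid, boot_id, pid, usec, compression = rest.rsplit(".", 5)
    ts = datetime.fromtimestamp(int(usec) / 1_000_000, tz=timezone.utc)
    return CoreId(comm, int(uid), boot_id, int(pid), ts, compression)


if __name__ == "__main__":
    example = "core.example.9999.f1c1b6957ac9436d9113a86c8c905508.141241.1642081018000000.lz4"
    print(parse_core_id(example))
```

For the stunnel ID shown in the koredumpctl list example, the decoded timestamp is 2022-02-23T08:23:16Z, which agrees with the Timestamp field printed there.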
Download a core file, identified by Kubernetes node name and core file ID.
Example
bash-5.1$ curl -fvsS -O -H "Authorization: Bearer $token" koreapi/apiv1/cores/download/ocp-example/core.example.9999.f1c1b6957ac9436d9113a86c8c905508.141241.1642081018000000.lz4
* Connected to koreapi (172.30.199.84) port 80 (#0)
> GET /apiv1/cores/download/ocp-example/core.example.9999.f1c1b6957ac9436d9113a86c8c905508.141241.1642081018000000.lz4 HTTP/1.1
> Host: koreapi
> User-Agent: curl/7.79.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: gunicorn
< Date: Fri, 14 Jan 2022 05:48:11 GMT
< Connection: close
< Content-Disposition: attachment; filename=core.example.9999.f1c1b6957ac9436d9113a86c8c905508.141241.1642081018000000.lz4
< Content-Type: application/octet-stream
< Content-Length: 279816
< Last-Modified: Thu, 13 Jan 2022 12:29:50 GMT
< Cache-Control: no-cache
<
* Closing connection 0
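A scripted download could look like the sketch below, which streams the response to disk instead of holding the whole core file in memory. The URL layout follows the curl example above. The function names download_url and download_core, the default base URL http://koreapi and the timeout are assumptions made for illustration.

```python
import shutil
import urllib.parse
import urllib.request
from pathlib import Path


def download_url(node: str, core_id: str, base_url: str = "http://koreapi") -> str:
    """Build the download URL from the node name and the core file ID."""
    node_q = urllib.parse.quote(node, safe="")
    id_q = urllib.parse.quote(core_id, safe="")
    return f"{base_url.rstrip('/')}/apiv1/cores/download/{node_q}/{id_q}"


def download_core(node: str, core_id: str, token: str,
                  dest_dir: str = ".", base_url: str = "http://koreapi") -> Path:
    """Stream one core file into dest_dir and return the path written."""
    req = urllib.request.Request(
        download_url(node, core_id, base_url),
        headers={"Authorization": f"Bearer {token}"},
    )
    # Use only the final path component so the ID cannot escape dest_dir.
    dest = Path(dest_dir) / Path(core_id).name
    with urllib.request.urlopen(req, timeout=300) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest
```

urllib.request.urlopen raises urllib.error.HTTPError on a 4xx or 5xx response, which is similar in effect to the -f flag in the curl command.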
Install (in Red Hat OCP as core user):
oc new-project koredump
helm repo add koredump https://nokia.github.io/koredump/
helm repo update
helm install -n koredump koredump koredump/koredump
watch kubectl -n koredump get all
Upgrade:
helm repo update
helm upgrade -n koredump koredump koredump/koredump
watch kubectl -n koredump get all
Test with koredumpctl:
koredumpctl status
koredumpctl list
Example koredumpctl list output:
$ koredumpctl list
- ID: core.prog.0.e36680b3d32e4f4f9899d72d34fe5fb3.207856.1638186984000000.lz4
Node: ocp-6
Pod: po-prog-oam-0
Container: ctr-prog
Namespace: demo
Image: image-registry.openshift-image-registry.svc:5000/demo/prog:1.2.0
Signal: SIGXCPU (24)
Timestamp: 2022-02-23T08:23:16Z
- ID: core.stunnel.9999.29162cb2ca0d4e1eb67a4ffb549ed670.2354652.1645604596000000.lz4
Node: ocp-6
Pod: po-cran1-stunnel-d897f48fd-8q68m
Container: ctr-cran1-stunnel
Namespace: demo
Image: image-registry.openshift-image-registry.svc:5000/demo/stunnel:2.4.0
Signal: SIGXCPU (24)
Timestamp: 2022-02-23T08:23:16Z
Uninstall:
helm uninstall koredump
rm /usr/local/bin/koredumpctl
Install from git repository:
git clone https://github.com/nokia/koredump.git
cd koredump
oc new-project koredump
helm install koredump charts/koredump/
watch kubectl get all
Run API servers locally without Kubernetes, for example in Fedora:
NO_TOKENS=1 FLASK_ENV=development PORT=5001 DAEMONSET=1 FAKE_K8S=1 gunicorn --access-logfile=- app
NO_TOKENS=1 FLASK_ENV=development PORT=5000 KOREDUMP_DAEMONSET_PORT=5001 DAEMONSET=0 FAKE_K8S=1 gunicorn --access-logfile=- app