[Nvida_GPU] Enable TSDB #14262

Linu-Elias · 2025-06-19T09:24:05Z

Proposed commit message

Checklist

I have reviewed tips for building integrations and this pull request is aligned with them.
I have verified that all data streams collect metrics or logs.
I have added an entry to my package's changelog.yml file.
I have verified that Kibana version constraints are current according to guidelines.
I have verified that any added dashboard complies with Kibana's Dashboard good practices.

Author's Checklist

[ ]

How to test this PR locally

Related issues

Screenshots

agithomas · 2025-06-20T06:46:12Z

I think you may also need to consider the list of common dimension fields, or a subset of them.
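For readers unfamiliar with the convention, here is a minimal sketch of how common ECS fields are typically flagged as dimensions in a data stream's field definitions. The file location and the particular subset of fields shown are assumptions for illustration; the source only mentions "ecs dimension" as a commit title and does not show the actual change.

```yaml
# Hypothetical fragment of a data stream's fields/ecs.yml.
# The chosen fields are an assumed subset of the common dimension fields.
- name: host.name
  external: ecs
  # Marks the field as part of the time series identity.
  dimension: true
- name: agent.id
  external: ecs
  dimension: true
- name: cloud.instance.id
  external: ecs
  dimension: true
```

Which of these apply depends on what the integration actually populates, so the list should be trimmed to fields present in the collected events.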

agithomas · 2025-06-20T06:48:01Z

packages/nvidia_gpu/data_stream/stats/fields/fields.yml

+        - name: uuid
+          type: keyword
+          description: |
+            Nvidia GPU uuid


Suggested change

-            Nvidia GPU uuid
+            Nvidia GPU UUID

agithomas · 2025-06-20T06:48:42Z

packages/nvidia_gpu/data_stream/stats/fields/fields.yml

+        - name: instance
+          type: keyword
+          description: > 
+            Nvidia GPU instance name


Is this the Prometheus instance address or the GPU name? The description is not very clear.

agithomas · 2025-06-20T06:51:00Z

packages/nvidia_gpu/data_stream/stats/elasticsearch/ingest_pipeline/default.yml

+    target_field: gpu.labels.instance
+    ignore_missing: true
+- rename:
+    field: prometheus.labels.job


Could there be more than one job in a specific time slice for a GPU (identified by UUID)? If so, this must be a dimension.

Linu-Elias · 2025-06-23

I don't think there would be multiple job or instance values within a single GPU. The job and instance field values are associated with the Prometheus endpoint. For example, from the endpoint I’m currently using:

"instance": "192.168.0.192:9400", "job": "prometheus"

I'm not sure whether these should be considered dimensions; I would appreciate your input on that.

agithomas · 2025-06-23

Thanks for clarifying. It may be OK not to make it a dimension.

agithomas · 2025-06-20T06:52:17Z

packages/nvidia_gpu/data_stream/stats/elasticsearch/ingest_pipeline/default.yml

+    field: prometheus.labels.pci_bus_id
+    target_field: gpu.labels.pci_bus_id
+    ignore_missing: true
+- rename:


Could there be more than one error code in a specific time slice for a GPU (identified by UUID)? If so, this must be a dimension.

agithomas · 2025-06-26

@Linu-Elias, can you also add the error code as a dimension?
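A minimal sketch of what marking the error code as a dimension could look like in `fields.yml`. The field name `err_code`, its placement under `gpu.labels`, and the `uuid` dimension are assumptions inferred from the commit title "add err_code dimension true" and the `gpu.labels.*` rename targets in the pipeline; the actual diff is not shown in this thread.

```yaml
# Hypothetical fragment of packages/nvidia_gpu/data_stream/stats/fields/fields.yml.
# Field names and nesting are illustrative assumptions.
- name: gpu.labels
  type: group
  fields:
    - name: uuid
      type: keyword
      # Dimension fields together identify one time series in a TSDB index.
      dimension: true
      description: Nvidia GPU UUID
    - name: err_code
      type: keyword
      # Needed as a dimension if one GPU can report several error codes
      # at the same timestamp; otherwise those documents would share one
      # time series identity.
      dimension: true
      description: Error code label reported for the GPU
```

The reasoning matches the review question: if two documents share a timestamp and every dimension value, TSDB treats them as the same data point, so any label that can vary within a GPU at one point in time has to be part of the dimension set.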

packages/nvidia_gpu/changelog.yml

Co-authored-by: Agi K Thomas <101976829+agithomas@users.noreply.github.com>

elasticmachine · 2025-06-27T11:59:04Z

💚 Build Succeeded

Buildkite Build
Commit: 70dd03e

History

💚 Build #27746 succeeded ccecaaf
💚 Build #27737 succeeded f3be15f
💔 Build #27710 failed 667ef2d
💚 Build #27676 succeeded ad7e162
💚 Build #27622 succeeded 418ee7c
💚 Build #27580 succeeded a27a130

cc @Linu-Elias

elastic-sonarqube · 2025-06-27T11:59:05Z

Quality Gate failed

Failed conditions
72.5% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube

agithomas

LGTM!

elastic-vault-github-plugin-prod · 2025-06-30T12:23:07Z

Package nvidia_gpu - 0.4.0 containing this change is available at https://epr.elastic.co/package/nvidia_gpu/0.4.0/

* Initial commit
* field_name change
* dashboard
* dashboard fix
* added screenshots
* changelog
* changelog
* field name changes
* build fix
* field name changes
* fix
* dashboard fix
* typo fix
* resolved comments
* dashboard fix
* System tests added
* changelog
* docker image update
* tsdb
* resolve conflicts
* added field
* update gpu labels
* update gpu labels
* changelog
* sample events
* sample events
* fix
* sample file fix
* update changelog
* Update packages/nvidia_gpu/changelog.yml Co-authored-by: Agi K Thomas <101976829+agithomas@users.noreply.github.com>
* add err_code dimension true
* test update
* ecs dimension
* ecs dimension

---------

Co-authored-by: Agi K Thomas <101976829+agithomas@users.noreply.github.com>

Linu-Elias added 26 commits May 26, 2025 10:04

Initial commit

66a3c4c

field_name change

71b48c0

dashboard

d6f861f

dashboard fix

0f6de89

added screenshots

a74fb62

changelog

1e056b2

changelog

8e1956c

field name changes

886d93a

build fix

7278fde

field name changes

45ba34f

fix

d6b5a1a

dashboard fix

c58a726

typo fix

f48edc7

resolved comments

648da91

dashboard fix

4c5b6ed

System tests added

67ee2a5

changelog

e7a8413

docker image update

3c8957d

tsdb

5ef3eb2

merge main

24e58c7

resolve conflicts

e50943d

added field

4cc82e5

update gpu labels

2b7a5e5

Merge branch 'main' into nvidia_tsdb

583ca3a

update gpu labels

9ce49ba

Merge branch 'main' into nvidia_tsdb

032e714

Linu-Elias requested a review from a team as a code owner June 19, 2025 09:24

Linu-Elias self-assigned this Jun 19, 2025

Linu-Elias added 2 commits June 19, 2025 15:13

changelog

9773dba

sample events

83c41d8

sample events

23c002a

andrewkroh added dashboard Relates to a Kibana dashboard bug, enhancement, or modification. Integration:nvidia_gpu NVIDIA GPU Monitoring Team:Obs-InfraObs Observability Infrastructure Monitoring team [elastic/obs-infraobs-integrations] labels Jun 19, 2025

agithomas reviewed Jun 20, 2025

Linu-Elias added 3 commits June 23, 2025 18:00

fix

a1d89e1

sample file fix

a27a130

update changelog

418ee7c

agithomas reviewed Jun 25, 2025

packages/nvidia_gpu/changelog.yml (outdated, resolved)

Linu-Elias and others added 5 commits June 26, 2025 10:30

Update packages/nvidia_gpu/changelog.yml

ad7e162

Co-authored-by: Agi K Thomas <101976829+agithomas@users.noreply.github.com>

add err_code dimension true

667ef2d

test update

f3be15f

ecs dimension

ccecaaf

ecs dimension

70dd03e

agithomas approved these changes Jun 30, 2025

Linu-Elias merged commit 20909f0 into elastic:main Jun 30, 2025
6 of 7 checks passed

andrewkroh added the documentation Improvements or additions to documentation. Applied to PRs that modify *.md files. label Jul 1, 2025
