@Linu-Elias (Contributor) commented Jun 19, 2025

Proposed commit message

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Screenshots

image (1)

@Linu-Elias requested a review from a team as a code owner June 19, 2025
@Linu-Elias self-assigned this Jun 19, 2025
@andrewkroh added the dashboard, Integration:nvidia_gpu, and Team:Obs-InfraObs labels Jun 19, 2025
@agithomas (Contributor) commented:

I think you may also need to consider the list of common dimension fields, or a subset of them.

- name: uuid
  type: keyword
  description: |
    Nvidia GPU uuid
Contributor

Suggested change
- Nvidia GPU uuid
+ Nvidia GPU UUID

- name: instance
  type: keyword
  description: >
    Nvidia GPU instance name
Contributor

Is this the Prometheus instance address or the GPU name? The description is unclear.

    target_field: gpu.labels.instance
    ignore_missing: true
- rename:
    field: prometheus.labels.job
Contributor

Could there be more than one job in a given time slice for a single GPU (identified by UUID)? If so, this must be a dimension.

Contributor Author

I don't think there would be multiple job or instance values within a single GPU.
The job and instance field values are associated with the Prometheus endpoint.
For example, from the endpoint I’m currently using:

"instance": "192.168.0.192:9400"
"job": "prometheus"

I'm not sure whether these should be considered dimensions; I would appreciate your input on that.

Contributor

Thanks for clarifying. It may be OK not to make it a dimension.

    field: prometheus.labels.pci_bus_id
    target_field: gpu.labels.pci_bus_id
    ignore_missing: true
- rename:
Contributor

Could there be more than one error code in a given time slice for a single GPU (identified by UUID)? If so, this must be a dimension.

Contributor

@Linu-Elias, can you also add the error code as a dimension?
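
For readers following the dimension discussion in this thread: in an integration's field definitions, a field is marked as a time series dimension with `dimension: true`. The fragment below is only a sketch of what that could look like, not the merged diff. The field names `gpu.labels.uuid`, `gpu.labels.err_code`, and `gpu.labels.job` are assumptions inferred from the `gpu.labels.*` rename targets and the "add err_code dimension true" commit, and the descriptions are illustrative.

```yaml
# Hypothetical fields.yml fragment; field names and descriptions are assumed,
# not copied from the merged change.
- name: gpu.labels.uuid
  type: keyword
  dimension: true   # identifies the GPU, so it is part of the time series identity
  description: Nvidia GPU UUID
- name: gpu.labels.err_code
  type: keyword
  dimension: true   # needed if one GPU can report several error codes at the same timestamp
  description: Nvidia GPU error code
- name: gpu.labels.job
  type: keyword     # Prometheus scrape job; left as a plain field, per the replies above
  description: Prometheus job name
```

The reasoning behind the reviewer's question: in a time series data stream, documents that share the same timestamp and the same dimension values are treated as the same data point, so any label that can take several values for one GPU at one timestamp has to be a dimension to keep those documents distinct.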

@elasticmachine
💚 Build Succeeded

cc @Linu-Elias

Quality Gate failed

Failed conditions
72.5% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube

@agithomas (Contributor) left a comment

LGTM!

@Linu-Elias merged commit 20909f0 into elastic:main Jun 30, 2025
6 of 7 checks passed
@elastic-vault-github-plugin-prod

Package nvidia_gpu - 0.4.0 containing this change is available at https://epr.elastic.co/package/nvidia_gpu/0.4.0/

@andrewkroh added the documentation label Jul 1, 2025
robester0403 pushed a commit to robester0403/integrations that referenced this pull request Jul 8, 2025
* Initial commit

* field_name change

* dashboard

* dashboard fix

* added screenshots

* changelog

* changelog

* field name changes

* build fix

* field name changes

* fix

* dashboard fix

* typo fix

* resolved comments

* dashboard fix

* System tests added

* changelog

* docker image  update

* tsdb

* resolve conflicts

* added field

* update gpu labels

* update gpu labels

* changelog

* sample events

* sample events

* fix

* sample file fix

* update changelog

* Update packages/nvidia_gpu/changelog.yml

Co-authored-by: Agi K Thomas <101976829+agithomas@users.noreply.github.com>

* add err_code dimension true

* test update

* ecs dimension

* ecs dimension

---------

Co-authored-by: Agi K Thomas <101976829+agithomas@users.noreply.github.com>
Labels

  • dashboard: Relates to a Kibana dashboard bug, enhancement, or modification.
  • documentation: Improvements or additions to documentation. Applied to PRs that modify *.md files.
  • Integration:nvidia_gpu: NVIDIA GPU Monitoring
  • Team:Obs-InfraObs: Observability Infrastructure Monitoring team [elastic/obs-infraobs-integrations]

4 participants