[Nvida_GPU] Enable TSDB #14262
I think you may also need to consider the list of common dimension fields, or a subset of them.
- name: uuid
  type: keyword
  description: |
    Nvidia GPU uuid
Suggested change:
-    Nvidia GPU uuid
+    Nvidia GPU UUID
- name: instance
  type: keyword
  description: >
    Nvidia GPU instance name
Is this the Prometheus instance address or the GPU name? The description is not very clear.
    target_field: gpu.labels.instance
    ignore_missing: true
- rename:
    field: prometheus.labels.job
Could there be more than one job in a given time slice for a single GPU (identified by UUID)? If so, this must be a dimension.
I don't think there would be multiple job or instance values within a single GPU.
The job and instance field values are associated with the Prometheus endpoint.
For example, from the endpoint I’m currently using:
"instance": "192.168.0.192:9400"
"job": "prometheus"
I'm not sure whether these should be considered dimensions; I would appreciate your input on that.
Thanks for clarifying. It may be OK not to make it a dimension.
    field: prometheus.labels.pci_bus_id
    target_field: gpu.labels.pci_bus_id
    ignore_missing: true
- rename:
Could there be more than one error code in a given time slice for a single GPU (identified by UUID)? If so, this must be a dimension.
@Linu-Elias, can you also add the error code as a dimension?
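For reference, marking a field as a TSDB dimension in a data stream's fields.yml might look roughly like the sketch below. This is only an illustration: the grouping under gpu.labels, the err_code field name (taken from the "add err_code dimension true" commit), and its description are assumptions, not the exact contents of this PR.

```yaml
# Hypothetical excerpt of a data stream fields.yml.
# Field grouping and the err_code description are assumptions for illustration.
- name: gpu.labels
  type: group
  fields:
    - name: uuid
      type: keyword
      # Identifies the GPU; each distinct value defines a separate time series.
      dimension: true
      description: Nvidia GPU UUID
    - name: err_code
      type: keyword
      # Marked as a dimension in case several error codes are reported
      # for the same GPU within one time slice.
      dimension: true
      description: Error code reported for the GPU
```

Fields tied to the Prometheus endpoint, such as job and instance, would stay without `dimension: true` if they never vary for a given GPU, as discussed above.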
Co-authored-by: Agi K Thomas <101976829+agithomas@users.noreply.github.com>
💚 Build Succeeded
cc @Linu-Elias
LGTM!
Package nvidia_gpu - 0.4.0 containing this change is available at https://epr.elastic.co/package/nvidia_gpu/0.4.0/
* Initial commit
* field_name change
* dashboard
* dashboard fix
* added screenshots
* changelog
* changelog
* field name changes
* build fix
* field name changes
* fix
* dashboard fix
* typo fix
* resolved comments
* dashboard fix
* System tests added
* changelog
* docker image update
* tsdb
* resolve conflicts
* added field
* update gpu labels
* update gpu labels
* changelog
* sample events
* sample events
* fix
* sample file fix
* update changelog
* Update packages/nvidia_gpu/changelog.yml
  Co-authored-by: Agi K Thomas <101976829+agithomas@users.noreply.github.com>
* add err_code dimension true
* test update
* ecs dimension
* ecs dimension
---------
Co-authored-by: Agi K Thomas <101976829+agithomas@users.noreply.github.com>