
Monitor nvidia-smi output to see GPU resource consumption #72

Open

samhodge-aiml opened this issue Mar 13, 2024 · 2 comments

@samhodge-aiml
Is your feature request related to a problem? Please describe.
I need to see how much VRAM and GPU compute a process in a container is using, and to keep a historical record in a SQL table so I can keep narrowing the gap between resources allocated and resources consumed.

Describe the solution you'd like
I would like to be able to wrap the output of nvidia-smi and have it appear in the same dictionary as the rest of the watchme metrics, or in a sidecar-type structure alongside them.

Describe alternatives you've considered
Use https://github.com/petronny/nvsmi and dump its output into a dictionary at the same time the watchme decorator collects its metrics (a rough sketch of what I mean is below).
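
For reference, a minimal sketch of the kind of wrapper I have in mind, not anything watchme provides today: it just shells out to `nvidia-smi --query-gpu=...` and returns the readings as a list of dictionaries that could be merged with whatever watchme already records. The function name and dictionary keys here are made up for illustration.

```python
import subprocess

def gpu_metrics():
    """Hypothetical helper: query nvidia-smi for per-GPU utilization and
    memory, returning a list of dicts that could sit next to watchme metrics."""
    fields = ["index", "utilization.gpu", "memory.used", "memory.total"]
    out = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(fields),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    gpus = []
    for line in out.strip().splitlines():
        index, util, mem_used, mem_total = (v.strip() for v in line.split(","))
        gpus.append(
            {
                "gpu_index": int(index),
                "gpu_util_percent": float(util),
                "vram_used_mib": float(mem_used),
                "vram_total_mib": float(mem_total),
            }
        )
    return gpus
```

Those rows could then be written to the SQL table alongside the existing watchme results for the process, giving the historical record of allocation versus consumption described above.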

Additional context
Getting computation to closely match the resources allocated is a problem with commercial value: anyone who uses GPUs should be interested in how fully these resources are occupied, because buying and renting them is not cheap.


@vsoch (Owner) commented Mar 15, 2024

hey @samhodge-aiml ! This seems like a cool idea (and simple to implement) but I'm not sure I'll have time to work on it soon - too many cool things going on <3
