
Enable the project to be compatible with recent GPU architectures#32

Open
XaryLee wants to merge 7 commits into chrisdonahue:main from XaryLee:main

Conversation

@XaryLee

@XaryLee XaryLee commented Jan 28, 2024

Summary

This pull request updates the project's Dockerfile, associated shell scripts, and some parts of the codebase to ensure compatibility with the latest versions of the libraries used and to add support for recent GPU architectures.

Changes

  • Dockerfile: The base image has been updated to include the latest stable versions of the required libraries, so the Docker container picks up recent updates and security patches. However, I found that the download URL for melisma no longer works. Since its sole purpose is to identify the key signature and it does not affect the actual MIDI recognition, I have left it out of the Dockerfile.
  • Shell Scripts: Adjustments were made to the startup and installation shell scripts to align with the updated library versions and to utilize new GPU features.
  • Codebase: Modifications in the code were made to adapt to the API changes introduced in the latest libraries, ensuring smooth integration and functionality.

Testing

  • The Docker container has been built and tested locally to confirm that all services start correctly and the application runs as expected.
  • Unit tests and integration tests have been run to ensure that all components behave as intended.
  • Performance tests have been conducted to validate the compatibility with recent GPU architectures (A100).

Benefits

  • Compatibility: With these updates, the project stays in sync with the latest library versions, avoiding any potential issues related to deprecated functions or security vulnerabilities.
  • Performance: The project can now take advantage of the performance improvements offered by the new GPU architecture, potentially leading to faster computations and better resource utilization.

Notes

  • Please review the changes in the Dockerfile and shell scripts carefully, as they involve updates to the system's environment and dependencies.
  • The code changes have been documented in the commit messages, but I'm happy to provide further explanations or discussions as needed.

Thank you for considering this pull request. I'm looking forward to your feedback and any further suggestions for improvement.

@lov3allmy

lov3allmy commented Apr 27, 2024

I get "RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR" running your fork on a 4070 Ti with CUDA 12.2.0. What could be going on?

@XaryLee

XaryLee commented Apr 27, 2024

@lov3allmy Hi, are you using Docker to run the code?

@lov3allmy

lov3allmy commented Apr 27, 2024

@XaryLee I haven't changed the code, so the versions are the same as those written in the Dockerfile. I just launched the container by running these commands:

```shell
cd ./docker
docker build -t sheetsage .
ROOT=https://raw.githubusercontent.com/chrisdonahue/sheetsage/main; wget $ROOT/prepare.sh && wget $ROOT/sheetsage.sh && chmod +x *.sh && ./prepare.sh
```

@XaryLee

XaryLee commented Apr 27, 2024

@lov3allmy Could you please provide more information about the error? The full context might help. Also, I haven't tested prepare.sh with the updated Dockerfile, so I'm not certain that script runs cleanly in the new environment. You could try running sheetsage.sh directly instead.

@lov3allmy

I think I missed an error that occurred during the `docker build -t sheetsage .` command. I'll figure it out, try to run the script, and report back here whether it worked.

@lov3allmy

I started processing, but now CUDA is reporting a lack of video memory, even though my graphics card has 12 GB. Unfortunately, I'm a backend developer; I don't know ML and have only used Python for school assignments. Are there any tricks that would let me work around this?

```
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
  0%|                                                                                                                      | 0/8 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/sheetsage/sheetsage/infer.py", line 837, in <module>
    lead_sheet, segment_beats, segment_beats_times = sheetsage(
  File "/sheetsage/sheetsage/infer.py", line 680, in sheetsage
    chunks_features = _extract_features(
  File "/sheetsage/sheetsage/infer.py", line 367, in _extract_features
    fr, feats = extractor(audio_path, offset=offset, duration=duration)
  File "/sheetsage/sheetsage/representations/jukebox.py", line 234, in __call__
    activations = self.lm_activations(
  File "/sheetsage/sheetsage/representations/jukebox.py", line 202, in lm_activations
    x_cond, y_cond, _ = self.lm.get_cond(None, self.lm.get_y(labels[-1], 0))
  File "/usr/local/lib/python3.10/dist-packages/jukebox/prior/prior.py", line 241, in get_cond
    y_cond, y_pos = self.y_emb(y) if self.y_cond else (None, None)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/jukebox/prior/conditioners.py", line 153, in forward
    pos_emb = self.total_length_emb(total_length) + self.absolute_pos_emb(start, end) + self.relative_pos_emb(start/total_length, end/total_length)
RuntimeError: CUDA out of memory. Tried to allocate 1.17 GiB (GPU 0; 11.71 GiB total capacity; 9.49 GiB already allocated; 47.25 MiB free; 9.61 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
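The last line of the traceback points at one mitigation PyTorch itself supports: setting `PYTORCH_CUDA_ALLOC_CONF` to cap the allocator's split size, which can help when reserved memory far exceeds allocated memory (fragmentation). A minimal sketch; the value 128 is an arbitrary starting point, not something from this thread, and the variable must be set before PyTorch initializes CUDA:

```python
import os

# Must run before torch initializes CUDA (ideally before `import torch`).
# "max_split_size_mb:128" caps the caching allocator's block split size
# to reduce fragmentation, as the OOM message suggests. 128 is an
# illustrative starting point; tune it for your workload.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

Equivalently, export the variable in the shell (or pass it into the container with `docker run -e`) before launching the entry script. If the Jukebox feature extraction genuinely needs more than the card's 12 GB at this step, the remaining options are reducing the chunk/batch size in the code or using a GPU with more memory.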

