[bug fix] fix multi-gpu model loading bug #1188
base: main
Conversation
Summary of Changes

Hello @billcao2000, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed: this pull request resolves a critical bug affecting multi-GPU model loading, specifically when launching with torchrun.
Code Review
This pull request addresses a critical bug in multi-GPU model loading where all weights were incorrectly placed on GPU 0, leading to OOM errors. The fix correctly identifies when running in a distributed environment and specifies the correct device for safetensors. The change is logical and effectively solves the issue. I have one suggestion regarding the placement of the new logic to improve code structure and maintainability.
diffsynth/core/vram/disk_map.py (outdated)

```python
import torch.distributed as dist
if dist.is_available() and dist.is_initialized() and str(self.device).strip() == "cuda":
    self.device = f"cuda:{torch.cuda.current_device()}"
```
This logic correctly resolves the device for multi-GPU setups. However, placing it in flush_files means it will be executed every time this method is called (e.g., when the buffer is full), not just during initialization. Since the device for a distributed process is determined at startup and doesn't change, this check is only needed once.
For better code organization and to avoid redundant checks, I recommend moving this block to the __init__ method, right after self.device is assigned. This would centralize the instance's setup logic and make the purpose of flush_files clearer.
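A minimal sketch of that refactor, assuming a DiskMap class whose constructor assigns self.device as described above (the surrounding class details are hypothetical):

```python
import torch
import torch.distributed as dist


class DiskMap:
    def __init__(self, device="cuda"):
        self.device = device
        # Resolve a bare "cuda" string to the rank-local GPU once, at construction
        # time, instead of re-running this check on every flush_files() call.
        if dist.is_available() and dist.is_initialized() and str(self.device).strip() == "cuda":
            self.device = f"cuda:{torch.cuda.current_device()}"

    def flush_files(self):
        # Load buffered safetensors shards onto the already-resolved self.device.
        ...
```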
Hello, I have some suggestions that you can take a look at:

```python
import torch.distributed as dist
from diffsynth.core.device.npu_compatible_device import get_device_name, IS_NPU_AVAILABLE, IS_CUDA_AVAILABLE

if dist.is_available() and dist.is_initialized() and (IS_CUDA_AVAILABLE or IS_NPU_AVAILABLE):
    device = get_device_name()
```
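If adopted, this would make the device resolution backend-agnostic: the same branch would cover Ascend NPU processes as well as CUDA ones instead of hard-coding the cuda: prefix, assuming get_device_name() returns the rank-local, already-indexed device string.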
Force-pushed from 25f2736 to 2e525dc
Force-pushed from 2e525dc to 0bbf574
@Feng0w0
@billcao2000
As mentioned in the following issue, the current code has a model-loading bug when running on multiple GPUs. When the script is launched with torchrun, the model-loading logic incorrectly places all model weights on GPU 0 (cuda:0). This concentrates the memory load on a single GPU and results in an OOM error.

#1075
I've modified the code in diffsynth/core/vram/disk_map.py. I updated the safetensors model-loading method within the DiskMap class to ensure proper GPU device mapping when loading the model for multi-GPU inference.
Root cause:
The safetensors library interprets the device 'cuda' as Device 0 (cuda:0) by default. Unlike native PyTorch functions, it does not automatically follow the current device context set by torch.cuda.set_device().
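A minimal sketch illustrating the reported behavior, assuming at least two visible CUDA devices and a hypothetical model.safetensors file:

```python
import torch
from safetensors.torch import load_file

torch.cuda.set_device(1)  # rank-local device, as a torchrun worker script would set it

# Native PyTorch follows the current device context:
x = torch.empty(1, device="cuda")
print(x.device)  # cuda:1

# safetensors resolves the bare string "cuda" to cuda:0, so every rank
# loads its weights onto GPU 0:
state = load_file("model.safetensors", device="cuda")
print(next(iter(state.values())).device)  # cuda:0

# Passing an explicitly indexed device avoids piling everything onto GPU 0:
state = load_file("model.safetensors", device=f"cuda:{torch.cuda.current_device()}")
```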