Skip to content

Conversation

@haoliuu
Copy link
Collaborator

@haoliuu haoliuu commented Dec 12, 2025

Summary

This PR adds file integrity verification for model downloads from both Hugging Face and GitHub repositories using SHA-256 hashing.

Changes

Core Verification Logic

  • Added SHA-256 hash verification after model files are downloaded
  • For Hugging Face: Extracts SHA-256 from the LFS oid field in the API response
  • For GitHub: Detects Git LFS files and parses SHA-256 from LFS pointer files
image image

Model Download Enhancements

  • Added Verifying status to the download progress states
  • Added VerificationFailed status when hash mismatch is detected
  • Users can choose to keep or delete the model if verification fails

UI Updates

  • Download progress list now shows verification status
  • Added dialog for handling verification failures with clear options:
    • Delete model (recommended)
    • Keep anyway (user's choice)
    • Cancel (leave as-is)

XamlRoot = this.XamlRoot
};

var result = await dialog.ShowAsync();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you close the app under this logic, an intermediate state issue should occur.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. I added cleanup logic to handle verification-failed models on app close.

}
},
PrimaryButtonText = "Delete model",
SecondaryButtonText = "Keep anyway",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the security rationale for allowing users to keep potentially
compromised models?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I didn't implement the "Keep" option, but I added it after considering that some users are developers who understand the risks. There are also legitimate scenarios where verification may fail (e.g., hash metadata out of sync on the source), and without this option, developers would have no way to use a model if verification consistently fails. The default action is "Delete" and keeping requires explicit user choice. Do you think we should not give developers any option to keep verification-failed files?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense. Since our target audience is developers, I agree we shouldn't block them completely.
To mitigate the risk, I suggest 2 improvements:
1.If user chooses to "keep anyway", its status should not simply change to Completed (which makes it look identical to a normal model). It will be better to display an “Unverified” label or a warning icon in the UI to remind users that this model carries potential risks.
2. The current warning msg is quite technical but doesn't clearly explain the security implications to users who may not fully understand what match the expected hash means. maybe we can consider like: "This could indicate file tampering or corruption. You will be using an unverified model that may behave unexpectedly"

if (fileInfo.Length != downloadableFile.Size)
{
// file did not download properly, should retry
throw new IOException($"File size mismatch for {downloadableFile.Name}: expected {downloadableFile.Size}, got {fileInfo.Length}");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this exception handled?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file size mismatch throws an IOException immediately, while SHA-256 verification failure allows user choice.
Should we handle size mismatches similarly (with user choice) or should both be treated as hard failures?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. The original code had an empty if block here. I'm trying to address this issue for the legacy code. Size imcomplete sounds more like a hard failure to me since the files will not be used. Any thoughts?

{
DownloadUrl = $"https://huggingface.co/{hfUrl.Organization}/{hfUrl.Repo}/resolve/{hfUrl.Ref}/{f.Path}",
Size = f.Size,
Size = f.Lfs?.Size ?? f.Size,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LFS size is always reliable and available? this will not break existing downloads where Lfs is null?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. The ?? operator should handle this: if Lfs is null (non-LFS files), it falls back to f.Size, which is the original logic. This change here is for some of my local HF validation logic which has been removed. We can revert it back. Let me clean it up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants