
Hugging Face models produce non-deterministic OCI digests #669

Description

@ilopezluna

Problem

Follow-up to #647 and #654.
PR #654 fixed the time.Now() usage in the config's Created timestamp, but pulling the same HuggingFace model at different times still produces different OCI digests (MODEL IDs).

> MODEL_RUNNER_HOST=http://localhost:13434 docker model inspect 01369020037d
{
    "id": "sha256:01369020037dae3876b603a977d9352102d1d0820ae71588745cf053b018fbd2",
    "tags": [
        "huggingface.co/mlx-community/llama-3.2-1b-instruct-4bit:latest"
    ],
    "created": 1741144511,
    "config": {
        "format": "safetensors",
        "quantization": "mixed",
        "parameters": "193.15M",
        "size": "695.28MB",
        "safetensors": {
            "format": "mlx",
            "tensor_count": "372"
        }
    }
}

> MODEL_RUNNER_HOST=http://localhost:13434 docker model rm hf.co/mlx-community/llama-3.2-1b-instruct-4bit
Untagged: huggingface.co/mlx-community/llama-3.2-1b-instruct-4bit:latest
Deleted: sha256:01369020037dae3876b603a977d9352102d1d0820ae71588745cf053b018fbd2

> MODEL_RUNNER_HOST=http://localhost:13434 docker model pull hf.co/mlx-community/llama-3.2-1b-instruct-4bit
c07a9be597b7: Pull complete [==================================================>]     296B/296B
0a05be823898: Pull complete [==================================================>]  26.16kB/26.16kB
94d5c6ba8f6b: Pull complete [==================================================>]  54.56kB/54.56kB
587cb980af76: Pull complete [==================================================>]  1.121kB/1.121kB
b33563055168: Pull complete [==================================================>]  16.31kB/16.31kB
9d75c1098fc5: Pull complete [==================================================>]  695.3MB/695.3MB
224931eb13f8: Pull complete [==================================================>]  17.21MB/17.21MB
022d5ae3df47: Pull complete [==================================================>]  54.56kB/54.56kB
35e396644bca: Pull complete [==================================================>]  695.3MB/695.3MB
93b691182b6d: Pull complete [==================================================>]  17.26MB/17.26MB
Model pulled successfully

> MODEL_RUNNER_HOST=http://localhost:13434 docker model ls
MODEL NAME                                               PARAMETERS  QUANTIZATION  ARCHITECTURE  MODEL ID      CREATED        CONTEXT  SIZE      
functiongemma-vllm                                       268.10M     BF16                        c50d480b5bbc  5 weeks ago             536.22MB  
huggingface.co/mlx-community/llama-3.2-1b-instruct-4bit  193.15M     mixed                       9195f07d1792  11 months ago           695.28MB  

> MODEL_RUNNER_HOST=http://localhost:13434 docker model inspect 9195f07d1792
{
    "id": "sha256:9195f07d17926446d305b12561559aadee2e733d6179d63e050bed87435a9fb2",
    "tags": [
        "huggingface.co/mlx-community/llama-3.2-1b-instruct-4bit:latest"
    ],
    "created": 1741144511,
    "config": {
        "format": "safetensors",
        "quantization": "mixed",
        "parameters": "193.15M",
        "size": "695.28MB",
        "safetensors": {
            "format": "mlx",
            "tensor_count": "372"
        }
    }
}

The created field is now correctly deterministic, but other sources of non-determinism remain.

Cause

Downloaded files are written to a temp directory via os.Create(), which gives them the current wall-clock time as their filesystem ModTime.
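
For illustration only (paths and names below are made up, not the repo's actual code): a file written with os.Create is stamped with the current wall-clock time as its ModTime, so file metadata differs between pulls even when the downloaded bytes are identical.

package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	// Simulate a downloaded blob landing in a temp directory during a pull.
	dir, err := os.MkdirTemp("", "hf-pull-*")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "model.safetensors")
	f, err := os.Create(path) // the OS stamps ModTime with the current time
	if err != nil {
		panic(err)
	}
	f.WriteString("identical bytes on every pull")
	f.Close()

	info, err := os.Stat(path)
	if err != nil {
		panic(err)
	}
	// Prints the wall-clock time of this run, so it changes on every pull
	// even though the content is byte-for-byte identical.
	fmt.Println("ModTime:", info.ModTime())
}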

Task

Use the repo's lastModified timestamp from the Hugging Face API instead of the file write time.
Investigate other non-deterministic fields; the goal is for the same model pulled from Hugging Face to always produce the same hash.
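
A possible direction, sketched below as a non-authoritative example (the endpoint https://huggingface.co/api/models/<repo>, the struct, and the function names are assumptions for illustration, not the repo's actual code; the lastModified field is the one referenced above): fetch the repo's lastModified and stamp it onto the downloaded files with os.Chtimes before packaging, so the resulting layers no longer depend on pull time.

package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"path/filepath"
	"time"
)

// modelInfo holds only the field we need from the Hub API response.
type modelInfo struct {
	LastModified time.Time `json:"lastModified"`
}

// repoLastModified queries the public Hub API for a repo's lastModified time.
func repoLastModified(repo string) (time.Time, error) {
	resp, err := http.Get("https://huggingface.co/api/models/" + repo)
	if err != nil {
		return time.Time{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return time.Time{}, fmt.Errorf("hub API returned %s", resp.Status)
	}
	var info modelInfo
	if err := json.NewDecoder(resp.Body).Decode(&info); err != nil {
		return time.Time{}, err
	}
	return info.LastModified, nil
}

// normalizeTimes stamps every downloaded file with the repo timestamp so the
// packaged content no longer depends on when the pull happened.
func normalizeTimes(dir string, ts time.Time) error {
	return filepath.WalkDir(dir, func(path string, d os.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return err
		}
		return os.Chtimes(path, ts, ts)
	})
}

func main() {
	ts, err := repoLastModified("mlx-community/llama-3.2-1b-instruct-4bit")
	if err != nil {
		panic(err)
	}

	// Stand-in for the real download directory, just for the demo.
	dir, err := os.MkdirTemp("", "hf-pull-*")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	blob := filepath.Join(dir, "model.safetensors")
	if err := os.WriteFile(blob, []byte("blob"), 0o644); err != nil {
		panic(err)
	}

	if err := normalizeTimes(dir, ts); err != nil {
		panic(err)
	}
	info, _ := os.Stat(blob)
	// Every pull now reports the repo's lastModified, not the pull time.
	fmt.Println("ModTime:", info.ModTime())
}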
