Conversation

@tkocmathla

Fixes #4210.

@sjarus requested a review from zjgarvey, December 9, 2025 17:47
Collaborator

@zjgarvey left a comment

Nice! Thanks for finding this. It does seem strange that we would be checking for an explicit args.temp_dir for the external data.

I have a few comments. Most importantly, I think we should find a way to test without needing to generate a 2GB random tensor.

mlir_file = run_path / f"{model_name}-explicit_temp_implicit_data.torch.mlir"
onnx.save(onnx_model, model_file)
temp_dir = run_path / "temp"
temp_dir.mkdir(exist_ok=True)
Collaborator

It may be good to use tempfile.TemporaryDirectory here, in case an exception is thrown before the temp files are cleaned up by the importer.
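For example, a minimal sketch of how the test setup might use it (run_path and the import_onnx_model call are placeholder names standing in for the test's own setup and the importer entry point, not code from this PR):

import tempfile
from pathlib import Path

# run_path, model_file, and import_onnx_model are illustrative names only.
with tempfile.TemporaryDirectory(dir=run_path) as tmp:
    temp_dir = Path(tmp)
    import_onnx_model(model_file, temp_dir=temp_dir)
    # The directory and its contents are removed when the block exits,
    # even if the importer raises before its own cleanup runs.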

Comment on lines +94 to +97
byte_size = numpy.dtype(dtype).itemsize
tensor_size = onnx.checker.MAXIMUM_PROTOBUF // byte_size + 1
large_tensor = numpy.random.rand(tensor_size).astype(dtype)
assert large_tensor.nbytes > onnx.checker.MAXIMUM_PROTOBUF
Collaborator

It does seem rather unfortunate that we need to make the CI create a 2GB tensor just to test this.

How long does this test take to run? If it takes more than a minute, I think we should refactor the load_onnx_model code so we can test the file-based shape inference directly.

Collaborator

Maybe pulling out the file-based shape inference into a utility is just a good idea in general. https://github.com/llvm/torch-mlir/pull/4375/files#r2608194510

Collaborator

Alternatively, we could try to mirror what is currently being done in IREE:

https://github.com/iree-org/iree/blob/d0dd8893758dcf40558a57916b869ba0babc0a95/compiler/bindings/python/iree/compiler/tools/import_onnx/__main__.py#L86

Ignore the params (this is primarily the reason the importer code has diverged in IREE, but there are some improvements which haven't found their way back to torch-mlir yet).

  # Load the temp file and the external data.
  inferred_model = onnx.load(temp_inferred_file, load_external_data=False)
- data_dir = Path(input_dir if args.temp_dir is None else args.data_dir)
+ data_dir = Path(input_dir if args.data_dir is None else args.data_dir)
Collaborator

I think this needs to be moved up above the condition on line 127 (if raw_model_modified), since the external data may end up being organized differently and saved to the temp_dir.

E.g.,

data_dir = Path(args.data_dir or input_dir)
if raw_model_modified:
    ...
    data_dir = Path(temp_dir)
...

Comment on lines 130 to 156
Collaborator

Let's hoist this into a util function so we can test in isolation without needing to generate a 2GB tensor.

E.g.,

def _file_based_shape_infer(
    model_path : Path,
    data_dir : Path,
    temp_dir : Path,
) -> onnx.ModelProto:
    """docstring"""
    ...
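For illustration only, a minimal sketch of one shape the utility could take, assuming onnx.shape_inference.infer_shapes_path and onnx.external_data_helper cover the file-based path; this is not the importer's actual code:

import onnx
from onnx import shape_inference
from onnx.external_data_helper import load_external_data_for_model
from pathlib import Path

def _file_based_shape_infer(
    model_path: Path,
    data_dir: Path,
    temp_dir: Path,
) -> onnx.ModelProto:
    """Run shape inference through files so the model never has to be
    serialized into a single >2GB in-memory protobuf."""
    inferred_path = temp_dir / "inferred.onnx"
    # Infer shapes directly from disk to disk.
    shape_inference.infer_shapes_path(str(model_path), str(inferred_path))
    # Load only the graph structure; tensor payloads stay in external files.
    model = onnx.load(str(inferred_path), load_external_data=False)
    load_external_data_for_model(model, str(data_dir))
    return model

A unit test could then exercise this on a tiny model whose initializers are stored as external data, without ever materializing a 2GB tensor.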



Successfully merging this pull request may close these issues.

ONNX importer fails when --temp-dir is set and --data-dir is not
