Thank you for your great work on InternAgent. The project demonstrates strong performance on reasoning benchmarks such as HLE, which is very impressive.
However, I could not find the evaluation code or scripts for running InternAgent on HLE (or similar reasoning benchmarks) in the repository. It would be very helpful if the team could share the corresponding evaluation pipeline, including the data preprocessing steps, prompt templates, and inference settings, to support reproducibility and further research.