Prepare inputs needed for downstream evaluation#74

Open
taha-yassine wants to merge 2 commits into major-update from dataset-with-search

@taha-yassine
Collaborator

This PR adds a script that builds the dataset required to run the benchmark on downstream tasks with OpenHands/benchmarks. The script takes as input the search results produced by CodeScout, as a .jsonl file.
It also adds a custom prompt template, to be used with the dataset, that inserts the search results into the user message.
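
For readers who want a feel for the dataset-building step, here is a minimal sketch of how search results might be joined onto benchmark instances. It is not the script from this PR: the function names and the `instance_id` and `search_results` field names are assumptions made for illustration, and the actual CodeScout output schema may differ.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Read a .jsonl file into a list of dicts, skipping blank lines."""
    rows = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                rows.append(json.loads(line))
    return rows


def write_jsonl(rows, path):
    """Write a list of dicts to a .jsonl file, one JSON object per line."""
    with Path(path).open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


def attach_search_results(instances, search_results,
                          key="instance_id", field="search_results"):
    """Return copies of `instances` with a search-results field added.

    `key` and `field` are hypothetical names; adjust them to the real schema.
    Instances without a matching search record get an empty list.
    """
    by_id = {record[key]: record for record in search_results}
    merged = []
    for instance in instances:
        row = dict(instance)  # shallow copy so the input is not mutated
        match = by_id.get(instance[key])
        row[field] = match.get(field, []) if match else []
        merged.append(row)
    return merged
```

Under these assumptions, the output of `attach_search_results` would be written with `write_jsonl` and the resulting file passed to `--dataset`, while the prompt template reads the added field when rendering the user message.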

Example command to run the benchmark:

uv run swebench-infer .llm_config/modal.json \
        --dataset path/to/dataset_with_search_results.jsonl \
        --split test \
        --max-iterations 100 \
        --workspace docker \
        --prompt-path path/to/default_with_search.j2

