
@PrecisEDAnon

This introduces a surrogate autotuner to OpenROAD (note: ORFS PR to follow). The idea is to use a simple model to estimate how parameters such as core utilization affect the PPA results, and then optimize over them. It runs very fast (at most 10 minutes of total optimization time) compared with the ORFS autotuner (24h+). AI-generated content follows.

OpenROAD PR: Optional surrogate-based optimizer (surrogate_optimize) for fast autotuning

This PR adds an optional, gated surrogate model + optimizer to OpenROAD to enable fast design-space exploration (autotuning) without running full place/CTS/route for every sample.

The integration is intended to be used either:

  • directly from OpenROAD Tcl (power users / experiments), or
  • via a companion ORFS (OpenROAD-flow-scripts) PR that provides Makefile targets, design-specific knob spaces, and end-to-end “tune + validate” plumbing (companion ORFS PR: TBD).

Summary of user-visible changes

When enabled, OpenROAD registers three Tcl commands:

  • surrogate_supported_features — returns a JSON array of supported objective/output names
  • surrogate_eval — evaluates the surrogate model for a single parameter point (JSON params)
  • surrogate_optimize — searches a JSON-defined knob space and writes a JSON summary of top candidates

These commands are not registered unless explicitly enabled (see “Gating”).

Implementation overview (for reviewers)

Main touched/added areas:

  • src/Surrogate.cc: surrogate model + JSON space parsing + optimizer + Tcl commands
  • src/OpenRoad.cc: compile/runtime gating + command registration
  • include/ord/Surrogate.hh: minimal public init entry point
  • CMakeLists.txt + src/CMakeLists.txt: ENABLE_SURROGATE build option (default OFF)

Motivation / why this belongs in OpenROAD

Autotuning is often limited by the cost of running full flows. The goal here is to make it practical to:

  • score many candidates quickly using a built-in analytic surrogate, then
  • validate only a small portfolio of promising/diverse candidates with full runs (in ORFS or another driver).

This PR provides the OpenROAD half: a fast, built-in evaluator + optimizer that can be driven from Tcl, with stable JSON I/O suitable for orchestration.

Gating (no impact unless explicitly opted in)

This PR is designed to be non-invasive and safe for upstream:

  1. Compile-time gate (CMake): ENABLE_SURROGATE (default OFF)
     cmake -S . -B build -D CMAKE_BUILD_TYPE=Release -D ENABLE_SURROGATE=ON
     cmake --build build -j
  2. Runtime gate (env var): OPENROAD_ENABLE_SURROGATE=1
     OPENROAD_ENABLE_SURROGATE=1 ./build/bin/openroad ...

If either gate is off, default OpenROAD behavior and Tcl command set are unchanged.
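
The way the two gates combine can be sketched as follows. This is a minimal illustration, not the code in src/OpenRoad.cc: the helper names `runtimeGateOpen` and `surrogateEnabled`, the assumption that the CMake option is surfaced as a preprocessor macro named `ENABLE_SURROGATE`, and the rule that only the exact value `1` opts in are all hypothetical.

```cpp
#include <cstdlib>
#include <string_view>

// Hypothetical helper: interprets the value of the runtime gate variable.
// Assumption: only the exact string "1" opts in.
inline bool runtimeGateOpen(const char* value)
{
  return value != nullptr && std::string_view(value) == "1";
}

// Hypothetical helper: true only if both gates are open.
// Assumption: the CMake option is exposed as the ENABLE_SURROGATE macro.
inline bool surrogateEnabled()
{
#ifdef ENABLE_SURROGATE
  return runtimeGateOpen(std::getenv("OPENROAD_ENABLE_SURROGATE"));
#else
  return false;  // compiled out: the Tcl commands are never registered
#endif
}
```

With a structure like this, command registration would be guarded by a single `if (surrogateEnabled())`, so a default build never reaches the registration code.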

surrogate_optimize (main entry point)

What it does

surrogate_optimize:

  • parses a knob “space” (JSON) with per-knob {type, minmax, step},
  • samples candidate knob settings (multi-threaded),
  • evaluates each candidate using the built-in surrogate,
  • returns and writes a JSON summary including the best point and a top list.

The surrogate produces (at least) predictions for:

  • effective_clock_period
  • routed_wirelength
  • area
  • instance_area
  • power

It also produces internal diagnostic features (e.g. surrogate_fail_risk) that can be included in output for debugging.

Tcl usage (example)

# After loading a design (dbBlock must exist):
#
#   read_lef ...
#   read_def ...
#   read_sdc ...
#
# Provide a knob space JSON file (see schema below):

set res [surrogate_optimize \
  -space_file /path/to/surrogate_space.json \
  -objective effective_clock_period \
  -minimize \
  -samples 500000 \
  -top_n 50 \
  -threads 16 \
  -time_budget_s 600 \
  -output /tmp/surrogate_optimize.json \
  -include_features]
puts $res

Important options (CLI contract)

  • -space_file <path> or -space <json>
  • -objective <name> (see surrogate_supported_features)
  • -minimize / -maximize (default: minimize)
  • -samples <N> (required; N>0)
  • -top_n <K> (keeps the best K candidates; K>=1)
  • -threads <N> (default: hardware concurrency; clamped to 1..256)
  • -time_budget_s <seconds> (optional wall-clock cap; acts as an early stop)
  • -base_params_file <path> (optional: baseline/starting knob values)
  • -freeze <csv> (optional: do not vary these knobs; e.g. clock_period)
  • -calibrate_ws_file <path> and/or -calibrate_wl_file <path> (optional; see below)
  • -multi_fidelity + -shrink <0..1> (optional refinement strategy)
  • -portfolio + -portfolio_shrink <0..1> (optional “multi-island” sampling strategy)
  • -format simple|json (simple prints best_objective=..., json prints full JSON)
  • -output <path> (required)
  • -include_features (include diagnostic feature values for the best/top entries)

Unknown args are ignored with a warning (to keep wrappers forwards-compatible).

Space file schema (JSON)

The knob space is a JSON object mapping knob name → spec:

{
  "core_utilization":  { "type": "int",   "minmax": [20, 99],   "step": 1 },
  "core_aspect_ratio": { "type": "float", "minmax": [0.9, 1.1], "step": 0 },
  "enable_dpo":        { "type": "binary","minmax": [0, 1],     "step": 1 }
}

Rules:

  • type is one of: float, int, binary
  • minmax: [min, max] is required
  • step is optional:
    • if step > 0, values are sampled on min + k*step
    • if step == 0 or omitted, values are sampled uniformly over [min, max]
  • binary samples {0, 1} (the minmax/step fields are required but effectively ignored)
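
The sampling rules above can be sketched as a single function. This is an illustrative sketch, not the code in src/Surrogate.cc: `KnobSpec` and `sampleKnob` are hypothetical names, and rounding stepless int knobs to the nearest integer is an assumption the rules do not state.

```cpp
#include <cmath>
#include <random>

// Hypothetical mirror of one entry in the space file.
struct KnobSpec
{
  enum class Type { Float, Int, Binary } type;
  double min;
  double max;
  double step;  // 0 means "no grid"
};

// Draw one value for a knob according to the rules listed above.
inline double sampleKnob(const KnobSpec& spec, std::mt19937_64& rng)
{
  if (spec.type == KnobSpec::Type::Binary) {
    // minmax/step are ignored for binary knobs.
    return std::uniform_int_distribution<int>(0, 1)(rng);
  }
  if (spec.max <= spec.min) {
    return spec.min;  // degenerate range: nothing to sample
  }
  if (spec.step > 0.0) {
    // Grid sampling: min + k*step with k in [0, floor((max - min) / step)].
    const auto k_max = static_cast<long long>(
        std::floor((spec.max - spec.min) / spec.step + 1e-9));
    const long long k
        = std::uniform_int_distribution<long long>(0, k_max)(rng);
    return spec.min + static_cast<double>(k) * spec.step;
  }
  // Continuous sampling over [min, max).
  const double v
      = std::uniform_real_distribution<double>(spec.min, spec.max)(rng);
  // Assumption: int knobs without a step are rounded to the nearest integer.
  return spec.type == KnobSpec::Type::Int ? std::round(v) : v;
}
```

For the `core_utilization` entry in the example space, this yields integers between 20 and 99; for `core_aspect_ratio` it yields a real value between 0.9 and 1.1.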

Supported knob names (current list):

  • clock_period
  • core_utilization, core_aspect_ratio, tns_end_percent
  • global_padding, detail_padding, place_density, enable_dpo
  • pin_layer_adjust, above_layer_adjust, density_margin_addon
  • cts_cluster_size, cts_cluster_diameter

Unknown knob names are ignored (no hard error).

Calibration inputs (optional but recommended)

To make surrogate predictions more comparable to a real baseline point, surrogate_optimize can ingest baseline metrics and calibrate built-in scaling factors via:

  • -calibrate_ws_file <6_report.json> (ORFS finish metrics; uses timing + power + area fields)
  • -calibrate_wl_file <5_2_route.json> (ORFS route metrics; uses detailedroute__route__wirelength)

This calibrates internal length/timing scales with a small log-space search and also provides baseline anchors for certain derived quantities (e.g. the power model).

Calibration overrides can also be provided via environment variables:

  • SURROGATE_BUILTIN_LENGTH_SCALE
  • SURROGATE_BUILTIN_TIMING_SCALE
  • SURROGATE_BUILTIN_REF_CLOCK_USER

Output JSON (stable artifact)

surrogate_optimize writes a JSON object like:

  • objective metadata (objective name, sense, samples, threads, etc.)
  • best_objective
  • best_params
  • best_outputs
  • optional best_features (with -include_features)
  • top: array of {objective, params, outputs, features?}

This JSON is designed to be consumed by external orchestration (e.g. an ORFS “tune + validate” wrapper).

surrogate_eval (single-point evaluation)

For debugging or analysis, surrogate_eval evaluates one parameter set:

set res [surrogate_eval -params_file /path/to/params.json -include_features]
puts $res

Empirical results (from a prototype ORFS driver)

In a prototype ORFS integration (branch orfs-surrogate-rebased) using this OpenROAD feature, on-disk runs with a 600 s surrogate search followed by validation of K=14 candidates across {asap7,nangate45,sky130hd} × {aes,ibex,jpeg} showed:

  • routed_wirelength: median gain 3.35% (p25 0.96%, p75 7.37%), best 15.78%
  • effective_clock_period: median gain 2.99% (p25 1.61%, p75 4.71%), best 11.58%

These numbers are primarily to demonstrate usefulness; end-to-end gains depend on the driver, validation budget, and knob space.

Testing

  • Build coverage: compiles cleanly with -D ENABLE_SURROGATE=ON and keeps default builds unchanged with the option off.
  • Runtime sanity: when enabled, surrogate_supported_features returns the expected list; surrogate_eval and surrogate_optimize run on real designs via the ORFS prototype driver.

Follow-ups / companion work

  • ORFS companion PR (TBD): adds Makefile targets and an end-to-end “tune + validate” workflow and documentation; this OpenROAD PR is self-contained without it.

Minimal enablement of the surrogate autotuner (Surrogate module + build integration).
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a surrogate autotuner to OpenROAD, which is a significant new feature for fast design-space exploration. The implementation is extensive, primarily in the new src/Surrogate.cc file. The code is well-structured, using modern C++ features and good practices for multithreading and numerical computation.

My review focuses on improving code quality, readability, and maintainability of the new surrogate model implementation. I've identified a few areas for improvement:

  • Replacing a custom clamp function with the standard std::clamp.
  • Simplifying a redundant conditional block.
  • Refactoring a very long function into smaller, more manageable pieces.
  • Using named constants instead of magic numbers to improve clarity.
  • Removing a redundant check in the result sorting logic.

Overall, this is a high-quality contribution. The suggested changes are aimed at making this complex new feature even more robust and easier to maintain in the future.

Comment on lines +54 to +57
T clamp(const T& v, const T& lo, const T& hi)
{
  return std::min(hi, std::max(lo, v));
}
Contributor

medium

This clamp function can be replaced with std::clamp from the <algorithm> header, which is standard since C++17. Using the standard library function is preferred for conciseness and clarity. You could remove this helper and use std::clamp at call sites directly.
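
A short sketch of the swap, for comparison. `legacyClamp` is a renamed copy of the helper quoted above (with its template header, which the quoted lines do not show, assumed to be `template <typename T>`), and `clampScale` is a hypothetical call site, not a function from the PR.

```cpp
#include <algorithm>

// Copy of the helper under review, renamed so it cannot be confused
// with std::clamp. The template header is assumed.
template <typename T>
T legacyClamp(const T& v, const T& lo, const T& hi)
{
  return std::min(hi, std::max(lo, v));
}

// Hypothetical call site using the standard function (C++17).
// Precondition: min_scale <= max_scale, otherwise std::clamp is undefined.
inline double clampScale(double scale, double min_scale, double max_scale)
{
  return std::clamp(scale, min_scale, max_scale);
}
```

For ordinary inputs with `lo <= hi` the two agree. One difference worth checking before replacing call sites: for a NaN input the helper returns `lo`, whereas `std::clamp` returns the NaN unchanged.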

Member

You should probably prompt it to use C++20 style.

Comment on lines +176 to +182
} else if (master->isCore()) {
  s.total_core_area_um2 += area_um2;
  s.num_core_insts++;
} else {
  s.total_core_area_um2 += area_um2;
  s.num_core_insts++;
}
Contributor

medium

The else if (master->isCore()) block and the final else block have identical code. This is redundant and can be simplified by merging them.

    } else {
      s.total_core_area_um2 += area_um2;
      s.num_core_insts++;
    }

return clamp(std::exp(0.5 * (a + b)), min_scale, max_scale);
}

SimOut simulateOnce(const ModelContext& ctx, const Knobs& k, const int fidelity)
Contributor

medium

The function simulateOnce is very long (over 400 lines), which makes it difficult to read, understand, and maintain. Consider refactoring it by extracting logical sections into smaller, well-named helper functions. For example, you could create separate functions for:

  • Placement and routing proxy estimation (lines 1072-1170)
  • Timing proxy estimation (lines 1171-1322)
  • Final PPA calculations including power (lines 1324-1429)

This will improve modularity and readability.

Comment on lines +1073 to +1075
double k_place = 0.35;
k_place *= 1.0 + 0.25 * std::max(0.0, util_target - 0.60);
k_place *= 1.0 + 0.04 * pad_sum;
Contributor

medium

The simulateOnce function contains many "magic numbers" (unnamed numerical constants). While this is common in empirical models, giving them meaningful names via constexpr can greatly improve readability and maintainability. For example, here and in subsequent lines, constants like 0.35, 0.25, 0.60, 0.04 could be defined with names that explain their purpose.
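
As a sketch of what this could look like for the three quoted lines: the constant names and the helper `placementCoeff` below are hypothetical (the PR does not say what each literal represents), while the values and the arithmetic are copied from the quoted code.

```cpp
#include <algorithm>

// Hypothetical names; the values are the literals from the quoted lines.
constexpr double kPlaceBaseCoeff = 0.35;  // starting value of k_place
constexpr double kPlaceUtilSlope = 0.25;  // growth per unit of excess utilization
constexpr double kPlaceUtilKnee = 0.60;   // utilization above which growth starts
constexpr double kPlacePadSlope = 0.04;   // growth per unit of pad_sum

// Same computation as the quoted lines, with the literals named.
inline double placementCoeff(double util_target, double pad_sum)
{
  double k_place = kPlaceBaseCoeff;
  k_place
      *= 1.0 + kPlaceUtilSlope * std::max(0.0, util_target - kPlaceUtilKnee);
  k_place *= 1.0 + kPlacePadSlope * pad_sum;
  return k_place;
}
```

Grouping such constants near the top of the file also gives calibration work a single place to look when the empirical model is retuned.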

}
return;
}
if (!vec.empty() && better(s.obj, vec.back().obj)) {
Contributor

medium

The check !vec.empty() is redundant here. The preceding if block (lines 1957-1965) handles cases where vec.size() < args.top_n. Since args.top_n is guaranteed to be at least 1, if execution reaches this line, vec will be full and therefore not empty. You can safely remove this check.

      if (better(s.obj, vec.back().obj)) {
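
The invariant behind this suggestion can be seen in a small bounded top-N keeper. This is a sketch under assumptions, not the PR's code: `Scored` and `keepTopN` are hypothetical, `better` is assumed to mean "smaller objective wins" (minimize mode), and the list is assumed to be kept sorted best-first.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical candidate record; only the objective matters here.
struct Scored
{
  double obj;
};

// Assumption: minimize mode, so a smaller objective is better.
inline bool better(double a, double b)
{
  return a < b;
}

// Keep vec sorted best-first and at most top_n long. Requires top_n >= 1.
inline void keepTopN(std::vector<Scored>& vec,
                     const Scored& s,
                     std::size_t top_n)
{
  auto by_obj
      = [](const Scored& a, const Scored& b) { return better(a.obj, b.obj); };
  if (vec.size() < top_n) {
    vec.insert(std::upper_bound(vec.begin(), vec.end(), s, by_obj), s);
    return;
  }
  // Here vec.size() == top_n >= 1, so vec.back() is valid without an
  // explicit empty() check.
  if (better(s.obj, vec.back().obj)) {
    vec.pop_back();
    vec.insert(std::upper_bound(vec.begin(), vec.end(), s, by_obj), s);
  }
}
```

The early return in the first branch is what guarantees a full, non-empty vector at the second comparison.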

@oharboe
Collaborator

oharboe commented Jan 12, 2026

Information in pull requests is "lost" to the "community memory" as soon as the PR is merged.

Request: create a README.md documentation explaining how to use this for permanent "community memory".

I'm skeptical of monolithic DSE (design space exploration) in OpenROAD. I'd rather see OpenROAD enabling the user's choice of DSE setup than have OpenROAD be responsible for running the DSE (which is a recipe for framework inversion problems).

But perhaps this PR enables something that I think would be useful and a nice separation of concerns: a fast scan to find ranges of values that are worth exploring in DSE?

I thought ORFS/OpenROAD (and EDA tools in general) had a long tail, meaning there's a substantial amount of quality of results to be found after extensive searches, and that there's no way around the "24 hour exploration times" for finding the best parameters for a design, as early flow choices can have big impacts on the final result. For instance, an increase in placement density could cause macro placement to flip from one configuration to another, yielding very different results.

I think it is worth reading about the nature of the variables that the user has to set as it is explained in ORFS documentation: https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts/blob/master/docs/user/FlowVariables.md#types-of-variables

Also there was some discussion along these lines in The-OpenROAD-Project/OpenROAD-flow-scripts#3738

@MrAMS

MrAMS commented Jan 12, 2026

Interesting work 👍

@maliberty
Member

Fwiw I think the Gemini comments are useful here.

Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

#include <stdexcept>
#include <string>
#include <system_error>
#include <thread>
Contributor

warning: included header system_error is not used directly [misc-include-cleaner]

Suggested change
#include <thread>
#include <thread>

#include <system_error>
#include <thread>
#include <unordered_set>
#include <utility>
Contributor

warning: included header unordered_set is not used directly [misc-include-cleaner]

Suggested change
#include <utility>
#include <utility>

#include <unordered_set>
#include <utility>
#include <vector>

Contributor

warning: included header vector is not used directly [misc-include-cleaner]

Suggested change

#include <utility>
#include <vector>

#include "db_sta/dbSta.hh"
Contributor

warning: 'db_sta/dbSta.hh' file not found [clang-diagnostic-error]

#include "db_sta/dbSta.hh"
         ^

#include "db_sta/dbSta.hh"
#include "odb/db.h"
#include "odb/dbTypes.h"
#include "ord/OpenRoad.hh"
Contributor

warning: included header dbTypes.h is not used directly [misc-include-cleaner]

Suggested change
#include "ord/OpenRoad.hh"
#include "ord/OpenRoad.hh"

@maliberty
Member

@oharboe I think the idea is to give a good candidate, not an optimal one, very quickly. It could be a seed for AT/sweep or just a quick and easy gain. Is that useful to you assuming it works well at that goal?

This is a more interesting idea than the other PRs, but its actual value in practice is unclear. The UI is a bit ugly and it does need permanent documentation.

@MrAMS

MrAMS commented Jan 13, 2026

@maliberty I have developed a parallelized DSE framework based on ORFS that explores various clock frequencies and Chisel parameters (source: MrAMS/bazel-chisel-verilator-openroad-demo/tree/dse-parallel-trials/eda/dse). By leveraging Bazel for parallel execution, I've already achieved an order-of-magnitude speedup.

Regarding early pruning: I previously discussed this via email with @oharboe and ran some experiments. However, we found that, mathematically, "early pruning" is inherently difficult to apply to multi-objective optimization problems. Most existing optimization frameworks do not support this directly, as they typically handle multi-objective problems by scalarizing them into single-objective ones first.
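
For readers unfamiliar with the term, scalarization in its simplest form is a weighted sum of normalized objectives. The sketch below is illustrative only and is not taken from any framework mentioned in this thread: the function name `scalarize`, the normalization against a baseline run, and the weights are all assumptions.

```cpp
#include <cstddef>
#include <vector>

// Weighted-sum scalarization: collapse several objectives (all to be
// minimized) into one score. Each value is divided by its baseline so
// that objectives with different units become comparable.
// Preconditions: all three vectors have the same length and no baseline
// is zero.
inline double scalarize(const std::vector<double>& values,
                        const std::vector<double>& baselines,
                        const std::vector<double>& weights)
{
  double score = 0.0;
  for (std::size_t i = 0; i < values.size(); ++i) {
    score += weights[i] * (values[i] / baselines[i]);
  }
  return score;
}
```

A candidate equal to the baseline scores the sum of the weights, and lower is better. Because the ranking depends on the chosen weights, a candidate pruned early under one weighting might have been kept under another, which is one way to see the difficulty described above.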

I would love to discuss how we might adapt candidate screening algorithms to effectively handle multi-objective optimization in this context.

@maliberty
Member

@luarss FYI

The OR autotuner is built on RayTune which has a variety of search algorithms with different qualities. Usually you need a single score to optimize though you could report multiple metrics. What did you have in mind for "multi-objective optimization"?

@oharboe
Collaborator

oharboe commented Jan 13, 2026

There are many different mathematical models, only some of which are available in Ray and Optuna. I do believe that there is some opportunity for a more specialized flow for scoping the DSE parameters and providing an initial estimate of the landscape to a downstream full flow search. The landscape has discontinuities that are not going to be fully mapped out by any approximation, and the discontinuities are going to be more significant the higher the utilization is, so ultimately a full flow with final variables will have to be run. In our case, this includes architectural parameters (pre-synthesis).

This is a hilly landscape, with cliffs... Tricky.
