OpenROAD surrogate autotuner #9236
Minimal enablement of the surrogate autotuner (Surrogate module + build integration).
Code Review
This pull request introduces a surrogate autotuner to OpenROAD, which is a significant new feature for fast design-space exploration. The implementation is extensive, primarily in the new src/Surrogate.cc file. The code is well-structured, using modern C++ features and good practices for multithreading and numerical computation.
My review focuses on improving code quality, readability, and maintainability of the new surrogate model implementation. I've identified a few areas for improvement:
- Replacing a custom clamp function with the standard std::clamp.
- Simplifying a redundant conditional block.
- Refactoring a very long function into smaller, more manageable pieces.
- Using named constants instead of magic numbers to improve clarity.
- Removing a redundant check in the result sorting logic.
Overall, this is a high-quality contribution. The suggested changes are aimed at making this complex new feature even more robust and easier to maintain in the future.
T clamp(const T& v, const T& lo, const T& hi)
{
  return std::min(hi, std::max(lo, v));
}
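A minimal sketch of the replacement the review summary proposes, assuming the project builds as C++17 or later (where std::clamp is available in <algorithm>). The names clampCustom and clampStd are hypothetical and exist only to place the two versions side by side.

```cpp
#include <algorithm>  // std::clamp (C++17), std::min, std::max

// The hand-rolled helper from the diff, renamed here (hypothetical name)
// so it does not collide with std::clamp.
template <typename T>
T clampCustom(const T& v, const T& lo, const T& hi)
{
  return std::min(hi, std::max(lo, v));
}

// Drop-in replacement built on the standard library.
template <typename T>
T clampStd(const T& v, const T& lo, const T& hi)
{
  return std::clamp(v, lo, hi);
}
```

For lo <= hi the two agree. One difference worth checking before swapping: std::clamp has undefined behavior when lo > hi, whereas the hand-rolled version simply returns hi in that case.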
You should probably prompt it to use C++20 style.
} else if (master->isCore()) {
  s.total_core_area_um2 += area_um2;
  s.num_core_insts++;
} else {
  s.total_core_area_um2 += area_um2;
  s.num_core_insts++;
}
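The two trailing branches above have identical bodies, so they can be merged. A minimal sketch of the simplified form follows; the Stats struct, the macro fields, and the is_macro flag are hypothetical stand-ins, since the diff does not show the first branch of the conditional.

```cpp
// Hypothetical stand-in for the statistics struct used in the diff.
struct Stats
{
  double total_macro_area_um2 = 0.0;  // assumed field, not shown in the diff
  int num_macros = 0;                 // assumed field, not shown in the diff
  double total_core_area_um2 = 0.0;
  int num_core_insts = 0;
};

// is_macro stands in for whatever the first (unshown) branch tests.
inline void accumulate(Stats& s, bool is_macro, double area_um2)
{
  if (is_macro) {
    s.total_macro_area_um2 += area_um2;
    s.num_macros++;
  } else {
    // The isCore() and fallback branches did the same work,
    // so a single else covers both.
    s.total_core_area_um2 += area_um2;
    s.num_core_insts++;
  }
}
```

If the isCore() branch is expected to diverge later, keeping it with a comment is a reasonable alternative to merging.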
  return clamp(std::exp(0.5 * (a + b)), min_scale, max_scale);
}

SimOut simulateOnce(const ModelContext& ctx, const Knobs& k, const int fidelity)
The function simulateOnce is very long (over 400 lines), which makes it difficult to read, understand, and maintain. Consider refactoring it by extracting logical sections into smaller, well-named helper functions. For example, you could create separate functions for:
- Placement and routing proxy estimation (lines 1072-1170)
- Timing proxy estimation (lines 1171-1322)
- Final PPA calculations including power (lines 1324-1429)
This will improve modularity and readability.
double k_place = 0.35;
k_place *= 1.0 + 0.25 * std::max(0.0, util_target - 0.60);
k_place *= 1.0 + 0.04 * pad_sum;
The simulateOnce function contains many "magic numbers" (unnamed numerical constants). While this is common in empirical models, giving them meaningful names via constexpr can greatly improve readability and maintainability. For example, here and in subsequent lines, constants like 0.35, 0.25, 0.60, 0.04 could be defined with names that explain their purpose.
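As a sketch of what this could look like for the three quoted lines: the numeric values come from the diff, but the constant names and the placeFactor wrapper are guesses at intent and would need to be confirmed by the author.

```cpp
#include <algorithm>  // std::max

// Names are illustrative guesses; only the values are taken from the diff.
constexpr double kPlaceBase = 0.35;       // baseline placement coefficient
constexpr double kPlaceUtilSlope = 0.25;  // growth per unit of utilization above the knee
constexpr double kPlaceUtilKnee = 0.60;   // utilization where the penalty starts
constexpr double kPlacePadSlope = 0.04;   // growth per unit of total padding

// Hypothetical wrapper around the quoted computation.
inline double placeFactor(double util_target, double pad_sum)
{
  double k_place = kPlaceBase;
  k_place *= 1.0 + kPlaceUtilSlope * std::max(0.0, util_target - kPlaceUtilKnee);
  k_place *= 1.0 + kPlacePadSlope * pad_sum;
  return k_place;
}
```

Grouping such constants near the top of the file also gives calibration work a single place to look.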
    }
    return;
  }
  if (!vec.empty() && better(s.obj, vec.back().obj)) {
The check !vec.empty() is redundant here. The preceding if block (lines 1957-1965) handles cases where vec.size() < args.top_n. Since args.top_n is guaranteed to be at least 1, if execution reaches this line, vec will be full and therefore not empty. You can safely remove this check.
if (better(s.obj, vec.back().obj)) {
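To show why the check is safe to drop, here is a minimal sketch of a bounded top-N insert. It is not the PR's code: Sample, better, and keepTopN are hypothetical, minimization is assumed, and the vector is assumed to be kept sorted best-first.

```cpp
#include <algorithm>  // std::upper_bound
#include <cstddef>
#include <vector>

struct Sample  // hypothetical: only the objective value is modeled
{
  double obj = 0.0;
};

// Assumes minimization; the PR also supports -maximize.
inline bool better(double a, double b)
{
  return a < b;
}

// Keeps vec sorted best-first with at most top_n entries. Requires top_n >= 1.
inline void keepTopN(std::vector<Sample>& vec, const Sample& s, std::size_t top_n)
{
  auto by_obj = [](const Sample& a, const Sample& b) { return better(a.obj, b.obj); };
  if (vec.size() < top_n) {
    vec.insert(std::upper_bound(vec.begin(), vec.end(), s, by_obj), s);
    return;
  }
  // Here vec.size() >= top_n >= 1, so vec.back() is valid without an empty() check.
  if (better(s.obj, vec.back().obj)) {
    vec.pop_back();
    vec.insert(std::upper_bound(vec.begin(), vec.end(), s, by_obj), s);
  }
}
```

The invariant depends on top_n >= 1 being enforced at argument parsing, which matches the documented -top_n contract (K>=1).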
Information in pull requests is "lost" to the "community memory" as soon as the PR is merged. Request: create README.md documentation explaining how to use this, for permanent "community memory".
I'm skeptical of monolithic DSE (design space exploration) in OpenROAD. I'd rather see OpenROAD enable the user's choice of DSE setup than have OpenROAD be responsible for running the DSE (which is a recipe for framework inversion problems). But perhaps this PR enables something that I think would be useful and a nice separation of concerns: a fast scan to find ranges of values that are worth exploring in DSE?
I thought ORFS/OpenROAD (and EDA tools in general) had a long tail, meaning there's a substantial amount of quality of results to be found after extensive searches, and that there's no way around the "24 hour exploration times" for finding the best parameters for a design, as early flow choices can have big impacts on the final result. For instance, an increase in placement density could cause macro placement to flip from one configuration to another, yielding very different results.
I think it is worth reading about the nature of the variables that the user has to set, as explained in the ORFS documentation: https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts/blob/master/docs/user/FlowVariables.md#types-of-variables
Also, there was some discussion along these lines in The-OpenROAD-Project/OpenROAD-flow-scripts#3738
Interesting work 👍
Fwiw I think the Gemini comments are useful here.
clang-tidy made some suggestions
#include <stdexcept>
#include <string>
#include <system_error>
#include <thread>
warning: included header system_error is not used directly [misc-include-cleaner]
#include <system_error>
#include <thread>
#include <unordered_set>
#include <utility>
warning: included header unordered_set is not used directly [misc-include-cleaner]
#include <unordered_set>
#include <utility>
#include <vector>
warning: included header vector is not used directly [misc-include-cleaner]
#include <utility>
#include <vector>

#include "db_sta/dbSta.hh"
warning: 'db_sta/dbSta.hh' file not found [clang-diagnostic-error]
#include "db_sta/dbSta.hh"
         ^
#include "db_sta/dbSta.hh"
#include "odb/db.h"
#include "odb/dbTypes.h"
#include "ord/OpenRoad.hh"
warning: included header dbTypes.h is not used directly [misc-include-cleaner]
@oharboe I think the idea is to give a good candidate, not an optimal one, very quickly. It could be a seed for AT/sweep or just a quick and easy gain. Is that useful to you, assuming it works well at that goal? This is a more interesting idea than the other PRs, but its actual value in practice is unclear. The UI is a bit ugly and it does need permanent documentation.
@maliberty I have developed a parallelized DSE framework based on ORFS that explores various clock frequencies and Chisel parameters (source: MrAMS/bazel-chisel-verilator-openroad-demo/tree/dse-parallel-trials/eda/dse). By leveraging Bazel for parallel execution, I've already achieved an order-of-magnitude speedup.
Regarding early pruning: I previously discussed this via email with @oharboe and ran some experiments. However, we found that, mathematically, "early pruning" is inherently difficult to apply to multi-objective optimization problems. Most existing optimization frameworks do not support this directly, as they typically handle multi-objective problems by first scalarizing them into single-objective ones. I would love to discuss how we might adapt candidate screening algorithms to handle multi-objective optimization effectively in this context.
There are many different mathematical models, only some of which are available in Ray and Optuna. I do believe that there is some opportunity for a more specialized flow for scoping the DSE parameters and providing an initial estimate of the landscape to a downstream full-flow search. The landscape has discontinuities that are not going to be fully mapped out by any approximation, and the discontinuities become more significant the higher the utilization is, so ultimately a full flow with final variables will have to be run. In our case, this includes architectural (pre-synthesis) parameters. This is a hilly landscape, with cliffs... Tricky.
This introduces a surrogate autotuner to OpenROAD (note: ORFS PR to follow). The idea of the surrogate autotuner is to use a simple model to see how parameters like core utilization affect the PPA results, and then optimize them. It runs very fast (at most 10 minutes of total optimization time) compared with the ORFS autotuner (24h+). AI-generated content follows.
OpenROAD PR: Optional surrogate-based optimizer (surrogate_optimize) for fast autotuning
This PR adds an optional, gated surrogate model + optimizer to OpenROAD to enable fast design-space exploration (autotuning) without running full place/CTS/route for every sample.
The integration is intended to be used either:
Summary of user-visible changes
When enabled, OpenROAD registers three Tcl commands:
- surrogate_supported_features: returns a JSON array of supported objective/output names
- surrogate_eval: evaluates the surrogate model for a single parameter point (JSON params)
- surrogate_optimize: searches a JSON-defined knob space and writes a JSON summary of top candidates
These commands are not registered unless explicitly enabled (see “Gating”).
Implementation overview (for reviewers)
Main touched/added areas:
- src/Surrogate.cc: surrogate model + JSON space parsing + optimizer + Tcl commands
- src/OpenRoad.cc: compile/runtime gating + command registration
- include/ord/Surrogate.hh: minimal public init entry point
- CMakeLists.txt + src/CMakeLists.txt: ENABLE_SURROGATE build option (default OFF)
Motivation / why this belongs in OpenROAD
Autotuning is often limited by the cost of running full flows. The goal here is to make it practical to:
This PR provides the OpenROAD half: a fast, built-in evaluator + optimizer that can be driven from Tcl, with stable JSON I/O suitable for orchestration.
Gating (no impact unless explicitly opted in)
This PR is designed to be non-invasive and safe for upstream:
- Build-time gate: ENABLE_SURROGATE (default OFF)
    cmake -S . -B build -D CMAKE_BUILD_TYPE=Release -D ENABLE_SURROGATE=ON
    cmake --build build -j
- Runtime gate: OPENROAD_ENABLE_SURROGATE=1
If either gate is off, default OpenROAD behavior and Tcl command set are unchanged.
surrogate_optimize (main entry point)
What it does
surrogate_optimize: {type, minmax, step}, top list.
The surrogate produces (at least) predictions for:
- effective_clock_period
- routed_wirelength
- area
- instance_area
- power
It also produces internal diagnostic features (e.g. surrogate_fail_risk) that can be included in output for debugging.
Tcl usage (example)
Important options (CLI contract)
- -space_file <path> or -space <json>
- -objective <name> (see surrogate_supported_features)
- -minimize / -maximize (default: minimize)
- -samples <N> (required; N>0)
- -top_n <K> (keeps the best K candidates; K>=1)
- -threads <N> (default: hardware concurrency; clamped to 1..256)
- -time_budget_s <seconds> (optional wall-clock cap; acts as an early stop)
- -base_params_file <path> (optional: baseline/starting knob values)
- -freeze <csv> (optional: do not vary these knobs; e.g. clock_period)
- -calibrate_ws_file <path> and/or -calibrate_wl_file <path> (optional; see below)
- -multi_fidelity + -shrink <0..1> (optional refinement strategy)
- -portfolio + -portfolio_shrink <0..1> (optional “multi-island” sampling strategy)
- -format simple|json (simple prints best_objective=..., json prints full JSON)
- -output <path> (required)
- -include_features (include diagnostic feature values for the best/top entries)
Unknown args are ignored with a warning (to keep wrappers forwards-compatible).
Space file schema (JSON)
The knob space is a JSON object mapping knob name → spec:
    {
      "core_utilization":  { "type": "int",    "minmax": [20, 99],   "step": 1 },
      "core_aspect_ratio": { "type": "float",  "minmax": [0.9, 1.1], "step": 0 },
      "enable_dpo":        { "type": "binary", "minmax": [0, 1],     "step": 1 }
    }
Rules:
- type is one of: float, int, binary
- minmax: [min, max] is required
- step is optional:
  - if step > 0, values are sampled on min + k*step
  - if step == 0 or omitted, values are sampled uniformly over [min, max]
- binary samples {0, 1} (the minmax/step fields are required but effectively ignored)
Supported knob names (current list):
- clock_period
- core_utilization, core_aspect_ratio, tns_end_percent
- global_padding, detail_padding, place_density, enable_dpo
- pin_layer_adjust, above_layer_adjust, density_margin_addon
- cts_cluster_size, cts_cluster_diameter
Unknown knob names are ignored (no hard error).
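The sampling rules in the schema above can be sketched as follows. This is an illustration of the stated rules, not the code in src/Surrogate.cc: KnobSpec and sampleKnob are hypothetical names, and rounding int knobs when step is 0 is an assumption the description does not cover.

```cpp
#include <cmath>   // std::floor, std::round
#include <random>  // std::mt19937, uniform distributions
#include <string>

// Hypothetical in-memory mirror of one entry in the space file.
struct KnobSpec
{
  std::string type;  // "float", "int", or "binary"
  double min = 0.0;
  double max = 1.0;
  double step = 0.0;  // 0 means "no grid"
};

inline double sampleKnob(const KnobSpec& k, std::mt19937& rng)
{
  if (k.type == "binary") {
    // minmax/step are ignored for binary knobs: sample {0, 1}.
    return std::uniform_int_distribution<int>(0, 1)(rng);
  }
  double v = 0.0;
  if (k.step > 0.0) {
    // Grid sampling: v = min + i*step for an integer i with v <= max.
    const int n = static_cast<int>(std::floor((k.max - k.min) / k.step + 1e-9));
    const int i = std::uniform_int_distribution<int>(0, n)(rng);
    v = k.min + i * k.step;
  } else {
    // Continuous sampling (the standard distribution draws from [min, max)).
    v = std::uniform_real_distribution<double>(k.min, k.max)(rng);
  }
  // Assumption: int knobs are rounded to the nearest integer.
  return k.type == "int" ? std::round(v) : v;
}
```

With the core_utilization spec from the example (int, [20, 99], step 1), every draw is an integer between 20 and 99 inclusive.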
Calibration inputs (optional but recommended)
To make surrogate predictions more comparable to a real baseline point,
surrogate_optimize can ingest baseline metrics and calibrate built-in scaling factors via:
- -calibrate_ws_file <6_report.json> (ORFS finish metrics; uses timing + power + area fields)
- -calibrate_wl_file <5_2_route.json> (ORFS route metrics; uses detailedroute__route__wirelength)
This calibrates internal length/timing scales with a small log-space search and also provides baseline anchors for certain derived quantities (e.g. the power model).
Calibration overrides can also be provided via environment variables:
- SURROGATE_BUILTIN_LENGTH_SCALE
- SURROGATE_BUILTIN_TIMING_SCALE
- SURROGATE_BUILTIN_REF_CLOCK_USER
Output JSON (stable artifact)
surrogate_optimize writes a JSON object like:
- best_objective
- best_params
- best_outputs
- best_features (with -include_features)
- top: array of {objective, params, outputs, features?}
This JSON is designed to be consumed by external orchestration (e.g. an ORFS “tune + validate” wrapper).
surrogate_eval (single-point evaluation)
For debugging or analysis, surrogate_eval evaluates one parameter set:
Empirical results (from a prototype ORFS driver)
In a prototype ORFS integration (branch orfs-surrogate-rebased) using this OpenROAD feature, on-disk runs with 600s surrogate search and validating K=14 candidates across {asap7,nangate45,sky130hd} × {aes,ibex,jpeg} observed:
- routed_wirelength: median gain 3.35% (p25 0.96%, p75 7.37%), best 15.78%
- effective_clock_period: median gain 2.99% (p25 1.61%, p75 4.71%), best 11.58%
These numbers are primarily to demonstrate usefulness; end-to-end gains depend on the driver, validation budget, and knob space.
Testing
- -D ENABLE_SURROGATE=ON and keeps default builds unchanged with the option off.
- surrogate_supported_features returns the expected list; surrogate_eval and surrogate_optimize run on real designs via the ORFS prototype driver.
Follow-ups / companion work