Skip to content

Conversation

@Unisay
Copy link
Contributor

@Unisay Unisay commented Dec 4, 2025

Summary

  • Change valueContains costing from multiplied_sizes to const_above_diagonal with linear_in_x_and_y inner model
  • Add LinearInXAndY constructor to SimpleJSON.hs for JSON parsing
  • Update cost model visualization to support new model types

Motivation

The true complexity of valueContains is O(m*log(n/m+1)) where n is the container size and m is the contained size. Kenneth MacKenzie showed that this is bounded above by a linear function (m+n)/k for some constant k.

The new model better reflects this:

  • Above diagonal (y > x): constant cost (early exit, returns False immediately)
  • Below diagonal (x >= y): intercept + slope1*x + slope2*y

Changes

File Change
models.R Update valueContainsModel to use const_above_diagonal with linear_in_x_and_y
SimpleJSON.hs Add LinearInXAndY constructor for JSON parsing
utils.js Add evaluators for linear_in_x_and_y and const_above_diagonal
builtinCostModel{A,B,C}.json Regenerated with new model type

New Cost Model Structure

"valueContains": {
  "cpu": {
    "arguments": {
      "constant": 284518,
      "model": {
        "arguments": {
          "intercept": 1000,
          "slope1": 158792,
          "slope2": 24659
        },
        "type": "linear_in_x_and_y"
      }
    },
    "type": "const_above_diagonal"
  },
  "memory": {
    "arguments": 32,
    "type": "constant_cost"
  }
}

Test plan

  • Build passes
  • Conformance tests pass (will need budget expectation updates)
  • Cost model visualization works with new model type

Closes https://github.com/IntersectMBO/plutus-private/issues/1984

Change valueContains costing from multiplied_sizes to
const_above_diagonal with linear_in_x_and_y inner model.

The new model:
- Above diagonal (y > x): constant cost (early exit, returns False)
- Below diagonal (x >= y): intercept + slope1*x + slope2*y

This better reflects the actual complexity bound O(m*log(n/m+1))
which Kenneth MacKenzie showed is bounded above by a linear function.

Changes:
- models.R: Update valueContainsModel to use const_above_diagonal
- SimpleJSON.hs: Add LinearInXAndY constructor for JSON parsing
- utils.js: Add evaluators for linear_in_x_and_y and const_above_diagonal
- builtinCostModel{A,B,C}.json: Regenerated with new model type
@github-actions
Copy link
Contributor

github-actions bot commented Dec 4, 2025

PR Preview Action v1.6.2

🚀 View preview at
https://IntersectMBO.github.io/plutus/pr-preview/cost-models/pr-7476/

Built to branch gh-pages at 2025-12-04 16:06 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@Unisay Unisay requested a review from kwxm December 4, 2025 13:07
@Unisay Unisay self-assigned this Dec 4, 2025
@Unisay Unisay added the No Changelog Required Add this to skip the Changelog Check label Dec 4, 2025
Remove const_above_diagonal optimization since benchmark data shows
linear behavior regardless of input size relationship. The diagonal
optimization assumed quick exit when needle size > haystack size,
but benchmark measurements don't reflect this behavior.
Change valueContains benchmark to use ValueTotalSize for both container
and contained arguments, enabling meaningful diagonal comparison:
- When contained size > container size, containment is impossible
- This allows const_above_diagonal optimization in cost model

Also update R model to fit const_above_diagonal with proper diagonal
filtering for above/below diagonal data points.
Revert builtinCostModel{A,B,C}.json to master to avoid unnecessary
re-indentation diff. The valueContains cost model will be updated
once new benchmarks complete with ValueTotalSize for both args.
… parameters

- Changed CPU cost model from a simple linear model to a more complex model with constant and intercept values.
- Updated the type from "multiplied_sizes" to "const_above_diagonal" for better accuracy in cost estimation.
- Adjusted arguments for intercept, slope1, and slope2 to reflect new performance metrics.
Copy link
Member

@zliu41 zliu41 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 looks quite reasonable, but I'll leave it to @kwxm

@Unisay
Copy link
Contributor Author

Unisay commented Dec 5, 2025

This PR replaces #7476

"model": {
"arguments": {
"intercept": 386227,
"slope1": 1970,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The theoretical linear upper bound of k*(m+n) would have slope1 = slope2 here, but here the two slopes are very different! I think that's OK though: the theoretical bound is very conservative and this seems to fit the data very well.

@Unisay Unisay requested a review from kwxm December 23, 2025 14:55
Copy link
Contributor

@kwxm kwxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, but again I urge you to reduce the number of benchmarking inputs: see this earlier comment. We have over 1000 points here and that makes the benchmark take a very long time (> 90 minutes for just valueContains) and doesn't contribute anything to the accuracy of the model: you can see this by taking subsets of the existing data and fitting models to them. Also, the data is very unevenly distributed, which is probably not a good idea. There are 500 inputs with x<200 and y<200, so 1/25 of the space of inputs is contributing half of the datapoints, which will bias the model. Something like the below-diagonal part of a 25x25 or 30x30 grid of evenly spaced inputs (plus some points above the diagonal) would probably suffice and not lose any precision.

Also, the inputs aren't symmetric: the x values go up to 2047, but the y values only go up to 1000, so we're not always checking cases like a value containing itself, which might take some time. Can you remind me if there's some reason for this? We had earlier discussions about what the worst case would be, but I don't recall the details at the moment.

"exp_mod_cost" -> ExpModCost <$> parseJSON args
"literal_in_y_or_linear_in_z" -> LiteralInYOrLinearInZ <$> parseJSON args
"linear_in_max_yz" -> LinearInMaxYZ <$> parseJSON args
"linear_in_x_and_y" -> LinearInXAndY <$> parseJSON args
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need a case for LinearInXAndY in the print-cost-model executable here.

# Above diagonal (y > x): containment impossible, quick False return
# Use minimum observed time as constant for this case
above_diag <- filtered %>% filter(y_mem > x_mem)
constant <- if (nrow(above_diag) > 0) ceiling(mean(above_diag$t)) else min(filtered$t)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding some points (maybe 20 or so) above the diagonal in the benchmarking code to ensure that above_diag is always nonempty and then we won't need to have the two ifs in this code. There's redundancy here that can be avoided by making sure that the benchmark results always contain points on both sides of the diagonal. If someone was later to remove the points above the diagonal then we'd get an error here anyway when we get to mean(above_diag), so there's no real reason to have two different cases to deal with the presence or absence of points above the diagonal.

@kwxm kwxm added Builtins Costing Anything relating to costs, fees, gas, etc. labels Jan 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Builtins Costing Anything relating to costs, fees, gas, etc. No Changelog Required Add this to skip the Changelog Check

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants