
Conversation

@jess-lowe (Contributor) commented Dec 8, 2025

This PR is intended for regression monitoring. Since it will be doing a lot of converting, it should probably be set up to run only under a specific flag, when the conversion code is changed.

When the data is converted, a snapshot is saved per CNA, so that if it changes we can see exactly what has changed.

We also keep a snapshot of the conversion stats, to monitor things at a higher level.

A follow-up PR will include all of the sampled data we want to be testing.
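As a rough illustration of the approach (not the PR's actual code: the record type, the convert function, and the sample data below are hypothetical; only the go-snaps calls mirror what the test uses), grouping snapshots per CNA can look like this:

```go
package converter_test

import (
	"testing"

	"github.com/gkampitakis/go-snaps/snaps"
)

// cve5Record and convert are hypothetical stand-ins for the real CVE5 types
// and conversion logic; they exist only to show the snapshot grouping.
type cve5Record struct {
	ID  string
	CNA string
}

func convert(r cve5Record) string {
	return "converted:" + r.ID
}

func TestPerCNASnapshotsSketch(t *testing.T) {
	records := []cve5Record{
		{ID: "CVE-2025-0001", CNA: "example-cna"},
		{ID: "CVE-2025-0002", CNA: "example-cna"},
		{ID: "CVE-2025-0003", CNA: "another-cna"},
	}
	for _, r := range records {
		// One snapshot file per CNA, so a regression diff immediately
		// shows which CNA's conversions changed.
		snaps.WithConfig(snaps.Filename(r.CNA)).MatchSnapshot(t, convert(r))
	}
}
```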

@jess-lowe changed the title from "tests: monitor CVE5 conversion regressions through snapshot testing" to "test: monitor CVE5 conversion regressions through snapshot testing" on Dec 8, 2025
@jess-lowe requested a review from @another-rex on December 8, 2025 05:16
@jess-lowe marked this pull request as ready for review on December 9, 2025 05:43
// TestSnapshotConversion runs the conversion process on a sample of CVEs and
// creates snapshots of the output for comparison. This is used for monitoring
// progressions and regressions when making changes to the converter.
func TestSnapshotConversion(t *testing.T) {
Contributor

t.Parallel()

Contributor Author

Can't parallelize this because we are grouping snapshots by CNA, so there would be race conditions causing snapshots to be non-deterministic.

Contributor

Should be fine to t.Parallel here though? This just makes it run at the same time as snapshot_test.go, which should be independent of this, I believe.

Contributor Author

If you mean the stats output test, then no, it can't be run in parallel, as the rest of the files first need to be processed in order to have a stats outcome.

The files can't easily be parallelized either, because grouping them by CNA makes the snapshots non-deterministic, and they're all writing to the same stats output.
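For illustration only (this is not code from the PR), running the per-file conversions in parallel would mean guarding that shared stats output with something like the hypothetical counter below, which is the kind of synchronisation the sequential test avoids:

```go
package converter_test

import "sync"

// outcomeStats is a hypothetical concurrency-safe counter; the actual test
// keeps a plain map and processes the files sequentially instead.
type outcomeStats struct {
	mu     sync.Mutex
	counts map[string]int
}

func newOutcomeStats() *outcomeStats {
	return &outcomeStats{counts: make(map[string]int)}
}

func (s *outcomeStats) record(outcome string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.counts[outcome]++
}
```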

Comment on lines 64 to 68
keys := make([]string, 0, len(stats))
for k := range stats {
keys = append(keys, k)
}
sort.Strings(keys)
Contributor

Use the shorter form with maps.Keys.

Contributor Author

maps.Keys isn't deterministic :/

Contributor

I meant use slices.Sorted(maps.Keys(stats)), same as in the previous PR.
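For reference, a minimal sketch of that form (assuming stats is a map[string]int; requires Go 1.23+, where maps.Keys returns an iterator that slices.Sorted collects and sorts, so the result stays deterministic):

```go
package converter_test

import (
	"maps"
	"slices"
)

// sortedKeys is equivalent to collecting the keys by hand and calling
// sort.Strings: maps.Keys iterates in random order, but slices.Sorted
// gathers and sorts the keys, so the output is deterministic.
func sortedKeys(stats map[string]int) []string {
	return slices.Sorted(maps.Keys(stats))
}
```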

Comment on lines 63 to 75
// Sort keys for deterministic output
keys := make([]string, 0, len(stats))
for k := range stats {
keys = append(keys, k)
}
sort.Strings(keys)

var statsOutput strings.Builder
statsOutput.WriteString("Conversion Outcomes:\n")
for _, k := range keys {
statsOutput.WriteString(fmt.Sprintf("%s: %d\n", k, stats[k]))
}
snaps.WithConfig(snaps.Filename("conversion_outcomes")).MatchSnapshot(t, statsOutput.String())
Contributor

Actually, is this useful? The snapshot of each individual record would change (including the metrics) without this aggregate, right? Since we are no longer putting every single record in here, it's probably not needed.

Contributor

Do we have the conversion_outcomes aggregation being written normally and uploaded? If not, we should do that.

Contributor Author

Personally I'd find it useful, because it gives a high-level overview of which direction the numbers are going. If the numbers going up were Successful, I wouldn't need to look into it as much as if NoRanges suddenly spiked. It's just slightly more useful for speedy triage.

Contributor Author

We don't currently have conversion_outcomes being written and uploaded for the CVE5 logic, but there is an equivalent for NVD. The NVD one uses a CSV file that I'm not convinced isn't being truncated at 5000 entries.

It'd be good to use this data for our metrics dashboard in the future, though.

Contributor Author

The problem is that we're running conversion in parallel, so we would need to do some analysis using the .metrics.json files, or do icky stuff with mutexes.

I've got the former solution in another WIP PR that I can put up after this one.
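As a rough sketch of the .metrics.json route (the file layout below, a single JSON object with an "outcome" field, is an assumption rather than the real schema), the aggregation could look something like:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"os"
	"path/filepath"
)

// metrics models a hypothetical per-record .metrics.json file; the real
// schema produced by the converter is likely different.
type metrics struct {
	Outcome string `json:"outcome"`
}

func main() {
	paths, err := filepath.Glob("output/*.metrics.json")
	if err != nil {
		log.Fatal(err)
	}
	counts := map[string]int{}
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			log.Printf("skipping %s: %v", p, err)
			continue
		}
		var m metrics
		if err := json.Unmarshal(data, &m); err != nil {
			log.Printf("skipping %s: %v", p, err)
			continue
		}
		// Sum the per-file outcomes into one aggregate, outside the
		// parallel conversion step, so no mutexes are needed.
		counts[m.Outcome]++
	}
	fmt.Println(counts)
}
```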

Contributor

Hmm, my main concern is that I don't want the direction to be part of the tests. That'll work in the short term, but I feel it might get really annoying to update in the long term.

Contributor Author

direction?
