Skip to content

Conversation

@qkaiser
Copy link
Contributor

@qkaiser qkaiser commented Dec 15, 2025

The discriminator previously used str(random()) pieces, which could emit scientific notation (e.g. .756153012152602e-05) and generate invalid BOM ref fragments.

  • Switch to uuid4().int for each segment so we always produce integer-only values while maintaining uniqueness.
  • Add a regression test to lock the BomRef.. format.

uuid4() gives ~122 bits of entropy (non-crypto), compared to ~53 bits from random.random(), and the prior decimal-string slicing wasted some of those bits while allowing scientific notation.

Went with uuid4 after exploring in-memory addresses like the JavaScript CycloneDX library and the PHP CycloneDX library which uses a time based approach.


A generalized approach for BomRef emission would be nice I think. Both from a resulting format and generation standpoint. Specifically:

  • PHP library uses uniqid which is time based and results in a timestamp in microseconds expressed in hex suffixed by a random digit
  • JavaScript uses in-memory addresses but also Math.random() which emits two dot separated base32 encoded value
  • Python is currently using math.random() which emits two dot separated 16 digits integer, sometimes in scientific notation

Instead of having custom implementations for each library offering support for CycloneDX, they could all rely on unique identifiers such as uuid4 which is natively supported by each language. This would offer strong guarantees of uniqueness along with a common format.

@qkaiser qkaiser requested a review from a team as a code owner December 15, 2025 11:58
The discriminator previously used str(random()) pieces, which could emit
scientific notation (e.g. 4.1e-05) and generate invalid BOM ref
fragments. Switch to uuid4().int for each segment so we always produce
integer-only values while maintaining uniqueness. Add a regression test
to lock the BomRef.<int>.<int> format. uuid4() gives ~122 bits of
entropy (non-crypto), compared to ~53 bits from random.random(), and the
prior decimal-string slicing wasted some of those bits while allowing
scientific notation.

Signed-off-by: Quentin Kaiser <quentin.kaiser@onekey.com>
@qkaiser qkaiser force-pushed the fix/bom-ref-discriminator branch from fb05b8c to b0a9f50 Compare December 15, 2025 11:59
@jkowalleck
Copy link
Member

Thanks for the report. I do have some questiosn

[...] generate invalid BOM ref fragments

invalid to what?

Instead of having custom implementations for each library offering support for CycloneDX, they could all rely on unique identifiers such as uuid4 which is natively supported by each language. This would offer strong guarantees of uniqueness along with a common format.

why exactly?

I dont see any reason to streamline a thing that has basically no rule nor format.
I mean, BomRef have only one constraint: being unique in the CycloneDX context(bom) they are used in.

Please explain why you feel this needs changing.

@jkowalleck jkowalleck added the question Further information is requested label Dec 16, 2025
@qkaiser
Copy link
Contributor Author

qkaiser commented Dec 17, 2025

Please explain why you feel this needs changing.

Some of our validators check BomRef fields to make sure they follow the format from this library which we naively assumed was dot separated digits. When casting from float to string leads to scientific notation, the presence of - and e breaks this validation.

I fully agree with you there's no standardized format for BomRef. It just feels logic to fix the value generation rather than relax a validation to accept scientific notation values in an identifier.

Feel free to close this MR if you want to keep things as they are.

@jkowalleck
Copy link
Member

I fully agree with you there's no standardized format for BomRef.
[...]
Feel free to close this MR if you want to keep things as they are.

I’ll close this ticket as “Not Planned”, since there’s no existing standard to base further work on.


It just feels logic to fix the value generation rather than relax a validation to accept scientific notation values in an identifier.

You really should "relax"(fix!) this validator, since it was build on wrong assumptions, right?

@jkowalleck jkowalleck closed this Dec 17, 2025
@jkowalleck jkowalleck added invalid This doesn't seem right and removed question Further information is requested labels Dec 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

invalid This doesn't seem right

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants