Skip to content

Conversation

@ST-DDT
Copy link
Member

@ST-DDT ST-DDT commented Jan 18, 2025

Implements #1862


Adds a distributor option to faker.number.int and float. The distributor function can be used to influence the distribution of the generated values. E.g. uniformDistributor and exponentialDistributor.


The following table shows the rough distribution of values generated using
exponentialDistributor({ base: x }):

Result Base 0.1 Base 0.5 Base 1 Base 2 Base 10
0.0 - 0.1 4.1% 7.4% 10.0% 13.8% 27.8%
0.1 - 0.2 4.5% 7.8% 10.0% 12.5% 16.9%
0.2 - 0.3 5.0% 8.2% 10.0% 11.5% 12.1%
0.3 - 0.4 5.7% 8.7% 10.0% 10.7% 9.4%
0.4 - 0.5 6.6% 9.3% 10.0% 10.0% 7.8%
0.5 - 0.6 7.8% 9.9% 10.0% 9.3% 6.6%
0.6 - 0.7 9.4% 10.7% 10.0% 8.8% 5.7%
0.7 - 0.8 12.1% 11.5% 10.0% 8.2% 5.0%
0.8 - 0.9 16.9% 12.6% 10.0% 7.8% 4.5%
0.9 - 1.0 27.9% 13.8% 10.0% 7.5% 4.1%

The exponential distribution is achieved by using base ** exponent where exponent is a random float between 0 and 1.
The result is then stretched evenly to fit to min and max.

@ST-DDT ST-DDT added c: feature Request for new feature p: 1-normal Nothing urgent m: number Something is referring to the number module labels Jan 18, 2025
@ST-DDT ST-DDT added this to the vAnytime milestone Jan 18, 2025
@ST-DDT ST-DDT requested review from a team January 18, 2025 15:01
@ST-DDT ST-DDT self-assigned this Jan 18, 2025
@netlify
Copy link

netlify bot commented Jan 18, 2025

Deploy Preview for fakerjs ready!

Name Link
🔨 Latest commit b71f3a9
🔍 Latest deploy log https://app.netlify.com/projects/fakerjs/deploys/6960dadc5d6b210008cdc6b6
😎 Deploy Preview https://deploy-preview-3375.fakerjs.dev
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@codecov
Copy link

codecov bot commented Jan 18, 2025

Codecov Report

❌ Patch coverage is 90.62500% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 99.97%. Comparing base (82f2b9d) to head (b71f3a9).

Files with missing lines Patch % Lines
src/distributors/exponential.ts 78.57% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             next    #3375      +/-   ##
==========================================
- Coverage   99.97%   99.97%   -0.01%     
==========================================
  Files        2995     2998       +3     
  Lines      236324   236353      +29     
  Branches      941      946       +5     
==========================================
+ Hits       236267   236291      +24     
- Misses         57       62       +5     
Files with missing lines Coverage Δ
src/distributors/distributor.ts 100.00% <100.00%> (ø)
src/distributors/uniform.ts 100.00% <100.00%> (ø)
src/index.ts 100.00% <100.00%> (ø)
src/modules/number/index.ts 100.00% <100.00%> (ø)
src/distributors/exponential.ts 78.57% <78.57%> (ø)

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ST-DDT
Copy link
Member Author

ST-DDT commented Jan 18, 2025

I'm also not sure whether exponentialDistribution is the correct name for the function.

@matthewmayer
Copy link
Contributor

I don't like calling it "exponentialDistribution " because you're not returning a distribution, you are returning a single value from that distribution.

So I'd call it exponentialValue or floatExponential or something similar to emphasise the thing that is returned is a float or a value.

@ST-DDT
Copy link
Member Author

ST-DDT commented Jan 19, 2025

you are returning a single value from that distribution.

I'm not even sure that is the case.

So I'd call it exponentialValue or floatExponential or something similar to emphasise the thing that is returned is a float or a value.

Good idea. I'm not finally sure which term I would like to use but stepping away from calling it exponentialDistribution is the correct choice.

I'm currently considering one of these:

  • exponential
  • exponentialValue
  • exponentialNumber
  • exponentialFloat
  • floatExponential
  • exponentiallySpread (or something similar)
  • powerLaw___ (<- ChatGpt)

@xDivisionByZerox
Copy link
Member

Team decision

The feature will be implemented by extending the already existing methods (number.int(), number.float(), etc). An additional parameter will be added. The parameter will be a function that exepts a randomizer instance and will return a biased randomized value. The return value will be used by the original function to "push" the value in the direction the distribution desires.

We are currently unsure how to name the parameter. While distribution is okay, we are not sure if this is the mathmatical correct term.
The name "spread" was discussed, but discarded since it might confuse users with the JS spread operator.
We will think about other names until the next meeting. If we do not come up with any, the current name of "distribution" is fine for us.

@ST-DDT ST-DDT changed the title feat(number): add exponentialDistribution function feat(number): add distributor functions Feb 17, 2025
@ST-DDT
Copy link
Member Author

ST-DDT commented Feb 17, 2025

Ready for review.

I changed the implementations to use a distributor parameter.
distributor is close enough to distribution so that you can guess what the option does, without actually running into the mathematical specification.

@ST-DDT ST-DDT linked an issue Feb 17, 2025 that may be closed by this pull request
@ST-DDT ST-DDT requested a review from a team February 17, 2025 21:56
Copy link
Member

@xDivisionByZerox xDivisionByZerox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more I'm looking at this PR, the more I'm falling in love with this feature's design. ❤️

I found some small things that I was noticing, while trying to get back into this topic. @ST-DDT are you still available or can I take over the PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To not couple the number tests with the implementation details of our given distributor functions, we should probably use custom, "deterministic" ones in theses test cases. Additionally, we should add separate test cases for the provided distributor functions. For example: The exponential distributor is currently not being checked for a negative "base" input.

import { uniformDistributor } from './uniform';

/**
* Creates a new function that generates power-law/exponentially distributed values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can add a more accessible explanation what this means as well.

Suggested change
* Creates a new function that generates power-law/exponentially distributed values.
* Creates a new function that generates exponentially distributed values.
* This means that values on the edges (min or max based on configuration) are more likely to be generated.

This needs to be put in the second signature as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT this is a power law distribution and not a exponential distribution in the mathematical sense.

This means that values on one side of the of the value range are far more likely than the values on the other side of the range (depending on configuration).

Or something similar.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matthewmayer can you give insight here? I did not study mathematics, so I can only give perspective from a user that is clueless about these things.

@ST-DDT
Copy link
Member Author

ST-DDT commented Jun 17, 2025

@ST-DDT are you still available or can I take over the PR?

Feel free to take over.

@matthewmayer
Copy link
Contributor

What is the point of the uniformDistributor? Isn't that the same as not passing a distributor at all?

@matthewmayer
Copy link
Contributor

Should we provide a sample normal distribution function (it wouldn't be a true normal distribution because it's bounded at the min and max).

But I think it would be a fairly common use case to want something like "generate me heights in centimetres, with a mean of 170cm and standard deviation of 7cm." So I think we could provide a "recipe" for that kind of distribution even if it's not strictly normal.

@ST-DDT
Copy link
Member Author

ST-DDT commented Jun 21, 2025

What is the point of the uniformDistributor? Isn't that the same as not passing a distributor at all?

To allow for using a non-nullish distributor value.
But I dont particulary care.

@matthewmayer
Copy link
Contributor

What is the point of the uniformDistributor? Isn't that the same as not passing a distributor at all?

To allow for using a non-nullish distributor value.

But I particulary care.

That's fine maybe we could just clarify that in the documentation. Otherwise a developer may set the option and wonder why "nothing happens".

@ST-DDT
Copy link
Member Author

ST-DDT commented Jun 21, 2025

EDIT/fix:
But I dont particulary care.

@ST-DDT
Copy link
Member Author

ST-DDT commented Jun 21, 2025

Should we provide a sample normal distribution function (it wouldn't be a true normal distribution because it's bounded at the min and max).

But I think it would be a fairly common use case to want something like "generate me heights in centimetres, with a mean of 170cm and standard deviation of 7cm." So I think we could provide a "recipe" for that kind of distribution even if it's not strictly normal.

You might want to start without it first.
It is tricky in regard to boundary matching and you might want to have a mathematician have a look at it so it doesnt deviate too much from the expectations.
Aka:

  • how do we prevent values outside of the 0-1 range, because a normal distribution goes -infinity to +infinity
  • how does that change if we configure the "width" or "center" of the distribution
  • how can we help users configure it more easily, where is "170cm height" or "7cm std diviation" in the scale between 0 and 1.

@matthewmayer
Copy link
Contributor

I am a mathematician 😀 I will have a go at something that would meet expectations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

c: feature Request for new feature m: number Something is referring to the number module p: 1-normal Nothing urgent

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Adding Random probability distribution function

5 participants