Conversation
|
Hmm, The tests are already present in https://github.com/QuantEcon/QuantEcon.py/blob/main/quantecon/tests/test_inequality.py so I thought to skip adding them. |
|
Is this adding new features or refactoring the code? |
|
This is just refactoring the code and removing some |
|
@Smit-create It is usually advised to use for loops (and avoid creation of intermediate arrays) in |
I found this better in terms of time performance as this is vectorized version while the extra space complexity (creating new arrays) is still the same for both the functions -- O(n) |
|
@Smit-create thanks for opening this PR. Would you mind to update the top level comment box with a description of the change and document the performance improvements you mention in this thread so it's clear what this change is about. |
|
Thanks @mmcky. Done |
|
@Smit-create I haven't merged this as it isn't clear what level of performance improvement is being made. Is it 2% or 20% for example, it would be helpful to have timings between the different versions to make an assessment here. @oyamad makes a good point around |
There was a problem hiding this comment.
Pull Request Overview
This PR vectorizes the lorenz_curve and gini_coefficient functions to improve performance by replacing Python loops with NumPy vectorized operations. The changes eliminate explicit iteration in favor of array operations that can be optimized by NumPy's underlying C implementations.
Key changes:
- Replace manual loop-based cumulative calculations in
lorenz_curvewith vectorized NumPy operations - Optimize
gini_coefficientby reducing nested loops and using vectorized absolute difference calculations - Remove trailing whitespace
You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.
| """ | ||
|
|
||
| n = len(y) | ||
| n = y.shape[0] |
There was a problem hiding this comment.
[nitpick] Using y.shape[0] instead of len(y) is inconsistent with the pattern used elsewhere in the function (line 76 still uses len(y)). Consider keeping len(y) for consistency, as it works for both arrays and lists.
| n = y.shape[0] | |
| n = len(y) |
| i_sum[i] += abs(y[i] - y[j]) | ||
| return np.sum(i_sum) / (2 * n * np.sum(y)) | ||
| i_sum += np.sum(np.abs(y[i] - y)) | ||
| t_sum += y[i] |
There was a problem hiding this comment.
This vectorization still contains a loop over i and computes np.abs(y[i] - y) for each element, which may not provide the expected performance improvement. Consider fully vectorizing using broadcasting: np.sum(np.abs(y[:, np.newaxis] - y[np.newaxis, :])) to eliminate the loop entirely.
lorenz_curveandgini_coefficient.