clib.conversion._to_numpy: Add tests for pandas.Series with pandas string dtype#3607
Merged
seisman
commented
Nov 10, 2024
  # types.
  for col, array in enumerate(arrays[2:]):
-     if pd.api.types.is_string_dtype(array.dtype):
+     if np.issubdtype(array.dtype, np.str_):
Member
Author
seisman
commented
Nov 10, 2024
  strings = np.array(
-     [" ".join(vals) for vals in zip(*string_arrays, strict=True)]
+     [" ".join(vals) for vals in zip(*string_arrays, strict=True)],
+     dtype=np.str_,
Specifying dtype is not necessary here, but I feel it's good to state explicitly that we expect a np.str_ array here.
michaelgrund
approved these changes
Nov 12, 2024
seisman
commented
Nov 14, 2024
  vec_dtype = str(getattr(data, "dtype", ""))
  array = np.ascontiguousarray(data, dtype=dtypes.get(vec_dtype))

  # Check if a np.object_ array can be converted to np.str_.
This is necessary to support a pd.Series of strings such as:
x = pd.Series(["abc", "defg", "12345"], dtype=None)
x = pd.Series(["abc", "defg", "12345"], dtype=np.str_)
x = pd.Series(["abc", "defg", "12345"], dtype="U10")
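The check can be sketched roughly as follows. This is not the code merged in this PR: `to_str_array` is a hypothetical helper, and it assumes that these three Series reach NumPy as `np.object_` arrays, which is what the comment in the diff implies.

```python
import numpy as np
import pandas as pd


def to_str_array(data):
    """Hypothetical helper: return a np.str_ array when the values allow it."""
    array = np.ascontiguousarray(data)
    # Strings in a pandas.Series usually arrive as a np.object_ array.
    if array.dtype.type == np.object_:
        try:
            array = array.astype(np.str_)
        except (TypeError, ValueError):
            pass  # Leave the array untouched if the conversion fails.
    return array


converted = [
    to_str_array(pd.Series(["abc", "defg", "12345"], dtype=dtype))
    for dtype in (None, np.str_, "U10")
]
```

Each entry of `converted` should then be a NumPy Unicode array holding the original three strings.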
      [" ".join(vals) for vals in zip(*string_arrays, strict=True)],
      dtype=np.str_,
  )
  strings = np.asanyarray(a=strings, dtype=np.str_)
weiji14
approved these changes
Nov 15, 2024
Description of proposed changes
Add tests for pandas.Series with string dtype. Six cases are tested:
1. dtype=None
2. dtype=np.str_
3. dtype="U10"
4. dtype="string[python]"
5. dtype="string[pyarrow]"
6. dtype="string[pyarrow_numpy]"
None of them can be converted to np.str_ directly. Cases 4-6 can be fixed by 01ba317, and cases 1-3 can be fixed by dac7e8e.
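A rough sketch of how the six cases could be exercised outside the test suite. It does not call PyGMT's `_to_numpy`; `convert_case` is a hypothetical stand-in that goes through an object array first, and it skips any dtype the installed pandas (or a missing pyarrow) cannot construct.

```python
import numpy as np
import pandas as pd

VALUES = ["abc", "defg", "12345"]
DTYPES = [
    None,
    np.str_,
    "U10",
    "string[python]",
    "string[pyarrow]",
    "string[pyarrow_numpy]",
]


def convert_case(dtype):
    """Hypothetical stand-in: build the Series and convert it to np.str_."""
    try:
        series = pd.Series(VALUES, dtype=dtype)
    except (ImportError, TypeError, ValueError):
        return None  # Backend or dtype alias not available in this environment.
    # Go through a plain NumPy array first, then request np.str_.
    return np.asarray(series.to_numpy(), dtype=np.str_)


# One (dtype, result) pair per case; result is None for skipped cases.
results = [(dtype, convert_case(dtype)) for dtype in DTYPES]
```

Cases that return `None` were skipped rather than failed, so the pyarrow-backed dtypes are only checked where pyarrow is installed.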