GH-49002: [Python] Fix array.to_pandas string type conversion for arrays with None#49247
GH-49002: [Python] Fix array.to_pandas string type conversion for arrays with None#49247AlenkaF wants to merge 1 commit intoapache:mainfrom
Conversation
raulcd
left a comment
There was a problem hiding this comment.
Thanks @AlenkaF this looks good to me. Seems to be what was proposed by @jorisvandenbossche in the issue and in-line to what we have here:
arrow/python/pyarrow/pandas_compat.py
Lines 936 to 944 in d2315fe
I'll wait until end of day in case @jorisvandenbossche has time to take a look otherwise I'll merge.
raulcd
left a comment
There was a problem hiding this comment.
Oops, sorry, I've just realised there are several test failures which are related and we should fix. Should have checked CI before :)
| dtype = "object" | ||
| elif types_mapper: | ||
| dtype = types_mapper(original_type) | ||
| elif _pandas_api.uses_string_dtype() and ( |
There was a problem hiding this comment.
I think it would be better to move this check a bit more above, together with the if/elif block checking for types_mapper. Because then we get to if hasattr(dtype, '__from_arrow__'): and will avoid actually converting the pyarrow memory to an numpy object-dtype array of strings
There was a problem hiding this comment.
Yes, that was my first idea but didn't think it well through before changing to what I have now. The thing is that I have hit this line https://github.com/AlenkaF/arrow/blob/6f1fda5ef1cfe7ee40ccd1ddefc3861c2718d920/python/pyarrow/array.pxi#L2319
and then the change of the dtype got reverted to None.
Will try putting the whole if/elif block further back, as suggested (if I do not break something else).
| dtype = "object" | ||
| elif types_mapper: | ||
| dtype = types_mapper(original_type) | ||
| elif _pandas_api.uses_string_dtype() and ( |
There was a problem hiding this comment.
| elif _pandas_api.uses_string_dtype() and ( | |
| elif _pandas_api.uses_string_dtype() and not strings_to_categorical and ( |
like we do in the other place with this logic (which Raul quoted), only that you will have to get the option value out of the options object
|
Thanks for quick reviews! Will go through the comments now - was going through all the tests I just broke =) Should have put the PR back to draft, will do so next time. |
Rationale for this change
The conversion from array with string type to pandas series, when array only has a
Noneelement, has been taking the old code path even with pandas 3.0.What changes are included in this PR?
Always check
dtypein the_array_like_to_pandasconversion and use pandas new default stringdtypeif available.Are these changes tested?
Yes.
Are there any user-facing changes?
No, only bug fix.
None#49002