
GH-49002: [Python] Fix array.to_pandas string type conversion for arrays with None #49247

Draft
AlenkaF wants to merge 1 commit into apache:main from AlenkaF:gh-49002-pandas-string-to_pandas-empty


AlenkaF (Member) commented Feb 11, 2026

Rationale for this change

Converting a string-typed array that contains only a None element to a pandas Series has been taking the old code path, even with pandas 3.0.

What changes are included in this PR?

Always check the dtype in the _array_like_to_pandas conversion, and use pandas' new default string dtype when it is available.
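A minimal, dependency-free sketch of the dtype choice described above. It mirrors the if/elif order visible in the diff hunk reviewed below, but choose_dtype, the type-name strings and the "StringDtype(na_value=nan)" tag are hypothetical stand-ins, not pyarrow's actual code in array.pxi:

```python
# Hypothetical sketch: plain strings stand in for pyarrow types and pandas dtypes.
STRING_TYPES = {"string", "large_string", "string_view"}


def choose_dtype(arrow_type, uses_string_dtype, types_mapper=None):
    """Return a tag for the pandas dtype an array of `arrow_type` would get."""
    dtype = None
    if types_mapper:
        # An explicit user mapping takes precedence, as in the reviewed hunk.
        dtype = types_mapper(arrow_type)
    elif uses_string_dtype and arrow_type in STRING_TYPES:
        # The decision depends only on the Arrow type, never on the values,
        # so an array holding only None is treated like any other string array.
        dtype = "StringDtype(na_value=nan)"
    return dtype  # None stands for "no extension dtype, use the default path"
```

The design point is that the check runs unconditionally on the type, so the all-None case cannot slip through to the legacy object-dtype path.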

Are these changes tested?

Yes.

Are there any user-facing changes?

No, this is only a bug fix.

raulcd (Member) left a comment

Thanks @AlenkaF, this looks good to me. It seems to be what @jorisvandenbossche proposed in the issue, and in line with what we have here:

```python
# for pandas 3.0+, use pandas' new default string dtype
if _pandas_api.uses_string_dtype() and not strings_to_categorical:
    for field in table.schema:
        if field.name not in ext_columns and (
            pa.types.is_string(field.type)
            or pa.types.is_large_string(field.type)
            or pa.types.is_string_view(field.type)
        ) and field.name not in categories:
            ext_columns[field.name] = _pandas_api.pd.StringDtype(na_value=np.nan)
```
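For readers without a pyarrow checkout, the quoted table-level loop can be sketched without dependencies. This is a hypothetical model: the schema is a list of (name, type-name) pairs, string_ext_columns is an illustrative name, it works on a copy instead of mutating ext_columns in place, and a string tag replaces the real pd.StringDtype(na_value=np.nan):

```python
# Hypothetical stand-ins for the quoted snippet; not pyarrow's real code.
STRING_TYPES = {"string", "large_string", "string_view"}


def string_ext_columns(schema, ext_columns, categories, uses_string_dtype,
                       strings_to_categorical=False):
    """Return a copy of ext_columns with string columns mapped to the new dtype."""
    result = dict(ext_columns)
    if uses_string_dtype and not strings_to_categorical:
        for name, arrow_type in schema:
            # Skip columns that already have an explicit mapping or that the
            # caller asked to convert to categoricals.
            if (name not in result and arrow_type in STRING_TYPES
                    and name not in categories):
                result[name] = "StringDtype(na_value=nan)"
    return result
```

Note that the decision only looks at field types, which is why the table path never had the all-None problem the array path had.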

I'll wait until the end of the day in case @jorisvandenbossche has time to take a look; otherwise I'll merge.

github-actions bot removed the awaiting review label Feb 12, 2026
raulcd self-requested a review February 12, 2026 08:35
github-actions bot added the awaiting merge label Feb 12, 2026
raulcd (Member) left a comment

Oops, sorry, I've just realised there are several related test failures that we should fix. I should have checked CI first :)

raulcd self-requested a review February 12, 2026 08:36
github-actions bot added the awaiting changes label and removed the awaiting merge label Feb 12, 2026
```python
    dtype = "object"
elif types_mapper:
    dtype = types_mapper(original_type)
elif _pandas_api.uses_string_dtype() and (
```
Member


I think it would be better to move this check further up, together with the if/elif block checking for types_mapper. That way we reach if hasattr(dtype, '__from_arrow__'): and avoid actually converting the pyarrow memory to a NumPy object-dtype array of strings.
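The point about __from_arrow__ can be illustrated with a small sketch. pandas extension dtypes may implement __from_arrow__ to build their array directly from Arrow data; here FakeStringDtype, to_object_values, to_pandas_values and the list standing in for an Arrow array are all hypothetical, so this only shows the dispatch pattern, not pyarrow's real conversion:

```python
class FakeStringDtype:
    """Hypothetical dtype that knows how to consume Arrow data directly."""

    def __from_arrow__(self, arrow_values):
        # A real implementation would wrap the Arrow buffers; here we only
        # tag the result to show which path was taken.
        return ("from_arrow", list(arrow_values))


def to_object_values(arrow_values):
    # Stand-in for the costly path that materialises Python string objects.
    return ("object", list(arrow_values))


def to_pandas_values(arrow_values, dtype=None):
    # If the chosen dtype implements the protocol, let it do the conversion
    # and skip the intermediate object-dtype array entirely.
    if hasattr(dtype, "__from_arrow__"):
        return dtype.__from_arrow__(arrow_values)
    return to_object_values(arrow_values)
```

This is why the dtype has to be decided before the hasattr check: if it is still None at that point, the object-dtype detour happens regardless.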

Member Author


Yes, that was my first idea, but I didn't think it through well before changing to what I have now. The thing is that I hit this line https://github.com/AlenkaF/arrow/blob/6f1fda5ef1cfe7ee40ccd1ddefc3861c2718d920/python/pyarrow/array.pxi#L2319 and then the dtype change got reverted to None.

I'll try moving the whole if/elif block further up, as suggested (as long as I don't break something else).

Member


Suggested change:

```diff
- elif _pandas_api.uses_string_dtype() and (
+ elif _pandas_api.uses_string_dtype() and not strings_to_categorical and (
```

This matches what we do in the other place with this logic (the snippet Raul quoted), except that here you will have to get the option value from the options object.
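A sketch of the suggested guard, assuming (hypothetically) that the options object is a dict-like mapping with a "strings_to_categorical" key; the real type of options in array.pxi may differ, and choose_dtype_with_options and the string tags are illustrative only:

```python
# Hypothetical sketch; plain strings stand in for pyarrow types and pandas dtypes.
STRING_TYPES = {"string", "large_string", "string_view"}


def choose_dtype_with_options(arrow_type, options, uses_string_dtype,
                              types_mapper=None):
    # Assumption: options behaves like a dict; treat a missing key as False.
    strings_to_categorical = options.get("strings_to_categorical", False)
    dtype = None
    if types_mapper:
        dtype = types_mapper(arrow_type)
    elif (uses_string_dtype and not strings_to_categorical
          and arrow_type in STRING_TYPES):
        # Same guard as the table-level code: categoricals win over StringDtype.
        dtype = "StringDtype(na_value=nan)"
    return dtype
```

Keeping the guard identical in both places means the array and table conversions agree on when the new string dtype applies.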

Member Author


Yes, will do!

AlenkaF (Member, Author) commented Feb 12, 2026

Thanks for the quick reviews! I'll go through the comments now; I was busy going through all the tests I just broke =) I should have put the PR back into draft; I'll do so next time.

AlenkaF marked this pull request as draft February 12, 2026 12:05


3 participants