Skip to content

Conversation

@ancy-augustin
Copy link

@ancy-augustin ancy-augustin commented Nov 11, 2024

What does this Pull Request accomplish?

Adds utility functions to interact with SystemLink DataFrame API using pandas data frames.

Why should this Pull Request be merged?

Simplifies working with pandas data frames and SystemLink DataFrame by providing convenient methods for creating, appending, and querying tables directly from pandas.

What testing has been done?

TODO: Detail what testing has been done to ensure this submission meets requirements.

@santhoshramaraj santhoshramaraj linked an issue Nov 11, 2024 that may be closed by this pull request
@santhoshramaraj santhoshramaraj added the enhancement New feature or request label Nov 11, 2024
index_name: str = None
if index:
index_name = _get_table_index_name(client=client, table_id=table_id)
if query.columns:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we combine these two if statements into a single if.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot combine the if statements because the index_name is required regardless of whether query.columns is None, and checking index_name in query.columns should only occur when both are valid.

@santhoshramaraj
Copy link
Member

It would be nice to have a function create_table_with_data_from_pandas_df to cover the common case where the whole data for the table is available and the user would like to upload it (create, upload, close inclusive) without much effort.

There is of course challenge in batching the rows upload if the data is large.

"""The names and order of the columns included in the data frame."""

data: List[List[Optional[str]]]
data: Optional[List[List[Optional[str]]]] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could making this field optional lead to an error? This will provide a way to call the API without providing data.

client (DataFrameClient): Instance of DataFrameClient.
df (pd.DataFrame): Pandas dataframe.
table_name (str): Name of the table.
nullable_columns (bool): Make the columns nullable. Nullable columns can contain `null` values.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a default value to nullable_columns field?

@santhoshramaraj What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to understand the impact of nullable_columns on performance to choose the default value or require the user to make a conscious decision.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Get DataFrame data as Pandas DataFrame

3 participants