feat: Add CSV format support to write_to_storage_sync #321

@cofin

Summary

Add CSV format support to the write_to_storage_sync() method on ArrowResult, similar to the existing Parquet support.

Use Case

Data export tools often need to produce CSV files that other systems (e.g., shell scripts, legacy tools) can consume. Native CSV export would simplify that workflow.

Currently, you can export to Parquet:

result = session.select_to_arrow("SELECT * FROM users")
result.write_to_storage_sync("/path/to/users.parquet", format_hint="parquet")

For CSV, however, you currently have to convert to pandas first:

result = session.select_to_arrow("SELECT * FROM users")
df = result.to_pandas()
df.to_csv("/path/to/users.csv", sep="|", index=False)

Proposed Solution

Add format_hint="csv" support to write_to_storage_sync():

result = session.select_to_arrow("SELECT * FROM users")
result.write_to_storage_sync(
    "/path/to/users.csv",
    format_hint="csv",
    delimiter="|",
    header=True,
    quote_style="all"  # or "needed", "none"
)

Additional Options

Consider supporting these CSV options:

  • delimiter - field separator (default: ,)
  • header - include header row (default: True)
  • quote_style - how to quote fields (all, needed, none)
  • null_value - string to represent NULL values
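To make the intended semantics of these options concrete, here is a minimal sketch that uses Python's standard-library csv module rather than PyArrow. The function name rows_to_csv, the mapping of quote_style names to csv constants, and the default null_value of an empty string are all assumptions for illustration, not part of the library.

```python
import csv
import io

# Hypothetical mapping from the proposed quote_style names to stdlib csv constants.
_QUOTING = {
    "all": csv.QUOTE_ALL,         # quote every field
    "needed": csv.QUOTE_MINIMAL,  # quote only fields containing special characters
    "none": csv.QUOTE_NONE,       # never quote; special characters are escaped instead
}


def rows_to_csv(columns, rows, delimiter=",", header=True,
                quote_style="needed", null_value=""):
    """Render rows as CSV text using the option names proposed in this issue."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter=delimiter,
        quoting=_QUOTING[quote_style],
        lineterminator="\n",
        # QUOTE_NONE needs an escape character in case a field contains the delimiter.
        escapechar="\\" if quote_style == "none" else None,
    )
    if header:
        writer.writerow(columns)
    for row in rows:
        # Replace NULLs (None) with the configured placeholder string.
        writer.writerow([null_value if value is None else value for value in row])
    return buf.getvalue()


text = rows_to_csv(
    ["id", "name"],
    [(1, "ada"), (2, None)],
    delimiter="|",
    quote_style="all",
    null_value="NULL",
)
print(text)
```

With quote_style="all", every field is quoted, including the header and the NULL placeholder. Whether a real implementation should quote NULL placeholders is an open design question.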

Implementation Notes

PyArrow provides pyarrow.csv.write_csv(), which could be used for the implementation:

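import pyarrow.csv as pa_csv

# `table`, `path`, and `delimiter` are assumed to be supplied by the
# enclosing write_to_storage_sync() implementation.
write_options = pa_csv.WriteOptions(
    include_header=True,
    delimiter=delimiter,
)
pa_csv.write_csv(table, path, write_options=write_options)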

Context

This came up while building shell-script-compatible collection output for database migration tools, where downstream compatibility requires CSV with specific formatting (pipe delimiter, quoted strings).
