nom parser instead of ad-hoc in examples by cj-zhukov · Pull Request #20122 · apache/datafusion

cj-zhukov · 2026-02-03T06:45:39Z

Which issue does this PR close?

Closes #20025 (Explore replacing ad-hoc parsing logic in datafusion-examples with a nom-based parser).

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

cj-zhukov · 2026-02-03T06:55:06Z

High-Level Overview

This PR is an exploratory step to evaluate whether using a parser combinator library (nom) improves the clarity and robustness of the example documentation parsing logic.

In a previous PR (#19750), the parsing of subcommands and example metadata in the main.rs docs was implemented using ad-hoc string manipulation. While that approach works, this PR experiments with replacing that logic with nom in two functions:

parse_subcommand_line
parse_metadata_line

Personally, I found the nom-based implementation easier to read, reason about, and maintain. Expressing the grammar declaratively with a parser tool feels more natural for this kind of structured input, and the intent of the parsing logic is clearer compared to manual string slicing and conditionals.
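To make the contrast with manual slicing concrete, here is a minimal std-only sketch of the combinator style. It is not the code from this PR and it does not use nom's API; the `(file: ..., desc: ...)` line format and the names `tag`, `take_until`, and `parse_metadata_sketch` are assumptions chosen for illustration only.

```rust
/// Consume a literal prefix and return the remaining input.
/// (Hypothetical helper; it only loosely mirrors the idea of a "tag" combinator.)
fn tag<'a>(lit: &str, input: &'a str) -> Option<&'a str> {
    input.strip_prefix(lit)
}

/// Consume everything before `stop` and return `(rest, taken)`.
/// `rest` still begins with `stop`, so the caller decides how to consume it.
fn take_until<'a>(stop: &str, input: &'a str) -> Option<(&'a str, &'a str)> {
    let idx = input.find(stop)?;
    Some((&input[idx..], &input[..idx]))
}

/// Assumed grammar: "(file: <path>, desc: <text>)".
/// Returns `(file, desc)` or `None` if the line does not match.
fn parse_metadata_sketch(line: &str) -> Option<(&str, &str)> {
    let rest = tag("(file: ", line.trim())?;
    let (rest, file) = take_until(", desc: ", rest)?;
    let rest = tag(", desc: ", rest)?;
    // Only the final ')' closes the metadata.
    let desc = rest.strip_suffix(')')?;
    Some((file.trim(), desc.trim()))
}
```

Each step names one piece of the grammar, so the function body reads in the same order as the input it accepts. A real parser library would additionally provide error reporting and reusable composition, which this sketch leaves out.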

That said, this PR is intentionally limited in scope. nom is currently used only for these two parsing helpers, and introducing a new dependency for such a narrow use case may not be justified on its own. The main open question is whether DataFusion would benefit from using nom more broadly for similar parsing tasks in the future.

If the project sees value in adopting nom for other parsing needs, this PR could serve as a small, contained starting point. Otherwise, it may be reasonable to stick with the existing ad-hoc approach to avoid dependency overhead.

Feedback on whether this trade-off is worthwhile is very welcome.

cj-zhukov · 2026-02-03T13:34:04Z

I'd like to keep the parser simple for now. Currently, it can't handle extra symbols such as () in the description of an example. In practice, only one example group, udf, has this case, so I updated its README to remove the parentheses.

I'm happy to improve the parser in the future to handle such cases more robustly if needed. For now, this keeps the code readable and avoids unnecessary complexity.

cj-zhukov · 2026-02-03T13:36:31Z

@Jefffrey since you helped with previous PRs related to example docs generation, it would be great if you could take a look at this one as well. Your feedback or any improvements would be much appreciated.

comphead

Thanks @cj-zhukov. I have a feeling that examples_docs could be renamed to reflect that it is a parser for examples, and it might be good to split it into smaller utility files as well. But this can be addressed in a follow-up PR.

One thing to note: nom is not very actively developed.

Jefffrey · 2026-02-06T01:57:54Z

datafusion-examples/README.md

-| udwf       | [`udf/simple_udwf.rs`](examples/udf/simple_udwf.rs)     | Simple UDWF example                             |
+| Subcommand | File Path                                               | Description                                   |
+| ---------- | ------------------------------------------------------- | --------------------------------------------- |
+| adv_udaf   | [`udf/advanced_udaf.rs`](examples/udf/advanced_udaf.rs) | Advanced User Defined Aggregate Function UDAF |


nit: I think the braces were nice since it was just highlighting the abbreviation

cj-zhukov · 2026-02-06

Good point - I agree that supporting extra symbols like parentheses in the description would be better.

I’ll update the parser in this PR to handle that case more robustly.
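As a rough illustration of why parentheses in a description can trip up a simple parser, the sketch below compares two ways of finding the end of a description. It assumes, purely for illustration, that the description is followed by a closing `)`; the function names are hypothetical and this is not the parser code from this PR.

```rust
/// Naive approach: treat the first ')' as the end of the description.
/// This truncates any description that itself contains parentheses.
fn desc_until_first_paren(input: &str) -> Option<&str> {
    let end = input.find(')')?;
    Some(input[..end].trim())
}

/// More tolerant approach: treat only the last ')' as the closing delimiter,
/// so parentheses inside the description are preserved.
fn desc_until_last_paren(input: &str) -> Option<&str> {
    let end = input.rfind(')')?;
    Some(input[..end].trim())
}
```

With an input such as `Advanced User Defined Aggregate Function (UDAF))`, the first function stops inside `(UDAF`, while the second keeps the abbreviation intact.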

xudong963

Happy to see us using nom for this work. (I had some good experiences using it to build a SQL parser years ago.)

xudong963 · 2026-02-06T06:20:51Z

> nom is not very actively developed

IMO, it has become relatively mature.

cj-zhukov · 2026-02-06T09:49:21Z

> Thanks @cj-zhukov. I have a feeling that examples_docs could be renamed to reflect that it is a parser for examples, and it might be good to split it into smaller utility files as well. But this can be addressed in a follow-up PR.
>
> One thing to note: nom is not very actively developed.

Thanks for the feedback! I agree that examples_docs could be renamed to better reflect that it’s focused on parsing example metadata, and that the code could be split into smaller, more focused utility modules.

To keep this PR scoped and easy to review, I’d prefer to address renaming and refactoring in a follow-up PR. I’ll open one specifically for improving naming and structure.

Thanks also for the note about nom - I’m keeping its usage minimal here so it should be easy to adjust if needed in the future.

nom parser instead of ad-hoc in examples

9cd4a45

cj-zhukov added 3 commits February 3, 2026 14:33

fix typo and add Cargo.lock

f0e76ae

fix extra symbols () in desc

9314c5d

fix trim in parser and cargo clippy

6a7c67d

comphead approved these changes Feb 5, 2026

Jefffrey approved these changes Feb 6, 2026

xudong963 approved these changes Feb 6, 2026

fix support extra symbols in the description

1c26d5c

cj-zhukov wants to merge 5 commits into apache:main from cj-zhukov:cj-zhukov/nom-parser-instead-of-ad-hoc-in-examples

4 participants