Bump unstructured[csv,docx,md,pdf,pptx,xlsx] from 0.18.14 to 0.18.31 in /backend #2373

Closed
dependabot[bot] wants to merge 1 commit into main from dependabot/pip/backend/unstructured-csvdocxmdpdfpptxxlsx--0.18.31

Conversation

@dependabot dependabot bot commented on behalf of github Jan 31, 2026

Bumps unstructured[csv,docx,md,pdf,pptx,xlsx] from 0.18.14 to 0.18.31.

Release notes

Sourced from unstructured[csv,docx,md,pdf,pptx,xlsx]'s releases.

0.18.31

What's Changed

New Contributors

Full Changelog: Unstructured-IO/unstructured@0.18.28...0.18.31

0.18.28

Enhancement

  • Optimize clean_extra_whitespace_with_index_run (codeflash)
  • Optimize recursive_xy_cut_swapped (codeflash)
  • Optimize _DocxPartitioner._parse_category_depth_by_style_name (codeflash)
  • Optimize VertexAIEmbeddingEncoder._add_embeddings_to_elements (codeflash)
  • Optimize ngrams (codeflash)
  • Optimize stage_for_datasaur (codeflash)

0.18.27

Fixes

  • Comment no-ops in zoom_image (codeflash)
  • Fix an issue where elements with partially filled extracted text are marked as extracted

Enhancement

  • Optimize sentence_count (codeflash)
  • Optimize _PartitionerLoader._load_partitioner (codeflash)
  • Optimize detect_languages (codeflash)
  • Optimize contains_verb (codeflash)
  • Optimize get_bbox_thickness (codeflash)

... (truncated)

Changelog

Sourced from unstructured[csv,docx,md,pdf,pptx,xlsx]'s changelog.

0.18.31

Enhancements

  • Changed default DPI to 350
  • Add token-based chunking support: Added max_tokens, new_after_n_tokens, and tokenizer parameters to chunk_by_title() and chunk_elements() for chunking by token count instead of character count. Uses tiktoken for token counting. Install with pip install "unstructured[chunking-tokens]". (fixes #4127)
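To illustrate the difference between a character budget and a token budget, here is a minimal, self-contained sketch. It does not call unstructured or tiktoken: `chunk_by_token_budget` and `whitespace_token_count` are hypothetical names, and whitespace splitting is only a stand-in for a real tokenizer.

```python
from typing import Callable, Iterable, List


def whitespace_token_count(text: str) -> int:
    """Stand-in token counter (assumption): one token per whitespace-separated word."""
    return len(text.split())


def chunk_by_token_budget(
    texts: Iterable[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> List[str]:
    """Greedily pack texts into chunks whose token count stays within max_tokens.

    A single text that exceeds max_tokens on its own is emitted as its own
    chunk rather than split; a real chunker would split it further.
    """
    chunks: List[str] = []
    current: List[str] = []
    for text in texts:
        candidate = " ".join(current + [text])
        if current and count_tokens(candidate) > max_tokens:
            # Adding this text would exceed the budget: close the current chunk.
            chunks.append(" ".join(current))
            current = [text]
        else:
            current.append(text)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = ["one two three", "four five", "six seven eight nine", "ten"]
    for chunk in chunk_by_token_budget(sample, max_tokens=5):
        print(chunk)
```

Passing a different `count_tokens` callable (for example one backed by a real tokenizer) changes the budget unit without touching the packing logic.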

Fixes

0.18.30

Enhancements

  • Updated the Dockerfile to build from the chainguard base. Package updates and the base-packages installation that were previously done in the base-images repo are now all done here.
  • is_text_embedded now considers rotated text as low fidelity, and elements with a non-trivial amount of it are considered not embedded
  • Replace pdf2image with PyPDFium2 for PDF rendering
  • Optimize _get_optimal_value_for_bbox (codeflash)
  • Optimize _DocxPartitioner._style_based_element_type (codeflash)

Fixes

  • Fix EN DASH not cleaned by clean_bullets: Added EN DASH (\u2013) to UNICODE_BULLETS pattern so clean_bullets properly removes EN DASH bullet points without requiring clean_dashes (fixes #4105)
  • Change languages parameter default from ["auto"] to None: Updated default value in detect_languages() and partition_epub() functions. Behavior unchanged as None is converted to ["auto"] internally. (fixes #2471)
  • Resolve GHSA-58pv-8j8x-9vj2
  • Use render mode data to determine if a character extracted by pdfminer is invisible or not
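The EN DASH fix above amounts to including `\u2013` in the set of characters treated as a leading bullet. A minimal sketch of that idea follows; `BULLET_CHARS`, `LEADING_BULLET_RE`, and `strip_leading_bullet` are hypothetical names with an illustrative subset of characters, not the library's actual `UNICODE_BULLETS` pattern or `clean_bullets` implementation.

```python
import re

# Illustrative subset of bullet characters (assumption), including EN DASH.
BULLET_CHARS = ["\u2022", "\u2043", "\u2013"]

# Match optional leading whitespace, one bullet character, then trailing whitespace.
LEADING_BULLET_RE = re.compile(
    r"^\s*(?:" + "|".join(re.escape(c) for c in BULLET_CHARS) + r")\s*"
)


def strip_leading_bullet(text: str) -> str:
    """Remove a single leading bullet character; leave other text untouched."""
    return LEADING_BULLET_RE.sub("", text, count=1)


if __name__ == "__main__":
    print(strip_leading_bullet("\u2013 First item"))
    print(strip_leading_bullet("No bullet \u2013 here"))
```

Because the pattern is anchored at the start of the string, an EN DASH in the middle of a sentence is left alone.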

0.18.28

Enhancement

  • Optimize clean_extra_whitespace_with_index_run (codeflash)
  • Optimize recursive_xy_cut_swapped (codeflash)
  • Optimize _DocxPartitioner._parse_category_depth_by_style_name (codeflash)
  • Optimize VertexAIEmbeddingEncoder._add_embeddings_to_elements (codeflash)
  • Optimize ngrams (codeflash)
  • Optimize stage_for_datasaur (codeflash)

0.18.27

Fixes

  • Comment no-ops in zoom_image (codeflash)
  • Fix an issue where elements with partially filled extracted text are marked as extracted

Enhancement

  • Optimize sentence_count (codeflash)

... (truncated)

Commits
  • d1f1bdf chorse sep bump to resolve open CVEs (#4205)
  • d4caedf fix: Preserve Line Breaks in Code Blocks During Chunking (#4196)
  • 8f32550 fix(deps): Update semitechnologies/weaviate Docker tag to v1.35.3 (#4135)
  • dbe96e2 fix(deps): Update opensearchproject/opensearch Docker tag to v2.19.4 (#4134)
  • 7b366c5 fix(deps): Update docker.elastic.co/elasticsearch/elasticsearch Docker tag to...
  • f0b0e7c fix: filter coordinates kwargs to prevent TypeError in hi_res PDF processing ...
  • 01c3f7c Token-Based Chunking Support (#4203)
  • c0323a6 fix: remove sandbox=True from pypandoc to fix ODT conversion (#4193)
  • 95fea7e fix(deps): switch from pip-compile to uv pip compile (#4202)
  • 8cb6278 fix: reduce default dpi to 350 (#4199)
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

dependabot bot commented on behalf of github Jan 31, 2026

Labels

The following labels could not be found: dependencies. Please create it before Dependabot can add it to a pull request.

Please fix the above issues or remove invalid values from dependabot.yml.

@dependabot dependabot bot requested review from Phinease and WMC001 as code owners January 31, 2026 02:50
Bumps [unstructured[csv,docx,md,pdf,pptx,xlsx]](https://github.com/Unstructured-IO/unstructured) from 0.18.14 to 0.18.31.
- [Release notes](https://github.com/Unstructured-IO/unstructured/releases)
- [Changelog](https://github.com/Unstructured-IO/unstructured/blob/main/CHANGELOG.md)
- [Commits](Unstructured-IO/unstructured@0.18.14...0.18.31)

---
updated-dependencies:
- dependency-name: unstructured[csv,docx,md,pdf,pptx,xlsx]
  dependency-version: 0.18.31
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot bot force-pushed the dependabot/pip/backend/unstructured-csvdocxmdpdfpptxxlsx--0.18.31 branch from d99d2cc to 9cb8dfe Compare February 6, 2026 09:22
dependabot bot commented on behalf of github Feb 13, 2026

Superseded by #2524.

@dependabot dependabot bot closed this Feb 13, 2026
@dependabot dependabot bot deleted the dependabot/pip/backend/unstructured-csvdocxmdpdfpptxxlsx--0.18.31 branch February 13, 2026 15:47