Improve performance when handling large XML files with significant column augmentation #145

ollietulloch · 2025-09-01T22:32:02Z

The `Xml::Table` class augments `column_mappings` based on the data wherever there are repeating sections/fields. Because it does this for every line, augmentation can be memory- and compute-intensive on large files.
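To make that cost concrete, here is a minimal, hypothetical sketch of per-line augmentation. The class name `ColumnAugmenterSketch`, the record shape (field name => array of values) and the `<field>_<n>` column naming scheme are assumptions for illustration only, not the actual `Xml::Table` implementation.

```ruby
# Hypothetical sketch: illustrates per-record column augmentation only.
# Names and data shapes are assumptions, not Xml::Table's real code.
class ColumnAugmenterSketch
  attr_reader :column_mappings

  def initialize(base_mappings)
    # column name => mapping hash; copied so the caller's hash is untouched
    @column_mappings = base_mappings.dup
  end

  # Called once per record ("line"). record maps field name => Array of
  # values; a field that repeats within the record has more than one value.
  def augment!(record)
    record.each do |field, values|
      template = @column_mappings[field]
      next if template.nil? || values.size < 2

      # Add "<field>_2", "<field>_3", ... for repeats not seen in earlier records
      (2..values.size).each do |n|
        @column_mappings["#{field}_#{n}"] ||= template
      end
    end
    self
  end
end

aug = ColumnAugmenterSketch.new({ 'phone' => { 'type' => 'string' } })
aug.augment!({ 'phone' => %w[111 222] })
aug.augment!({ 'phone' => %w[111 222 333] })
aug.column_mappings.keys # => ["phone", "phone_2", "phone_3"]
```

Since `augment!` runs once per record, its cost scales with both the number of lines and the number of repeated fields, so small per-call allocations add up quickly. Sharing `template` across the added columns is only safe if those mappings are never mutated afterwards.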

This PR takes a number of small steps to improve efficiency, focusing on the `Xml::MaskedMappings` class.

Before this PR, mapping augmentation for ~1250 repeating sections took ~95 seconds. This PR brings that down to ~5.8 seconds, roughly a 16x speed-up.

Some of the small steps:

- Reusing frozen objects, e.g. `DO_NOT_CAPTURE_MAPPING = { 'do_not_capture' => true }.freeze`
- Using `deep_dup` only when needed
- Pre-allocating arrays and hashes with the exact size to avoid resizing operations
- Assigning directly into arrays to avoid intermediate objects (e.g. from `.map`)
- Using O(1) lookups instead of O(n) searches where possible
- Using `Set`s
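The steps above can be sketched together as follows. This is a hypothetical illustration, not the code from this PR: `MaskedMappingSketch`, `masked_mappings` and its arguments are invented names, and `deep_copy` uses `Marshal` as a stdlib stand-in for ActiveSupport's `deep_dup`. It shows a shared frozen `DO_NOT_CAPTURE_MAPPING`, copying only for captured columns, a pre-sized array filled by index, and `Set` membership checks.

```ruby
require 'set'

# Hypothetical sketch; names are illustrative, not the gem's actual API.
module MaskedMappingSketch
  # One shared, frozen hash instead of allocating a new one per column.
  DO_NOT_CAPTURE_MAPPING = { 'do_not_capture' => true }.freeze

  # Stdlib stand-in for ActiveSupport's Object#deep_dup (plain data only).
  def self.deep_copy(obj)
    Marshal.load(Marshal.dump(obj))
  end

  # columns:  Array of column names, in output order
  # captured: Set of column names that should be captured
  # template: mapping hash used as the basis for captured columns
  def self.masked_mappings(columns, captured, template)
    result = Array.new(columns.size) # pre-sized; filled by index below

    columns.each_with_index do |column, i|
      result[i] =
        if captured.include?(column) # Set#include? is O(1) on average
          # Copy only here, because this mapping is about to be mutated
          mapping = deep_copy(template)
          mapping['column'] = column
          mapping
        else
          DO_NOT_CAPTURE_MAPPING # shared frozen object, nothing copied
        end
    end

    result
  end
end

template = { 'type' => 'string', 'opts' => { 'trim' => true } }
mappings = MaskedMappingSketch.masked_mappings(%w[id name note], Set['id', 'name'], template)
mappings[2].equal?(MaskedMappingSketch::DO_NOT_CAPTURE_MAPPING) # => true
mappings[0]['column'] # => "id"
```

The design choice is to pay for a deep copy only where a mapping will actually be modified; every non-captured column points at the same frozen hash. Note that a `Marshal` round-trip only handles plain serialisable data (no procs or IO objects).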

kenny-lee-1

LGTM
Thanks for looking at it

ollietulloch added 2 commits September 1, 2025 23:28

Improve masked mapping performance

c75eda7

CHANGELOG entry

634b569

kenny-lee-1 approved these changes Sep 2, 2025


ollietulloch merged commit 0521cbb into main Sep 2, 2025
20 checks passed

ollietulloch deleted the xml-masked-mapping-performance branch September 2, 2025 12:10
