@ollietulloch
Contributor

The Xml::Table class attempts to augment column_mappings based on the data, where there might be repeating sections/fields. It does this for each line, so it can be memory- and compute-intensive.

This PR takes a number of small steps to improve the efficiency, focusing on the Xml::MaskedMappings class.

Before this PR, ~1250 repeating sections took ~95 seconds for mapping augmentation. This PR gets that down to ~5.8 seconds.

Some of the small steps:

  • Frozen object reuse e.g. DO_NOT_CAPTURE_MAPPING = { 'do_not_capture' => true }.freeze
  • Using deep_dup only when needed
  • Pre-allocating Arrays and hash with exact size to avoid resizing operations
  • Direct array assignment to avoid intermediate objects (e.g. .map)
  • Where possible, using O(1) lookups instead of O(n) searches
  • Using Sets for membership checks
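
For readers unfamiliar with these techniques, here is a minimal sketch of how a few of them might combine. It is not the code from this PR: `mask_mappings`, its parameters, and the `'column'` key are hypothetical names chosen for illustration, and only the `DO_NOT_CAPTURE_MAPPING` constant is taken from the description above.

```ruby
require 'set'

# One shared frozen Hash, reused instead of allocating a new Hash per column.
DO_NOT_CAPTURE_MAPPING = { 'do_not_capture' => true }.freeze

# Hypothetical helper (not the actual Xml::MaskedMappings implementation).
# Returns one mapping per column: the original mapping when its 'column'
# name is in `wanted`, otherwise the shared frozen placeholder.
def mask_mappings(column_mappings, wanted)
  wanted_set = wanted.to_set                # O(1) lookups instead of Array#include?
  masked = Array.new(column_mappings.size)  # sized up front, filled by index

  column_mappings.each_with_index do |mapping, i|
    # Direct index assignment; no intermediate Array from .map
    masked[i] = wanted_set.include?(mapping['column']) ? mapping : DO_NOT_CAPTURE_MAPPING
  end

  masked
end
```

Because the placeholder is frozen, sharing it across many columns is safe as long as callers never need to mutate it; a copy (for example via deep_dup) would only be needed at the point where a mapping is actually modified.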

Contributor

@kenny-lee-1 kenny-lee-1 left a comment

LGTM
Thanks for looking at it

@ollietulloch ollietulloch merged commit 0521cbb into main Sep 2, 2025
20 checks passed
@ollietulloch ollietulloch deleted the xml-masked-mapping-performance branch September 2, 2025 12:10