fix: preserve whitespace-only content in inline tags (#155) #253
Open
sp2935 wants to merge 1 commit into matthewwithanm:develop from
When an inline formatting tag (strong, b, em, i, etc.) contains only whitespace, the content is now preserved as a single space instead of being stripped entirely.

This fixes issue matthewwithanm#155 where text like `further<strong> </strong>reference` was incorrectly converted to `furtherreference` instead of `further reference`.

Changes:
- Modified chomp() to return (' ', '', ' ') for whitespace-only text
- Modified abstract_inline_conversion() to skip markup for whitespace-only text
- Updated test_chomp to reflect new expected behavior
- Added test_whitespace_only_inline_tags for regression testing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
This PR fixes issue #155, where whitespace-only content inside inline formatting tags (like `<strong> </strong>`) was stripped entirely, causing adjacent words to run together.

Before: `further<strong> </strong>reference` → `furtherreference`

After: `further<strong> </strong>reference` → `further reference`

Problem
The `chomp()` function strips leading/trailing whitespace from text inside inline tags to prevent incorrect markdown like `** foo**`. However, when the tag contains only whitespace, `chomp()` would return `('', '', '')` (with prefix stripped and empty text), which `abstract_inline_conversion()` then converts to an empty string.

This causes text like `word1<b> </b>word2` to become `word1word2` instead of `word1 word2`.

Solution
- Modified `chomp()` to detect whitespace-only text and return `('', '', ' ')`, preserving a single space as the text content.
- Modified `abstract_inline_conversion()` to check whether the chomped text is whitespace-only (`text.isspace()`) and, if so, return just the whitespace without wrapping it in markdown markers.

Changes
- `markdownify/__init__.py`: Updated `chomp()` and `abstract_inline_conversion()`
- `tests/test_advanced.py`: Updated `test_chomp` expectations to reflect new behavior
- `tests/test_conversions.py`: Added `test_whitespace_only_inline_tags()` for regression testing

Test Plan
- `test_whitespace_only_inline_tags` verifies the fix

🤖 Generated with Claude Code
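
For reviewers who want to see the intended behavior without opening the diff, here is a minimal standalone sketch of the logic described under Solution. It is not the code from `markdownify/__init__.py`: the names `chomp_sketch` and `inline_sketch` are hypothetical, the `(prefix, suffix, text)` return order is assumed from the tuples quoted in the description, and the whitespace-only case assumes the `('', '', ' ')` return value given there.

```python
def chomp_sketch(text):
    """Return (prefix, suffix, stripped_text) for the content of an inline tag.

    Hypothetical stand-in for chomp(); the tuple order is an assumption.
    """
    if text.isspace():
        # Whitespace-only content: keep one space as the text itself
        # instead of stripping everything away.
        return "", "", " "
    prefix = " " if text and text[0] == " " else ""
    suffix = " " if text and text[-1] == " " else ""
    return prefix, suffix, text.strip()


def inline_sketch(markup, text):
    """Wrap text in markup, keeping surrounding spaces outside the markers.

    Hypothetical stand-in for the function built by abstract_inline_conversion().
    """
    prefix, suffix, text = chomp_sketch(text)
    if not text:
        return ""
    if text.isspace():
        # Emit the preserved space without any markdown markers.
        return text
    return f"{prefix}{markup}{text}{markup}{suffix}"


# The case from issue #155: the space between the two words survives.
print(repr("further" + inline_sketch("**", " ") + "reference"))
# Non-empty content still has its padding moved outside the markers.
print(repr(inline_sketch("**", " foo ")))
```

Returning the bare space rather than `** **` keeps the output free of empty emphasis markers while still separating the neighboring words.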