Skip to content

A port of the (line-based) Hunt-Szymanski diff algorithm to Elixir, with a focus on performance and memory efficiency.

License

Notifications You must be signed in to change notification settings

danielres/hunt_szymanski_diff

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Hunt-Szymanski diff

A port of the Hunt-Szymanski diff algorithm to Elixir, with a focus on performance and memory efficiency.

This library is particularly suited for longer texts such as wiki pages or blog articles.

Credits

This library was generated with assistance from ChatGPT.
While I guided its development and reviewed the implementation, the core algorithm was recreated by AI.

Usage

left = """
Roses are red,
Violets are blue,
Sugar is sweet,
And so are you.
"""

right = """
Roses are blue,
Violets are blue,
I love fluffy clouds,
And so are you.
"""

# 1. Diff two strings

diff = HSDiff.diff(left, right)

# => 
# [
#   del: ["Roses are red,"],
#   ins: ["Roses are blue,"],
#   eq: ["Violets are blue,"],
#   del: ["Sugar is sweet,"],
#   ins: ["I love fluffy clouds,"],
#   eq: ["And so are you."],
#   eq: [""]
# ]


# 2. Optimize the diff result

optimized = HSDiff.optimize(diff)

# => 
# [
#   del: 1,
#   ins: ["Roses are blue,"],
#   eq: 1,
#   del: 1,
#   ins: ["I love fluffy clouds,"],
#   eq: 2
# ]


# 3. Patch 

restored = HSDiff.patch(left, optimized) 

# => "Roses are blue,\nViolets are blue,\nI love fluffy clouds,\nAnd so are you.\n"

Why Use Hunt-Szymanski Diff?

Elixir’s standard library already provides String.myers_difference/2 to generate diffs between strings (and lists). However:

  • High Memory Usage: For larger inputs (e.g., a typical article or blog post), String.myers_difference/2 can quickly use gigabytes of memory. (Note: this can be alleviated by chunking the input into smaller pieces)
  • Small vs Large Inputs: String.myers_difference/2 performs well for short strings or larger strings with small changes. But for large-scale diffs—such as comparing entire documents—the Hunt–Szymanski line-based diff is more memory-efficient and performs better.

In short, if you need a diff algorithm for big textual data, Hunt–Szymanski can help you avoid the heavy memory footprint that comes with other approaches.

Installation

Add hunt_szymanski_diff to your list of dependencies in mix.exs:

def deps do
  [
    {:hunt_szymanski_diff, "~> 0.1.0"}
  ]
end

About

A port of the (line-based) Hunt-Szymanski diff algorithm to Elixir, with a focus on performance and memory efficiency.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages