Skip to content

diff large files without running out of memory; only unified format; probably buggy, but ~no memory usage

Notifications You must be signed in to change notification settings

unhammer/diff-large-files

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 

Repository files navigation

Usage:

    $ python2 diff-large-files.py fileA fileB

Pass -h to get options. Output is in unified diff format, possible to
pipe into `dwdiff --diff-input` to get word-diffs.

The script is meant to give visually inspectable diffs of large files
with few differences, to avoid the memory consumption issue with
diffing large files. When a difference is found, it moves forth in
both files a line at a time, looking for a line in one file that
matches a previously seen line in the other file.

This simple method works well for some types of files, but can easily
match the wrong lines if e.g. every other line is identical (or there
are a lot of identical lines in the files). In that case, you're
probably better off using `split` to split your input files and then
sending each file-pair into regular `diff`.

About

diff large files without running out of memory; only unified format; probably buggy, but ~no memory usage

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages