New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Try and preserve the structure of the html during a diff #350
base: master
Are you sure you want to change the base?
Changes from all commits
ba831cf
c4c504c
b340f0a
66e89a8
f4c38c6
2c9aca8
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -54,8 +54,8 @@ As a special case, changing the href of a link is displayed, and | |
images are treated like words: | ||
|
||
>>> pdiff('<a href="http://yahoo.com">search</a>', '<a href="http://google.com">search</a>') | ||
<a href="http://google.com">search <ins> Link: http://google.com</ins> | ||
<del> Link: http://yahoo.com</del> </a> | ||
<a href="http://google.com">search <ins> Link: http://google.com | ||
</ins> <del> Link: http://yahoo.com</del> </a> | ||
Comment on lines
-57
to
+58
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Was this a necessary change? It doesn't feel right. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. tbh, didn't feel right to me either but this test breaks without it, and I didn't have much luck trying to figure out why. Initial instinct was that it has something to do with the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So the underlying cause for this appears to be an extra trailing whitespace that appears to get added to the current lxml diff:
diff with this patchset: What I'm not entirely sure about is why the trailing whitespace which ideally should also be present in the current lxml diff (according to call to In any case, would you consider this as a regression in behavior worthy of a fix or does this fix of the test case suffice ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What strikes me is that there is a space before the closing There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree. I'll investigate and attempt to make them consistent. |
||
>>> pdiff('<p>Print this <img src="print.gif"></p>', '<p>Print this</p>') | ||
<p>Print this <del><img src="print.gif"></del> </p> | ||
>>> pdiff('<a href="http://yahoo.com">search</a>', '<a href="http://yahoo.com">search</a>') | ||
|
@@ -87,6 +87,21 @@ Whitespace is generally ignored for the diff but preserved during the diff: | |
second | ||
<ins>third</ins> </pre> | ||
|
||
Ensure we preserve the html structure on doing the diff: | ||
|
||
>>> a = "<div id='first'>some old text</div><div id='last'>more old text</div>" | ||
>>> b = "<div id='first'>some old text</div><div id='middle'>and new text</div><div id='last'>more old text</div>" | ||
>>> pdiff(a, b) | ||
<div id="first"><ins>some old text</ins></div> <div id="middle"> | ||
<ins>and new</ins> <del>some old</del> text</div><div id="last">more | ||
old text</div> | ||
>>> a = "<div><p>Some text that will change</p><p>Some tags will be added</p></div>" | ||
>>> b = "<div><div><p>Some text that has changed a bit</p><p>All of this is new</p></div></div>" | ||
>>> pdiff(a, b) | ||
<div><div><p>Some text that <ins>has changed a bit</ins> </p> | ||
<p><ins>All of this is new</ins></p> </div> <del>will | ||
change</del><p><del>Some tags will be added</del></p> </div> | ||
Comment on lines
+98
to
+103
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This diff looks weird. Is it getting confused by the nested There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, that's correct. In this specific case, the rendered diff looks 'ok' to me (with the insertions preceding the deletions) but yeah, I imagine in larger context it might be less than ideal. I'm going to take another stab at this. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. btw, fwiw, this isn't very different from the diff generated by the current code for the same input. The only difference being the closing current lxml diff: <div><div><p>Some text that <ins>has changed a bit</ins><p><ins>All of
this is new</ins></p> </p></div> <del>will change</del><p><del>Some
tags will be added</del></p> </div> diff generated with this patchset <div><div><p>Some text that <ins>has changed a bit</ins> </p>
<p><ins>All of this is new</ins></p> </div> <del>will
change</del><p><del>Some tags will be added</del></p> </div> |
||
|
||
The sixteen combinations:: | ||
|
||
First "insert start" (del start/middle/end/none): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition in this line probably doesn't work any more if
with_idx=True
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Umm, wouldn't it ? Here
balanced
would be something like[(1, '<div>'), None, ...]
ifwith_idx==True
, so theif chunk is not None
check still does the right thing.