Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect slice diff output #238

Closed
dsnet opened this issue Oct 15, 2020 · 7 comments · Fixed by #247
Closed

Incorrect slice diff output #238

dsnet opened this issue Oct 15, 2020 · 7 comments · Fixed by #247
Labels

Comments

@dsnet
Copy link
Collaborator

dsnet commented Oct 15, 2020

Consider the following snippet:

got := "Forwarding service bread:foo-http-jean has a same-env shard undrained: aa:jean-shard000. The quick brown fox jumped over the lazy dog."
want := "Forwarding service bread:foo-api-scary has a same-env shard undrained: bb:scary-shard000. The quick brown fox jumped over the lazy dog."
fmt.Println(cmp.Diff(got, want))

Depending on the randomization involved, this sometimes prints:

  strings.Join({
  	"Fo",
- 	"rwar",
+ 	"rwar",
  	"ding service bread:foo-",
- 	"http-jean",
+ 	"api-scary",
  	" has a same-env shard undrained: ",
- 	"aa:jean",
+ 	"bb:scary",
  	"-shard000. The quick brown fox jumped over the lazy dog.",
  }, "")

when it should print:

  strings.Join({
  	"Forwarding service bread:foo-",
- 	"http-jean",
+ 	"api-scary",
  	" has a same-env shard undrained: ",
- 	"aa:jean",
+ 	"bb:scary",
  	"-shard000. The quick brown fox jumped over the lazy dog.",
  }, "")

Not that it incorrectly reports that "rwar" is different?

@dsnet dsnet added the bug label Oct 15, 2020
@Eun
Copy link

Eun commented Nov 20, 2020

Having the same on uint8 slices:

func TestDiff(t *testing.T) {
	fmt.Println(cmp.Diff(
		// Hello Universe
		[]uint8{0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x55, 0x6e, 0x69, 0x76, 0x65, 0x72, 0x73, 0x65},
		// Hello World
		[]uint8{0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x57, 0x6f, 0x72, 0x6c, 0x64},
	))
}

sometimes reports

  []uint8{
  	... // 4 identical elements
  	0x6f,
  	0x20,
- 	0x55,
+ 	0x57,
- 	0x6e,
+ 	0x6f,
- 	0x69,
- 	0x76,
- 	0x65,
  	0x72,
- 	0x73,
+ 	0x6c,
- 	0x65,
+ 	0x64,
  }

sometimes

  bytes.Join({
  	"Hello ",
- 	"Universe",
+ 	"World",
  }, "")

@dsnet
Copy link
Collaborator Author

dsnet commented Nov 20, 2020

Hi @Eun, thanks for the report, but it's not the same issue. My original one is about the fact that the Diff is reporting a difference when there is none:

- 	"rwar",
+ 	"rwar",

This is incorrect.

The example you provided is one where the Diff is still correct, but sub-optimally shown. I'm fairly sure your specific case would be fixed by improving the optimality of the diffing algorithm (which is a different but valid concern). It currently uses a greedy search for differences, which may cause it to land in a local optimum, but is at least guaranteed to run in O(n). It just so happens that running the algorithm forwards produces a more optimal diff than running it backwards.

@Eun
Copy link

Eun commented Nov 20, 2020

@dsnet Yeah I also realized that this might be a different issue than yours, I wasn't sure tho. I will create a new issue for that.

@ghost
Copy link

ghost commented Nov 21, 2020

It appears that the problem may be related to the fact that when zigzagInit is assigned the value 1 in diff.Difference(), the first forward-searching match is determined to be (5, 2). Whereas when zigzagInit is assigned the value 0, the first forward match is correctly determined to be (0, 0). In the cases where the forward search is not allowed to begin at zero, it's as if it cannot discern matches that begin at the first element.

I notice the comments state:

// To ensure flexibility in changing the algorithm in the future,
// introduce some degree of deliberate instability.
// This is achieved by fiddling the zigzag iterator to start searching
// the graph starting from the bottom-right versus than the top-left.

Is this correct? Perhaps I'm misunderstanding, but it seems like zigZagInit does not drive whether the search begins from the top-left or bottom-right.

dsnet added a commit that referenced this issue Nov 23, 2020
A previous attempt to add non-determinism to the diffing algorithm
unfortunately broke the algorithm for half the cases.

This change modifies the algorithm to truly switch between starting
with a forward search versus a reverse search.
The main for-loop of Difference would switch repeatedly between
performing a forward search, then a reverse search, and vice-versa.
Since we can't jump into the middle of a for-loop to start with the
reverse search first, we use a series of labels and goto statements
to accomplish the same effect.

Fixes #238
@dsnet
Copy link
Collaborator Author

dsnet commented Nov 23, 2020

This bug is definitely related to zigzagInit. It's not operate remotely like what is documented. I sent #247, which should fix it.

Coincidentally, this also fixes the case that @Eun reported. However, I'm going to keep #244 open since improving the optimality of the diffing algorithm is still something we want to do.

@ghost
Copy link

ghost commented Nov 24, 2020

Nice. Here is a somewhat simpler solution that I had envisioned if you are interested.

@dsnet
Copy link
Collaborator Author

dsnet commented Nov 24, 2020

Thanks @C-DY for the PR! Mine may be a slightly more lines of change, but I was aiming to more clearly preserve the symmetry that exists between the forward and backward searches. They are identical except operating in different directions.

dsnet added a commit that referenced this issue Nov 24, 2020
A previous attempt to add non-determinism to the diffing algorithm
unfortunately broke the algorithm for half the cases.

This change modifies the algorithm to truly switch between starting
with a forward search versus a reverse search.
The main for-loop of Difference would switch repeatedly between
performing a forward search, then a reverse search, and vice-versa.
Since we can't jump into the middle of a for-loop to start with the
reverse search first, we use a series of labels and goto statements
to accomplish the same effect.

Fixes #238
dsnet added a commit that referenced this issue Nov 24, 2020
A previous attempt to add non-determinism to the diffing algorithm
unfortunately broke the algorithm for half the cases.

This change modifies the algorithm to truly switch between starting
with a forward search versus a reverse search.
The main for-loop of Difference would switch repeatedly between
performing a forward search, then a reverse search, and vice-versa.
Since we can't jump into the middle of a for-loop to start with the
reverse search first, we use a series of labels and goto statements
to accomplish the same effect.

Fixes #238
dsnet added a commit that referenced this issue Nov 24, 2020
A previous attempt to add non-determinism to the diffing algorithm
unfortunately broke the algorithm for half the cases.

This change modifies the algorithm to truly switch between starting
with a forward search versus a reverse search.
The main for-loop of Difference would switch repeatedly between
performing a forward search, then a reverse search, and vice-versa.
Since we can't jump into the middle of a for-loop to start with the
reverse search first, we use a series of labels and goto statements
to accomplish the same effect.

Fixes #238
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants