Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust heuristic for line-based versus byte-based diffing (#299) #312

Closed
wants to merge 100 commits into from

Conversation

rgosdin
Copy link

@rgosdin rgosdin commented Nov 15, 2022

dsnet and others added 30 commits February 25, 2019 17:37
Switch the internal interface for reporting to use a push-pop mechanism.
This API is strictly more flexible than the previous Report-only API since it
enables the reporter to re-construct the entire value tree.

This change is done in preparation for a major overhaul of reporter logic.
Rename the file based on what it does rather than how its implemented.
Some of the value information is already in the path (e.g., MapIndex.Key
and SliceIndex.Key), which seems to suggest that all of the value information
should be in the Path. Doing so has several benefits:
	* It simplifies the comparison logic
	* It paves the way for FilterPath to be able to ignore missing
	slice elements or map entries
	* It allows for a simpler API for custom reporters

One regression introduced by this change is the removal of batching for
slices when sufficient number of elements differ. This logic was added
as a hack in the past to keep the reporter output small in such cases.
However, this batching functionality should be the responsibility of the
reporter logic, not the comparer logic. A future change will refactor
the reporter logic to reintroduce that feature.
This is necessary for the upcoming reporter re-implementation so that
it can format the differences with more intelligence.
No tests added in this PR since an upcoming change will heavily test
these code paths.
Rather than checking up-front whether the values are invalid,
give tryOptions a chance to operate on them and potentially ignore
the values. This is useful since a value can only be invalid if it
is a missing slice element or map entry, and provides FilterPath
combined with Ignore the ability to ignore such cases.

Some complexity is added to compareSlice to look for ignored elements
first before applying diffing so that we can decouple the stability
of the diffing algorithm from the primary result.
Add a benchmark that uses cmp.{Equal,Diff} to measure performance of
comparing large byte slices. Comparing large slices of primitives is
generally considered the most pathological use-case for cmp since the
rules of cmp dictate that it must run the filters on every element
of the slice even if the filters never apply.

In theory, a user of cmp could short-circuit comparison of slices by
providing a comparer for slices (e.g., cmp.Comparer(bytes.Equal)),
but it would be ideal for cmp to be sufficiently optimized that comparing
large slices is not too costly.

BenchmarkBytes/4KiB/EqualFilter0-4         	     300	   3792755 ns/op	   2.16 MB/s	  628702 B/op	    8276 allocs/op
BenchmarkBytes/4KiB/EqualFilter1-4         	     200	   6058637 ns/op	   1.35 MB/s	  890920 B/op	   16471 allocs/op
BenchmarkBytes/4KiB/EqualFilter2-4         	     200	   7498098 ns/op	   1.09 MB/s	  890952 B/op	   16472 allocs/op
BenchmarkBytes/4KiB/EqualFilter3-4         	     100	  10094237 ns/op	   0.81 MB/s	  891016 B/op	   16473 allocs/op
BenchmarkBytes/4KiB/EqualFilter4-4         	     100	  11128655 ns/op	   0.74 MB/s	  891016 B/op	   16473 allocs/op
BenchmarkBytes/4KiB/EqualFilter5-4         	     100	  12952062 ns/op	   0.63 MB/s	  891144 B/op	   16474 allocs/op
BenchmarkBytes/4KiB/DiffFilter0-4          	     300	   4015119 ns/op	   2.04 MB/s	  629425 B/op	    8305 allocs/op
BenchmarkBytes/4KiB/DiffFilter1-4          	     200	   6457963 ns/op	   1.27 MB/s	  891738 B/op	   16501 allocs/op
BenchmarkBytes/4KiB/DiffFilter2-4          	     200	   8371022 ns/op	   0.98 MB/s	  891770 B/op	   16502 allocs/op
BenchmarkBytes/4KiB/DiffFilter3-4          	     200	   9420398 ns/op	   0.87 MB/s	  891834 B/op	   16503 allocs/op
BenchmarkBytes/4KiB/DiffFilter4-4          	     100	  12493540 ns/op	   0.66 MB/s	  891835 B/op	   16503 allocs/op
BenchmarkBytes/4KiB/DiffFilter5-4          	     100	  14208928 ns/op	   0.58 MB/s	  891965 B/op	   16504 allocs/op
BenchmarkBytes/64KiB/EqualFilter0-4        	      20	  66146276 ns/op	   1.98 MB/s	10826734 B/op	  131206 allocs/op
BenchmarkBytes/64KiB/EqualFilter1-4        	      10	 102687117 ns/op	   1.28 MB/s	15021123 B/op	  262281 allocs/op
BenchmarkBytes/64KiB/EqualFilter2-4        	      10	 125444269 ns/op	   1.04 MB/s	15021174 B/op	  262282 allocs/op
BenchmarkBytes/64KiB/EqualFilter3-4        	      10	 154392619 ns/op	   0.85 MB/s	15021219 B/op	  262283 allocs/op
BenchmarkBytes/64KiB/EqualFilter4-4        	      10	 183307772 ns/op	   0.72 MB/s	15021219 B/op	  262283 allocs/op
BenchmarkBytes/64KiB/EqualFilter5-4        	       5	 212334761 ns/op	   0.62 MB/s	15021340 B/op	  262284 allocs/op
BenchmarkBytes/64KiB/DiffFilter0-4         	      20	  67147375 ns/op	   1.95 MB/s	10828312 B/op	  131244 allocs/op
BenchmarkBytes/64KiB/DiffFilter1-4         	      10	 105385709 ns/op	   1.24 MB/s	15022724 B/op	  262319 allocs/op
BenchmarkBytes/64KiB/DiffFilter2-4         	      10	 135994379 ns/op	   0.96 MB/s	15022763 B/op	  262320 allocs/op
BenchmarkBytes/64KiB/DiffFilter3-4         	      10	 159346836 ns/op	   0.82 MB/s	15022820 B/op	  262321 allocs/op
BenchmarkBytes/64KiB/DiffFilter4-4         	      10	 190129916 ns/op	   0.69 MB/s	15022843 B/op	  262321 allocs/op
BenchmarkBytes/64KiB/DiffFilter5-4         	       5	 208805873 ns/op	   0.63 MB/s	15022996 B/op	  262322 allocs/op
BenchmarkBytes/1MiB/EqualFilter0-4         	       2	 978302544 ns/op	   2.14 MB/s	173184064 B/op	 2097346 allocs/op
BenchmarkBytes/1MiB/EqualFilter1-4         	       1	1543190869 ns/op	   1.36 MB/s	240293104 B/op	 4194502 allocs/op
BenchmarkBytes/1MiB/EqualFilter2-4         	       1	1998443802 ns/op	   1.05 MB/s	240293232 B/op	 4194504 allocs/op
BenchmarkBytes/1MiB/EqualFilter3-4         	       1	2507293058 ns/op	   0.84 MB/s	240293328 B/op	 4194506 allocs/op
BenchmarkBytes/1MiB/EqualFilter4-4         	       1	2981132381 ns/op	   0.70 MB/s	240293008 B/op	 4194502 allocs/op
BenchmarkBytes/1MiB/EqualFilter5-4         	       1	3351177035 ns/op	   0.63 MB/s	240293424 B/op	 4194506 allocs/op
BenchmarkBytes/1MiB/DiffFilter0-4          	       1	1132136753 ns/op	   1.85 MB/s	173185752 B/op	 2097384 allocs/op
BenchmarkBytes/1MiB/DiffFilter1-4          	       1	1666196345 ns/op	   1.26 MB/s	240294504 B/op	 4194537 allocs/op
BenchmarkBytes/1MiB/DiffFilter2-4          	       1	2204467232 ns/op	   0.95 MB/s	240294600 B/op	 4194539 allocs/op
BenchmarkBytes/1MiB/DiffFilter3-4          	       1	2499107753 ns/op	   0.84 MB/s	240294600 B/op	 4194539 allocs/op
BenchmarkBytes/1MiB/DiffFilter4-4          	       1	2966222324 ns/op	   0.71 MB/s	240295016 B/op	 4194544 allocs/op
BenchmarkBytes/1MiB/DiffFilter5-4          	       1	3382045549 ns/op	   0.62 MB/s	240294728 B/op	 4194540 allocs/op
BenchmarkBytes/16MiB/EqualFilter0-4        	       1	16585516720 ns/op	   2.02 MB/s	2967917664 B/op	33554689 allocs/op
BenchmarkBytes/16MiB/EqualFilter1-4        	       1	23980880452 ns/op	   1.40 MB/s	4041659536 B/op	67109123 allocs/op
BenchmarkBytes/16MiB/EqualFilter2-4        	       1	30729382462 ns/op	   1.09 MB/s	4041659568 B/op	67109124 allocs/op
BenchmarkBytes/16MiB/EqualFilter3-4        	       1	37830223988 ns/op	   0.89 MB/s	4041660016 B/op	67109129 allocs/op
BenchmarkBytes/16MiB/EqualFilter4-4        	       1	44731081109 ns/op	   0.75 MB/s	4041659536 B/op	67109124 allocs/op
BenchmarkBytes/16MiB/EqualFilter5-4        	       1	52110015114 ns/op	   0.64 MB/s	4041659760 B/op	67109126 allocs/op
BenchmarkBytes/16MiB/DiffFilter0-4         	       1	20349410654 ns/op	   1.65 MB/s	2967919128 B/op	33554724 allocs/op
BenchmarkBytes/16MiB/DiffFilter1-4         	       1	27073250483 ns/op	   1.24 MB/s	4041661320 B/op	67109163 allocs/op
BenchmarkBytes/16MiB/DiffFilter2-4         	       1	32223912220 ns/op	   1.04 MB/s	4041661064 B/op	67109160 allocs/op
BenchmarkBytes/16MiB/DiffFilter3-4         	       1	39189759283 ns/op	   0.86 MB/s	4041661128 B/op	67109161 allocs/op
BenchmarkBytes/16MiB/DiffFilter4-4         	       1	48344470628 ns/op	   0.69 MB/s	4041661256 B/op	67109163 allocs/op
BenchmarkBytes/16MiB/DiffFilter5-4         	       1	51184873999 ns/op	   0.66 MB/s	4041661256 B/op	67109162 allocs/op

The last benchmark shows that the current implementation allocates
an astonishing 3.75GiB just to compare a 16MiB byte slice.
Avoid appending validator in every call to tryOptions and
instead ensure that the state starts with one in the options list.
Doing so dramatically reduces the number of allocations.

benchmark                               old MB/s     new MB/s     speedup
BenchmarkBytes/4KiB/EqualFilter0-4      2.16         2.56         1.19x
BenchmarkBytes/4KiB/EqualFilter1-4      1.35         1.53         1.13x
BenchmarkBytes/4KiB/EqualFilter2-4      1.09         1.20         1.10x
BenchmarkBytes/4KiB/EqualFilter3-4      0.81         0.99         1.22x
BenchmarkBytes/4KiB/EqualFilter4-4      0.74         0.77         1.04x
BenchmarkBytes/4KiB/EqualFilter5-4      0.63         0.69         1.10x
BenchmarkBytes/4KiB/DiffFilter0-4       2.04         2.38         1.17x
BenchmarkBytes/4KiB/DiffFilter1-4       1.27         1.52         1.20x
BenchmarkBytes/4KiB/DiffFilter2-4       0.98         1.15         1.17x
BenchmarkBytes/4KiB/DiffFilter3-4       0.87         0.96         1.10x
BenchmarkBytes/4KiB/DiffFilter4-4       0.66         0.81         1.23x
BenchmarkBytes/4KiB/DiffFilter5-4       0.58         0.69         1.19x
BenchmarkBytes/64KiB/EqualFilter0-4     1.98         2.31         1.17x
BenchmarkBytes/64KiB/EqualFilter1-4     1.28         1.71         1.34x
BenchmarkBytes/64KiB/EqualFilter2-4     1.04         1.23         1.18x
BenchmarkBytes/64KiB/EqualFilter3-4     0.85         0.93         1.09x
BenchmarkBytes/64KiB/EqualFilter4-4     0.72         0.83         1.15x
BenchmarkBytes/64KiB/EqualFilter5-4     0.62         0.70         1.13x
BenchmarkBytes/64KiB/DiffFilter0-4      1.95         2.29         1.17x
BenchmarkBytes/64KiB/DiffFilter1-4      1.24         1.56         1.26x
BenchmarkBytes/64KiB/DiffFilter2-4      0.96         1.14         1.19x
BenchmarkBytes/64KiB/DiffFilter3-4      0.82         0.97         1.18x
BenchmarkBytes/64KiB/DiffFilter4-4      0.69         0.79         1.14x
BenchmarkBytes/64KiB/DiffFilter5-4      0.63         0.70         1.11x
BenchmarkBytes/1MiB/EqualFilter0-4      2.14         2.40         1.12x
BenchmarkBytes/1MiB/EqualFilter1-4      1.36         1.63         1.20x
BenchmarkBytes/1MiB/EqualFilter2-4      1.05         1.12         1.07x
BenchmarkBytes/1MiB/EqualFilter3-4      0.84         0.81         0.96x
BenchmarkBytes/1MiB/EqualFilter4-4      0.70         0.72         1.03x
BenchmarkBytes/1MiB/EqualFilter5-4      0.63         0.64         1.02x
BenchmarkBytes/1MiB/DiffFilter0-4       1.85         2.07         1.12x
BenchmarkBytes/1MiB/DiffFilter1-4       1.26         1.49         1.18x
BenchmarkBytes/1MiB/DiffFilter2-4       0.95         1.00         1.05x
BenchmarkBytes/1MiB/DiffFilter3-4       0.84         0.88         1.05x
BenchmarkBytes/1MiB/DiffFilter4-4       0.71         0.72         1.01x
BenchmarkBytes/1MiB/DiffFilter5-4       0.62         0.67         1.08x
BenchmarkBytes/16MiB/EqualFilter0-4     2.02         2.16         1.07x
BenchmarkBytes/16MiB/EqualFilter1-4     1.40         1.56         1.11x
BenchmarkBytes/16MiB/EqualFilter2-4     1.09         1.06         0.97x
BenchmarkBytes/16MiB/EqualFilter3-4     0.89         0.88         0.99x
BenchmarkBytes/16MiB/EqualFilter4-4     0.75         0.76         1.01x
BenchmarkBytes/16MiB/EqualFilter5-4     0.64         0.64         1.00x
BenchmarkBytes/16MiB/DiffFilter0-4      1.65         2.15         1.30x
BenchmarkBytes/16MiB/DiffFilter1-4      1.24         1.35         1.09x
BenchmarkBytes/16MiB/DiffFilter2-4      1.04         0.97         0.93x
BenchmarkBytes/16MiB/DiffFilter3-4      0.86         0.87         1.01x
BenchmarkBytes/16MiB/DiffFilter4-4      0.69         0.75         1.09x
BenchmarkBytes/16MiB/DiffFilter5-4      0.66         0.65         0.98x

benchmark                               old allocs     new allocs     delta
BenchmarkBytes/4KiB/EqualFilter0-4      8276           83             -99.00%
BenchmarkBytes/4KiB/EqualFilter1-4      16471          84             -99.49%
BenchmarkBytes/4KiB/EqualFilter2-4      16472          85             -99.48%
BenchmarkBytes/4KiB/EqualFilter3-4      16473          85             -99.48%
BenchmarkBytes/4KiB/EqualFilter4-4      16473          86             -99.48%
BenchmarkBytes/4KiB/EqualFilter5-4      16474          86             -99.48%
BenchmarkBytes/4KiB/DiffFilter0-4       8305           111            -98.66%
BenchmarkBytes/4KiB/DiffFilter1-4       16501          112            -99.32%
BenchmarkBytes/4KiB/DiffFilter2-4       16502          113            -99.32%
BenchmarkBytes/4KiB/DiffFilter3-4       16503          113            -99.32%
BenchmarkBytes/4KiB/DiffFilter4-4       16503          114            -99.31%
BenchmarkBytes/4KiB/DiffFilter5-4       16504          114            -99.31%
BenchmarkBytes/64KiB/EqualFilter0-4     131206         133            -99.90%
BenchmarkBytes/64KiB/EqualFilter1-4     262281         134            -99.95%
BenchmarkBytes/64KiB/EqualFilter2-4     262282         135            -99.95%
BenchmarkBytes/64KiB/EqualFilter3-4     262283         135            -99.95%
BenchmarkBytes/64KiB/EqualFilter4-4     262283         136            -99.95%
BenchmarkBytes/64KiB/EqualFilter5-4     262284         137            -99.95%
BenchmarkBytes/64KiB/DiffFilter0-4      131244         171            -99.87%
BenchmarkBytes/64KiB/DiffFilter1-4      262319         172            -99.93%
BenchmarkBytes/64KiB/DiffFilter2-4      262320         173            -99.93%
BenchmarkBytes/64KiB/DiffFilter3-4      262321         173            -99.93%
BenchmarkBytes/64KiB/DiffFilter4-4      262321         174            -99.93%
BenchmarkBytes/64KiB/DiffFilter5-4      262322         174            -99.93%
BenchmarkBytes/1MiB/EqualFilter0-4      2097346        192            -99.99%
BenchmarkBytes/1MiB/EqualFilter1-4      4194502        193            -100.00%
BenchmarkBytes/1MiB/EqualFilter2-4      4194504        196            -100.00%
BenchmarkBytes/1MiB/EqualFilter3-4      4194506        194            -100.00%
BenchmarkBytes/1MiB/EqualFilter4-4      4194502        198            -100.00%
BenchmarkBytes/1MiB/EqualFilter5-4      4194506        196            -100.00%
BenchmarkBytes/1MiB/DiffFilter0-4       2097384        229            -99.99%
BenchmarkBytes/1MiB/DiffFilter1-4       4194537        231            -99.99%
BenchmarkBytes/1MiB/DiffFilter2-4       4194539        233            -99.99%
BenchmarkBytes/1MiB/DiffFilter3-4       4194539        234            -99.99%
BenchmarkBytes/1MiB/DiffFilter4-4       4194544        233            -99.99%
BenchmarkBytes/1MiB/DiffFilter5-4       4194540        233            -99.99%
BenchmarkBytes/16MiB/EqualFilter0-4     33554689       256            -100.00%
BenchmarkBytes/16MiB/EqualFilter1-4     67109123       255            -100.00%
BenchmarkBytes/16MiB/EqualFilter2-4     67109124       257            -100.00%
BenchmarkBytes/16MiB/EqualFilter3-4     67109129       257            -100.00%
BenchmarkBytes/16MiB/EqualFilter4-4     67109124       258            -100.00%
BenchmarkBytes/16MiB/EqualFilter5-4     67109126       257            -100.00%
BenchmarkBytes/16MiB/DiffFilter0-4      33554724       292            -100.00%
BenchmarkBytes/16MiB/DiffFilter1-4      67109163       293            -100.00%
BenchmarkBytes/16MiB/DiffFilter2-4      67109160       294            -100.00%
BenchmarkBytes/16MiB/DiffFilter3-4      67109161       294            -100.00%
BenchmarkBytes/16MiB/DiffFilter4-4      67109163       295            -100.00%
BenchmarkBytes/16MiB/DiffFilter5-4      67109162       299            -100.00%

benchmark                               old bytes      new bytes      delta
BenchmarkBytes/4KiB/EqualFilter0-4      628702         366504         -41.70%
BenchmarkBytes/4KiB/EqualFilter1-4      890920         366536         -58.86%
BenchmarkBytes/4KiB/EqualFilter2-4      890952         366601         -58.85%
BenchmarkBytes/4KiB/EqualFilter3-4      891016         366602         -58.86%
BenchmarkBytes/4KiB/EqualFilter4-4      891016         366730         -58.84%
BenchmarkBytes/4KiB/EqualFilter5-4      891144         366728         -58.85%
BenchmarkBytes/4KiB/DiffFilter0-4       629425         367149         -41.67%
BenchmarkBytes/4KiB/DiffFilter1-4       891738         367180         -58.82%
BenchmarkBytes/4KiB/DiffFilter2-4       891770         367244         -58.82%
BenchmarkBytes/4KiB/DiffFilter3-4       891834         367245         -58.82%
BenchmarkBytes/4KiB/DiffFilter4-4       891835         367372         -58.81%
BenchmarkBytes/4KiB/DiffFilter5-4       891965         367372         -58.81%
BenchmarkBytes/64KiB/EqualFilter0-4     10826734       6632425        -38.74%
BenchmarkBytes/64KiB/EqualFilter1-4     15021123       6632443        -55.85%
BenchmarkBytes/64KiB/EqualFilter2-4     15021174       6632502        -55.85%
BenchmarkBytes/64KiB/EqualFilter3-4     15021219       6632540        -55.85%
BenchmarkBytes/64KiB/EqualFilter4-4     15021219       6632640        -55.84%
BenchmarkBytes/64KiB/EqualFilter5-4     15021340       6632707        -55.84%
BenchmarkBytes/64KiB/DiffFilter0-4      10828312       6633995        -38.73%
BenchmarkBytes/64KiB/DiffFilter1-4      15022724       6634049        -55.84%
BenchmarkBytes/64KiB/DiffFilter2-4      15022763       6634091        -55.84%
BenchmarkBytes/64KiB/DiffFilter3-4      15022820       6634081        -55.84%
BenchmarkBytes/64KiB/DiffFilter4-4      15022843       6634241        -55.84%
BenchmarkBytes/64KiB/DiffFilter5-4      15022996       6634222        -55.84%
BenchmarkBytes/1MiB/EqualFilter0-4      173184064      106075104      -38.75%
BenchmarkBytes/1MiB/EqualFilter1-4      240293104      106075088      -55.86%
BenchmarkBytes/1MiB/EqualFilter2-4      240293232      106075344      -55.86%
BenchmarkBytes/1MiB/EqualFilter3-4      240293328      106075152      -55.86%
BenchmarkBytes/1MiB/EqualFilter4-4      240293008      106075504      -55.86%
BenchmarkBytes/1MiB/EqualFilter5-4      240293424      106075376      -55.86%
BenchmarkBytes/1MiB/DiffFilter0-4       173185752      106076648      -38.75%
BenchmarkBytes/1MiB/DiffFilter1-4       240294504      106076776      -55.86%
BenchmarkBytes/1MiB/DiffFilter2-4       240294600      106076936      -55.86%
BenchmarkBytes/1MiB/DiffFilter3-4       240294600      106076936      -55.86%
BenchmarkBytes/1MiB/DiffFilter4-4       240295016      106076968      -55.86%
BenchmarkBytes/1MiB/DiffFilter5-4       240294728      106076968      -55.86%
BenchmarkBytes/16MiB/EqualFilter0-4     2967917664     1894175792     -36.18%
BenchmarkBytes/16MiB/EqualFilter1-4     4041659536     1894175696     -53.13%
BenchmarkBytes/16MiB/EqualFilter2-4     4041659568     1894175856     -53.13%
BenchmarkBytes/16MiB/EqualFilter3-4     4041660016     1894175856     -53.13%
BenchmarkBytes/16MiB/EqualFilter4-4     4041659536     1894175984     -53.13%
BenchmarkBytes/16MiB/EqualFilter5-4     4041659760     1894175888     -53.13%
BenchmarkBytes/16MiB/DiffFilter0-4      2967919128     1894177352     -36.18%
BenchmarkBytes/16MiB/DiffFilter1-4      4041661320     1894177352     -53.13%
BenchmarkBytes/16MiB/DiffFilter2-4      4041661064     1894177448     -53.13%
BenchmarkBytes/16MiB/DiffFilter3-4      4041661128     1894177448     -53.13%
BenchmarkBytes/16MiB/DiffFilter4-4      4041661256     1894177576     -53.13%
BenchmarkBytes/16MiB/DiffFilter5-4      4041661256     1894177832     -53.13%
…ted options (#115)

A common scenario is:
  1. someone uses cmp.Diff on big.Int (transitively)
  2. they get a message that says uses Ignore/Allow Unexported options
  3. they see that AllowUnexported is hard to use correctly
  4. they use IgnoreUnexported
They end up with:
  cmpopts.IgnoreUnexported(big.Int{})
Which definitely doesn't do what's intended.

If we point out that a custom comparer is what they most likely need, then they are more likely to use cmp correctly
The previous implementation of the reporter simply listed all differences,
each qualified by the full path to the difference.
This method of reporting is exact, but difficult for humans to parse.
It is one of the more common sources of complaints by users and a significant
reason why cmp is not preferred over competing libraries.

This change reimplements the reporter to format the output as a
structured literal in pseudo-Go syntax. The output resembles literals that
the user would likely have in their test code. Differences between the
x and y values are denoted by a '-' or '+' prefix at the start of the line.

An overview of the new implementation is as follows:

* report.go: The defaultReporter type implements the Reporter interface.

* report_value: Through the PushStep/PopStep API, the defaultReporter is able
to contruct an in-memory valueNode tree representing the comparison of
x and y as cmp.Equal walks the sub-values.

* report_compare.go: After report_value.go constructs an AST-representation
of the compared values, report_compare.go formats the valueNode tree as a
textNode tree, which is the textual output in a tree form.
Some relevant design decisions include:
	* The format logic goes through effort to avoid printing ignored nodes.
	* Some number of surrounding equal (but not ignored) struct fields,
	slice elements, or map entries are printed for context.
	* cmp.Equal may declare two sub-reflect.Values to be equal, but are
	different values when printed. In order to present a unified view on
	this "equal" node, the logic formats both values and arbitrarily choses
	the one with the shorter string.
	* Transformed nodes are formatted with the pseudo-Go syntax of:
		Inverse(TransformerName, OutputType{...})
	where Inverse is some magical pseudo-function that inverts the
	transformation referred to by TransformerName. The OutputType literal
	is the output of the transformation.

* report_reflect.go: This contains logic to pretty-print reflect.Values and
is relied upon by report_compare.go to format the leaves of the tree.
Note that the leaves of the tree can be any arbitrary Go type and value
(including cyclic data structures).

* report_text.go: This contains logic for purely lexicographical formatting
and is depended upon by the other report_*.go files.

Advantages:
* The output is more familiar as it uses pseudo-Go syntax for literals
* It provides context about surrounding struct fields, slice elements, or
map entries that were equal
* Inserted and removed elements in a slice are easier to visualize
* Related diffs lie on the same indentation
* For diffs in a deeply nested value, the output is easier to visualize
than having a list of all the full paths to the diff.

Disadvantages:
* The implementation is drastically more complex.
* In most cases, the output is longer (though more sparse)
The Reporter option allows users to hook in their own custom reporters
to programatically interpret the diff structure.

The Reporter API uses a push/pop mechanism, which is strictly more powerful
than an API that only calls report on leaf nodes.
The Reporter.Report method takes in a Result type to provide flexibility in
what properties can be reported in the future since new properties can be added,
but new methods cannot be easily added to the reporter interface.
These helper options ignore slice elements or map entries based
on a user-provided predicate function. These are especially useful
for ignoring missing elements or entries.
Rather than representing each path step as an interface,
use concrete types instead. This provides some performance benefits as
it reduces the amount of virtual function calls and also provides the
ability for the compiler to inline method calls.

This is technically a breaking change, but since each of the path step
interfaces were explicitly implemented in such a way that they couldn't
be implemented directly (due to the presence of an unexported method),
the only way someone could have been depending on these as interfaces is
if they embedded the interface into another interface. Static analysis of
all code at Google and publicly available on GitHub shows that this is
not a problem.

The performance benefits of this change is significant:

benchmark                               old ns/op     new ns/op     delta
BenchmarkBytes/64KiB/EqualFilter0-4     80551394      46592605      -42.16%
BenchmarkBytes/64KiB/EqualFilter1-4     102922132     69974509      -32.01%
BenchmarkBytes/64KiB/EqualFilter2-4     159009935     94474812      -40.59%
BenchmarkBytes/64KiB/EqualFilter3-4     181231264     124601102     -31.25%
BenchmarkBytes/64KiB/EqualFilter4-4     189775228     148864070     -21.56%
BenchmarkBytes/64KiB/EqualFilter5-4     285065469     175198907     -38.54%

benchmark                               old MB/s     new MB/s     speedup
BenchmarkBytes/64KiB/EqualFilter0-4     1.63         2.81         1.72x
BenchmarkBytes/64KiB/EqualFilter1-4     1.27         1.87         1.47x
BenchmarkBytes/64KiB/EqualFilter2-4     0.82         1.39         1.70x
BenchmarkBytes/64KiB/EqualFilter3-4     0.72         1.05         1.46x
BenchmarkBytes/64KiB/EqualFilter4-4     0.69         0.88         1.28x
BenchmarkBytes/64KiB/EqualFilter5-4     0.46         0.75         1.63x

benchmark                               old allocs     new allocs     delta
BenchmarkBytes/64KiB/EqualFilter0-4     133            134            +0.75%
BenchmarkBytes/64KiB/EqualFilter1-4     134            134            +0.00%
BenchmarkBytes/64KiB/EqualFilter2-4     135            135            +0.00%
BenchmarkBytes/64KiB/EqualFilter3-4     135            135            +0.00%
BenchmarkBytes/64KiB/EqualFilter4-4     136            136            +0.00%
BenchmarkBytes/64KiB/EqualFilter5-4     136            136            +0.00%

benchmark                               old bytes     new bytes     delta
BenchmarkBytes/64KiB/EqualFilter0-4     6632417       6632523       +0.00%
BenchmarkBytes/64KiB/EqualFilter1-4     6632416       6632464       +0.00%
BenchmarkBytes/64KiB/EqualFilter2-4     6632464       6632507       +0.00%
BenchmarkBytes/64KiB/EqualFilter3-4     6632502       6632483       -0.00%
BenchmarkBytes/64KiB/EqualFilter4-4     6632652       6632668       +0.00%
BenchmarkBytes/64KiB/EqualFilter5-4     6632604       6632659       +0.00%
Lists of primitives are a common-enough data structure that
it is worth providing specialized diffing for.

This provides significantly better readability for strings
and byte slices. There is also a heuristic for detecting
what a string should be diffed as a multiline string.
This reverts a change introduced in commit 2940eda where cmp 
stopped calling the String method when printing map keys.

Fixes #141
I assume this is a typo, as the left and right side of the boolean-or were
identical.
- No need to wrap ToUpper & Fields
- Simpler concatenation
- Easier pointer access
Previously, the line pointlessly updated the lastLineIdx variable
since the subsequent line always assigned to it.
Change it to update maxLineLen, which was the original intention.
* Fix IsZero to properly report false for IsZero(-0.0) since
we define IsZero as whether it is equal to the zero memory value.
* Add note to isLess that we don't need to handle -0.0 since
we can't possibly have both keys present in the same map.
* Use sort.SliceStable in SortedKeys for deterministic output since
it is possible to have both -0.0 and +0.0 from two different maps.
The zero key from the left left map will be taken over the right map.
In the panic message when accessing an unexported field,
print the full name of the type for user convenience.
Add an Exporter option that accepts a function to determine
which struct types to permit access to unexported fields.
Treat this as a first-class option and implement AllowUnexported
in terms of the new Exporter option.

The new Exporter option:
* Better matches the existing style of top-level options
both by name (e.g., Comparer, Transformer, and Reporter)
and by API style (all accept a function).
* Is more flexible as it enables users to functionally
implement AllowAllUnexported by simply doing:
	Exporter(func(reflect.Type) bool { return true })

Fixes #40
Adjust coding style of EquateApproxTime to match EquateApprox.
The EquateErrors helper equates errors according to errors.Is.
We also declare a sentinel AnyError value that matches any
non-nil error value.

This adds a dependency on golang.org/x/xerrors so that we can
continue to suppport go1.8, which is our current minimally
supported version of Go.

Fixes #89
Previously, trying to call Equal on a graph would result in a stack-overflow
due to infinite recursion traversing cycles on a graph.
While a vast majority of Go values are trees or acyclic graphs, there exist
a small number of cases where graph equality is required.

As such, we add cycle detection to Equal and define what it means for two graphs
to be equal. Contrary to reflect.DeepEqual, which declares two graphs to be
equal so long any cycle were encountered, we require two graphs to have
equivalent graph structures.

Mathematically speaking, a graph G is a tuple (V, E) consisting of the set of
vertices and edges in that graph. Graphs G1 and G2 are equal if V1 == V2,
E1 == E2, and both have the same root vertex (entry point into the graph).
When traversing G1 and G2, we remember a stack of previously visited edges
ES1 and ES2. If the current edge e1 is in ES1 or e2 is in ES2, then we know that
a cycle exists. The graphs have the same structure when the previously
encountered edge ep1 and ep2 were encountered together.
Note that edges and vertices unreachable from the root vertex are ignored.

Appreciation goes to Eyal Posener (@posener), who proposed a different
(but semantically equivalent) approach in #79, which served as inspiration.

Fixes #74
The color of the shield banner is "Gopher Blue" as defined by
brand book referenced by the Go blog. See https://blog.golang.org/go-brand
dsnet and others added 27 commits May 24, 2021 17:33
Even with an optimal diffing algoritm, coalescing adjacent edit groups
may cause the corresponding pair of strings for an edit group to have
leading or trailing spans of equal elements.

While this is technically a correct representation of a diff,
it is a suboptimal outcome. As such, scan through all unequal groups and
move leading/trailing equal spans to the preceding/succeeding equal group.

Before this change:
    strings.Join({
      "org-4747474747474747,bucket-4242424242424242:m,tag1=a,tag2=aa",
  -   ",#=_value _value=2 ",
  +   " _value=2 ",
      `11 org-4747474747474747,bucket-4242424242424242:m,tag1=a,tag2=bb`,
  -   ",#=_value _value=2 2",
  +   " _value=2 2",
      `1  org-4747474747474747,bucket-4242424242424242:m,tag1=b,tag2=cc`,
  -   ",#=_value",
      ` _value=1 21 org-4747474747474747,bucket-4242424242424242:m,tag1`,
      "=a,tag2",
  -   "=dd,#=_value",
  +   "=dd",
      ` _value=3 31 org-4747474747474747,bucket-4242424242424242:m,tag1`,
  -   "=c,#=_value",
  +   "=c",
      ` _value=4 41 `,
    }, "")

After this change:
    strings.Join({
      "org-4747474747474747,bucket-4242424242424242:m,tag1=a,tag2=aa",
  -   ",#=_value",
      ` _value=2 11 org-4747474747474747,bucket-4242424242424242:m,tag1`,
      "=a,tag2=bb",
  -   ",#=_value",
      ` _value=2 21 org-4747474747474747,bucket-4242424242424242:m,tag1`,
      "=b,tag2=cc",
  -   ",#=_value",
      ` _value=1 21 org-4747474747474747,bucket-4242424242424242:m,tag1`,
      "=a,tag2=dd",
  -   ",#=_value",
      ` _value=3 31 org-4747474747474747,bucket-4242424242424242:m,tag1`,
      "=c",
  -   ",#=_value",
      ` _value=4 41 `,
    }, "")
Avoid diffing by lines if it turns out to be significantly less
efficient than diffing by bytes.

Before this change:
    (
    	"""
  - 	d5c14bdf6bac81c27afc5429500ed750
  - 	25483503b557c606dad4f144d27ae10b
  - 	90bdbcdbb6ea7156068e3dcfb7459244
  - 	978f480a6e3cced51e297fbff9a506b7
  + 	Xd5c14bdf6bac81c27afc5429500ed750
  + 	X25483503b557c606dad4f144d27ae10b
  + 	X90bdbcdbb6ea7156068e3dcfb7459244
  + 	X978f480a6e3cced51e297fbff9a506b7
    	"""
    )

After this change:
    strings.Join({
  + 	"X",
    	"d5c14bdf6bac81c27afc5429500ed750\n",
  + 	"X",
    	"25483503b557c606dad4f144d27ae10b\n",
  + 	"X",
    	"90bdbcdbb6ea7156068e3dcfb7459244\n",
  + 	"X",
    	"978f480a6e3cced51e297fbff9a506b7\n",
    }, "")
The previous heuristic of treating strings as binary data
if it contains any invalid UTF-8 was too strict.
Loosen the heuristic to check if most of the characters
are printable text.

Fixes #257
Address some minor issues flagged by staticcheck.
None of these affect the correctness of the package.
Rename the shadowed variable i to j for better readability.
There are two bugs being fixed:

1.  The hueristic for whether a slice of byte looks like text
    should check whether a rune IsPrint OR IsSpace, and not both.
    Only a single rune (i.e., U+0020) ever satisfies both conditions.

    Previously, it would print as:
        MyBytes{0x68, 0x65, 0x6c, 0x6c, 0x6f}
    and now it would now print as:
        MyBytes(MyBytes("hello"))

2.  If we're printing as string, then we should set skipType=true
    since we already explicitly format the value with the type.

    Previously, it would print as:
        MyBytes(MyBytes("hello"))
    and now it would now print as:
        MyBytes("hello")
Fix textual printing of byte slices
Some aggressive dependency checks flag the use of md5.
Switch to sha256 as it accomplishes the same purpose.
Co-authored-by: Damien Neil <neild@users.noreply.github.com>
The original threshold of 64 was chosen without much thought.
Lower it to 32 now that we have some concrete examples
that it is aesthetically better.

Co-authored-by: Damien Neil <neild@users.noreply.github.com>
Now that Go 1.11 is the minimally supported version,
we can drop some local hacks to work around bugs in
reflect that were present in Go1.9.
Starting with Go 1.17, //go:build lines are preferred over // +build
lines, see https://golang.org/doc/go1.17#build-lines and
https://golang.org/design/draft-gobuild for details.

This change was generated by running Go 1.17 go fmt ./... which
automatically adds //go:build lines based on the existing // +build
lines.

Also update the corresponding GitHub action to use Go 1.17 gofmt.
When printing a pointer, only elide the type for unnamed pointers.
Otherwise, we can run into situations where named and unnamed pointers
format the same way in indistinguishable ways.

When printing an interview, never skip the interface type.
Whether we skip printing the type should be determined by the
parent containers, and not locally determined.
For examples, interface values within a struct, slice, or map
will always be elided since they can be inferred.
If a slice of bytes is mostly text, format them as text
instead of as []byte literal with hexadecimal digits.

Avoid always printing the type. This is technically invalid Go code,
but is unnecessary in many cases since the type is inferred
from the parent concrete type.

Fixes #272
Versions older than Go 1.13 are no longer in use. Remove unnecessary
dependencies.
Now that Go 1.13 is the minimum version, we can use the reflect.Value.IsZero
method instead of our own internal/value.IsZero function.
Interestingly, our IsZero function pre-dates the IsZero method,
but fortunately has the exact same semantics, since both are targetting
semantics defined by the Go language specification.
This allows the GoDoc to take advantage of new markup syntax
introduced in Go 1.19. This does not require that our minimum
supported version be bumped to Go 1.19 since the pkgsite renders
our godoc regardless of supported Go version.
Co-authored-by: Damien Neil <neild@users.noreply.github.com>
The value.TypeString function is what the rest of the package uses
and is slightly cleaner than using reflect.Type.String.

Updates #305

Co-authored-by: Damien Neil <neild@users.noreply.github.com>
* Run tests on Go 1.19

* Format comment

Finish the rest of the work for #304

Co-authored-by: Damien Neil <neild@users.noreply.github.com>
If the string has many characters that require escape sequences to print,
then we need to take that into consideration and avoid byte-by-byte diffing.

Co-authored-by: Damien Neil <neild@users.noreply.github.com>
@google-cla
Copy link

google-cla bot commented Nov 15, 2022

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@dsnet
Copy link
Collaborator

dsnet commented Feb 22, 2023

I'm not sure what this is. Closing as it looks like an accidental PR. Feel free to send another cleaned up PR if this was intended to be a legitimate change.

@dsnet dsnet closed this Feb 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet