cksum: Reverse-engineering implicit behavior of text/binary tag/untagged #6364

Open
BenWiederhake opened this issue May 6, 2024 · 1 comment

BenWiederhake commented May 6, 2024

cksum has some really weird and funky implicit tags going on, see #6256

So let's figure out what exactly cksum is doing.

$ ../gnu/src/cksum -a md5 --tag README.md # This is the tagged format:
MD5 (README.md) = add2d697731ef0facc3a56207aa03a9b
$ ../gnu/src/cksum -a md5 README.md # tagged by default:
MD5 (README.md) = add2d697731ef0facc3a56207aa03a9b
$ ../gnu/src/cksum -a md5 --text README.md # implicitly tagged + text is a problem:
../gnu/src/cksum: --text mode is only supported with --untagged
Try '../gnu/src/cksum --help' for more information.
[$? = 1]
$ ../gnu/src/cksum -a md5 --text --tag README.md # explicitly tagged + text is not a problem?!
MD5 (README.md) = add2d697731ef0facc3a56207aa03a9b

So yes, something funny is going on. Let's just brute-force all 1024 + 256 + 64 + 16 + 4 + 1 = 1365 possible sequences of zero to five arguments (each one of --binary, --text, --tag, --untagged), and visualize the behavior as a graph:
[figure: general_nondet_graph]

(legend: edges are marked b/t/T/U for binary/text/Tag/Untagged, and vertices are the observed behavior: E/T/A/S for Error/Tagged/UntaggedSpace/UntaggedAsterisk)
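
For anyone who wants to reproduce the search, here is a minimal sketch of how the enumeration and the E/T/A/S classification could look. It is not the script that produced the graphs. The helper names (`all_sequences`, `classify`, `observe`) are made up for illustration, and it assumes that untagged lines have the usual `<digest> <marker><filename>` shape, where the marker is a space or an asterisk.

```python
import itertools
import subprocess

# Edge labels from the legend, mapped to the real cksum flags.
FLAGS = {"b": "--binary", "t": "--text", "T": "--tag", "U": "--untagged"}


def all_sequences(alphabet="btTU", max_len=5):
    """Yield every flag sequence of length 0..max_len as a string like 'TbU'."""
    for n in range(max_len + 1):
        for combo in itertools.product(alphabet, repeat=n):
            yield "".join(combo)


def classify(returncode, stdout):
    """Map one cksum result to E/T/A/S as in the legend."""
    if returncode != 0 or not stdout:
        return "E"
    line = stdout.splitlines()[0]
    if line.startswith("MD5 ("):
        return "T"
    # Assumed untagged shape: '<digest> <marker><filename>', marker is ' ' or '*'.
    _digest, _, rest = line.partition(" ")
    return "A" if rest.startswith("*") else "S"


def observe(seq, cksum="../gnu/src/cksum", filename="README.md"):
    """Run cksum with the flags named by seq and classify the outcome."""
    args = [cksum, "-a", "md5", *(FLAGS[c] for c in seq), filename]
    proc = subprocess.run(args, capture_output=True, text=True)
    return classify(proc.returncode, proc.stdout)
```

Passing a smaller alphabet, for example `all_sequences("btT")`, gives a restricted search with one flag left out.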

First, observe that -b/-t seem to do precisely what we would hope for: toggle between binary/text mode. Good!

Next, observe that --tag/--untagged seem to be the flags that have the weird behavior attached to them. In particular, the T state seems to be more than one actual state, probably differentiated along the "text-binary-axis".

Removing --untagged from the brute-force search reveals that --tag always pulls the state in the binary direction:
[figure: nountagged_nondet_graph]

Removing --binary from the brute-force search reveals that --untagged always pulls the state away from E (so in a binary-ish direction), but A is unreachable ("Asterisk", which indicates a binary file in the untagged format):
[figure: nobinary_nondet_graph]

Hypothesis: There are three steps along the "text-binary-axis": always-binary, always-text, and binary-ish. For simplicity, let's assume the same thing along the tagged-ness-axis.

By the previous observations, --tag implies either always-binary or binary-ish. (Probably "binary-ish".)

Ending in bU does not determine the result:

  • bU outputs A
  • TbU outputs S
  • UbU outputs A
  • bUbU outputs A
  • TUbU outputs A
  • UTbU outputs S
  • Therefore, U does not set the binary-ness to a constant; its effect depends on the current tagged-ness. Huh?
  • Assuming that we start in "tagged-ish" and that T/U set "always-tagged"/"always-untagged", this means: when the current state is "tagged-ish" or "always-untagged", U does not touch the binary-ness, but when the current state is "always-tagged", U resets the binary-ness to "binary-ish". What a surprising decision! (It probably made sense at the time it was written, and is probably also why it is no longer listed in --help.)

… and that finally predicts the correct behavior without any exceptions, hooray!
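
To make the hypothesis concrete, here is a small sketch of the two-axis state machine described above. It is my reading of the rules in this comment, not a copy of the linked check_model.py. The names `Mode`, `Tag` and `predict` are made up, and two points are assumptions: that T always resets the binary-ness to "binary-ish" (the text only says "probably"), and that untagged output uses the asterisk only in the "always-binary" state.

```python
from enum import Enum


class Mode(Enum):
    BINARY_ISH = "binary-ish"
    ALWAYS_BINARY = "always-binary"
    ALWAYS_TEXT = "always-text"


class Tag(Enum):
    TAGGED_ISH = "tagged-ish"
    ALWAYS_TAGGED = "always-tagged"
    ALWAYS_UNTAGGED = "always-untagged"


def predict(seq):
    """Predict E/T/A/S for a flag sequence such as 'TbU' (b/t/T/U as in the legend)."""
    mode, tag = Mode.BINARY_ISH, Tag.TAGGED_ISH  # assumed start state
    for flag in seq:
        if flag == "b":
            mode = Mode.ALWAYS_BINARY
        elif flag == "t":
            mode = Mode.ALWAYS_TEXT
        elif flag == "T":
            tag = Tag.ALWAYS_TAGGED
            mode = Mode.BINARY_ISH  # assumption: --tag always resets to binary-ish
        elif flag == "U":
            # Only coming from "always-tagged" does --untagged touch the binary-ness.
            if tag is Tag.ALWAYS_TAGGED:
                mode = Mode.BINARY_ISH
            tag = Tag.ALWAYS_UNTAGGED
        else:
            raise ValueError(f"unknown flag {flag!r}")
    if tag is Tag.ALWAYS_UNTAGGED:
        # Asterisk only for always-binary; binary-ish and text both print a space.
        return "A" if mode is Mode.ALWAYS_BINARY else "S"
    # Tagged output cannot express text mode, hence the error.
    return "E" if mode is Mode.ALWAYS_TEXT else "T"
```

This sketch agrees with the four shell examples at the top and with the six bU sequences listed above; it has not been checked against the full brute-force data.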

A simple piece of logic, but so much pain.

End result: https://github.com/BenWiederhake/worsethanfailure_cksum/blob/master/check_model.py#L19

@sylvestre

ah fun, exactly what I was working on yesterday. :)
you will make life significantly easier
