Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make packages reproducible #543

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

jodersky
Copy link
Contributor

@jodersky jodersky commented Jul 3, 2017

Don't include time information in jars in order to make them
reproducible.

This allows builds to be verified with hash sums.

Also see https://reproducible-builds.org/

@cvogt
Copy link
Owner

cvogt commented Jul 3, 2017

Nice!! A comment in the code linking to that URL may be nice. Will do it later when I am at a computer or if you want to be quicker ;).

@jodersky
Copy link
Contributor Author

jodersky commented Jul 3, 2017

Will do! Since this is something that can easily be forgotten, I was also thinking of adding a test to CI to make sure that packages are reproducible.

@cvogt
Copy link
Owner

cvogt commented Jul 3, 2017 via email

Don't include time information in jars in order to make them
reproducible.

This allows builds to be verified with hash sums.

Also see https://reproducible-builds.org/
@jodersky
Copy link
Contributor Author

jodersky commented Jul 4, 2017

Added a comment about the website and a test

override def run = {
val pass1 = super.`package`.map(f => hash(f.toPath))
lib.clean(cleanFiles, true, false, false, false)
transientCache.clear()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cache invalidation is required here, or else the second pass will fail due to missing files. Should this be moved to the clean helper method in lib?

Also, I'm not sure if I have the same concept of what a clean operation should do, but IMO there should not be the need to prompt the user for a confirmation since we're only deleting generated files.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would probably be safer if we don't try to clear caches from within a single cbt run.

What if in test.scala we call cbt package, then read the info into memory. Call cbt clean, then cbt package, check that the file modified date is different but the jar meta data is still the same.

clean requiring confirmation is less about not trusting the user having called the right command, more about not trusting cbt as a fast moving code base to not silently break behavior here at some point. Once we get to the point where we have solid, in-memory filesystem test for this, we can remove the check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cvogt, should the checksum be generated and checked in the master test file then?

@jodersky
Copy link
Contributor Author

jodersky commented Jul 4, 2017

Linking to #502 as it is related

@jvican
Copy link

jvican commented Jul 28, 2017

Have you tested this does not affect incremental compilation @jodersky ?

@jodersky
Copy link
Contributor Author

jodersky commented Jul 28, 2017

Sorry for not addressing your feedback earlier, I'll try to get some time this weekend.

Regarding the incremental compilation, I haven't tested this, but I don't think that the compiler would change timestamps within packaged jars. If I install a pre-packaged jar in a read-only location (/usr/share/...), would that mean that incremental compilation of depending sources would break?

@jvican
Copy link

jvican commented Jul 28, 2017

@jodersky To the best of my understanding, the JDK does indeed cache jars using that timestamp information. What I don't actually know is which information it uses. It might be the metadata of the whole jar, or of every jar entry.

@cvogt
Copy link
Owner

cvogt commented Jul 28, 2017 via email

@jvican
Copy link

jvican commented Jul 28, 2017

Classloading in general. Not saying it is an issue. It may be. scala/bug#10295

@jodersky
Copy link
Contributor Author

jodersky commented Jul 28, 2017

Hmm, I'm not quite sure how incremental compilation works here. Does it mean that to check if a source needs recompilation, it checks the last modification date of its dependencies and compares it with a local cache? In case it doesn't use any checksum it should be fairly easy to spoof; I'll give it a shot.

The changes proposed here do similar things as this maven plugin https://zlika.github.io/reproducible-build-maven-plugin/ (and also what happens when building a java debian package)

@cvogt
Copy link
Owner

cvogt commented Jul 28, 2017 via email

@jodersky
Copy link
Contributor Author

jodersky commented Jul 28, 2017

That's my understanding too, at least from briefly going over the linked issue.

In that case, how big of deal would this be? Usually, local dependencies use classfiles directly rather than the ones packaged in a jar (this is the export setting in sbt IIRC)

@jvican
Copy link

jvican commented Jul 28, 2017

It doesn't look like something related to Scalac only:

My analysis of the internal caching in java.util.File was incorrect. Even prior to Java 9, the last modified timestamp is used as part of the cache key.

This is the JDK caching jars that are classloaded.

In any case, this could be an intricate use case that may not affect this particular change. We also wanted to experiment with this: sbt/io#58. So please let me know if you find any issue in Zinc. I'll fix it.

@jvican
Copy link

jvican commented Jul 28, 2017

Does it mean that to check if a source needs recompilation, it checks the last modification date of its dependencies and compares it with a local cache?

This doesn't happen for jars. Jars are sha1'd and then compared. See https://github.com/sbt/zinc/blob/d4d29e8ffeecfa7c6a8e834b4ee54c4bfda9af63/zinc/src/main/scala/sbt/internal/inc/MixedAnalyzingCompiler.scala#L184-L186. But please, do double-check in case I'm wrong.

@jodersky
Copy link
Contributor Author

thanks for looking into this @jvican. I assume we can move forward with this change then?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants