Optimize docker image #459

Open
mojavelinux opened this issue May 17, 2019 · 7 comments

Comments

@mojavelinux
Contributor

The Docker image that's being published for this repository is severely fragmented. As a result, it takes much longer to download than it should and consumes a lot of extra disk space. I count nearly 30 layers, which together occupy close to 2GB. It really should be just one layer. Please optimize the image by consolidating the RUN commands or by using the --squash flag when building.
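To see where those layers come from: Docker's best-practices guide states that only the RUN, COPY, and ADD instructions create filesystem layers; other instructions do not increase the size of the image. A minimal sketch, assuming a plain Dockerfile that uses only backslash line continuations and `#` comments (no heredocs), that counts the layers a Dockerfile adds on top of its base image. The helper names `logical_lines` and `count_layer_instructions` are hypothetical, and `FROM ubuntu` is a placeholder base image.

```python
# Per Docker's best-practices guide, only these instructions add filesystem layers.
LAYER_INSTRUCTIONS = {"RUN", "COPY", "ADD"}


def logical_lines(dockerfile_text: str) -> list[str]:
    """Join backslash-continued lines; skip blank lines and # comments."""
    lines: list[str] = []
    current = ""
    for raw in dockerfile_text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if stripped.endswith("\\"):
            # Continuation: keep accumulating until a line without a trailing backslash.
            current += stripped[:-1] + " "
        else:
            lines.append(current + stripped)
            current = ""
    if current.strip():
        lines.append(current.strip())
    return lines


def count_layer_instructions(dockerfile_text: str) -> int:
    """Count instructions that create a filesystem layer (hypothetical helper)."""
    return sum(
        1
        for line in logical_lines(dockerfile_text)
        if line.split(maxsplit=1)[0].upper() in LAYER_INSTRUCTIONS
    )


if __name__ == "__main__":
    separate = (
        "FROM ubuntu\n"
        "RUN useradd -d /home/seleuser -m seleuser\n"
        "RUN chown -R seleuser /home/seleuser\n"
        "RUN chgrp -R seleuser /home/seleuser\n"
    )
    combined = (
        "FROM ubuntu\n"
        "RUN useradd -d /home/seleuser -m seleuser && \\\n"
        "    chown -R seleuser /home/seleuser && \\\n"
        "    chgrp -R seleuser /home/seleuser\n"
    )
    print(count_layer_instructions(separate))  # 3
    print(count_layer_instructions(combined))  # 1
```

The same three commands cost three layers as separate RUN instructions and one layer when chained, which is the effect of consolidating RUN commands.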

Also, what's going on with the versions? Here are the versions I see on DockerHub:

v0.9
v1.1
v1.1.0
v1.2.0
v1.3
v1.3.1

Can we stick to a consistent pattern? And how was v0.9 published after v1.2.0?

Even more concerning, v1.3.1 is absolutely not a patch release. It changed the version of Python from 2 to 3 and added 200MB to the size of the image. Can we follow semantic versioning?

I expect better software practices from Algolia than this. Let's work towards that.

@s-pace
Contributor

s-pace commented Jun 10, 2019

Hi @mojavelinux
Thank you for pointing out the opportunity to optimize the image build! We will add the --squash flag during the build process.

The versioning and release process definitely deserve better attention on this project. So far, our main focus is on offering DocSearch configurations and integrations to communities, so that they can benefit from Algolia’s enhanced search experience on the documentation of their websites, for free.

Regarding the versioning scheme: every version should have a “patch” value, even if it is 0, but on some tags it was wrongly trimmed off. Several third-party users rely on those tags, so we cannot remove them. Sorry for the inconvenience.
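Restoring a trimmed patch value is mechanical: pad `vX.Y` to `vX.Y.0`. A minimal sketch, assuming tags are only `vX.Y` or `vX.Y.Z` with numeric parts (no pre-release or build suffixes); `parse_tag` and `normalize_tag` are hypothetical helpers. Sorting by the parsed tuple orders tags by version rather than by publish date, so v0.9 sorts first even though it was published after 1.2.0.

```python
def parse_tag(tag: str) -> tuple[int, int, int]:
    """Parse 'vX.Y' or 'vX.Y.Z' into (X, Y, Z), padding a missing patch with 0."""
    parts = tag.lstrip("v").split(".")
    if len(parts) not in (2, 3) or not all(p.isdigit() for p in parts):
        raise ValueError(f"unsupported tag: {tag!r}")
    parts += ["0"] * (3 - len(parts))
    return int(parts[0]), int(parts[1]), int(parts[2])


def normalize_tag(tag: str) -> str:
    """Return the tag in full vX.Y.Z form, e.g. 'v1.3' -> 'v1.3.0'."""
    major, minor, patch = parse_tag(tag)
    return f"v{major}.{minor}.{patch}"


if __name__ == "__main__":
    # Tags as listed on Docker Hub in the issue.
    tags = ["v0.9", "v1.1", "v1.1.0", "v1.2.0", "v1.3", "v1.3.1"]
    for tag in sorted(tags, key=parse_tag):
        print(f"{tag} -> {normalize_tag(tag)}")
```

Note that `v1.1` and `v1.1.0` normalize to the same name, so the trimmed and untrimmed tags for that release refer to one version.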

About version 0.9: yes, it was published after 1.2.0, because we wanted to let users try the first proof of concept (POC) we made.

Version 1.3.1 does not change the way the scraper explores sites, nor does it change the format of the configurations. Arguably, switching to Python 3 could have deserved more than a patch number bump. Please accept our apologies if it broke your custom integration of the scraper.

Thank you again for your candid feedback!

@mojavelinux
Contributor Author

mojavelinux commented Jun 12, 2019

> We will add the --squash flag during the build process.

Excellent!

> our main focus is on offering DocSearch configurations and integrations to communities

With all due respect, that's not a good excuse for not following good software practices, versioning in particular. You're a very high-profile company in tech, and it's important that your practices demonstrate the quality of your work. Plus, it's a good way to be a role model.

I'm not going to dwell on past releases. What's done is done. The important thing is that versions are consistent from here on out.

> Arguably, switching to Python 3 could have deserved more than a patch number bump.

It's not "arguably." Those are the rules of semantic versioning. It doesn't matter how much change there is. A breaking change warrants a major version bump (or, at the absolute minimum, a minor version bump). It's well understood that changing the major version of the runtime constitutes a breaking change.

We can't and shouldn't rationalize about semantic versioning. There are rules, and those rules need to be followed, or else you aren't doing semantic versioning. And if you aren't doing semantic versioning, you need to be clear about that, because that's the assumption the industry makes when the version scheme is X.Y.Z. (Otherwise, you break a lot of builds.)
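The bump rule described above is mechanical enough to write down. A minimal sketch following the Semantic Versioning 2.0.0 specification for releases at 1.0.0 or later (the spec treats 0.y.z as initial development, where anything may change); the `Change` categories and the `next_version` helper are hypothetical names, and the input is assumed to be a full `vX.Y.Z` version.

```python
from enum import Enum


class Change(Enum):
    BREAKING = "breaking"  # incompatible change, e.g. a new major runtime version
    FEATURE = "feature"    # backward-compatible new functionality
    FIX = "fix"            # backward-compatible bug fix


def next_version(version: str, change: Change) -> str:
    """Return the next SemVer version for a 'vX.Y.Z' release >= 1.0.0."""
    major, minor, patch = (int(p) for p in version.lstrip("v").split("."))
    if change is Change.BREAKING:
        return f"v{major + 1}.0.0"  # major bump resets minor and patch
    if change is Change.FEATURE:
        return f"v{major}.{minor + 1}.0"  # minor bump resets patch
    return f"v{major}.{minor}.{patch + 1}"


if __name__ == "__main__":
    # Under these rules, moving the runtime from Python 2 to 3 is a breaking change.
    print(next_version("v1.3.0", Change.BREAKING))  # v2.0.0
    print(next_version("v1.3.0", Change.FIX))       # v1.3.1
```

The size of the diff never enters the function; only the kind of change does, which is the point being made about the Python 3 switch.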

> Please accept our apologies if it broke your custom integration of the scraper.

No need to apologize as long as I have convinced you to commit to semantic versioning so that upgrades can be smooth and transparent in the future. Trust me when I say it will make everyone happy (and that's hard to do these days).

@s-pace
Contributor

s-pace commented Aug 27, 2019

Quick follow-up regarding the --squash flag.

It seems that it is still an experimental feature. We will wait for this feature to become stable before using it.

@mojavelinux
Contributor Author

I have since discovered the same limitation. It sounds so promising.

One way to cut down on the layers is to consolidate some of the commands. Instead of many separate RUN commands, combine the commands into a single RUN using &&.

For example, this:

```dockerfile
RUN useradd -d /home/seleuser -m seleuser
RUN chown -R seleuser /home/seleuser
RUN chgrp -R seleuser /home/seleuser
```

Would become:

```dockerfile
RUN useradd -d /home/seleuser -m seleuser && \
    chown -R seleuser /home/seleuser && \
    chgrp -R seleuser /home/seleuser
```

Several of the other commands are already written that way, so I recommend consolidating the rest where it makes sense.
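The consolidation described above can also be applied mechanically. A minimal sketch, assuming each RUN instruction fits on one line (no existing backslash continuations) and that every command should be chained with &&, so the merged step still stops at the first failing command; `merge_run_instructions` is a hypothetical helper. One caveat: each RUN starts a fresh shell, so merging can change behavior when a command relies on that, for example a `cd` or a shell variable that previously did not carry over to the next RUN.

```python
def merge_run_instructions(lines: list[str]) -> list[str]:
    """Collapse each run of consecutive single-line RUN instructions into one RUN."""
    merged: list[str] = []
    pending: list[str] = []  # commands collected from the current run of RUN lines

    def flush() -> None:
        # Emit the collected commands as one RUN, chained with && and continuations.
        if pending:
            merged.append("RUN " + " && \\\n    ".join(pending))
            pending.clear()

    for line in lines:
        if line.startswith("RUN "):
            pending.append(line[len("RUN "):].strip())
        else:
            flush()
            merged.append(line)
    flush()
    return merged


if __name__ == "__main__":
    original = [
        "RUN useradd -d /home/seleuser -m seleuser",
        "RUN chown -R seleuser /home/seleuser",
        "RUN chgrp -R seleuser /home/seleuser",
    ]
    print("\n".join(merge_run_instructions(original)))
```

Run on the three RUN lines from the example in this comment, it prints the same combined instruction, with non-RUN instructions left in place as boundaries between merged groups.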

@JOHNMDAY-CREATE

Yeah, I'm just working with what I've got: a broken phone with a mind of its own. Sorry about that. And thanks for the input; I will try to use what all of you have told me.
Again, thanks for the input.

@JOHNMDAY-CREATE

And do mind what very high-profile company that is. Maybe that's why they told me to come here.

@JOHNMDAY-CREATE

Yeah, I'm working on it; just give me time. Even though I don't know which company I own, I still have the vision that I had when we started. I know it sounds crazy that I don't know what company I started, but I started like 14, and I thought I ran them into the ground.


3 participants