Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

misc: ignore some files during npm publish #6092

Merged
merged 13 commits into from Dec 17, 2021
Merged

misc: ignore some files during npm publish #6092

merged 13 commits into from Dec 17, 2021

Conversation

Josh-Cena
Copy link
Collaborator

@Josh-Cena Josh-Cena commented Dec 11, 2021

Motivation

This single change should reduce bundle size by 100KB / package. We can even be more aggressive and ignore more stuff in the src folder; we just need those actually required at runtime. For now I decided to not do much, considering that the TS components are used for swizzling. The files that certainly should be ignored:

  • copyUntypedFiles.js
  • .tsbuildinfo (and this file is large: just ignoring this is able to save 50KB)
  • tsconfig
  • __tests__

Have you read the Contributing Guidelines on pull requests?

Yes

@Josh-Cena Josh-Cena added the pr: maintenance This PR does not produce any behavior differences to end users when upgrading. label Dec 11, 2021
@netlify
Copy link

netlify bot commented Dec 11, 2021

✔️ [V2]

🔨 Explore the source changes: d7a5733

🔍 Inspect the deploy log: https://app.netlify.com/sites/docusaurus-2/deploys/61bc291201e3f3000822e7a6

😎 Browse the preview: https://deploy-preview-6092--docusaurus-2.netlify.app

@github-actions
Copy link

github-actions bot commented Dec 11, 2021

⚡️ Lighthouse report for the changes in this PR:

Category Score
🟠 Performance 69
🟢 Accessibility 98
🟢 Best practices 100
🟢 SEO 100
🟢 PWA 95

Lighthouse ran on https://deploy-preview-6092--docusaurus-2.netlify.app/

@github-actions
Copy link

github-actions bot commented Dec 11, 2021

Size Change: 0 B

Total Size: 652 kB

ℹ️ View Unchanged
Filename Size
website/.docusaurus/globalData.json 37.8 kB
website/build/assets/css/styles.********.css 101 kB
website/build/assets/js/main.********.js 484 kB
website/build/index.html 29.3 kB

compressed-size-action

@Josh-Cena Josh-Cena marked this pull request as ready for review December 11, 2021 04:40
@facebook-github-bot facebook-github-bot added the CLA Signed Signed Facebook CLA label Dec 11, 2021
@Josh-Cena
Copy link
Collaborator Author

Josh-Cena commented Dec 11, 2021

Making another action to monitor the package size probably doesn't make a lot of sense because we are a node app anyways and package sizes aren't critical, but I still did two local publishes to compare the bundle sizes:

image

For future references, here are the scripts:

Main script
CUSTOM_REGISTRY_URL="http://localhost:4873"
NEW_VERSION="$(node -p "require('./packages/docusaurus/package.json').version").NEW"
CONTAINER_NAME="verdaccio"

publish () {
  docker run -d --rm --name "$CONTAINER_NAME" -p 4873:4873 -v "$PWD/admin/verdaccio.yaml":/verdaccio/conf/config.yaml verdaccio/verdaccio:latest
  yarn build:packages
  npx --no-install lerna publish --exact --yes --no-verify-access --no-git-reset --no-git-tag-version --no-push --registry "$CUSTOM_REGISTRY_URL" "$NEW_VERSION" > "$1" 2> "$1"
  git diff --name-only -- '*.json' | sed 's, ,\\&,g' | xargs git checkout --
  docker container stop $CONTAINER_NAME > /dev/null
}

CUR_BRANCH=$(git symbolic-ref --short HEAD)
publish publish-new.log

git checkout main
publish publish-main.log

python3 visualize.py
git checkout $CUR_BRANCH
Parse log & visualization
import re
import matplotlib.pyplot as plt
import numpy as np

def getSizes(filename):
    file = open(filename, 'r')
    lines = file.readlines()
    names = []
    unpacked = []
    for line in lines:
        if ' name: ' in line:
            names.append(re.compile('lerna notice name: *([a-z@/-]+)').findall(line)[0])
        elif 'unpacked size:' in line:
            unpacked.append(float(re.compile('lerna notice unpacked size: *([0-9.]+)').findall(line)[0]))
    sort = sorted(zip(names, unpacked))
    return [*zip(*sort)]

names, main_unpacked = getSizes('publish-main.log')
names, new_unpacked = getSizes('publish-new.log')

plt.barh(np.arange(len(names)) * 3, main_unpacked, 1, label='main')
plt.barh(np.arange(len(names)) * 3 + 1, new_unpacked, 1, label='jc/npmignore')
plt.yticks(np.arange(len(names)) * 3 + 0.5, names)
plt.xlabel('Unpacked size (kB)')
plt.legend()
plt.savefig('file.pdf', bbox_inches='tight')

Copy link
Contributor

@lex111 lex111 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great idea! Let's also ignore directory with source code in following packages. Apparently it should be perfectly safe?

docusaurus
docusaurus-mdx-loader
docusaurus-plugin-client-redirects
docusaurus-plugin-content-blog
docusaurus-plugin-content-docs
docusaurus-plugin-content-pages
docusaurus-plugin-google-analytics
docusaurus-plugin-google-gtag
docusaurus-plugin-sitemap
docusaurus-preset-classic

@Josh-Cena
Copy link
Collaborator Author

Josh-Cena commented Dec 11, 2021

Mmm, some of them contain the type declaration files, and these declarations in turn import from the src files, so refactoring them is going to be a little risky. I will err on the safe side and only do refactors later. The safer ones (utils, core, etc.) already have their source files ignored.

@Josh-Cena Josh-Cena changed the title misc: add npmignore for some files misc: ignore some files during npm publish Dec 11, 2021
@lex111
Copy link
Contributor

lex111 commented Dec 11, 2021

We can ignore all files in source directory except for certain files, in our case it should work for docs plugin:

src/*
!src/plugin-content-docs.d.ts

@Josh-Cena
Copy link
Collaborator Author

Done! I was a bit worried about the safety of doing so (accidentally not publishing something in the future) but for now it seems safe

@Josh-Cena
Copy link
Collaborator Author

image

Yep, this is significant 👀

@merceyz
Copy link
Contributor

merceyz commented Dec 13, 2021

Alternatively you can use the package.json files field to create an array of file patterns to include instead of files to exclude.
https://yarnpkg.com/configuration/manifest#files

@Josh-Cena
Copy link
Collaborator Author

Our exclude pattern is quite complex, so we'd rather opt-in for ignore rather than opt-out. (cf #6087 for a publish mistake) I originally didn't even plan to exclude the TS source files.

Copy link
Collaborator

@slorber slorber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Options is not the same as PluginOptions

Also, how can we prevent bad publishes to happen if we miss a single exclusion? I'd rather not exclude src by default, and handle exception manually, but rather the opposite: only exclude files that you are 100% sure they'll never be needed. Unless we have a good way to prevent bad publishes to happen.

I personally find it nice to be able to easily inspect sources of any node dependency, without having to open an uglified dist folder or a GitHub link (probably not a good enough reason to include src 🤷‍♂️). Many libs still publish src, it's worth doing a bit of research on this.

@@ -7,7 +7,7 @@

import fs from 'fs-extra';
import path from 'path';
import {PluginOptions} from './types';
import type {Options} from '@docusaurus/plugin-sitemap';
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the same type on purpose 🤪 (also a good reason why one PR = one change is a good idea)

  • Options is the partial type that the users are using in configuration, documented with JSDoc annotations (somehow it's TS public API surface)
  • PluginOptions is the complete normalized type with defaults applied

Users should use Options in config

Internally, we should use PluginOptions because we know that normalization was already applied in docusaurus core and it's the actual type that gets injected in the plugin

I know that the naming convention is not very explicit, not sure how to find better names.

Options looks to be better for public API surface as it's simpler to use convention in JSDoc annotations without reading any doc.

Does it make sense?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes I already made have taken that into account. These options are never normalized and are fully partial throughout.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see

Still it might be useful to keep a convention of having 2 distinct types in each plugin so that they all look the same, even if the 2 types are the same in this case. If one day we refactor and add some option with a default value to this plugin, we'll have to revert this change, while we could just update the normalized type otherwise. Type aliases are also useful to convey meaning type PluginOptions = Options 🤪

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And in this case it's not 100% true either. These options are normalized, all values have defaults, and the type that the plugin constructor should receive is not partial, but a complete type with all values being set:

image

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well—that means the types we had previously were problematic 🙈 If you look at the deleted types file, the values are still optional.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes that might totally be the case 😄 PluginOptions should have non-optional types

src/**
!src/plugin-content-blog.d.ts
!src/types.ts
!src/blogFrontMatter.ts
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the cases we should protect ourselves against

Just adding a little import can break a release for TS users, and we wouldn't notice before publish that this happened.

image

Is there a way that we can protect ourselves from this kind of issue before publishing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, next step would be to refactor this and get rid of any import('foo.ts') in .d.ts files and reverse that to importing from declaration files. But since we agreed that src should be published this is not much a problem

@slorber
Copy link
Collaborator

slorber commented Dec 15, 2021

Asked here about publishing src or not: https://twitter.com/sebastienlorber/status/1471156386050592778

Seems there's a usecase for using sourcemaps in Node.js + TS

Otherwise, there's no strong consensus so far. I'm ok to not publish sources if we find a sustainable way to manage exclusions and prevent mistakes.

As a first step we could merge asap the exclusion of everything else, and figure out about src in another PR?

@slorber
Copy link
Collaborator

slorber commented Dec 15, 2021

Maybe we should keep sources for now, and test publishing source/declaration maps?

We'll see then if those features are useful or not in practice, or if it's worth optimizing the package size even more?

@Josh-Cena
Copy link
Collaborator Author

Josh-Cena commented Dec 15, 2021

@slorber So my ideas are:

  • If we publish source, we should also publish source maps;
  • We shouldn't publish src for the plugins / themes since they never expose APIs;
  • We should publish src for the utility packages that will be imported;
  • We should invert the way we import the types and make sure the declaration files are always working on their own so that all the .ts files can be safely npm ignored.

@slorber
Copy link
Collaborator

slorber commented Dec 16, 2021

If we publish source, we should also publish source maps;

Yes, that seems like a nice thing to do (+ declaration maps?)

We shouldn't publish src for the plugins / themes since they never expose APIs;

I don't understand, you said the opposite in the line just before? 😅

src/theme should rather always be published for swizzle + if we have some typedefs importing src files it will also break. So in current state it doesn't look safe to me

We should publish src for the utility packages that will be imported;

Do we really need to make any distinction? It seems that it can be useful to publish sources all the time, for all packages, with source/declaration maps + ability to read/inspect the code more easily + swizzle

To me, it does not look worth it to optimize further and publish src partially according to complex rules: we probably have much lower handing fruits in our dependencies.

Also, there is much more impactful research we can do to make Docusaurus significantly faster to install. Next.js is pre-compiling its dependencies as single files with ncc: https://github.com/vercel/ncc (ex with Babel: vercel/next.js#18768)

For these reasons, I think including/excluding src is not really time-worthy to discuss, at least not atm. It is really a micro-optimization with a very limited impact.

I'd only focus on the obvious easy changes to exclude in this PR

We should invert the way we import the types and make sure the declaration files are always working on their own so that all the .ts files can be safely npm ignored.

Agree that there is remaining cleanup to be done. Exposed types should probably be .d.ts and we should find a way to prevent type imports from .ts

@Josh-Cena
Copy link
Collaborator Author

Josh-Cena commented Dec 17, 2021

I don't understand, you said the opposite in the line just before? 😅

The previous sentence was a logical implication: if we publish the source, then we publish the source maps :D However publishing source isn't really worthwhile for things that would never be imported.

I would agree to not bikeshed on whether src should be ignored and just include it for now, but let's not pretend that the line is unclear. There are three types of packages:

  1. Utility packages like theme-common and utils that exist to be imported.
  2. Plugins and themes that will only be required by the core. They are never imported in other codes, and if they ever need to be, it means we should refactor those methods into utils instead.
  3. CLI packages like migration, create-docusaurus and core which don't export anything (core does but that should be refactored) and should only be called as binaries.

Now sources should definitely be published for (1), it could be for (2), but doesn't make much sense for (3) and just makes installation slower when running npx. Furthermore, the core already has src ignored and I probably won't change that.

@Josh-Cena
Copy link
Collaborator Author

Still looking good despite generating source maps. Removing test files and tsbuildinfo certainly helps:

image

@slorber
Copy link
Collaborator

slorber commented Dec 17, 2021

👍 that's already a decent win thanks

For themes/plugins, I believe there's value to publish sources:

  • easily inspect sources
  • it may be imported once we support an API to extend existing plugins
  • it may need to contain sources to implement the docusaurus theme diff feature I suggested in RFC: Improve theming DX (swizzle) #6114)

@slorber slorber merged commit 77c93cb into main Dec 17, 2021
@slorber slorber deleted the jc/npmignore branch December 17, 2021 10:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed Signed Facebook CLA pr: maintenance This PR does not produce any behavior differences to end users when upgrading.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants