Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

canvas should not be a dependency if possible #15652

Closed
tamuratak opened this issue Oct 31, 2022 · 14 comments · Fixed by #15655
Closed

canvas should not be a dependency if possible #15652

tamuratak opened this issue Oct 31, 2022 · 14 comments · Fixed by #15655
Labels

Comments

@tamuratak
Copy link
Contributor

tamuratak commented Oct 31, 2022

canvas has been added to the dependency, not devDependency, since v3.0.279. I don't think it is a good idea. Because node-canvas prebuilds aren't available for ARM, users with macOS on ARM will request helps from you, which would increase the maintenance burden. We can say the same thing for users with old Linux for which node-canvas prebuilds aren't available.

When GitHub actions supports macOS on ARM, the situation would change.

  "dependencies": {
    "canvas": "^2.10.1",
    "web-streams-polyfill": "^3.2.1"
  },

Configuration:

  • PDF.js version: 3.0.279
@Snuffleupagus
Copy link
Collaborator

What do you actually propose that we do instead, since the canvas package is necessary when using the PDF.js library in Node.js environments and we support browsers/Node.js in the same builds?

@tamuratak
Copy link
Contributor Author

One of options would be moving canvas into optionalDependencies from dependencies. Then, npm i pdfjs-dist will install canvas as a dependency by default, and npm will not stop installing pdfjs-dist even if installing canvas fails.

Users also can executenpm i --omit=optional pdfjs-dist if they want not to install canvas.

@Snuffleupagus
Copy link
Collaborator

Snuffleupagus commented Nov 1, 2022

So by using optionalDependencies we basically change an install-time error into a potential run-time error for some (hopefully small number of) Node.js users, while avoiding any issues for browser users?
If so, I suppose that we can try that; please submit a patch.

tamuratak added a commit to tamuratak/pdf.js that referenced this issue Nov 1, 2022
…stalling pdfjs-dist

even if the installation of canvas fails. Close mozilla#15652
@sunknudsen
Copy link

@tamuratak Thanks for raising flag… just hit use case on M1 Mac.

@Snuffleupagus, will #15655 makes it’s way to pdfjs-dist?

@timvandermeij
Copy link
Contributor

Fixed in release 3.1.81.

shaymerrill-mc added a commit to MasterControlIncPublic/pdf.js that referenced this issue Jan 6, 2023
* Use a consistent `outline` for all UI buttons (PR 15438 follow-up)

Currently the `viewBookmark`-button, which is actually a `href`-element, gets an inconsistent `outline`.
Similarly, the `dialog`-buttons also have an inconsistent `outline` after the changes in PR 15438.

Finally, simplifies a couple of `border` rules since setting a border-width when "none" is being used doesn't seem meaningful.

* Restore the old fonts in the `errorWrapper` (PR 15438 follow-up)

This only applies to the GENERIC viewer, hence we use the pre-processor to exclude it from the Firefox PDF Viewer.

* Use `stopImmediatePropagation` without checking for its existence first

These checks were added years ago, but given the following compatibility data we should just be able to call the method directly: https://developer.mozilla.org/en-US/docs/Web/API/Event/stopImmediatePropagation#browser_compatibility

* Remove some unused code from the `_queueLoadingCallback` method (PR 3477 follow-up)

The existing `loadingContext` class-property can be simplified slightly, since we've not been using the `id`-property on the requests ever since PR 3477 (which landed nine years ago).
Furthermore, by default we're also not even using that code-path now since the Font Loading API will always be used when available.

* Replace the `BaseFontLoader` classes with one `FontLoader` class

By having just *one* class, and using pre-processor blocks directly in the relevant methods, we reduce the size of this code in the *built* `pdf.js` file.
Originally, when the `BaseFontLoader` abstraction was added in PR 9982, the idea was probably that additional build-targets would get their own implementations. Given that this hasn't happened in the four years since that landed, it doesn't seem meaningful to keep it around.

* Remove the unused `rules` parameter from the `_prepareFontLoadEvent` method (PR 3477 follow-up)

This is yet another small piece of functionality that became unused in PR 3477 (which landed nine years ago).

* Remove the ability to pass in more than one font to the `_prepareFontLoadEvent` method (PR 10539 follow-up)

After the changes in PR 10539 (which landed over three years ago) the `FontLoader.bind` method can only be called with *a single* font at a time, hence the `_prepareFontLoadEvent` method obviously don't need to support multiple fonts any more.

* [Firefox viewer] Skip some unnecessary code in the `FontLoader.bind` method

Given that Firefox supports *synchronous* font loading, when the Font Loading API isn't being used, there's really no point including code which if called would just throw in the MOZCENTRAL build. (This is safe, since the `FontLoader.isSyncFontLoadingSupported`-getter always return `true` there.)

* [JS] Override the `Doc.external`-getter to avoid `alert`-modals on load (issue 15509)

This property is documented in https://web.archive.org/web/20201112021418if_/https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/AcrobatDC_js_api_reference.pdf#G5.1977075

Given that PR 14207, which is *somewhat* similar, landed without tests that's hopefully fine here as well.

* Replace the `DOMMatrix` polyfill, used with Node.js, with the one from `node-canvas`

Fewer dependencies shouldn't be a bad idea in general, and given that the `node-canvas` package already include a `DOMMatrix` polyfill we can simply use that one instead.

* Revert "Don't listen for window resolution changes in old browsers (PR 15319 follow-up)"

* Add more non-standard ligatures in the `glyphlist.js` file (issue 15516)

Note that this PR only adds the "underscore"-variant of *actually existing* ligatures, however the referenced PDF document also uses a couple of non-standard ones (e.g. `ft`, `Th`, and `fh`) that we cannot easily support without larger changes (since they don't have official Unicode-entries).
Given that it's clearly the PDF document, and its fonts, that's the culprit here it's not entirely clear to me that we actually want to attempt a larger refactoring/rewriting of the `glyphlist.js` code, assuming it's even generally possible. Especially when this patch alone already improves our copy-paste behaviour when compared to both Adobe Reader and PDFium, and that this is only the *second* time this sort of bug has been reported.

* [Editing] Disable the HandTool during editing (bug 1792422)

This extends the approach used in PresentationMode to also cover the AnnotationEditor, and tries to handle the combination of both cases correctly.
In order to simplify the overall implementation we simply track the *first* seen "previous" cursorTool, and don't allow it to be reset as long as either PresentationMode or an AnnotationEditor is being used.

* [JS] Add the function AFExactMatch

* Re-factor the `toggleButton` l10n in the `PDFSidebar` class

Rather than "manually" looking up the l10n-string and then updating the button, we can (and probably even should) just update the l10n-id and then trigger proper translation for the button DOM-element.

* Change how `src/shared/compatibility.js` is imported

Currently the compatibility-file is loaded using a standard `import`-statement and while its code is enclosed in a pre-processor block, and thus is excluded in e.g. the MOZCENTRAL build-target, it still results in the *built* `pdf.js`/`pdf.worker.js` files having an effectively empty closure as a result.
By moving the checks from `src/shared/compatibility.js` and into `src/shared/util.js` instead, we can load the file using a build-time `require`-statement and thus avoid that closure.

Note that with these changes the compatibility-file will no longer be loaded in development mode, i.e. when `gulp server` is used. However, this shouldn't be a big issue given that none of its included polyfills could be loaded then anyway (since `require`-statements are being used) and that it's really only intended for the `legacy`-builds of the library.

* [GENERIC viewer] Remove the `errorWrapper` UI

In the Firefox PDF Viewer this has never been used, with the error message simply printed in the web-console, and (somewhat) recently we've also updated the viewer code to avoid bundling the relevant code there. Furthermore, in the Firefox PDF Viewer we're not even display the *browser* fallback bar any more; see https://bugzilla.mozilla.org/show_bug.cgi?id=1705327.

Hence it seems slightly strange to keep this UI around in the GENERIC viewer, and this patch proposes that we simply remove it to simplify/unify the relevant code in the viewer. In particular this also allows us to remove a couple of l10n-strings, which have always been unused in the Firefox PDF Viewer.

* [Firefox viewer] Skip unused printing-string in `DEFAULT_L10N_STRINGS`

Given that the Firefox PDF Viewer uses the *browser* print UI, this fallback l10n-string isn't necessary in the MOZCENTRAL build.

* Update dependencies to the most recent versions

* Update translations to the most recent versions

* Fix intermittent errors in the "check that first text field has focus" scripting test

This commit fixes the "Expected null to equal '401R'" errors that
surfaced after the Puppeteer 18 upgrade. Note that even before that
this would have been an improvement because it takes some time between
scripting being reported ready (i.e., triggering the execution of any
OpenActions) and those OpenActions actually completing execution, so
it's only safe to check which element is focused if we know an element
actually became focused.

* [api-minor] Stop setting an `id` on the styleElement used with CSS font-loading

This is yet another small piece of clean-up of the `FontLoader`-code, since we've not used this `id`-property for anything ever since PR 6571 (which landed almost seven years ago). Furthermore, by default we're also not even using that code-path now since the Font Loading API will always be used when available.

*Please note:* This is tagged `[api-minor]` since it's technically observable from the outside, however no user ought to be directly interacting with these CSS font rules.

* Remove duplicated `destroy`-calls in the "custom ownerDocument" unit-tests

Given that `PDFDocumentProxy.destroy` is nothing but an alias for `PDFDocumentLoadingTask.destroy` calling both methods is obviously not useful.

* Use more `for...of` loops in the code-base

Most, if not all, of this code is old enough to predate the general availability of `for...of` iteration.

* [api-minor] Add partial support for the "GoToE" action (issue 8844)

*Please note:* The referenced issue is the only mention that I can find, in either GitHub or Bugzilla, of "GoToE" actions.
Hence why I've purposely settled for a very simple, and partial, "GoToE" implementation to avoid complicating things initially.[1] In particular, this patch only supports "GoToE" actions that references the /EmbeddedFiles-dict in the PDF document.

See https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G11.2048909

---
[1] Usually I always prefer having *real-world* test-cases to work with, whenever I'm implementing new features.

* Log the `build` number, in addition to the `version`, in the viewer

Given that the `build` number allows you to *directly* find the relevant commit, it cannot hurt to log that one as well.

* Stop localizing error *details* in the viewer (PR 15533 follow-up)

While it can't hurt to localize the main error-messages, also localizing the error *details* has always seemed somewhat unnecessary since those are only intended for debugging/development purposes. However, I can understand why that's done since the GENERIC viewer used to expose this information in the UI; via the `errorWrapper` UI that's removed in PR 15533.

At this point, when any errors are simply logged in the console, it no longer seems necessary to keep localizing the error *details* in the default viewer.

* Replace loop with `TypedArray.prototype.set` in the `DecryptStream.readBlock` method

There's no reason to use a manual loop, when a native method exists.

* [api-minor] Make `isOffscreenCanvasSupported` configurable via the API (issue 14952)

This patch first of all makes `isOffscreenCanvasSupported` configurable, defaulting to `true` in browsers and `false` in Node.js environments, with a new `getDocument` parameter. While you normally want to use this, in order to improve performance, it should still be possible for users to control it (similar to e.g. `isEvalSupported`).

The specific problem, as reported in issue 14952, is that the SVG back-end doesn't support the new ImageMask data-format that's introduced in PR 14754. In particular:
 - When the SVG back-end is used in Node.js environments, this patch will "just work" without the user needing to make any code changes.
 - If the SVG back-end is used in browsers, this patch will require that `isOffscreenCanvasSupported: false` is added to the `getDocument`-call.

* Remove the unused `Util.apply3dTransform` method

This method was originally added in PR 1157 (back in 2012), however its only call-site was then removed in PR 2423 (also in 2012).
Hence this method has been completely unused for nearly a decade, and it should thus be safe to remove it.

* Simplify the way to compute the remainder modulo 3 in PDF20Hash function

I noticed the 256 % 3 (which is equal to 1) so I slighty simplify the code.
The sum of the 16 Uint8 doesn't exceed 2^12, hence we can just take the
sum modulo 3.

* [XFA] Add some padding inline in selects

Because of https://bugzilla.mozilla.org/show_bug.cgi?id=1582545, the padding-inline is by default 0.
0 is not really enough because of the outline, so just set it to 2px (it was 4px before the patch)
in order to have something visually correct.

* Simplify the `dropdownToolbarButton`-select width computation

The way that we set the width of the `dropdownToolbarButton`-select is very old, and despite some improvements over the years this is still somewhat hacky.
In particular, note how we're assigning the select-element a larger width than its containing `dropdownToolbarButton`-element. This was done to prevent displaying *two* separate icons, i.e. the native and the PDF.js one, since it's the only way to handle this in older browsers (particularly Internet Explorer).

Given the currently supported browsers, there's however a better solution available: use `appearance: none;` to disable native styling of the select-element. [According to MDN](https://developer.mozilla.org/en-US/docs/Web/CSS/appearance#browser_compatibility), this is supported in all reasonably modern browsers.
This way we're able to simplify both the CSS rules and the JS-code that's used to adjust the `dropdownToolbarButton` width in a localization aware way.

* Remove unnecessary `size` attribute from the pageNumber-input

This attribute is overridden by the explicit `width` that's specified in the CSS rules, hence we can remove one more small piece of very old code; see https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input#size

* Remove the unused `CMapCompressionType.STREAM` value

This was added in PR 8064, over five years ago, for a possible future CMap file-format that was never implemented.

* [api-major] Remove some deprecated constants

All of the these constants have been deprecated for a while, and with the upcoming *major* version this seems like a good time to remove them.
For the string-constants we can simply remove them, but the number-constants are left commented out since we don't want to re-number the list to prevent third-party breakage.

* Stop sending the unused `source.url` property in "GetDocRequest"

It seems that this property became *effectively* unused already in PR 8617, however we missed removing it as part of the clean-up in PR 10376.

* Group the `evaluatorOptions` on the main-thread, when sending "GetDocRequest"

Rather than sending all of these parameters individually and then grouping them together on the worker-thread, we can simply handle that in the API instead.

* Stop using the `source`-object when sending "GetDocRequest"

Looking at the code on the worker-thread, there doesn't appear to be any particular reason for placing *some* of the properties in a `source`-object when sending them with "GetDocRequest".
As is often the case the explanation for this structure is rather "for historical reasons", since originally we simply sent the `source`-object as-is. Doing that was obviously a bad idea, for a couple of reasons:
 - It makes it less clear what is/isn't actually needed on the worker-thread.
 - Sending unused properties will unnecessarily increase memory usage.
 - The `source`-object may contain unclonable data, which would break the library.

* Ensure that `Page.getOperatorList` handles Annotation parsing errors correctly (issue 15557)

*Fixes a regression from PR 15246, sorry about that!*

The return value of all `Annotation.getOperatorList` methods was changed in PR 15246, however I missed updating the error code-path in `Page.getOperatorList` which thus breaks all operatorList-parsing for pages with corrupt Annotations.

* Slightly re-factor the version fetching in `PDFDocument.checkHeader`

Note how after having found the "%PDF-" prefix we then read both the prefix and the version in the loop, only to then remove the prefix at the end.
It seems better to instead advance the stream position past the "%PDF-" prefix, and then read only the version data.

Finally the loop-condition can also be simplified slightly, to further clean-up some very old code.

* [GENERIC viewer] Ensure that the we register the `editorTypes` for each `AnnotationEditorUIManager`-instance (issue 15564)

When a new PDF document is opened in the GENERIC viewer we (obviously) create a new `AnnotationEditorUIManager`-instance, since those are document-specific, and thus we need to ensure that we actually register the `editorTypes` for each one.

* [Editor] Change the caret cursor into the arrow one only when a text editor isn't empty (bug 1794717)

When a text editor is empty, clicking outside will create a new editor, hence it makes sense
to keep a caret cursor.

* Take the /CIDToGIDMap into account when getting the glyph mapping for CFF fonts (issue 15559)

*Please note:* I don't really know what I'm doing here, however the patch appears to fix the referenced issue when comparing the rendering with Adobe Reader (with the caveat that I don't speak the language in question).

* [api-minor] Stop sending "UnsupportedFeature" from the worker-thread GetOperatorList-handling

This code was added all the way back in PR 6698, almost seven years ago, for backwards compatibility reasons. At this point in time, it seems that we can remove that since:
 - We have more fine-grained "UnsupportedFeature" reporting elsewhere in the worker-thread code nowadays.
 - The GetOperatorList-handling is now using `ReadableStream`s, which means that errors are being forwarded to the main-thread anyway.
 - We're also no longer displaying a notification-bar, in the *built-in* Firefox PDF Viewer, for any of these "UnsupportedFeature" messages.

* [JS] Take into account all the required fields for some computations

- Fix Field::getArray in order to collect only the fields which have a value;
- Fix AFSimple_Calculate:
  * allow to have a string with a list of field names as argument;
  * since a field can be non-terminal, use Field::getArray to collect
    the field under it and then apply the calculation on all the descendants.

* Slightly re-factor `PartialEvaluator._simpleFontToUnicode`

Given the sheer number of heuristics added to this method over the years, moving the *valid* unicode found case to the top should improve readability of the code.

* Use all the current transform as key when caching some image for masks used with pattern fill (bug 1795263, mozilla#15573)

* Don't trigger worker-thread cleanup when destruction has already started

Note how we're currently skipping all main-thread cleanup when document destruction has started, but for some reason we're still dispatching the "Cleanup" message.
This seems like a simple oversight, since destruction will already invoke the `BasePdfManager.cleanup` method (on the worker-thread) to fully clear-out all caches.

* Re-factor the PDF version parsing in the worker-thread

Part of this is very old code, and back when support for parsing the catalog-version was added things became less clear (in my opinion).
Hence this patch tries to improve things, by e.g. validating the header- and catalog-version separately.

* [Editor] Ink editors must have their dimensions in percents after having been resized

* Tweak the vertical position of the sidebar notification icon

Given that the new sidebar icon is slightly shorter than the old one, it cannot hurt to ever so slightly tweak the vertical position of the notification icon.

(While the patch also changes the CSS rule used for the horizontal position, this is a no-op and was done to improve consistency between the two values.)

* Update npm packages

* Update l10n files

* [Editor] Commit the current editor before setting the new viewport

* Fallback and try a *previous* generation if all else fails in `XRef.indexObjects` (issue 15577)

When we fail to find a usable PDF document `trailer` *and* there were errors during parsing, try and fallback to a *previous* generation as a last resort during fetching of uncompressed references.
*Please note:* This will not affect "normal" PDF documents, with valid /XRef data, and even most *corrupt* documents should be completely unaffected by these changes.

* Remove the `Glyph.matchesForCache` method (PR 13494 follow-up)

This method, and its class, was originally added in PR 4453 to reduce memory usage when parsing text. Then PR 13494 extended the `Glyph`-representation slightly to also include the `charCode`, which made the `matchesForCache` method *effectively* redundant since most properties on a `Glyph`-instance indirectly depends on that one. The only exception is potentially `isSpace` in multi-byte strings.

Also, something that I noticed when testing this code: The `matchesForCache` method never worked correctly for `Glyph`s containing `accent`-data, since Objects are passed by reference in JavaScript. For affected fonts, of which there's only a handful of examples in our test-suite, we'd fail to find an already existing `Glyph` because of this.

* Re-factor the glyph-cache lookup in the `Font._charToGlyph` method

With the changes in the previous patch we can move the glyph-cache lookup to the top of the method and thus avoid a bunch of, in *almost* every case, completely unnecessary re-parsing for every `charCode`.

* [Annotation] Take the border into account when computing the font size (bug 1794403)

* Relax the /Pages dictionary /Count check for corrupt documents (issue 9105)

After PR 14311, and follow-up patches, we no longer require that the /Count entry (in the /Pages dictionary) is either present or even valid in order to parse/render a PDF document.
Hence it seems strange to keep this requirement for *corrupt* PDF documents, when trying to find a usable `trailer` in the `XRef.indexObjects` method.

* [Editor] Make FreeText annotations visible for screen readers when in editing mode (bug 1793419)

- When we're editing some annotations, keeping the role="text-box" make them visible
as editable and VoiceOver (Mac) is able to read the contents when they're focused;
- Add an attribute "aria-activedescendant" in order to make the content discoverable
by NVDA on Windows.

* Extend `getSupplementalGlyphMapForCalibri` with some umlauts (issue 15594)

* [Annotation] Replace use of id by data-element-id to have the correct id

* Restore a weaker version of the /Pages dictionary /Count check for corrupt documents (PR 15593 follow-up)

It appears that PR 15593 broke `issue12402`, and we thus need to partially restore the /Count check.
 I completely missed this when looking at the test-results for PR 15593, both locally and on the bots, since the `Driver._getLastPageNumber` method would "swallow" an unavailable page number.

* Let `Lexer.getNumber` treat more invalid "numbers" as zero (issue 15604)

In the referenced PDF document there are "numbers" which consist only of `-.`, and while that's obviously not valid Adobe Reader seems to handle it just fine.
Letting this method ignore more invalid "numbers" was suggested during the review of PR 14543, so let's simply relax our the validation here.

* Update icons (last and final update)

* Revert "Avoid all rendering breaking completely when CanvasPattern.setTransform() is unsupported" (PR 13725 follow-up)

PR 13725 was only intended as a temporary work-around, and it seems that we can now revert that.
 - Firefox 102 is the currently maintained ESR-branch, and the PDF.js project only supports the active one.
 - Node.js now works, thanks to the `node-canvas` package, and I've confirmed locally that following the STR in issue 13724 generates a correct image.

* Let the `PdfManager.requestLoadedStream` method return the stream

*This is very old code, and it could thus do with some simplification.*

Note how in the `src/core/worker.js` file we're combining both the `PdfManager.requestLoadedStream` and `PdfManager.onLoadedStream` methods in order to access the stream-data. This seems unnecessary, and it's simple enough to always let the `PdfManager.requestLoadedStream` method return the stream-data as well.

* Update the l10n-strings for the download-buttons (bug 1662416)

According to https://bugzilla.mozilla.org/show_bug.cgi?id=1662416#c8

* Fix editor tests on Windows

- In mozilla#15373, we implemented copy/paste actions in using the system
clipboard.
For any reasons, on Windows, the clipboard doesn't contain the expected
data when the tests are ran in parallel, hence the tests which are
using the clipboard need to be ran sequentially.
- Make sure that we can paste after having copied.

* Fix font for the 'current view' entry in the secondary toolbar (bug 1797310)

* [Form] Don't use field appearances when /NeedAppearances is set to true (bug 1796741)

When a form isn't changed, we used the appearances we had in the file, but when
/NeedAppearances is true, all the appearances have to be regenerated whatever they're.

* Prevent textLayer errors in documents with unbalanced beginMarkedContent/endMarkedContent operators (issue 15629)

* [JS] Avoid to trigger a commit event on 'ENTER' when the textfield is multiline

* [api-minor] Move the handling of unbalanced markedContent to the worker-thread (PR 15630 follow-up)

* Combine `Array.from` and `Array.prototype.map` calls

This isn't just a tiny bit more compact, but it also avoids an intermediate allocation; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/from#description

* [JS] Some functions (print, alert,...) must be called only after a user activation

- Some events, which require a user interaction, will allow those functions to be called.
But after few seconds, if there are no more user interaction, it won't be possible
anymore.
The idea is to give an opportunity to the user to leave the pdf.
- Disable print function when we're printing, the same with saving and disallow to save
on open events.

* Update npm packages

* Update l10n files

* [Editor] Change the cursor icons

* Remove the `PdfManager.onLoadedStream` method (PR 15616 follow-up)

After the clean-up in PR 15616, the `PdfManager.onLoadedStream` method now only has a single call-site.
Hence why this patch suggests that we remove this method and replace it with an *optional* parameter in `PdfManager.requestLoadedStream` instead. By making the new behaviour opt-in, we'll thus not change any existing call-site.

* Bump minimist

Bumps [minimist](https://github.com/minimistjs/minimist) and [minimist](https://github.com/minimistjs/minimist). These dependencies needed to be updated together.

Updates `minimist` from 1.2.0 to 1.2.6
- [Release notes](https://github.com/minimistjs/minimist/releases)
- [Changelog](https://github.com/minimistjs/minimist/blob/main/CHANGELOG.md)
- [Commits](minimistjs/minimist@v1.2.0...v1.2.6)

Updates `minimist` from 1.2.5 to 1.2.6
- [Release notes](https://github.com/minimistjs/minimist/releases)
- [Changelog](https://github.com/minimistjs/minimist/blob/main/CHANGELOG.md)
- [Commits](minimistjs/minimist@v1.2.0...v1.2.6)

---
updated-dependencies:
- dependency-name: minimist
  dependency-type: indirect
- dependency-name: minimist
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* Bump versions in `pdfjs.config`

* Prevent mouse interaction with form elements in PresentationMode (issue 12232)

* Prevent keyboard interaction with form elements in PresentationMode (issue 12232)

This uses the relatively new `HTMLElement.inert` property, see https://developer.mozilla.org/en-US/docs/Web/API/HTMLElement/inert for additional information. The only "problem" is that this isn't yet available in all Firefox channels, but until https://bugzilla.mozilla.org/show_bug.cgi?id=1764263 is fixed we're no worse off than before.

* Changed link for "Gulp's getting started guide"

Gulp's getting started guide changed location to https://github.com/gulpjs/gulp/tree/master/docs/getting-started. Link updated in readme to reflect that.

* Move canvas to optionalDependencies, which enables npm to continue installing pdfjs-dist
even if the installation of canvas fails. Close mozilla#15652

* Fallback to finding the first "obj" occurrence, when the trailer-dictionary is incomplete (issue 15590)

Note that the "trailer"-case is already a fallback, since normally we're able to use the "xref"-operator even in corrupt documents. However, when a "trailer"-operator is found we still expect "startxref" to exist and be usable in order to advance the stream position. When that's not the case, as happens in the referenced issue, we use a simple fallback to find the first "obj" occurrence instead.

This *partially* fixes issue 15590, since without this patch we fail to find any objects at all during `XRef.indexObjects`. However, note that the PDF document is still corrupt and won't render since there's no actual /Pages-dictionary and the /Root-entry simply points to the /OpenAction-dictionary instead.

* [api-minor] Let `Catalog.getAllPageDicts` return an *empty* dictionary when loading the first /Page fails (issue 15590)

In order to support opening certain corrupt PDF documents, particularly hand-edited ones, this patch adds support for letting the `Catalog.getAllPageDicts` method fallback to returning an *empty* dictionary to replace (only) the first /Page of the document.
Given that the viewer cannot initialize/load without access to the first page, this will thus allow e.g. document-level scripting to run as expected. Note that by effectively replacing a corrupt or missing first /Page in this way[1], we'll now render nothing but a *blank* page for certain cases of broken/corrupt PDF documents which may look weird.

*Please note:* This functionality is controlled via the existing `stopAtErrors` option, that can be passed to `getDocument`, since it's easy to imagine use-cases where this sort of fallback behaviour isn't desirable.

---
[1] Currently we still require that a /Pages-dictionary is found though, however it *may* be possible to relax even that assumption if that becomes absolutely necessary in future corrupt documents.

* Cache the normalized unicode-value on the `Glyph`-instance

Currently, during text-extraction, we're repeatedly normalizing and (when necessary) reversing the unicode-values every time. This seems a little unnecessary, since the result won't change, hence this patch moves that into the `Glyph`-instance and makes it *lazily* initialized.

Taking the `tracemonkey.pdf` document as an example: When extracting the text-content there's a total of 69236 characters but only 595 unique `Glyph`-instances, which mean a 99.1 percent cache hit-rate. Generally speaking, the longer a PDF document is the more beneficial this should be.

*Please note:* The old code is fast enough that it unfortunately seems difficult to measure a (clear) performance improvement with this patch, so I completely understand if it's deemed an unnecessary change.

* Use private fields in a few more viewer classes

These properties were always intended to be *private*, so let's use modern JS features to actually enforce that.

* [api-minor] Initialize the unicode-category *lazily* on the `Glyph`-instance

The purpose of this patch is twofold:
 - Initialize the unicode-category data *lazily* during text-extraction, since this is completely unused during general parsing/rendering.
 - Stop exposing this data in the API, since it's unused on the main-thread and it seems like it was *accidentally* included.

Obviously these changes are API-observable, but hopefully no user is depending on this. Furthermore, it's trivial for a user to re-create this unicode-category data manually with a regular expression (from the exposed `unicode` property).

* Propagate the translated font name to TextContentItems.

This allows font data for system fonts to be looked up in the
PDFObjects.

* Use the *full* inline image as the cacheKey in `Parser.makeInlineImage` (bug 1799927)

*Please note:* This only fixes the "wrong letter" part of bug 1799927.

It appears that the simple `computeAdler32` function, used when caching inline images, generates hash collisions for some (very short) TypedArrays. In this case that leads to some of the "letters", which are actually inline images, being rendered incorrectly.
Rather than switching to another hashing algorithm, e.g. the `MurmurHash3_64` class, we simply cache using a stringified version of the inline image data as the cacheKey to prevent any future collisions. While this will (naturally) lead to slightly higher peak memory usage, it'll however be limited to the current `Parser`-instance which means that it's not persistent.

One small benefit of these changes is that we can avoid creating lots of `Stream`-instances for already cached inline images.

* Initialize the dictionary *lazily* when parsing inline images

This helps improve performance for some PDF documents with a huge number of inline images, e.g. the PDF document from issue 2618.
Given that we no longer create `Stream`-instances unconditionally, we also don't need `Dict`-instances for cached inline images (since we only access the filter).

* Remove a couple of unnecessary temporary variables in `MurmurHash3_64.hexdigest`

These variables are left-over from the initial implementation, back when `String.prototype.padStart` didn't exist and we thus had to pad manually (using a loop).

* [Annotation] Fix printing/saving for annotations containing some non-ascii chars and with no fonts to handle them (bug 1666824)

- For text fields
 * when printing, we generate a fake font which contains some widths computed thanks to
   an OffscreenCanvas and its method measureText.
   In order to avoid to have to layout the glyphs ourselves, we just render all of them
   in one call in the showText method in using the system sans-serif/monospace fonts.
 * when saving, we continue to create the appearance streams if the fonts contain the char
   but when a char is missing, we just set, in the AcroForm dict, the flag /NeedAppearances
   to true and remove the appearance stream. This way, we let the different readers handle
   the rendering of the strings.
- For FreeText annotations
  * when printing, we use the same trick as for text fields.
  * there is no need to save an appearance since Acrobat is able to infer one from the
    Content entry.

* Use actually private fields in the `AnnotationStorage` class

These fields were never intended to be public, since modifying them manually would lead to inconsistent state, and with modern EcmaScript features we can now enforce this.
Also, this patch removes a couple of JSDoc comments that we generally don't use.

* Remove the constructor in the `StatTimer` class

With modern EcmaScript features, we can define these fields directly instead. Please note that for backwards compatibility purposes they are still public as before, however note that this functionality is *disabled* by default (see the `pdfBug` API option).
Also, we can (slightly) simplify the two loops used in the `toString` method.

* Change the `assert` in `Parser.findDefaultInlineStreamEnd` to a non-PRODUCTION one

Given that this `assert` is only intended to catch any implementation bugs in our code, and not actually to validate the PDF data directly[1], we can avoid making this function call unconditionally.

---
[1] In those cases, for example a `FormatError` should have been thrown instead.

* Add a *linked* test-case for issue 2618

Given that this PDF document is an interesting test-case for performance reasons, w.r.t. inline image caching, it probably can't hurt to add it to the test-suite to make it more readily available.
Considering the contents of that PDF document I'm not sure if we can include it directly in the repository, hence why a *linked* test-case was choosen here.

* Take the mask-offset into account when rendering repeated image masks (bug 1799927)

*Please note:* As usual when I'm working with the `src/display/canvas.js` code I don't really know what I'm doing, but it at least *appears* to work.

* Update npm packages

* Update l10n files

* Bump minimatch from 3.0.4 to 3.1.2

Bumps [minimatch](https://github.com/isaacs/minimatch) from 3.0.4 to 3.1.2.
- [Release notes](https://github.com/isaacs/minimatch/releases)
- [Commits](isaacs/minimatch@v3.0.4...v3.1.2)

---
updated-dependencies:
- dependency-name: minimatch
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* Stop Dependabot from creating its own, otherwise unused, labels

Currently all Dependabot update PRs get tagged with a "javascript" label, which is annoying since we don't actually use that one.
To try and avoid this we specify the labels explicitly, please see https://docs.github.com/en/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file#labels

* Normalize fullwidth, halfwidth and circled chars when searching

* Initialize the find-related `DIACRITICS_EXCEPTION_STR` constant lazily

Adding some logging with `console.{time, timeEnd}` around all the constant definitions at the top of the `web/pdf_find_controller.js` file, I noticed that computing `DIACRITICS_EXCEPTION_STR` took close to half the total time.
My first idea was just to try and make it slightly more efficient, by reducing the amount of iterations and intermediate allocations. However, with this constant only being used during "match diacritics" searches it thus seemed like a good candidate for lazy initialization.

*Please note:* Given that this is a micro optimization, I fully understand if the patch is rejected.

* Move the `_isOffscreenCanvasSupported` property to the base `Annotation` class

Having just played around with adding FreeText-annotations and then trying to print, there were `FreeTextAnnotation: OffscreenCanvas is not supported, annotation may not render correctly.` messages printed in the console.
The reason for this is that `FreeTextAnnotation` inherits from `MarkupAnnotation`, however only `WidgetAnnotation` actually defines the `_isOffscreenCanvasSupported` property.

* Combine the `stringToUTF16String` and `stringToUTF16BEString` helper functions

Given that these functions are virtually identical, with the latter only adding a BOM, we can combine the two. Furthermore, since both functions were only used on the worker-thread, there's no reason to duplicate this functionality in both of the `pdf.js` and `pdf.worker.js` files.

* Move the `isAscii` helper function into the worker-thread

Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the `pdf.js` and `pdf.worker.js` files.

* Move the `escapeString` helper function into the worker-thread

Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the `pdf.js` and `pdf.worker.js` files.

* Add a basic `stringToUTF16HexString` unit-test

* Remove unnecessary function names in the `src/core/worker.js` file

Currently *some* functions in this file have names while others don't, and in a few cases the names are no longer entirely accurate.
For the relevant functions there should really be no need to name them, and if memory serves this was originally done since browsers (many years ago) didn't always handle anonymous functions correctly in stack traces.

* Remove an unnecessary variable in `getPdfManager`, in the `src/core/worker.js` file

Another tiny piece of clean-up, since adding a `catch`-handler to a Promise shouldn't require an intermediate variable.

* Add localization support for the `annotationLayer` reference tests (issue 10791)

* Reduce duplication when creating a fallback appearance for `MarkupAnnotation`s

Currently we repeat the same color-conversion code verbatim in lots of classes, which seems completely unnecessary.

* Tweak assignment of common parameters in the `Annotation` classes

This is slightly more compact, and also unifies the format across the various classes.

* Remove the overflowing text special-case from `scrollIntoView` (issue 15714)

With the changes made in PR 14564 this *should* no longer be necessary now, however we still need to keep the `scrollMatches` parameter to handle textLayers with markedContent correctly when searching.

* Support FileAttachments with hash-signs in the filename (issue 15729)

The reason for the issue is that we use the generic `getFilenameFromUrl` helper function, which was originally intended for regular URLs.
For the filenames we're dealing with in FileAttachments, we really only want to strip the path when one exists[1].

---
[1] See [bug 1230933](https://bugzilla.mozilla.org/show_bug.cgi?id=1230933) for an example of such a case.

* Unblock the load event when the pdf has a password (bug 1801341)

* Simplify the `getFilenameFromUrl` helper function

* Add support for Optional Content in TilingPatterns (issue 15716)

This can't be a particularly common feature, since we've supported Optional Content for over two years and this is the very first TilingPattern-case we've seen.

* Add a fallback for non-embedded *composite* Tahoma fonts (issue 15719)

* Revert "Remove the overflowing text special-case from `scrollIntoView` (issue 15714)"

* [api-minor] Deprecate the TextLayer `timeout` parameter

This has never really been used anywhere within the PDF.js library[1], and when streaming of textContent was introduced this parameter was effectively made redundant.
Note that when streaming of textContent is used, all text-layout has already happened by the time that this `timeout`-functionality is actually invoked (thus making it pointless).
While the `timeout`-functionality may still "work" when the textContent is provided upfront, although it's never been used/tested, streaming will generally perform better (in e.g. a viewer setting).

*Please note:* While unrelated here, also removes a now unused property that I forgot in PR 15259.

---
[1] At least not since the code was moved into its current file, which happened in PR 6619 and landed seven years ago.

* Re-factor and simplify the `getQuadPoints` helper function

The use of `Array.prototype.reduce()` is, in my opinion, hurting overall readability since it's not particularly easy to look at the relevant code and immediately understand what's going on here. Furthermore this code leads to strictly speaking unnecessary allocations and parsing, since we could just track the min/max values directly in the relevant loop instead.

* Ensure that the initial document position is always correct with non-default Scroll/Spread modes (issue 15695)

Please refer to the inline comment for additional details. The patch also improves internal consistency when `#scrollIntoView` is called directly.

* Fix mc functions with new syntax in gulpfile

* change main-color to mc style and remove dark theme

* FontAwesome icons, work in progress

* hide page scrolling from toolbar sidebar and edit mode buttons from toolbar

* fixed #pageotateCw::before icon

* fixed toggled icons when selected

* removed Layers from left sidebar and simplified how the four sidebarViewButtons are hidden

* added presentation mode to toolbar and wired it up

* Current View bookmark icon updated

* zoom drop down arrow on hover

* spacing for spreads

* Re-fixing findbar

* changed spreads from spacing to mozilla icons

* added horizontal separator between selection and scrolling sections

* Disable Save as well as download

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
Co-authored-by: Tim van der Meij <timvandermeij@gmail.com>
Co-authored-by: calixteman <calixte.denizet@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mitchell Gale <mitchyg99@gmail.com>
Co-authored-by: Takashi Tamura <tamuratak@users.noreply.github.com>
Co-authored-by: Samuel Yuan <syuan@google.com>
Co-authored-by: Sarah Benson <sbenson@mastercontrol.com>
Co-authored-by: Barry Nix <82537651+mcbnix@users.noreply.github.com>
@nikitakot
Copy link

nikitakot commented Mar 15, 2023

The npm i --omit=optional pdfjs-dist doesn't seem to work. The canvas dependency is still installed and appears in node_modules and package-lock.json.
I use electon.js and electron-rebuild to build my electron app and electron-rebuild tries to rebuild the canvas from the node modules which I don't event need.

@silverwind
Copy link

silverwind commented Apr 12, 2023

canvas module does not work inside worker threads, which are in use by testing frameworks like vitest.

Can it be dropped or be made optional so the consumer can provide it to pdf.js on-demand? At minimum, the require for it should be wrapped in a try-catch clause.

@bn3t
Copy link

bn3t commented Apr 28, 2023

For me it worked when doing:

npm i --omit=optional pdfjs-dist@3.5.141

Since, my issue was specifically when running unit tests combining vitest and jsdom, I also did:

npm i --omit=optional jsdom

@silverwind
Copy link

omit works (best set in .npmrc), but it's dangerous because it will omit all optional dependencies from all your dependencies, not just pdfjs-dist.

@bn3t
Copy link

bn3t commented May 2, 2023

At the end, I have to come back at what I've written in an above comment. --omit=optional didn't work as other dependencies were marked as optional (esbuild-*) but were necessary anyway. I am using another solution which also looks like a hack but seems to work. It comes from this SO answer: https://stackoverflow.com/a/73846274. It consists in adding this to package.json:

"overrides": {
    "canvas": "../_EXCLUDED_"
  }

Since canvas, is an optional dependency, an override to a non existing location is ignored silently.

@lucasgadams
Copy link

Is there any way to reduce the needed size of the canvas package? when I use pdfjs in a nextjs environment, the canvas package is like 48mb and the limit for Vercel is 50mb! This makes it incredibly difficult to build an app. No other package is even close to this size. Kind of pulling my hair out over this, i currently cant deploy my app. Any ideas?

@MattyBalaam
Copy link

I found a workaround to stop installing canvas if that might help? We add a new entry to the resolutions object in package.json for canvas:

#16463

@net
Copy link

net commented Jul 11, 2023

I wonder if using @napi-rs/canvas instead would be a good solution for both M1 and Vercel environments.

@jakepgoldman
Copy link

@net how would you go about replacing the canvas dep with the package you listed above?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.