Move font subsetting from render to output #2149

simonberger · 2020-05-10T20:36:21Z

The major part of the font processing is now triggered from the output instead of the render step.
The needed chars are collected from text output calls to CPDF instead of walking the frame tree another time. This also moves all work to CPDF.

Todos:

Testing
- This is covering decimal list bullets
- Multiple embedded fonts are working
- It is working with arabic font shaping of RTL Language Support #2107 - No reason for it to not work
Just do font subsetting when the option is activated
- Maybe we want to activate it by default now as the performance overhead should be low.

Addresses: #750

…rid of double frame tree processing

simonberger · 2020-05-10T20:39:02Z

@bsweeney please have a look at this. It worked well in short examples. But I had no run with multiple fonts and I might overlook things I don't know yet. ;)
Other limitations are listed above.

lib/Cpdf.php

Prior to this change Dompdf did not pass the font subsetting option through to PDFLib. The default option for PDFLib is to enable subsetting.

bsweeney

I ran this through my test gauntlet and didn't notice any issues.

I do think we should tweak the way we do this to behave similar to how PDFLib does it. For PDFLib the specifies the font subsetting preference when adding a font to the document. PDFLib handles the actual determination of the characters for the subset. For CPDF we would have to make a few changes to what we have so far:

Add a new argument to CPDF::selectFont (bool subset) to indicate if subsetting should be enabled for the font.
Add a new key to the font options to indicate if it is subsetted.
Modify CPDF::addText to check if the font subset option is true. When the option is true register the passed-in text for subsetting for the font.
Subsetting can now be determined based on the font option rather than if a subset exists.
Update the CPDF::o_font "out" action to modify the BaseFont value per section 5.5.3 of the PDF specification. This section indicates that the font name for subsetted fonts should be six characters followed by a plus followed by the font name, e.g. ABCDEF+DejaVu-Sans-Normal.

There are two reasons to make this change:

To better align with the PDF specification where subsetted fonts are concerned.
Provide a better user experience for subsetting text placed through direct access to the CPDF instance (i.e. the user just has to indicate that the font is subsetted, they don't have to register the text since CPDF will take care of it).

This is what I was thinking about earlier, but probably didn't sufficiently explain myself. Thoughts?

FYI, I added a commit to your branch to add support for honoring the font subset option in PDFLib.

And, finally, I do think with this update that font subsetting works more consistently. As such we should go ahead and set the default font subsetting option to true.

bsweeney · 2020-05-16T20:44:00Z

Once we finish up with this I'll return to the RTL branch so we can finish that up.

simonberger · 2020-05-16T22:35:35Z

Your requested changes are making sense. I'll implement them.

…/dompdf into move_font_subsetting

simonberger · 2020-05-17T04:39:03Z

In detail I did the following changes which should cover all your points:

selectFont gets an additional parameter isSubsetting which is true by default.
- The calls coming from the adapter are reading the dompdf option.
- The calls from other CPDF methods selecting the defaultFont if none is available now have enabled subsetting.
Collecting subsetting data happens directly in CPDF addText instead of the adpater.
A tag SUB[A-Z][A-Z][A-Z]+ is added in front of the font name. It is increased by fontnum starting with SUBAAB and has plenty of space (26^3).
The dompdf subsetting option is now TRUE by default.

simonberger · 2020-05-17T14:43:40Z

Unicode and file suffix detection are now taken into account early to never collect text data unnecessarily.

bsweeney

look good

This should not be needed anymore since #2149. The text is registered when it is added to the canvas, as with regular text nodes.

Move font processing (with subsetting) from render to output and get …

8b62cf4

…rid of double frame tree processing

Fix composer.json

1130453

simonberger commented May 10, 2020

View reviewed changes

lib/Cpdf.php Show resolved Hide resolved

Just do font subsetting if the option is activated

e10a866

simonberger marked this pull request as ready for review May 11, 2020 00:01

simonberger requested a review from bsweeney May 11, 2020 08:20

Honor font subsetting in the PDFLib adapter

5b73ff6

Prior to this change Dompdf did not pass the font subsetting option through to PDFLib. The default option for PDFLib is to enable subsetting.

bsweeney requested changes May 16, 2020

View reviewed changes

simonberger added 4 commits May 17, 2020 06:09

Font subsetting enabled and saved now per font

92971a8

Merge branch 'move_font_subsetting' of https://github.com/simonberger…

92ff1f0

…/dompdf into move_font_subsetting

Fixed options test due to changed default subsetting value

b87711a

Fixed code smell

2fc2d2b

simonberger added 2 commits May 17, 2020 16:14

Deactivate subsetting early on non unicode fonts

103ea31

Several optimizations

3e5657c

bsweeney approved these changes May 17, 2020

View reviewed changes

simonberger added this to the 0.8.6 milestone May 17, 2020

simonberger merged commit 2aa40ad into dompdf:develop May 17, 2020

Mellthas added a commit that referenced this pull request Sep 12, 2021

Remove obsolete font-subsetting handling for generated content

5cb0468

This should not be needed anymore since #2149. The text is registered when it is added to the canvas, as with regular text nodes.

Mellthas mentioned this pull request Sep 12, 2021

Remove obsolete font-subsetting handling for generated content #2544

Merged

bsweeney pushed a commit that referenced this pull request Sep 18, 2021

Remove obsolete font-subsetting handling for generated content

9ff94db

This should not be needed anymore since #2149. The text is registered when it is added to the canvas, as with regular text nodes.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Move font subsetting from render to output #2149

Move font subsetting from render to output #2149

simonberger commented May 10, 2020 •

edited

simonberger commented May 10, 2020

bsweeney left a comment

bsweeney commented May 16, 2020

simonberger commented May 16, 2020

simonberger commented May 17, 2020

simonberger commented May 17, 2020

bsweeney left a comment

Move font subsetting from render to output #2149

Move font subsetting from render to output #2149

Conversation

simonberger commented May 10, 2020 • edited

simonberger commented May 10, 2020

bsweeney left a comment

Choose a reason for hiding this comment

bsweeney commented May 16, 2020

simonberger commented May 16, 2020

simonberger commented May 17, 2020

simonberger commented May 17, 2020

bsweeney left a comment

Choose a reason for hiding this comment

simonberger commented May 10, 2020 •

edited