Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move font subsetting from render to output #2149

Merged
merged 10 commits into from
May 17, 2020

Conversation

simonberger
Copy link
Member

@simonberger simonberger commented May 10, 2020

The major part of the font processing is now triggered from the output instead of the render step.
The needed chars are collected from text output calls to CPDF instead of walking the frame tree another time. This also moves all work to CPDF.

Todos:

  • Testing
    • This is covering decimal list bullets
    • Multiple embedded fonts are working
    • It is working with arabic font shaping of RTL Language Support #2107 - No reason for it to not work
  • Just do font subsetting when the option is activated
    • Maybe we want to activate it by default now as the performance overhead should be low.

Addresses: #750

@simonberger
Copy link
Member Author

@bsweeney please have a look at this. It worked well in short examples. But I had no run with multiple fonts and I might overlook things I don't know yet. ;)
Other limitations are listed above.

@simonberger simonberger marked this pull request as ready for review May 11, 2020 00:01
@simonberger simonberger requested a review from bsweeney May 11, 2020 08:20
Prior to this change Dompdf did not pass the font subsetting option through to PDFLib. The default option for PDFLib is to enable subsetting.
Copy link
Member

@bsweeney bsweeney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran this through my test gauntlet and didn't notice any issues.

I do think we should tweak the way we do this to behave similar to how PDFLib does it. For PDFLib the specifies the font subsetting preference when adding a font to the document. PDFLib handles the actual determination of the characters for the subset. For CPDF we would have to make a few changes to what we have so far:

  • Add a new argument to CPDF::selectFont (bool subset) to indicate if subsetting should be enabled for the font.
  • Add a new key to the font options to indicate if it is subsetted.
  • Modify CPDF::addText to check if the font subset option is true. When the option is true register the passed-in text for subsetting for the font.
  • Subsetting can now be determined based on the font option rather than if a subset exists.
  • Update the CPDF::o_font "out" action to modify the BaseFont value per section 5.5.3 of the PDF specification. This section indicates that the font name for subsetted fonts should be six characters followed by a plus followed by the font name, e.g. ABCDEF+DejaVu-Sans-Normal.

There are two reasons to make this change:

  • To better align with the PDF specification where subsetted fonts are concerned.
  • Provide a better user experience for subsetting text placed through direct access to the CPDF instance (i.e. the user just has to indicate that the font is subsetted, they don't have to register the text since CPDF will take care of it).

This is what I was thinking about earlier, but probably didn't sufficiently explain myself. Thoughts?


FYI, I added a commit to your branch to add support for honoring the font subset option in PDFLib.


And, finally, I do think with this update that font subsetting works more consistently. As such we should go ahead and set the default font subsetting option to true.

@bsweeney
Copy link
Member

Once we finish up with this I'll return to the RTL branch so we can finish that up.

@simonberger
Copy link
Member Author

Your requested changes are making sense. I'll implement them.

@simonberger
Copy link
Member Author

In detail I did the following changes which should cover all your points:

  • selectFont gets an additional parameter isSubsetting which is true by default.
    • The calls coming from the adapter are reading the dompdf option.
    • The calls from other CPDF methods selecting the defaultFont if none is available now have enabled subsetting.
  • Collecting subsetting data happens directly in CPDF addText instead of the adpater.
  • A tag SUB[A-Z][A-Z][A-Z]+ is added in front of the font name. It is increased by fontnum starting with SUBAAB and has plenty of space (26^3).
  • The dompdf subsetting option is now TRUE by default.

@simonberger
Copy link
Member Author

Unicode and file suffix detection are now taken into account early to never collect text data unnecessarily.

Copy link
Member

@bsweeney bsweeney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

look good

@simonberger simonberger added this to the 0.8.6 milestone May 17, 2020
@simonberger simonberger merged commit 2aa40ad into dompdf:develop May 17, 2020
Mellthas added a commit that referenced this pull request Sep 12, 2021
This should not be needed anymore since #2149. The text is registered
when it is added to the canvas, as with regular text nodes.
bsweeney pushed a commit that referenced this pull request Sep 18, 2021
This should not be needed anymore since #2149. The text is registered
when it is added to the canvas, as with regular text nodes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants