When I convert TIFF to PDF, the PDF size is 10 times that of TIFF #6453

Closed
344672699 opened this issue Jul 20, 2022 · 10 comments · Fixed by #6470

Comments

@344672699

When I convert TIFF to PDF, the PDF size is 10 times that of TIFF
Why is that?

@344672699
Author

344672699 commented Jul 20, 2022

When I convert a TIFF to PDF, the PDF is 10 times the size of the TIFF.
Why does this happen?

from PIL import Image

path = 'XXX'  # TIFF path
image = Image.open(path)
image.save(path, save_all=True)

@radarhere
Member

radarhere commented Jul 20, 2022

A note for others - according to Google, the previous comment just translates to the first comment.

If I run your code over one of our test images, https://github.com/python-pillow/Pillow/blob/main/Tests/images/hopper.tif, I get a PDF that is almost 10 times smaller.

So your situation does not apply to all TIFFs. Could you upload a copy of your image?

@344672699
Author

344672699 commented Jul 20, 2022

image = Image.open('tif\\20220720170924738.TIF')
image.save('tif\\dst\\20220720170924738.pdf', save_all=True)

Source: 20220720170924738.TIF, 8.51 KB
Converted PDF: 20220720170924738.pdf, 180 KB

Files: https://github.com/344672699/Pillow/blob/main/20220720170924738.rar

@344672699
Author

Thank you for your help

@344672699
Author

Python: 3.8
Pillow: 9.1.1

@radarhere
Member

The compression used in your TIFF image is "group4".

https://en.wikipedia.org/wiki/Group_4_compression

It is only used for bitonal (black-and-white) images.

When the PDF is saved by Pillow, the "DCTDecode" filter is used.

https://www.gemboxsoftware.com/pdf/docs/GemBox.Pdf.Filters.PdfDCTDecodeFilter.html

The DCTDecode filter decodes grayscale or color image data that has been encoded in the JPEG baseline format.

For comparison, I tried converting your 9 KB TIFF to PDF using ImageMagick. It came out as 12 KB, rather than Pillow's 185 KB. That PDF uses the "CCITTFaxDecode" filter, which also appears to use group4 compression, and that is why its size is so close to that of your original image.

Because "group4" compression is dedicated to black-and-white images, it is not surprising that its output is smaller. As mentioned, DCTDecode is for JPEG data, so Pillow converts your image to JPEG before saving it in the PDF file. If I convert your TIFF image to JPEG images using ImageMagick, they come out as a 63 KB and a 113 KB image. 63 KB + 113 KB = 176 KB, close to the size of the final PDF.

So the answer to your question is that Pillow is not using the compression method dedicated to black-and-white images, but one that allows for more colours.
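
To check whether a given TIFF falls into this case, its mode and compression can be inspected with Pillow before converting. The following is a minimal sketch, not code from this thread: `describe_tiff` is a hypothetical helper name, and the demo image is a synthetic uncompressed bitonal TIFF built in memory rather than the file attached to this issue. It assumes that Pillow's TIFF plugin reports the compression name under `im.info["compression"]`.

```python
import io

from PIL import Image


def describe_tiff(source):
    """Return (mode, compression name, frame count) for a TIFF path or file object.

    describe_tiff is a hypothetical helper, named here for illustration only.
    """
    with Image.open(source) as im:
        # Pillow's TIFF plugin stores the compression name in im.info
        compression = im.info.get("compression")
        # Multi-page TIFFs expose n_frames; fall back to 1 otherwise
        frames = getattr(im, "n_frames", 1)
        return im.mode, compression, frames


# Synthetic stand-in: a 64x64 bitonal image saved as an uncompressed TIFF,
# so the demo does not depend on libtiff being available.
buf = io.BytesIO()
Image.new("1", (64, 64), 1).save(buf, format="TIFF")
buf.seek(0)

mode, compression, frames = describe_tiff(buf)
print(mode, compression, frames)
```

For a file like the one discussed here, a mode of "1" together with a compression of "group4" would indicate a bitonal image that a JPEG-based filter is likely to enlarge.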

@344672699
Author

Thank you for your help. The explanation was very detailed.

However, I still have a question. When using Pillow, can I configure it to use the "CCITTFaxDecode" filter? Or is there some other setting that would reduce my PDF to about 12 KB, as ImageMagick does, or make it as small as the PDF produced by iText?

If not, does that mean I can't use Pillow, or that I have to accept a PDF more than 10 times the size of the original?

@radarhere
Member

At the moment, that option doesn't exist in Pillow, no.

If you would like, this issue can be left open as a request for someone to add the feature.

@radarhere
Member

Alternatively, you may be interested in img2pdf. The following generates a 10 KB file.

import img2pdf

with open("out.pdf", "wb") as f:
    f.write(img2pdf.convert('20220720170924738.tif'))

@radarhere
Member

I've created PR #6470 to resolve this.
