[Bug]: pdf export for large image sizes results in wrong colors #25806

Closed
simonwm opened this issue May 2, 2023 · 6 comments · Fixed by #25824

@simonwm

simonwm commented May 2, 2023

Bug summary

Exporting a figure containing an imshow image as PDF produces wrong colors when the image size/resolution becomes large: e.g. gray cells become red.

Code for reproduction

import numpy as np
import matplotlib.pyplot as plt

# data: four background tiles with one of them highlighted
background = [0.9] * 3  # gray
highlight = [1, 0, 0]  # red
# varying the colors and their combinations gives a rich spectrum of bug phenotypes
X = np.array([[background, highlight], [background, background]])

# big figure
dpi, figsize = 500, 20  # a lower dpi or a smaller figsize avoids the bug
fig, ax = plt.subplots(figsize=(figsize, figsize), dpi=dpi)
ax.imshow(X)

# export it
fig.savefig('minimal.png')  # correctly shows three gray tiles and one red tile
fig.savefig('minimal.pdf')  # incorrectly shows all four tiles in red

Actual outcome

The PNG export shows the correct colors: 3 quarters gray, 1 quarter red. The PDF export shows incorrect colors: all 4 quarters red.
[Image: minimal.png] [Attachment: minimal.pdf]

Expected outcome

Both the PNG and the PDF exports show the correct colors: 3 quarters gray, 1 quarter red, i.e. the PDF should look like the PNG pasted above.

Additional information

This bug only appears for large figure sizes and dpi (possibly when figsize * dpi exceeds some threshold).
The bug also depends on the chosen colors and the content of the image. In my real-world use case, I have a large heatmap with many fields, and one color is consistently replaced by another color from the plot. I also observed variations that replace a color with a darker version of itself: e.g. if one swaps the roles of red and gray in the minimal example above, the background in the PDF is dark red instead of red.

This minimal example was executed in a fresh conda env with matplotlib as the only dependency. I previously tried it with matplotlib 3.6.3 in a less clean environment, with the same results. I don't have an example of it working in previous versions.

I guess it is an issue with a fixed-size buffer in the PDF backend, since it works with the PNG backend.

A workaround is to reduce the dpi; for vector graphics the output still contains all the information, but it changes the "physical" size of the PDF.

Operating system

RHEL 7.9

Matplotlib Version

3.7.1

Matplotlib Backend

No response

Python version

3.11.3

Jupyter version

No response

Installation

conda

@saranti
Contributor

saranti commented May 2, 2023

Possibly related to #18871. The workaround works in this case.

@tacaswell
Member

To be explicit, the workaround is to set

ax.imshow(X, interpolation='none')  # the default is `nearest`

which suggests something is going wrong in the resampling / rasterization pipeline within the PDF backend.

@simonwm
Author

simonwm commented May 3, 2023

Thanks! The workaround works nicely, indeed.
I wonder why errors in the interpolation are triggered by using a large image size/resolution...

@tacaswell
Member

tacaswell commented May 3, 2023

My guess (and to be clear, this is a guess) is that there are floats and rounding involved. Because of how floats work, they lose (absolute) precision as they get bigger; thus, if something is near an edge, it may work reliably at small absolute scales and fail at large ones.


This is the correct and expected behavior:


In [1]: 1e20 == (1e20 + 1)
Out[1]: True

because at the scale of 1e20 the gap between successive representable floats is greater than 1!
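The same effect can be checked with the standard library: `math.ulp` (Python 3.9+) returns the gap between a float and the next representable float, so evaluating it at small and large magnitudes shows how absolute precision degrades. This only illustrates the general float behaviour described above, not the specific code path in the PDF backend.

```python
import math

# Gap between adjacent representable doubles at increasing magnitudes.
for x in (1.0, 1e3, 1e6, 1e20):
    print(f"{x:>8g}: gap to next float = {math.ulp(x)}")

# 1e20 lies between 2**66 and 2**67, so the gap there is 2**(66 - 52) = 2**14.
assert math.ulp(1e20) == 2.0 ** 14
# Adding 1 is far below half the gap, so it is lost to rounding.
assert 1e20 == 1e20 + 1
```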

@QuLogic
Member

QuLogic commented May 3, 2023

I could not reproduce this problem at first. By coincidence, I had disabled PDF compression for some other checks; turning it back on does reproduce the issue. With compression on, we output images as compressed PNG, and though we have not worked out the exact difference, there is obviously something going wrong there.

You may be able to work around the problem by disabling PDF compression, and then (because the file size is then huge), running it through Ghostscript.
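A minimal sketch of that workaround, assuming the standard `pdf.compression` rcParam (0 disables compression) and a Ghostscript executable named `gs` on the `PATH`. The Ghostscript flags are an assumption chosen to keep the image data lossless (Flate instead of JPEG, no downsampling); the file names and the reduced figure size are illustrative only.

```python
import shutil
import subprocess

import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted export
import matplotlib.pyplot as plt
import numpy as np

background = [0.9] * 3  # gray
highlight = [1, 0, 0]  # red
X = np.array([[background, highlight], [background, background]])

fig, ax = plt.subplots(figsize=(4, 4), dpi=100)
ax.imshow(X)

# Save with PDF compression disabled, so image data is written raw
# instead of going through the (palettized) PNG path.
with plt.rc_context({"pdf.compression": 0}):
    fig.savefig("uncompressed.pdf")
plt.close(fig)

# Optionally shrink the large uncompressed file with Ghostscript, if installed.
gs = shutil.which("gs")
if gs is not None:
    subprocess.run(
        [gs, "-q", "-dNOPAUSE", "-dBATCH", "-sDEVICE=pdfwrite",
         "-dAutoFilterColorImages=false", "-dColorImageFilter=/FlateEncode",
         "-dDownsampleColorImages=false",
         "-sOutputFile=recompressed.pdf", "uncompressed.pdf"],
        check=True,
    )
```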

@QuLogic
Member

QuLogic commented May 4, 2023

When compression is on, we output PNG images instead of raw data. Since #17895, we've also started making indexed PNG when possible. We do this by asking Pillow to convert with an adaptive palette. It's kind of undocumented what that does, but looking at various issues, it appears that that does not guarantee that it will use the same colours as the original image. I think this explains why you don't always get the same wrong colour.

If we print img.getextrema() before and after this conversion:

img = img.convert(
    mode='P', dither=dither, palette=pmode, colors=num_colors
)

we get

((229, 255), (0, 229), (0, 229))
(1, 1)

That is, before converting, there are different limits, but after converting, everything is palette index 1. Looking at Pillow's issues, I see some reference that there's no guarantee that adaptive palettes replicate exact colours. I did not look too deeply into Pillow code, but based on python-pillow/Pillow#1852, I think this may only be a visible problem for low-colour-count but high-pixel-count images. Of course, it might actually be slightly off for any image that is palettized.
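For reference, `getextrema()` on an RGB image returns one `(min, max)` pair per band. The NumPy sketch below builds a small stand-in for the reproduction image, assuming the gray tiles become 229 and the red tile (255, 0, 0) in 8-bit form (the values in the output above), and computes the same per-band extrema. A single-band palette image whose extrema are `(1, 1)` therefore contains only one palette index, i.e. one colour.

```python
import numpy as np

GRAY = (229, 229, 229)  # 8-bit gray, as in the extrema printed above
RED = (255, 0, 0)

# 2x2 tiles, upscaled so each tile is a block of identical pixels.
tiles = np.array([[GRAY, RED], [GRAY, GRAY]], dtype=np.uint8)
rgb = tiles.repeat(50, axis=0).repeat(50, axis=1)  # shape (100, 100, 3)


def band_extrema(img):
    """Per-band (min, max), mirroring what Pillow's getextrema() reports."""
    bands = np.moveaxis(img, -1, 0)  # (3, H, W): one 2-D array per channel
    return tuple((int(band.min()), int(band.max())) for band in bands)


print(band_extrema(rgb))  # → ((229, 255), (0, 229), (0, 229))
```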

I believe the fix is to request quantization with an explicit palette. This mostly appears to work, though there is some kind of off-by-one error somewhere as I get a strange banding effect when just directly doing that.

QuLogic added a commit to QuLogic/matplotlib that referenced this issue May 5, 2023
Asking Pillow for an "adaptive palette" does not appear to guarantee
that the chosen colours will be the same, even if asking for exactly the
same number as exist in the image. Instead, create an explicit palette,
and quantize using it.

Additionally, since now the palette may be smaller than 256 colours,
Pillow may choose to encode the image data with fewer than 8 bits per
component, so we need to properly reflect that in the decode parameters
(this was already done for the image parameters).

The effect on test images with _many_ colours is small, with a maximum
RMS of 1.024, but for images with few colours, the result can be
completely wrong as in the reported matplotlib#25806.
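The decode-parameter point can be illustrated with a small helper. It assumes that an indexed PNG uses the smallest standard bit depth (1, 2, 4 or 8 bits per index) that can address every palette entry; the function name is hypothetical and this is not Matplotlib's code. Whichever depth the encoder picks has to match the `BitsPerComponent` value given to the PDF reader, otherwise the indices are unpacked incorrectly.

```python
def palette_bit_depth(num_colors: int) -> int:
    """Smallest PNG palette bit depth (1, 2, 4 or 8) able to index num_colors entries."""
    if not 1 <= num_colors <= 256:
        raise ValueError("a PNG palette holds between 1 and 256 colours")
    # 2**bits distinct indices must cover every palette entry.
    return next(bits for bits in (1, 2, 4, 8) if num_colors <= 2 ** bits)


for n in (2, 3, 16, 17, 256):
    print(n, "colours ->", palette_bit_depth(n), "bits per index")
```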
QuLogic added a commit to QuLogic/matplotlib that referenced this issue May 6, 2023
QuLogic added a commit to QuLogic/matplotlib that referenced this issue Jun 9, 2023
Asking Pillow for an "adaptive palette" does not appear to guarantee
that the chosen colours will be the same, even if asking for exactly the
same number as exist in the image. And asking Pillow to quantize with an
explicit palette does not work either, as Pillow uses a cache that trims
the last two bits from the colour and never makes an explicit match.
python-pillow/Pillow#1852 (comment)

So instead, manually calculate the indexed image using some NumPy
tricks.

Additionally, since now the palette may be smaller than 256 colours,
Pillow may choose to encode the image data with fewer than 8 bits per
component, so we need to properly reflect that in the decode parameters
(this was already done for the image parameters).

The effect on test images with _many_ colours is small, with a maximum
RMS of 1.024, but for images with few colours, the result can be
completely wrong as in the reported matplotlib#25806.
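A minimal sketch of exact palette indexing with NumPy, assuming an `(H, W, 3)` `uint8` RGB array; it illustrates the general technique named in the commit message and is not the code that was merged. `np.unique(..., axis=0, return_inverse=True)` yields the exact set of colours and, for every pixel, the index of its colour, so no two colours can be merged. The `trimmed_key` helper is hypothetical and only shows why dropping the two lowest bits of each channel, as described for Pillow's cache, cannot tell nearby colours apart.

```python
import numpy as np


def exact_palette(rgb):
    """Return (palette, indices) such that palette[indices] == rgb exactly.

    Returns None if the image has more than 256 distinct colours, since an
    8-bit palette cannot represent it losslessly.
    """
    pixels = rgb.reshape(-1, 3)
    palette, inverse = np.unique(pixels, axis=0, return_inverse=True)
    if len(palette) > 256:
        return None
    # reshape also normalises the inverse shape across NumPy versions.
    indices = inverse.reshape(rgb.shape[:2]).astype(np.uint8)
    return palette, indices


def trimmed_key(color):
    """Hypothetical cache key that keeps only the top 6 bits of each channel."""
    return tuple(c >> 2 for c in color)


GRAY, DARKER_GRAY, RED = (229, 229, 229), (228, 228, 228), (255, 0, 0)
rgb = np.array([[GRAY, RED], [GRAY, DARKER_GRAY]], dtype=np.uint8)

palette, indices = exact_palette(rgb)
assert np.array_equal(palette[indices], rgb)  # lossless round trip
print(len(palette), "distinct colours")

# 229 >> 2 == 228 >> 2 == 57, so a trimmed key conflates the two grays.
print(trimmed_key(GRAY) == trimmed_key(DARKER_GRAY))  # → True
```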
QuLogic added a commit to QuLogic/matplotlib that referenced this issue Jun 10, 2023
melissawm pushed a commit to melissawm/matplotlib that referenced this issue Jun 15, 2023
QuLogic added this to the v3.7.2 milestone Jun 29, 2023

4 participants