
Add tp_richcompare handler for Imaging_Type/ImagingCore #7260

Open, wants to merge 15 commits into base: main
Changes from 8 commits
96 changes: 96 additions & 0 deletions Tests/test_lib_image.py
@@ -1,7 +1,37 @@
import secrets

import pytest

from PIL import Image

mode_names_not_bgr = (
"1",
"L",
"LA",
"La",
"P",
"PA",
"F",
"I",
"I;16",
"I;16L",
"I;16B",
"I;16N",
"RGB",
"RGBA",
"RGBa",
"RGBX",
# Image.frombytes() doesn't work with BGR modes:
# unknown raw mode for given image mode
# "BGR;15",
# "BGR;16",
# "BGR;24",
Member (comment on lines +26 to +30):
I fixed these errors in #7303, but have since discovered that it is not sufficient to get this PR working for those modes. The problem is that those modes pack their lines tightly, rather than spacing their data out to four bytes per pixel:

im->pixelsize = 3;
im->linesize = (xsize * 3 + 3) & -4;
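The line-size expression quoted above can be sketched standalone (an illustration, not Pillow internals): for "BGR;24", three bytes per pixel are stored back to back, with each scan line padded up to a multiple of four bytes, so the pixels cannot be compared as uniform four-byte words.

```python
def bgr24_linesize(xsize: int) -> int:
    # (xsize * 3 + 3) & -4 rounds xsize * 3 up to the next multiple of 4,
    # i.e. three packed bytes per pixel plus end-of-line padding.
    return (xsize * 3 + 3) & -4

print(bgr24_linesize(1), bgr24_linesize(2), bgr24_linesize(5))  # 4 8 16
```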

"CMYK",
"YCbCr",
"HSV",
"LAB",
)


def test_setmode():
im = Image.new("L", (1, 1), 255)
@@ -30,3 +60,69 @@
im.im.setmode("L")
with pytest.raises(ValueError):
im.im.setmode("RGBABCDE")


@pytest.mark.parametrize("mode", mode_names_not_bgr)
def test_equal(mode):
num_img_bytes = len(Image.new(mode, (2, 2)).tobytes())
# alternatively, random.randbytes() in Python 3.9
data = secrets.token_bytes(num_img_bytes)
Member (comment on lines +70 to +71):

Apart from oss-fuzz, Pillow doesn't use random data in the test suite.

Contributor Author:

Different image modes need different numbers of bytes to create them, so this seemed like the best way to get enough bytes, and then to get a second set of bytes that are different.

img_a = Image.frombytes(mode, (2, 2), data)
img_b = Image.frombytes(mode, (2, 2), data)
assert img_a.tobytes() == img_b.tobytes()
assert img_a.im == img_b.im


# With mode "1" different bytes can map to the same value,
# so we have to be more specific with the values we use.
def test_not_equal_mode_1():
data_a = data_b = bytes(secrets.choice(b"\x00\xff") for i in range(4))
while data_a == data_b:
data_b = bytes(secrets.choice(b"\x00\xff") for i in range(4))
# We need to use rawmode "1;8" so that each full byte is interpreted as a value
# instead of the bits in the bytes being interpreted as values.
img_a = Image.frombytes("1", (2, 2), data_a, "raw", "1;8")
img_b = Image.frombytes("1", (2, 2), data_b, "raw", "1;8")
assert img_a.tobytes() != img_b.tobytes()
assert img_a.im != img_b.im


@pytest.mark.parametrize("mode", [mode for mode in mode_names_not_bgr if mode != "1"])
Member:

Why is 1 excluded here?

Contributor Author:

Because if you scroll up a bit, there's test_not_equal_mode_1 just for mode 1 images, with comments explaining why it's different.

Basically, it's easier to create a mode 1 image from bytes if we use rawmode "1;8".
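The difference the comment describes can be demonstrated in a few lines: with the default "1" rawmode each *bit* of the input is a pixel (eight pixels per byte), while rawmode "1;8" treats each full byte as one pixel, which makes it easy to construct a mode "1" image from arbitrary bytes.

```python
from PIL import Image

# Each byte is one pixel: 0x00 maps to 0, 0xff maps to 255.
img = Image.frombytes("1", (2, 2), b"\x00\xff\xff\x00", "raw", "1;8")
print(list(img.getdata()))  # [0, 255, 255, 0]
# img.tobytes() would re-pack these pixels as bits, one row per byte.
```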

def test_not_equal(mode):
num_img_bytes = len(Image.new(mode, (2, 2)).tobytes())
# alternatively, random.randbytes() in Python 3.9
data_a = data_b = secrets.token_bytes(num_img_bytes)
while data_a == data_b:
data_b = secrets.token_bytes(num_img_bytes)
img_a = Image.frombytes(mode, (2, 2), data_a)
img_b = Image.frombytes(mode, (2, 2), data_b)
assert img_a.tobytes() != img_b.tobytes()
assert img_a.im != img_b.im


@pytest.mark.parametrize(
("mode", "rawmode"),
(("RGB", "RGBX"), ("YCbCr", "YCbCrX"), ("HSV", None), ("LAB", None)),
Member:

It's more semantically correct to use a list here.

Suggested change
(("RGB", "RGBX"), ("YCbCr", "YCbCrX"), ("HSV", None), ("LAB", None)),
[("RGB", "RGBX"), ("YCbCr", "YCbCrX"), ("HSV", None), ("LAB", None)],

Contributor Author:

A previous decision was to use tuples. #6525 (comment)

Member:

I've provided a detailed answer there. Hope @radarhere will agree with me.

Member:

As an example you can look at the Django settings documentation. Settings are almost always immutable, but both lists and tuples are used for different things. Tuples are used for ADMINS items (first value is a name, second is an email), LANGUAGES items, SECURE_PROXY_SSL_HEADER (first value is a name, second is a value of header). While lists are used for all sorts of… well… lists.
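The distinction argued above — tuples as fixed-shape records, lists as open-ended collections — can be shown with a small sketch (hypothetical values, not from the PR):

```python
# A tuple reads as a fixed-shape record: position carries meaning.
admin = ("Jane Doe", "jane@example.com")  # (name, email)

# A list reads as a homogeneous collection: every item means the same
# kind of thing, and items can be added or removed freely.
allowed_hosts = ["example.com", "www.example.com"]

name, email = admin  # unpacking relies on the fixed shape
allowed_hosts.append("static.example.com")
```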

Member:

I'm not overly concerned about which is used, and I don't think performance considerations are exactly a priority for tests, so I'm happy to go with @homm's preference here.

)
def test_equal_three_channels_four_bytes(mode, rawmode):
if rawmode is None:
pytest.skip("no 4-byte rawmode for " + mode)
img_a = Image.frombytes(mode, (2, 2), b"ABC1DEF2GHI3JKL4", "raw", rawmode)
img_b = Image.frombytes(mode, (2, 2), b"ABC5DEF6GHI7JKL8", "raw", rawmode)
assert img_a.tobytes() == b"ABCDEFGHIJKL"
assert img_b.tobytes() == b"ABCDEFGHIJKL"
assert img_a.im == img_b.im


@pytest.mark.skip(reason="no way to directly set C bytes from Python")
@pytest.mark.parametrize("mode", ("LA", "La", "PA"))
Member:

Suggested change
@pytest.mark.parametrize("mode", ("LA", "La", "PA"))
@pytest.mark.parametrize("mode", ["LA", "La", "PA"])

def test_equal_two_channels_four_bytes(mode):
img_a = Image.frombytes("RGBA", (2, 2), b"1AB23CD45EF67GH8")
img_b = Image.frombytes("RGBA", (2, 2), b"1IJ23KL45MN67OP8")

# this only sets the mode in Python, not C
img_a.mode = mode
img_b.mode = mode
assert img_a.tobytes() == b"12345678"
assert img_b.tobytes() == b"12345678"

# this fails because the C code still thinks the mode is RGBA
assert img_a.im == img_b.im

12 changes: 8 additions & 4 deletions src/PIL/Image.py
@@ -597,14 +597,18 @@ def _dump(self, file=None, format=None, **options):
return filename

def __eq__(self, other):
return (
if self is other:
return True
if not (
self.__class__ is other.__class__
and self.mode == other.mode
and self.size == other.size
and self.info == other.info
and self.getpalette() == other.getpalette()
and self.tobytes() == other.tobytes()
)
):
return False
self.load()
other.load()
return self.im == other.im

def __repr__(self):
return "<%s.%s image mode=%s size=%dx%d at 0x%X>" % (
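The behaviour of the updated `__eq__` above — identity short-circuit, then metadata checks, then the C-level pixel comparison — can be exercised from Python:

```python
from PIL import Image

a = Image.new("RGB", (2, 2), (10, 20, 30))
b = Image.new("RGB", (2, 2), (10, 20, 30))
c = Image.new("RGB", (2, 2), (10, 20, 31))

print(a == a)  # True: "self is other" short-circuit
print(a == b)  # True: same mode, size, info, palette, and pixel data
print(a == c)  # False: pixel data differs
```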
134 changes: 133 additions & 1 deletion src/_imaging.c
@@ -3721,6 +3721,138 @@
(ssizessizeobjargproc)NULL, /*sq_ass_slice*/
};

/*
Returns 0 if all of the pixels are the same, otherwise 1.
Skips unused bytes based on the given mode.
*/
static int
_compare_pixels(
const char *mode, const int ysize, const int linesize,
const UINT8 **pixels_a, const UINT8 **pixels_b
) {
// Fortunately, all of the modes that have extra bytes in their pixels use four bytes for their pixels.
UINT32 mask = 0xffffffff;
if (
!strcmp(mode, "RGB") || !strcmp(mode, "YCbCr") ||
!strcmp(mode, "HSV") || !strcmp(mode, "LAB")
) {
// These modes have three channels in four bytes,
// so we have to ignore the last byte.
#ifdef WORDS_BIGENDIAN
mask = 0xffffff00;
#else
mask = 0x00ffffff;
#endif
} else if (!strcmp(mode, "LA") || !strcmp(mode, "La") || !strcmp(mode, "PA")) {
// These modes have two channels in four bytes,
// so we have to ignore the middle two bytes.
mask = 0xff0000ff;
Member:

You shouldn't need to ignore the middle two bytes, they should be equal to the first two. See

static void
rgb2la(UINT8 *out, const UINT8 *in, int xsize) {
int x;
for (x = 0; x < xsize; x++, in += 4, out += 4) {
/* ITU-R Recommendation 601-2 (assuming nonlinear RGB) */
out[0] = out[1] = out[2] = L24(in) >> 16;
out[3] = 255;
}
}
for example.

I initially wondered if you were adding this mask for speed, but looking at the next block, removing this mask would allow memcmp to be used, which I expect would be faster.

Contributor Author:

"they should be equal to the first two" You'd think so, but they're not: https://github.com/Yay295/Pillow/actions/runs/5471546420/jobs/9962793523#step:8:1757

This is why I added those last two tests. Except I couldn't find a way to set this up from Python so I marked them to skip.

Contributor Author:

I added some logging: https://github.com/Yay295/Pillow/actions/runs/5472920560/jobs/9965699214#step:8:1759

After im.reduce() the data is 0xffb8b8b8, but after im.crop().reduce() the data is 0xff0000b8.

Member:

they should be equal to the first two

They are set to the same values by the rgb2la conversion, but that doesn't mean they are guaranteed to always be equal.

Contributor Author:

I added a new method to ImagingCore to allow directly setting the internal bytes. All of the tests work now.
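The masked four-byte comparison discussed in this thread can be sketched in Python (an illustration of the technique, not Pillow's C code; assumes little-endian word layout, matching the masks in the diff): modes such as "RGB" pack three channels into four bytes, so the unused byte must be ignored, while a full mask degenerates to a plain byte comparison, as with `memcmp` in C.

```python
import struct

def pixels_differ(a: bytes, b: bytes, mask: int) -> bool:
    if mask == 0xFFFFFFFF:
        return a != b  # no unused bytes: whole-buffer comparison suffices
    # Interpret each 4-byte pixel as a little-endian 32-bit word and
    # compare only the bits selected by the mask.
    words_a = struct.unpack("<%dI" % (len(a) // 4), a)
    words_b = struct.unpack("<%dI" % (len(b) // 4), b)
    return any((wa & mask) != (wb & mask) for wa, wb in zip(words_a, words_b))

# Two RGB-style pixels whose padding (fourth) byte differs:
a = bytes([1, 2, 3, 0x11, 4, 5, 6, 0x22])
b = bytes([1, 2, 3, 0x99, 4, 5, 6, 0xAA])
print(pixels_differ(a, b, 0xFFFFFFFF))  # True: raw bytes differ
print(pixels_differ(a, b, 0x00FFFFFF))  # False: padding byte masked out
```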

}

if (mask == 0xffffffff) {
// If we aren't masking anything we can use memcmp.
int y;
for (y = 0; y < ysize; y++) {
if (memcmp(pixels_a[y], pixels_b[y], linesize)) {
return 1;
}
}
} else {
const int xsize = linesize / 4;
int y, x;
for (y = 0; y < ysize; y++) {
UINT32 *line_a = (UINT32*)pixels_a[y];
UINT32 *line_b = (UINT32*)pixels_b[y];
for (x = 0; x < xsize; x++, line_a++, line_b++) {
if ((*line_a & mask) != (*line_b & mask)) {
return 1;
}
}
}
}
return 0;
}

static PyObject *
image_richcompare(const ImagingObject *self, const PyObject *other, const int op) {
if (op != Py_EQ && op != Py_NE) {
Py_RETURN_NOTIMPLEMENTED;

}

// If the other object is not an ImagingObject.
if (!PyImaging_Check(other)) {
if (op == Py_EQ) {
Py_RETURN_FALSE;

} else {
Py_RETURN_TRUE;

}
}

const Imaging img_a = self->image;
const Imaging img_b = ((ImagingObject*)other)->image;

if (
strcmp(img_a->mode, img_b->mode)
|| img_a->xsize != img_b->xsize
|| img_a->ysize != img_b->ysize
) {
if (op == Py_EQ) {
Py_RETURN_FALSE;

} else {
Py_RETURN_TRUE;

}
}

const ImagingPalette palette_a = img_a->palette;
const ImagingPalette palette_b = img_b->palette;
if (palette_a || palette_b) {
const UINT8 *palette_a_data = palette_a->palette;
const UINT8 *palette_b_data = palette_b->palette;
const UINT8 **palette_a_data_ptr = &palette_a_data;
const UINT8 **palette_b_data_ptr = &palette_b_data;
if (
!palette_a || !palette_b
|| palette_a->size != palette_b->size
|| strcmp(palette_a->mode, palette_b->mode)
|| _compare_pixels(
palette_a->mode,
1,
palette_a->size * 4,
Member:

Why is this multiplied by 4?

Contributor Author:

The third parameter is the line size, but palettes don't have a line size, so it has to be calculated from the number of pixels and the pixel size. Palettes also don't have a stored pixel size, but currently it's always 4. It's not good to hardcode it like this, but there isn't anywhere to actually get it from.

palette_a_data_ptr,
palette_b_data_ptr
)
) {
if (op == Py_EQ) {
Py_RETURN_FALSE;

} else {
Py_RETURN_TRUE;

}
}
}

if (
_compare_pixels(
img_a->mode,
img_a->ysize,
img_a->linesize,
(const UINT8 **)img_a->image,
(const UINT8 **)img_b->image
)
) {
if (op == Py_EQ) {
Py_RETURN_FALSE;
} else {
Py_RETURN_TRUE;
}
} else {
if (op == Py_EQ) {
Py_RETURN_TRUE;
} else {
Py_RETURN_FALSE;

}
}
}

/* type description */

static PyTypeObject Imaging_Type = {
@@ -3747,7 +3879,7 @@
0, /*tp_doc*/
0, /*tp_traverse*/
0, /*tp_clear*/
0, /*tp_richcompare*/
(richcmpfunc)image_richcompare, /*tp_richcompare*/
0, /*tp_weaklistoffset*/
0, /*tp_iter*/
0, /*tp_iternext*/