-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updated JPEG magic number #4707
Conversation
… function by more correct version.
Tests/test_file_jpeg.py
Outdated
@@ -706,6 +706,26 @@ def test_icc_after_SOF(self): | |||
with Image.open("Tests/images/icc-after-SOF.jpg") as im: | |||
assert im.info["icc_profile"] == b"profile" | |||
|
|||
def test_reading_not_whole_file_for_define_it_type(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'for define it type'... sorry, this confuses me. What is this name intended to communicate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... English is not my native language. May be more correct to name this test as "test_not_read_entire_file_to_determine_its_type".
I've created Cykooz#1 with some suggestions. |
Could you give a brief description of the 'more complicated problem' you allude to? |
My initial problem is that Pillow, in some cases, reads an entire file just to determine its type. Pillow/src/PIL/JpegImagePlugin.py Lines 357 to 386 in db7186b
This loop may read an entire file if content of file is something like this (with old version of "magic number"):
Correct "magic number" doesn't fix this problem completely. But it reduces the likelihood of encountering a problem. I am not an expert in JPEG-format and I don't know how many bytes need to read from a file to decide it's not a JPEG file. |
I wouldn't phrase it exactly like this. It's not in order to determine its type. Pillow is attempting to open the file, hits an error and then gives up on the idea that it is a valid JPEG. So a large amount of the file has to be read to determine its validity. |
In my app, reading a file is a very expensive operation. Reading 3-4 Mb data from a file may take several seconds. This may be used in DoS attacks. |
We guard against images with large dimensions. We don't presently guard against images with large file sizes. Setting aside the matter of 0xFF bytes for one second, part of the problem is that Pillow reads data from before the SOS marker on the initial open. Without changing that, the only way to solve your dilemma would be if valid JPEGs only allowed a limited number of markers before SOS. Considering that
I don't see a way forward there. |
Updated _open check to match _accept
Current version of function
JpegImagePlugin._accept()
uses a very primitive "magic number" to check that file may be a JPEG-file:In some cases this leads to reading an entire file to decide it's not a JPEG-file. (Actually, it's a more complicated problem. Maybe I'll do another pull-request.)
Other libraries for decode JPEG-files uses more complex value of "magic number" -
b'\xFF\xD8\xFF'
. This value is also given on Wikipedia (https://en.wikipedia.org/wiki/JPEG).