Loading image from bytes #4097

Closed
kdschlosser opened this issue Sep 28, 2019 · 15 comments · Fixed by #4302

Comments

@kdschlosser

What did you do?

Convert string or bytes to PIL.Image

What did you expect to happen?

Have the PIL.Image instance returned.

What actually happened?

Got a Traceback.

What are your OS, Python and Pillow versions?

  • OS: Windows 7 x64
  • Python: 2.7 and also 3.7
  • Pillow: 6.1.0

OK, so this is the skinny. When running Python 2.7, everything works as expected.
When I run the same code using Python 3.7, I get the following traceback:

 File "C:\Program Files\Python37\lib\site-packages\PIL\Image.py", line 2822, in open
    raise IOError("cannot identify image file %r" % (filename if filename else fp))
OSError: cannot identify image file <_io.BytesIO object at 0x0000000003465F68>

Here is the test code to use.

import sys
from PIL import Image
from io import BytesIO

# PNG data
LEFT_THUMB = (
    '\x89\x50\x4E\x47\x0D\x0A\x1A\x0A\x00\x00\x00\x0D\x49\x48\x44\x52\x00\x00'
    '\x00\x13\x00\x00\x00\x0B\x08\x06\x00\x00\x00\x9D\xD5\xB6\x3A\x00\x00\x01'
    '\x2E\x49\x44\x41\x54\x78\x9C\x95\xD2\x31\x6B\xC2\x40\x00\x05\xE0\x77\x10'
    '\x42\x09\x34\xD0\x29\x21\x82\xC9\x9C\x2E\x72\x4B\x87\x40\x50\xB9\xBF\x5B'
    '\x28\x35\xA1\xA4\x94\x76\x68\x1C\x1C\x74\xCD\x9A\xE8\x20\x0A\x12\xA5\x5A'
    '\xE4\x72\xC9\x75\x10\x6D\xDC\xCE\xF7\x03\x3E\xDE\x83\x47\xA4\x94\x68\x67'
    '\xB5\xD9\x4E\xBF\xBF\x3E\xE8\x78\x3C\x86\x6A\x3C\xCF\x43\x10\x04\x20\x6D'
    '\x6C\xB5\xD9\x4E\x93\xF8\x95\x5A\x96\x05\xC6\x98\x32\x56\x14\x05\x46\xA3'
    '\x11\xB4\x36\x14\xBD\x3C\xD3\x4E\xA7\x03\xC6\x18\x8E\xC7\x23\x9A\xA6\x51'
    '\xC2\x5C\xD7\x45\x9E\xE7\x27\xEC\x0C\x39\x8E\x03\xC6\x18\x0E\x87\x83\x32'
    '\x04\x00\xE7\x75\x1A\xE7\x7C\xF2\xF9\xFE\x46\x6D\xDB\x06\x63\x0C\xFB\xFD'
    '\x1E\x75\x5D\x2B\x43\x57\x58\xF9\xF3\xAB\xAD\xD7\x6B\x98\xA6\x09\x21\x04'
    '\x76\xBB\x1D\x84\x10\x37\x61\x86\x61\x9C\x30\x00\x70\x1C\x07\x49\x92\x80'
    '\x10\x82\x7E\xBF\x8F\xE5\x72\x79\x13\x78\x69\xF6\x70\x6F\x88\x5E\xAF\x37'
    '\x2B\xCB\x92\xC6\x71\x0C\x42\x08\xC2\x30\xC4\x7C\x3E\x57\x06\x2F\x98\xAE'
    '\xEB\x4F\xAE\xEB\x4E\x06\x83\xC1\x4C\x4A\x49\xA3\x28\x82\x94\x12\x61\x18'
    '\x2A\x37\x5B\x2C\x16\xE8\x76\xBB\xFF\x3F\xE3\x9C\x4F\x8A\xA2\xD0\xD2\x34'
    '\xA5\x59\x96\xA1\xAA\x2A\x65\xCC\xB2\x2C\x0C\x87\xC3\xEB\xD3\x9E\xC1\xAA'
    '\xAA\xEE\x38\xE7\x4A\x90\xAE\xEB\x00\x00\xDF\xF7\x1F\xFF\x00\x09\x7C\xA7'
    '\x93\xB1\xFB\xFA\x11\x00\x00\x00\x00\x49\x45\x4E\x44\xAE\x42\x60\x82'
)

PY3 = sys.version_info[0] > 2

if PY3:
    stream = BytesIO(LEFT_THUMB.encode())
else:
    stream = BytesIO(LEFT_THUMB)

image = Image.open(stream).convert("RGBA")
stream.close()
image.show()

Now this is where it goes a little bit sideways. When running Python 2.7, if I use cStringIO.StringIO everything works as expected, but if I use io.StringIO I get the traceback listed above. I think it has something to do with the first 16 bytes, but I am not 100% sure.

Any help is appreciated.

@radarhere
Member

radarhere commented Sep 28, 2019

How's this? I find that changing the string declaration to use the 'b' prefix (for bytes) makes it work in both Python 2 and 3.

import sys
from PIL import Image
from io import BytesIO

# PNG data
LEFT_THUMB = (
    b'\x89\x50\x4E\x47\x0D\x0A\x1A\x0A\x00\x00\x00\x0D\x49\x48\x44\x52\x00\x00'
    b'\x00\x13\x00\x00\x00\x0B\x08\x06\x00\x00\x00\x9D\xD5\xB6\x3A\x00\x00\x01'
    b'\x2E\x49\x44\x41\x54\x78\x9C\x95\xD2\x31\x6B\xC2\x40\x00\x05\xE0\x77\x10'
    b'\x42\x09\x34\xD0\x29\x21\x82\xC9\x9C\x2E\x72\x4B\x87\x40\x50\xB9\xBF\x5B'
    b'\x28\x35\xA1\xA4\x94\x76\x68\x1C\x1C\x74\xCD\x9A\xE8\x20\x0A\x12\xA5\x5A'
    b'\xE4\x72\xC9\x75\x10\x6D\xDC\xCE\xF7\x03\x3E\xDE\x83\x47\xA4\x94\x68\x67'
    b'\xB5\xD9\x4E\xBF\xBF\x3E\xE8\x78\x3C\x86\x6A\x3C\xCF\x43\x10\x04\x20\x6D'
    b'\x6C\xB5\xD9\x4E\x93\xF8\x95\x5A\x96\x05\xC6\x98\x32\x56\x14\x05\x46\xA3'
    b'\x11\xB4\x36\x14\xBD\x3C\xD3\x4E\xA7\x03\xC6\x18\x8E\xC7\x23\x9A\xA6\x51'
    b'\xC2\x5C\xD7\x45\x9E\xE7\x27\xEC\x0C\x39\x8E\x03\xC6\x18\x0E\x87\x83\x32'
    b'\x04\x00\xE7\x75\x1A\xE7\x7C\xF2\xF9\xFE\x46\x6D\xDB\x06\x63\x0C\xFB\xFD'
    b'\x1E\x75\x5D\x2B\x43\x57\x58\xF9\xF3\xAB\xAD\xD7\x6B\x98\xA6\x09\x21\x04'
    b'\x76\xBB\x1D\x84\x10\x37\x61\x86\x61\x9C\x30\x00\x70\x1C\x07\x49\x92\x80'
    b'\x10\x82\x7E\xBF\x8F\xE5\x72\x79\x13\x78\x69\xF6\x70\x6F\x88\x5E\xAF\x37'
    b'\x2B\xCB\x92\xC6\x71\x0C\x42\x08\xC2\x30\xC4\x7C\x3E\x57\x06\x2F\x98\xAE'
    b'\xEB\x4F\xAE\xEB\x4E\x06\x83\xC1\x4C\x4A\x49\xA3\x28\x82\x94\x12\x61\x18'
    b'\x2A\x37\x5B\x2C\x16\xE8\x76\xBB\xFF\x3F\xE3\x9C\x4F\x8A\xA2\xD0\xD2\x34'
    b'\xA5\x59\x96\xA1\xAA\x2A\x65\xCC\xB2\x2C\x0C\x87\xC3\xEB\xD3\x9E\xC1\xAA'
    b'\xAA\xEE\x38\xE7\x4A\x90\xAE\xEB\x00\x00\xDF\xF7\x1F\xFF\x00\x09\x7C\xA7'
    b'\x93\xB1\xFB\xFA\x11\x00\x00\x00\x00\x49\x45\x4E\x44\xAE\x42\x60\x82'
)

stream = BytesIO(LEFT_THUMB)

image = Image.open(stream).convert("RGBA")
stream.close()
image.show()

@radarhere
Member

Also be aware that Pillow is part of https://python3statement.org/, and that after this year, Pillow will no longer support Python 2.

@kdschlosser
Author

OK, so a question then: why does .encode() not work properly? Isn't this mechanism supposed to convert the string to bytes?

@radarhere
Member

radarhere commented Sep 28, 2019

It does convert it to bytes, but that is not the end of the story.

>>> x = '\x89'
>>> x.encode()
b'\xc2\x89'
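To make the difference concrete: in Python 3, str.encode() defaults to UTF-8, so every code point from U+0080 to U+00FF becomes two bytes, and the first byte of the PNG signature is no longer 0x89. Encoding with Latin-1 instead maps each code point 0 to 255 onto the single byte with the same value. A minimal sketch, assuming the string only contains code points below 256 (which holds for data written entirely as \xNN escapes):

```python
# The 8-byte PNG signature, written as a text string the way LEFT_THUMB is.
PNG_SIGNATURE_TEXT = '\x89PNG\r\n\x1a\n'

# Default encoding is UTF-8: U+0089 is encoded as the two bytes 0xC2 0x89.
utf8_bytes = PNG_SIGNATURE_TEXT.encode()

# Latin-1 maps code points 0-255 one-to-one onto byte values 0-255.
latin1_bytes = PNG_SIGNATURE_TEXT.encode('latin-1')

print(utf8_bytes)    # b'\xc2\x89PNG\r\n\x1a\n'
print(latin1_bytes)  # b'\x89PNG\r\n\x1a\n'
print(latin1_bytes == b'\x89PNG\r\n\x1a\n')  # True
```

Applied to the data in this issue, BytesIO(LEFT_THUMB.encode('latin-1')) should hand Pillow the original PNG bytes. If the string ever contains a character above U+00FF, encode('latin-1') raises UnicodeEncodeError instead of silently corrupting the data.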

I found an explanation at https://stackoverflow.com/questions/48367128/string-to-bytes-python-without-change-in-encoding

@radarhere
Member

Closing, as it seems that the Pillow question has been answered.

@kdschlosser
Author

kdschlosser commented Oct 2, 2019

It has been explained, but the reason it behaves this way is not good. Is there another way to create an image directly from a string, without needing a conversion or passing a BytesIO instance? There are use cases where image data is not hard coded and may arrive as a string, and since there is no reliable way to convert that data from a string into bytes, this could pose a problem. I know there is the frombytes function in the Image module, and I tried to use it without success. I am not sure exactly what is supposed to go into the decoder_name parameter: the docstring description, ":param decoder_name: What decoder to use.", is a little lacking and could be improved, and the function fails if I leave it as raw. The other problem is that you need to know the size of the image ahead of time to use that function, and that is not always available.
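On the frombytes question: Image.frombytes(mode, size, data, decoder_name='raw') decodes a buffer of raw pixel values, not an encoded file such as a PNG. That would explain why it needs the size up front (raw pixels carry no header) and why the 'raw' decoder fails on PNG data. A minimal sketch, using a made-up 2x1 RGBA buffer whose pixel values are purely illustrative:

```python
from io import BytesIO

from PIL import Image

# Two RGBA pixels, 4 bytes each: opaque red, then opaque green.
raw_pixels = b'\xff\x00\x00\xff' b'\x00\xff\x00\xff'

# frombytes needs the mode and size because raw pixel data has no header.
raw_image = Image.frombytes('RGBA', (2, 1), raw_pixels, 'raw')
print(raw_image.getpixel((0, 0)))  # (255, 0, 0, 255)

# An encoded file (PNG, JPEG, ...) stores its own size and mode in a header,
# so it goes through Image.open with a binary file-like object instead.
buffer = BytesIO()
raw_image.save(buffer, format='PNG')
png_bytes = buffer.getvalue()

decoded = Image.open(BytesIO(png_bytes)).convert('RGBA')
print(decoded.size)  # (2, 1)
```

In other words, frombytes is the tool when you already have decoded pixels, while Image.open plus BytesIO is the tool for file-format data like LEFT_THUMB.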

@radarhere radarhere reopened this Oct 2, 2019
@radarhere
Member

Okay. To take a step in the right direction, I found https://stackoverflow.com/questions/51754731/python-convert-strings-of-bytes-to-byte-array to work with your original data format.

import sys, struct
from PIL import Image
from io import BytesIO

# PNG data
LEFT_THUMB = (
    '\x89\x50\x4E\x47\x0D\x0A\x1A\x0A\x00\x00\x00\x0D\x49\x48\x44\x52\x00\x00'
    '\x00\x13\x00\x00\x00\x0B\x08\x06\x00\x00\x00\x9D\xD5\xB6\x3A\x00\x00\x01'
    '\x2E\x49\x44\x41\x54\x78\x9C\x95\xD2\x31\x6B\xC2\x40\x00\x05\xE0\x77\x10'
    '\x42\x09\x34\xD0\x29\x21\x82\xC9\x9C\x2E\x72\x4B\x87\x40\x50\xB9\xBF\x5B'
    '\x28\x35\xA1\xA4\x94\x76\x68\x1C\x1C\x74\xCD\x9A\xE8\x20\x0A\x12\xA5\x5A'
    '\xE4\x72\xC9\x75\x10\x6D\xDC\xCE\xF7\x03\x3E\xDE\x83\x47\xA4\x94\x68\x67'
    '\xB5\xD9\x4E\xBF\xBF\x3E\xE8\x78\x3C\x86\x6A\x3C\xCF\x43\x10\x04\x20\x6D'
    '\x6C\xB5\xD9\x4E\x93\xF8\x95\x5A\x96\x05\xC6\x98\x32\x56\x14\x05\x46\xA3'
    '\x11\xB4\x36\x14\xBD\x3C\xD3\x4E\xA7\x03\xC6\x18\x8E\xC7\x23\x9A\xA6\x51'
    '\xC2\x5C\xD7\x45\x9E\xE7\x27\xEC\x0C\x39\x8E\x03\xC6\x18\x0E\x87\x83\x32'
    '\x04\x00\xE7\x75\x1A\xE7\x7C\xF2\xF9\xFE\x46\x6D\xDB\x06\x63\x0C\xFB\xFD'
    '\x1E\x75\x5D\x2B\x43\x57\x58\xF9\xF3\xAB\xAD\xD7\x6B\x98\xA6\x09\x21\x04'
    '\x76\xBB\x1D\x84\x10\x37\x61\x86\x61\x9C\x30\x00\x70\x1C\x07\x49\x92\x80'
    '\x10\x82\x7E\xBF\x8F\xE5\x72\x79\x13\x78\x69\xF6\x70\x6F\x88\x5E\xAF\x37'
    '\x2B\xCB\x92\xC6\x71\x0C\x42\x08\xC2\x30\xC4\x7C\x3E\x57\x06\x2F\x98\xAE'
    '\xEB\x4F\xAE\xEB\x4E\x06\x83\xC1\x4C\x4A\x49\xA3\x28\x82\x94\x12\x61\x18'
    '\x2A\x37\x5B\x2C\x16\xE8\x76\xBB\xFF\x3F\xE3\x9C\x4F\x8A\xA2\xD0\xD2\x34'
    '\xA5\x59\x96\xA1\xAA\x2A\x65\xCC\xB2\x2C\x0C\x87\xC3\xEB\xD3\x9E\xC1\xAA'
    '\xAA\xEE\x38\xE7\x4A\x90\xAE\xEB\x00\x00\xDF\xF7\x1F\xFF\x00\x09\x7C\xA7'
    '\x93\xB1\xFB\xFA\x11\x00\x00\x00\x00\x49\x45\x4E\x44\xAE\x42\x60\x82'
)

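def convert_string_to_bytes(string):
    # Pack each character's code point (must be < 256) as one unsigned byte.
    # The accumulator is named "result" to avoid shadowing the builtin bytes.
    result = b''
    for char in string:
        result += struct.pack("B", ord(char))
    return result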

stream = BytesIO(convert_string_to_bytes(LEFT_THUMB))

image = Image.open(stream).convert("RGBA")
stream.close()
image.save('out.png')

What is causing you to want to avoid using BytesIO?

@radarhere radarhere changed the title from "Cannot identify Image" to "Loading image from bytes" on Oct 3, 2019
@radarhere
Member

I find the fact that you have byte data expressed in this way unusual. If you could explain in detail how you have come by this, that could be interesting.

I believe I have mentioned a way to accomplish your goal using Python code. As far as integrating that solution into Pillow, because I find your situation unusual, I'm less keen on that. If this situation proves common, then maybe so after all.

@wiredfool
Member

I’m with @radarhere on this. This isn’t going to generally work on python 3, as one of the fundamental changes was to clarify the difference between bytes and strings. Trying to do this is working against the language.

I’d rate this as close, wontfix for Pillow, with a side of reconsidering the requirements and objectives for your project.

@kdschlosser
Author

All I am saying is that in my use case I have control over the data, because it is hard coded into the file. But if someone reads data from some kind of external source, it could happen then as well. Python 3 has been around for a long while; if this were a major issue, I am sure a lot more people would have opened issues about it. I would be shocked if I were the first to bring this up.

I do know if I run this code on Python 2 it works without an issue.

import sys
from PIL import Image
from StringIO import StringIO

# PNG data
LEFT_THUMB = (
    '\x89\x50\x4E\x47\x0D\x0A\x1A\x0A\x00\x00\x00\x0D\x49\x48\x44\x52\x00\x00'
    '\x00\x13\x00\x00\x00\x0B\x08\x06\x00\x00\x00\x9D\xD5\xB6\x3A\x00\x00\x01'
    '\x2E\x49\x44\x41\x54\x78\x9C\x95\xD2\x31\x6B\xC2\x40\x00\x05\xE0\x77\x10'
    '\x42\x09\x34\xD0\x29\x21\x82\xC9\x9C\x2E\x72\x4B\x87\x40\x50\xB9\xBF\x5B'
    '\x28\x35\xA1\xA4\x94\x76\x68\x1C\x1C\x74\xCD\x9A\xE8\x20\x0A\x12\xA5\x5A'
    '\xE4\x72\xC9\x75\x10\x6D\xDC\xCE\xF7\x03\x3E\xDE\x83\x47\xA4\x94\x68\x67'
    '\xB5\xD9\x4E\xBF\xBF\x3E\xE8\x78\x3C\x86\x6A\x3C\xCF\x43\x10\x04\x20\x6D'
    '\x6C\xB5\xD9\x4E\x93\xF8\x95\x5A\x96\x05\xC6\x98\x32\x56\x14\x05\x46\xA3'
    '\x11\xB4\x36\x14\xBD\x3C\xD3\x4E\xA7\x03\xC6\x18\x8E\xC7\x23\x9A\xA6\x51'
    '\xC2\x5C\xD7\x45\x9E\xE7\x27\xEC\x0C\x39\x8E\x03\xC6\x18\x0E\x87\x83\x32'
    '\x04\x00\xE7\x75\x1A\xE7\x7C\xF2\xF9\xFE\x46\x6D\xDB\x06\x63\x0C\xFB\xFD'
    '\x1E\x75\x5D\x2B\x43\x57\x58\xF9\xF3\xAB\xAD\xD7\x6B\x98\xA6\x09\x21\x04'
    '\x76\xBB\x1D\x84\x10\x37\x61\x86\x61\x9C\x30\x00\x70\x1C\x07\x49\x92\x80'
    '\x10\x82\x7E\xBF\x8F\xE5\x72\x79\x13\x78\x69\xF6\x70\x6F\x88\x5E\xAF\x37'
    '\x2B\xCB\x92\xC6\x71\x0C\x42\x08\xC2\x30\xC4\x7C\x3E\x57\x06\x2F\x98\xAE'
    '\xEB\x4F\xAE\xEB\x4E\x06\x83\xC1\x4C\x4A\x49\xA3\x28\x82\x94\x12\x61\x18'
    '\x2A\x37\x5B\x2C\x16\xE8\x76\xBB\xFF\x3F\xE3\x9C\x4F\x8A\xA2\xD0\xD2\x34'
    '\xA5\x59\x96\xA1\xAA\x2A\x65\xCC\xB2\x2C\x0C\x87\xC3\xEB\xD3\x9E\xC1\xAA'
    '\xAA\xEE\x38\xE7\x4A\x90\xAE\xEB\x00\x00\xDF\xF7\x1F\xFF\x00\x09\x7C\xA7'
    '\x93\xB1\xFB\xFA\x11\x00\x00\x00\x00\x49\x45\x4E\x44\xAE\x42\x60\x82'
)

stream = StringIO(LEFT_THUMB)

image = Image.open(stream).convert("RGBA")
stream.close()

But suppose I run this code in Python 3, changing only "from StringIO import StringIO" to "from io import StringIO" (because there is no StringIO module in Python 3):

import sys
from PIL import Image
from io import StringIO

# PNG data
LEFT_THUMB = (
    '\x89\x50\x4E\x47\x0D\x0A\x1A\x0A\x00\x00\x00\x0D\x49\x48\x44\x52\x00\x00'
    '\x00\x13\x00\x00\x00\x0B\x08\x06\x00\x00\x00\x9D\xD5\xB6\x3A\x00\x00\x01'
    '\x2E\x49\x44\x41\x54\x78\x9C\x95\xD2\x31\x6B\xC2\x40\x00\x05\xE0\x77\x10'
    '\x42\x09\x34\xD0\x29\x21\x82\xC9\x9C\x2E\x72\x4B\x87\x40\x50\xB9\xBF\x5B'
    '\x28\x35\xA1\xA4\x94\x76\x68\x1C\x1C\x74\xCD\x9A\xE8\x20\x0A\x12\xA5\x5A'
    '\xE4\x72\xC9\x75\x10\x6D\xDC\xCE\xF7\x03\x3E\xDE\x83\x47\xA4\x94\x68\x67'
    '\xB5\xD9\x4E\xBF\xBF\x3E\xE8\x78\x3C\x86\x6A\x3C\xCF\x43\x10\x04\x20\x6D'
    '\x6C\xB5\xD9\x4E\x93\xF8\x95\x5A\x96\x05\xC6\x98\x32\x56\x14\x05\x46\xA3'
    '\x11\xB4\x36\x14\xBD\x3C\xD3\x4E\xA7\x03\xC6\x18\x8E\xC7\x23\x9A\xA6\x51'
    '\xC2\x5C\xD7\x45\x9E\xE7\x27\xEC\x0C\x39\x8E\x03\xC6\x18\x0E\x87\x83\x32'
    '\x04\x00\xE7\x75\x1A\xE7\x7C\xF2\xF9\xFE\x46\x6D\xDB\x06\x63\x0C\xFB\xFD'
    '\x1E\x75\x5D\x2B\x43\x57\x58\xF9\xF3\xAB\xAD\xD7\x6B\x98\xA6\x09\x21\x04'
    '\x76\xBB\x1D\x84\x10\x37\x61\x86\x61\x9C\x30\x00\x70\x1C\x07\x49\x92\x80'
    '\x10\x82\x7E\xBF\x8F\xE5\x72\x79\x13\x78\x69\xF6\x70\x6F\x88\x5E\xAF\x37'
    '\x2B\xCB\x92\xC6\x71\x0C\x42\x08\xC2\x30\xC4\x7C\x3E\x57\x06\x2F\x98\xAE'
    '\xEB\x4F\xAE\xEB\x4E\x06\x83\xC1\x4C\x4A\x49\xA3\x28\x82\x94\x12\x61\x18'
    '\x2A\x37\x5B\x2C\x16\xE8\x76\xBB\xFF\x3F\xE3\x9C\x4F\x8A\xA2\xD0\xD2\x34'
    '\xA5\x59\x96\xA1\xAA\x2A\x65\xCC\xB2\x2C\x0C\x87\xC3\xEB\xD3\x9E\xC1\xAA'
    '\xAA\xEE\x38\xE7\x4A\x90\xAE\xEB\x00\x00\xDF\xF7\x1F\xFF\x00\x09\x7C\xA7'
    '\x93\xB1\xFB\xFA\x11\x00\x00\x00\x00\x49\x45\x4E\x44\xAE\x42\x60\x82'
)

stream = StringIO(LEFT_THUMB)

image = Image.open(stream).convert("RGBA")
stream.close()

I get this traceback.

Traceback (most recent call last):
  File "\scratches\scratch_29.py", line 31, in <module>
    image = Image.open(stream).convert("RGBA")
  File "\Python37\lib\site-packages\PIL\Image.py", line 2822, in open
    raise IOError("cannot identify image file %r" % (filename if filename else fp))
OSError: cannot identify image file <_io.StringIO object at 0x0000000001DE2F78>

That indicates there is an issue in Pillow: the error is generated from within Pillow. I am passing a string to a StringIO just as it should be done. I should not have to worry about monkey farting around with encodings and using struct to convert the thing; that would mean running two different code blocks to load it, one for Python 2 and the other for Python 3. I do not know why PIL is unable to identify the stream.
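For what it's worth, a single code path can cover both interpreters if the data is normalised to bytes before it reaches a stream. A minimal sketch; the helper name to_image_bytes is hypothetical, and it assumes text input only contains code points below 256 (as \xNN escapes do):

```python
from io import BytesIO


def to_image_bytes(data):
    """Return binary image data as bytes, whether given bytes or text.

    Hypothetical helper. On Python 2 a plain str literal is already bytes and
    passes straight through; on Python 3 a str is text, so it is encoded with
    Latin-1, which maps each code point 0-255 to the byte with the same value.
    """
    if isinstance(data, bytes):
        return data
    return data.encode('latin-1')


# The same call works for a text literal and a bytes literal.
stream = BytesIO(to_image_bytes('\x89PNG\r\n\x1a\n'))
print(stream.read(8) == b'\x89PNG\r\n\x1a\n')  # True
```

With a helper like this, Image.open(BytesIO(to_image_bytes(LEFT_THUMB))) should behave the same on Python 2 and 3, with no version check.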

@wiredfool
Member

No, it's not an issue in Pillow. Pillow has never worked with io.StringIO; it works with io.BytesIO. The io module is a backport from Python 3 into Python 2.7 to ease porting from 2 to 3. Part of the pain of a Python 3 port is disentangling where a string is really a string and where it is really bytes.
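The distinction shows up even without Pillow. On Python 3, io.StringIO is a text stream whose read() returns str, io.BytesIO returns bytes, and an image format is recognised by comparing leading bytes. The check below only mimics that idea using the PNG signature; the function looks_like_png is hypothetical, and Pillow's real format detection is done by its plugins and is more involved:

```python
import io

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def looks_like_png(fp):
    """Simplified, hypothetical signature check, for illustration only."""
    return fp.read(8) == PNG_SIGNATURE


text_stream = io.StringIO('\x89PNG\r\n\x1a\n')
binary_stream = io.BytesIO(b'\x89PNG\r\n\x1a\n')

print(type(text_stream.read(8)))    # <class 'str'>
print(type(binary_stream.read(8)))  # <class 'bytes'>

text_stream.seek(0)
binary_stream.seek(0)

# A str never compares equal to bytes in Python 3, so the text stream fails.
print(looks_like_png(text_stream))    # False
print(looks_like_png(binary_stream))  # True
```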

Other people have had this issue before, and when it's pointed out that they should use BytesIO, they use it and everything is fine (#1463, #495).

Binary data in strings isn't supported in Python 3. Attempting to force the encode/decode machinery to do that is against the intent of the language, and we're not going to do that in Pillow.

@kdschlosser
Author

Ah, OK. So if a StringIO object is not supported, why is there no error saying so when one is passed? Instead it works, but only on Python 2, and on Python 3 you get either a TypeError or an OSError that leaves the user scratching their head. This issue has now come up on at least 3 occasions. At what point does someone add a check for a StringIO instance that raises a clear error, so that issues do not have to be opened for something so simple?

@radarhere
Copy link
Member

I've created PR #4285 to resolve this. Rather than rejecting StringIO when it is passed into Image.open, though, I'm suggesting that we allow a StringIO to specify a filename.

If you are concerned about clarity: with that change, if you were to try passing bytes in with StringIO, you would then receive FileNotFoundError: [Errno 2] No such file or directory

@radarhere
Member

Okay, #4302 is a PR to raise the error.
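As an illustration of that kind of check (not the code from #4302 itself), a caller could reject text streams before handing them to Image.open. A minimal sketch; the function name require_binary_stream is hypothetical, and it treats any io.TextIOBase (which includes io.StringIO and files opened in text mode) as an error:

```python
import io


def require_binary_stream(fp):
    """Raise ValueError if fp is a text stream; otherwise return it unchanged.

    Hypothetical helper. io.TextIOBase covers io.StringIO and files opened in
    text mode, whose read() returns str rather than bytes.
    """
    if isinstance(fp, io.TextIOBase):
        raise ValueError(
            "%s is a text stream; image data must be read from a binary "
            "stream such as io.BytesIO" % type(fp).__name__
        )
    return fp


# Intended use: Image.open(require_binary_stream(stream))
require_binary_stream(io.BytesIO(b'\x89PNG\r\n\x1a\n'))  # passes through

try:
    require_binary_stream(io.StringIO('\x89PNG\r\n\x1a\n'))
except ValueError as exc:
    print(exc)
```

Failing early with a message that names the stream type addresses the confusion raised above, where a StringIO otherwise surfaces as a generic "cannot identify image file" error.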

@CXX22

This comment has been minimized.

@python-pillow python-pillow locked as resolved and limited conversation to collaborators Sep 14, 2021
4 participants