no way to set msgpack max_bin_len limits use of cache to small files #200
Comments
cdent added a commit to cdent/etcd-compute that referenced this issue Jan 21, 2019
The cachecontrol library can't cache large images [1], which we work around here by using our own serializer to allow msgpack to load big files. It's quite likely this caching is the wrong way to go and is missing some important details, but it is a useful way to speed up the experimentation. [1] psf/cachecontrol#200
It appears that this got fixed somehow.
bob $ pip3 freeze | egrep -i 'requests|msgpack|cache'
CacheControl==0.12.6
msgpack==1.0.2
requests==2.25.1
alice $ (
printf 'HTTP/1.0 200 OK\n'
printf 'Date: '; LC_ALL=C date -u '+%a, %d %b %Y %X %Z'
printf 'Content-Length: 500000000\n'
printf 'Cache-Control: max-age=6000\n\n'
yes | dd iflag=count_bytes count=500MB
) | nc -l 8000
bob $ python3 -c '
import requests
import cachecontrol.caches
s = requests.session()
c = cachecontrol.caches.FileCache("./cache")
a = cachecontrol.CacheControlAdapter(c)
s.mount("http://", a)
print(len(s.get("http://localhost:8000/foo.txt").content))
'
500000000
bob $ python3 -c '
import requests
import cachecontrol.caches
s = requests.session()
c = cachecontrol.caches.FileCache("./cache")
a = cachecontrol.CacheControlAdapter(c)
s.mount("http://", a)
print(len(s.get("http://localhost:8000/foo.txt").content))
'
500000000
The second request is definitely served from cache because nc stops listening after the first client disconnects. 500000000 is enough to exceed the default max_bin_len:
bob $ MSGPACK_PUREPYTHON=1 python -c '
import msgpack, sys
with open(sys.argv[1], "rb") as f:
    f.read(5)
    u = msgpack.Unpacker(f)
    u.unpack()
' ./cache/5/c/a/8/b/5ca8b7d8184924c60c5c454a874bf5ed7b4741d0660cb7d295185d63
Traceback (most recent call last):
  File "<string>", line 6, in <module>
  File "/path/to/python3.9/site-packages/msgpack/fallback.py", line 723, in unpack
    ret = self._unpack(EX_CONSTRUCT)
  File "/path/to/python3.9/site-packages/msgpack/fallback.py", line 671, in _unpack
    ret[key] = self._unpack(EX_CONSTRUCT)
  File "/path/to/python3.9/site-packages/msgpack/fallback.py", line 671, in _unpack
    ret[key] = self._unpack(EX_CONSTRUCT)
  File "/path/to/python3.9/site-packages/msgpack/fallback.py", line 625, in _unpack
    typ, n, obj = self._read_header(execute)
  File "/path/to/python3.9/site-packages/msgpack/fallback.py", line 467, in _read_header
    raise ValueError("%s exceeds max_bin_len(%s)" % (n, self._max_bin_len))
ValueError: 500000000 exceeds max_bin_len(104857600)
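
The limit in the traceback, 104857600, is 100 MiB (100 × 1024 × 1024), so a 500000000-byte body is well past it. The same check can be reproduced on a small payload by passing max_bin_len explicitly. This is only a sketch, assuming msgpack is installed; the 2048-byte body, the two limits, and the try_unpack helper are arbitrary choices made for illustration and do not come from cachecontrol:

```python
import msgpack

# A stand-in for a cached response: a map holding one binary body.
# 2048 bytes is an arbitrary size chosen to keep the example small.
packed = msgpack.packb({"body": b"x" * 2048}, use_bin_type=True)


def try_unpack(data, max_bin_len):
    """Return the unpacked object, or the error message if the limit is hit."""
    try:
        return msgpack.unpackb(data, raw=False, max_bin_len=max_bin_len)
    except ValueError as exc:
        return str(exc)


# Limit below the body size: msgpack refuses to load the bin field.
too_small = try_unpack(packed, max_bin_len=1024)

# Limit above the body size: the same bytes load without error.
large_enough = try_unpack(packed, max_bin_len=4096)

print(too_small)
print(len(large_enough["body"]))
```

One possible explanation for the "fixed somehow" observation, offered as a guess rather than a confirmed cause: msgpack 1.0's unpackb/loads appears to derive its limits from the length of the input it is given, while a streaming Unpacker on a file object keeps the 100 MiB default seen above.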
When trying to use cachecontrol with very large files (disk images in the case I'm considering), there's no easy way to pass a max_bin_len to msgpack.loads to say "yeah, I really do want to be able to load huge files".
cachecontrol will write the huge files, but then when it comes round to read them, msgpack will produce a ValueError and cachecontrol will return None to the deserialization routines.
It appears that the way to hack around it would be to subclass Serializer and replace loads_v4 to give some args to msgpack.loads. Is there a better way? Is this something that you'd be interested in seeing as a kwarg passed down from CacheControl?