
Pocket import fails with memory error during pdf parsing #7460

Open
R-Rudolf opened this issue May 3, 2024 · 0 comments

R-Rudolf commented May 3, 2024

Environment

  • Version: 2.6.9
  • Installation: Container execution (host OS Fedora 12, container runtime Podman 4.3.1)
  • PHP version:
    PHP 8.1.27 (cli) (built: Feb 21 2024 14:48:59) (NTS)
    Copyright (c) The PHP Group
    Zend Engine v4.1.27, Copyright (c) Zend Technologies
  • OS: (within the container) Alpine Linux v3.18.6
  • Database: SQLite
  • Parameters: Default, only domain name changed.

What steps will reproduce the bug?

I initiated a Pocket import. After I click "Authorize" on the getpocket domain, the page loads, and after a while the following error message appears on the self-hosted domain:

500: Internal Server Error
Error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 15816539 bytes)
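For context, the exhausted limit of 134217728 bytes is exactly 128 MiB, which matches PHP's common default `memory_limit`; a quick arithmetic check:

```shell
# 134217728 bytes expressed in MiB (1 MiB = 1024 * 1024 bytes)
echo $((134217728 / 1024 / 1024))
```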

Detailed error logs from within the container

$ tail var/logs/prod.log

[2024-05-03T13:09:35.847268+00:00] httplug.INFO: Sending request: GET http://arxiv.org/pdf/2305.16291 1.1 {"request":{"GuzzleHttp\\Psr7\\Request":[]},"uid":"6634e20fced934.16553197"} []
[2024-05-03T13:09:36.547080+00:00] httplug.INFO: Received response: 200 OK 1.1 {"milliseconds":700,"uid":"6634e20fced934.16553197"} []
[2024-05-03T13:09:36.664665+00:00] graby.INFO: Data fetched: array{"effective_url":"http://arxiv.org/pdf/2305.16291","body":"(only length for debug): 18830859","headers":{"connection":"keep-alive","content-length":"18830859","content-type":"application/pdf","etag":"\"sha256:4ad0e876edf36c97290bf0a5431b28771580f39d44ebfefa463f1315387d0be9\"","last-modified":"Fri, 20 Oct 2023 01:18:19 GMT","access-control-allow-origin":"*","cache-control":"max-age=86400","content-disposition":"inline; filename=\"2305.16291v2.pdf\"","x-cloud-trace-context":"f31e38a84b3f8a29400e49cfd015b2a5;o=1","server":"Google Frontend","via":"1.1 google, 1.1 google, 1.1 varnish, 1.1 varnish","accept-ranges":"bytes","age":"6300","date":"Fri, 03 May 2024 13:09:35 GMT","x-served-by":"cache-lga21930-LGA, cache-vie6336-VIE","x-cache":"HIT, HIT","x-timer":"S1714741776.881680,VS0,VE1"},"status":200} {"data":{"effective_url":"http://arxiv.org/pdf/2305.16291","body":"(only length for debug): 18830859","headers":{"connection":"keep-alive","content-length":"18830859","content-type":"application/pdf","etag":"\"sha256:4ad0e876edf36c97290bf0a5431b28771580f39d44ebfefa463f1315387d0be9\"","last-modified":"Fri, 20 Oct 2023 01:18:19 GMT","access-control-allow-origin":"*","cache-control":"max-age=86400","content-disposition":"inline; filename=\"2305.16291v2.pdf\"","x-cloud-trace-context":"f31e38a84b3f8a29400e49cfd015b2a5;o=1","server":"Google Frontend","via":"1.1 google, 1.1 google, 1.1 varnish, 1.1 varnish","accept-ranges":"bytes","age":"6300","date":"Fri, 03 May 2024 13:09:35 GMT","x-served-by":"cache-lga21930-LGA, cache-vie6336-VIE","x-cache":"HIT, HIT","x-timer":"S1714741776.881680,VS0,VE1"},"status":200}} []
[2024-05-03T13:09:36.940533+00:00] request.CRITICAL: Uncaught PHP Exception Symfony\Component\ErrorHandler\Error\OutOfMemoryError: "Error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 15816539 bytes)" at /var/www/wallabag/vendor/smalot/pdfparser/src/Smalot/PdfParser/RawData/FilterHelper.php line 244 {"exception":"[object] (Symfony\\Component\\ErrorHandler\\Error\\OutOfMemoryError(code: 0): Error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 15816539 bytes) at /var/www/wallabag/vendor/smalot/pdfparser/src/Smalot/PdfParser/RawData/FilterHelper.php:244)"} []

I double-checked: the container itself had enough memory (2 GB), and it was using only around ~120 MB before it crashed, so the limit being hit is PHP's own `memory_limit`, not the container's.

I would expect that if a single article fails to import, the import would continue and perhaps list the failures at the end.
Even better would be for the process to use more memory and complete the import without crashing.
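As a possible workaround (not a fix for the per-article failure handling), the PHP limit can be raised with an ini drop-in. The paths and the 512M value here are assumptions for illustration; the actual scan directory depends on how PHP is built in the container image:

```shell
# Assumption: this Alpine-based image scans a conf.d directory
# (e.g. /etc/php81/conf.d) for extra ini files.
# Stage a drop-in that raises memory_limit from the 128M default:
echo 'memory_limit = 512M' > /tmp/zz-memory.ini
cat /tmp/zz-memory.ini   # then copy it into the container's conf.d directory
```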
