HTTP Storage Plugin OversizedAllocationException when field present in page 1 is absent in page 2 #2896

Open
hyperbolix opened this issue Mar 26, 2024 · 2 comments

hyperbolix commented Mar 26, 2024

Describe the bug
When Apache Drill is configured with the HTTP storage plugin in INDEX pagination mode, and the HTTP service behind that plugin includes a field in every record on page 1 but omits that field on page 2, a query that reads into page 2 fails with an OversizedAllocationException.

To Reproduce
Steps to reproduce the behavior:

  1. Start Drill.
  2. Start the Python3 HTTP server that demonstrates the issue:
    https://gist.github.com/hyperbolix/24b1696dbb03c6032fb7af1a06b75145
  3. Enable the http storage plugin and configure it using this configuration: https://gist.github.com/hyperbolix/7df108c8169cf54bcc72fdc499be1dce
  4. Open the Drill query interface and issue this query:
    select * from http.a_bug_test where total_results_val=300 and page_size_val=10 limit 15
  5. See OversizedAllocationException error. Here is an example error:
    https://gist.github.com/hyperbolix/5e2d1e4e10eb92d5a6fa9604acd5cac8
  6. Modify the Python3 HTTP server by commenting out line 46 and fixing the indentation on line 47.
  7. Restart the Python3 HTTP server and issue the query from step 4 again.
  8. Observe that the error no longer occurs and the results span into page 2.
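
For readers who cannot open the gist, the shape of the data that triggers the problem can be sketched as follows. This is not the server from the gist: the field name `extra_field`, the `page` query parameter, the port, and the helper names are assumptions made for illustration, and the sketch does not attempt to model the response fields that the INDEX paginator expects. Only `foo_id`, `total_results_val`, and `page_size_val` come from this report.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def build_page(page, page_size, total_results):
    """Return the records for a 1-based page; only page 1 carries 'extra_field'."""
    start = (page - 1) * page_size
    stop = min(start + page_size, total_results)
    records = []
    for i in range(start, stop):
        record = {"foo_id": i}
        if page == 1:
            # Hypothetical field: present on page 1, absent on every later page.
            record["extra_field"] = "value-%d" % i
        records.append(record)
    return records


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # 'page' is an assumed parameter name; the other two appear in the query above.
        params = parse_qs(urlparse(self.path).query)
        page = int(params.get("page", ["1"])[0])
        page_size = int(params.get("page_size_val", ["10"])[0])
        total = int(params.get("total_results_val", ["300"])[0])
        body = json.dumps(build_page(page, page_size, total)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host="127.0.0.1", port=8000):
    """Create (but do not start) the demo server; the port is an arbitrary choice."""
    return HTTPServer((host, port), Handler)
```

Calling `make_server().serve_forever()` would start it. Removing the `if page == 1` condition gives every page the same schema, which roughly corresponds to the change described in step 6.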

Expected behavior
Drill is supposed to tolerate flexible schemas, within certain limitations. It should be reasonable for a field to be present on some pages and absent on others; if it is not, the documentation should say so. The expected result is a response that includes the data from page 2, with no exception raised. If this case will not be supported, the exception should point to the problem with the result schema rather than report that a buffer cannot be expanded.
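
To make the requested outcome concrete, here is a minimal sketch of the result I would expect: the union of the fields seen across pages, with nulls where a page did not supply a field. This is purely an illustration of the expected output, not a description of how Drill handles schema changes internally; `unify_pages` and `extra_field` are hypothetical names.

```python
def unify_pages(pages):
    """Union the field names across pages and fill missing fields with None."""
    columns = []
    for page in pages:
        for record in page:
            for key in record:
                if key not in columns:
                    columns.append(key)
    # Every row gets every column; fields a page did not provide become None.
    rows = [
        {col: record.get(col) for col in columns}
        for page in pages
        for record in page
    ]
    return columns, rows


pages = [
    [{"foo_id": 0, "extra_field": "a"}],  # page 1 has the field
    [{"foo_id": 1}],                      # page 2 does not
]
columns, rows = unify_pages(pages)
```

Here the row from page 2 would carry `extra_field` as a null value instead of the query failing.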

Error detail, log output or screenshots
https://gist.github.com/hyperbolix/5e2d1e4e10eb92d5a6fa9604acd5cac8
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: OversizedAllocationException: Unable to expand the buffer. Max allowed buffer size is reached.

Drill version
Observed in 1.21.1 and also in the latest commit as of 2024-03-26: 749772c

@hyperbolix hyperbolix added the bug label Mar 26, 2024
@hyperbolix
Author

Additional notes: Omitting the limit 15 clause results in the query succeeding (reading all pages). Adding an order by foo_id clause causes the query to fail in the same way as it failed with the limit 15 clause.

@cgivre cgivre self-assigned this Mar 27, 2024
@cgivre
Contributor

cgivre commented Mar 27, 2024

Ok... this is really weird. This query only seems to fail if the limit is between 13 and 19. It works if the limit is 12 or 20-22.
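
A sweep like this can be scripted against Drill's REST endpoint (`POST /query.json`). The sketch below assumes Drill is listening on `localhost:8047` without authentication and that a failure can be detected by the exception name appearing in the response body; both are assumptions, as are the helper names. `pages_needed` is a naive count of how many 10-record pages a given limit has to touch, not a claim about how many pages Drill actually fetches.

```python
import json
import math
import urllib.error
import urllib.request

DRILL_URL = "http://localhost:8047/query.json"  # assumption: default web port, no auth
PAGE_SIZE = 10  # matches page_size_val in the query from the report


def build_payload(limit):
    """Build the REST payload for the query from the report with a given limit."""
    query = (
        "select * from http.a_bug_test "
        "where total_results_val=300 and page_size_val=10 "
        f"limit {limit}"
    )
    return {"queryType": "SQL", "query": query}


def pages_needed(limit, page_size=PAGE_SIZE):
    """Naive number of pages required to satisfy the limit."""
    return math.ceil(limit / page_size)


def run_limit(limit):
    """Return True if the response mentions OversizedAllocationException."""
    request = urllib.request.Request(
        DRILL_URL,
        data=json.dumps(build_payload(limit)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request) as response:
            body = response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Some failures may come back as an HTTP error status with a body.
        body = err.read().decode("utf-8")
    return "OversizedAllocationException" in body


def sweep(limits=range(10, 24)):
    """Print limit, naive page count, and outcome for each limit."""
    for limit in limits:
        outcome = "FAIL" if run_limit(limit) else "ok"
        print(limit, pages_needed(limit), outcome)
```

By this naive count, limits 12 through 20 all need two pages, so the failing range of 13 to 19 does not line up with a simple page boundary.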
