Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failed to parse tree: Error occurred during parsing #209

Closed
kaspernj opened this issue Apr 6, 2021 · 13 comments · Fixed by #238
Closed

Failed to parse tree: Error occurred during parsing #209

kaspernj opened this issue Apr 6, 2021 · 13 comments · Fixed by #238

Comments

@kaspernj
Copy link

kaspernj commented Apr 6, 2021

After upgrading I am suddenly starting to get these errors? There isn't much to go from in the error message.

The query is pretty long and generated using CanCan and Ransack:

SELECT "contacts"."contact_type", "contacts"."created_at", "contacts"."deleted_at", "contacts"."id", "contacts"."name", "contacts"."organization_id", "contacts"."person_id", "contacts"."state", "contacts"."student_council_id", "contacts"."updated_at" FROM "contacts" LEFT OUTER JOIN "organizations" ON "organizations"."id" = "contacts"."organization_id" LEFT OUTER JOIN "people" ON "people"."id" = "contacts"."person_id" WHERE (EXISTS (SELECT 1 FROM "contacts" AS "aliased_table" WHERE "aliased_table"."id" = "contacts"."id" AND EXISTS (SELECT 1 FROM "contacts" LEFT OUTER JOIN "contact_relationships" ON "contact_relationships"."relationship_type" = 'relative' AND "contact_relationships"."child_id" = "contacts"."id" LEFT OUTER JOIN "contacts" "parents_contact_relationships" ON "parents_contact_relationships"."id" = "contact_relationships"."parent_id" LEFT OUTER JOIN "photo_sessions" ON "photo_sessions"."customer_id" = "contacts"."id" LEFT OUTER JOIN "photo_session_groups" ON "photo_session_groups"."photo_session_id" = "photo_sessions"."id" LEFT OUTER JOIN "photo_session_group_contacts" ON "photo_session_group_contacts"."photo_session_group_id" = "photo_session_groups"."id" WHERE (("photo_sessions"."tenant_id" = 'bebb31db-8295-4daf-87ac-09ae875066d5') OR (("photo_session_group_contacts"."contact_id" = '{6E7B2DB8-513F-86D0-D9F2-70E6B254BC93}' AND "photo_session_group_contacts"."tenant_id" = '095b0a5d-c735-4f09-a906-49dfbe179594') OR (("photo_session_group_contacts"."contact_id" = '{6E7B2DB8-513F-86D0-D9F2-70E6B254BC93}' AND "photo_session_group_contacts"."tenant_id" = 'bebb31db-8295-4daf-87ac-09ae875066d5') OR (("photo_session_group_contacts"."tenant_id" = 'bebb31db-8295-4daf-87ac-09ae875066d5') OR (("photo_session_groups"."tenant_id" = 'bebb31db-8295-4daf-87ac-09ae875066d5') OR ((EXISTS (SELECT 1 FROM "election_post_contacts" WHERE (EXISTS (SELECT 1 FROM "election_post_contacts" AS "aliased_table" WHERE "aliased_table"."id" = "election_post_contacts"."id" AND EXISTS (SELECT 1 FROM "election_post_contacts" LEFT OUTER JOIN "election_posts" ON "election_posts"."id" = "election_post_contacts"."election_post_id" LEFT OUTER JOIN "elections" ON "elections"."id" = "election_posts"."election_id" LEFT OUTER JOIN "activities" ON "activities"."id" = "elections"."activity_id" LEFT OUTER JOIN "activity_participants" ON "activity_participants"."activity_id" = "activities"."id" WHERE (("election_post_contacts"."tenant_id" = 'bebb31db-8295-4daf-87ac-09ae875066d5') OR (("activity_participants"."contact_id" = '{6E7B2DB8-513F-86D0-D9F2-70E6B254BC93}' AND "activity_participants"."state" IN ('approved', 'enrolled') AND "election_post_contacts"."tenant_id" = 'bebb31db-8295-4daf-87ac-09ae875066d5') OR (("election_post_contacts"."contact_id" = '{6E7B2DB8-513F-86D0-D9F2-70E6B254BC93}' AND "election_post_contacts"."tenant_id" = 'bebb31db-8295-4daf-87ac-09ae875066d5') OR (("activity_participants"."contact_id" = '{6E7B2DB8-513F-86D0-D9F2-70E6B254BC93}' AND "activity_participants"."state" IN ('approved', 'enrolled') AND "election_post_contacts"."tenant_id" = '095b0a5d-c735-4f09-a906-49dfbe179594') OR ("election_post_contacts"."contact_id" = '{6E7B2DB8-513F-86D0-D9F2-70E6B254BC93}' AND "election_post_contacts"."tenant_id" = '095b0a5d-c735-4f09-a906-49dfbe179594'))))) AND ("election_post_contacts"."id" = "aliased_table"."id")))) AND (election_post_contacts.contact_id = contacts.id))) OR ((EXISTS (SELECT 1 FROM activities WHERE location_id = contacts.id)) OR (("contacts"."tenant_id" = 'bebb31db-8295-4daf-87ac-09ae875066d5') OR (("contacts"."organization_id" = '{6E7B2DB8-513F-86D0-D9F2-70E6B254BC93}' AND "contacts"."tenant_id" = 'bebb31db-8295-4daf-87ac-09ae875066d5') OR (("contacts"."person_id" = 'b5865666-66bb-47de-b500-b3fff5134fe2' AND "contacts"."tenant_id" = 'bebb31db-8295-4daf-87ac-09ae875066d5') OR (("contacts"."organization_id" = '{6E7B2DB8-513F-86D0-D9F2-70E6B254BC93}' AND "contacts"."tenant_id" = '095b0a5d-c735-4f09-a906-49dfbe179594') OR (("contacts"."person_id" = 'b5865666-66bb-47de-b500-b3fff5134fe2' AND "contacts"."tenant_id" = '095b0a5d-c735-4f09-a906-49dfbe179594') OR (("parents_contact_relationships"."id" = '{6E7B2DB8-513F-86D0-D9F2-70E6B254BC93}' AND "contacts"."tenant_id" = 'bebb31db-8295-4daf-87ac-09ae875066d5') OR ("parents_contact_relationships"."id" = '{6E7B2DB8-513F-86D0-D9F2-70E6B254BC93}' AND "contacts"."tenant_id" = '095b0a5d-c735-4f09-a906-49dfbe179594')))))))))))))) AND ("contacts"."id" = "aliased_table"."id")))) AND (organizations.uni_c_active_code NOT IN (3,5,8) OR organizations.uni_c_active_code IS NULL OR organizations.id IS NULL) AND "contacts"."deleted_at" IS NULL AND (contacts.person_id IS NULL OR people.anonymous = false) AND "contacts"."tenant_id" = 'bebb31db-8295-4daf-87ac-09ae875066d5' GROUP BY "contacts"."id" LIMIT 30 OFFSET 0

The SQL doesn't throw an error when used with Contact.find_by_sql(query).

@lfittl
Copy link
Member

lfittl commented Apr 6, 2021

@kaspernj Thanks for the report!

This most likely is due to the parse tree depth exceeding 64 levels, and hitting this issue in the Protobuf library: protocolbuffers/protobuf#1493

Note the C level functions work fine (i.e. PgQuery.normalize and PgQuery.fingerprint), it's just the Ruby layer that can't deal with this right now.

What's your use case for PgQuery, could you use the functions that don't require the parse tree to be in Ruby for now?

@kaspernj
Copy link
Author

kaspernj commented Apr 6, 2021

What's your use case for PgQuery, could you use the functions that don't require the parse tree to be in Ruby for now?

I am making a very dynamic API using Ransack and CanCan. Postgres has a lot of limitations in regards to distinct, order and group which requires used columns to be selected or whatever. I am using pg_query to scan the query and fix that automatically based on the used Ransack parameters (which the user can freely set as he wishes) and CanCan (which restricts which records he can read, update, destroy or whatever).

This is the gem I wrote to automatically fix the Postgres queries in ActiveRecord:
https://github.com/kaspernj/active_record_query_fixer

@kaspernj
Copy link
Author

kaspernj commented Apr 6, 2021

Everything worked fine in the previous version :-)

@lfittl
Copy link
Member

lfittl commented Apr 6, 2021

@kaspernj Nice - thats a great use case!

And unfortunately that does mean you rely on the parse tree itself - so that'd require the patch in the protobuf gem to support customizing the nesting (the API is already there internally, it just needs to be made configurable from the Ruby side).

I also have plans to work on this, but a few other things on my plate right now. Two ways forward:

  1. Investigate the protobuf issue I linked and submit a PR to the protobuf library (we'd likely want to raise the depth to a few hundred messages for the pg_query use case)

  2. Rely on the JSON parse tree format, which is still supported by libpg_query (currently not available in the Ruby library, but it would be easy enough to add a function to make this available)

Generally I would recommend to focus (1), since it gets you a fully typed tree to work with, and is faster for parsing complex queries, but the downside is that it'd require diving a bit into the protobuf library.

@kaspernj
Copy link
Author

kaspernj commented Apr 7, 2021

@lfittl Would it be possible to support getting the result as a hash again until the issue is resolved? I would love to stay on the latest version :-)

@kaspernj
Copy link
Author

@lfittl Are there any updates on this? :-)

@lfittl
Copy link
Member

lfittl commented Apr 21, 2021

@kaspernj No time to work on this currently, though it will bubble up in priority at some point.

PRs welcome though - it should be relatively straightforward to add this to pg_query_ruby.c, e.g. named as PgQuery.parse_json.

Please also add a quick test in case you work on this :)

@Tassosb
Copy link
Contributor

Tassosb commented Jun 24, 2021

For what it's worth, I think this is a blocker for me too. I'm using pg_query to infer dependencies between arbitrarily long/complex SQL statements to figure out what order to execute them in. I'll need to use the parse tree in Ruby, and I expect depth will sometimes exceed 64 levels.

@Tassosb
Copy link
Contributor

Tassosb commented Nov 12, 2021

This issue is still blocking us from upgrading to version 2, which we'd really like to do (there have been so many nice improvements since then!). We've confirmed several queries we currently parse exceed the 64 depth limit. We'd really need to go back to a max depth of 1000 levels.

I spent a little bit of time trying to do this:

Investigate the protobuf issue I linked and submit a PR to the protobuf library (we'd likely want to raise the depth to a few hundred messages for the pg_query use case)

But I've never touched C before and don't have the motivation to learn/struggle through it right now. It sounds like the right idea.

Is there any way I could add to someone else's motivation to tackle this issue? I'm guessing it's a fairly simple change?

lfittl added a commit that referenced this issue Nov 13, 2021
Depends on protocolbuffers/protobuf#9218, until
that is merged this can only be used with a custom built protobuf gem.

Fixes #209
lfittl added a commit that referenced this issue Nov 13, 2021
Depends on protocolbuffers/protobuf#9218, until
that is merged this can only be used with a custom built protobuf gem.

Fixes #209
@lfittl
Copy link
Member

lfittl commented Nov 13, 2021

@Tassosb Thanks for bumping this up - I ended up investing the time to fix the root cause here, in the Ruby protobuf implementation: protocolbuffers/protobuf#9218

With that in place, the change in pg_query is rather small: #238

To test this change, you can do the following:

gem 'google-protobuf', github: 'lfittl/protobuf', branch: 'pr-9218-and-9219', glob: 'ruby/*.gemspec'
gem 'pg_query', github: 'pganalyze/pg_query', branch: 'specify-protobuf-max-recursion-depth-setting'

(note that you also need protocolbuffers/protobuf#9219 since otherwise you can't load google-protobuf via the bundler git source)

Could you (and @kaspernj) confirm whether this fix works for your use cases?

lfittl added a commit that referenced this issue Nov 13, 2021
Depends on protocolbuffers/protobuf#9218, until
that is merged this can only be used with a custom built protobuf gem.

Fixes #209
@Tassosb
Copy link
Contributor

Tassosb commented Nov 15, 2021

Amazing! Thank you @lfittl

To test this change, you can do the following:

gem 'google-protobuf', github: 'lfittl/protobuf', branch: 'pr-9218-and-9219', glob: 'ruby/*.gemspec'
gem 'pg_query', github: 'pganalyze/pg_query', branch: 'specify-protobuf-max-recursion-depth-setting'

After running bundle install with ^, I get an error when I try to load google-protobuf:

LoadError:
  cannot load such file -- google/protobuf/any_pb

I'm on ruby 2.7.4 with bundler 1.17.0. This issue seems maybe similar to the one addressed in PR 9219.

@mvgijssel
Copy link

HackerOne is running into this problem as well! Any updates on this?

@lfittl
Copy link
Member

lfittl commented Jan 29, 2022

@mvgijssel Unfortunately we're still waiting for protocolbuffers/protobuf#9218

If you have time available, you can help with that change (it might need a rebase due to the recent ubp changes in that codebase), or investigate the JSON output method as described above.

lfittl added a commit that referenced this issue Apr 13, 2022
Requires at least google-protobuf v3.20.0, if using older releases this
change has no effect. To avoid complex dependency updates, we do not
require google-protobuf to be current when you use this library, so you
may want to pin it to the newer version if you depend on this behaviour.

Fixes #209
lfittl added a commit that referenced this issue Apr 13, 2022
Requires at least google-protobuf v3.20.0, if using older releases this
change has no effect. To avoid complex dependency updates, we do not
require google-protobuf to be current when you use this library, so you
may want to pin it to the newer version if you depend on this behaviour.

Fixes #209
lfittl added a commit that referenced this issue Apr 13, 2022
Requires at least google-protobuf v3.20.0, if using older releases this
change has no effect. To avoid complex dependency updates, we do not
require google-protobuf to be current when you use this library, so you
may want to pin it to the newer version if you depend on this behaviour.

Fixes #209
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants