Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why you shouldn't use in in SQL #20

Open
ctapobep opened this issue Jun 27, 2023 · 0 comments
Open

Why you shouldn't use in in SQL #20

ctapobep opened this issue Jun 27, 2023 · 0 comments

Comments

@ctapobep
Copy link
Owner

ctapobep commented Jun 27, 2023

Short story

The operator in should NOT be used in parameterized queries like in (?) or in (:params). If you have a dynamic array, prefer using any(?).

Intro

Suppose you have a query:

select * from t where id in(?)

You want to pass a list of 4 IDs, but unfortunately you can't. In order to do this you need:

select * from t where id in(?, ?, ?, ?)

So for every element in the array you need to have a separate ? and pass a separate parameter.

You may be have seen this versions of SQL with named params:

select * from t where id in(:param_name) -- Java's Spring JDBC, Hibernate, JetBrains products
select * from t where id in(%(param_name)s) -- Python
select * from t where id in(@param_name) -- .Net

But it appears this isn't really a correct solution! Databases don't actually support this syntax. What happens under the hood is this: your DB Connector (the library that you use to access the DB) or some kind of high-level wrapper that you work with replaces those named parameters with ?. So you still end up with a query that has in(?, ?, ?, ?) - and that's what's sent to the database.

Named parameters are not the right solution!

Named parameters will actually do the job - if you have 100 elements in the array, you'll have 100 params in the query. It will work. But this is a bad solution:

  1. Ideologically, you're not trying to pass many parameters to the query. You want to pass a single parameter which is an array.
  2. When DB accepts a query, it has to parse it and then decide how exactly to execute it. E.g. it can choose to select from one table first, then from the other. Or to use an index or a sequential table scan. All this takes time, and DB can reuse the already built query plan if the query is repeated. But since the query will be different if the number of params is different, this will prevent caching of the query plan.
  3. If you're writing a SQL function, you can't pass variable number of params (at least not in PostgreSQL). You have to use an array. But your ?, ?, ?, ? is not an array - these would be 4 different params. So you can't pass this expression to the function.
  4. DBs like Oracle will not allow passing more than 1000 ?. Not a frequent use case, but still. Though to be honest, I didn't check the right solution (see below) with Oracle - maybe it won't work either :)

Passing arrays to SQL (PG-specific, Java examples)

A better solution would be to pass an actual array. In PG an array is represented by this kind of string: {element1,element2,etc}. Then we can pass this array to the query like this:

select * from t where id = any('{a,b,c,d}')
-- Which has an implicit conversion, what's actually executed is:
select * from t where id = any('{a,b,c,d}'::text[])

Of course you can make it a parameterized query with ?. But if you pass it as an actual string, you'll have to deal with escaping of , and {. Thankfully, DB Connectors have special methods for passing arrays that will deal with the escaping. In fact, they won't really send a string, but rather a binary representation of the array. It will contain a type of elements, their count, and the delimiter will be a zero byte \0. For details see org.postgresql.jdbc.ArrayEncoding#toBinaryRepresentation() in the JDBC Connector.

JDBC solution to pass an array of strings (text is the type):

PreparedStatement ps = connection.prepareStatement("select * from t where id = any(?)");
ps.setArray(1, connection.createArrayOf("text", myArray));

Hibernate doesn't seem to support it. You can use an external library or just write a class like StringArrayType (it has bugs), and pass it:

sessionFactory.getCurrentSession()
        .createNativeQuery("select * from t where id = any(:vals)")
        .setParameter("vals", myArray, StringArrayType.INSTANCE)
        .getResultList();

But if it's a native query, feel free to switch to Spring JDBC:

jdbcTemplate.query("select * from t where id = any(?)", new Object[]{myArray});

Misc.

Interestingly, even if you use an in(?,?,?), in the SQL Plan PG will show an array! So either it actually replaces it with any(...) or... maybe it's just a the way PG displays it in the Query Plan :)

@ctapobep ctapobep changed the title How to properly work with dynamic arrays in SQL in() How to properly work with dynamic arrays in SQL in(?) Jun 27, 2023
@ctapobep ctapobep changed the title How to properly work with dynamic arrays in SQL in(?) Why you shouldn't use in in SQL Jun 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant