"hive"."information_schema"."columns" with 'like' hangs #487

Open
rajitsaha opened this issue Jan 18, 2017 · 10 comments

rajitsaha commented Jan 18, 2017

We, at LendingClub, are using Presto "0.157.1.t.0.5-1.x86_64" and we access Presto via JDBC from SQLWorkbench. When we try to list the columns of any database in the Database Explorer, it hangs most of the time.

Looking at the Discovery URI, we saw that the query it is trying to run is:

SELECT * FROM "hive"."information_schema"."columns" WHERE table_schema like '<schema_name>' AND table_name LIKE '<table_name>' AND column_name LIKE '%'

In this scenario, this query hangs most of the time.

We also reproduced the issue with a smaller query:

SELECT * FROM "hive"."information_schema"."columns" WHERE table_schema LIKE '<schema_name>'

hangs, but

SELECT * FROM "hive"."information_schema"."columns" WHERE table_schema = '<schema_name>'

returns results immediately.

maciejgrzybek commented Jan 18, 2017

How many tables do you have in your Hive setup?
Do you have any non-ASCII characters in table names, perhaps?
Can you provide a Hive setup that reproduces the problem? E.g. what tables should I create in Hive to reproduce it?

My example setup works fine (no delay):

presto> SELECT * FROM "hive"."information_schema"."columns" WHERE table_schema like 'default';
 table_catalog | table_schema | table_name | column_name | ordinal_position | column_default | is_nullable | data_type | comment
---------------+--------------+------------+-------------+------------------+----------------+-------------+-----------+---------
 hive          | default      | foo        | a           |                1 | NULL           | YES         | bigint    | NULL
(1 row)

Also, do you use the JDBC drivers distributed with the Teradata release, or the open-source version?

@rajitsaha (Author)

Our Hive Metastore is pretty huge:

  1. Number of schemas: ~1,000
  2. Number of tables: 200K
  3. Number of columns: ~10M

We noticed that whenever we use
SELECT * FROM "hive"."information_schema"."columns" WHERE table_schema LIKE '<schema_name>'

Presto sends a call to the Hive Metastore for each combination of
cmd=get_table : db= tbl=

So there are too many calls to the Hive Metastore, and it either hangs, times out, or sometimes returns an error like

[Teradata]Presto Query failed: outputFormat should not be accessed from a null StorageFormat. [SQL State=HY000, DB Errorcode=100050]

We are using the Teradata JDBC driver.

@maciejgrzybek

What's your caching policy (hive.metastore-cache-ttl in catalog/hive.properties)?
This could help reduce the load on subsequent queries. However, with your setup of 200K tables and 1K schemas, a query like ... WHERE table_schema LIKE ... has to retrieve all possible schema × table combinations from the metastore. The way to solve that is to use a more specific filtering predicate, i.e. make some of the predicates equality comparisons instead of LIKE.
E.g. WHERE table_schema LIKE '<schema_name>' AND table_name = '<table_name>' would reduce the query time significantly because it would only need to scan 1,000 entries in the metastore instead of 200 million.
Is the query you presented generated by SQLWorkBench itself?
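A quick back-of-the-envelope sketch of the numbers above (illustrative Python, assuming one get_table call per candidate schema × table combination when only LIKE predicates are present, as the log lines earlier in this thread suggest):

```python
# Rough model of metastore load for the reported setup.
# Assumption: a LIKE-only predicate forces Presto to enumerate
# every (schema, table) pair, while an equality predicate on
# table_name needs just one lookup per candidate schema.

NUM_SCHEMAS = 1_000
NUM_TABLES = 200_000  # total across all schemas

# WHERE table_schema LIKE ... (no equality predicate):
pairs_scanned = NUM_SCHEMAS * NUM_TABLES

# WHERE table_schema LIKE ... AND table_name = '<table_name>':
# one lookup per schema that might match the pattern.
lookups_with_equality = NUM_SCHEMAS

print(pairs_scanned)          # 200000000
print(lookups_with_equality)  # 1000
```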

@rajitsaha (Author)

Thanks for your reply.

We have not set hive.metastore-cache-ttl; it is at the default. What would be an appropriate value for this?
Do we also need to change hive.metastore-timeout?

For the second part: we did find that WHERE table_schema LIKE '<schema_name>' AND table_name = '<table_name>' would be the better choice, but the problem is that

the Presto JDBC driver sends the LIKE query itself while exploring columns in SQLWorkbench's Database Explorer.

@maciejgrzybek

The default value for hive.metastore-cache-ttl is 1 hour, which should be sufficiently long for you.
The general idea behind this property is to avoid repeated queries to the metastore: instead, the schema is resolved from the cache. There's an obvious speed-up because you don't have to query the same metadata twice within the same cache-ttl window. However, you may see an outdated schema if it changed after the last cache refresh but before the cache-ttl time elapsed.
E.g. with cache-ttl = 1h:
At time t0 = 0 you query metadata; it takes a long time to retrieve, but it's stored for future use.
At time t1 = 10 min you query the same metadata again; it's fast because it's served from the cache.
At time t2 = 15 min you modify the schema (e.g. add/remove a column in some table).
At time t3 = 20 min (still within cache-ttl since the last retrieval) you query the metadata again and see the same results as at t0 and t1 (without the change from t2 reflected).
At time t4 = 60 min the query refreshes the cache (so it takes long again) and the change from t2 becomes visible.
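The timeline above can be sketched as a minimal TTL cache (illustrative Python; names are made up, this is not Presto's actual implementation):

```python
class TtlCache:
    """Minimal TTL cache mirroring the timeline above:
    the value is reloaded only when older than ttl seconds."""

    def __init__(self, ttl, loader):
        self.ttl = ttl          # e.g. 3600 for cache-ttl = 1h
        self.loader = loader    # the slow metastore fetch
        self.value = None
        self.loaded_at = None

    def get(self, now):
        # Refresh only if never loaded or the entry has expired.
        if self.loaded_at is None or now - self.loaded_at >= self.ttl:
            self.value = self.loader()   # expensive call
            self.loaded_at = now
        return self.value

# Simulated schema that changes at t2 = 15 min.
schema = {"columns": ["a"]}
cache = TtlCache(ttl=3600, loader=lambda: dict(schema))

t0 = cache.get(now=0)            # t0: slow load, sees ["a"]
t1 = cache.get(now=600)          # t1: served from cache
schema["columns"] = ["a", "b"]   # t2: column added in the metastore
t3 = cache.get(now=1200)         # t3: still stale, sees ["a"]
t4 = cache.get(now=3600)         # t4: TTL expired, sees ["a", "b"]
print(t3["columns"], t4["columns"])
```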

metastore-timeout should be increased only when the default (10s) is not sufficient for retrieving metadata from the Hive Metastore, which could be your case. However, you want to lower the time spent retrieving metadata, not just suppress the errors caused by its longer-than-usual operation time.
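For reference, both settings go in the catalog properties file mentioned earlier (etc/catalog/hive.properties); the values shown are just the defaults stated in this thread, not tuning recommendations:

```properties
hive.metastore-cache-ttl=1h
hive.metastore-timeout=10s
```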

We are looking into the JDBC drivers to make them perform better in your use case. However, the earliest possible patch would be in the next release (after 157t).

@rajitsaha (Author)

Thanks for your help. Any suggestions on how we can resolve this issue for now?
All our customers are having a very bad experience because SQLWorkbench hangs.

@rajitsaha (Author)

It seems that with LIKE queries the schema name is passed as empty or "*", so it processes all the tables from all the schemas.
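The observation above (an empty or "*" schema pattern ending up matching everything) can be illustrated with a tiny LIKE-pattern matcher (illustrative Python, not the driver's actual code):

```python
import re

def like_to_regex(pattern):
    # Translate a SQL LIKE pattern to an anchored regex:
    # '%' -> '.*', '_' -> '.', everything else is literal.
    body = "".join(".*" if ch == "%" else "." if ch == "_"
                   else re.escape(ch) for ch in pattern)
    return re.compile("^" + body + "$")

schemas = ["default", "sales", "marketing"]

# A concrete pattern prunes the candidate list...
assert [s for s in schemas if like_to_regex("sal%").match(s)] == ["sales"]

# ...but '%' (what an empty/'*' filter tends to be translated to)
# matches every schema, so every table in every schema gets scanned.
assert [s for s in schemas if like_to_regex("%").match(s)] == schemas
```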

@rajitsaha (Author)

Hi @rschlussel, any pointers on solving this issue? Thanks.

@mattsfuller

@rajitsaha We believe this is a driver issue. We are working with the vendor to resolve the issue. We will update this issue once we have more information.

@rajitsaha (Author)

Thanks
