DRILL-8259: Supports advanced HBase persistence storage options #2596

Open
wants to merge 2 commits into
base: master
Conversation

luocooong
Member

@luocooong luocooong commented Jul 13, 2022

Description

Adds table-level and column-family-level options so that HBase can be tuned for performance when it is used as Drill's persistent store.

Documentation

Example in drill-override.conf

sys.store.provider: {
  class: "org.apache.drill.exec.store.hbase.config.HBasePStoreProvider",
  hbase: {
    table : "drill_store",
    config: {
      "hbase.zookeeper.quorum": "zk_host3,zk_host2,zk_host1",
      "hbase.zookeeper.property.clientPort": "2181",
      "zookeeper.znode.parent": "/hbase-test"
    },
    table_config : {
      "durability": "ASYNC_WAL",
      "compaction_enabled": false,
      "split_enabled": false,
      "max_filesize": 10737418240,
      "memstore_flushsize": 536870912
    },
    column_config : {
      "versions": 1,
      "ttl": 2626560,
      "compression": "SNAPPY",
      "blockcache": true,
      "blocksize": 131072,
      "data_block_encoding": "FAST_DIFF",
      "in_memory": true,
      "dfs_replication": 3
    }
  }
}
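
The numeric values in the example are easier to review once converted to familiar units. The sketch below assumes the options are passed through unchanged to the corresponding HBase attributes, where sizes are expressed in bytes and TTL in seconds; the class name `StoreConfigUnits` is hypothetical and is not part of this PR.

```java
import java.time.Duration;

// Hypothetical helper (not part of the PR): converts the example values
// into readable units, assuming bytes for sizes and seconds for TTL.
public class StoreConfigUnits {
    static final long KIB = 1024L;
    static final long MIB = 1024L * KIB;
    static final long GIB = 1024L * MIB;

    /** Converts a TTL given in seconds to (possibly fractional) days. */
    static double ttlDays(long ttlSeconds) {
        return ttlSeconds / (double) Duration.ofDays(1).getSeconds();
    }

    public static void main(String[] args) {
        System.out.println("max_filesize       = " + (10737418240L / GIB) + " GiB"); // 10 GiB
        System.out.println("memstore_flushsize = " + (536870912L / MIB) + " MiB");   // 512 MiB
        System.out.println("blocksize          = " + (131072L / KIB) + " KiB");      // 128 KiB
        System.out.println("ttl                = " + ttlDays(2626560L) + " days");   // about 30.4 days
    }
}
```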

Configuration requirements

| Key | Type | Example value | Reference |
| --- | --- | --- | --- |
| durability | String | ASYNC_WAL / SYNC_WAL / SKIP_WAL | Durability |
| compaction_enabled | Boolean | false | COMPACTION_ENABLED |
| split_enabled | Boolean | false | SPLIT_ENABLED |
| max_filesize | Number | 10737418240 | MAX_FILESIZE |
| memstore_flushsize | Number | 536870912 | MEMSTORE_FLUSHSIZE |
| versions | Number | 1 | MAX_VERSIONS |
| ttl | Number | 2626560 | TTL |
| compression | String | SNAPPY / LZ4 | Compression$Algorithm |
| blockcache | Boolean | true | BLOCKCACHE |
| blocksize | Number | 131072 | BLOCKSIZE |
| data_block_encoding | String | FAST_DIFF / PREFIX | DataBlockEncoding |
| in_memory | Boolean | true | IN_MEMORY |
| dfs_replication | Number | 3 | DFS_REPLICATION |
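
The key/type listing above can be read as a small schema. As a rough illustration only, the sketch below checks a parsed options map against those types; it uses plain Java collections, and the class `StoreOptionTypes` and its `validate` method are hypothetical names rather than anything the PR provides. How the PR itself validates values is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical validator (not part of the PR): checks that each option
// carries the Java type listed for it in the requirements above.
public class StoreOptionTypes {
    static final Map<String, Class<?>> EXPECTED = new LinkedHashMap<>();
    static {
        // table_config keys
        EXPECTED.put("durability", String.class);
        EXPECTED.put("compaction_enabled", Boolean.class);
        EXPECTED.put("split_enabled", Boolean.class);
        EXPECTED.put("max_filesize", Number.class);
        EXPECTED.put("memstore_flushsize", Number.class);
        // column_config keys
        EXPECTED.put("versions", Number.class);
        EXPECTED.put("ttl", Number.class);
        EXPECTED.put("compression", String.class);
        EXPECTED.put("blockcache", Boolean.class);
        EXPECTED.put("blocksize", Number.class);
        EXPECTED.put("data_block_encoding", String.class);
        EXPECTED.put("in_memory", Boolean.class);
        EXPECTED.put("dfs_replication", Number.class);
    }

    /** Returns one message per unknown key or wrongly typed value. */
    static List<String> validate(Map<String, Object> options) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, Object> e : options.entrySet()) {
            Class<?> expected = EXPECTED.get(e.getKey());
            if (expected == null) {
                errors.add("unknown key: " + e.getKey());
            } else if (!expected.isInstance(e.getValue())) {
                errors.add(e.getKey() + " must be " + expected.getSimpleName()
                        + ", got " + e.getValue());
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, Object> options = new LinkedHashMap<>();
        options.put("compaction_enabled", 0);   // wrong: a number, not true/false
        options.put("durability", "ASYNC_WAL"); // fine
        validate(options).forEach(System.out::println);
    }
}
```

With this check, a value such as `0` for `compaction_enabled` is reported as an error, while `false` is accepted.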

Testing

Added TestHBaseTableProvider#testStoreTableAttributes().

@luocooong added the updates, doc-impacting, and performance labels Jul 13, 2022
@luocooong luocooong self-assigned this Jul 13, 2022
@cgivre
Contributor

cgivre commented Jul 22, 2022

@luocooong Thanks for submitting this. I was wondering, is there a reason why we are storing these variables in drill-override.conf instead of in the configuration for the storage plugin? IMHO, it is better to put them in the plugin config so that you don't have to restart Drill every time you make a config change.

@luocooong
Member Author

@cgivre Hi, thank you for the question. Drill's PStore variables are deliberately kept separate from the storage plugin configuration, because they have to be defined before Drill starts up. They therefore have a different lifecycle from the storage configuration, so placing them in the storage plugin is not recommended.

@Z0ltrix
Contributor

Z0ltrix commented Jul 23, 2022

Why "compaction_enabled": false? I thought compaction was important for HBase performance?

@luocooong
Member Author

As you know, HBase is a nightmare to operate because its settings are so complex. The values in the example above are not recommendations, and no single value suits every case. They only illustrate the type each parameter expects: for instance, compaction_enabled takes "true/false", not "0/1".

Also, would you mind helping me add this updated documentation to drill-site?

@Z0ltrix
Contributor

Z0ltrix commented Jul 23, 2022

Of course :)

@luocooong
Member Author

@cgivre @Z0ltrix I added two new options. If a namespace is set, the namespace:table semantics are applied. We can also use the table configuration alone. The family is optional as well.
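
The namespace:table semantics mentioned here can be sketched as follows. This is a minimal illustration, assuming that an unset namespace simply leaves the bare table name; the class `StoreTableName` and its method are hypothetical, and the thread does not show how the PR resolves the name internally.

```java
// Hypothetical helper (not part of the PR): builds the table name that
// would be handed to HBase from an optional namespace and a table.
public class StoreTableName {
    /** Returns "namespace:table" when a namespace is given, else just "table". */
    static String qualifiedName(String namespace, String table) {
        if (table == null || table.isEmpty()) {
            throw new IllegalArgumentException("table name is required");
        }
        if (namespace == null || namespace.isEmpty()) {
            return table; // assumption: no namespace means the bare table name
        }
        return namespace + ":" + table;
    }

    public static void main(String[] args) {
        System.out.println(qualifiedName("drill", "drill_store")); // drill:drill_store
        System.out.println(qualifiedName(null, "drill_store"));    // drill_store
    }
}
```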

@cgivre
Contributor

cgivre commented Aug 12, 2022

@Z0ltrix Would you mind doing a formal review on this PR? @luocooong asked me but I don't really have enough experience with HBase to comment intelligently on this. If you're already happy with this, all you have to do is leave a +1.

@Z0ltrix
Contributor

Z0ltrix commented Aug 31, 2022

Sorry for the late response, I would love to do the review :)

} else {
  columnConfig = Maps.newHashMap();
}
hbaseConf = HBaseConfiguration.create();
Contributor

> As you know, HBase is a nightmare to operate because its settings are so complex. The values in the example above are not recommendations, and no single value suits every case. They only illustrate the type each parameter expects: for instance, compaction_enabled takes "true/false", not "0/1".

Hi @luocooong, I'm still worried about the defaults, especially when Drill creates the table on its own...

Am I correct that you don't set any defaults except SYS_STORE_PROVIDER_HBASE_TABLE, SYS_STORE_PROVIDER_HBASE_NAMESPACE and SYS_STORE_PROVIDER_HBASE_FAMILY?
