Releases · apache/druid

24 Feb 00:32

xvrl

druid-0.6.172

14dfd1d

Druid 0.6.172 - Stable

Druid 0.6.172 fixes a few bugs to make the upgrade path towards Druid 0.7.0 seamless:

Fixes ingestion schema forward-compatibility with 0.7.0
Fixes dynamic worker configuration and worker affinity settings for the indexing service

Updating

If you are not already running 0.6.171, please see the 0.6.171 release notes for important notes on the upgrade procedure.


21 Jan 21:36

fjy

druid-0.6.171

c9cd47c

Druid 0.6.171 - Stable

Druid 0.6.171 is a bug-fix stable release, mainly meant to make the upgrade to Druid 0.7.0 less painful. Going forward, we will be backporting fixes to 0.6.x as required for the community and continuing to develop major features on 0.7.x.

Download

http://static.druid.io/artifacts/releases/druid-services-0.6.171-bin.tar.gz

Updating: Things to Be Aware Of

Both this version and 0.7.0-RC1 provide much better out-of-the-box support for PostgreSQL as a metadata store. In order to provide this functionality, we had to make some small changes to the way data is stored in metadata storage for MySQL setups.

Before updating to 0.6.171, please make sure that:
All Druid MySQL metadata tables use UTF-8 encoding for all string/text columns.
The default character set for the Druid MySQL database has been changed to UTF-8.
Druid Coordinator and Overlord will refuse to start if the database default character set is not UTF-8.

To check column character encoding, use
SHOW CREATE TABLE <table>;
If the default table encoding is not UTF-8, or if any columns use an encoding other than UTF-8, you will need to convert those tables.

To check the database default encoding, use
SHOW VARIABLES LIKE 'character_set_database';
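Both checks can be scripted. The sketch below is a hypothetical helper, not part of Druid or MySQL: it scans the text returned by SHOW CREATE TABLE for table-level "DEFAULT CHARSET=..." and column-level "CHARACTER SET ..." clauses and flags anything that is not a UTF-8 variant, and it checks the value reported for character_set_database. Fetching those strings from MySQL with your client library of choice is assumed and left out.

```python
import re
from typing import Set

# Hypothetical helper (not part of Druid): matches table-level "DEFAULT CHARSET=x"
# and column-level "CHARACTER SET x" clauses in SHOW CREATE TABLE output.
_CHARSET_RE = re.compile(r"(?:DEFAULT CHARSET|CHARACTER SET)\s*=?\s*(\w+)", re.IGNORECASE)


def non_utf8_charsets(create_table_sql: str) -> Set[str]:
    """Return every charset declared in the statement that is not a UTF-8 variant."""
    found = {m.group(1).lower() for m in _CHARSET_RE.finditer(create_table_sql)}
    # MySQL names its UTF-8 charsets "utf8" (also reported as "utf8mb3") and "utf8mb4".
    return {cs for cs in found if not cs.startswith("utf8")}


def database_default_is_utf8(character_set_database: str) -> bool:
    """Check the Value column returned by SHOW VARIABLES LIKE 'character_set_database';"""
    return character_set_database.strip().lower().startswith("utf8")


if __name__ == "__main__":
    sample = """CREATE TABLE `druid_segments` (
  `id` varchar(255) NOT NULL,
  `dataSource` varchar(255) CHARACTER SET latin1 NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8"""
    print(non_utf8_charsets(sample))           # {'latin1'}
    print(database_default_is_utf8("latin1"))  # False
```

A table that returns a non-empty set, or a database whose default is not UTF-8, is a candidate for the conversion commands below.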

If you are not already using UTF-8 encoding for your columns, you can convert your tables and change the database default using the following commands. Please keep in mind that table conversion can take a while (on the order of minutes) and that segment loading / handoff will be interrupted for the duration of the upgrade.

Make a backup of your database before performing the upgrade!

ALTER TABLE druid_config    CONVERT TO CHARSET utf8;
ALTER TABLE druid_rules     CONVERT TO CHARSET utf8;
ALTER TABLE druid_segments  CONVERT TO CHARSET utf8;
ALTER TABLE druid_tasks     CONVERT TO CHARSET utf8;
ALTER TABLE druid_tasklogs  CONVERT TO CHARSET utf8;
ALTER TABLE druid_tasklocks CONVERT TO CHARSET utf8;

-- replace druid with your Druid database name here 
ALTER DATABASE druid DEFAULT CHARACTER SET utf8;

Improvements

We introduced several query optimizations, mainly for topNs and HLLs
The overlord can now optionally choose which worker to send tasks to (#904)
Improved retry logic for realtime plumbers when handoffs fail during the final merge step

Bug Fixes

Fixed searching for the same value in multiple columns
Fixed Jetty defaults to increase the number of threads and prevent lockups
Fixed query/wait metrics being emitted twice
Fixed default dimension exclusions for timestamp and aggregators in ingestion schema
Fixed missing origin in cache key for period granularities
Fixed default FilteredServerView to actually be filtered
Fixed files not being cleaned up correctly in the segment cache directory
Fixed results sometimes coming in out of order
Fixed bySegment TopN queries not returning at the broker level
Fixed a few bugs related to filtered aggregators
Fixed crazy amounts of logging when the coordinator loses leadership
Updated Jetty and spymemcached libraries for various fixes
Fixed cardinality aggregator caching schema problem
Fixed the Coordinator and Overlord '/status' pages being redirected to the leader instances
Made PostgreSQL actually work out of the box in 0.6.x


04 Nov 19:29

fjy

druid-0.6.160

dcab299

Druid 0.6.160 - Stable

Improvements

Broker nodes now only start up after reading all information about segments in ZooKeeper
Nested groupBy queries should now work with post aggregations.
Nested groupBy queries should now work with complex metrics.
The overlord in the indexing service can now assign tasks to workers based on strategies.
Local firehose can now find all files under a directory.
Timestamp and metrics are now automatically added to dimension exclusions.
Improved failure handling during real-time hand-offs.
Parallel downloading of segments. Multiple threads can now be used to download segments from deep storage.
Segments can be announced and queried as a node is initially loading up.
Native filtered aggregators for selector type filters
Custom Broker selection strategy for Router can now be written in JavaScript
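As a rough sketch of what a selector-filtered aggregator looks like in a query, the helper below wraps an ordinary aggregator in a "filtered" aggregator with a "selector" filter. The field names ("type", "filter", "aggregator") follow the filtered aggregator format in later Druid documentation and are an assumption for 0.6.160, whose JSON may differ (for example, it may also expect a top-level "name"); the dimension and metric names are hypothetical.

```python
import json
from typing import Any, Dict


def filtered_selector_aggregator(dimension: str, value: str,
                                 aggregator: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap `aggregator` so it only aggregates rows where `dimension` equals `value`.

    Assumed JSON layout, based on later Druid docs; check the docs for your version.
    """
    return {
        "type": "filtered",
        "filter": {"type": "selector", "dimension": dimension, "value": value},
        "aggregator": aggregator,
    }


if __name__ == "__main__":
    # Hypothetical names: count only the rows whose "country" dimension is "US".
    agg = filtered_selector_aggregator("country", "US", {"type": "count", "name": "us_rows"})
    print(json.dumps(agg, indent=2))
```

The resulting object goes into a query's "aggregations" list alongside unfiltered aggregators, so one query can compute a total and a filtered subtotal in a single pass.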

Documentation

Example Hadoop Configuration now available
Best Practices and Recommendations now updated
Experimental Router node is now documented (druid.io/docs/latest/Router.html).
Local firehose is now documented (http://druid.io/docs/latest/Firehose.html).
Numerous improvements to the FAQs, improved segment metadata and ingest firehose docs, and an explanation of the full cluster view. Thanks @pdeva!
Updates to Cassandra documentation. Thanks @lexicalunit!

Bug Fixes

Added a workaround for a Jetty half-open connection issue that occurs when clients terminate a long-running query. When this bug occurs, the cluster appears stuck and unresponsive. Another workaround for this issue is to simply use query context timeouts.
Fixed merging results from partitions with time gaps, which could cause out of order unmerged results (#796).
HDFS should now work for non-default filesystems. Thanks @flowbehappy!
Multiple spatial dimensions can now be ingested.
Fixed a bug with approximate histograms not working with groupBy queries.
Fixed retrieving the last 8KB of task logs for non-S3 task log storage.
Fixed dynamic configuration not working for replication throttling.
Fixed search queries throwing exceptions when querying non-existent dimensions
Fixed the ingest firehose breaking on dimensions that are not present.
Select queries now work if you specify non-existent dimensions (#778)
groupBy cache now works with complex metrics
Fixed some serde problems that existed with RabbitMQ (#794)


21 Aug 20:02

fjy

druid-0.6.146

4f0b994

Druid 0.6.146 - Stable

New features

Reschema capabilities added. You can now ingest an existing Druid segment and change the name, dimensions, metrics, rollup, etc. of the segment. (More info: http://druid.io/docs/0.6.146/Ingestion-FAQ.html)
Approximate histograms and quantiles. We’ve open sourced a new module, druid-histogram, that includes a new aggregator to build approximate distributions and can be used for quantiles. Depending on the accuracy of the desired results, this aggregator can be slower than the other Druid aggregators. This feature is still somewhat experimental, but we would really love to work with the community to make it production-ready.
(More info: http://druid.io/docs/0.6.146/ApproxHisto.html)
Query timeout and cancellation. You can now specify an optional “timeout” key with a long value (in milliseconds) in the Druid query context to cancel queries that have been running for too long. You can also cancel queries explicitly.
(More info: http://druid.io/docs/0.6.146/Querying.html)
groupBy and select query caching (disabled by default). Select and groupBy results are not cached by default, to prevent large result sets from these queries overflowing the cache. However, if your workload generates groupBy results of reasonable size and you’d like to enable the cache for these queries, you can override the default values for druid.*.cache.unCacheable (http://druid.io/docs/0.6.146/Broker-Config.html).
Middle managers can now be blacklisted, which allows rolling updates of middle managers. See the new docs on rolling Druid updates. (http://druid.io/docs/0.6.146/Rolling-Updates.html)
S3 credentials can now be read from file. Thanks @metacret!
HDFS task logs for the indexing service now supported. Thanks @realfun!
Index tasks now support manual specification of shardSpecs and the ability to skip the determine partitions step.
TimeBoundary queries can now return just the max or min time.
http://druid.io/docs/0.6.146/TimeBoundaryQuery.html
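A minimal sketch of the timeout, cancellation, and timeBoundary features above, assuming queries are built as Python dicts and posted as JSON to a broker. The "timeout" context key (milliseconds) and the "bound" values "maxTime"/"minTime" come from the notes above; the "queryId" context key and the DELETE /druid/v2/{queryId} cancellation endpoint are assumptions based on later Druid documentation, and the data source and broker host are hypothetical.

```python
import uuid
from typing import Any, Dict, Optional


def with_timeout(query: Dict[str, Any], timeout_ms: int,
                 query_id: Optional[str] = None) -> Dict[str, Any]:
    """Return a copy of `query` whose context carries a timeout (ms) and a queryId."""
    context = dict(query.get("context", {}))  # keep any existing context keys
    context["timeout"] = int(timeout_ms)
    # A known id makes explicit cancellation possible later (assumed context key).
    context["queryId"] = query_id or str(uuid.uuid4())
    return {**query, "context": context}


def cancel_url(broker_host: str, query_id: str) -> str:
    """Target for an HTTP DELETE asking the broker to cancel a query (assumed endpoint)."""
    return f"http://{broker_host}/druid/v2/{query_id}"


def time_boundary_query(data_source: str, bound: Optional[str] = None) -> Dict[str, Any]:
    """timeBoundary query; bound="maxTime" or "minTime" returns just one end."""
    if bound not in (None, "maxTime", "minTime"):
        raise ValueError("bound must be None, 'maxTime' or 'minTime'")
    query: Dict[str, Any] = {"queryType": "timeBoundary", "dataSource": data_source}
    if bound is not None:
        query["bound"] = bound
    return query


if __name__ == "__main__":
    # Hypothetical data source and broker host.
    q = with_timeout(time_boundary_query("wikipedia", bound="maxTime"), 60_000, "my-query-1")
    print(q)
    print(cancel_url("broker.example.com:8080", q["context"]["queryId"]))
```

Setting the queryId client-side, rather than relying on a server-generated one, is what lets a client cancel its own long-running query without first looking the id up.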

Improvements

Nested groupBy queries now support post aggregators and all functionality of normal groupBy queries.
groupBy queries now support cardinality aggregators.
Port finding strategies for peons are smarter and can now reuse ports.
Existing completed sinks will now attempt hand-off much sooner after real-time updates or restarts.
More flexible userData for indexing service autoscaling on EC2 that is no longer tied to our deployment environment.
The async logic in the Druid router was improved significantly.
Routers now support optional routing strategy overrides.
Druid 0.6.x deployments now work with Apache Whirr. We are going to create a way of deploying Druid with docker soon as well.
Cleaned up some redundant configs in the indexing service.
A whole bunch of query and caching unit tests were added.
Explicit job properties can now be added for Hadoop ingestion tasks.

Docs

There are now docs about how to do rolling Druid updates and restarts.
http://druid.io/docs/0.6.146/Rolling-Updates.html
New docs for configuring logging in Druid.
http://druid.io/docs/0.6.146/Logging.html
Kafka 8 docs now added. Thanks @r4j4h
http://druid.io/docs/0.6.146/Kafka-Eight.html
Added docs for inverted topNs
http://druid.io/docs/0.6.146/TopNMetricSpec.html#inverted-topnmetricspec
Updated Cassandra documentation. Thanks @lexicalunit
https://github.com/metamx/druid/pull/680

Misc

Curator version bumped to 2.6.0
Jetty version bumped to 9.2.2
Guava version bumped to 16.0.1
Logging for coordinator and historical nodes is now less verbose


09 Jun 20:25

xvrl

druid-0.6.121

0265ae9

Druid 0.6.121 - Stable

This is a small release with mainly stability and performance updates.

Updating

If updating from 0.6.105, no particular steps need to be taken.
If updating from an older release, see the notes for Druid 0.6.105

Release Notes

New features

new cardinality estimation aggregator: uses hyperUnique (the optimized HyperLogLog aggregator) to estimate the cardinality of a dimension
we have completely redone the ingestion schemas to consolidate batch and real-time ingestion. Everything is backwards compatible for the time being, and we hope to publish new examples and tutorials that show how to use the new schema, which should simplify ingestion.
alphanumeric sorted topNs
a new union query (right now this only works if there are commonly named columns and metrics among your datasources)
allow config-based overriding of Hadoop job properties for batch ingestion
multi-threaded the coordinator cost balancing algorithm for faster load balancing decisions (the number of threads to use is dynamically configurable, it is 1 by default)
added a context parameter to force a 2-pass topN optimization algorithm (previously this was chosen by a heuristic and rarely used)
additional coordinator endpoints to return more info about cluster state
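To illustrate the new cardinality aggregator, the sketch below builds it as a Python dict and drops it into a timeseries query. The "cardinality" type and "fieldNames" list follow the aggregator's documented JSON; the optional "byRow" flag is an assumption taken from later Druid docs, and the data source, dimension, and interval are hypothetical.

```python
from typing import Any, Dict, Iterable


def cardinality_aggregator(name: str, field_names: Iterable[str],
                           by_row: bool = False) -> Dict[str, Any]:
    """Estimate the number of distinct values of the given dimension(s)."""
    agg: Dict[str, Any] = {"type": "cardinality", "name": name,
                           "fieldNames": list(field_names)}
    if by_row:
        # Assumed option from later docs: count distinct combinations of the
        # listed dimensions instead of distinct values across them.
        agg["byRow"] = True
    return agg


def timeseries_query(data_source: str, intervals: Iterable[str],
                     aggregations: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """A single-bucket timeseries query over the given intervals."""
    return {
        "queryType": "timeseries",
        "dataSource": data_source,
        "granularity": "all",
        "intervals": list(intervals),
        "aggregations": list(aggregations),
    }


if __name__ == "__main__":
    # Hypothetical names: roughly how many distinct users appear in January 2014?
    q = timeseries_query("wikipedia", ["2014-01-01/2014-02-01"],
                         [cardinality_aggregator("distinct_users", ["user"])])
    print(q)
```

Because the estimate is computed from a HyperLogLog sketch at query time, the dimension does not need to be pre-aggregated with hyperUnique at ingestion, at the cost of scanning the dimension values.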

Improvements

improved real-time ingestion memory usage. Depending on the number of total segments in your cluster, much less memory can now be used for real-time ingestion.
faster batch ingestion when there are numerous individual raw data files. Thanks @deepujain.
more resilient RabbitMQ firehoses. Thanks @tucksaun.
JavaScript aggregator now supports multi-valued dimensions.
inverted topN now works with lexicographic sorting
lexicographic topN now supports dimension extraction functions

Bug Fixes

several fixes for hyperUnique aggregator where large errors in estimates could be reported in certain edge cases
fixed an edge case race condition in the coordinator where it could load/drop segments incorrectly when disconnecting/reconnecting from ZooKeeper
fixed an edge condition with real-time ingestion where a bad sink can be created with delayed events
updated jetty to 9.1.5 for a fix of a half-open connection problem that occurs occasionally (it’s been extremely difficult for us to reproduce this -- but when it occurs nodes appear to have their jetty threads stalled while writing to a channel that is already closed)
fixed a bug where cached results would get combined in arbitrary order
fixed additional casing bugs
Druid now passes tests with Java 8

Documentation

new documentation about suitable hardware and configuration for production nodes. Look for more improvements to configuration coming soon.
Fixed several broken links in docs. Thanks @jcollum.


15 Jul 18:52

xvrl

druid-0.6.120

033f81c

druid-0.6.120

[maven-release-plugin]  copy for tag druid-0.6.120


07 Jun 04:58

xvrl

druid-0.6.105

dd0004b

Druid 0.6.105 - Stable

Updating

When updating Druid with no downtime, we highly recommend updating historical nodes and real-time nodes before updating the broker layer. Changes in queries are typically compatible with an old broker version and a new historical node version, but not vice versa. Our recommended rolling update process is:

indexing service/real-time nodes
historical nodes (with a wait in between each node, the wait time corresponds to how long it takes for a historical node to restart and load all locally cached segments)
broker nodes
coordinator nodes
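The ordering above can be turned into a step list for a deployment script. This is a hypothetical sketch: the role labels are made up ("realtime" stands for indexing service and real-time nodes), and the wait step is a placeholder for however you detect that a historical node has reloaded its locally cached segments.

```python
from typing import List, Tuple

# Recommended rolling-update order from the notes above; labels are hypothetical.
# "realtime" stands for indexing service / real-time nodes.
UPDATE_ORDER = ["realtime", "historical", "broker", "coordinator"]


def rolling_update_plan(nodes: List[Tuple[str, str]]) -> List[str]:
    """Turn (role, host) pairs into ordered update steps.

    sorted() is stable, so hosts of the same role keep their input order.
    An unknown role raises KeyError rather than being silently skipped.
    """
    rank = {role: i for i, role in enumerate(UPDATE_ORDER)}
    steps: List[str] = []
    for role, host in sorted(nodes, key=lambda n: rank[n[0]]):
        steps.append(f"update {role} {host}")
        if role == "historical":
            # Wait for the node to restart and load all locally cached segments.
            steps.append(f"wait for {host} to load cached segments")
    return steps


if __name__ == "__main__":
    cluster = [("broker", "b1"), ("historical", "h1"), ("coordinator", "c1"),
               ("realtime", "r1"), ("historical", "h2")]
    for step in rolling_update_plan(cluster):
        print(step)
```

Updating historical nodes one at a time, with a wait in between, keeps most replicas of each segment queryable throughout the update.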

Release Notes

Historical nodes can now use and maintain a local cache (disabled by default). This cache can either be heap based or memcached. This allows historical nodes to merge results locally and reduces much of the memory pressure seen on brokers while pulling a large number of results from the cache. Populating the cache is also now done in an asynchronous manner.
Experimental router node. We’ve been experimenting with a fully asynchronous router node that can route queries to different brokers depending on the actual query. Currently, the router node makes decisions about which broker to talk to based on rules from the coordinator node. It is our goal to at some point merge the router and broker logic and move towards hierarchical brokers.
Post aggregation optimization. We’ve optimized calculations of post aggregations (previously post aggs were being calculated more than necessary). In some initial benchmarks, this can lead to 20%-30% improvement in queries that involve post aggregations.
Support hyperUnique in groupBys. We’ve fixed a reported problem where groupBys would report incorrect results when using complex metrics (especially hyperUnique).
Support dimension extraction functions in groupBy
Persist and persist-n-merge threads no longer block each other during real-time ingestion. We added a parameter for throttling real-time ingestion a few months ago, and we have since seen that very high ingestion rates, which lead to a high number of intermediate persists, could leave persists blocked while waiting for a hand-off operation to complete. This behavior has now been improved. You can also now set maxPendingPersists in the plumber.
hyperUnique performance optimizations: ~30-50% faster aggregations

Miscellaneous other things

Fix integer overflow in hash based partitions
Support for arbitrary JSON objects in query context
Request logs now include query timing statistics
Hadoop 2.3 support by default
Update to Jetty 9
Do not require valid database connections for testing
Gracefully handle NaN / Infinity returned by compute nodes
better error reporting for cases where the ChainedExecutionQueryRunner throws NPEs

Extensions:

HDFS Storage should now work better with Cloudera CDH4
S3 Storage: object ACLs now consistently default to "bucket owner full control"


18 Jun 20:37

fjy

druid-0.6.73

6fbc3b1

Druid 0.6.73 - Stable

We are pleased to announce a new Druid stable, version 0.6.73. New features include:

A production tested dimension cardinality estimation module

We recently open sourced our HyperLogLog module described in bit.ly/1fIEpjM and bit.ly/1ebLnNI. Documentation has been added on how to use this module as an aggregator and as part of post aggregators.

Hash-based partitioning

We recently introduced a new sharding format for batch indexing. We use the HyperLogLog module to estimate the size of a data set and create partitions based on this size. In our tests, partitioning via this hash-based method is both faster and leads to more evenly partitioned segments.
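The idea behind hash-based partitioning can be sketched in a few lines: estimate the row count, derive a shard count from a target partition size, then route each row by a hash of its dimension values. This illustrates the concept only, not Druid's implementation: MD5 is a stand-in for whatever hash Druid uses, the target partition size is hypothetical, and the HyperLogLog estimate is assumed to be supplied by the caller.

```python
import hashlib
import math
from typing import Dict, List, Sequence


def num_partitions(estimated_rows: float, target_rows_per_partition: int) -> int:
    """Shards needed, given an (e.g. HyperLogLog-based) estimate of the row count."""
    if target_rows_per_partition <= 0:
        raise ValueError("target_rows_per_partition must be positive")
    return max(1, math.ceil(estimated_rows / target_rows_per_partition))


def partition_for(row_key: Sequence[str], n: int) -> int:
    """Deterministically map a row's dimension values to one of n partitions.

    MD5 is a stand-in hash; Python's built-in hash() is avoided because it is
    randomized per process for strings.
    """
    digest = hashlib.md5("\x1f".join(row_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n


def assign(rows: List[Sequence[str]], n: int) -> Dict[int, List[Sequence[str]]]:
    """Group rows into n hash partitions."""
    buckets: Dict[int, List[Sequence[str]]] = {i: [] for i in range(n)}
    for row in rows:
        buckets[partition_for(row, n)].append(row)
    return buckets


if __name__ == "__main__":
    n = num_partitions(1_250_000, 500_000)  # 3 shards
    print(n, {k: len(v) for k, v in assign([("a", "x"), ("b", "y"), ("a", "x")], n).items()})
```

Because identical dimension values always hash to the same shard, rows that roll up together stay together, and a well-mixed hash spreads distinct keys roughly evenly, which is where the more even segment sizes come from.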

Cross-tier replication

We can now replicate segments across different tiers. This means that you can create a “hot” tier that loads a single copy of the data on more powerful hardware and a “cold” tier that loads another copy of the data on less powerful hardware. This can lead to significant reductions in infrastructure costs.
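A hot/cold setup like the one described is expressed as a coordinator load rule with per-tier replica counts. The sketch below builds such a rule as a dict; the "loadForever" rule type and "tieredReplicants" field follow later Druid rule documentation and are an assumption for this release, and the tier names "hot" and "cold" are hypothetical (they must match the druid.server.tier setting of your historical nodes).

```python
import json
from typing import Dict


def tiered_load_rule(replicants_per_tier: Dict[str, int]) -> Dict[str, object]:
    """Load rule keeping the given number of segment copies in each tier (assumed format)."""
    for tier, count in replicants_per_tier.items():
        if count < 0:
            raise ValueError(f"negative replicant count for tier {tier!r}")
    return {"type": "loadForever", "tieredReplicants": dict(replicants_per_tier)}


if __name__ == "__main__":
    # One copy on fast "hot" hardware, one on cheaper "cold" hardware (hypothetical tiers).
    print(json.dumps(tiered_load_rule({"hot": 1, "cold": 1})))
    # {"type": "loadForever", "tieredReplicants": {"hot": 1, "cold": 1}}
```

Keeping a single copy per tier, instead of two copies on the expensive tier, is what produces the infrastructure savings while still giving every segment a replica.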

Nested GroupBy Queries

Thanks to an awesome contribution from Yuval Oren et al., we can do multi-level aggregation with groupBys. More info here: https://groups.google.com/forum/#!topic/druid-development/8oL28iuC4Gw
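As an illustration of multi-level aggregation, the sketch below nests one groupBy inside another by using a "query"-type data source, so the outer query aggregates the inner query's result rows. The {"type": "query", "query": ...} layout follows later Druid documentation and is an assumption here; the data source, dimensions, "count" metric, and interval are hypothetical.

```python
from typing import Any, Dict, List


def group_by(data_source: Any, dimensions: List[str],
             aggregations: List[Dict[str, Any]], intervals: List[str]) -> Dict[str, Any]:
    """A plain groupBy query over one time bucket."""
    return {
        "queryType": "groupBy",
        "dataSource": data_source,
        "granularity": "all",
        "dimensions": dimensions,
        "aggregations": aggregations,
        "intervals": intervals,
    }


def nested(inner_query: Dict[str, Any]) -> Dict[str, Any]:
    """Use an inner query's results as the outer query's data source (assumed format)."""
    return {"type": "query", "query": inner_query}


INTERVALS = ["2014-01-01/2014-02-01"]
# Inner level: one row per (page, user) pair, with that user's edit count.
inner = group_by("wikipedia", ["page", "user"],
                 [{"type": "longSum", "name": "edits", "fieldName": "count"}], INTERVALS)
# Outer level: count the inner rows per page, i.e. the number of distinct editors.
outer = group_by(nested(inner), ["page"], [{"type": "count", "name": "editors"}], INTERVALS)
```

The outer "count" aggregator counts rows of the inner result rather than raw ingested rows, which is what turns the second level into a distinct count per page.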

GroupBy memory improvements

We’ve improved how multi-threaded groupBy queries use memory. This should help reduce memory pressure on nodes with concurrent, expensive groupBy queries.

Real-time ingestion stability improvements

We’ve seen some stability issues with real-time ingestion with a high number of concurrent persists and have added smarter throttling to handle this type of workload.

Additional features

multi-data center distribution (experimental)
request tracing
restore tasks (to restore archived segments)
memcached stability improvements
indexing service stability improvements
smarter autoscaling in the indexing service
numerous bug fixes
new documentation for production configurations

Things on our plate

Reducing CPU usage on the broker nodes when interacting with the cache (we are seeing query bottlenecks when merging too many results from memcached)
Having historical nodes populate memcached (so bySegment results are no longer returned and historical nodes can do their own local merging)
Consolidating batch and real-time ingestion schemas so we can move towards a simpler data ingestion model
Scaling groupBys with off-heap result merging
Improving real-time ingestion stability and performance by moving to more off-heap data structures
Autoscaling and sharding the real-time ingestion pipeline
Evaluating append only style updates for streaming data (https://github.com/metamx/druid/issues/418)


18 Jun 20:40

fjy

druid-0.6.52

6540b2a

Druid 0.6.52 - Stable

druid-0.6.52

[maven-release-plugin]  copy for tag druid-0.6.52
