20 Apr 17:47

6a44073

druid-0.18.0

Apache Druid 0.18.0 contains over 200 new features, performance enhancements, bug fixes, and major documentation improvements from 42 contributors. Check out the complete list of changes and everything tagged to the milestone.

# New Features

# Join support

Join is a key operation in data analytics. Prior to 0.18.0, Druid supported some join-related features, such as Lookups or semi-joins in SQL. However, the use cases for those features were pretty limited and, for other join use cases, users had to denormalize their datasources when they ingest data instead of joining them at query time, which could result in exploding data volume and long ingestion time.

Druid 0.18.0 supports real joins for the first time ever in its history. Druid supports INNER, LEFT, and CROSS joins for now. For native queries, the join datasource has been newly introduced to represent a join of two datasources. Currently, only the left-deep join is allowed. That means, only a table or another join datasource is allowed for the left datasource. For the right datasource, lookup, inline, or query datasources are allowed. Note that join of Druid datasources is not supported yet. There should be only one table datasource in the same join query.

Druid SQL also supports joins. Under the covers, SQL join queries are translated into one or several native queries that include join datasources. See Query translation for more details of SQL translation and best practices to write efficient queries.

When a join query is issued, the Broker first evaluates all datasources except for the base datasource which is the only table datasource in the query. The evaluation can include executing subqueries for query datasources. Once the Broker evaluates all non-base datasources, it replaces them with inline datasources and sends the rewritten query to data nodes (see the below "Query inlining in Brokers" section for more details). Data nodes use the hash join to process join queries. They build a hash table for each non-primary leaf datasource unless it already exists. Note that only lookup datasource currently has a pre-built hash table. See Query execution for more details about join query execution.

Joins can affect performance of your queries. In general, any queries including joins can be slower than equivalent queries against a denormalized datasource. The LOOKUP function could perform better than joins with lookup datasources. See Join performance for more details about join query performance and future plans for performance improvement.

#8728
#9545
#9111

# Query inlining in Brokers

Druid is now able to execute a nested query by inlining subqueries. Any type of subquery can be on top of any type of another, such as in the following example:

             topN
               |
       (join datasource)
         /          \
(table datasource)  groupBy

To execute this query, the Broker first evaluates the leaf groupBy subquery; it sends the subquery to data nodes and collects the result. The collected result is materialized in the Broker memory. Once the Broker collects all results for the groupBy query, it rewrites the topN query by replacing the leaf groupBy with an inline datasource which has the result of the groupBy query. Finally, the rewritten query is sent to data nodes to execute the topN query.

# Query laning and prioritization

When you run multiple queries of heterogenous workloads at a time, you may sometimes want to control the resource commitment for a query based on its priority. For example, you would want to limit the resources assigned to less important queries, so that important queries can be executed in time without being disrupted by less important ones.

Query laning allows you to control capacity utilization for heterogeneous query workloads. With laning, the broker examines and classifies a query for the purpose of assigning it to a 'lane'. Lanes have capacity limits, enforced by the Broker, that can be used to ensure sufficient resources are available for other lanes or for interactive queries (with no lane), or to limit overall throughput for queries within the lane.

Automatic query prioritization determines the query priority based on the configured strategy. The threshold-based prioritization strategy has been added; it automatically lowers the priority of queries that cross any of a configurable set of thresholds, such as how far in the past the data is, how large of an interval a query covers, or the number of segments taking part in a query.

See Query prioritization and laning for more details.

#6993
#9407
#9493

New dimension in query metrics

Since a native query containing subqueries can be executed part-by-part, a new subQueryId has been introduced. Each subquery has different subQueryIds but same queryId. The subQueryId is available as a new dimension in query metrics.

New configuration

A new druid.server.http.maxSubqueryRows configuration controls the maximum number of rows materialized in the Broker memory.

Please see Query execution for more details.

#9533

# SQL grouping sets

GROUPING SETS is now supported, allowing you to combine multiple GROUP BY clauses into one GROUP BY clause. This GROUPING SETS clause is internally translated into the groupBy query with subtotalsSpec. The LIMIT clause is now applied after subtotalsSpec, rather than applied to each grouping set.

#9122

# SQL Dynamic parameters

Druid now supports dynamic parameters for SQL. To use dynamic parameters, replace any literal in the query with a question mark (?) character. These question marks represent the places where the parameters will be bound at execution time. See SQL dynamic parameters for more details.

#6974

# Important Changes

# `applyLimitPushDownToSegments` is disabled by default

applyLimitPushDownToSegments was added in 0.17.0 to push down limit evaluation to queryable nodes, limiting results during segment scan for groupBy v2. This can lead to performance degradation, as reported in #9689, if many segments are involved in query processing. This is because “limit push down to segment scan” initializes an aggregation buffer per segment, the overhead for which is not negligible. Enable this configuration only if your query involves a relatively small number of segments per historical or realtime task.

#9711

# Roaring bitmaps as default

Druid supports two bitmap types, i.e., Roaring and CONCISE. Since Roaring bitmaps provide a better out-of-box experience (faster query speed in general), the default bitmap type is now switched to Roaring bitmaps. See Segment compression for more details about bitmaps.

#9548

# Complex metrics behavior change at ingestion time when SQL-compatible null handling is disabled (default mode)

When SQL-compatible null handling is disabled, the behavior of complex metric aggregation at ingestion time has now changed to be consistent with that at query time. The complex metrics are aggregated to the default 0 values for nulls instead of skipping them during ingestion.

#9484

# Array expression syntax change

Druid expression now supports typed constructors for creating arrays. Arrays can be defined with an explicit type. For example, <LONG>[1, 2, null] creates an array of LONG type containing 1, 2, and null. Note that you can still create an array without an explicit type. For example, [1, 2, null] is still a valid syntax to create an equivalent array. In this case, Druid will infer the type of array from its elements. This new syntax applies to empty arrays as well. <STRING>[], <DOUBLE>[], and <LONG>[] will create an empty array of STRING, DOUBLE, and LONG type, respectively.

#9367

# Enabling pending segments cleanup by default

The pendingSegments table in the metadata store is used to create unique new segment IDs for appending tasks such as Kafka/Kinesis indexing tasks or batch tasks of appending mode. Automatic pending segments cleanup was introduced in 0.12.0, but has been disabled by default prior to 0.18....

Assets 2

01 Apr 18:58

jon-wei

druid-0.17.1

11df7c4

druid-0.17.1

Apache Druid 0.17.1 is a security bug fix release that addresses the following CVE for LDAP authentication:

[CVE-2020-1958]: Apache Druid LDAP injection vulnerability (https://lists.apache.org/thread.html/r9d437371793b410f8a8e18f556d52d4bb68e18c537962f6a97f4945e%40%3Cdev.druid.apache.org%3E)

Assets 2

27 Jan 01:14

jon-wei

druid-0.17.0

f37b984

druid-0.17.0

Apache Druid 0.17.0 contains over 250 new features, performance enhancements, bug fixes, and major documentation improvements from 52 contributors. Check out the complete list of changes and everything tagged to the milestone.

Highlights

Batch ingestion improvements

Druid 0.17.0 includes a significant update to the native batch ingestion system. This update adds the internal framework to support non-text binary formats, with initial support for ORC and Parquet. Additionally, native batch tasks can now read data from HDFS.

This rework changes how the ingestion source and data format are specified in the ingestion task. To use the new features, please refer to the documentation on InputSources and InputFormats.

Please see the following documentation for details:
https://druid.apache.org/docs/0.17.0/ingestion/data-formats.html#input-format
https://druid.apache.org/docs/0.17.0/ingestion/native-batch.html#input-sources
https://druid.apache.org/docs/0.17.0/ingestion/native-batch.html#partitionsspec

#8812

Single dimension range partitioning for parallel native batch ingestion

The parallel index task now supports the single_dim type partitions spec, which allows for range-based partitioning on a single dimension.

Please see https://druid.apache.org/docs/0.17.0/ingestion/native-batch.html for details.

Compaction changes

Parallel index task split hints

The parallel indexing task now has a new configuration, splitHintSpec, in the tuningConfig to allow for operators to provide hints to control the amount of data that each first phase subtask reads. There is currently one split hint spec type, SegmentsSplitHintSpec, used for re-ingesting Druid segments.

Parallel auto-compaction

Auto-compaction can now use the parallel indexing task, allowing for greater compaction throughput.

To control the level of parallelism, the auto-compactiontuningConfig has new parameters, maxNumConcurrentSubTasks and splitHintSpec.

Please see https://druid.apache.org/docs/0.17.0/configuration/index.html#compaction-dynamic-configuration for details.

#8570

Stateful auto-compaction

Auto-compaction now uses the partitionSpec to track changes made by previous compaction tasks, allowing the coordinator to reduce redundant compaction operations.

Please see #8489 for details.

If you have auto-compaction enabled, please see the information under "Stateful auto-compaction changes" in the "Upgrading to Druid 0.17.0" section before upgrading.

Parallel query merging on brokers

The Druid broker can now opportunistically merge query results in parallel using multiple threads.

Please see druid.processing.merge.useParallelMergePool in the Broker section of the configuration reference for details on how to configure this new feature.

Parallel merging is enabled by default (controlled by the druid.processing.merge.useParallelMergePool property), and most users should not have to change any of the advanced configuration properties described in the configuration reference.

Additionally, merge parallelism can be controlled on a per-query basis using the query context. Information about the new query context parameters can be found at https://druid.apache.org/docs/0.17.0/querying/query-context.html.

#8578

SQL-compatible null handling

In 0.17.0, we have added official documentation for Druid's SQL-compatible null handling mode.

Please see https://druid.apache.org/docs/0.17.0/configuration/index.html#sql-compatible-null-handling and https://druid.apache.org/docs/0.17.0/design/segments.html#sql-compatible-null-handling for details.

Several bugs that existed in this previously undocumented mode have been fixed, particularly around null handling in numeric columns. We recommend that users begin to consider transitioning their clusters to this new mode after upgrading to 0.17.0.

The full list of null handling bugs fixed in 0.17.0 can be found at https://github.com/apache/druid/issues?utf8=%E2%9C%93&q=label%3A%22Area+-+Null+Handling%22+milestone%3A0.17.0+

LDAP extension

Druid now supports LDAP authentication. Authorization using LDAP groups is also supported by mapping LDAP groups to Druid roles.

LDAP authentication is handled by specifying an LDAP-type credentials validator.
Authorization using LDAP is handled by specifying an LDAP-type role provider, and defining LDAP group->Druid role mappings within Druid.

LDAP integration requires the druid-basic-security core extension. Please see https://druid.apache.org/docs/0.17.0/development/extensions-core/druid-basic-security.html for details.

As this is the first release with LDAP support, and there are a large variety of LDAP ecosystems, some LDAP use cases and features may not be supported yet. Please file an issue if you need enhancements to this new functionality.

#6972

Dropwizard emitter

A new Dropwizard metrics emitter has been added as a contrib extension.

The currently supported Dropwizard metrics types are counter, gauge, meter, timer and histogram. These metrics can be emitted using either a Console or JMX reporter.

Please see https://druid.apache.org/docs/0.17.0/design/extensions-contrib/dropwizard.html for details.

#7363

Self-discovery resource

A new pair of endpoints have been added to all Druid services that return information about whether the Druid service has received a confirmation that the service has been added to the cluster, from the central service discovery mechanism (currently ZooKeeper). These endpoints can be useful as health/ready checks.

The new endpoints are:

/status/selfDiscovered/status
/status/selfDiscovered

Please see the Druid API reference for details.

#6702
#9005

Supervisors system table

Task supervisors (e.g. Kafka or Kinesis supervisors) are now recorded in the system tables in a new sys.supervisors table.

Please see https://druid.apache.org/docs/0.17.0/querying/sql.html#supervisors-table for details.

#8547

Fast historical start with lazy loading

A new boolean configuration property for historicals, druid.segmentCache.lazyLoadOnStart, has been added.

This new property allows historicals to defer loading of a segment until the first time that segment is queried, which can significantly decrease historical startup times for clusters with a large number of segments.

Please see the configuration reference for details.

#6988

Historical segment cache distribution change

A new historical property, druid.segmentCache.locationSelectorStrategy, has been added.

If there are multiple segment storage locations specified in druid.segmentCache.locations, the new locationSelectorStrategy property allows the user to specify what strategy is used to fill the locations. Currently supported options are roundRobin and leastBytesUsed.

Please see the configuration reference for details.

#8038

New readiness endpoints

A new Broker endpoint has been added: /druid/broker/v1/readiness.

A new Historical endpoint has been added: /druid/historical/v1/readiness.

These endpoints are similar to the existing /druid/broker/v1/loadstatus and /druid/historical/v1/loadstatus endpoints.

They differ in that they do not require authentication/authorization checks, and instead of a JSON body they only return a 200 success or 503 HTTP response code.

#8841

Support task assignment based on MiddleManager categories

It is now possible to define a "category" name property for each MiddleManager. New worker select strategies that are category-aware have been added, allowing the user to control how tasks are assigned to MiddleManagers based on the configured categories.

Please see the documentation for druid.worker.category in the configuration reference, and the following links, for more details:
https://druid.apache.org/docs/0.17.0/configuration/index.htmlEqual-Distribution-With-Category-Spec
https://druid.apache.org/docs/0.17.0/configuration/index.html#Fill-Capacity-With-Category-Spec
https://druid.apache.org/docs/0.17.0/configuration/index.html#WorkerCategorySpec

#7066

Security vulnerability updates

A large number of dependencies have been updated to newer versions to address security vulnerabilities.

Please see the PRs below for details:

#8878
#8980

Upgrading to Druid 0.17.0

Select native query has been replaced

The deprecated Select native query type has been removed in 0.17.0.

If you have native queries that use Select, you need to modify them to use Scan instead. See the Scan query documentation (https://druid.apache.org/docs/0.17.0/querying/scan-query.html) for syntax and output format details.

For Druid SQL queries that use Select, no...

Assets 2

11 Dec 07:36

jon-wei

druid-0.16.1-incubating

144bd78

druid-0.16.1-incubating

Apache Druid 0.16.1-incubating is a bug fix and user experience improvement release that fixes a rolling upgrade issue, improves the startup scripts, and updates licensing information.

Bug Fixes
#8682 implement FiniteFirehoseFactory in InlineFirehose
#8905 Retrying with a backward compatible task type on unknown task type error in parallel indexing

User Experience Improvements
#8792 Use bundled ZooKeeper in tutorials.
#8794 Startup scripts: verify Java 8 (exactly), improve port/java verification messages.
#8942 Improve verify-default-ports to check both INADDR_ANY and 127.0.0.1.
#8798 Fix verify script.

Licensing Update
#8944 Add license for tutorial wiki data
#8968 Add licenses.yaml entry for Wikipedia sample data

Other
#8419 Bump Apache Thrift to 0.10.0

Updating from 0.16.0-incubating and earlier

PR #8905 fixes an issue with rolling upgrades when updating from earlier versions.
Credits
Thanks to everyone who contributed to this release!

@aditya-r-m
@clintropolis
@Fokko
@gianm
@jihoonson
@jon-wei

Apache Druid (incubating) is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.

Assets 2

25 Sep 00:36

clintropolis

druid-0.16.0-incubating

54d29e4

druid-0.16.0-incubating

Apache Druid 0.16.0-incubating contains over 350 new features, performance enhancements, bug fixes, and major documentation improvements from 50 contributors. Check out the complete list of changes and everything tagged to the milestone.

Highlights

# Performance

# 'Vectorized' query processing

An experimental 'vectorized' query execution engine is new in 0.16.0, which can provide a speed increase in the range of 1.3-3x for timeseries and group by v2 queries. It operates on the principle of batching operations on rows instead of processing a single row at a time, e.g. iterating bitmaps in batches instead of per row, reading column values in batches, filtering in batches, aggregating values in batches, and so on. This results in significantly fewer method calls, better memory locality, and increased cache efficiency.

This is an experimental feature, but we view it as the path forward for Druid query processing and are excited for feedback as we continue to improve and fill out missing features in upcoming releases.

Only timeseries and groupBy have vectorized engines.
GroupBy doesn't handle multi-value dimensions or granularity other than "all" yet.
Vector cursors cannot handle virtual columns or descending order.
Expressions are not supported anywhere: not as inputs to aggregators, in virtual functions, or in filters.
Only some aggregators have vectorized implementations: "count", "doubleSum", "floatSum", "longSum", "hyperUnique", and "filtered".
Only some filters have vectorized matchers: "selector", "bound", "in", "like", "regex", "search", "and", "or", and "not".
Dimension specs other than "default" don't work yet (no extraction functions or filtered dimension specs).

The feature can be enabled by setting "vectorize": true your query context (the default is false). This works both for Druid SQL and for native queries. When set to true, vectorization will be used if possible; otherwise, Druid will fall back to its non-vectorized query engine. You can also set it to "force", which will return an error if the query cannot be fully vectorized. This is helpful for confirming that vectorization is indeed being used.

You can control the block size during execution by setting the vectorSize query context parameter (default is 1000).

#7093
#6794

# GroupBy array-based result rows

groupBy v2 queries now use an array-based representation of result rows, rather than the map-based representation used by prior versions of Druid. This provides faster generation and processing of result sets. Out of the box this change is invisible and backwards-compatible; you will not have to change any configuration to reap the benefits of this more efficient format, and it will have no impact on cached results. Internally this format will always be utilized automatically by the broker in the queries that it issues to historicals. By default the results will be translated back to the existing 'map' based format at the broker before sending them back to the client.

However, if you would like to avoid the overhead of this translation, and get even faster results,resultAsArray may be set on the query context to directly pass through the new array based result row format. The schema is as follows, in order:

Timestamp (optional; only if granularity != ALL)
Dimensions (in order)
Aggregators (in order)
Post-aggregators (optional; in order, if present)

#8118
#8196

# Additional performance enhancements

The complete set of pull requests tagged as performance enhancements for 0.16 can be found here.

# "Minor" compaction

Users of the Kafka indexing service and compaction and who get a trickle of late data, can find a huge improvement in the form of a new concept called 'minor' compaction. Enabled by internal changes to how data segments are versioned, minor compaction is based on the idea of 'segment' based locking at indexing time instead of the current Druid locking behavior (which is now referred to as 'time chunk' locking). Segment locking as you might expect allows only the segments which are being compacted to be locked, while still allowing new 'appending' indexing tasks (like Kafka indexing tasks) to continue to run and create new segments, simulataneously. This is a big deal if you get a lot of late data, because the current behavior results in compaction tasks starving as higher priority realtime tasks hog the locks. This prevention of compaction tasks from optimizing the datasources segment sizes results in reduced overall performance.

To enable segment locking, you will need to set forceTimeChunkLock to false in the task context, or set druid.indexer.tasklock.forceTimeChunkLock=false in the Overlord configuration. However, beware, after enabling this feature, due to the changes in segment versioning, there is no rollback path built in, so once you upgrade to 0.16, you cannot downgrade to an older version of Druid. Because of this, we highly recommend confirming that Druid 0.16 is stable in your cluster before enabling this feature.

It has a humble name, but the changes of minor compaction run deep, and it is not possible to adequately describe the mechanisms that drive this in these release notes, so check out the proposal and PR for more details.

#7491
#7547

# Druid "indexer" process

The new Indexer process is an alternative to the MiddleManager + Peon task execution system. Instead of forking a separate JVM process per-task, the Indexer runs tasks as separate threads within a single JVM process. The Indexer is designed to be easier to configure and deploy compared to the MiddleManager + Peon system and to better enable resource sharing across tasks.

The advantage of the Indexer is that it allows query processing resources, lookups, cached authentication/authorization information, and much more to be shared between all running indexing task threads, giving each individual task access to a larger pool of resources and far fewer redundant actions done than is possible with the Peon model of execution where each task is isolated in its own process.

Using Indexer does come with one downside: the loss of process isolation provided by Peon processes means that a single task can potentially affect all running indexing tasks on that Indexer. The druid.worker.globalIngestionHeapLimitBytes and druid.worker.numConcurrentMerges configurations are meant to help minimize this. Additionally, task logs for indexer processes will be inline with the Indexer process log, and not persisted to deep storage.

You can start using indexing by supplying server indexer as the command-line argument to org.apache.druid.cli.Main when starting the service. To use Indexer in place of a MiddleManager and Peon, you should be able to adapt values from the configuration into the Indexer configuration, lifting druid.indexer.fork.property. configurations directly to the Indexer, and sizing heap and direct memory based on the Peon sizes multiplied by the number of task slots (unlike a MiddleManager, it does not accept the configurations druid.indexer.runner.javaOpts or druid.indexer.runner.javaOptsArray). See the indexer documentation for details.

#8107

# Native parallel batch indexing with shuffle

In 0.16.0, Druid's index_parallel native parallel batch indexing task now supports 'perfect' rollup with the implementation of a 2 stage shuffle process.

Tasks in stage 1 perform a secondary partitioning of rows on top of the standard time based partitioning of segment granularity, creating an intermediary data segment for each partition. Stage 2 tasks are each assigned a set of the partitionings created during stage 1, and will collect and combine the set of intermediary data segments which belong to that partitioning, allowing it to achieve complete rollup when building the final segments. At this time, only hash-based partitioning is supported.

This can be enabled by setting forceGuaranteedRollup to true in the tuningConfig; numShards in partitionsSpec and intervals in granularitySpec must also be set.

The Druid MiddleManager (or the new Indexer) processes have a new responsibility for these indexing tasks, serving the intermediary partition segments output of stage 1 into the stage 2 tasks, so depending on configuration and cluster size, the MiddleManager jvm configuration might need to be adjusted to increase heap allocation and http threads. These numbers are expected to scale with cluster size, as all MiddleManager or Indexer processes involved in a shuffle will need the ability to communicate with each other, but we do not expect the footprint to be significantly larger than it is currently. Optimistically we suggest trying with your existing configurations, and bumping up heap and http thread count only if issues are encountered.

#8061

#...

Assets 2

16 Aug 00:49

clintropolis

druid-0.15.1-incubating

c698daa

druid-0.15.1-incubating

Apache Druid 0.15.1-incubating is a bug fix release that includes important fixes for Apache Zookeeper based segment loading, the 'druid-datasketches' extension, and much more.

Bug Fixes

Coordinator

#8137 coordinator throwing exception trying to load segments (fixed by #8140)

Middlemanager

#7886 Middlemanager fails startup due to corrupt task files (fixed by #7917)
#8085 fix forking task runner task shutdown to be more graceful

Queries

#7777 timestamp_ceil function is either wrong or misleading (fixed by #7823)
#7820 subtotalsSpec and filtering returns no results (fixed by #7827)
#8013 Fix ExpressionVirtualColumn capabilities; fix groupBy's improper uses of StorageAdapter#getColumnCapabilities.

API

#6786 apache-druid-0.13.0-incubating router /druid/router/v1/brokers (fixed by #8026)
#8044 SupervisorManager: Add authorization checks to bulk endpoints.

Metrics Emitters

#8204 HttpPostEmitter throw Class cast exception when using emitAndReturnBatch (fixed by #8205)

Extensions

Datasketches

#7666 sketches-core-0.13.4
#8055 force native order when wrapping ByteBuffer

Kinesis Indexing Service

#7830 Kinesis: Fix getPartitionIds, should be checking isHasMoreShards.

Moving Average Query

#7999 Druid moving average query results in circular reference error (fixed by #8192)

Documentation Fixes

#8002 Improve pull-deps reference in extensions page.
#8003 Add missing reference to Materialized-View extension.
#8079 Fix documentation formatting
#8087 fix references to bin/supervise in tutorial docs

Updating from 0.15.0-incubating and earlier

Due to issue #8137, when updating from specifically 0.15.0-incubating to 0.15.1-incubating, it is recommended to update the Coordinator before the Historical servers to prevent segment unavailability during an upgrade (this is typically reversed). Upgrading from any version older than 0.15.0-incubating does not have these concerns and can be done normally.

Known Issues

Building Docker images is currently broken and will be fixed in the next release, see #8054 which is fixed by #8237 for more details.

Credits

Thanks to everyone who contributed to this release!

@AlexanderSaydakov
@ArtyomyuS
@ccl0326
@clintropolis
@gianm
@himanshug
@jihoonson
@legoscia
@leventov
@pjain1
@yurmix
@xueyumusic

Assets 2

27 Jun 18:36

jihoonson

druid-0.15.0-incubating

44c9323

druid-0.15.0-incubating

Apache Druid 0.15.0-incubating contains over 250 new features, performance/stability/documentation improvements, and bug fixes from 39 contributors. Major new features and improvements include:

New Data Loader UI
Support transactional Kafka topic
New Moving Average query
Time ordering for Scan query
New Moments Sketch aggregator
SQL enhancements
Light lookup module for routers
Core ORC extension
Core GCP extension
Document improvements

The full list of changes is here: https://github.com/apache/incubator-druid/pulls?q=is%3Apr+is%3Aclosed+milestone%3A0.15.0

Documentation for this release is at: http://druid.apache.org/docs/0.15.0-incubating/

Highlights

New Data Loader UI (Batch indexing part)

Druid has a new Data Loader UI which is integrated with the Druid Console. The new Data Loader UI shows some sampled data to easily verify the ingestion spec and generates the final ingestion spec automatically. The users are expected to easily issue batch index tasks instead of writing a JSON spec by themselves.

Added by @vogievetsky and @dclim in #7572 and #7531, respectively.

Support Kafka Transactional Topics

The Kafka indexing service now supports Kafka Transactional Topics.

Please note that only Kafka 0.11.0 or later versions are supported after this change.

Added by @surekhasaharan in #6496.

New Moving Average Query

A new query type was introduced to compute moving average.

Please see http://druid.apache.org/docs/0.15.0-incubating/development/extensions-contrib/moving-average-query.html for more details.

Added by @yurmix in #6430.

Time Ordering for Scan Query

The Scan query type now supports time ordering. Please see http://druid.apache.org/docs/0.15.0-incubating/querying/scan-query.html#time-ordering for more details.

Added by @justinborromeo in #7133.

New Moments Sketch Aggregator

The Moments Sketch is a new sketch type for approximate quantile computation. Please see http://druid.apache.org/docs/0.15.0-incubating/development/extensions-contrib/momentsketch-quantiles.html for more details.

Added by @edgan8 in #6581.

SQL enhancements

Druid community has been striving to enhance SQL support and now it's no longer experimental.

New SQL functions

LPAD and RPAD functions were added by @xueyumusic in #7388.
DEGREES and RADIANS functions were added by @xueyumusic in #7336.
STRING_FORMAT function was added by @gianm in #7327.
PARSE_LONG function was added by @gianm in #7326.
ROUND function was added by @gianm in #7224.
Trigonometric functions were added by @xueyumusic in #7182.

Autocomplete in Druid Console

Druid Console now supports autocomplete for SQL.

Added by @shuqi7 in #7244.

Time-ordered scan support for SQL

Druid SQL supports time-ordered scan query.

Added by @justinborromeo in #7373.

Lookups view added to the web console

You can now configure your lookups from the web console directly.

Added by @shuqi7 in #7259.

Misc web console improvements

"NoSQL" mode : #7493 [@shuqi7]

The web console now has a backup mode that allows it to function as best as it can if DruidSQL is disabled or unavailable.

Added compaction configuration dialog : #7242 [@shuqi7]

You can now configure the auto compaction settings for a data source from the Datasource view.

Auto wrap query with limit : #7449 [@vogievetsky]

The console query view will now (by default) wrap DruidSQL queries with a SELECT * FROM (...) LIMIT 1000 allowing you to enter queries like SELECT * FROM your_table without worrying about the impact to the cluster. You can still send 'raw' queries by selecting the option from the ... menu.

SQL explain query : #7402 [@shuqi7]

You can now click on the ... menu in the query view to get an explanation of the DruidSQL query.

Surface `is_overshadowed` as a column in the segments table #7555 , #7425 [@shuqi7][@surekhasaharan]

is_overshadowed column represents that this segment is overshadowed by any published segments. It can be useful to see what segments should be loaded by historicals. Please see http://druid.apache.org/docs/0.15.0-incubating/querying/sql.html for more details.

Improved status UI for actions on tasks, supervisors, and datasources : #7528 [shuqi7]

This PR condenses the actions list into a tidy menu and lets you see the detailed status for supervisors and tasks. New actions for datasources around loading and dropping data by interval has also been added.

Light Lookup Module for Routers

Light lookup module was introduced for Routers and they now need only minimum amount of memory. Please see http://druid.apache.org/docs/0.15.0-incubating/operations/basic-cluster-tuning.html#router for basic memory tuning.

Added by @clintropolis in #7222.

Core ORC extension

ORC extension is now promoted to a core extension. Please read the below 'Updating from 0.14.0-incubating and earlier' section if you are using the ORC extension in an earlier version of Druid.

Added by @clintropolis in #7138.

Core GCP extension

GCP extension is now promoted to a core extension. Please read the below 'Updating from 0.14.0-incubating and earlier' section if you are using the GCP extension in an earlier version of Druid.

Added by @drcrallen in #6953.

Document Improvements

Single-machine deployment example configurations and scripts

Several configurations and scripts were added for easy single machine setup. Please see http://druid.apache.org/docs/0.15.0-incubating/operations/single-server.html for details.

Added by @jon-wei in #7590.

Tool for migrating from local deep storage/Derby metadata

A new tool was added for easy migration from single machine to a cluster environment. Please see http://druid.apache.org/docs/0.15.0-incubating/operations/deep-storage-migration.html for details.

Added by @jon-wei in #7598.

Document for basic tuning guide

Documents for basic tuning guide was added. Please see http://druid.apache.org/docs/0.15.0-incubating/operations/basic-cluster-tuning.html for details.

Added by @jon-wei in #7629.

Security Improvement

The Druid system table now requires only mandatory permissions instead of the read permission for the whole sys database. Please see http://druid.apache.org/docs/0.15.0-incubating/development/extensions-core/druid-basic-security.html for details.

Added by @jon-wei in #7579.

Deprecated/removed

Drop support for automatic segment merge

The automatic segment merge by the coordinator is not supported anymore. Please use auto compaction instead.

Added by @jihoonson in #6883.

Drop support for `insert-segment-to-db` tool

In Druid 0.14.x or earlier, Druid stores segment metadata (descriptor.json file) in deep storage in addition to metadata store. This behavior has changed in 0.15.0 and it doesn't store segment metadata file in deep storage anymore. As a result, insert-segment-to-db tool is no longer supported as well since it works based on descriptor.json files in deep storage. Please see http://druid.apache.org/docs/0.15.0-incubating/operations/insert-segment-db.html for details.

Please note that kill task will fail if you're using HDFS as deep storage and descriptor.json file is missing in 0.14.x or earlier versions.

Added by @jihoonson in #6911.

Removed "useFallback" configuration for SQL

This option was removed since it generates unscalable query plans and doesn't work with some SQL functions.

Added by @gianm in #7567.

Removed a public API in `Co...

Assets 2

27 May 21:05

clintropolis

druid-0.14.2-incubating

1053684

druid-0.14.2-incubating

Apache Druid 0.14.2-incubating is a bug fix release that includes important fixes for the 'druid-datasketches' extension and the broker 'result' level caching.

Bug Fixes

#7607 thetaSketch(with sketches-core-0.13.1) in groupBy always return value no more than 16384
#6483 Exception during sketch aggregations while using Result level cache
#7621 NPE when both populateResultLevelCache and grandTotal are set

Credits

Thanks to everyone who contributed to this release!

@AlexanderSaydakov
@clintropolis
@jihoonson
@jon-wei

Assets 2

09 May 04:04

clintropolis

druid-0.14.1-incubating

e8a8816

druid-0.14.1-incubating

Apache Druid 0.14.1-incubating is a small patch release that includes a handful of bug and documentation fixes from 16 contributors.

Important Notice

This release fixes an issue with druid-datasketches extension with quantile sketches, but introduces another one with theta sketches that was confirmed after the release was finalized, caused by #7320 and described in #7607. If you utilize theta sketches, we recommend not upgrading to this release. This will be fixed in the next release of Druid by #7619.

Bug Fixes

use latest sketches-core-0.13.1 #7320
Adjust BufferAggregator.get() impls to return copies #7464
DoublesSketchComplexMetricSerde: Handle empty strings. #7429
handle empty sketches #7526
Adds backwards-compatible serde for SeekableStreamStartSequenceNumbers. #7512
Support Kafka supervisor adopting running tasks between versions #7212
Fix time-extraction topN with non-STRING outputType. #7257
Fix two issues with Coordinator -> Overlord communication. #7412
refactor druid-bloom-filter aggregators #7496
Fix encoded taskId check in chatHandlerResource #7520
Fix too many dentry cache slab objs#7508. #7509
Fix result-level cache for queries #7325
Fix flattening Avro Maps with Utf8 keys #7258
Write null byte when indexing numeric dimensions with Hadoop #7020
Batch hadoop ingestion job doesn't work correctly with custom segments table #7492
Fix aggregatorFactory meta merge exception #7504

Documentation Changes

Fix broken link due to Typo. #7513
Some docs optimization #6890
Updated Javascript Affinity config docs #7441
fix expressions docs operator table #7420
Fix conflicting information in configuration doc #7299
Add missing doc link for operations/http-compression.html #7110

Updating from 0.14.0-incubating and earlier

Kafka Ingestion

Updating from version 0.13.0-incubating or earlier directly to 0.14.1-incubating will not require downtime like the migration path to 0.14.0-incubating due to the issue described in #6958, which has been fixed for this release in #7212. Likewise, rolling updates from version 0.13.0-incubating and earlier should also work properly due to #7512.

Native Parallel Ingestion

Updating from 0.13.0-incubating directly to 0.14.1-incubating will not encounter any issues during a rolling update with mixed versions of middle managers due to the fixes in #7520, as could be experienced when updating to 0.14.0-incubating.

Credits

Thanks to everyone who contributed to this release!

@AlexanderSaydakov
@b-slim
@benhopp
@chrishardis
@clintropolis
@ferristseng
@es1220
@gianm
@jihoonson
@jon-wei
@justinborromeo
@kaka11chen
@samarthjain
@surekhasaharan
@zhaojiandong
@zhztheplayer

Assets 2

09 Apr 21:09

jon-wei

druid-0.14.0-incubating-rc3

f169ada

druid-0.14.0-incubating-rc3

[maven-release-plugin] prepare release druid-0.14.0-incubating

Assets 2

Releases: apache/druid

druid-0.18.0

# New Features

# Join support

# Query inlining in Brokers

# Query laning and prioritization

New dimension in query metrics

New configuration

# SQL grouping sets

# SQL Dynamic parameters

# Important Changes

# applyLimitPushDownToSegments is disabled by default

# Roaring bitmaps as default

# Complex metrics behavior change at ingestion time when SQL-compatible null handling is disabled (default mode)

# Array expression syntax change

# Enabling pending segments cleanup by default

druid-0.17.1

druid-0.17.0

Highlights

Batch ingestion improvements

Single dimension range partitioning for parallel native batch ingestion

Compaction changes

Parallel index task split hints

Parallel auto-compaction

Stateful auto-compaction

Parallel query merging on brokers

SQL-compatible null handling

LDAP extension

Dropwizard emitter

Self-discovery resource

Supervisors system table

Fast historical start with lazy loading

Historical segment cache distribution change

New readiness endpoints

Support task assignment based on MiddleManager categories

Security vulnerability updates

Upgrading to Druid 0.17.0

Select native query has been replaced

druid-0.16.1-incubating

druid-0.16.0-incubating

Highlights

# Performance

# 'Vectorized' query processing

# GroupBy array-based result rows

# Additional performance enhancements

# "Minor" compaction

# Druid "indexer" process

# Native parallel batch indexing with shuffle

#...

druid-0.15.1-incubating

Bug Fixes

Coordinator

Middlemanager

Queries

API

Metrics Emitters

Extensions

Datasketches

Kinesis Indexing Service

Moving Average Query

Documentation Fixes

Updating from 0.15.0-incubating and earlier

Known Issues

Credits

druid-0.15.0-incubating

Highlights

New Data Loader UI (Batch indexing part)

Support Kafka Transactional Topics

New Moving Average Query

Time Ordering for Scan Query

New Moments Sketch Aggregator

SQL enhancements

New SQL functions

Autocomplete in Druid Console

Time-ordered scan support for SQL

Lookups view added to the web console

Misc web console improvements

"NoSQL" mode : #7493 [@shuqi7]

Added compaction configuration dialog : #7242 [@shuqi7]

Auto wrap query with limit : #7449 [@vogievetsky]

# `applyLimitPushDownToSegments` is disabled by default

Surface `is_overshadowed` as a column in the segments table #7555 , #7425 [@shuqi7][@surekhasaharan]

Drop support for `insert-segment-to-db` tool