Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog.

[0.11.0] - 2024-01-09

Added

  • Detect and partition sparse region of UIDs (pull #224)

[0.10.0] - 2023-05-03

Changed

  • Estimator "maxLeaseId" renamed to "maxUid", as used with option dgraph.partitioner.uidRange.estimator (pull #221).
  • Upgraded gson and requests dependencies (pull #225).

Fixed

  • Work with maxUid values that cannot be parsed (pull #216).
  • Handle maxUid values larger than Long.MaxValue (pull #216).
  • Handle Dgraph data type "default" as plain strings (pull #223).
  • Support the full unsigned 64-bit value range of Dgraph uids, mapped into signed longs (pull #222).
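
The uid mapping from pull #222 can be illustrated with a small sketch. It assumes the mapping is a plain two's-complement reinterpretation of the 64 bits (the behaviour of Java's `Long.parseUnsignedLong`); the changelog does not describe the connector's actual implementation, and the helper names `uid_to_signed` and `signed_to_uid` are hypothetical. Python is used because its integers are unbounded, which makes the wrap-around explicit.

```python
MASK_64 = (1 << 64) - 1   # largest unsigned 64-bit value
SIGN_BIT = 1 << 63        # bit that marks a negative signed long


def uid_to_signed(uid: int) -> int:
    """Reinterpret an unsigned 64-bit uid as a signed 64-bit long (assumed mapping)."""
    if not 0 <= uid <= MASK_64:
        raise ValueError(f"uid out of unsigned 64-bit range: {uid}")
    # Values with the top bit set wrap around into the negative range.
    return uid - (1 << 64) if uid & SIGN_BIT else uid


def signed_to_uid(value: int) -> int:
    """Inverse of uid_to_signed: recover the unsigned uid from a signed long."""
    if not -(1 << 63) <= value < (1 << 63):
        raise ValueError(f"value out of signed 64-bit range: {value}")
    return value & MASK_64


# Dgraph prints uids as hex strings such as "0x1".
largest = int("0xffffffffffffffff", 16)
print(uid_to_signed(largest))        # -1
print(hex(signed_to_uid(-1)))        # 0xffffffffffffffff
```

Under this assumption, uids up to `0x7fffffffffffffff` keep their numeric value, while larger uids appear as negative longs and have to be converted back before being shown as uids.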

[0.9.0] - 2022-07-14

Changed

  • Moved to shaded Java Dgraph client (uk.co.gresearch.dgraph:dgraph4j-shaded:21.12.0-0).

[0.8.0] - 2022-01-19

Changed

  • Moved Java Dgraph client to 21.12.0.

Fixed

  • Support latest dgraph release 21.12.0 (pull #147)

[0.7.0] - 2021-10-01

Changed

  • Moved Java Dgraph client to 21.03.1.

Fixed

  • Support latest dgraph release 21.03.0 (pull #101)

[0.6.0] - 2021-03-05

Added

  • Adds support to read string predicates with language tags like <http://www.w3.org/2000/01/rdf-schema#label@en> (issue #63). This works with any source and mode except the node source in wide mode. Note that reading into GraphFrames is based on the wide mode, so only the untagged language strings can be read there. Filter pushdown is not supported for multi-language predicates yet (issue #68).
  • Adds readable exception and suggests next steps when GRPC fails with RESOURCE_EXHAUSTED code.
  • Missing maxLeaseId in cluster state response defaults to 1000L to avoid an exception.

Changed

  • Improves predicate partitioning on projection pushdown as it creates full partitions.
  • Fixes bug where a predicate value filter was not pushed down to Dgraph correctly, causing incorrect results (issue #82).
  • Fixes bug in reading geo and password data types.
  • Tests against Dgraph 20.03, 20.07 and 20.11.
  • Moved Java Dgraph client to 20.11.0.
  • Upgraded all dependencies to latest versions.

[0.5.0] - 2020-10-21

Added

  • Optionally reads all partitions within the same transaction. This guarantees a consistent snapshot of the graph (issue #6). However, concurrent mutations reduce the lifetime of such a transaction and cause an exception once that lifetime is exceeded.
  • Add Python API that mirrors the Scala API. The README.md fully documents how to load Dgraph data in PySpark.
  • Fixed dependency conflicts between connector dependencies and Spark by shading the Java Dgraph client and all its dependencies.

Changed

  • Refactored connector API, renamed spark.read.dgraph* methods to spark.read.dgraph.*.
  • Moved triples, edges and nodes sources from package uk.co.gresearch.spark.dgraph.connector to uk.co.gresearch.spark.dgraph.
  • Moved Java Dgraph client to 20.03.1 and Dgraph test cluster to 20.07.0.

[0.4.0] - 2020-07-24

Added

  • Add Spark filter pushdown and projection pushdown to improve efficiency when loading only subgraphs. Filters like .where($"revenue".isNotNull) and projections like .select($"subject", $"`dgraph.type`", $"revenue") will be pushed to Dgraph and only the relevant graph data will be read (issue #7).
  • Improve performance of PredicatePartitioner for multiple predicates per partition. Restores the default of 1000 predicates per partition from before 0.3.0 (issue #22).
  • The PredicatePartitioner combined with UidRangePartitioner is the default partitioner now.
  • Add stream-like reading of partitions from Dgraph. Partitions are split into smaller chunks. This lets Spark read Dgraph partitions of any size.
  • Add Dgraph metrics to measure throughput, visible in Spark UI Stages page and through SparkListener.

Security

[0.3.0] - 2020-06-22

Added

  • Load data from Dgraph cluster as GraphFrames GraphFrame.
  • Use exact uid cardinality for uid range partitioning. Combined with predicate partitioning, large predicates get split into more partitions than small predicates (issue #2).
  • Improve performance of PredicatePartitioner for a single predicate per partition (dgraph.partitioner.predicate.predicatesPerPartition=1). This becomes the new default for this partitioner.
  • Move to Spark 3.0.0 release (was 3.0.0-preview2).
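
The effect of using exact uid cardinality together with predicate partitioning can be sketched as follows. This is only an illustration of the idea that larger predicates get more partitions. The function `plan_partitions`, its `uids_per_partition` parameter and the example cardinalities are hypothetical and do not reflect the connector's actual code or option names.

```python
from typing import Dict


def plan_partitions(cardinalities: Dict[str, int], uids_per_partition: int) -> Dict[str, int]:
    """Return the number of uid-range partitions per predicate (hypothetical helper).

    cardinalities maps a predicate name to the exact number of uids that have it.
    """
    if uids_per_partition <= 0:
        raise ValueError("uids_per_partition must be positive")
    plan = {}
    for predicate, cardinality in cardinalities.items():
        # Ceiling division without floats; every predicate gets at least one partition.
        plan[predicate] = max(1, -(-cardinality // uids_per_partition))
    return plan


# Made-up cardinalities for two predicates.
print(plan_partitions({"name": 2_500_000, "revenue": 10}, uids_per_partition=1_000_000))
# {'name': 3, 'revenue': 1}
```

With an exact cardinality, the large predicate is split into three uid ranges while the small one stays in a single partition, so no partitions are created for uid ranges that hold no data for that predicate.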

Fixed

  • Dgraph groups with no predicates caused a NullPointerException.
  • Predicate names need to be escaped in Dgraph queries.

[0.2.0] - 2020-06-11

Added

  • Load nodes from Dgraph cluster as wide nodes (fully typed property columns).
  • Added dgraph.type and dgraph.graphql.schema predicates to be loaded from Dgraph cluster.

[0.1.0] - 2020-06-09

Initial release of the project

Added