
Releases: src-d/gemini

v0.7.1

28 Feb 12:51
1d29f9c

New features:

  • Allow configuring the Spark hashing job using environment variables #207

v0.7.0

22 Feb 14:12
4755486

Breaking changes:

  • Remove empty files from hashing #201
  • Exclude very small files from file similarity hashing #201

New features:

  • Remove duplicated repositories #200

Fixes:

  • hash/query: Use floats instead of doubles in WMH (weighted MinHash) #196
  • hash: Filter out files with a null language #198
  • hash: Fix segfault on macOS Mojave #204
  • report: Multiple performance optimizations #197

The documentation now also covers current limitations, performance tips, and a list of known bugs.

v0.6.0

07 Feb 18:02
b3a21fd

Breaking changes:

  • This version is incompatible with older versions of feature-extractor

New features:

  • feature-extractor: New Extract method to run multiple extractors on the same UAST in one call #188
  • feature-extractor: Use multiple processes for extractors #193

Fixes:

  • Filter out feature names that are too long #192
  • Update jgit-spark-connector to support remote standard/bare repositories
  • Performance improvements in the hashing step #194

v0.5.0

05 Feb 11:20
52ba94e

Breaking change:

  • Hashing now excludes vendor files

New features:

  • The number of workers in the feature-extractor service is now configurable

v0.4.0

30 Jan 16:28
df2042e

Breaking changes:

  • The DB schema has changed and is incompatible with the previous one

New features:

  • add support for Amazon Web Services S3 #183
  • require an empty DB keyspace to run hash #184

Bug fixes:

  • change the DB schema for document frequencies to avoid mutations that are too large #185

v0.3.0

25 Jan 11:22
62695cc

Breaking changes:

  • remove --format flag from report
  • add a new --cassandra flag that behaves like the former --format=group-by

New features:

  • add support for Google Cloud Storage #182
  • add JSON output format for report #181

v0.2.0

23 Oct 13:26
aabbaf0

The main feature:

Function-level similarity

Other improvements:

  • Update jgit-spark-connector (get the list of supported languages from bblfsh)
  • Keep document frequencies in the DB instead of a JSON file
  • Drop the Go client

v0.1.0

03 Jul 15:20
4091e36

The main features

Apache Spark application performance tuning on an actual cluster:

  • hash uses more performant Apache Spark configuration defaults for running on a cluster
  • hash: more performance improvements (JSON parser, broadcast DocFreq)
  • hash CLI shows a Feature Extractor error reporting summary

Other improvements:

  • CLI option parsing is refactored and shared among all commands
  • UX: CLI lists the actual .siva files to be processed by the engine
  • Go example of querying for duplicates is updated
  • Minor bug fixes and improvements

v0.0.4

12 Jun 15:44
6326807

The main features

The release is now packaged as .jar files, and SBT is no longer needed to run it

Other improvements:

  • better logging with -v

v0.0.3

11 May 10:14
0581d2c

The main feature:

This release brings file similarity to hash.

Other improvements:

  • docker-compose added to make it easier to try Gemini
  • use new parameters for file similarity provided by the ML team
  • documentation is slightly improved