haqer1/arangodb-spring-data-rational

ArangoDB Spring Data (Supporting Canonical COLLECTION-PER-CLASS Type of Inheritance) Rational(ly)

One maintainer & one contributor of the Spring Data ArangoDB project have refused to accept the inheritance-related contributions implemented here. That decision has obviously (& without doubt) been driven not by rational considerations about technology, but by something else. In the process of blocking these contributions, the upstream Spring Data ArangoDB project has become tainted by extremely severe inefficiencies & irrationality. The developer who provided the inheritance-related contributions implemented here believes that what is now in the upstream is so irrational that it cannot be used as is, & therefore has to use a fork that provides a rational & efficient implementation of a mainstream persistence-related inheritance type: the canonical COLLECTION-PER-CLASS approach (similar to the TABLE-PER-CLASS inheritance type in JPA). The expression "canonical COLLECTION-PER-CLASS type of inheritance" is used here not as something set in stone, but just to avoid a more ambiguous phrase like "classes that have a declared @Document annotation". The bottom line is that this implementation is now more efficient than upstream even for projects that don't use any persistence-related inheritance at all, because the upstream project has become inefficient & irrational for all records (whether or not any persistence-related inheritance is involved in them).

Inefficiencies & other issues in Spring Data ArangoDB OPTIMIZED/RESOLVED by this implementation

  1. Data pollution & disk space waste: the amount of data persisted, processed, etc. when using this implementation is up to 4 times smaller.
  2. This data pollution & disk space waste in turn entails more memory utilization at run time.
  3. This also entails unnecessary bandwidth utilization.
  4. All of the above also entail the usage of more CPU cycles at run time (considering storage of the unnecessary data, its retrieval, & processing).
  5. Issues 1 through 4 can lead to a considerable & even noticeable increase in latency (i.e., reduced responsiveness).
  6. Issues 1 through 4 eventually translate into additional operating expenses, & quite quickly when using a Platform as a Service (PaaS) (yes, there is also a cash aspect involved).
  7. Extremely absurd clutter when looking at the data, even for classes that have nothing to do with inheritance (namely, classes that neither extend another entity/document nor are extended). This is actually also a big factor once one takes a look at it, as can be seen below.
  8. Issue 7 will most likely have a negative effect on developer & DB admin productivity, by inhibiting concentration on useful data due to the presence of a lot of useless data.
  9. Unnecessary tight coupling of DB records to Java classes: as of now, moving any @Document Java class to a different package (or renaming any @Document class, even one that already has a customized collection name) requires running a query to update all relevant DB records. This is a major code smell & reveals a conflict (& a bizarre duplication) between the inheritance-support implementation focusing on non-@Documents & the semantics of the @Document value attribute: the former prevents the latter from freely decoupling DB records from the name of the Java class, so the upstream project now forces updating all relevant DB records whenever the name of the class changes.
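Why is no stored class name needed for canonical COLLECTION-PER-CLASS inheritance? A minimal, hypothetical sketch in plain Java (not the Spring Data ArangoDB API, & not necessarily how this fork implements it) assumes that each concrete @Document class maps to exactly one collection, & uses the fact that an ArangoDB _id has the form collection/key. The class names, collection names, & CollectionTypeResolver itself are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not the Spring Data ArangoDB API: under canonical
 * COLLECTION-PER-CLASS inheritance every concrete @Document class has its own
 * collection, so the collection part of an ArangoDB _id ("collection/key")
 * already identifies the Java type. No class name is stored in the record.
 */
public class CollectionTypeResolver {

    // Hypothetical entity hierarchy; in a real project each class would carry @Document.
    static class Animal { }
    static class Dog extends Animal { }
    static class Cat extends Animal { }

    // Mapping metadata (collection name -> concrete class) lives in code, not in the DB.
    private final Map<String, Class<?>> typeByCollection = new HashMap<>();

    void register(String collection, Class<?> type) {
        typeByCollection.put(collection, type);
    }

    /** Resolves the entity type from a document handle such as "dogs/12345". */
    Class<?> resolveFromId(String id) {
        int slash = id.indexOf('/');
        if (slash <= 0) {
            throw new IllegalArgumentException("Not a document handle: " + id);
        }
        Class<?> type = typeByCollection.get(id.substring(0, slash));
        if (type == null) {
            throw new IllegalArgumentException("No entity mapped to the collection of " + id);
        }
        return type;
    }

    public static void main(String[] args) {
        CollectionTypeResolver resolver = new CollectionTypeResolver();
        resolver.register("dogs", Dog.class);
        resolver.register("cats", Cat.class);
        System.out.println(resolver.resolveFromId("dogs/12345").getSimpleName()); // prints "Dog"
        System.out.println(resolver.resolveFromId("cats/67890").getSimpleName()); // prints "Cat"
    }
}
```

In this sketch, moving Dog to another package or renaming it only changes the in-code mapping; the records in the dogs collection stay untouched, which is the decoupling described in item 9.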

Visual examples of optimized inefficiencies

Single document record

Absurd in upstream Spring Data ArangoDB: [image]

Normal record provided with this implementation (up to 3.69 times smaller: 35 vs. 129 bytes): [image]

Edge collection record (graph traversal use-case)

Absurd in upstream Spring Data ArangoDB: [image]

Normal record provided with this implementation (in this example, 1.97 times smaller: 59 vs. 116 bytes): [image]
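The per-record figures quoted in the two examples above follow directly from the byte counts. A small Java sketch recomputes them, together with the share of each upstream record that is saved; only the four byte counts come from the examples, & the class & method names are illustrative.

```java
import java.util.Locale;

public class RecordSizeRatios {

    /** How many times smaller the record produced by this implementation is. */
    static double timesSmaller(int upstreamBytes, int rationalBytes) {
        return (double) upstreamBytes / rationalBytes;
    }

    /** Share of the upstream record size that is saved, in percent. */
    static double percentSaved(int upstreamBytes, int rationalBytes) {
        return 100.0 * (upstreamBytes - rationalBytes) / upstreamBytes;
    }

    public static void main(String[] args) {
        // Byte counts from the single-document and edge examples above.
        System.out.printf(Locale.ROOT, "document: %.2fx smaller, %.1f%% saved%n",
                timesSmaller(129, 35), percentSaved(129, 35)); // 3.69x smaller, 72.9% saved
        System.out.printf(Locale.ROOT, "edge: %.2fx smaller, %.1f%% saved%n",
                timesSmaller(116, 59), percentSaved(116, 59)); // 1.97x smaller, 49.1% saved
    }
}
```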

A record for a class that DOESN'T extend another entity/document, & is not extended

Absurd in upstream Spring Data ArangoDB: [image]

Normal record provided with this implementation: [image]

A record for a class that has a property of type List with 2 entities/documents in it

Absurd in upstream Spring Data ArangoDB (with an (automatic) join, in which case redundant data would be present in all 3 entities/documents that get retrieved): [image]

Normal record provided with this implementation: [image]

Cumulative effect of optimizations (for JOINs, multiple records matching a query, etc.)

Taking the single-record example & assuming that a single record is 3.69 times smaller (35 vs. 129 bytes), the effect would be cumulative in each of the following 2 (also quite simple) examples, each involving JOINs into 2 other collections: the absolute size of the data (stored, transferred, processed, etc.) would be multiplied by a factor of 3 (i.e., 1 + 1 + 1 in example 1, or 1 + 2 in example 2):

  1. @Document class A { B b; } @Document class B { C c; } @Document class C { }

  2. @Document class D { C c; E e; } @Document class C { } @Document class E { }

A. If one adds to example 2 the eager retrieval of a simple List of 5 instances of some class F, the cumulative effect would be even more noticeable:

@Document class D { C c; E e; List<F> f; } @Document class F { }

So in this example, the absolute size of the data (stored, transferred, processed, etc.) would be multiplied by a factor of 8 (i.e., 3 documents as in example 2, plus 5 more for the list). Thus, a smaller size per record provides a cumulative effect for operations involving JOINs, multiple records matching a query, etc. (with propagating efficiencies & benefits in terms of memory, bandwidth, CPU, latency, operational expenses, & productivity, as well as visual & perceptional aspects (simpler due to less clutter, less ambiguous)).
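The document counts above can be checked with a small, hypothetical Java model in which every retrieved document eagerly pulls in the documents it references. It assumes every record has the 129-byte (upstream) vs. 35-byte size of the single-record example; Doc & the example factory methods are made-up names, not part of Spring Data ArangoDB.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CumulativeRetrieval {

    /** A retrieved document plus the documents it references (retrieved eagerly via JOIN). */
    static final class Doc {
        final String name;
        final List<Doc> refs;

        Doc(String name, Doc... refs) {
            this.name = name;
            this.refs = new ArrayList<>(Arrays.asList(refs));
        }

        /** Records touched by one retrieval: this document plus everything it pulls in. */
        int count() {
            int n = 1;
            for (Doc ref : refs) {
                n += ref.count();
            }
            return n;
        }
    }

    static final int UPSTREAM_BYTES = 129; // assumed size of every record, upstream
    static final int RATIONAL_BYTES = 35;  // assumed size of every record, this implementation

    // Example 1: A -> B -> C
    static Doc example1() { return new Doc("A", new Doc("B", new Doc("C"))); }

    // Example 2: D -> C, E
    static Doc example2() { return new Doc("D", new Doc("C"), new Doc("E")); }

    // Example A: example 2 plus a List<F> of size 5
    static Doc exampleA() {
        Doc d = example2();
        for (int i = 0; i < 5; i++) {
            d.refs.add(new Doc("F"));
        }
        return d;
    }

    public static void main(String[] args) {
        for (Doc d : Arrays.asList(example1(), example2(), exampleA())) {
            int n = d.count();
            System.out.println(d.name + ": " + n + " records, "
                    + n * UPSTREAM_BYTES + " vs. " + n * RATIONAL_BYTES + " bytes");
        }
    }
}
```

Running main prints 3, 3, & 8 records for examples 1, 2, & A, i.e., 387 vs. 105 bytes for each of the first two & 1032 vs. 280 bytes for example A.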

B. For a graph of x entities with x edges, the effect would also be cumulative (the effect for x entities AND the effect for x edges), potentially doubling some of the effects (e.g., the amount of storage used) for graph traversal use cases.
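For the graph case, a rough estimate can combine the single-document sizes (129 vs. 35 bytes) with the sizes from the edge example (116 vs. 59 bytes). This minimal sketch assumes every vertex & every edge has exactly those sizes, which real data will not; x = 10,000 & the class name are arbitrary choices.

```java
import java.util.Locale;

public class GraphStorageEstimate {

    static final long UPSTREAM_VERTEX = 129, RATIONAL_VERTEX = 35; // bytes per vertex record (assumed)
    static final long UPSTREAM_EDGE = 116, RATIONAL_EDGE = 59;     // bytes per edge record (assumed)

    /** Bytes for a graph of x vertices and x edges, upstream. */
    static long upstream(long x) { return x * (UPSTREAM_VERTEX + UPSTREAM_EDGE); }

    /** Bytes for a graph of x vertices and x edges, this implementation. */
    static long rational(long x) { return x * (RATIONAL_VERTEX + RATIONAL_EDGE); }

    public static void main(String[] args) {
        long x = 10_000;
        long vertexSavings = x * (UPSTREAM_VERTEX - RATIONAL_VERTEX);
        long totalSavings = upstream(x) - rational(x);
        System.out.printf(Locale.ROOT, "upstream %d bytes, rational %d bytes%n", upstream(x), rational(x));
        System.out.printf(Locale.ROOT, "saved on vertices alone: %d bytes; vertices + edges: %d bytes%n",
                vertexSavings, totalSavings);
    }
}
```

With these numbers, upstream storage is 245 bytes per vertex-edge pair vs. 129 bytes for the vertex alone (close to double), while this implementation needs 94 bytes per pair.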

Cumulative efficiencies: simple sample calculations for various numbers of persisted entities

Assuming the average record size difference to be as shown in the single-record example above: [image]
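A minimal sketch of this kind of calculation, assuming every record has the size of the single-record example (129 bytes upstream vs. 35 bytes with this implementation). The record counts are arbitrary, & the results are estimates, not measurements.

```java
import java.util.Locale;

public class StorageByRecordCount {

    static final long UPSTREAM_BYTES = 129; // assumed average record size, upstream
    static final long RATIONAL_BYTES = 35;  // assumed average record size, this implementation

    static long upstream(long n) { return n * UPSTREAM_BYTES; }
    static long rational(long n) { return n * RATIONAL_BYTES; }

    public static void main(String[] args) {
        long[] counts = {1_000L, 1_000_000L, 100_000_000L};
        System.out.printf(Locale.ROOT, "%12s %14s %14s %14s%n", "records", "upstream", "rational", "saved");
        for (long n : counts) {
            // long arithmetic: 100 million records * 129 bytes does not fit in an int
            System.out.printf(Locale.ROOT, "%12d %14d %14d %14d%n",
                    n, upstream(n), rational(n), upstream(n) - rational(n));
        }
    }
}
```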

Conclusion: this implementation is significantly more efficient in terms of disk space, memory, bandwidth, & CPU usage, as well as in terms of latency, operational expenses, & productivity; & it is better in terms of visual & perceptional aspects (simpler due to less clutter, less ambiguous), & in terms of DB records not being tightly coupled to Java classes.

Test report comparisons (showing that all upstream functionality is preserved & just optimized: not less, just better)

Release 2.1.7 vs. 2.1.7.1-rational

Modified (branch) | Upstream (original) | Diff

Feel free to repeat the steps shown in Diff using the following (more recent) tag pairs:

  • 2.3.0 (upstream) & 2.3.0.1 (-rational)
  • 3.2.3 (upstream) & 3.2.3.1 (-rational)

Optimization for edges and graph traversal branch

Modified (branch) | Upstream (original)

PR 41 vs. equivalent upstream 2.1.4-SNAPSHOT

Modified (branch) | Upstream (original) | Diff

Brief history

ArangoDB Spring Data had no support for inheritance in @Documents, so on March 13, 2018 an issue was logged focusing on support for a mainstream inheritance type: canonical COLLECTION-PER-CLASS (similar to TABLE-PER-CLASS in JPA). On March 24th, a pull request was provided for it. This pull request didn't receive the same quick treatment that others got. On April 5th, a strange issue was opened by another contributor to support inheritance in properties of interface type. That strange request was followed by a request not to merge the pull request for mainstream inheritance support of type COLLECTION-PER-CLASS. On April 12th, that same contributor submitted a pull request that focuses on inheritance in non-@Documents by persisting the fully-qualified class name. On April 17th, despite it having been stated that storing the fully-qualified class name is 100% unnecessary for canonical COLLECTION-PER-CLASS type of inheritance, that alternative PR got merged into upstream Spring Data ArangoDB. The original pull request had been updated to avoid persisting the fully-qualified class name for @Documents (because it's unnecessary & causes many issues & inefficiencies), while leaving other cases as is (i.e., up to whatever ArangoDB Spring Data in general wants to do with them, such as based on the alternative PR). Yet despite the fact that the inefficiencies introduced by the alternative PR had been clearly shown, the maintainer of ArangoDB Spring Data refused to merge the original pull request & closed it on May 22nd. To be clear, the developer of this fork never requested that the alternative PR not be merged, or that it be reverted: it was the other developer who requested that the contributions here not be merged, & that's how the PR got closed by the maintainer.
On July 2, 2018, the maintainer also took the insane position of refusing to accept a PR that removes fully-qualified class name storage, retrieval, & processing for @Edges, without being able to provide a single (!) use case for which fully-qualified class names need to be stored for @Edges. Thus, to have a rational mapping implementation for ArangoDB Spring Data, there is a need for an alternative implementation: hence, this project.
