Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: dynamic rather than static parameters #105

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

msalib
Copy link

@msalib msalib commented Nov 14, 2022

  • I agree to follow the project's code of conduct.
  • I added an entry to rstar/CHANGELOG.md if knowledge of this change could be valuable to users.

Hi! This PR makes RTree parameters a single struct, named Params instead of a trait that clients have to provide as part of the type signature. I need that functionality for a project I'm working where the parameters have to be supplied externally and can't be baked into code statically -- we're experimenting with using RTrees for partitioning rather than lookups.

This change produces a 3%-15% improvement on benchmarks.

Along the way, I've moved some checks into Params::new or into a static assert (in the case of the point dimension greater than 1 check).

Although I made this PR to provide more flexibility to clients and although it has improved benchmarks, I think this change has made the API simpler as well. Params is just a single struct with a constructor. You don't have to deal with weird type magic. RTree<T> has a single generic parameter, T. In earlier versions of this PR, I made the insertion strategy another generic parameter for RTree with a default value, but then dropped it because it cluttered up the code significantly; this crate is almost 5 years old and has only ever had one insertion strategy, so I didn't see the benefit in adding another generic parameter to support genericity that we haven't needed and are unlikely to need in the future.

Tests and benchmarks pass but I still need to clean up the docs.

@Andlon
Copy link
Contributor

Andlon commented Dec 8, 2022

Some comments from an rstar user who's not so familiar with the rstar code base itself:

From a user's perspective, I think this is probably a good idea. It reduces "needless" generics and therefore overall complexity of the API, and it seems easier to work with and enables changing settings at runtime.

What's the reason for the improvement in the benchmarks? I'd have expected static parameters not to be slower than runtime parameters...

I'm concerned about the Params::new(usize, usize, usize) signature, as it's easy to get the order of parameters mixed up. What about a builder-like pattern instead? Perhaps without a separate builder? This has the added benefit that you can add new parameters in the future without a breaking change as long as you provide a default. Example:

let params = Params::default() //could also be Params::new()
    .with_max_size(my_max_size)
    .with_reinsertion_count(2);
// we omit .with_min_size which leaves it the default

@Andlon
Copy link
Contributor

Andlon commented Dec 8, 2022

Oh, and, perhaps you consider not implementing Copy for Params, since you might want to add non-copy settings in the future. The amount of times you need to explicitly .clone() such a struct should in any case be very small as I expect you take it by value in the construction and only reference from then on (?).

@kylebarron
Copy link
Contributor

this crate is almost 5 years old and has only ever had one insertion strategy, so I didn't see the benefit in adding another generic parameter to support genericity that we haven't needed and are unlikely to need in the future.

I haven't looked at the code changes yet; does this PR remove the insertion strategy altogether? I had been thinking I might try to dabble with an implementation of STR insertion strategy for static trees for data partitioning.

@rmanoka
Copy link
Contributor

rmanoka commented Dec 9, 2022

This PR currently addresses two concerns: (1) supporting dynamic params; (2) simplifying the RTree type generics. I think the former is definitely a useful addition, but am not convinced about the latter. May I suggest the following alternative to only support (1) and discuss (2) in a separate PR?

Reg. dynamic params, the current trait does indeed not support it, and it was probably designed to statically verify the fact that the sizes in the params do not change. Instead, we could change it to:

pub trait RTreeParams: Send + Sync {
    fn min_size(&self) -> usize;
    fn max_size(&self) -> usize;
    fn reinsertion_count(&self) -> usize;

    type DefaultInsertionStrategy: InsertionStrategy;
}

We can then implement this trait on the Params struct defined in the PR. Now of course, the static guarantee is lost, and we need to emphasize that the min_size, etc. should not change across calls on the same object. This anyway can't be done unless we use interior mutability or just produce some random numbers in the function call.

Or we could make the function into something like fn min_size(&self) -> &usize to ensure it is a stored reference that is not being computed per call.

This way, we maintain the current generics and we can decide on dropping that independently.

@msalib would you be able to try this in this PR?

/// maintain a good tree quality. Must be smaller than `MAX_SIZE` - `MIN_SIZE`.
/// Larger values will improve query times but increase insertion time.
const REINSERTION_COUNT: usize;
/// hi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants