Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bodybuilder v3.0 #282

Open
johannes-scharlach opened this issue Dec 27, 2020 · 8 comments
Open

Bodybuilder v3.0 #282

johannes-scharlach opened this issue Dec 27, 2020 · 8 comments

Comments

@johannes-scharlach
Copy link
Collaborator

Given our recent discussions in #281 I had a bit of an itch in my fingers to try if a new and simpler API to bodybuilder could be created given everything we've learnt over the years and https://github.com/johannes-scharlach/elasticbuilder is my approach to that. It already passes most of the test cases which have been written for this repository, which seems like a big step to prove that it indeed is possible to provide the same functionality with what is hopefully a smaller interface.

@danpaz @ferronrsmith (and if you're interested also @quinnlangille) I'm looking forward to your feedback to see if we can/should turn that into v3 of bodybuilder. It might not be trivial to find a migration path, since it would be a complete rewrite of the library.

Example usage:

    const bb = new BodyBuilder();
    bb.query
      .must('match', 'message', 'this is a test')
      .filter('term', 'user', 'kimchy')
      .filter('term', 'user', 'herald')
      .should('term', 'user', 'johnny')
      .mustNot('term', 'user', 'cassie');
    bb.aggs
      .add('agg_terms_user', 'terms', 'user')
      .add(
        'agg_diversified_sampler_user.id',
        'diversified_sampler',
        {
          field: 'user.id',
          shard_size: 200,
        },
        new AggregationBuilder({
          keywords: {
            significant_terms: {
              field: 'text',
            },
          },
        })
      );
    const result = bb.build();

which brings up a couple of new questions:

  1. Now the user will be aware of concepts like the query builder and the aggregation builder
  2. given that those builders exist, it is no longer possible to do a single chaining which combines them – a real shame. Should we duplicate that interface on bodybuilder and muddle the responsibilities again?
  3. the real difference in API can be seen in passing in new AggregationBuilder and these nested queries being evaluated lazily when you call build(). This would have helped me in the past, but is it worth it to expose those new concepts?

I'm happy to continue the work in this repo if you're interested in v3 or I might even go in the direction of generalising the builder interface description as @ferronrsmith suggested to also construct other queries (e.g. mongodb).

@ferronrsmith
Copy link
Collaborator

ferronrsmith commented Dec 28, 2020

Removing single chaining would be a mistake. Changing an API used by 100s if not 1000s of devs is a bad idea.

That said I like the direction, as it relates to the simplification and lazy loading.

@quinnlangille
Copy link
Contributor

quinnlangille commented Dec 28, 2020

I like this! Moving to a more simplified api would allow for some good typing upgrades, and an overall boost to developer experience. I agree with @ferronrsmith in that we probably want to keep the single chaining. Regarding point 3, I think exposing the builder classes could be cool. For example, in a past version of our ES setup we would pass around a single instance of BB to many methods before sending it to ES:

class ProductAggBuilder {
    buildAgg(bb: BodyBuilder) {
          return bb.agg(...)
    }
)

This got a bit cumbersome with different business logic, and ended up being tough to follow. With the exposed classes, we could have written the above method like:

class ProductAggBuilder {
    buildAgg() {
          return new AggregationBuilder(...)
    }
)

I'm not sure if that's how others use BB, but having the exposed builders as an option would let devs write code that is easier to read and test, IMO.

Our usage has since changed, but we still rely on BB for all our ES stuff at the office. I know we have done a lot of wrapping/hacking to get around some things, so I can check with some of the devs for any feedback/input they have and get back to you!

@johannes-scharlach
Copy link
Collaborator Author

Thank you both very much for your candid feedback.

I moved back to single chaining. It turns out that TS by now has all the tools to allow you to merge multiple objects/builders into one. The above example now looks like this

    const bb = bodyBuilder();
    bb.must('match', 'message', 'this is a test')
      .filter('term', 'user', 'kimchy')
      .filter('term', 'user', 'herald')
      .should('term', 'user', 'johnny')
      .mustNot('term', 'user', 'cassie')
      .aggregation('agg_terms_user', 'terms', 'user')
      .aggregation(
        'agg_diversified_sampler_user.id',
        'diversified_sampler',
        {
          field: 'user.id',
          shard_size: 200,
        },
        bodyBuilder({
          aggs: {
            keywords: {
              significant_terms: {
                field: 'text',
              },
            },
          },
        })
      );
    const result = bb.build();

Feel free to explore the code base at https://github.com/johannes-scharlach/elasticbuilder/tree/composing-builders

Some important goals that have been reached are

  1. new builders (QueryBuilder, AggregationBuilder and BodyBuilder) can be created from existing builders and/or the literal representation of that ES body part
  2. BodyBuilder extends AggregationBuilder, QueryBuilder -> you can always just use a BodyBuilder if you want to, but also a more specialised builder is exposed for a smaller interface which may suffice
  3. Single object chaining: If you call .aggregation on a BodyBuilder, you get the bodybuilder. If you call it on an AggregationBuilder, you get the same instance of the aggregation builder.
  4. Lazy evaluation of any nested AggregationBuilder and QueryBuilder

@quinnlangille thank you very much for the detail, I used to have similar problems and workarounds. I'm looking forward to more real life use cases, since I don't use bodybuilder at work anymore :/

Open questions:

  1. when creating a new builder from an existing one, should it be evaluated eagerly or lazily?
  2. should callback-style nested aggregation builders continue to be supported (if yes, we might be able to create a codemod to support migrating from existing versions)
  3. Can we simplify the API even more? E.g. the queryBuilder().must still has a very flexible API, which is fine if things are intuitive to the user, but we usually get a lot of questions on this repo because the very flexible API of bodybuilder isn't obvious.

@quinnlangille
Copy link
Contributor

quinnlangille commented Dec 28, 2020

should callback-style nested aggregation builders continue to be supported (if yes, we might be able to create a codemod to support migrating from existing versions)

I think having a tighter syntax for nested aggs would make the api even more flexible than the last-argument-must-be-nested-callback status quo. For example:

     const agg = bb.aggregation(
        'agg_diversified_sampler_user',
        'diversified_sampler',
        {
          field: 'user',
          shard_size: 200,
          nested: bb.aggregation(
              'agg_user_id',
              'term'
              {
                 field: 'id'
              }
          )
        },
      );

To me, this would be a big win for new/onboarding devs + general readability.

Can we simplify the API even more? E.g. the queryBuilder().must still has a very flexible API, which is fine if things are intuitive to the user, but we usually get a lot of questions on this repo because the very flexible API of bodybuilder isn't obvious.

Even though the query api is less complex than the agg api, I think it would benefit from the same treatment by using a strongly typed config object in lieu of many arguments. ex:

bb.must({ type: 'match', field: 'message', query: 'this is a test' })
  .filter({ type: 'term', field: 'user', query: 'kimchy' })

Something like this would let a developer programatically construct their BB chain arguments with solid type safety. It would also making changing the api more straightforward/safer for devs working on the BB package itself. For example, if ES changes in a dramatic way and BB wants to support, the status quo would be to add more overloads and further complicate the API. Using a config object like in the example above would let us just add optional fields. There are some downsides to that approach as well, the biggest (IMO) is that it would be more difficult to visually parse than status quo if you do know BBs api already.

Also, I should note that I'm totally biased here - an implementation like this would better suit my teams usage. I'm not ramped up on other/common usages of BB, so could be way off base.

@johannes-scharlach
Copy link
Collaborator Author

I don't really see the difference between your example for nested aggregations and

const agg: AggregationBuilder = AB().aggregation(
  'agg_diversified_sampler_user',
  'diversified_sampler',
  {
    field: 'user',
    shard_size: 200,
  },
  BB().aggregation(
    'agg_user_id',
    'term',
    { field: 'id' }
  )
);

but my question is if additionally we should also allow

const agg: AggregationBuilder = AB().aggregation(
  'agg_diversified_sampler_user',
  'diversified_sampler',
  {
    field: 'user',
    shard_size: 200,
  },
  (ab: AggregationBuilder) => {
    ab.aggregation('agg_user_id', 'term', { field: 'id' });
  }
);

I definitely don't want an object based approach that is different to existing ES syntax. Using a different syntax must lead to simplification in building queries (and I think it's questionable if .filter('term', 'user', 'kimchy') is really simpler than .filter({ term: { user: 'kimchy' } })) or advantages in composing queries (e.g. nested aggregations or nested queries becoming cumbersome to compose).

I can warm up to

qb.must({ match: { message: 'this is a test' } })
  .filter({ term: { user: 'kimchy' } })

which could also be strongly typed (though any strong typing requires us to type the full ES query and aggregation API, which was so far a non-goal of this library)

@quinnlangille
Copy link
Contributor

(though any strong typing requires us to type the full ES query and aggregation API, which was so far a non-goal of this library)

Yes that's a good point, and I'll admit the "spirit" of the lib is a perspective that I'm totally lacking. If providing exhaustive types for ES is not something the lib wants to do, then I don't think the object based approach is a direction it should take either.

I wonder if there is some level of typing we could use without having to integrate the full ES query and it's APIs? We could probably add enums for the builder type arguments, rather than just string, and also check for required fields only in any config object. I guess it depends on the goal of not exhaustively typing ES (similar to what you suggested here) - if it's maintenance, then I think some additional typing would be fine, but if it's flexibility/zero-opinions then we should probably just leave it as is.

re: the nested agg example, I was just spitballing so no worries lol. I don't see a need to keep the callback-style syntax (other than less migration), passing BB().aggregation(...) is nicer syntax IMO but since it's so similar having both feels bloated.

@johannes-scharlach
Copy link
Collaborator Author

@danpaz I'd really appreciate your feedback here, too

@danpaz
Copy link
Owner

danpaz commented Jan 10, 2021

@johannes-scharlach thank you for starting this proposal, the discussion here has surfaced several problems I wasn't really aware of with usage of bodybuilder, since I also don't use bb or elasticsearch at work anymore. Let me see if I can summarize the main problems to solve that have been raised in the discussion so far:

  1. The callback-as-last-arg approach of nesting aggregations is a pain point, first because it's unintuitive, and second because it requires passing a single object around to achieve the nesting. Exposing something like an AggregationBuilder would simplify usage and allow for lazy loading. I totally agree there's an opportunity to improve the API here.
  2. Correct usage of the API isn't obvious, and we'd like to provide some hints to the developer on correct usage via stronger typing. This was indeed an early goal of the library but Typescript was not popular yet and I didn't think runtime checks would be ergonomic. Typing the entire ES syntax is probably not worth it, but I think some basic checks of commonly used fields would be beneficial.
  3. The API surface area is large and has some superfluous parts that could be removed.

My question to the group here is: do we actually need a major version bump to solve these problems? Certainly 3 would require a major bump to slim the API surface, but I feel like 1 and 2 could be added in a non-breaking way. E.g. we could continue to support the nested callbacks, and also expose the nested builders. Similarly, perhaps we could add a config-object call style without changing the list-of-strings style for the most basic usage.

My feeling is that bodybuilder is one of those libraries that ought to "just work" without too much thought from the developer. I think that's one of the reasons it's still popular after 5(!) years. We should just be sure that a major bump provides enough benefit to the user to justify the change.

I want to emphasize that though I was the original author of this library, these are really only my opinions and, especially since I don't use the library anymore, should be taken as input rather than decision :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants