Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search speed greatly reduced when using Counts.publish() #59

Open
Diginized opened this issue Aug 9, 2015 · 9 comments
Open

Search speed greatly reduced when using Counts.publish() #59

Diginized opened this issue Aug 9, 2015 · 9 comments

Comments

@Diginized
Copy link

First, thanks for creating this package, I was accomplishing some of this with custom method calls but this offers many handy features. I also find it much easier to use than the many others out there that seek to help with counts and paging.

The one downside is that it seems to add a lot of lag to my searches and it's doing so in a blocking (synchronous) fashion. I don't really care if the total count takes a second or two to respond, if it can run in the background and just update the template when it's ready. But holding up the rest of the app is a bummer. Is there's a way to get this to run asynchronously?

Example:
The following code executes and returns/displays 30 records to browser almost instantly (at most 5ms after button click)

    Meteor.publish("search", function (searchParameters) {
        cursor = Contacts.find(searchParameters); 
        //with Counts disabled, results on screen feel instant. 
        //counts = Counts.publish(this, 'totalSearchResultsCounter', Contacts.find(searchParameters));
        return cursor;   
    });

The same code with Counts.publish() added is taking between .5 and 2 full seconds before the first signs of life are hitting the browser, and then things drop in sequentially (the total count from Counts.publish (y), then the count of records (x) on page (showing x of y records), then the records themselves)

    Meteor.publish("search", function (searchParameters) {
        cursor = Contacts.find(searchParameters); 
        //with Counts enabled, results have a lot of lag because it's blocking everything else form running. 
        counts = Counts.publish(this, 'totalSearchResultsCounter', Contacts.find(searchParameters));
        return cursor;   
    });

Any suggestions on how to improve? Thanks again!

@boxofrox
Copy link
Contributor

@Diginized, I've not benchmarked the publish-counts, but...

This test seems to only load the initial data. publish-counts uses Cursor.count() to get the initial count from a cursor which is my prime suspect.

Does your test investigate latency in reactive updates? i.e. does the page respond more slowly when new docs are inserted that match the cursor's search parameters? May be difficult to check, but if no latency occurs on a reactive update, then it supports my theory that the problem is with count().

You could also add another example to your test. If this also delays 0.5 - 2 seconds, then Meteor's count() is the problem.

    Meteor.publish("search", function (searchParameters) {
        cursor = Contacts.find(searchParameters); 
        //with Counts enabled, results have a lot of lag because it's blocking everything else form running. 
        counts = cursor.count();
        return cursor;   
    });

If count() is the problem, I'm not sure what we can do to improve the response time. Relying on the observer.added callback will block until all matching documents are processed (if I understand the Meteor docs correctly), so I don't expect to see an improvement there.

@boxofrox
Copy link
Contributor

Other considerations:

  1. Cursor values get emptied when passed to Counts.publish() #58 indicates @Diginized is using regex for search parameters. I'm curious if adding normalized fields (a la { norm_lastname: lastName.toLower() }) will mitigate some of the observed latency while satisfying the case insensitive requirement.
  2. the fix in Cursor values get emptied when passed to Counts.publish() #58 will allow use of one cursor in both Meteor.publish and publish-counts. If Meteor doesn't consolidate queries, then the fix will result in processing one cursor instead of two.
  3. @Diginized, if third example above is performant, try fourth example with two separate cursors.
Meteor.publish("search", function (searchParameters) {
    cursor = Contacts.find(searchParameters); 
    //with Counts enabled, results have a lot of lag because it's blocking everything else form running. 
    counts = Contacts.find(searchParameters).count();
    return cursor;   
});

@Diginized
Copy link
Author

@boxofrox , you are correct, the lag I'm citing is only with the initial pageload of data. I think you are probably also correct about .count() being the slow part. Here's what I've learned:

  • Your first test to try cursor.count() is a good suggestion. That definitely accounts for a noticeable amount of the lag. The remainder is probably stuff like updates to the template which is to be expected.
  • Your suggestion # 3 above (two cursors) performs about the same as my initial code and your cursor.count() example, so there doesn't seem to be any benefit to creating two separate cursors.
  • I'm guessing the real difference in performance here is that when not using the Counts or cursor.count() I'm fetching on only the first 30 rows {limit: 30} and that's probably why the records are coming back to the client so fast. When doing one of the count methods, the limit is ignored because we're trying to get the total count, and that total count is just slower.

I don't really understand the observer.added parts. This database doesn't change too often (very small number of users doing data input) so I'm not even as worried about the reactive updates or certain how to test them.

Regarding Regex/denormalized: I've definitely thought about doing the denormalized names instead of regex searching, like you suggested, but so far the regex searching is working fast enough (when a count isn't happening) that I feel it's not causing any troubles. If the database gets into the 10s of millions of records then this is something I thought I might have to look into.

Conclusion (for now): Your suggestion # 2 above (The fix described in issue #58 ) definitely has faster performance than any of the things discussed above. If you do make that change and allow the search values to pass through, I think that's a big help for performance. I have a feeling there will be a negative side-effect to it, which I'll note when I reply to that thread in a min, so maybe treating that behavior as an option is the best of all worlds.

@Diginized
Copy link
Author

An Aside, to whom it may benefit:
This discussion has actually got me thinking about some even better ways possibly to improve the UX and perceived performance. Part of what I'm trying to accomplish here is to retain that initial "wow, that came back fast!" that I had when I first built this search form (before implementing any showing x of y records counts.

I'm doing three things with each search: displaying paged results, counting how many are displayed, and finally counting how many else are out there. I currently perform all of these things when the user hit's the "search" button, which is why I want it to be snappy. Another option that this conversation brought back to mind is that I could have an event listener on the various form fields that reactively updates parameters being passed to publish-counts so that the user can see "total results expected" before they ever hit the search button. This is done on some faceted search systems I've seen out there. Although more code on me, there are two (and maybe a third) benefits to this:

  1. The user gets a preview of their search before they ever fire it off and before waiting for any data to transmit to the client (which means they won't hit search till they really mean it), and
  2. When they do finally hit the search button they'll get those first 30 or so records back really fast/snappy.
  3. Finally, I think this might even make the two processes independent of each other, so if a user quickly typed in the one field they wanted and hit the enter key.. the results could be fetched async of the count that's happening because the field detected a blur or pause. Have to test to see if this gets the desired result, but it sounds right..

@boxofrox
Copy link
Contributor

FYI to better explain the observer.added bit from earlier. From the Meteor docs on cursor.observe:

Before observe returns, added (or addedAt) will be called zero or more times to deliver the initial results of the query.

So what I meant was if publish-counts didn't use cursor.count(), it would instead have to rely on that behavior in the docs to count the number of times added was called when the initial results were delivered. In point of fact, there are 3 major types of counts: basic, countByField, and countByFieldLength. The latter two types cannot use cursor.count() and rely on the document behavior of cursor.observe to initialize the count.

I plan to fix #58 before the week is out, but it will follow my work on #57.

@Diginized
Copy link
Author

Makes sense. Thanks.

@tmeasday
Copy link
Member

Again, haven't read this closely, but I'll say two things (chime in if I can be more helpful):

  1. Regexp searches are really slow in mongo. So if you are doing a count over a big set with a regexp, I'd expect it to take a while (and probably not a great idea in the long run).
  2. Publications "block" in meteor Provide this.unblock in publish context meteor/meteor#853 -- this is outside of readiness and will happen even if the slowness is on mongo's side. You can work around it pretty easily though: either use arunoda's package, or try:
  Meteor.publish(..., function() {
   var cursor = ...;

   var sub = this;
   Meteor.defer(function() {
      Counts.publish(sub, 'name', cursor);
   });

   return cursor;
  });

What this does is put the publish in a separate "thread" and doesn't block other pubs. _There are edge cases if the sub stops before the count is done however_ (we should probably test these at some point).

@Diginized
Copy link
Author

@tmeasday , thank you very much, both of these are really good to know. I actually missed that blocking was the default for publications (guess I haven't read as much of the docs as I thought!) That's part of why I'm now trying to mess with much larger test datasets, some of these things aren't really noticeable in the early stages. Both the Meteor.defer() and https://github.com/meteorhacks/unblock package are good to know about.

I admit I'm still very used to SQL's case insensitivity, which I feel is really the more suitable default for searches/queries. Pre-processing and denormalizing Mongo data and making the database, and making it that much larger is going to take some getting used to. I assume that's really the only good suggestion out there (keep lowercase fields next to the originals),given what you're saying?

But even if I were to do this, it still doesn't really help with the "I forget your full last name and think I might know the first few letters" scenario... which perhaps is best done via a typeahead/autocomplete system against an index.

@tmeasday
Copy link
Member

@Diginized denorming is pretty normal in mongo. But there are other options you could look at, like full text search..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants