Update render queue limits #152

Open
Komzpa wants to merge 4 commits into master

Conversation


@Komzpa Komzpa commented Jan 8, 2017

Adjust render queue depth so that a rendering server dirty queue length is shifted from several minutes (1000 / 7 ~= 142 s ~= 2.3 minutes) to a whole day.

Metrics are taken from http://munin.osm.org/static/dynazoom.html?cgiurl_graph=/munin-cgi/munin-cgi-graph&plugin_name=openstreetmap/yevaud.openstreetmap/renderd_processed&size_x=800&size_y=400&start_epoch=1483213021&stop_epoch=1483904221
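A rough back-of-the-envelope check of the two limits (the ~2.5 metatiles/s sustained rate used below is an assumption inferred from the proposed value, not a number taken from the graph):

    old limit:  1000 / (7 metatiles/s)     ≈ 142 s   ≈ 2.3 minutes of queued work
    new limit:  216000 / (2.5 metatiles/s) = 86400 s = 24 hours of queued work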

-#define DIRTY_LIMIT (1000)
+#define DIRTY_LIMIT (216000)
+#define HASHIDX_SIZE (477856)
Contributor

Why is a higher HASHIDX_SIZE needed with metatiles?

Author

According to @gardster, this is not required and is redefined later anyway. I did it by analogy with the block below. It looks like it's safe to remove both.

Contributor

@pnorman pnorman left a comment

The logic of this seems tailored specifically to OSM.org. Does it also apply in general? Is a large queue desirable behavior?

-#define DIRTY_LIMIT (1000)
+#define DIRTY_LIMIT (216000)
+#define HASHIDX_SIZE (477856)
 #else
 #define QUEUE_MAX (1024)
 #define REQ_LIMIT (512)
 #define DIRTY_LIMIT (10000)
Contributor

Shouldn't the non-metatile limit be higher than the metatile limit?

@Komzpa
Author

Komzpa commented Jan 8, 2017

The values are tailored based on osm.org statistics.

On non-osm.org installs:

  • if it's a small local install, the queue just won't grow; no major change.
  • if it's a popular but not overloaded install, the queue will grow in the same fashion as on osm.org, and will shrink during off-peak hours. Starting from day 2, the number of fresh tiles served directly from cache grows, outweighing the queuing overhead by a lot.
  • if it's an overloaded install, the queue will be large and won't shrink. This adds some queuing time, but it shouldn't be noticeable, as the install is overloaded and is unable to serve tiles on time anyway.

@Komzpa
Author

Komzpa commented Nov 15, 2017

hello, what do we do to get it merged?

 #else
 #define QUEUE_MAX (1024)
 #define REQ_LIMIT (512)
 #define DIRTY_LIMIT (10000)
-#define HASHIDX_SIZE 22123
Contributor

What is HASHIDX_SIZE and why are you removing it?

Author

This line is a no-op.
It is used here:

return key % queue->hashidxSize;

and redefined here:
#define HASHIDX_SIZE 2213

(see the include order: #include "render_config.h")
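A minimal, self-contained illustration of the redefinition behaviour being described; the two defines stand in for render_config.h and request_queue.h and are not the actual headers:

    #include <stdio.h>

    #define HASHIDX_SIZE (22123)   /* earlier definition, standing in for render_config.h */
    #define HASHIDX_SIZE 2213      /* later definition; gcc/clang warn "HASHIDX_SIZE redefined"
                                      and every use below this point sees 2213 */

    int main(void)
    {
        printf("%d\n", HASHIDX_SIZE);   /* prints 2213 */
        return 0;
    }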

Member

@tomhughes tomhughes Nov 15, 2017

So does the hash table have 2213 buckets, or 22123 buckets? Not that either is anywhere near big enough once you have a two-million-entry queue - that will give an average chain length of either 100 or 1000, either of which seems way too large.
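(For reference, the arithmetic behind those figures: 2,000,000 / 22,123 ≈ 90 and 2,000,000 / 2,213 ≈ 904.)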


Seems like somebody wanted to increase HASHIDX_SIZE from the default 2213 to 22123 when METATILE is not defined. But it's so ugly and include-order dependent that it should be fixed.

@vng vng Nov 15, 2017

@apmon According to the code history, here is the initial commit to use different values for HASHIDX_SIZE:
da19811

But later, after the refactoring, it was broken, and now HASHIDX_SIZE is always (not obviously) 2213:
707e125

Author

@tomhughes I've updated the numbers to match the ratio.
We're not using non-metatile mode, so it's not 2M, but 200k records.

@vng I've updated request_queue.h to not rewrite the value.

@tomhughes
Member

Not having our servers running in full meltdown mode 24x7 could of course be considered a feature rather than a bug...

@tomhughes
Member

Incidentally I believe the current queue lengths are the results of previous experimentation by @apmon although I can't claim to remember all the details.

I have some sort of memory that there were performance issues managing the queue if it was too big?

@kocio-pl

There's a lot of room between 2 minutes and a whole day. Is it possible to test a single server with something like an hour or 10 minutes, for example?

@Komzpa
Author

Komzpa commented Nov 26, 2017

@kocio-pl I can provide such a branch. Does anybody have a machine to run it on, with a proper load?
@tomhughes can we do a split-test on tile.osm.org? :)

@tomhughes
Member

Not easily. At the very least you would need to provide Ubuntu compatible binary packages.

@kocio-pl

I guess you could ask @rrzefox if he's willing to test it.

@kocio-pl

Maybe it makes sense to start with making these variables configurable through mod_tile.conf. Once the code and package are updated, it would be much easier to experiment, and even to tweak them the way any given server's admins like.

@rrzefox

rrzefox commented Nov 26, 2017

> I guess you could ask @rrzefox if he's willing to test it.

Sorry, no can do - I'm using tirex, not renderd, which incidentally does not have a queue limit, at least none I've ever hit...

> Maybe it makes sense to start with making these variables configurable through mod_tile.conf.

You probably mean renderd.conf? mod_tile.conf would be something read by Apache, and Apache is not handling the queue; the render daemon is.
But indeed, something like the queue length is a value I'd expect to be configurable, not hardcoded.

@tomhughes
Member

Making it configurable would certainly be a good idea - then we could do a new build and test different configurations.

@mmd-osm

mmd-osm commented Dec 3, 2017

I'm a bit confused by this proposed change. There are basically 5 different queue classes, which are being handled by request_queue_fetch_request in the following order:

  1. Priority Request Queue,
  2. Request Queue,
  3. Low Priority Request Queue,
  4. Dirty Queue, and
  5. Bulk Request queue.

As long as there are queue entries in a higher class, they will be processed first. (I wonder why this doesn't lead to starvation issues, but that's another issue).

To my understanding, changing the DIRTY_LIMIT will only affect entries in category (4), that is the Dirty Queue. However, this queue seems pretty empty most of the time anyway: munin link

Isn't this queue only relevant if someone adds /dirty to a URL to manually trigger re-rendering of a tile?
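For readers following along, a toy sketch of the strict priority order described above; the types and helper names (queue_t, item_t, queue_pop, fetch_request) are made up for the illustration and are not the actual renderd request_queue API:

    #include <stddef.h>

    enum queue_id { Q_PRIO, Q_REQ, Q_LOWPRIO, Q_DIRTY, Q_BULK, Q_COUNT };

    typedef struct item  { unsigned long key; struct item *next; } item_t;
    typedef struct queue { item_t *head; size_t len, limit; } queue_t;

    /* pop the next entry, or return NULL if the queue is empty */
    static item_t *queue_pop(queue_t *q)
    {
        item_t *it = q->head;
        if (it) { q->head = it->next; q->len--; }
        return it;
    }

    /* Strict priority: a lower class is only served when every class above
     * it is empty, which is why the dirty and bulk queues can starve. */
    item_t *fetch_request(queue_t qs[Q_COUNT])
    {
        for (int i = 0; i < Q_COUNT; i++) {
            item_t *it = queue_pop(&qs[i]);
            if (it)
                return it;
        }
        return NULL;
    }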

@Komzpa
Author

Komzpa commented Dec 3, 2017

Hey @mmd-osm,
the Dirty Queue is not rendered during high-load times, and its entries are not kept until low-load times.
https://munin.openstreetmap.org/openstreetmap/render.openstreetmap/renderd_queue.html

The dirty queue holds all the tiles that were rendered previously and then marked dirty because someone made an edit in the area. They can be rendered on change instead of being rendered later, on the first request.

@mmd-osm

mmd-osm commented Dec 3, 2017

Thanks, your munin link makes a lot more sense than mine. So yes, the Dirty Queue (priority 4) has to discard entries due to its currently limited size, and there's also a starvation issue here, because almost everything else is more important than processing those dirty tiles.

Bounded overtaking would be one potential approach to limit those starvation effects, but that's currently not implemented. It may be even better in the long run than just increasing the dirty queue size.

Also, I really thought only /dirty would add new entries to the Dirty Queue; thanks for clarifying that bit.

@apmon
Member

apmon commented Dec 3, 2017

Thought I'd throw in some clarifying remarks.

Assuming this information is still accurate - I haven't looked at the code in quite a while, so I don't know if things have changed in this respect.

The Priority Request Queue, Request Queue, and Low Priority Request Queue are all meant for on-the-fly rendering, i.e. if a user requests a tile in mod_tile and it is out of date in some way or another, mod_tile delays serving that tile while it tries to render it. So these renderings are "time critical" as they delay the serving of a tile. The Priority Request Queue handles tiles that have no stale cached tile, i.e. if the on-the-fly rendering fails, mod_tile will deliver a 404 tile-not-found response. That is why this is the highest priority. The Request Queue serves tiles that have a cached stale tile for which we know the data has changed, so if mod_tile ends up serving the cached version because renderd doesn't render the tile in time, the user will see "wrong"/"old" geometry/features. The Low Priority Queue is for when the style sheet has changed, but not the data itself.

As all of these queues are designed for on the fly rendering, their size is limited to 32 entries. It is not reasonable to believe renderd will render more than 32 metatiles in the time before mod_tile times out and serves either a stale or 404 tile. So there is no point in having a longer queue there, as that would just add to the starvation issue.

If a tile is to be added to, let's say, the request queue, but that queue is full, it spills over into the dirty queue, which is the "large batch background" rendering queue and isn't "time critical" anymore, as there is no user directly waiting for the answer. This queue is there to try and steady the load on the server and make sure that the server is still doing something useful during low-load times.

So increasing the queue length beyond 2 minutes definitely seems good. However, there are definitely a couple of issues with increasing the dirty queue size. As has been pointed out, the dirty queue can easily be starved out. Now if, during a burst of high load, the priority queue with missing tiles overflows, some of those missing-tile requests end up in the dirty queue. If the load then drops a little so that the priority queue is no longer full, but e.g. the low priority queue is still saturated, then renderd will only render those style-sheet-changed tiles, and never get around to rendering the missing tile that has dropped into the dirty queue. If a new user now tries to look at that missing tile, mod_tile will submit it to renderd again, but as it is already queued (in the dirty queue) renderd won't requeue it into the priority queue, even if the priority queue could now hold the request. So it is possible to starve out missing metatiles in the dirty queue, which is very bad. With a shorter dirty queue this risk of starving important tiles out is somewhat reduced.
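Continuing the toy model from the sketch earlier in the thread (same made-up types; this illustrates the spill-over and "already queued" behaviour described above, not the real renderd code):

    static int queue_full(const queue_t *q) { return q->len >= q->limit; }

    static int queue_push(queue_t *q, item_t *it)
    {
        it->next = q->head;   /* toy queue; insertion order is irrelevant here */
        q->head = it;
        q->len++;
        return 1;
    }

    /* returns non-zero if the tile key is already queued in any class */
    static int already_queued(queue_t qs[Q_COUNT], unsigned long key)
    {
        for (int i = 0; i < Q_COUNT; i++)
            for (item_t *it = qs[i].head; it; it = it->next)
                if (it->key == key)
                    return 1;
        return 0;
    }

    int enqueue_request(queue_t qs[Q_COUNT], item_t *it, enum queue_id want)
    {
        /* A tile that is already queued, even if it only sits in the dirty
         * queue, is not promoted to a higher class -- this is the starvation
         * scenario for missing metatiles described above. */
        if (already_queued(qs, it->key))
            return 0;

        if (!queue_full(&qs[want]))
            return queue_push(&qs[want], it);

        /* A full time-critical queue spills over into the dirty (background)
         * queue, whose capacity corresponds to DIRTY_LIMIT. */
        if (want < Q_DIRTY && !queue_full(&qs[Q_DIRTY]))
            return queue_push(&qs[Q_DIRTY], it);

        return -1;  /* request is dropped */
    }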

Furthermore, back then (not sure if that has changed by now), renderd didn't really use any library routines. So instead of using a decent quality existing implementation of a priority queue, I coded up my own implementation. And that implementation was in the category "good enough at the time", not really in the "good" category.

So ideally that whole queuing implementation gets replaced by a proper library implementation of a priority queue, with re-queueing of higher-priority requests and with something like an epsilon-greedy dequeueing strategy to reduce starvation issues.
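For what it's worth, a rough sketch of the epsilon-greedy idea, again reusing the toy types from the sketches above (illustrative only): with probability epsilon, serve a random non-empty class instead of the highest-priority one, so the dirty and bulk queues always make some progress.

    #include <stdlib.h>

    item_t *fetch_request_eps(queue_t qs[Q_COUNT], double epsilon)
    {
        if ((double)rand() / RAND_MAX < epsilon) {
            int start = rand() % Q_COUNT;
            for (int i = 0; i < Q_COUNT; i++) {
                item_t *it = queue_pop(&qs[(start + i) % Q_COUNT]);
                if (it)
                    return it;
            }
            return NULL;
        }
        return fetch_request(qs);  /* otherwise: strict priority order, as before */
    }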
