added logs rfc #4

wagnerf42 · 2019-08-19T14:42:22Z

hi, i'm back to work. let's start the rfc on logs.

nikomatsakis

I did a quick read through and left some minor comments -- I will try to write up something more substantial.

nikomatsakis · 2019-11-12T10:33:23Z

accepted/rfc0004-logs.md

+- fork join graph
+
+I'll start by the raw events. Since logging modify application behaviors we need to be able to log at a very low (and deterministic) cost.
+Basically each thread records in memory a few bytes tag for every event happening to him.


Suggested change

Basically each thread records in memory a few bytes tag for every event happening to him.

Basically each thread records in memory a few bytes tag for every event happening to it.

nikomatsakis · 2019-11-12T10:34:01Z

accepted/rfc0004-logs.md

+
+## Saving logs.
+
+It is the user's responsibility to not save logs while using threads. It should not crash but the graphs obtained


How are logs saved? I guess there is a function that gets called by the user to save logs?

yes, there is a save_logs function (global pool) or method on the pool.

nikomatsakis · 2019-11-12T10:34:26Z

accepted/rfc0004-logs.md

+Currently I depend on the following crates. Should I remove these dependencies ?
+- time (easy to remove)
+- itertools (harder to remove but ok)
+- serde (also easy to remove since I output logs in json)


I think we probably want to remove dependencies where we can

ok, that's not a big deal. that's fine.

nikomatsakis · 2019-11-12T10:34:48Z

accepted/rfc0004-logs.md

+
+## Svg
+
+All graph algorithms and svg display are not included.


I think we will want to create an external crate that is able to do the postprocessing and imagine manipulation

nikomatsakis · 2019-11-12T10:46:52Z

As I mentioned in #5, I've also created a logging system that I was using there to analyze performance. However, the focus of those logs was pretty different. They were focused on logging the internals of the scheduler, like how many sleeping threads there are at a given time, how many pending jobs, and so forth. Still, there are of course similarities.

I think overall it'd be great to have a mechanism (such as the one described in this RFC) that lets users log and visualize the parallelism they are getting. I think one thing that wasn't described very much in this RFC but which I would want to think about is how this mechanism is enabled/disabled and interacted with.

A few thoughts, in no particular order:

If possible, I think we should avoid requiring a #[cfg] or a cargo feature, but I'm not sure on this point. I've been leaning this way for the other logging mechanism as well. In principle, though, that requires a certain cost for every execution -- the branches to check if logging is enabled will be compiled in -- though hopefully a very minimal one.

My reasoning is mostly that testing around features and cfgs is a pain. You have to test with the enabled, disabled, and every which way, and they multiply. Moreover, we can tune the execution for logs being disabled (e.g., have an if that guards a function call) and it is unlikely to have an impact on performance.

What I didn't get from this RFC was some idea how users interact with logs. How are logs enabled and saved? It seemed to me that there was probably some method that they should call, and that they have to do so while parallelism is not happening. Presumably this is a method on the thread-pool? This seems "ok" but it may be inconvenient -- particularly in larger applications, the use of Rayon may not be readily accessible. An alternative might be to use an environment variable, but it raises the question of "when do we dump the events?".

(The approach I was trying in my internal event logger was to have events get flushed automatically, each time we exit a root event.)

I guess a final thought is -- who do we envision consuming these logs? What kind of "semver guarantees" do we want to provide around them? Are these mostly going to be used by us, in tuning parallel iterator implementations? Do we think end users will want them?

I'm somewhat inclined to try and avoid making guarantees. I'd prefer to say something like "the logging facility is provided for debugging and performance tuning purposes only; it may change radically or even be removed in the future". To that end, the more that we can avoid adding methods or other publicly visible bits of API related to logging, the better.

nikomatsakis · 2019-11-12T10:48:38Z

Oh, one final question that I was wondering about:

After logs are saved, is the data cleared? I had the impression from the RFC that data is never cleared. I guess this assumes then that we are working with short-lived benchmarks?

wagnerf42 · 2019-11-13T08:08:52Z

here are a few more answers to your questions:

about #cfg
you are right that it is painful (especially when chaining crates dependencies).
I really wanted to be able to remove all costs if unneeded while keeping the same source
code for both logged and unlogged runs.
there might be a possibility to do that without features : have a ThreadPool trait with one logging threadpool and one non-logging both answering the same api (join, join_context, scope, ...)`
the other possibility is to add conditionals everywhere which would be not so nice for me.
logs interactions:
logs are auto enabled when the feature is turned on.
then you use the save_logs function to get them out of the pool.
we could also autosave when destroying the pool to keep an identical api.
if we switch to different threadpools we could have logging threadpools and unlogging threadpools.
logs consumption:
I think the logs are great for understanding what goes on in your code
(especially with the extended features from rayon logs like hardware events or work computation).
For sure it is useful for rayon developers and teachers
(oh by the way I've got a course started now : http://wagnerf.pages.ensimag.fr/parallel-systems/) but I do hope it would be good for apps developers also.
the semver question is interesting because I've already been confronted to that when changing data file format. My take was to just revert using an older library.
data clearing:
currently nothing is cleared.
it would be possible to partially clear data though.
we need a mutex on save_logs, a counter indicating which task has not yet been saved.
data is appended at the end of a buffers list. when saving we could clear all buffers except the working one.

added logs rfc

c9a0f70

nikomatsakis mentioned this pull request Sep 15, 2019

improve the sleeping thread algorithm #5

Merged

nikomatsakis reviewed Nov 12, 2019

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

added logs rfc #4

added logs rfc #4

wagnerf42 commented Aug 19, 2019

nikomatsakis left a comment

nikomatsakis Nov 12, 2019

nikomatsakis Nov 12, 2019

wagnerf42 Nov 13, 2019

nikomatsakis Nov 12, 2019

wagnerf42 Nov 13, 2019

nikomatsakis Nov 12, 2019

nikomatsakis commented Nov 12, 2019

nikomatsakis commented Nov 12, 2019

wagnerf42 commented Nov 13, 2019

	Basically each thread records in memory a few bytes tag for every event happening to him.
	Basically each thread records in memory a few bytes tag for every event happening to it.


		## Saving logs.

		It is the user's responsibility to not save logs while using threads. It should not crash but the graphs obtained


		## Svg

		All graph algorithms and svg display are not included.

added logs rfc #4

Are you sure you want to change the base?

added logs rfc #4

Conversation

wagnerf42 commented Aug 19, 2019

nikomatsakis left a comment

Choose a reason for hiding this comment

nikomatsakis Nov 12, 2019

Choose a reason for hiding this comment

nikomatsakis Nov 12, 2019

Choose a reason for hiding this comment

wagnerf42 Nov 13, 2019

Choose a reason for hiding this comment

nikomatsakis Nov 12, 2019

Choose a reason for hiding this comment

wagnerf42 Nov 13, 2019

Choose a reason for hiding this comment

nikomatsakis Nov 12, 2019

Choose a reason for hiding this comment

nikomatsakis commented Nov 12, 2019

nikomatsakis commented Nov 12, 2019

wagnerf42 commented Nov 13, 2019