Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft updated doc for profiler #1144

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

ridgeworks
Copy link
Contributor

Added doc for additional options on profile/2.

Left "Visualizing profiling data" section as is; figure looks a little dated but difficult to generate decent X-Windows screenshots on MacOS.

Added doc for additional options on `profile/2`.

Left "Visualizing profiling data" section as is; figure looks a little dated but difficult to generate decent X-Windows screenshots on MacOS.
man/profile.doc Outdated
library. The latter can be hooked, which is used by the XPCE module
\pllib{swi/pce_profile} to provide an interactive graphical
frontend for the results.

The information gathering component can be run in two modes, controllable using
the \const{ports} option. The first mode (\const{ports(false)}) minimizes the
memory required to store the call tree, but only accumulates the the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: "the the"

man/profile.doc Outdated
memory required to store the call tree, but only accumulates the the
\emph{call} port count for each predicate. (See \secref{debugoverview} for more information
on the Prolog "Byrd Box Model" or "4 Port Model".) This mode is most beneficial for long
profiler sessions on running applications. The second mode (\const{ports(false)}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be ports(true)?

If \const{true} counts on all ports will be accumulated at the possible
expense of added memory overhead. If \const{false} only the \emph{call} port
counts are accumulated but memory overhead is reduced in some situations.
If neither value is specified, the default is the "classic" view which
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the "classic" view different from specifying either ports(true) or ports(false)? If so, I suggest adding a ports(classic) option.

man/profile.doc Outdated

\begin{description}
\termitem{sampling_rate}{+Rate}
\arg{Rate} is numeric value between 1 and 1000 specifying the number of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/is numeric/is a numeric/

@@ -144,12 +174,12 @@ While the program executes under the profiler, the system builds a
kernel: one that starts a new goal (\emph{profCall}), one that tells the
system which goal is resumed after an \emph{exit} (\emph{profExit}) and
one that tells the system which goal is resumed after a \emph{fail}
(i.e., which goal is used to \emph{retry} (\emph{profRedo})). The
profCall() function finds or creates the subnode for the argument
(i.e., which goal is used to \emph{retry} (\emph{profFail})).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use \exam{...} for examples (not \emph{...})

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\const{} is for constants (identifiers, atoms, ...), \exam{} is for (example) inline code fragments (i.e., generic "code"). Functions are written name() (will be styled automatically). Terms may be written as \term{name}{args}, especially if the arguments are placeholders.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for proof reading; I'll update the PR to fix the typos.

Regarding "classic" mode; perhaps I'm overly pedantic but I think a mode which displays incorrect information in some situations is a mistake. It's (too?) easy to enable that mode buy just by omitting the ports option so, personally, I'd rather not legitimize it further in the doc.

As for the tex conventions used by SWIP, I just tried to mimic what was there previously as I don't really understand the nuances, and the changes required (if any) to what's been submitted. This is just a draft so I assume the editor-in-chief will have the final say for the official version.

Copy link
Contributor

@kamahen kamahen Mar 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still confused: if I specify ports(false), do I get the current (somewhat wrong) behaviour? Do I get a different behaviour if I don't specify either ports(false) or ports(true)?

If ports(false) gives the same behaviour as not specifying the ports option at all, then there's no problem. If this isn't the case, I don't like it because the API would have no way of specifying the default behaviour.

[I'll leave the markup fixing to Jan -- I often get it wrong because there's no documentation about the markup and I got lost while trying to read the TeX macros]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are three different behaviours at the moment. The API (predicate '$profile'/4) supports all three. The user doc "should" explain how each one is specified in the profile/2 predicate (no ports option defaults to "classic" mode).

I'd rather there were just two modes: ports(true) and ports(false); that's how I intend to use it.

@kamahen
Copy link
Contributor

kamahen commented Mar 12, 2023

If the ports(classic) produces a wrong result, I see no reason to continue supporting it; just have the two values ports(false) and ports(true) - you might treat ports(false) as a bug-fix for the "classic" version.

@ridgeworks
Copy link
Contributor Author

I agree, although it's not quite so clear-cut. "classic" port counts will be accurate except for indirect recursion loops. Perhaps an argument can be made that that if you know what you're doing and/or you can somehow mark suspect data, you can almost have the best of both worlds. I would prefer to keep it simple for the average user (and the doc/implementation).

@kamahen
Copy link
Contributor

kamahen commented Mar 12, 2023

I have no opinion on whether there should be 2 or 3 options. I do have an opinion about APIs; and I don't like an API that has no positive way of specifying something. For example, suppose you have a predicate:

my_profile(PortOption) : -
    profile(my_goal, [port(PortOption)]).

then there's no way to specify the "classic" profile. Instead you'd have to do:

my_profile(PortOption) : -
    (  PortOption == classic
    -> profile(my_goal)
    ;  profile(my_goal, [port(PortOption)])
    ).

which is not as nice an API.

@ridgeworks
Copy link
Contributor Author

As it turns out, the first version will work just fine with ports(classic). But as one with an opinion, I wish it didn't. (It's an interesting point as to when the API to profile/2 matters anyway; in such cases is version 2 so problematical?)

@kamahen
Copy link
Contributor

kamahen commented Mar 12, 2023

I think we're arguing over the colour of a bikeshed - as you say, the API to profile/2 probably doesn't matter. Even if a "foolish consistency is the hobgoblin of little minds", it's not wrong to be consistent as much as possible; and reducing complexity is also good. (I'm in favour of removing "classic" completely, on the grounds that it was wrong.)
I shall say no more (on this topic).

@ridgeworks
Copy link
Contributor Author

Hope it didn't feel like an argument; not my intent. More like a small difference in perspective.

In any case, thanks for taking the time to read this. Doc is not my strong suit (if I have one).

@kamahen
Copy link
Contributor

kamahen commented Mar 13, 2023

I should have said that we're discussing the colour of a bikeshed. I have a pretty thick skin when it comes to technical discussions - my intent is to get to the best solution, not to prove that I'm right and you're wrong. So, I hope I haven't given the wrong impression. :)

@JanWielemaker
Copy link
Member

As is, I think we should step back. I think there are two questions to answer first.

  1. Does the ports(true) option lead to prohibitive expansion of the call tree in realistic examples?
  2. If (1) is realistic, can we add some marks that tells us which port counts are wrong, and possibly by how much?

As to (1), I think it can because deep mutual recursion isn't something unrealistic in Prolog. So, not-so-theoretical code may run fine but fills up all memory while profiling. An example is ordered merging based on compare/3 and some list iteration. Lists can have millions of elements, creating 2 profile nodes per element. To me, this means the current ports(true) cannot be the default. This means we should either go for ports(false) and possibly suggests to re-run with ports(true) or keep the current call tree compaction.

As to (2), I think we can, if we create an indirect recursion loop, mark the node as such. If nothing was marked, all our counts are correct. Also all nodes that have no marks above them are correct. I'm not sure whether we can do something with branches below the marks. Possibly, we could check that the walks over the call tree for exit and fail do (not) walk over such a mark and rule more as correct and/or give an upper bound on how much off they can be? Even with the simple approach touch, the compacted version could simply give the correct counts and only give the call counts for predicates involved in indirect recursion.

@ridgeworks
Copy link
Contributor Author

As to (1), I think it can because deep mutual recursion isn't something unrealistic in Prolog. So, not-so-theoretical code may run fine but fills up all memory while profiling. An example is ordered merging based on compare/3 and some list iteration. Lists can have millions of elements, creating 2 profile nodes per element.

Don't intend to be argumentative, but this seems pretty theoretical to me. Sure it's possible to write such an example (construct and process million element lists?), but I have difficulty understanding why anyone would do so (other than artificial benchmarks). And a million element list takes a minimum of 24 Mbytes just by itself, so an additional overhead of 200 Mbytes of profiler overhead doesn't feel all that prohibitive to me these days. Not to mention that surely one would have a pretty good handle on performance of 1000 element lists before scaling up to a million elements, so the need for detailed profiling at that excessive size seems questionable

At the end of the day it's not my call but, subject to a real world counter example, I don't see any need for any more than two port's options and I don't think it even matters much which is default. The complications of figuring out how to efficiently mark erroneous information, pass that to the output generator, modify the GUI accordingly, and explain it all in user documentation just aren't worth the effort IMO.

@kamahen
Copy link
Contributor

kamahen commented Mar 23, 2023 via email

@JanWielemaker
Copy link
Member

Don't intend to be argumentative, but this seems pretty theoretical to me. Sure it's possible to write such an example (construct and process million element lists?), but I have difficulty understanding why anyone would do so (other than artificial benchmarks). And a million element list takes a minimum of 24 Mbytes just by itself, so an additional overhead of 200 Mbytes of profiler overhead doesn't feel all that prohibitive to me these days. Not to mention that surely one would have a pretty good handle on performance of 1000 element lists before scaling up to a million elements, so the need for detailed profiling at that excessive size seems questionable

Seems we live in a different Prolog universe 😄 . I've had the pleasure of working with Prolog processes using hundreds of giga bytes, including many million element long lists. There is nothing artificial about these. And yes, merging multiple of such (ordered) lists can be done quite ok and result (using the most natural implementation) in a mutual recursive predicates. A 10 fold space increment is a problem, exactly for these applications that stress the limits of your hardware. Not to mention a malloc() for every cell we are processing.

You seem to live mostly in domains using Prolog for (backtracking) search. Those programs typically do not stress memory a lot. Several of my projects deal with large amounts of data, (often) fairly simple logic and aggregation steps that often generate huge lists.

@ridgeworks
Copy link
Contributor Author

Seems we live in a different Prolog universe 😄 . I've had the pleasure of working with Prolog processes using hundreds of giga bytes, including many million element long lists. There is nothing artificial about these.

Apparently. My industry experience was on computers with disks smaller than memory sizes these days, and memory.., well you get the picture. I have a hard time imagining such behemoths, let alone profiling them. I would hazard a guess that not very many Prolog users are dealing with hundreds of giga bytes of data.

I appreciate Peter's offer for getting some real experience to chew on. My suspicion is that the issues will be more on the analyse/display side than the collection side.

In any case, the profiler now works much better for my "limited" applications, so I don't intend to put any more effort into it.

@JanWielemaker
Copy link
Member

I would hazard a guess that not very many Prolog users are dealing with hundreds of giga bytes of data.

In my last academic job we had a machine with 1Tb main memory and 48 cores/96 threads 😄

In any case, the profiler now works much better for my "limited" applications, so I don't intend to put any more effort into it.

I'm happy with the improvements. Thanks!

@kamahen
Copy link
Contributor

kamahen commented Mar 24, 2023

Do you want me to try the profiler with my moderately long lists? (It'll take me a little bit of work because it's part of a larger batch job, so I need to isolate one large step and re-run it with the profiler)

@ridgeworks
Copy link
Contributor Author

Do you want me to try the profiler with my moderately long lists? (It'll take me a little bit of work because it's part of a larger batch job, so I need to isolate one large step and re-run it with the profiler)

If it's not too much trouble, that would be great. If it, or some part thereof, can be packaged up for me to poke at, I'm happy to go that route (assuming it works on my meagre computer resources); whatever works for you.

@JanWielemaker
Copy link
Member

Bottom line to me is whether you consider mutual recursive procedures on long lists to be something one finds in the wild. From the library, ord_union/3 is a perfect example. Real-life code. Now you need to believe there are real Prolog programs processing ordered sets with millions of elements. I have seen and used them. QED 😄

Now it is also true that this scenario is not that common. It might be an option to change the collection to use ports(true) until a specified recursion depth and use the "classic" algorithm above. All that would need is to add a recursion depth field to the nodes.

@ridgeworks
Copy link
Contributor Author

Regarding profiling specifically, for this "uncommon scenario" you have ports(false) mode. And you don't lose much (anything?) because it is exactly this scenario where the ports counts are unreliable. So I'm still unconvinced the classic mode is worth the added complexity, and building in more complexity doesn't really solve anything IMO.

@JanWielemaker
Copy link
Member

So I'm still unconvinced the classic mode is worth the added complexity, and building in more complexity doesn't really solve anything IMO.

Only some of the port counts are wrong. Only those for predicates involved in a branch of the call tree subject to mutual recursion are wrong. That leaves many correct in most applications. As it isn't too hard to identify the wrong ones, that seems a promising route. I don't care too much though. It works pretty well as it does right now. It is mostly a little complicated to explain.

@ridgeworks
Copy link
Contributor Author

What is the status of this PR? I'd like to submit a PR to fix a bug in the cmpReals code (pl-gmp.c) so I think need to capture a new copy of swipl-devel on github. Can that be easily managed without closing this first?

Enables module simplex to pass sandbox safety check
@JanWielemaker
Copy link
Member

This pull request has been mentioned on SWI-Prolog. There might be relevant details there:

https://swi-prolog.discourse.group/t/using-format-2-with-attributed-variable-on-a-pengine/6996/18

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants