Poor performance on real-world codebase #30

Open
bgamari opened this issue Jul 4, 2018 · 6 comments

@bgamari
Collaborator

bgamari commented Jul 4, 2018

I was looking into using Ward to lint GHC's runtime system, starting with simple lock checking. Unfortunately, even with no privileges defined and enforcement enabled for only a single file, the check runs for more than 10 minutes before sending my laptop, which has 32 GB of RAM, into swap death. This seems a bit high for a 50 kLoC codebase.

Checking each source file individually typically takes around 30 seconds per file. Is this the recommended strategy for non-small projects?

@bgamari
Collaborator Author

bgamari commented Jul 4, 2018

For the record, I am now following the example of check-mono.sh and first producing call-maps of the sources and then using the compiler mode to check these maps. This is still slow, but much more bearable. I did a bit of profiling of the compiler mode and noticed that the callmap parser appears to be responsible for most of the time and allocations.

@evincarofautumn
Owner

Yeah, this is a known issue, and honestly one of the reasons my work on this fizzled out before I left Microsoft—I found it hard to iterate on the project when it was so slow on a nontrivial codebase, and not obvious to me how to fix that.

The performance issues were rooted primarily in language-c—the entire AST is lazy and includes a large amount of detail & indirections. @lambdageek wrote the call map code as a workaround for this, so yes, this is the recommended approach. He might have a better idea of what’s up with the perf there, but I’ll look into it.

@lambdageek
Collaborator

Hey, I kind of got distracted for a while, but one of the last things that I pushed to language-c about six months ago was NFData instances for all the syntax datatypes. That means we can finally do something about

Ward/src/Graph.hs, lines 74 to 76 in 05b02cf:

-- Why can't we just deepseq tus'? Because language-c doesn't provide
-- NFData instances :-(
whnfList tus' `seq` CTranslUnit tus' firstLocation

There are a few other places in Graph.hs that use Data.Generics that could use some deepseq.

That ought to help with the memory usage if I interpreted the profiler output correctly.
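To illustrate the difference between forcing to weak head normal form and forcing fully, here is a minimal sketch. `Decl` and `TranslUnit` are hypothetical stand-ins for the language-c syntax types, not the real definitions, and `forceUnit` is an illustrative helper rather than code from Ward. With `NFData` instances available, `force` evaluates the whole structure, so no unevaluated parser thunks stay reachable from the result.

```haskell
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric #-}

import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)
import GHC.Generics (Generic)

-- Hypothetical stand-in for a single declaration in a parsed C file.
data Decl = Decl
  { declName :: String
  , declBody :: [String]
  } deriving (Show, Eq, Generic, NFData)

-- Hypothetical stand-in for a whole translation unit.
newtype TranslUnit = TranslUnit [Decl]
  deriving (Show, Eq, Generic)

instance NFData TranslUnit

-- Evaluate a translation unit to normal form. Unlike `seq`, which stops at
-- the outermost constructor, this walks every field of every declaration.
forceUnit :: TranslUnit -> IO TranslUnit
forceUnit = evaluate . force
```

Whether this reduces peak memory in practice depends on how much of the lazy structure is retained elsewhere, so it is worth confirming with a heap profile.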

In terms of time, I'm sure there's some low-hanging fruit in the data representation, but after that we'll probably need to get smarter about the order in which we recompute the permissions on each iteration. Unfortunately, I couldn't get the tests to pass when I tried rewriting the algorithm as a classic dataflow analysis, so I don't have a good grasp on how to reason about possible transformations.
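As a point of comparison for the ordering question, here is a generic worklist fixed-point sketch. It is not Ward's algorithm: the permission semantics are deliberately simplified to "a caller inherits whatever its callees require", and the names `Fn`, `Perm`, and `propagate` are hypothetical. The idea it shows is that only functions whose inputs changed get revisited, instead of recomputing every function on every pass.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Set (Set)
import qualified Data.Set as Set

type Fn = String   -- hypothetical: function name
type Perm = String -- hypothetical: permission name

-- Propagate permissions from callees to callers until nothing changes.
-- `calls` maps each function to the functions it calls; `initial` holds the
-- permissions each function requires directly.
propagate :: Map Fn [Fn] -> Map Fn (Set Perm) -> Map Fn (Set Perm)
propagate calls initial = go (Set.fromList (Map.keys initial)) initial
  where
    -- Reverse edges: callee -> callers.
    callers :: Map Fn [Fn]
    callers = Map.fromListWith (++)
      [ (callee, [caller])
      | (caller, callees) <- Map.toList calls
      , callee <- callees
      ]

    go work perms = case Set.minView work of
      Nothing -> perms
      Just (f, rest) ->
        let fPerms = Map.findWithDefault Set.empty f perms
            -- Push f's permissions to one caller; remember it if it grew.
            step (ps, changed) c =
              let old = Map.findWithDefault Set.empty c ps
                  new = Set.union old fPerms
              in if new == old
                   then (ps, changed)
                   else (Map.insert c new ps, Set.insert c changed)
            (perms', changed) =
              foldl step (perms, Set.empty) (Map.findWithDefault [] f callers)
        in go (Set.union rest changed) perms'
```

Because permission sets only grow and are bounded, the loop terminates, including on recursive call graphs. Processing functions in reverse topological order of the call graph would typically reduce revisits further, but that refinement is left out here.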

@bgamari
Collaborator Author

bgamari commented Jul 6, 2018

Thanks for adding a note to the readme, @evincarofautumn!

@lambdageek
Collaborator

@bgamari

> I did a bit of profiling of the compiler mode and noticed that the callmap parser appears to be responsible for most of the time and allocations.

That's interesting.

For analyzing Mono, I saw the C parser + callmap generator taking a lot of time and memory, but the analysis run (parsing all the callmap files and running the global analysis) was relatively speedy.

@bgamari
Collaborator Author

bgamari commented Jul 6, 2018

Well, to be clear, I was profiling compiler mode with only call-graph inputs, so the C parser didn't have much of a chance to show up. That being said, a majority of the time was spent parsing the call graphs.

@evincarofautumn evincarofautumn added this to the 1.0 milestone Oct 27, 2019