Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[gosrc2cpg] Memory cleanup #4407

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

pandurangpatil
Copy link
Contributor

As we are processing the main source code in two phases. We create the initial list of AstCreator first pass and we hold the data like ParserResult, Line no to line code mapping, parserNodeInfo cache etc for each file. Till the time we don't run through the second phase.

  1. Made changes to release some of the data points post first phase and write some caching information to the file. NOTE: couldn't write the parserNodeInfo cache as upiclke map is not serializable. Rebuild the released information back in the second phase when it starts processing.
  2. Also made changes in download dependency pass parallelism. In download dependency pass few days back we made few optimisation to start processing the dependencies in parallel. However as we were also creating multiple futures for each file of the dependency. It's been observed multiple Future threads (which are created per dependency) were getting blocked waiting for the results. In turn lesser number of Future threads being used or available while processing the dependency files. Made changes and removed the Futures while downloading the dependencies, anyway we had to synchronise the section which executes the command. Processing of individual dependency code is queued in the queue which will be picked up by another writer thread. This will make sure processing is triggered in parallel, however it will also make sure only one dependency code is being processed at time.

Queued processing of one dependency at one time post parallel proessing
of download.

1. Delinked downloading and processing of the dependency.
2. In one thread we are downloading the dependencies and queuing them to
be processed in separate writer thread.
@DavidBakerEffendi DavidBakerEffendi added performance Improves the performance of Joern go Relates to gosrc2cpg labels Mar 27, 2024
protected val declaredPackageName = parserResult.json(ParserKeys.Name)(ParserKeys.Name).str
protected val fullyQualifiedPackage =
goMod.getNameSpace(parserResult.fullPath, declaredPackageName)
protected var aliasToNameSpaceMapping: mutable.Map[String, String] = mutable.Map.empty
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 49-56 is going to become technical debt and hard to maintain. Can we try to find a solution that avoids so much global mutable state in this class?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the code below, perhaps you want to look at what happens in AstSummaryVisitor in RubySrc2Cpg and CSharpSrc2Cpg for how to do a pre-parse without all this mutable state?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the very least, we need to move these variables to another trait, as it's becoming very difficult to follow this file.

*/
private def deserialise(bytes: Array[Byte]): Any = {
val ois = new ObjectInputStream(new ByteArrayInputStream(bytes))
val value = ois.readObject
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readObject is an RCE waiting to happen, it is an extremely dangerous function.

We have a whole talk at BSides about exploiting Java deserialization https://www.youtube.com/watch?v=8qo0ZGK0tt0

https://msgpack.org is on the classpath from ODB, rather serialize data using that format.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess serialising the alias to namespace mapping is anyway shouldn't make much of the difference. I need to try removing this.

})
}

def cleanup(): Unit = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather suggest using close and having AstCreator extend AutoCloseable, this can enforce Using.resource(astCreator) => which guarantees this global mutable state is at least closed otherwise we may sit with an implicit protocol which may not be followed and result in a memory leak.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go Relates to gosrc2cpg performance Improves the performance of Joern
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants