
Taking more time to analyse with many processes #29

Open
KarthickRaja2002 opened this issue Aug 17, 2023 · 0 comments

Hi @RootLUG,

I am invoking Aura through Java's ProcessBuilder as 30 processes, each with the same zips as input, and the analysis takes far longer this way. If a single zip is scanned in a single process, it completes within 3 minutes, but scanning 30 zips as 30 concurrent processes takes more than an hour.
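The Java ProcessBuilder side isn't shown in this issue; as a rough, runnable Python equivalent of the fan-out described above, the sketch below launches one subprocess per zip with up to 30 in flight. The command line is a stand-in (the real invocation would be the Aura CLI, not shown here). Note that 30 CPU-bound scanner processes on a machine with fewer cores will time-slice those cores, which by itself can stretch each scan well past its single-process time.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_scan(zip_path):
    """Launch one scanner process for a zip, like ProcessBuilder.start()."""
    # Stand-in command so the sketch is self-contained; the real call would
    # invoke the Aura CLI on `zip_path` (hypothetical, not shown in the issue).
    proc = subprocess.run(
        [sys.executable, "-c", f"print({zip_path!r})"],
        capture_output=True,
        text=True,
    )
    return proc.returncode


def scan_all(zip_paths, parallelism=30):
    """Fan out to `parallelism` concurrent processes, like 30 ProcessBuilders."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return list(pool.map(run_scan, zip_paths))
```

With, say, 8 physical cores, 30 concurrent scans oversubscribe the CPU by roughly 4x, so per-scan wall-clock time grows accordingly even before any I/O contention.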

Moreover, each zip contains further nested zips, so I have used a ThreadPoolExecutor with max_workers=10 for the extraction alone. I have also changed max-depth in the aura_config.yaml file to 50.

Below is my modified ThreadPoolExecutor code from the package_analyzer.py file. Kindly check it and let me know why analysis takes so long when invoked through Java as 30 processes.

Thanks in advance!

```python
@staticmethod
def scan_directory(item: base.ScanLocation):
    print(f"Collecting files in a directory '{item.str_location}'")
    # Note: this submits the scan to a worker thread and then also runs the
    # same scan synchronously below, so the directory is walked twice.
    dir_executor = futures.ThreadPoolExecutor(max_workers=10)
    dir_executor.submit(Analyzer.scan_dir_by_ThreadPool, item)
    collected = Analyzer.scan_dir_by_ThreadPool(item=item)
    dir_executor.shutdown()
    return collected

@staticmethod
def scan_dir_by_ThreadPool(item: base.ScanLocation):
    """Scan the input directory and topologically sort the collected files."""
    topo = TopologySort()
    collected = []
    for f in utils.walk(item.location):
        if str(f).endswith((".py", ".zip", ".jar", ".war", ".whl", ".egg", ".gz", ".tgz")):
            new_item = item.create_child(
                f,
                parent=item.parent,
                strip_path=item.strip_path,
            )
            collected.append(new_item)
            topo.add_node(Path(new_item.location).absolute())
    # Build the import graph and sort once, after the walk; in my earlier
    # version this block was nested inside the loop and re-ran for every file.
    logger.debug("Computing import graph")
    for x in collected:
        if not x.metadata.get("py_imports"):
            continue
        node = Path(x.location).absolute()
        topo.add_edge(node, x.metadata["py_imports"]["dependencies"])
    topology = topo.sort()
    collected.sort(
        key=lambda x: topology.index(x.location) if x.location in topology else 0
    )
    logger.debug("Topology sorting finished")
    return collected
```
