Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using generator with process_map crashes, despite passing total parameter to tqdm_class #1231

Closed
mlissner opened this issue Aug 19, 2021 · 1 comment · Fixed by #1233 or #1229
Closed
Assignees
Labels
c1-quick 🕐 Complexity low p0-bug-critical ☢ Exception rasing to-merge ↰ Imminent
Projects
Milestone

Comments

@mlissner
Copy link

mlissner commented Aug 19, 2021

When using process_map on a generator, it crashes if len can't be called. For example:

cd /var/www/cl/docker/nginx/ && sudo docker-compose exec cl-python /opt/courtlistener/manage.py  dump_anon_docket_html --offset 74850 --processes 4 ; bush "Done making html"
  0%| | 0/2703724 [0Traceback (most recent call last):
  File "/opt/courtlistener/manage.py", line 15, in <module>
    main()
  File "/opt/courtlistener/manage.py", line 11, in main
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/__init__.py", line 401, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python3.8/site-packages/django/core/management/__init__.py", line 395, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/base.py", line 330, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/base.py", line 371, in execute
    output = self.handle(*args, **options)
  File "/opt/courtlistener/cl/corpus_importer/management/commands/dump_anon_docket_html.py", line 68, in handle
    make_html(options)
  File "/opt/courtlistener/cl/corpus_importer/management/commands/dump_anon_docket_html.py", line 40, in make_html
    process_map(
  File "/usr/local/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 130, in process_map
    return _executor_map(ProcessPoolExecutor, fn, *iterables, **tqdm_kwargs)
  File "/usr/local/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 61, in _executor_map
    kwargs["total"] = len(iterables[0])
TypeError: object of type 'generator' has no len()
Sentry is attempting to send 1 pending error messages
Waiting up to 2 seconds
Press Ctrl-C to quit

That's a lot, here's the important part:

    kwargs["total"] = len(iterables[0])
TypeError: object of type 'generator' has no len()

What's funny about that is that I passed the total parameter to my tqdm_class parameter:

    progress_bar = tqdm(
        total=total,
    )
    process_map(
        _write_item_to_disk,
        objects,
        max_workers=options["processes"],
        tqdm_class=progress_bar,
        total=total,
    )

It makes sense when you look in the code though. There, you find that the process_map calls _executor_map with whatever tqdm_kwargs it gets:

    return _executor_map(ProcessPoolExecutor, fn, *iterables, **tqdm_kwargs)

And _executor_map does:

    if "total" not in kwargs:
        kwargs["total"] = len(iterables[0])

Which crashes.

The fix is probably either:

  1. If there's a tqdm_class parameter, _executor_map should use the total parameter it provides, instead of using the kwarg provided to process_map; or

  2. The documentation should be updated to note that the total parameter is a supported argument in the process_map function.

I think number 2 is easier (no code change), but number 1 makes more sense because it provides a simpler API for how to set the total parameter. You could do a combination of the two also:

  1. Update the docs to note that total can be provided to the process_map, but prefer the total parameter that is passed to the tqdm_class class, if there is one.

Version info:
4.59.0 3.8.8 (default, Mar 12 2021, 19:44:18)
[GCC 8.3.0] linux

@casperdcl casperdcl self-assigned this Aug 19, 2021
@casperdcl casperdcl added c1-quick 🕐 Complexity low p0-bug-critical ☢ Exception rasing question/docs ‽ Documentation clarification candidate to-fix ⌛ In progress labels Aug 19, 2021
@casperdcl casperdcl added this to the Non-breaking milestone Aug 19, 2021
@casperdcl casperdcl added this to Next Release in Casper Aug 19, 2021
@casperdcl casperdcl added to-merge ↰ Imminent and removed question/docs ‽ Documentation clarification candidate to-fix ⌛ In progress labels Aug 22, 2021
casperdcl added a commit that referenced this issue Aug 22, 2021
@casperdcl
Copy link
Sponsor Member

ah should've been caught in #912. Will fix with #1233 next release.

@casperdcl casperdcl linked a pull request Aug 22, 2021 that will close this issue
casperdcl added a commit that referenced this issue Aug 22, 2021
Casper automation moved this from Next Release to Done Aug 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c1-quick 🕐 Complexity low p0-bug-critical ☢ Exception rasing to-merge ↰ Imminent
Projects
Casper
  
Done
Development

Successfully merging a pull request may close this issue.

2 participants