Fix progress logging for parallel builds #9096

Closed. peterbell10 wants to merge 2 commits.
Changes from 1 commit
51 changes: 40 additions & 11 deletions sphinx/builders/__init__.py
@@ -23,7 +23,8 @@
 from sphinx.events import EventManager
 from sphinx.io import read_doc
 from sphinx.locale import __
-from sphinx.util import import_object, logging, progress_message, rst, status_iterator
+from sphinx.util import (display_chunk, import_object, log_status_message, logging,
+                         progress_message, rst, status_iterator)
 from sphinx.util.build_phase import BuildPhase
 from sphinx.util.console import bold  # type: ignore
 from sphinx.util.docutils import sphinx_domains
@@ -450,20 +451,29 @@ def read_process(docs: List[str]) -> bytes:
             # allow pickling self to send it back
             return pickle.dumps(self.env, pickle.HIGHEST_PROTOCOL)
 
+        chunks = make_chunks(docnames, nproc)
+
+        chunks_complete = 0
+        summary = bold(__('reading sources... '))
+
         def merge(docs: List[str], otherenv: bytes) -> None:
+            # Update progress message for each chunk completed
+            nonlocal chunks_complete
+            chunks_complete += 1
+            log_status_message(display_chunk(docs), summary, 'purple', chunks_complete,
Member:

How about adding a new option to control the timing of messages (e.g. before, after, both) to status_iterator() instead? I think the concept of status_iterator() is to separate progress messages from the main code, but this refactoring couples them. It would be nice if we could keep them less coupled.

Contributor Author:

Your suggestion doesn't relate to the issue, because the for loop doesn't wait until the tasks are completed: the entire loop can finish before even a single task has.

Contributor Author:

@tk0miya in case I wasn't clear, the status_iterator here only measures the progress of submitting tasks into the queue with add_task. make_chunks can create more than nproc chunks, and add_task doesn't block when there are more than nproc tasks submitted. It just adds them to a queue ready to be run. So, there is no connection whatsoever between the status_iterator progress and actual progress of building the documentation. i.e. the progress messages are pretty much useless.

For writing, there is some serialised work executed inside of the for loop which is measured by the status_iterator. However, it's still true that none of the parallelised work is represented in the progress messages. So there can still be huge delays between 100% progress and actually finishing the build step.
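The disconnect described here can be reproduced outside Sphinx. A minimal sketch using the standard library's concurrent.futures as an analogue for ParallelTasks (not Sphinx code): submitting work is non-blocking, so a loop over submissions completes long before the work does.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_task(chunk):
    # Stand-in for reading one chunk of documents
    time.sleep(0.2)
    return chunk

with ThreadPoolExecutor(max_workers=2) as pool:
    # This submission loop is all the old status_iterator measured;
    # it finishes almost instantly.
    futures = [pool.submit(slow_task, i) for i in range(8)]
    # At this point the "progress" would read 100%, yet most tasks
    # are still queued or running:
    done_at_submit_time = sum(f.done() for f in futures)

results = [f.result() for f in futures]
```

With two workers and eight 0.2 s tasks, `done_at_submit_time` is typically 0: a progress bar driven by the submission loop says nothing about actual completion.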

Contributor:

> separating progress messages from the main code

The logic in the PR looks good to me. Progress messages cannot be decoupled completely from doing the actual work - there has to be some signal that says "chunk is done, please send out a new progress message".

@tk0miya is there a chance this can be merged in the near future? This will be quite helpful for SciPy. We have a very heavy doc build (~20 minutes) that sometimes times out on CI because no output is received.

If there's some extra validation or other activity that we can do to help move this forward, please let us know!
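A sketch of that signalling pattern, again with concurrent.futures rather than ParallelTasks (the `messages` list is a hypothetical stand-in for the logger): a done-callback bumps a shared counter, so each progress message corresponds to completed work.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

total = 4
done = 0
lock = Lock()
messages = []

def on_chunk_done(future):
    # Fires when a chunk actually finishes, not when it is submitted
    global done
    with lock:
        done += 1
        messages.append('[{:3d}%] chunk done'.format(100 * done // total))

with ThreadPoolExecutor(max_workers=2) as pool:
    for chunk in range(total):
        pool.submit(lambda c=chunk: c * 2).add_done_callback(on_chunk_done)
```

By the time the pool shuts down, exactly `total` messages have been emitted, ending at 100%.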

+                               len(chunks), self.app.verbosity)
+
+            # Merge build environment from sub-process
             env = pickle.loads(otherenv)
             self.env.merge_info_from(docs, env, self.app)
 
         tasks = ParallelTasks(nproc)
-        chunks = make_chunks(docnames, nproc)
 
-        for chunk in status_iterator(chunks, __('reading sources... '), "purple",
-                                     len(chunks), self.app.verbosity):
+        for chunk in chunks:
             tasks.add_task(read_process, chunk, merge)
 
         # make sure all threads have finished
         logger.info(bold(__('waiting for workers...')))
         tasks.join()
+        logger.info('')
 
     def read_doc(self, docname: str) -> None:
         """Parse a file and add/update inventory entries for the doctree."""
@@ -557,22 +567,41 @@ def write_process(docs: List[Tuple[str, nodes.document]]) -> None:
             self.write_doc_serialized(firstname, doctree)
             self.write_doc(firstname, doctree)
 
-        tasks = ParallelTasks(nproc)
         chunks = make_chunks(docnames, nproc)
 
+        # Each chunk is done in two steps: one serialized and one parallel
+        n_steps = 2 * len(chunks)
+        steps_done = 0
+        summary = bold(__('writing output... '))
+
+        def on_chunk_done(args: List[Tuple[str, Any]], result: Any) -> None:
+            # Update progress message for each chunk completed
+            nonlocal steps_done
+            steps_done += 1
+            if len(args) == 1:
+                chunk_name = args[0][0]
+            else:
+                chunk_name = '{} .. {}'.format(args[0][0], args[-1][0])
+
+            log_status_message(chunk_name, summary, 'darkgreen', steps_done,
+                               n_steps, self.app.verbosity)
+
+        tasks = ParallelTasks(nproc)
         self.app.phase = BuildPhase.RESOLVING
-        for chunk in status_iterator(chunks, __('writing output... '), "darkgreen",
-                                     len(chunks), self.app.verbosity):
+        for chunk in chunks:
+            log_status_message(display_chunk(chunk), summary, 'darkgreen', steps_done,
+                               n_steps, self.app.verbosity)
             arg = []
             for i, docname in enumerate(chunk):
                 doctree = self.env.get_and_resolve_doctree(docname, self)
                 self.write_doc_serialized(docname, doctree)
                 arg.append((docname, doctree))
-            tasks.add_task(write_process, arg)
+            steps_done += 1
+            tasks.add_task(write_process, arg, on_chunk_done)
 
         # make sure all threads have finished
         logger.info(bold(__('waiting for workers...')))
         tasks.join()
+        logger.info('')
 
     def prepare_writing(self, docnames: Set[str]) -> None:
         """A place where you can add logic before :meth:`write_doc` is run"""
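In the write path above, each chunk contributes two ticks to the bar: one when its serialized phase (resolving and write_doc_serialized) finishes, and one when its parallel write completes via on_chunk_done. A toy model of that accounting (not Sphinx code) shows the percentages advancing through both phases:

```python
chunks = [['a'], ['b', 'c'], ['d']]
n_steps = 2 * len(chunks)  # one serialized + one parallel step per chunk
steps_done = 0
percent_log = []

def tick():
    # Mirrors the steps_done bookkeeping in the diff above
    global steps_done
    steps_done += 1
    percent_log.append((100 * steps_done) // n_steps)

for chunk in chunks:
    tick()  # serialized work for this chunk finished
for chunk in chunks:
    tick()  # parallel write finished (the on_chunk_done callback)
```

With three chunks the bar passes through six steps instead of jumping from submission straight to done.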
25 changes: 15 additions & 10 deletions sphinx/util/__init__.py
@@ -660,25 +660,30 @@ def old_status_iterator(iterable: Iterable, summary: str, color: str = "darkgreen",
     logger.info('')
 
 
+def log_status_message(item: str, summary: str, color: str, progress: int,
+                       length: int, verbosity: int = 0) -> None:
+    s = '{}[{:3d}%] {}'.format(summary, (100 * progress) // length,
+                               colorize(color, item))
+    if verbosity:
+        s += '\n'
+    else:
+        s = term_width_line(s)
+    logger.info(s, nonl=True)
+
+
 # new version with progress info
 def status_iterator(iterable: Iterable, summary: str, color: str = "darkgreen",
                     length: int = 0, verbosity: int = 0,
                     stringify_func: Callable[[Any], str] = display_chunk) -> Iterable:
     if length == 0:
         yield from old_status_iterator(iterable, summary, color, stringify_func)
         return
-    l = 0
+    l = None
     summary = bold(summary)
-    for item in iterable:
-        l += 1
-        s = '%s[%3d%%] %s' % (summary, 100 * l / length, colorize(color, stringify_func(item)))
-        if verbosity:
-            s += '\n'
-        else:
-            s = term_width_line(s)
-        logger.info(s, nonl=True)
+    for l, item in enumerate(iterable):
+        log_status_message(stringify_func(item), summary, color, l + 1, length, verbosity)
         yield item
-    if l > 0:
+    if l is not None:
         logger.info('')
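The formatting that log_status_message applies can be checked in isolation. A plain stand-in (dropping the colorize and term_width_line terminal helpers, which only add colour and line clipping):

```python
def format_status(item, summary, progress, length):
    # Same arithmetic as log_status_message: integer percentage,
    # right-aligned in a three-character field
    return '{}[{:3d}%] {}'.format(summary, (100 * progress) // length, item)

line = format_status('index', 'reading sources... ', 3, 12)
# -> 'reading sources... [ 25%] index'
```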


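The switch from `l = 0` to `l = None` is what keeps the empty-iterable case correct: with enumerate, the loop variable is never bound when there are no items, so the sentinel distinguishes "nothing iterated" from "finished iterating". A standalone sketch of the rewritten loop, with a hypothetical report callback in place of log_status_message:

```python
from typing import Any, Callable, Iterable, Iterator

def progress_iterator(iterable: Iterable, length: int,
                      report: Callable[[int], None]) -> Iterator[Any]:
    # Mirrors the rewritten status_iterator: report after each item,
    # and close out only if at least one item was seen
    last = None
    for last, item in enumerate(iterable):
        report((100 * (last + 1)) // length)
        yield item
    if last is not None:
        report(-1)  # stands in for the trailing logger.info('')

seen = []
items = list(progress_iterator(['a', 'b'], 2, seen.append))
# seen == [50, 100, -1]; for an empty iterable no report is made at all
```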