Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: improve plain progress #7272

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open

Conversation

jedevc
Copy link
Member

@jedevc jedevc commented May 3, 2024

Fixes #7137.

With the changes as part of #6835 (in v0.11.0), all old TUIs were removed, including the old --progress=plain output. With #7069 (in v0.11.1), a limited version of the plain TUI was restored. However, as noted in #7137 (comment) and #7223 (comment), this new plain progress has issues - it only reports spans that have logs, and loses a lot of additional information.

This PR attempts to strike a balance between the v0.10.X plain output and the v0.11.X plain output - unfortunately, the old behavior cannot be perfectly restored due to the architectural changes in the move to OpenTelemetry.

Specifically, some guiding ideas I'm trying to stick to:

  • We should share as much code as possible between plain + tty output. We want to ensure that the visibility of spans is the same between both outputs, and that the verbosity options create roughly the same effects for both.
  • We should attempt to organize the logs by function call as much as possible - this is the direction of the tty output, as well as dagger cloud - being aligned keeps the outputs somewhat similar.
  • We should keep a familiar style to the old plain logs, with numerical prefixes, similar colors, etc.
  • We should attempt to default to linearly readable logs in the case of linear progress (e.g. a linear series of calls in the CLI should display linearly in the progress). We should aim to show the exact amount of interesting information. More complex parallel pipelines are out of scope, there is no way to do this neatly in a log format.

@jedevc
Copy link
Member Author

jedevc commented May 3, 2024

As a sneak preview of what I'm building towards at the moment:

image

@gerhard
Copy link
Member

gerhard commented May 3, 2024

Looking good @jedevc 💪

@jedevc
Copy link
Member Author

jedevc commented May 7, 2024

Okay, continuing to make progress here, did some refactoring, but also started reworking the output a bit.

One of the main problems with plain progress is that we lose a lot of context for when anything appears. This was always kind of a problem, but it's now really quite annoying. So ideally, what we should do is try and provide some amount of context for each vertex.

One idea I've been playing around with is to show each vertex with its parent call (if there is one):

image

The merging between parts isn't quite right, but the grouping does seem to help improve readability. Unlike the pretty progress, it's not really doable to nest these indefinitely, since this will really clog up the output - but showing the most recent parent call does really help to understand where something comes from.

@jedevc
Copy link
Member Author

jedevc commented May 17, 2024

Okay, so a lot of work from this got spun out into separate PRs: #7347 #7371 #7385

I also got some more time to focus on this one for a bit yesterday and today - I refactored most of the progress, and so now we have a lot of nice shared code between the plain and the pretty output. Here's the current state:

image

There's a few more steps to take early next week before this is done:

  • Introduce a small delay buffer for context switching (similar to what streaming_output.go was doing) - this will help for grouping, as well as avoiding out-of-order issues caused by logs and spans received in interesting orders.
  • Add some vertical spacing for visual separation
  • Tidy up any specifically nasty hacks
  • Get some user feedback
  • Tidy commit history
  • Check tests

@jedevc jedevc force-pushed the plain-progress branch 4 times, most recently from e22fcc9 to c0a4166 Compare May 20, 2024 17:23
@jedevc jedevc marked this pull request as ready for review May 20, 2024 17:25
@jedevc jedevc requested review from vito, helderco and sipsma May 20, 2024 17:25
@jpadams
Copy link
Contributor

jpadams commented May 21, 2024

@jedevc I like the way this renders a lot. I don't think it's part of this PR likely, but I'd still like to see less/none of the HTTP HEAD/POST/PUT calls. I rebased this PR on top of main and rebuilt engine, but still have the HTTP output, so something to look into.
image

Copy link
Contributor

@sipsma sipsma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM! Obviously need to solve the test failure mystery, but otherwise good to merge

}
depth--
indent(out, depth)
depth-- //nolint:ineffassign
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wtf, silly linter... 😄

Copy link
Contributor

@vito vito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First off, this looks great! I love that it keeps all the info and coloring from the TUI, and it doesn't feel too insanely verbose. The refactor is very sensible too.

Purely UI feedback, but we can iterate, it's already better than what we have:

I mentioned spacing previously, and I think keeping the top-level items separate but keeping the nested items flush is probably the best we can do. Besides maybe double-spacing at the top and single-spacing at the inner levels. Maybe worth trying, but dunno if it'd be better.

That being said it seems like sometimes calls "break out" into separate sections instead of being nested properly. Maybe a bug? (See the extra "Apko.wolfi" header below)

image

Could use a blank line here above "Container evaluated." to make it feel more final.

image

Since we're already omitting args on the "DONE" line maybe we should omit the return type too? Seems slightly noisy.

image

The spacing on the numbers feels a little awkward but I get that it keeps things aligned. Maybe the number could be right-aligned so it's flush with the :? Maybe 0-padded?

image

MAYBE the top-level inline args are indented one level too far? Not sure if intentional, but it means the children align a little differently than when it's not a multiline code block.

image

That's all I got for now! Again nothing blocking.

@@ -10,6 +10,8 @@ import (
)

func main() {
// XXX: check that this still works!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal if it doesn't, it's already a bit nerfed from TUI refactoring. It used to give you a deep dump of the ID, but now it abbreviates into things like Container.withExec which isn't as useful.

}
return view
// ConnectedToEngine is called when the CLI connects to an engine.
// TODO: remove this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly curious, any particular reason to remove it? The thinking was that a plain UI would just print this, but a pretty TUI might show it somewhere onscreen

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

case tea.WindowSizeMsg:
fe.SetWindowSize(msg)
return fe, nil
type Frontend interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Signed-off-by: Justin Chadwell <me@jedevc.com>
Signed-off-by: Justin Chadwell <me@jedevc.com>
Signed-off-by: Justin Chadwell <me@jedevc.com>
@jedevc
Copy link
Member Author

jedevc commented May 23, 2024

Rebased onto main after #7385 was merged.

Signed-off-by: Justin Chadwell <me@jedevc.com>
Signed-off-by: Justin Chadwell <me@jedevc.com>
Signed-off-by: Justin Chadwell <me@jedevc.com>
@jedevc jedevc force-pushed the plain-progress branch 11 times, most recently from 2aaaccf to acb5e49 Compare May 23, 2024 13:29
These now strike a balance between the old and the new plain progress,
while trying to take as much advantage of our new OTEL tooling as
possible.

These new logs use the span database, sharing the logic with the pretty
progress as well (but handling logs entirely differently, we don't
need/want fancy vterm handling for these).

Signed-off-by: Justin Chadwell <me@jedevc.com>
We don't want to display these at all for the new TUI.

Signed-off-by: Justin Chadwell <me@jedevc.com>
This got missed somewhere during a rebase.

Signed-off-by: Justin Chadwell <me@jedevc.com>
@jedevc
Copy link
Member Author

jedevc commented May 23, 2024

Right. I think I have a handle on what's going on. Compare the failing check on this PR here and the passing check on my work-in-progress PR here.

There's one key difference here: 100aaf2 (#7450)

Here's my best guess what's going on - somewhere, there's an unexpected layer of output buffering, which is preventing flushing data from the output stream - this seems to manifest as blocking something critical in a python test. If we just become more chatty on our logs... we flush more data.

Alongside the hard evidence of the different PRs:

  • This explains why this PR is a "regression" - the new plain progress is a lot less chatty, so we might flush less.
  • This would explain why enabling --debug "fixes" the issue - we re-enable a ton of chatty output, so we would flush more.

From the python subprocess side, we do set bufsize=0 explicitly - this should disable buffering. I wonder if there's still maybe something to pull on here though, because we don't see this in any other language SDK.

@helderco
Copy link
Contributor

From the python subprocess side, we do set bufsize=0 explicitly - this should disable buffering. I wonder if there's still maybe something to pull on here though, because we don't see this in any other language SDK.

Interesting! Yeah, the runtime also disables it for modules:

WithEnvVariable("PYTHONUNBUFFERED", "1").

But CI doesn't:

dagger/ci/sdk_python.go

Lines 181 to 183 in 643aa61

base := dag.Container().
From(fmt.Sprintf("python:%s-slim", version)).
WithEnvVariable("PIPX_BIN_DIR", "/usr/local/bin").

It's an easy change to put there.

Signed-off-by: Justin Chadwell <me@jedevc.com>
@jedevc
Copy link
Member Author

jedevc commented May 23, 2024

Hm, that doesn't seem to have worked.

Something weird here is also that the extra chattiness I added was for stderr - while I think we do a lot of our essential session communication over stdout. Still not quite sure what this could be though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

✨ Output in automatic provisioning like 0.10
6 participants