Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Traces log for command to include exec content and full stdout/stderr #816

Open
gberche-orange opened this issue Jul 7, 2023 · 20 comments

Comments

@gberche-orange
Copy link
Contributor

gberche-orange commented Jul 7, 2023

Describe the feature:

With complex commands, the exec field conveniently enables to use multiline strings. However, the content of the exec field is not yet printed in with trace level. It would also be useful to have the full stdout and stderr be printed.

Given following goss.yaml

command:
    display-gcloud-sql-instances-with-labels:
      exec: |
        echo "current list of sql instances in the gcp project:"
        for i in $(gcloud sql instances list --format="value(name)"); do printf "\n\n$i :\n"; gcloud sql instances describe $i | yq '[.createTime, .settings.userLabels]'  ; done
      exit-status: 0
    no-sql-instance-remaining-for-current-namespace:
      exec: |
        gcloud sql instances list --filter="labels.xrc_namespace:$NAMESPACE"
      exit-status: 0
      stdout:
        - "Listed 0 items"

the command goss -L TRACE validate does not output the executed commands and their current stdout/stderr

image

cloudsdk@gcp-gcloud-cli-5c74f98554-26t58:~$ /home/cloudsdk/bin/goss -L TRACE -g /home/cloudsdk/goss.yaml validate 
2023/07/07 09:03:51 [DEBUG] Setting log level to TRACE
2023/07/07 09:03:56 [WARN] FAIL: Command => no-sql-instance-remaining-for-current-namespace (exit-status [0] [1]) [5.63]
F2023/07/07 09:03:56 [WARN] FAIL: Command => no-sql-instance-remaining-for-current-namespace (stdout [Listed 0 items] []) [0.00]
F2023/07/07 09:04:01 [WARN] FAIL: Command => display-gcloud-sql-instances-with-labels (exit-status [] []) [10.00]
F

Failures/Skipped:

Command: no-sql-instance-remaining-for-current-namespace: exit-status:
Expected
    <int>: 1
to equal
    <int>: 0
Command: no-sql-instance-remaining-for-current-namespace: stdout: patterns not found: [Listed 0 items]

display-gcloud-sql-instances-with-labels: exit-status: Error: Command execution timed out (10s)

Total Duration: 10.001s
Count: 3, Failed: 3, Skipped: 0
2023/07/07 09:04:01 [WARN] FAIL SUMMARY: Total Duration: 10.001s Count: 3, Failed: 3, Skipped: 0 

Describe the solution you'd like

Describe alternatives you've considered

the json format to include an additional field with the command exec original output. Currently it only displays the matcher output and matcher expected output

goss -L TRACE -g /home/cloudsdk/goss.yaml validate --format json -o verbose | yq -P
[...]
  - duration: 25336                                                                                                                                                                                        
    err: null                                                                                                                                                                                              
    expected:                                                                                                                                                                                              
      - value                                                                                        
    found: null                                                                                                                                                                                            
    human: ""                                                                                                                                                                                              
    meta: null                                                                                                                                                                                             
    property: stdout
    resource-id: gcloud sql instances list                                                                                                                                                                 
    resource-type: Command                                                                           
    result: 1                                                                                                                                                                                              
    skipped: false                                                                                   
    successful: false                                                                                                                                                                                      
    summary-line: 'Command: gcloud sql instances list: stdout: patterns not found: [value]'                                                                                                                
    test-type: 2                                                                                                                                                                                           
    title: ""                                                                                                                                                                                              
@aelsabbahy
Copy link
Member

I will mark this as approved.. but I would like to talk more about the desired output.

For example, the upcoming v4 release will allow the following.

The default behaves similarly to the current version. This default is very memory efficient as it reads the file line-by-line:

$ cat goss.yaml 
command:
  echo_test:
    exec: |
      for i in {1..10};do
        echo "This is an example multi line output $i"
      done
    exit-status: 0
    stdout:
      - "This is an example multi line output 11"
$ goss v
.F

Failures/Skipped:

Command: echo_test: stdout:
Expected
    "object: *bytes.Reader"
to have patterns
    ["This is an example multi line output 11"]
the missing elements were
    ["This is an example multi line output 11"]

Total Duration: 0.002s
Count: 2, Failed: 1, Skipped: 0

With v4, the concept of transformations is introduced, any matcher other than the default matcher will transform the output first, for example, have-patterns mimicks the above, but uses more memory and reports failures differently:

$ cat goss.yaml 
command:
  echo_test:
    exec: |
      for i in {1..10};do
        echo "This is an example multi line output $i"
      done
    exit-status: 0
    stdout:
      have-patterns:
        - "This is an example multi line output 11"
$ goss v
.F

Failures/Skipped:

Command: echo_test: stdout:
Expected
    "This is an example multi line output 1\nThis is an example multi line output 2\nThis is an example multi line output 3\nThis is an example multi line output 4\nThis is an example multi line output 5\nThis is an example multi line output 6\nThis is an example multi line output 7\nThis is an example multi line output 8\nThis is an example multi line output 9\nThis is an example multi line output 10\n"
to have patterns
    ["This is an example multi line output 11"]
the missing elements were
    ["This is an example multi line output 11"]

Total Duration: 0.002s
Count: 2, Failed: 1, Skipped: 0

Lastly, v4 will also allow full-text matching, for example:

$ cat goss.yaml 
command:
  echo_test:
    exec: |
      for i in {1..10};do
        echo "This is an example multi line output $i"
      done
    exit-status: 0
    stdout: |
      This is an example multi line output 999
      This is an example multi line output 2
      This is an example multi line output 3
      This is an example multi line output 4
      This is an example multi line output 5
      This is an example multi line output 6
      This is an example multi line output 7
      This is an example multi line output 8
      This is an example multi line output 9
      This is an example multi line output 10
$ goss v
.F

Failures/Skipped:

Command: echo_test: stdout:
Expected
    "This is an example multi line output 1\nThis is an example multi line output 2\nThis is an example multi line output 3\nThis is an example multi line output 4\nThis is an example multi line output 5\nThis is an example multi line output 6\nThis is an example multi line output 7\nThis is an example multi line output 8\nThis is an example multi line output 9\nThis is an example multi line output 10\n"
to equal
    "This is an example multi line output 999\nThis is an example multi line output 2\nThis is an example multi line output 3\nThis is an example multi line output 4\nThis is an example multi line output 5\nThis is an example multi line output 6\nThis is an example multi line output 7\nThis is an example multi line output 8\nThis is an example multi line output 9\nThis is an example multi line output 10\n"

Total Duration: 0.002s
Count: 2, Failed: 1, Skipped: 0

Now of course this is less about debugging and more about user experience. It also won't display stderr unless there's a test fo rit. This issue seems to be about logs. This can be added in a future release after v4 when trace logging is enabled to print out each line of stderr and stdout, even when the first syntax is used. Note, in parallel executions the log output will be intermingled.

Please let me know if I understood the ask correctly, and if you have any feedback on the above solutions.. or suggestion for log improvements.

@gberche-orange
Copy link
Contributor Author

thanks a lot @aelsabbahy for your answer and the great new v4 upcoming features related to logging out actual/expected stdout for troubleshooting failed tests, as well as refined matchers !

This issue seems to be about logs. This can be added in a future release after v4 when trace logging is enabled to print out each line of stderr and stdout, even when the first syntax is used. Note, in parallel executions the log output will be intermingled.
Please let me know if you have suggestion for log improvements.

In the your example reproduce below

$ cat goss.yaml 
command:
  echo_test:
    exec: |
      for i in {1..10};do
        echo "This is an example multi line output $i"
      done
    exit-status: 0
    stdout:
      - "This is an example multi line output 11"

I would suggest that the TRACE logs include the following (possibly with \n instead of full full line feeds)

F2023/07/07 09:04:01 [TRACE] executing echo_test with exec command: 
bash -c "
      for i in {1..10};do
        echo "This is an example multi line output $i"
      done"

I hope such traces to support debugging of problems such as:

  • incorrect loading of the exec string (e.g. due to mis yaml indentation)
  • details about shell executed by goss
  • bugs in upstream automation that could have damaged the exec string (e.g. ${i} being substituted by an intermediate such as flux kustomization)

@mpermar
Copy link

mpermar commented Aug 2, 2023

Chiming in here to provide on this enhancement request both a goss success story and complementing the use case, but first @aelsabbahy really absolutely congrats for this fantastic tool.

I'm not sure if anyone mentioned it before, I believe no, but our team at Bitnami has moved all the Bitnami Helm Charts to use goss for functional testing. I believe in pretty much every of the existing Helm Charts we have scripts like this, this or this that get run when some contributor sends a PR. That has helped us both to automate testing and to provide more visibility and feedback to our community on their submissions. So this is really goss being used at a very large scale every day.

Now, usually the goss tests are self descriptive about what has failed which is great, but one common issue that we always have had is when we needed to run a script, that is out of our control, and that might fail for multiple reasons. And maybe there are better ways to do this, but PyTorch here is a good example. That exec command runs a python script that could fail due to multiple causes. Changes in upstream python script, missing imports, network, etc. But we are limited to check the status code. When this script fails, we have very little information about why it failed, and usually that requires someone to manually try to replicate locally what happened to then be able to see the logs.

If TRACE logging is going to dump the stdout/stderr from exec commands then I think that will address this use case although if this logging could be triggered via a command flag that would probably be ideal as any other TRACE logging would be omitted.

@aelsabbahy
Copy link
Member

aelsabbahy commented Aug 3, 2023

@mpermar that's an AWESOME success story thank you for sharing. Also, thanks for sharing the links to the Goss files.

Can you provide exact examples of what you would like to see from the logging/UX perspective. I would love to brainstorm some ideas here.

@mpermar
Copy link

mpermar commented Aug 3, 2023

@aelsabbahy , yes sure. From a logging perspective the content would be whatever the exec (single or multiline) outputs to stdout/stderr. Same output that one would get, as you have suggested in a few threads, by piping tee (we plan to start with this workaround) on those exec.

For the sake of our use case, I'll take Pytorch's getting started guide for example. Following this guide I could write today a goss validation like this:

file:
  /app:
    exists: true
    filetype: directory
    mode: "0775"
    owner: root
command:
  validate-pytorch:
    exec: |
      pip3 install torch torchvision
      python pytorch-test.py
    exit-status: 0
    timeout: 60000

And then run it and I would get:

$ goss validate
.....

Total Duration: 0.020s
Count: 5, Failed: 0, Skipped: 0

For us, on this use case, having a way to see the output from the commands is enough. This could work like (I'm non-opinionated here so totally making up the possible flag names):

$ goss validate --show-command-logs
.....

WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
Collecting torch
  Downloading torch-2.0.1-cp38-cp38-manylinux1_x86_64.whl (619.9 MB)
     |████████████████████████████████| 619.9 MB 69.4 MB/s
Collecting torchvision
  Downloading torchvision-0.15.2-cp38-cp38-manylinux1_x86_64.whl (33.8 MB)
     |████████████████████████████████| 33.8 MB 41.5 MB/s
Collecting nvidia-curand-cu11==10.2.10.91
  Downloading nvidia_curand_cu11-10.2.10.91-py3-none-manylinux1_x86_64.whl (54.6 MB)
.... tons of other logs from pip here ....
$ python pytorch-test.py
tensor([[0.5815, 0.1555, 0.5098],
        [0.2945, 0.2317, 0.6991],
        [0.0552, 0.4376, 0.6506],
        [0.2882, 0.8917, 0.9164],
        [0.2955, 0.5643, 0.0300]])
....
Total Duration: 0.020s
Count: 5, Failed: 0, Skipped: 0

That's the simplest approach. As multiple exec goss tests might run in parallel lines might get intermingled. I honestly don't have a strong opinion there either. For our use case it is totally fine if lines get intermingled, but I could see the case for appending some sort of indicator of which test that line comes from. Again, to me intermingled sounds good as a first approach.

Similar to the above, one could be interesting on knowing what comes from stdout or stderr. I don't see anything wrong with appending [out|err] or [stdout|stderr] or using different colours, or again with just keeping it simple for the first implementation.

Another improvement could be to have a flag for dumping the logs those logs into a file. That could be good and keep the goss output decoupled from the exec output. Something like:

$ goss validate --show-command-logs --logs-file pytorch-validation.txt

With this latter approach you get into logging library issues like having a max-log-file-size, rotation patterns, i.e all the stuff that logging libraries end up having to do when they start dumping logs to files. Again, an initial approach probably doesn't need all of this stuff.

@aelsabbahy
Copy link
Member

After the v0.4.0 release.. I'll probably spin up a new issue to brainstorm the design for this logging feature.

The timing seems ideal since go just added structured logging to the std library.

@aelsabbahy
Copy link
Member

So, v0.4.0 is released.. wanted to loop back to this issue.

Some experimentation with the new slog package, do any of these look good for the example above? If implemented it would be only when "debug" logging is enabled.

code:

package main

import (
	"fmt"
	"strings"

	"log/slog"
	"os"

	"github.com/lmittmann/tint"
	"github.com/mattn/go-isatty"
)

const out = `
WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
Collecting torch
  Downloading torch-2.0.1-cp38-cp38-manylinux1_x86_64.whl (619.9 MB)
     |████████████████████████████████| 619.9 MB 69.4 MB/s
Collecting torchvision
  Downloading torchvision-0.15.2-cp38-cp38-manylinux1_x86_64.whl (33.8 MB)
     |████████████████████████████████| 33.8 MB 41.5 MB/s
Collecting nvidia-curand-cu11==10.2.10.91
  Downloading nvidia_curand_cu11-10.2.10.91-py3-none-manylinux1_x86_64.whl (54.6 MB)
.... tons of other logs from pip here ....
$ python pytorch-test.py
tensor([[0.5815, 0.1555, 0.5098],
        [0.2945, 0.2317, 0.6991],
        [0.0552, 0.4376, 0.6506],
        [0.2882, 0.8917, 0.9164],
        [0.2955, 0.5643, 0.0300]])
`

const summary = `
....
Total Duration: 0.020s
Count: 5, Failed: 0, Skipped: 0
`

func main() {
	var logger *slog.Logger
	switch os.Getenv("LOGGER") {
	case "tint":
		logger = slog.New(
			tint.NewHandler(os.Stderr, &tint.Options{
				Level:   slog.LevelDebug,
				NoColor: !isatty.IsTerminal(os.Stderr.Fd()),
			}),
		)
	case "text":
		logger = slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelDebug}))
	case "json":
		logger = slog.New(slog.NewJSONHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelDebug}))

	}
	l := logger.With("resourceType", "command", "resourceId", "validate-pytorch", "property", "stdout")
	for _, ln := range strings.Split(out, "\n") {
		l.Debug(ln)
	}
	fmt.Println(summary)
}

Example outputs:

$ for x in tint text json default;do echo "\n==========\noutput using $x\n==========\n"; LOGGER=$x go run main.go;done


==========
output using tint
==========

Aug 26 11:27:26.714 DBG  resourceType=command resourceId=validate-pytorch property=stdout
Aug 26 11:27:26.714 DBG WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag. resourceType=command resourceId=validate-pytorch property=stdout
Aug 26 11:27:26.714 DBG Collecting torch resourceType=command resourceId=validate-pytorch property=stdout
Aug 26 11:27:26.714 DBG   Downloading torch-2.0.1-cp38-cp38-manylinux1_x86_64.whl (619.9 MB) resourceType=command resourceId=validate-pytorch property=stdout
Aug 26 11:27:26.714 DBG      |████████████████████████████████| 619.9 MB 69.4 MB/s resourceType=command resourceId=validate-pytorch property=stdout
Aug 26 11:27:26.714 DBG Collecting torchvision resourceType=command resourceId=validate-pytorch property=stdout
Aug 26 11:27:26.714 DBG   Downloading torchvision-0.15.2-cp38-cp38-manylinux1_x86_64.whl (33.8 MB) resourceType=command resourceId=validate-pytorch property=stdout
Aug 26 11:27:26.714 DBG      |████████████████████████████████| 33.8 MB 41.5 MB/s resourceType=command resourceId=validate-pytorch property=stdout
Aug 26 11:27:26.714 DBG Collecting nvidia-curand-cu11==10.2.10.91 resourceType=command resourceId=validate-pytorch property=stdout
Aug 26 11:27:26.714 DBG   Downloading nvidia_curand_cu11-10.2.10.91-py3-none-manylinux1_x86_64.whl (54.6 MB) resourceType=command resourceId=validate-pytorch property=stdout
Aug 26 11:27:26.714 DBG .... tons of other logs from pip here .... resourceType=command resourceId=validate-pytorch property=stdout
Aug 26 11:27:26.714 DBG $ python pytorch-test.py resourceType=command resourceId=validate-pytorch property=stdout
Aug 26 11:27:26.714 DBG tensor([[0.5815, 0.1555, 0.5098], resourceType=command resourceId=validate-pytorch property=stdout
Aug 26 11:27:26.714 DBG         [0.2945, 0.2317, 0.6991], resourceType=command resourceId=validate-pytorch property=stdout
Aug 26 11:27:26.714 DBG         [0.0552, 0.4376, 0.6506], resourceType=command resourceId=validate-pytorch property=stdout
Aug 26 11:27:26.714 DBG         [0.2882, 0.8917, 0.9164], resourceType=command resourceId=validate-pytorch property=stdout
Aug 26 11:27:26.714 DBG         [0.2955, 0.5643, 0.0300]]) resourceType=command resourceId=validate-pytorch property=stdout
Aug 26 11:27:26.714 DBG  resourceType=command resourceId=validate-pytorch property=stdout

....
Total Duration: 0.020s
Count: 5, Failed: 0, Skipped: 0


==========
output using text
==========

time=2023-08-26T11:27:26.808-07:00 level=DEBUG msg="" resourceType=command resourceId=validate-pytorch property=stdout
time=2023-08-26T11:27:26.808-07:00 level=DEBUG msg="WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag." resourceType=command resourceId=validate-pytorch property=stdout
time=2023-08-26T11:27:26.808-07:00 level=DEBUG msg="Collecting torch" resourceType=command resourceId=validate-pytorch property=stdout
time=2023-08-26T11:27:26.808-07:00 level=DEBUG msg="  Downloading torch-2.0.1-cp38-cp38-manylinux1_x86_64.whl (619.9 MB)" resourceType=command resourceId=validate-pytorch property=stdout
time=2023-08-26T11:27:26.808-07:00 level=DEBUG msg="     |████████████████████████████████| 619.9 MB 69.4 MB/s" resourceType=command resourceId=validate-pytorch property=stdout
time=2023-08-26T11:27:26.808-07:00 level=DEBUG msg="Collecting torchvision" resourceType=command resourceId=validate-pytorch property=stdout
time=2023-08-26T11:27:26.808-07:00 level=DEBUG msg="  Downloading torchvision-0.15.2-cp38-cp38-manylinux1_x86_64.whl (33.8 MB)" resourceType=command resourceId=validate-pytorch property=stdout
time=2023-08-26T11:27:26.808-07:00 level=DEBUG msg="     |████████████████████████████████| 33.8 MB 41.5 MB/s" resourceType=command resourceId=validate-pytorch property=stdout
time=2023-08-26T11:27:26.808-07:00 level=DEBUG msg="Collecting nvidia-curand-cu11==10.2.10.91" resourceType=command resourceId=validate-pytorch property=stdout
time=2023-08-26T11:27:26.808-07:00 level=DEBUG msg="  Downloading nvidia_curand_cu11-10.2.10.91-py3-none-manylinux1_x86_64.whl (54.6 MB)" resourceType=command resourceId=validate-pytorch property=stdout
time=2023-08-26T11:27:26.808-07:00 level=DEBUG msg=".... tons of other logs from pip here ...." resourceType=command resourceId=validate-pytorch property=stdout
time=2023-08-26T11:27:26.808-07:00 level=DEBUG msg="$ python pytorch-test.py" resourceType=command resourceId=validate-pytorch property=stdout
time=2023-08-26T11:27:26.808-07:00 level=DEBUG msg="tensor([[0.5815, 0.1555, 0.5098]," resourceType=command resourceId=validate-pytorch property=stdout
time=2023-08-26T11:27:26.808-07:00 level=DEBUG msg="        [0.2945, 0.2317, 0.6991]," resourceType=command resourceId=validate-pytorch property=stdout
time=2023-08-26T11:27:26.808-07:00 level=DEBUG msg="        [0.0552, 0.4376, 0.6506]," resourceType=command resourceId=validate-pytorch property=stdout
time=2023-08-26T11:27:26.808-07:00 level=DEBUG msg="        [0.2882, 0.8917, 0.9164]," resourceType=command resourceId=validate-pytorch property=stdout
time=2023-08-26T11:27:26.808-07:00 level=DEBUG msg="        [0.2955, 0.5643, 0.0300]])" resourceType=command resourceId=validate-pytorch property=stdout
time=2023-08-26T11:27:26.808-07:00 level=DEBUG msg="" resourceType=command resourceId=validate-pytorch property=stdout

....
Total Duration: 0.020s
Count: 5, Failed: 0, Skipped: 0


==========
output using json
==========

{"time":"2023-08-26T11:27:26.904078506-07:00","level":"DEBUG","msg":"","resourceType":"command","resourceId":"validate-pytorch","property":"stdout"}
{"time":"2023-08-26T11:27:26.904138903-07:00","level":"DEBUG","msg":"WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.","resourceType":"command","resourceId":"validate-pytorch","property":"stdout"}
{"time":"2023-08-26T11:27:26.904142005-07:00","level":"DEBUG","msg":"Collecting torch","resourceType":"command","resourceId":"validate-pytorch","property":"stdout"}
{"time":"2023-08-26T11:27:26.904143985-07:00","level":"DEBUG","msg":"  Downloading torch-2.0.1-cp38-cp38-manylinux1_x86_64.whl (619.9 MB)","resourceType":"command","resourceId":"validate-pytorch","property":"stdout"}
{"time":"2023-08-26T11:27:26.90416387-07:00","level":"DEBUG","msg":"     |████████████████████████████████| 619.9 MB 69.4 MB/s","resourceType":"command","resourceId":"validate-pytorch","property":"stdout"}
{"time":"2023-08-26T11:27:26.904166407-07:00","level":"DEBUG","msg":"Collecting torchvision","resourceType":"command","resourceId":"validate-pytorch","property":"stdout"}
{"time":"2023-08-26T11:27:26.90416836-07:00","level":"DEBUG","msg":"  Downloading torchvision-0.15.2-cp38-cp38-manylinux1_x86_64.whl (33.8 MB)","resourceType":"command","resourceId":"validate-pytorch","property":"stdout"}
{"time":"2023-08-26T11:27:26.90417053-07:00","level":"DEBUG","msg":"     |████████████████████████████████| 33.8 MB 41.5 MB/s","resourceType":"command","resourceId":"validate-pytorch","property":"stdout"}
{"time":"2023-08-26T11:27:26.904172773-07:00","level":"DEBUG","msg":"Collecting nvidia-curand-cu11==10.2.10.91","resourceType":"command","resourceId":"validate-pytorch","property":"stdout"}
{"time":"2023-08-26T11:27:26.904177583-07:00","level":"DEBUG","msg":"  Downloading nvidia_curand_cu11-10.2.10.91-py3-none-manylinux1_x86_64.whl (54.6 MB)","resourceType":"command","resourceId":"validate-pytorch","property":"stdout"}
{"time":"2023-08-26T11:27:26.90418006-07:00","level":"DEBUG","msg":".... tons of other logs from pip here ....","resourceType":"command","resourceId":"validate-pytorch","property":"stdout"}
{"time":"2023-08-26T11:27:26.904182124-07:00","level":"DEBUG","msg":"$ python pytorch-test.py","resourceType":"command","resourceId":"validate-pytorch","property":"stdout"}
{"time":"2023-08-26T11:27:26.904184019-07:00","level":"DEBUG","msg":"tensor([[0.5815, 0.1555, 0.5098],","resourceType":"command","resourceId":"validate-pytorch","property":"stdout"}
{"time":"2023-08-26T11:27:26.904187202-07:00","level":"DEBUG","msg":"        [0.2945, 0.2317, 0.6991],","resourceType":"command","resourceId":"validate-pytorch","property":"stdout"}
{"time":"2023-08-26T11:27:26.904190196-07:00","level":"DEBUG","msg":"        [0.0552, 0.4376, 0.6506],","resourceType":"command","resourceId":"validate-pytorch","property":"stdout"}
{"time":"2023-08-26T11:27:26.904193223-07:00","level":"DEBUG","msg":"        [0.2882, 0.8917, 0.9164],","resourceType":"command","resourceId":"validate-pytorch","property":"stdout"}
{"time":"2023-08-26T11:27:26.904196083-07:00","level":"DEBUG","msg":"        [0.2955, 0.5643, 0.0300]])","resourceType":"command","resourceId":"validate-pytorch","property":"stdout"}
{"time":"2023-08-26T11:27:26.904198843-07:00","level":"DEBUG","msg":"","resourceType":"command","resourceId":"validate-pytorch","property":"stdout"}

....
Total Duration: 0.020s
Count: 5, Failed: 0, Skipped: 0

Note: parallel execution is going to make the output pretty mangled up, but there would be a few divides with this approach:

  • stdout: the original goss output

  • stderr:

    • the debug debug messages
    • also, each log contains enough info to know what the resourceType/ResourceName/Property is.

    Anyways.. just brainstorming some ideas before documenting/implementing this, would love your thoughts.

@mpermar
Copy link

mpermar commented Aug 27, 2023

Congrats on the successful v0.4.0. Tons of features there! @dani8art we should give it ago when other tasks allow!

Our main use case is to show command logs output to the user. So from that perspective, tint's example above looks better as it is the less polluted. But the extra context at the end would be confusing and it is also unparseable.

Is there any way to have context appended at the beginning of lines? Something like:

Aug 26 11:27:26.714 DBG [validate-pytorch][out] WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag. 
Aug 26 11:27:26.714 DBG [validate-pytorch][out] Collecting torch resourceType=command resourceId=validate-pytorch property=stdout
Aug 26 11:27:26.714 DBG [validate-pytorch][out] Downloading torch-2.0.1-cp38-cp38-manylinux1_x86_64.whl (619.9 MB) resourceType=command resourceId=validate-pytorch property=stdout

The above alleviates pretty well the parallel test problem because with context in the left it is easier to figure out where each line comes from.

So, for us, with something like that then we would not need to parse the command logs for exposing those to the users in a palatable way. There could be a side effect as the command names might have variable lengths, and if somehow that placeholder could be normalised to the longest lengths that might be nice. But it wouldn't be terrible if the lengths are that different anyways.

Otherwise if that is somehow difficult then the full json structured format will be the most flexible. But in such case, we would likely end up parsing that json to build our own less-verbose easier-to-read version.

@aelsabbahy
Copy link
Member

aelsabbahy commented Sep 4, 2023

Kicking the tires on some ideas locally, given:

command:
  validate-pytorch:
    exec: cat ./dummy_data.txt
    exit-status: 0
    timeout: 10000

How does this look? It's very close to what you suggested, just minor tweaks to be consistent with documentation output format.

Example:

$ goss -L DEBUG v -f documentation
2023/09/03 19:10:31 [DEBUG] Setting log level to DEBUG
2023/09/03 19:10:31 [DEBUG] Command: validate-pytorch: stdout:  WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
2023/09/03 19:10:31 [DEBUG] Command: validate-pytorch: stdout:  Collecting torch
2023/09/03 19:10:31 [DEBUG] Command: validate-pytorch: stdout:    Downloading torch-2.0.1-cp38-cp38-manylinux1_x86_64.whl (619.9 MB)
2023/09/03 19:10:31 [DEBUG] Command: validate-pytorch: stdout:       |████████████████████████████████| 619.9 MB 69.4 MB/s
2023/09/03 19:10:31 [DEBUG] Command: validate-pytorch: stdout:  Collecting torchvision
2023/09/03 19:10:31 [DEBUG] Command: validate-pytorch: stdout:    Downloading torchvision-0.15.2-cp38-cp38-manylinux1_x86_64.whl (33.8 MB)
2023/09/03 19:10:31 [DEBUG] Command: validate-pytorch: stdout:       |████████████████████████████████| 33.8 MB 41.5 MB/s
2023/09/03 19:10:31 [DEBUG] Command: validate-pytorch: stdout:  Collecting nvidia-curand-cu11==10.2.10.91
2023/09/03 19:10:31 [DEBUG] Command: validate-pytorch: stdout:    Downloading nvidia_curand_cu11-10.2.10.91-py3-none-manylinux1_x86_64.whl (54.6 MB)
2023/09/03 19:10:31 [DEBUG] Command: validate-pytorch: stdout:  .... tons of other logs from pip here ....
2023/09/03 19:10:31 [DEBUG] Command: validate-pytorch: stdout:  $ python pytorch-test.py
2023/09/03 19:10:31 [DEBUG] Command: validate-pytorch: stdout:  tensor([[0.5815, 0.1555, 0.5098],
2023/09/03 19:10:31 [DEBUG] Command: validate-pytorch: stdout:          [0.2945, 0.2317, 0.6991],
2023/09/03 19:10:31 [DEBUG] Command: validate-pytorch: stdout:          [0.0552, 0.4376, 0.6506],
2023/09/03 19:10:31 [DEBUG] Command: validate-pytorch: stdout:          [0.2882, 0.8917, 0.9164],
2023/09/03 19:10:31 [DEBUG] Command: validate-pytorch: stdout:          [0.2955, 0.5643, 0.0300]])
2023/09/03 19:10:31 [DEBUG] Command: validate-pytorch: stdout:  
Command: validate-pytorch: exit-status: matches expectation: 0


Total Duration: 0.002s
Count: 1, Failed: 0, Skipped: 0

Note: All the log messages are going to stderr.

@aelsabbahy
Copy link
Member

FYI, there's a draft-ish PR here: #837

This includes the logging feature, still working out the architecture decisions.

If you're comfortable with go. Feel free to make install that version and see if it meets your needs.

No ETA on delivery.. just wanted to update you on progress.

@mpermar
Copy link

mpermar commented Sep 10, 2023

Yeah, that works nicely. A couple of suggestions:

  • For the timestamp, some standard format will be useful e.g. ISO-8601 2020-01-21T16:30:25Z
  • The command: feels quite verbose. You can just set the convention that after the log level the next piece is the command or an empty string, e.g.

2023/09/03 19:10:31 [DEBUG][validate-pytorch] stdout: [0.2882, 0.8917, 0.9164],

Or if there is a use case for no command:

2023/09/03 19:10:31 [DEBUG][] stdout: [0.2882, 0.8917, 0.9164],

I personally would also trim stdout: to out: or [out] to make it shorter too.

@mpermar
Copy link

mpermar commented Sep 10, 2023

@dani8art I think you were tasked to test our platform with goss v4. When you give it a try, can you also have a look at #837 and ideally try it as well?

@aelsabbahy
Copy link
Member

Thank you for your valuable feedback, @mpermar

For the timestamp, some standard format will be useful e.g. ISO-8601 2020-01-21T16:30:25Z

Good callout, will push out that change.

The command: feels quite verbose. You can just set the convention that after the log level the next piece is the command or an empty string, e.g.

I agree that it's a bit verbose, however, it is more consistent with other outputs from goss.

For example:

goss -l debug v -f documentation

The output to stdout will look like:

Command: validate-pytorch: stdout: matches expectation: ...

The stderr debug output will be:

2023-09-10T21:26:33Z [DEBUG] Command: validate-pytorch: stdout: ...

They both share the ${type}: ${id}: ${attribute}: ... which makes it very easy to correlate and easier to scale consistently as logging is introduced for other types and attributes.

@aelsabbahy
Copy link
Member

v0.4.2 is out with this change.

When you have a chance take it for a spin and we can either close this ticket or iterate on the solution.

@mpermar
Copy link

mpermar commented Sep 15, 2023

That's awesome @aelsabbahy . Haven't forgot about it. Will come back soon with some feedback here and close accordingly.

@dani8art
Copy link

Hello! I'm sorry for not getting back to you sooner. I will try it so we can integrate it into the system and finally show command logs.

I'll try to get back to you sooner rather than later.

@dani8art
Copy link

Hi! I did some tests to check that the already implemented goss command logs fit our needs, which look very promising.

✅ For us, keeping the output separated between stdout (goss result/report) and stderr (error, goss logs and debug logs) is so important to keep retro compatibility against code based on older GOSS versions.
✅ Adding the command and the stdXXX to the context makes the logs very clear and easy to identify

💡 If it were possible to add something, it would just be the following esthetic suggestion

## Currently we see log lines like
2023-09-18T14:06:16.836+0000 [    main][DEBUG][goss-cli] Command: wp-cli-info: stdout:  OS:	Linux 5.15.49-linuxkit-pr #1 SMP Thu May 25 07:17:40 UTC 2023 x86_64
2023-09-18T14:06:16.836+0000 [    main][DEBUG][goss-cli] Command: wp-cli-info: stdout:  Shell:	
2023-09-18T14:06:16.836+0000 [    main][DEBUG][goss-cli] Command: wp-cli-info: stdout:  PHP binary:	/opt/bitnami/php/bin/php
## I would add [ and ] to the context part of the line so it would perfectly fit our log structure.
2023-09-18T14:06:16.836+0000 [    main][DEBUG][goss-cli] [wp-cli-info][stdout]  OS:	Linux 5.15.49-linuxkit-pr #1 SMP Thu May 25 07:17:40 UTC 2023 x86_64
2023-09-18T14:06:16.836+0000 [    main][DEBUG][goss-cli] [wp-cli-info][stdout]  Shell:	
2023-09-18T14:06:16.836+0000 [    main][DEBUG][goss-cli] [wp-cli-info][stdout]  PHP binary:	/opt/bitnami/php/bin/php

NOTE: you should ignore that part 2023-09-18T14:06:16.836+0000 [ main][DEBUG][goss-cli] from the log line examples that come from us.

I've checked that you were adding those characters to the log-level part of the context, would it be possible to add it there as well?

Anyway, this is a great step to make GOSS and our implementation on top of it easier to debug and user-friendly, thank you so much @aelsabbahy for taking this into account.

@aelsabbahy
Copy link
Member

Hello @dani8art thanks for the quick feedback, is this what you had in mind?

$ goss -l debug v
2023-09-21T22:00:02Z [DEBUG] Setting log level to DEBUG
2023-09-21T22:00:02Z [DEBUG][Command][validate-pytorch][stdout]  WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
2023-09-21T22:00:02Z [DEBUG][Command][validate-pytorch][stdout]  Collecting torch
2023-09-21T22:00:02Z [DEBUG][Command][validate-pytorch][stdout]    Downloading torch-2.0.1-cp38-cp38-manylinux1_x86_64.whl (619.9 MB)
2023-09-21T22:00:02Z [DEBUG][Command][validate-pytorch][stdout]       |████████████████████████████████| 619.9 MB 69.4 MB/s
2023-09-21T22:00:02Z [DEBUG][Command][validate-pytorch][stdout]  Collecting torchvision
2023-09-21T22:00:02Z [DEBUG][Command][validate-pytorch][stdout]    Downloading torchvision-0.15.2-cp38-cp38-manylinux1_x86_64.whl (33.8 MB)
2023-09-21T22:00:02Z [DEBUG][Command][validate-pytorch][stdout]       |████████████████████████████████| 33.8 MB 41.5 MB/s
2023-09-21T22:00:02Z [DEBUG][Command][validate-pytorch][stdout]  Collecting nvidia-curand-cu11==10.2.10.91
2023-09-21T22:00:02Z [DEBUG][Command][validate-pytorch][stdout]    Downloading nvidia_curand_cu11-10.2.10.91-py3-none-manylinux1_x86_64.whl (54.6 MB)
2023-09-21T22:00:02Z [DEBUG][Command][validate-pytorch][stdout]  .... tons of other logs from pip here ....
2023-09-21T22:00:02Z [DEBUG][Command][validate-pytorch][stdout]  $ python pytorch-test.py
2023-09-21T22:00:02Z [DEBUG][Command][validate-pytorch][stdout]  tensor([[0.5815, 0.1555, 0.5098],
2023-09-21T22:00:02Z [DEBUG][Command][validate-pytorch][stdout]          [0.2945, 0.2317, 0.6991],
2023-09-21T22:00:02Z [DEBUG][Command][validate-pytorch][stdout]          [0.0552, 0.4376, 0.6506],
2023-09-21T22:00:02Z [DEBUG][Command][validate-pytorch][stdout]          [0.2882, 0.8917, 0.9164],
2023-09-21T22:00:02Z [DEBUG][Command][validate-pytorch][stdout]          [0.2955, 0.5643, 0.0300]])
2023-09-21T22:00:02Z [DEBUG][Command][validate-pytorch][stdout]  
.

Total Duration: 0.002s
Count: 1, Failed: 0, Skipped: 0
2023-09-21T22:00:02Z [DEBUG] OK SUMMARY: Total Duration: 0.002s Count: 1, Failed: 0, Skipped: 0 

Let me know if there are any tweaks to this output, this way the new finalized format can be in the next release.

Many thanks, your feedback makes goss better for everyone =)

@dani8art
Copy link

Yes, this is precisely what I had! This would be simply amazing!

Thank you so much for taking this feedback into account, we're excited to see the next release integrated into our system!

@mpermar
Copy link

mpermar commented Oct 19, 2023

@aelsabbahy the team has been already testing this. Just turned it on in "production" a few days ago and so far goss v4 has been working flawlessly. We have been using goss to verify the Pull Requests and changes from upstream for bitnami containers, virtual machines and Helm charts.

It's not totally obvious but note for example how this PR, triggers a GH Action run that has a regression that gets found thanks to the goss tests:

Screenshot 2023-10-19 at 17 11 49 Screenshot 2023-10-19 at 17 13 17

Now an user can go to the action summary and check the zip file with the logs for all the different validations that we run with different tools. Then looking at the goss output, now there is much more information available for our engineers and for the overall community to debug from.

Screenshot 2023-10-19 at 17 12 28

It's literally like day and night. Fantastic stuff. I think this issue can be marked as closed. Thanks a lot for your help and should you need any help with promotion, telling success stories, whatever, count on us. We certainly do ❤️ goss.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants