Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance Logging for Workflow #7065

Closed
Tracked by #7156
RyanLettieri opened this issue Oct 19, 2023 · 9 comments · Fixed by #7222
Closed
Tracked by #7156

Enhance Logging for Workflow #7065

RyanLettieri opened this issue Oct 19, 2023 · 9 comments · Fixed by #7222
Assignees
Milestone

Comments

@RyanLettieri
Copy link
Contributor

In what area(s)?

/area runtime

/area operator

/area placement

/area docs

/area test-and-release

Describe the feature

Currently, it is rather difficult to diagnose an issue that occurs when a user is using dapr workflow, and due to dapr workflow being a new feature, issues occur more frequently than hoped. By adding in additional logging, diagnosing the issue will be a much easier process. I am proposing the following logs to be added to the runtime:

  1. Creation of a Workflow
    c.logger.Infof("Created workflow %s with instanceID '%s'", <workflowName>, <instanceID>)
  • Include the Instance ID for the workflow
  • Include the name of the workflow
  1. Scheduling of an Activity
    c.logger.Infof("Scheduling activity '%s' for workflow %s with instanceID %s", <activityName>, <workflowName>, <instanceID>)
  • Include the name of the activity
  • Include the name of the workflow
  • Include the instanceID for the workflow
  1. Execution of an Activity
    c.logger.Infof("Executing activity '%s' for workflow %s with instanceID %s", <activityName>, <workflowName>, <instanceID>)
  • Include the name of the activity
  • Include the name of the workflow
  • Include the instanceID for the workflow
  1. Completion of an Activity
    c.logger.Infof("Activity '%s' completed with status: '%s' for workflow %s with instanceID %s", <activityName>, <activityStatus>, <workflowName>, <instanceID>)
  • Include the name of the activity
  • Include the name of the workflow
  • Include the instanceID for the workflow
  • Include the status of the activity (pass/fail)
  1. Completion of a Workflow
    c.logger.Infof("Workflow %s with instanceID %s completed with status: '%s'", <workflowName>, <instanceID>, <workflowStatus>)
  • Include the Instance ID for the workflow
  • Include the name of the workflow
  • Include the status of the workflow (pass/fail)

In addition to the proposed logs for the runtime, I am also proposing that we add logs to the various SDKs which will provide information that the runtime does not have access to. These logs are as follows:

  1. Registration of a Workflow
    log("Workflow <workflowName> has been successfully registered")
  • Include the name of the workflow
  1. Registration of an Activity
    log("Activity <activityName> has been successfully registered to workflow <workflowName>")
  • Include the name of the activity
  • Include the parent workflow of the activity

Additional Thoughts

  1. Timestamps could also be added to the beginning of the various log statements to help create a timeline for the workflow execution. One downside would be that the logs may get too cluttered.
  2. The current proposal is to have the log statements as a level of "info". Again, to avoid cluttering the output, the log level could (and quite possibly should) be toned down to "debug"
  3. If/when more logs are added to the various SDKs, we should first determine if they can be added to the runtime, and if so, add them there. If not, we must ensure that the logs across all SKDKs are of a consistent format.

Release Note

RELEASE NOTE:

@ItalyPaleAle
Copy link
Contributor

👍

  1. Timestamps are already included in all logs by default, so we're good there
  2. How about we add an option in the Configuration CRD to determine the log level for workflow? We already have a section for logging in there. This way we can determine if the wf logs are debug or info

@philliphoff
Copy link
Contributor

Tiny nit: consider being consistent in use of quotes around names/IDs.

@cgillum
Copy link
Contributor

cgillum commented Oct 23, 2023

Have we considered whether some of this information could be added to the log schema? For example, instead of making the workflow ID part of the log message, could it just be a first-class property on the log record itself (i.e., structured logging)?

@olitomlinson
Copy link

Can I make a super minor suggestion that you tweak the message format for 4 and 5 (and the registration messages) so that they follow the same pattern as 1, 2 3.

i.e. Activity '%s' completed with status: '%s' for workflow %s with instanceID %s"
becomes
Activity completed...

Workflow %s with instanceID %s completed with status: '%s'", <workflowName>, <instanceID>, <workflowStatus>
becomes
Workflow completed...

This format helps when doing searches across logs as I can easily search for the phrase Workflow completed rather than searching for workflow and then filtering that list of results further by searching for completed (which still may give too many unwanted matches)

@rabollin
Copy link
Contributor

rabollin commented Nov 1, 2023

For logging the Completion of a Workflow details, if the status is Fail, consider adding the source of the error(DTF/ Workflow Client/ DAPR Client) in the message. Also the errors should not log confidential details like Apps hostname/ IP Address/ subscription ID ..etc.

@lilyjma
Copy link

lilyjma commented Nov 8, 2023

Something that came up in interviews with DF customers is that being able to see the input and output of activity functions would be helpful. Not sure if this is something to be exposed on the logging side, but exposing this info would be useful for debugging.

@famarting
Copy link
Contributor

famarting commented Jan 26, 2024

This two comments:

are really valid points and have not been addressed on the PR #7222 . Taking the most out of structured logging is crucial IMO to make the workflow logging useful for debugging, once structured logging is put in place it can more easily enable log processing and allows for more easily querying for the inputs and outputs of activities

I can create a PR if you are happy with it

@cgillum
Copy link
Contributor

cgillum commented Jan 26, 2024

@famarting I'm supportive of this, and would be happy to provide feedback and/or guidance.

@shivamkm07
Copy link
Contributor

This two comments:

are really valid points and have not been addressed on the PR #7222 . Taking the most out of structured logging is crucial IMO to make the workflow logging useful for debugging, once structured logging is put in place it can more easily enable log processing and allows for more easily querying for the inputs and outputs of activities

I can create a PR if you are happy with it

@famarting @cgillum As per comment #7065 (comment), I was adding Workflow ID as part of the logger field, similar to other fields like time, level etc. But that would require a separate logger Object for every workflow instance, unlike currently where there is a single global wfLogger object. Also for consistency across logs, I added the prefix Workflow Actor: <WorkflowID> as part of Message which would help querying all logs from a specific workflow instance. Is there any advantage of adding this as part of logger field? Or do we have other fields as well that we want to add other than WorkflowID? (The current format does require all workflow instance logs to follow the specific format, but helps us in having a single object).

For querying activity inputs/outputs, do we want to add these as well in logs? Won't these be user information and can potentially be quite large in size?

For the other comment, yeah it makes sense to add Workflow Completed and Activity completed as prefix which would help in searching logs. Creating a quick PR to add this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants