Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Defining Workflow metrics #7152

Merged
merged 47 commits into from
Jan 11, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
47 commits
Select commit Hold shift + click to select a range
2203b81
Defining Workflow metrics
prateek041 Nov 6, 2023
fc670b3
Added logic to record metrics for:
prateek041 Nov 10, 2023
733bbfb
Added logic to record workflowRemindersTotal metric
prateek041 Nov 10, 2023
9519bb8
Added logic to Record Workflow/operations metrics for:
prateek041 Nov 21, 2023
4927cc4
Add logic to record Failed workflow/operations metrics
prateek041 Nov 21, 2023
a4025ed
Record metrics for Successful and Failed Wokflow/Activity
prateek041 Nov 21, 2023
9a3c080
Merged failed/success metrics into one
prateek041 Nov 26, 2023
1322841
Record Latency metrics for Operations and Executions
prateek041 Nov 26, 2023
6b1bf39
Questions regarding latency metrics
prateek041 Nov 26, 2023
624d435
Reducing functions to record metrics and code cleanup
prateek041 Dec 7, 2023
751d867
Added Operation test
prateek041 Dec 10, 2023
d139476
Added namespace and component key
prateek041 Dec 10, 2023
8dbee6e
Add tests for Reminders metrics
prateek041 Dec 10, 2023
7d35ed0
Added Execution metrics tests
prateek041 Dec 10, 2023
dbacff4
fix: fixed test conditions
prateek041 Dec 11, 2023
76a85da
cleanup
prateek041 Dec 15, 2023
9881dd8
resolving conflicts
prateek041 Dec 15, 2023
374df01
fix
prateek041 Dec 15, 2023
5238fa0
fixing minor changes
prateek041 Dec 18, 2023
a6ed62b
minor fix
prateek041 Dec 21, 2023
4260b94
Merge branch 'master' into workflow-metrics
prateek041 Dec 21, 2023
665b88e
Fixing minor Nits
prateek041 Dec 23, 2023
55e4373
Added metrics to docs
prateek041 Dec 25, 2023
612af02
Added comments for metrics description
prateek041 Dec 26, 2023
4a2581a
Removed reminders count metrics
prateek041 Dec 26, 2023
3ccbbe1
Merge branch 'master' into workflow-metrics
prateek041 Dec 26, 2023
f00c5cd
empty commit
shivamkm07 Dec 27, 2023
4c1d076
fixes
shivamkm07 Dec 27, 2023
eb920b8
Rename workflow_metrics.go -> workflow_monitoring.go
shivamkm07 Dec 28, 2023
4ee9224
Updating dapr-metrics docs
shivamkm07 Dec 28, 2023
0e899e2
typo
shivamkm07 Dec 28, 2023
b9b7ebc
Adding comment
shivamkm07 Dec 28, 2023
bc847c9
Avoid collecting metrics in case workflow is already completed before…
shivamkm07 Dec 28, 2023
d61f24e
linter fixes
shivamkm07 Jan 2, 2024
39d4ef8
Merge branch 'master' into workflow-metrics
shivamkm07 Jan 3, 2024
666f1bf
Update pkg/diagnostics/workflow_monitoring.go
shivamkm07 Jan 5, 2024
6126529
Merge branch 'master' into workflow-metrics
shivamkm07 Jan 8, 2024
29747f8
Adding activity execution metrics as separate metrics
shivamkm07 Jan 8, 2024
da4cb40
renaming componentName
shivamkm07 Jan 8, 2024
4cbccba
Adding IT for metrics wf
shivamkm07 Jan 9, 2024
ec1b5fb
Merge branch 'master' into workflow-metrics
shivamkm07 Jan 9, 2024
508978f
linter fixes
shivamkm07 Jan 9, 2024
2c7355b
Merge branch 'master' into workflow-metrics
shivamkm07 Jan 10, 2024
56d0808
Merge branch 'master' into workflow-metrics
mukundansundar Jan 11, 2024
0907987
Merge branch 'master' into workflow-metrics
mukundansundar Jan 11, 2024
55ec256
Updating docs
shivamkm07 Jan 11, 2024
32aab9d
Removing componentKey from wf metrics
shivamkm07 Jan 11, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
11 changes: 11 additions & 0 deletions docs/development/dapr-metrics.md
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,16 @@ Dapr uses prometheus process and go collectors by default.
* dapr_resiliency_count: The number of times a resiliency policy has been executed.
* dapr_resiliency_activations_total: Number of times a resiliency policy has been activated in a building block after a failure or after a state change.

#### Workflow metrics

[workflow metrics](../../pkg/diagnostics/workflow_monitoring.go)

* dapr_runtime_workflow_operation_count: The number of successful/failed workflow operation requests.
* dapr_runtime_workflow_operation_latency: The latencies of responses for workflow operation requests.
* dapr_runtime_workflow_execution_count: The number of successful/failed/recoverable workflow executions.
* dapr_runtime_workflow_activity_execution_count: The number of successful/failed/recoverable activity executions.
* dapr_runtime_workflow_activity_execution_latency: The total time taken to run an activity to completion.

### gRPC monitoring metrics

Dapr leverages opencensus ocgrpc plugin to generate gRPC server and client metrics.
Expand Down Expand Up @@ -173,3 +183,4 @@ We support only server side metrics.

* dapr_component_secret_count: The number of operations performed on the secret component
* dapr_component_secret_latencies: The latency of the response from the secret component

8 changes: 6 additions & 2 deletions pkg/diagnostics/metrics.go
Original file line number Diff line number Diff line change
Expand Up @@ -40,8 +40,8 @@ var (
DefaultComponentMonitoring = newComponentMetrics()
// DefaultResiliencyMonitoring holds resiliency specific metrics.
DefaultResiliencyMonitoring = newResiliencyMetrics()
// Rules holds regex expressions for metrics labels
Rules map[string]string
// DefaultWorkflowMonitoring holds workflow specific metrics.
DefaultWorkflowMonitoring = newWorkflowMetrics()
)

// InitMetrics initializes metrics.
Expand All @@ -66,6 +66,10 @@ func InitMetrics(appID, namespace string, rules []config.MetricsRule) error {
return err
}

if err := DefaultWorkflowMonitoring.Init(appID, namespace); err != nil {
return err
}

// Set reporting period of views
view.SetReportingPeriod(DefaultReportingPeriod)
return utils.CreateRulesMap(rules)
Expand Down
137 changes: 137 additions & 0 deletions pkg/diagnostics/workflow_monitoring.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,137 @@
/*
Copyright 2023 The Dapr Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package diagnostics

import (
"context"

"go.opencensus.io/stats"
"go.opencensus.io/stats/view"
"go.opencensus.io/tag"

diagUtils "github.com/dapr/dapr/pkg/diagnostics/utils"
)

var (
workflowNameKey = tag.MustNewKey("workflow_name")
activityNameKey = tag.MustNewKey("activity_name")
)

const (
StatusSuccess = "success"
StatusFailed = "failed"
StatusRecoverable = "recoverable"
CreateWorkflow = "create_workflow"
GetWorkflow = "get_workflow"
AddEvent = "add_event"
PurgeWorkflow = "purge_workflow"

WorkflowEvent = "event"
Timer = "timer"
)

type workflowMetrics struct {
// workflowOperationCount records count of Successful/Failed requests to Create/Get/Purge Workflow and Add Events.
workflowOperationCount *stats.Int64Measure
// workflowOperationLatency records latency of response for workflow operation requests.
workflowOperationLatency *stats.Float64Measure
// workflowExecutionCount records count of Successful/Failed/Recoverable workflow executions.
workflowExecutionCount *stats.Int64Measure
// activityExecutionCount records count of Successful/Failed/Recoverable activity executions.
activityExecutionCount *stats.Int64Measure
// activityExecutionLatency records time taken to run an activity to completion.
activityExecutionLatency *stats.Float64Measure
appID string
enabled bool
namespace string
}

func newWorkflowMetrics() *workflowMetrics {
return &workflowMetrics{
workflowOperationCount: stats.Int64(
"runtime/workflow/operation/count",
"The number of successful/failed workflow operation requests.",
stats.UnitDimensionless),
workflowOperationLatency: stats.Float64(
"runtime/workflow/operation/latency",
"The latencies of responses for workflow operation requests.",
stats.UnitMilliseconds),
workflowExecutionCount: stats.Int64(
"runtime/workflow/execution/count",
"The number of successful/failed/recoverable workflow executions.",
stats.UnitDimensionless),
activityExecutionCount: stats.Int64(
"runtime/workflow/activity/execution/count",
"The number of successful/failed/recoverable activity executions.",
stats.UnitDimensionless),
activityExecutionLatency: stats.Float64(
"runtime/workflow/activity/execution/latency",
"The total time taken to run an activity to completion.",
stats.UnitMilliseconds),
}
}

func (w *workflowMetrics) IsEnabled() bool {
return w != nil && w.enabled
}

// Init registers the workflow metrics views.
func (w *workflowMetrics) Init(appID, namespace string) error {
w.appID = appID
w.enabled = true
w.namespace = namespace

return view.Register(
diagUtils.NewMeasureView(w.workflowOperationCount, []tag.Key{appIDKey, namespaceKey, operationKey, statusKey}, view.Count()),
diagUtils.NewMeasureView(w.workflowOperationLatency, []tag.Key{appIDKey, namespaceKey, operationKey, statusKey}, defaultLatencyDistribution),
diagUtils.NewMeasureView(w.workflowExecutionCount, []tag.Key{appIDKey, namespaceKey, workflowNameKey, statusKey}, view.Count()),
diagUtils.NewMeasureView(w.activityExecutionCount, []tag.Key{appIDKey, namespaceKey, activityNameKey, statusKey}, view.Count()),
diagUtils.NewMeasureView(w.activityExecutionLatency, []tag.Key{appIDKey, namespaceKey, activityNameKey, statusKey}, defaultLatencyDistribution))
}

// WorkflowOperationEvent records total number of Successful/Failed workflow Operations requests. It also records latency for those requests.
func (w *workflowMetrics) WorkflowOperationEvent(ctx context.Context, operation, status string, elapsed float64) {
if !w.IsEnabled() {
return
}

stats.RecordWithTags(ctx, diagUtils.WithTags(w.workflowOperationCount.Name(), appIDKey, w.appID, namespaceKey, w.namespace, operationKey, operation, statusKey, status), w.workflowOperationCount.M(1))

if elapsed > 0 {
stats.RecordWithTags(ctx, diagUtils.WithTags(w.workflowOperationLatency.Name(), appIDKey, w.appID, namespaceKey, w.namespace, operationKey, operation, statusKey, status), w.workflowOperationLatency.M(elapsed))
}
}

// WorkflowExecutionEvent records total number of Successful/Failed/Recoverable workflow executions.
// Execution latency for workflow is not supported yet.
func (w *workflowMetrics) WorkflowExecutionEvent(ctx context.Context, workflowName, status string) {
if !w.IsEnabled() {
return
}

stats.RecordWithTags(ctx, diagUtils.WithTags(w.workflowExecutionCount.Name(), appIDKey, w.appID, namespaceKey, w.namespace, workflowNameKey, workflowName, statusKey, status), w.workflowExecutionCount.M(1))
}

// ActivityExecutionEvent records total number of Successful/Failed/Recoverable workflow executions. It also records latency for these executions.
func (w *workflowMetrics) ActivityExecutionEvent(ctx context.Context, activityName, status string, elapsed float64) {
if !w.IsEnabled() {
return
}

stats.RecordWithTags(ctx, diagUtils.WithTags(w.activityExecutionCount.Name(), appIDKey, w.appID, namespaceKey, w.namespace, activityNameKey, activityName, statusKey, status), w.activityExecutionCount.M(1))

if elapsed > 0 {
stats.RecordWithTags(ctx, diagUtils.WithTags(w.activityExecutionLatency.Name(), appIDKey, w.appID, namespaceKey, w.namespace, activityNameKey, activityName, statusKey, status), w.activityExecutionLatency.M(elapsed))
}
}