CloudWatch Collector LogsTransform Multi-Line Skipping Logs #3467

Open
chrism-teal opened this issue Dec 29, 2023 · 0 comments
Labels
bug Something isn't working

Comments

chrism-teal commented Dec 29, 2023

I am trying to use the otelcloudwatch collector to poll logs from a CloudWatch log group. I am running into an issue where events are skipped because the logstransform/cloudwatch processor swallows and discards certain messages, apparently treating them as parts of multi-line messages.

Reproduce

My configuration for the Sumo Logic Helm chart (config.yaml):

          logs:
            collector:
              otelcloudwatch:
                enabled: true
                logGroups:
                  /aws/lambda/test-logs:
                    prefixes:
                      - 20
                    names: []
                persistence:
                  enabled: false
                region: us-east-1

You can then use this bash script to send test events to the log group (authenticate with aws-cli however you prefer first):

#!/bin/bash
# Send one test event to the log group every 30 seconds.
# The log stream must already exist; single quotes keep [$LATEST] literal.
log_stream_name='2023/11/06/[$LATEST]65f6fa16e86a4d45a51c767725f95c6f'
log_group_name='/aws/lambda/test-logs'
while true; do
  # CloudWatch expects the event timestamp in milliseconds since the epoch.
  new_event_time="$(date +%s)000"
  echo "${new_event_time}"
  aws logs put-log-events --log-group-name "${log_group_name}" --log-stream-name "${log_stream_name}" --log-events "timestamp=${new_event_time},message=test log event@${new_event_time}" --region us-east-1 --no-cli-pager
  sleep 30
done

If I use this configuration and run the collector, I get the following error (it happens even if I change the log message to a JSON blob):

2023-12-29T13:44:52.571-0600	error	recombine/recombine.go:299	entry does not contain the combine_field	{"kind": "processor", "name": "logstransform/cloudwatch", "pipeline": "logs/collector/otelcloudwatch", "operator_id": "merge-cri-lines", "operator_type": "recombine"}
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/transformer/recombine.(*Transformer).addToBatch
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.91.0/operator/transformer/recombine/recombine.go:299
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/transformer/recombine.(*Transformer).Process
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.91.0/operator/transformer/recombine/recombine.go:263
github.com/open-telemetry/opentelemetry-collector-contrib/processor/logstransformprocessor.(*logsTransformProcessor).converterLoop
	github.com/open-telemetry/opentelemetry-collector-contrib/processor/logstransformprocessor@v0.91.0/processor.go:143

When this happens, the collector does grab a log event and export it, but on the next iteration it skips the events in between and only picks up the most recent one. Given the nature of the error, I assume the processor is trying to grab other events and combine them in order to determine whether this is a multi-line log. What confuses me is that the logs are fetched from CloudWatch as JSON objects, so the entire log line (newlines and all) is already contained within a single JSON object; AWS does not return these events line by line. That leaves me wondering what the purpose of this multi-line logic is, or whether it is necessary at all.

At the moment the pipeline is hardcoded as:

    logs/collector/otelcloudwatch:
      receivers:
        - awscloudwatch
      processors:
        - transform/set_source_identifier
        - groupbyattrs/stream
        - transform/parsejson
        - logstransform/cloudwatch
        - transform/metadata
        - batch

Fixes

If I completely remove logstransform/cloudwatch from the pipeline, everything works as expected: no events are skipped and no errors are thrown, even with multi-line logs. I have tried several other solutions, such as setting on_error: send within the operator and reworking the pipeline to actually set the combine_field it is looking for, but even then it still skips events. So if I had to submit a patch, it would be:
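For readers unfamiliar with the two workarounds mentioned above, here is one guess at what they could look like. This is a sketch under assumptions, not the configuration that was actually tested: the processor name `transform/set_combine_field` is hypothetical, and copying `body` into `attributes["log"]` is only one possible way to populate the field the recombine operator reads. As reported above, neither approach stopped events from being skipped.

```yaml
processors:
  # Assumed workaround 1: populate the attribute that recombine reads.
  # "transform/set_combine_field" is a hypothetical name for illustration.
  transform/set_combine_field:
    log_statements:
      - context: log
        statements:
          - set(attributes["log"], body)
  logstransform/cloudwatch:
    operators:
      - id: merge-cri-lines
        type: recombine
        combine_field: attributes.log
        combine_with: ""
        is_last_entry: attributes.logtag == "F"
        overwrite_with: newest
        source_identifier: resource["cloudwatch.log.stream"]
        # Assumed workaround 2: pass failing entries on instead of dropping them.
        on_error: send
        output: merge-multiline-logs
```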

diff --git a/deploy/helm/sumologic/conf/logs/collector/otelcloudwatch/config.yaml b/deploy/helm/sumologic/conf/logs/collector/otelcloudwatch/config.yaml
index 85b9bbe4..f75c9ab3 100644
--- a/deploy/helm/sumologic/conf/logs/collector/otelcloudwatch/config.yaml
+++ b/deploy/helm/sumologic/conf/logs/collector/otelcloudwatch/config.yaml
@@ -51,22 +51,6 @@ processors:
           - replace_pattern(attributes["k8s.pod.name"], "^.*kube\\.var\\.log\\.containers\\.([0-9a-zA-Z\\-]+)\\_.*", "$$1")
           - replace_pattern(attributes["k8s.container.name"], "^.*kube\\.var\\.log\\.containers\\.[0-9a-zA-Z\\-]+\\_[a-zA-Z\\-]*\\_([a-zA-Z]*).*", "$$1")
           - replace_pattern(attributes["k8s.namespace.name"], "^.*kube\\.var\\.log\\.containers\\.[0-9a-zA-Z\\-]+\\_([a-zA-Z\\-]*)_.*", "$$1")
-  logstransform/cloudwatch:
-    operators:
-      - id: merge-cri-lines
-        combine_field: attributes.log
-        combine_with: ""
-        is_last_entry: attributes.logtag == "F"
-        output: "merge-multiline-logs"
-        overwrite_with: newest
-        source_identifier: resource["cloudwatch.log.stream"]
-        type: recombine
-      - id: merge-multiline-logs
-        combine_field: attributes.log
-        combine_with: "\n"
-        is_first_entry: attributes.log matches {{ .Values.sumologic.logs.multiline.first_line_regex | quote }}
-        source_identifier: resource["cloudwatch.log.stream"]
-        type: recombine
 receivers:
   awscloudwatch:
     region: {{ .Values.sumologic.logs.collector.otelcloudwatch.region }}
@@ -91,7 +75,6 @@ service:
         - transform/set_source_identifier
         - groupbyattrs/stream
         - transform/parsejson
-        - logstransform/cloudwatch
         - transform/metadata
         - batch
       exporters:

Even exposing this as a config value that I can pass to the Helm chart to tell it "do not attempt multi-line combining in the cloudwatch collector" would work as well. Thanks in advance, and hopefully I can get this cleared up.
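A minimal sketch of what such a toggle might look like, assuming a new values key named `multiline.enabled` under `otelcloudwatch`. That key is hypothetical and does not exist in the chart today; the chart's default values.yaml would also need to define it, since Helm fails on a lookup through a missing parent key.

```yaml
# values (hypothetical key: multiline.enabled)
sumologic:
  logs:
    collector:
      otelcloudwatch:
        enabled: true
        multiline:
          enabled: false

# template fragment for the otelcloudwatch pipeline (sketch only)
    logs/collector/otelcloudwatch:
      receivers:
        - awscloudwatch
      processors:
        - transform/set_source_identifier
        - groupbyattrs/stream
        - transform/parsejson
{{- if .Values.sumologic.logs.collector.otelcloudwatch.multiline.enabled }}
        - logstransform/cloudwatch
{{- end }}
        - transform/metadata
        - batch
```

With the flag set to false, the rendered pipeline would match the patch above, while the default could keep the current behavior for anyone relying on multi-line merging.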

chrism-teal added the bug label Dec 29, 2023