
Google Cloud Storage output plugin #15190

Open
stijn-vanbael-enprove opened this issue Apr 19, 2024 · 3 comments
Labels
feature request: Requests for new plugin and for new features to existing plugins
help wanted: Request for community participation, code, contribution
size/l: 1 week or more effort

Comments

@stijn-vanbael-enprove

Use Case

I want to store incoming metrics directly into the cheapest form of storage we have, which on Google Cloud is Google Cloud Storage.

Example usage:

[[outputs.google_cloud_storage]]
  bucket = "my-bucket"
  data_format = "influx"
  credentials_file = "path/to/my/creds.json"
  metrics_per_object = 1
  group_by = "day"
  object_suffix = ".line"

Expected behavior

The configuration above will result in one object per metric in the bucket "my-bucket", with the following object name:
<measurement>/<date>/<timestamp>.line

Actual behavior

There is no support for outputting to Google Cloud Storage yet.

Additional info

No response

@stijn-vanbael-enprove stijn-vanbael-enprove added the feature request Requests for new plugin and for new features to existing plugins label Apr 19, 2024
@powersj
Contributor

powersj commented Apr 19, 2024

Hi,

Some questions around the proposal:

Have you looked into how to manage credentials?

bucket = "my-bucket"

Would telegraf create the bucket or would we assume the user has created it?

metrics_per_object = 1

If you have 20 objects, would you then write 20 files at every interval? Likewise, if you have 10,000 metrics, 10,000 files? Rather than dividing per metric, shouldn't a plugin respect the batch-format serializer setting instead?

<measurement>/<date>/<timestamp>.line
group_by = "day"

What are you assuming date would look like? 2005-01-02? Are you assuming telegraf would create and manage different folders and auto-create new ones? How does that relate to the group by?

Are you planning to submit a PR?

@powersj powersj added the waiting for response waiting for response from contributor label Apr 19, 2024
@stijn-vanbael-enprove
Author

Have you looked into how to manage credentials?

I assumed credentials would work in the same way as they do for the google_cloud_storage input plugin.

Would telegraf create the bucket or would we assume the user has created it?

Creating the bucket is not a hard requirement for me, but it would be nice if Telegraf could take care of it.

If you have 20 objects, would you then write 20 files at every interval? Likewise, if you have 10,000 metrics, 10,000 files? Rather than dividing per metric, shouldn't a plugin respect the batch-format serializer setting instead?

Right, this is indeed better handled by the serializer.

What are you assuming date would look like? 2005-01-02? Are you assuming telegraf would create and manage different folders and auto-create new ones? How does that relate to the group by?

2005-01-02 would be a good format, but maybe it's better to have it configurable. Google Cloud Storage doesn't actually have folders. It just groups files for you in a folder-like structure when you use slashes in the object name.

Are you planning to submit a PR?

I'm afraid not.

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Apr 19, 2024
@powersj
Contributor

powersj commented Apr 19, 2024

2005-01-02 would be a good format, but maybe it's better to have it configurable. Google Cloud Storage doesn't actually have folders. It just groups files for you in a folder-like structure when you use slashes in the object name.

Right, however, even in your original request you started giving the objects a path, so I assume others would ask for the same. We could do something similar to what we do in opensearch, where the index name takes a Golang template.

What I am thinking of, then, is a config like this:

[[outputs.google_cloud_storage]]
  ## Bucket
  ## Name of Cloud Storage bucket to send metrics to.
  bucket = ""

  ## Object name
  ## Target object name for metrics. This is a Golang template (see
  ## https://pkg.go.dev/text/template). You can also specify metric name
  ## (`{{.Name}}`), tag value (`{{.Tag "tag_name"}}`), field value
  ## (`{{.Field "field_name"}}`), or timestamp (`{{.Time.Format "xxxxxxxxx"}}`).
  ## If the tag does not exist, the default tag value will be empty string "".
  ##
  ## For example: "telegraf-{{.Time.Format \"2006-01-02\"}}-{{.Tag \"host\"}}" 
  ## would set it to `telegraf-2023-07-27-HostName`
  object_name = ""

  ## Data format to output
  ## Each data format has its own unique set of configuration options, read
  ## more about them here:
  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_OUTPUT.md
  # data_format = "influx"

  ## Credentials file
  ## Optional. File path for GCP credentials JSON file to authorize calls to
  ## Google Cloud Storage APIs. If not set explicitly, Telegraf will attempt to use
  ## Application Default Credentials, which is preferred.
  # credentials_file = "path/to/my/creds.json"

@powersj powersj added help wanted Request for community participation, code, contribution size/l 1 week or more effort labels Apr 19, 2024