
Google Cloud Storage output plugin #15190

Open
stijn-vanbael-enprove opened this issue Apr 19, 2024 · 3 comments
Labels
feature request: Requests for new plugin and for new features to existing plugins
help wanted: Request for community participation, code, contribution
size/l: 1 week or more effort

Comments

@stijn-vanbael-enprove

Use Case

I want to store incoming metrics directly into the cheapest form of storage we have, which on Google Cloud is Google Cloud Storage.

Example usage:

[[outputs.google_cloud_storage]]
  bucket = "my-bucket"
  data_format = "influx"
  credentials_file = "path/to/my/creds.json"
  metrics_per_object = 1
  group_by = "day"
  object_suffix = ".line"

Expected behavior

The configuration above will result in one object per metric in the bucket "my-bucket", with the following object name:
<measurement>/<date>/<timestamp>.line

Actual behavior

There is no support for outputting to Google Cloud Storage yet.

Additional info

No response

@stijn-vanbael-enprove stijn-vanbael-enprove added the feature request Requests for new plugin and for new features to existing plugins label Apr 19, 2024
@powersj
Contributor

powersj commented Apr 19, 2024

Hi,

Some questions around the proposal:

Have you looked into how to manage credentials?

bucket = "my-bucket"

Would telegraf create the bucket or would we assume the user has created it?

metrics_per_object = 1

If you have 20 objects, would you then write 20 files at every interval? Likewise, if you have 10,000 metrics, 10,000 files? Rather than dividing per metric, shouldn't a plugin respect the batch-format serializer setting instead?

<measurement>/<date>/<timestamp>.line
group_by = "day"

What are you assuming date would look like? 2005-01-02? Are you assuming telegraf would create and manage different folders and auto-create new ones? How does that relate to the group by?

Are you planning to submit a PR?

@powersj powersj added the waiting for response waiting for response from contributor label Apr 19, 2024
@stijn-vanbael-enprove
Author

Have you looked into how to manage credentials?

I assumed credentials would work in the same way as they do for the google_cloud_storage input plugin.

Would telegraf create the bucket or would we assume the user has created it?

Creating the bucket is not a hard requirement for me, but it would be nice if Telegraf could take care of it.

If you have 20 objects, would you then write 20 files at every interval? Likewise, if you have 10,000 metrics, 10,000 files? Rather than dividing per metric, shouldn't a plugin respect the batch-format serializer setting instead?

Right, this is indeed better handled by the serializer.

What are you assuming date would look like? 2005-01-02? Are you assuming telegraf would create and manage different folders and auto-create new ones? How does that relate to the group by?

2005-01-02 would be a good format, but maybe it's better to have it configurable. Google Cloud Storage doesn't actually have folders. It just groups files for you in a folder-like structure when you use slashes in the object name.

Are you planning to submit a PR?

I'm afraid not.

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Apr 19, 2024
@powersj
Contributor

powersj commented Apr 19, 2024

2005-01-02 would be a good format, but maybe it's better to have it configurable. Google Cloud Storage doesn't actually have folders. It just groups files for you in a folder-like structure when you use slashes in the object name.

Right, however, even in your original request you started giving the objects a path, so I assume others would ask for the same. We could do something similar to what we do in opensearch, where the index name takes a Golang template.

What I am thinking of, then, is a config like this:

[[outputs.google_cloud_storage]]
  ## Bucket
  ## Name of Cloud Storage bucket to send metrics to.
  bucket = ""

  ## Object name
  ## Target object name for metrics. This is a Golang template (see
  ## https://pkg.go.dev/text/template). You can also specify metric name
  ## (`{{.Name}}`), tag value (`{{.Tag "tag_name"}}`), field value
  ## (`{{.Field "field_name"}}`), or timestamp (`{{.Time.Format "xxxxxxxxx"}}`).
  ## If the tag does not exist, the default tag value will be empty string "".
  ##
  ## For example: "telegraf-{{.Time.Format \"2006-01-02\"}}-{{.Tag \"host\"}}" 
  ## would set it to `telegraf-2023-07-27-HostName`
  object_name = ""

  ## Data format to output
  ## Each data format has its own unique set of configuration options, read
  ## more about them here:
  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_OUTPUT.md
  # data_format = "influx"

  ## Credentials file
  ## Optional. File path for GCP credentials JSON file to authorize calls to
  ## Google Cloud Storage APIs. If not set explicitly, Telegraf will attempt to use
  ## Application Default Credentials, which is preferred.
  # credentials_file = "path/to/my/creds.json"

@powersj powersj added help wanted Request for community participation, code, contribution size/l 1 week or more effort labels Apr 19, 2024