Extend kube_inventory plugin to support Custom Resource metrics #14917

Closed
Shubhama19 opened this issue Mar 1, 2024 · 21 comments
Labels: feature request (Requests for new plugin and for new features to existing plugins), waiting for response (waiting for response from contributor)

Comments

@Shubhama19

Use Case

This feature would allow users to configure Telegraf to fetch state metrics for custom resources. Users would provide a list of CRDs and the gauges that need to be monitored by Telegraf.
This could be provided either in the Telegraf config or as input flags in the Telegraf deployment YAML.
This is similar to how kube-state-metrics supports monitoring of custom resources:
https://github.com/kubernetes/kube-state-metrics/blob/main/docs/customresourcestate-metrics.md

Expected behavior

Users should be able to monitor Custom resource State metrics.

Actual behavior

Currently there is no way to retrieve metrics for resources other than the ones supported out of the box. This feature would provide extensibility.

Additional info

No response

@Shubhama19 Shubhama19 added the feature request Requests for new plugin and for new features to existing plugins label Mar 1, 2024
@powersj
Contributor

powersj commented Mar 1, 2024

Hi,

Assume I don't know much about this plugin and how it is used ;)

> Users should provide a list of CRDs and the gauges that need to be monitored by telegraf.

Looking briefly at the plugin it has a list of collectors, where it goes through and pulls metrics from pods, nodes, secrets, etc. Are you wanting to extend this list to allow the user to collect from some other resource?

Which resource is missing from the list that you would want captured?

Thanks

@powersj powersj added the waiting for response waiting for response from contributor label Mar 1, 2024
@Shubhama19
Author

Shubhama19 commented Mar 1, 2024

Hey @powersj,
The default list may not cover all the resources deployed by the user. As we know, Kubernetes provides a way to create and deploy custom controllers through CRDs.
Currently Telegraf does not provide a way to monitor those resources, while kube-state-metrics can be extended to monitor custom resources.
Is it possible to have something similar here as well, where we could configure Telegraf to monitor user-defined custom resources for specific gauges?

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Mar 1, 2024
@powersj
Contributor

powersj commented Mar 4, 2024

> The default list may not be able to provide metrics for all the resources deployed by the user

As I asked above, can you give me an example?

> As we know, kubernetes provides a way to create/deploy custom controllers through CRDs.

I am not a K8s expert, so I don't know this :) Can you explain what an example would be please?

@powersj powersj added the waiting for response waiting for response from contributor label Mar 4, 2024
@Shubhama19
Author

Ohh ok @powersj,
Just as we have Deployments, DaemonSets, StatefulSets, etc., users can define their own K8s resources using Custom Resource Definitions.
They are deployed in the same way as a Deployment or DaemonSet.
So, since the list of resources monitored by Telegraf is fixed and there is no way to extend it through the Telegraf config or the deployment config, we cannot monitor metrics for those resources on our clusters.
You can refer to this -> https://github.com/kubernetes/kube-state-metrics/blob/main/docs/customresourcestate-metrics.md
This is how kube-state-metrics allows users to define the custom resources that need to be monitored.

So can we have a similar feature that would allow us to monitor custom resources, as well as any other resources that are not in the default list monitored by Telegraf?

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Mar 5, 2024
@powersj
Contributor

powersj commented Mar 5, 2024

Thanks for the pointers and additional background.

I spent some time looking to see if or how the kubernetes/client-go can talk to CRDs to collect information about them. I saw some talk about talking directly to them, but others talk about building custom REST clients. I looked through this blog post where the user had to define their own REST client + pass the custom resource and a RedHat post on CRDs.

With my limited understanding, it doesn't seem practical to create a generic CRD client in Telegraf. Given the need to pass a CRD's API this certainly could vary from CRD to CRD. It may make more sense for a user of Telegraf to create their own client that pulls down the data from whatever CRD API they need and return metrics via the exec input. That way a user can continue to use telegraf to collect other metrics, and pull down metrics from their custom CRD no matter the API as well.

Thoughts?
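
The exec-input route suggested above could look roughly like the following: a small program prints metrics in InfluxDB line protocol on stdout, and Telegraf's exec input runs it with `data_format = "influx"`. This is only a sketch. The measurement name `custom_resource`, the tag names, and the `lineProtocol` helper are made up for illustration, and the values are placeholders rather than data from a real cluster.

```go
package main

import (
	"fmt"
	"time"
)

// lineProtocol formats one metric in InfluxDB line protocol:
// measurement,tag=value field=value timestamp
// It assumes the kind and namespace need no escaping (no spaces, commas,
// or equals signs), which keeps the sketch short.
func lineProtocol(kind, namespace string, count int, unixNano int64) string {
	// The trailing "i" marks the field as an integer.
	return fmt.Sprintf("custom_resource,kind=%s,namespace=%s count=%di %d",
		kind, namespace, count, unixNano)
}

func main() {
	// A real program would query the custom resource API here; the
	// values below are placeholders.
	fmt.Println(lineProtocol("Foo", "default", 3, time.Now().UnixNano()))
}
```

With this approach the CRD-specific logic stays outside Telegraf, at the cost of maintaining a separate binary or script per cluster.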

@powersj powersj added the waiting for response waiting for response from contributor label Mar 5, 2024
@Shubhama19
Author

Yes, I also went through some of the links, and using the REST client API makes the most sense to me. It would look something like this: c.RESTClient().Get().AbsPath("/apis/<api>/<version>/namespaces/<namespace>/<resource-kind>/<resource-name>").DoRaw(ctx)

I was thinking the user could provide, as part of the config, a list of CRDs that need to be monitored, and we could have a function like getCRDs() similar to what we have for other k8s objects.

The struct would look something like this:

// CRDs identifies one custom resource type to monitor.
type CRDs struct {
	GroupName    string // API group of the custom resource
	GroupVersion string // API version within that group
}

The plugin would hold a slice of this struct, populated from the CRDs listed in the config. getCRDs() would iterate over that slice and perform a GET call for each entry using the REST client call above.

Does this sound feasible ?
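
The config list and REST path proposed above can be sketched with the standard library alone. The `CustomResource` type, its field names, and `ListPath` are hypothetical and are not options of the kube_inventory plugin. One detail worth noting: Kubernetes API paths use the lower-case plural resource name (for example `foos`), not the Kind, so the sketch carries a `Resource` field for that.

```go
package main

import (
	"fmt"
	"path"
)

// CustomResource is a hypothetical config entry naming one custom resource
// type to monitor; the field names are illustrative only.
type CustomResource struct {
	Group     string // API group, e.g. "mytest.io"
	Version   string // API version, e.g. "v1"
	Resource  string // lower-case plural resource name, e.g. "foos"
	Namespace string // empty for all namespaces or a cluster-scoped resource
}

// ListPath returns the API server path that lists objects of this type.
func (c CustomResource) ListPath() string {
	if c.Namespace == "" {
		return path.Join("/apis", c.Group, c.Version, c.Resource)
	}
	return path.Join("/apis", c.Group, c.Version, "namespaces", c.Namespace, c.Resource)
}

func main() {
	resources := []CustomResource{
		{Group: "mytest.io", Version: "v1", Resource: "foos", Namespace: "default"},
		{Group: "mytest.io", Version: "v1", Resource: "foos"},
	}
	for _, r := range resources {
		fmt.Println(r.ListPath())
	}
}
```

A path built this way could be passed to AbsPath in the REST call shown earlier in this comment.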

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Mar 6, 2024
@powersj
Contributor

powersj commented Mar 6, 2024

> I was thinking, the user can provide a list of CRDs as part of config that needs to be monitored, and we can have a function like getCRDs() similar to what we have for other k8s objects.

What does the return value of the REST call look like?

Given you could have many different CRDs that I assume could produce different data, how would telegraf know how to transform that result into metrics?

@powersj powersj added the waiting for response waiting for response from contributor label Mar 6, 2024
@Shubhama19
Author

I think we would want to scrape fairly generic metrics that are common across all CRDs. One example is the active count of a resource: say we have created a CRD with Kind Foo in apiGroup mytest.io with version v1.
We should then be able to retrieve how many active objects of Kind Foo exist.
Then we can have other metrics like Creation_Time, Up_Time, Status, etc.
These are very generic metrics and I think they should be common across all CRDs.

Maybe adding support for custom metrics specific to a CRD could be something for the future, but this is a good starting point.

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Mar 7, 2024
@powersj
Contributor

powersj commented Mar 7, 2024

> I think we would want to scrape pretty generic metrics which should be common across all CRDs

I asked for an example of the response, do you have one to share?

> These are very generic metrics and i think should be common across all CRDs.

It was my understanding that it is up to the creator of the CRD what this endpoint produces. Is that correct? If so, how would we pick "generic metrics"?

@powersj powersj added the waiting for response waiting for response from contributor label Mar 7, 2024
@Shubhama19
Author

Shubhama19 commented Mar 7, 2024

A CRD basically lets us define a custom resource with its own set of specifications; we are not concerned with that part.
What we want to monitor is the k8s object created from that definition. That object follows the standard k8s object structure and is served at standard paths.
The fields I mentioned above (creation time, up time, etc.) are all added by k8s when the object is created, so we don't need anything extra to fetch them.

K8s CRD and Custom resource doc -> https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/
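
To make the point about standard fields concrete, the sketch below decodes a list response and reads only the metadata that every Kubernetes object carries (name, namespace, creationTimestamp), which is enough for a count and a creation-time metric. The `sample` JSON is hand-written for the Foo example from this thread and is an assumption, not output captured from a real cluster. `objectList` and `parseList` are illustrative names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// objectList mirrors only the fields of a list response that every object
// carries, regardless of which CRD defines it.
type objectList struct {
	Kind  string `json:"kind"`
	Items []struct {
		Metadata struct {
			Name              string    `json:"name"`
			Namespace         string    `json:"namespace"`
			CreationTimestamp time.Time `json:"creationTimestamp"`
		} `json:"metadata"`
	} `json:"items"`
}

// sample is a hypothetical, hand-written response; it is not real cluster output.
const sample = `{
  "apiVersion": "mytest.io/v1",
  "kind": "FooList",
  "items": [
    {"apiVersion": "mytest.io/v1", "kind": "Foo",
     "metadata": {"name": "a", "namespace": "default", "creationTimestamp": "2024-03-01T10:00:00Z"}},
    {"apiVersion": "mytest.io/v1", "kind": "Foo",
     "metadata": {"name": "b", "namespace": "default", "creationTimestamp": "2024-03-02T10:00:00Z"}}
  ]
}`

// parseList decodes a raw list response into objectList.
func parseList(raw []byte) (objectList, error) {
	var l objectList
	err := json.Unmarshal(raw, &l)
	return l, err
}

func main() {
	list, err := parseList([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s count=%d\n", list.Kind, len(list.Items))
	for _, it := range list.Items {
		fmt.Printf("%s/%s created=%d\n", it.Metadata.Namespace, it.Metadata.Name,
			it.Metadata.CreationTimestamp.Unix())
	}
}
```

Anything under spec or status is defined by the CRD author, so it would still need per-CRD configuration; this sketch deliberately ignores those fields.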

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Mar 7, 2024
@powersj
Contributor

powersj commented Mar 7, 2024

Are you planning to put up a PR? Otherwise I need your help to provide some answers to my questions. I have asked multiple times now to understand what we would need to do for development. If you are not going to provide one, then let's close this.

The link you provided tells me nothing that I can see about getting metrics from a CRD. It is all about creating one.

@powersj powersj added the waiting for response waiting for response from contributor label Mar 7, 2024
@Shubhama19
Author

I have literally provided all the information you have asked for. I don't understand what else you need.
Have a config from which Telegraf reads the list of custom resources it needs to monitor, with each entry giving the kind/group/apiVersion, which is all you need to fetch the resource using the REST client.
Once you have the resource, it is like any other k8s object, similar to a Deployment/DaemonSet/ReplicaSet, etc.

I am not sure what else you need.

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Mar 7, 2024
@powersj
Contributor

powersj commented Mar 7, 2024

We are using the k8s client library as it can help manage authentication and calls for us. The existing client in use does not seem to have a way to talk directly to CRDs or get information about them. Have I misunderstood that?

You have made the leap to just calling API endpoints, but that ignores the authentication and other tooling that come with using the k8s client. While there is some use of a generic HTTP client for kubelets, I'd like to avoid extending that usage.

I have asked you twice for an example response from that API. I want to see the full API response so we can get an idea of the metrics that Telegraf would create.

@powersj powersj added the waiting for response waiting for response from contributor label Mar 7, 2024
@Shubhama19
Author

Shubhama19 commented Mar 7, 2024

> We are using the k8s client library as it can help manage authentication and calls for us. The existing client in use does not seem to have a way to talk directly to CRDs or get information about them. Have I misunderstood that?

AFAIU, the existing client should be able to query and fetch the custom resource. The client implements the REST client interface, and that can be reused. Since it is the same client, I don't think we should face any auth issues (this is my assumption).

I am assuming we can fetch the resource using the call I provided earlier and store the data in an unstructured object:

// ctx is a context.Context; unstructured is
// k8s.io/apimachinery/pkg/apis/meta/v1/unstructured.
raw, err := c.RESTClient().Get().
    AbsPath("/apis/<api>/<version>/namespaces/<namespace>/<resource-kind>/<resource-name>").
    DoRaw(ctx)
if err != nil {
    return err
}
// The response JSON carries apiVersion and kind, so decoding sets them.
data := &unstructured.Unstructured{}
if err := data.UnmarshalJSON(raw); err != nil {
    return err
}

Let me look further into it, will get back and see if it is possible or not.
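
For readers unfamiliar with unstructured objects: they are essentially a decoded JSON map, and fields are read by walking a path of keys. The standard-library sketch below shows that idea without pulling in client-go. `nestedString` is a hypothetical stand-in, loosely modelled on the `unstructured.NestedString` helper in apimachinery, and the `status.phase` field in the sample is an assumption, since each CRD author decides what status contains.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nestedString walks a decoded JSON object along the given field names and
// returns the string found there. It reports false if any step is missing
// or the final value is not a string.
func nestedString(obj map[string]interface{}, fields ...string) (string, bool) {
	var cur interface{} = obj
	for _, f := range fields {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return "", false
		}
		cur, ok = m[f]
		if !ok {
			return "", false
		}
	}
	s, ok := cur.(string)
	return s, ok
}

func main() {
	// Hypothetical Foo object, written by hand for illustration.
	raw := []byte(`{"kind":"Foo","metadata":{"name":"a"},"status":{"phase":"Ready"}}`)
	var obj map[string]interface{}
	if err := json.Unmarshal(raw, &obj); err != nil {
		panic(err)
	}
	name, _ := nestedString(obj, "metadata", "name")
	phase, ok := nestedString(obj, "status", "phase")
	fmt.Println(name, phase, ok)
}
```

Returning a boolean instead of failing lets a collector skip objects whose CRD does not define the requested field.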

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Mar 7, 2024
@Shubhama19
Author

@powersj I have added the changes to my fork -> d8b0df4

I have tested these changes and they work. What do you think?
This is just an initial draft; I will build on it.

@powersj
Contributor

powersj commented Mar 18, 2024

@Shubhama19,

Nice find with the dynamic client. What do you consider to be the next steps for your branch? Happy to see a PR!

@Shubhama19
Author

@powersj can you assign the issue to me ?
I would be happy to raise a PR if the approach seems fine to you.

@powersj
Contributor

powersj commented Mar 18, 2024

> @powersj can you assign the issue to me ? I would be happy to raise a PR if the approach seems fine to you.

So far it does. What would be the remaining work? I assume something around parsing the metrics?

@Shubhama19
Author

@powersj yes, we need to add more metrics. Currently I have only tested it for collecting the created-time metric.
I also need to add unit tests.

@powersj
Contributor

powersj commented May 10, 2024

@Shubhama19 - were you able to get something in a state where you wanted to put up a PR?

@powersj powersj added the waiting for response waiting for response from contributor label May 10, 2024
@telegraf-tiger
Contributor

Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem. If not, please try posting this question in our Community Slack or Community Forums, or provide additional details in this issue and request that it be re-opened. Thank you!
