[bug]: run k8sgpt analyze will only complete successfully once , after trivy integration is active #1063

Open
3 of 4 tasks
liyuerich opened this issue Apr 13, 2024 · 5 comments

Comments

@liyuerich

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've included steps to reproduce the behavior

Affected Components

  • K8sGPT (CLI)
  • K8sGPT Operator

K8sGPT Version

0.3.29 (5db4bc2)

Kubernetes Version

v1.27.5

Host OS and its Version

Ubuntu Linux controller-node-1 5.4.0-174-generic

Steps to reproduce

  1. First I activated the trivy integration and ran k8sgpt analyze, which completed successfully.
  2. Then I ran k8sgpt analyze again and got an error message.
  3. After deactivating trivy, I ran k8sgpt analyze again and it completed successfully.

error message:
k8sgpt analyze --explain
fatal error: concurrent map writes
goroutine 16 [running]:
k8s.io/apimachinery/pkg/runtime.(*Scheme).AddKnownTypeWithName(0xc00014b3b0, {{0x2be7d26, 0x16}, {0x2bc5c63, 0x8}, {0x241e398, 0x15}}, {0x345f568?, 0xc0009b4310})
/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.28.4/pkg/runtime/scheme.go:181 +0x345
k8s.io/apimachinery/pkg/runtime.(*Scheme).AddKnownTypes(0xc00014b3b0, {{0x2be7d26?, 0x0?}, {0x2bc5c63?, 0x0?}}, {0xc000854620?, 0x16, 0x0?})
/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.28.4/pkg/runtime/scheme.go:148 +0x176
github.com/aquasecurity/trivy-operator/pkg/apis/aquasecurity/v1alpha1.addKnownTypes(0xc0008547b8?)
/home/runner/go/pkg/mod/github.com/aquasecurity/trivy-operator@v0.17.1/pkg/apis/aquasecurity/v1alpha1/register.go:22 +0x4b7
k8s.io/apimachinery/pkg/runtime.(*SchemeBuilder).AddToScheme(...)
/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.28.4/pkg/runtime/scheme_builder.go:29
github.com/k8sgpt-ai/k8sgpt/pkg/integration/trivy.TrivyAnalyzer.analyzeConfigAuditReports({0x0?, 0x0?}, {0xc000c0a1e0, {0x34736e0, 0x4d66ac0}, {0x0, 0x0}, {0x3473750, 0x4d21c00}, 0x0, ...})
/home/runner/work/k8sgpt/k8sgpt/pkg/integration/trivy/analyzer.go:92 +0x6e
github.com/k8sgpt-ai/k8sgpt/pkg/integration/trivy.TrivyAnalyzer.Analyze({0x0?, 0x0?}, {0xc000c0a1e0, {0x34736e0, 0x4d66ac0}, {0x0, 0x0}, {0x3473750, 0x4d21c00}, 0x0, ...})
/home/runner/work/k8sgpt/k8sgpt/pkg/integration/trivy/analyzer.go:162 +0x58
github.com/k8sgpt-ai/k8sgpt/pkg/analysis.(*Analysis).RunAnalysis.func3({0x3446300?, 0xc0005c463c?}, {0xc0005552c0, 0x11})
/home/runner/work/k8sgpt/k8sgpt/pkg/analysis/analysis.go:268 +0xd9
created by github.com/k8sgpt-ai/k8sgpt/pkg/analysis.(*Analysis).RunAnalysis in goroutine 1
/home/runner/work/k8sgpt/k8sgpt/pkg/analysis/analysis.go:266 +0x685

Expected behaviour

Running k8sgpt analyze should complete successfully on every run.

Actual behaviour

The second run failed with the fatal error shown above.

Additional Information

No response

@liyuerich liyuerich changed the title [bug]: run k8sgpt analyze will only complete successfully once , after active trivy integration [bug]: run k8sgpt analyze will only complete successfully once , after trivy integration is active Apr 13, 2024
@VaibhavMalik4187
Contributor

Concurrent map writes indicate that this is a synchronization problem. I'll take a look. Thanks for reporting @liyuerich

@VaibhavMalik4187
Contributor

Small update: I tried to reproduce the issue with the steps mentioned above. Unfortunately, I couldn't replicate it on Ubuntu 23.10, K8sGPT version: master.

@xiormeesh

I'm also getting intermittent "concurrent map writes" errors on 0.3.29, but I don't have the trivy integration enabled. It seems to happen when the system is under load, but even then I can't reproduce it reliably; just rerunning the command usually produces the expected output.

It has happened twice. Both times I was running a cluster-wide analysis (not limited to specific namespaces, with all filters enabled, including Log, which slows down the analysis significantly), and the Kubernetes API server was also quite busy with other queries (the first time I was installing several operators in parallel; the second time another scanning tool was querying it as well). Both times, rerunning exactly the same command right after the failure succeeded.

k8sgpt version: 0.3.29
k8s version: v1.28.7 installed via brew
running on: Ubuntu 22.04.4 LTS, kernel 5.14.0-1054-oem

CLI commands and output (first time it failed, didn't save the log from the second one):
k8sgpt_analyze_concurrent_map_writes.log

@xiormeesh

It happened again today while working with another cluster (same k8sgpt CLI installation). Everything was working fine until I removed the Log filter and ran analyze again:

#k8sgpt filters remove Log
Filter(s) Log removed
#k8sgpt analyze
fatal error: concurrent map writes
fatal error: concurrent map writes

goroutine 32 [running]:
k8s.io/apimachinery/pkg/runtime.(*Scheme).AddKnownTypeWithName(0xc000268d20, {{0x40979af, 0x19}, {0x40590bf, 0x2}, {0x3770c97, 0x7}}, {0x4a541d8, 0xc00094e680})
k8s.io/apimachinery@v0.28.4/pkg/runtime/scheme.go:174 +0x270
k8s.io/apimachinery/pkg/runtime.(*Scheme).AddKnownTypes(0xc000268d20, {{0x40979af?, 0x0?}, {0x40590bf?, 0x0?}}, {0xc000bfa7f8?, 0x6?, 0xc0002bd4d0?})
k8s.io/apimachinery@v0.28.4/pkg/runtime/scheme.go:148 +0x165
sigs.k8s.io/gateway-api/apis/v1.addKnownTypes(0xc000268d20)
sigs.k8s.io/gateway-api@v1.0.0/apis/v1/zz_generated.register.go:60 +0x186
k8s.io/apimachinery/pkg/runtime.(*SchemeBuilder).AddToScheme(...)
k8s.io/apimachinery@v0.28.4/pkg/runtime/scheme_builder.go:29
github.com/k8sgpt-ai/k8sgpt/pkg/analyzer.GatewayClassAnalyzer.Analyze({}, {0xc000938ea0, {0x4a6c5b8, 0x65a8dc0}, {0x0, 0x0}, {0x0, 0x0}, 0x0, {0x0, ...}, ...})
github.com/k8sgpt-ai/k8sgpt/pkg/analyzer/gatewayclass.go:38 +0x11f
github.com/k8sgpt-ai/k8sgpt/pkg/analysis.(*Analysis).RunAnalysis.func3({0x4a3a9c0?, 0x65a8dc0?}, {0xc000904380, 0xc})
github.com/k8sgpt-ai/k8sgpt/pkg/analysis/analysis.go:268 +0xd9
created by github.com/k8sgpt-ai/k8sgpt/pkg/analysis.(*Analysis).RunAnalysis in goroutine 1
github.com/k8sgpt-ai/k8sgpt/pkg/analysis/analysis.go:266 +0x65e

Now k8sgpt analyze fails even if I re-enable the Log filter, so it is 100% reproducible, but I still have no idea how to trigger it on purpose, because I've enabled and disabled the Log filter before without issue. I'm going to wait until tomorrow and see whether reinstalling k8sgpt fixes it (I'll need it for a demo tomorrow).

@chaunceyt

Hi, I'm adding support for external-secrets via integrations and see this issue when running go run . analyze. However, if I run go run . analyze --filter SecretStore I get the expected output.

go run . integrations list
Active:
> externalsecrets
Unused:
> trivy
> prometheus
> aws
go run . filters list
Active:
> ClusterExternalSecret (integration)
> ClusterSecretStore (integration)
> Deployment
> Ingress
> SecretStore (integration)
> MutatingWebhookConfiguration
> ExternalSecrets
> Node
> Pod
> StatefulSet
> ValidatingWebhookConfiguration
> PersistentVolumeClaim
> ExternalSecret (integration)
> ReplicaSet
> PushSecret (integration)
> HorizontalPodAutoScaler
> Service
> CronJob
Unused:
> GatewayClass
> Gateway
> HTTPRoute
> PodDisruptionBudget
> NetworkPolicy
> Log

I attributed it to the number of analyzers I introduced. Each of them requires an AddToScheme call:

	err := v1alpha1.AddToScheme(client.Scheme())
	if err != nil {
		return nil, err
	}

Things seemed to get better when I switched to the following:

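	// Must be declared once at package level so that all analyzers share it;
	// a mutex created inside the function would not synchronize anything.
	var mutex = &sync.RWMutex{}

	mutex.Lock()
	err := v1alpha1.AddToScheme(client.Scheme())
	mutex.Unlock() // release before the error check so an early return cannot leave the lock held
	if err != nil {
		return nil, err
	}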

Seeing the reference to Trivy made me wonder whether the issue is related to the way integrations load and execute an integration.

OS: Darwin 13.6.6
Branch: main
Kind cluster: v1.29.2

connecurrent-map-writes.txt
