
appliance: deploy symbols service #62270

Merged 1 commit into main from appliance-deploy-symbols-service on Apr 30, 2024

Conversation

@craigfurman (Member) commented on Apr 30, 2024

Closes https://github.com/sourcegraph/sourcegraph-operator/issues/45

Needs to be rebased on another PR, but it's independent of that branch, so should be reviewable.

Test plan

See golden fixtures

cla-bot added the cla-signed label on Apr 30, 2024
@craigfurman force-pushed the appliance-deploy-symbols-service branch from 00efabd to e1e92ec on April 30, 2024 11:08
@craigfurman (Member, Author) commented:

This PR may come with a golden fixture, but how on earth is the reviewer meant to know it's what we want? If the question is "does it do the same thing as our helm chart, and if not, why not?", this annotated output of compare-helm may help 🙏

go run ./internal/appliance/dev/compare-helm \
  -component symbols \
  -golden-file internal/appliance/testdata/golden-fixtures/symbols-default.yaml \
  -helm-template-extra-args '--set symbols.serviceAccount.create=true'

Similar to the PR for repo-updater, we unconditionally create the service account. There's no harm in it, and user-facing config does come at the cost of incremental cognitive load.

I haven't commented on every diff line, just the parts I thought were interesting. Please do read it and raise any points I've missed. Some of it is just Kubernetes default values, since this golden fixture is actually pulled from a kube-apiserver.

The output:

2c2
< # helm: ServiceAccount/symbols
---
> # golden: ServiceAccount/symbols
5a6,8
>   annotations:
>     appliance.sourcegraph.com/configHash: bd6ab558ec98ed6b2607b80469bb9f209b485fa44a6dd5beaf18c37011340afc
>   creationTimestamp: "2024-04-19T00:00:00Z"
7,8d9
<     app.kubernetes.io/component: symbols
<     category: rbac
10a12,14
>   namespace: NORMALIZED_FOR_TESTING
>   resourceVersion: NORMALIZED_FOR_TESTING
>   uid: NORMALIZED_FOR_TESTING
12c16
< # helm: Service/symbols
---
> # golden: Service/symbols
16a21
>     appliance.sourcegraph.com/configHash: bd6ab558ec98ed6b2607b80469bb9f209b485fa44a6dd5beaf18c37011340afc
18a24
>   creationTimestamp: "2024-04-19T00:00:00Z"
23a30,32
>   namespace: NORMALIZED_FOR_TESTING
>   resourceVersion: NORMALIZED_FOR_TESTING
>   uid: NORMALIZED_FOR_TESTING
25c34,40
<   clusterIP: None
---
>   clusterIP: NORMALIZED_FOR_TESTING
>   clusterIPs:
>   - NORMALIZED_FOR_TESTING
>   internalTrafficPolicy: Cluster
>   ipFamilies:
>   - IPv4
>   ipFamilyPolicy: SingleStack
28a44
>     protocol: TCP
31a48
>     protocol: TCP
35,36c52,55
<     app.kubernetes.io/instance: release-name
<     app.kubernetes.io/name: sourcegraph
---
>   sessionAffinity: None
>   type: ClusterIP
> status:
>   loadBalancer: {}
38c57
< # helm: StatefulSet/symbols
---
> # golden: StatefulSet/symbols
43c62,64
<     description: Backend for symbols operations.
---
>     appliance.sourcegraph.com/configHash: b35c5d99a13bcd7e13d7d2a752a9a46e99e0d4daaef836d66223afc57cdc44c4
>   creationTimestamp: "2024-04-19T00:00:00Z"
>   generation: 1
46,47d66
<     app.kubernetes.io/instance: release-name
<     app.kubernetes.io/managed-by: Helm
49c68
<     app.kubernetes.io/version: 5.3.2
---
>     app.kubernetes.io/version: 5.3.9104
51d69
<     helm.sh/chart: sourcegraph-5.3.2
52a71,73
>   namespace: NORMALIZED_FOR_TESTING
>   resourceVersion: NORMALIZED_FOR_TESTING
>   uid: NORMALIZED_FOR_TESTING
53a75,79
>   minReadySeconds: 10
>   persistentVolumeClaimRetentionPolicy:
>     whenDeleted: Retain
>     whenScaled: Retain
>   podManagementPolicy: OrderedReady
59,60d84
<       app.kubernetes.io/instance: release-name
<       app.kubernetes.io/name: sourcegraph
65d88
<         checksum/redis: 63b58e05a2640417d599c4aee6d866cb9063e3a9aa452dc08dbfff836b7781b7

#62227

66a90
>       creationTimestamp: null
69,70d92
<         app.kubernetes.io/instance: release-name
<         app.kubernetes.io/name: sourcegraph
71a94
>       name: symbols
73d95
<       affinity: null
87c109
<           value: "10800"
---
>           value: "11059"

This is an artifact of 1024 vs 1000 bytes to a kilobyte, etc. I would be happy to just use 1000 to fall in line with helm's output, but that is surprisingly fiddly, as it would mean foregoing the nice k8s client-go functions for parsing quantities. The helm chart is actually wrong here: it takes a default specified in GiB (1024*1024*1024), just chops off the "Gi", and treats the remaining number as decimal: https://github.com/sourcegraph/deploy-sourcegraph-helm/blob/bc372d3e31c4ff7c656801dcd360dd9326639289/charts/sourcegraph/templates/symbols/symbols.StatefulSet.yaml#L54-L62

When we come to adopting resources, we'll likely be changing the cache size of many deployments by a few percent. It seems unlikely that terrible things will happen when we do, but I haven't tested this yet.
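
For concreteness, here's a minimal sketch (not the appliance's actual code) of where the discrepancy comes from, using the chart's default 12Gi storage size; the values in the diff above are derived from sizes like these:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// client-go treats "Gi" as a binary suffix, so 12Gi is 12 * 1024 MiB.
	q := resource.MustParse("12Gi")
	fmt.Println(q.Value() / (1024 * 1024)) // 12288

	// The helm chart chops off the "Gi" and treats the remaining number as
	// decimal gigabytes, i.e. 12 * 1000 MB.
	fmt.Println(12 * 1000) // 12000
}
```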

90a113
>               apiVersion: v1
98a122
>               apiVersion: v1
104a129
>           failureThreshold: 3
109a135,136
>           periodSeconds: 10
>           successThreshold: 1
114a142
>           protocol: TCP
116a145
>           protocol: TCP
117a147
>           failureThreshold: 3
121a152
>           initialDelaySeconds: 60
123c154,155
<           timeoutSeconds: 5
---
>           successThreshold: 1
>           timeoutSeconds: 1
135a168
>         terminationMessagePath: /dev/termination-log
142c175,177
<       nodeSelector: null
---
>       dnsPolicy: ClusterFirst
>       restartPolicy: Always
>       schedulerName: default-scheduler
145a181
>         runAsGroup: 101
146a183
>       serviceAccount: symbols
148c185
<       tolerations: null
---
>       terminationGracePeriodSeconds: 30
151,152d187
<         name: cache
<       - emptyDir: {}

I was surprised to discover that the helm chart provisioned an emptyDir for
cache in addition to a PVC, but I think this line in the client-go docs for
StatefulSet.Spec.VolumeClaimTemplates clears it up:

A claim in this list takes precedence over any volumes in the template, with the same name.
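
As a minimal sketch (illustrative only, not this PR's code), a StatefulSet whose pod template declares an emptyDir volume named cache alongside a volumeClaimTemplate of the same name ends up mounting the PVC in its pods:

```go
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	spec := appsv1.StatefulSetSpec{
		Template: corev1.PodTemplateSpec{
			Spec: corev1.PodSpec{
				// An emptyDir volume named "cache", roughly as the helm chart
				// renders it.
				Volumes: []corev1.Volume{{
					Name:         "cache",
					VolumeSource: corev1.VolumeSource{EmptyDir: &corev1.EmptyDirVolumeSource{}},
				}},
			},
		},
		// A claim template with the same name takes precedence, so pods get
		// the PVC rather than the emptyDir.
		VolumeClaimTemplates: []corev1.PersistentVolumeClaim{{
			ObjectMeta: metav1.ObjectMeta{Name: "cache"},
		}},
	}
	fmt.Println(spec.VolumeClaimTemplates[0].Name, "shadows the pod-template volume of the same name")
}
```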

This should therefore behave the same way as helm.

Separately, I think we need to explore developer setups for deploying this appliance and a Sourcegraph instance to something like Docker Desktop Kubernetes, k3d, or kind. I can't remember what sort of PersistentVolume controller those ship with, if any, but we can always explore either adding one that basically behaves like emptyDir, or restoring a LocalDevMode toggle in the appliance to replace all PVC templates with emptyDirs. We can split out an issue for that.

157,158c192,195
<   - metadata:
<       name: cache
---
>   - apiVersion: v1
>     kind: PersistentVolumeClaim

client-go is always inserting this k8s type information for the PVC template,
which looks odd. I don't think it does any harm though.

>     metadata:
>       creationTimestamp: null
165a203,278
>       volumeMode: Filesystem
>     status:
>       phase: Pending
> status:
>   availableReplicas: 0
>   replicas: 0
> ---
> # golden: ConfigMap/sg
> apiVersion: v1
> data:
>   spec: |
>     spec:
>       requestedVersion: "5.3.9104"
> 
>       blobstore:
>         disabled: true
> 
>       codeInsights:
>         disabled: true
> 
>       codeIntel:
>         disabled: true
> 
>       frontend:
>         disabled: true
> 
>       gitServer:
>         disabled: true
> 
>       indexedSearch:
>         disabled: true
> 
>       indexedSearchIndexer:
>         disabled: true
> 
>       pgsql:
>         disabled: true
> 
>       postgresExporter:
>         disabled: true
> 
>       preciseCodeIntel:
>         disabled: true
> 
>       redisCache:
>         disabled: true
> 
>       redisExporter:
>         disabled: true
> 
>       redisStore:
>         disabled: true
> 
>       repoUpdater:
>         disabled: true
> 
>       searcher:
>         disabled: true
> 
>       symbols: {}
> 
>       syntectServer:
>         disabled: true
> 
>       worker:
>         disabled: true
> kind: ConfigMap
> metadata:
>   annotations:
>     appliance.sourcegraph.com/currentVersion: 5.3.9104
>     appliance.sourcegraph.com/managed: "true"
>   creationTimestamp: "2024-04-19T00:00:00Z"
>   name: sg
>   namespace: NORMALIZED_FOR_TESTING
>   resourceVersion: NORMALIZED_FOR_TESTING
>   uid: NORMALIZED_FOR_TESTING

I wonder: do we want a PR to remove the SG ConfigMap from the golden fixtures? I can't decide whether its presence is helpful or noisy.

@craigfurman force-pushed the appliance-deploy-symbols-service branch from e1e92ec to 76ebe32 on April 30, 2024 11:16
Comment on lines -58 to -61

// set StorageClass name if a custom storage class is being created.
if sg.Spec.StorageClass.Create {
	p.Spec.StorageClassName = &storageClassName
}
@craigfurman (Member, Author):

Before this commit, we would only set a StorageClassName if one was being created by the appliance, which is a feature we haven't added yet. I don't think this is quite right - I think we want users to be able to inject storage classes they create outside of SG, and might use across many deployments. We may also want a feature to optionally create a StorageClass.

I think my replacement (unconditionally expecting a StorageClass name) might also be wrong. I'm going to explore this a bit further and make a follow-on.

@craigfurman (Member, Author):

I think #62286 and #62285 capture this (big stack incoming).

@jdpleiness (Contributor):

> I think we want users to be able to inject storage classes they create outside of SG, and might use across many deployments.

Yeah, the thinking was: if we don't create our own, use the default. But there are use cases where users will likely want to use their own non-default storage classes, so this is a better way to deal with it.

@@ -295,13 +297,12 @@ type SymbolsSpec struct {
	// Default: 12Gi
	StorageSize string `json:"storageSize,omitempty"`

	// Resources allows for custom resource limits and requests.
	Resources *corev1.ResourceList `json:"resources,omitempty"`

	// Env defines environment variables for Symbols.
	Env map[string]string `json:"env,omitempty"`
@craigfurman (Member, Author):

Not part of the diff, but: pretty much every SG service in the helm chart exposes an extra-env-vars config element. This might be necessary for some services. If it isn't necessary for some, we want to use the appliance to streamline config to minimise cognitive load.

I propose: if it's only needed for some services, we remove it from the others. If it's needed for most, we make it a StandardConfig generic feature. I can do that in a follow-on. @jdpleiness, what are your thoughts here? 👀
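
For illustration only, if we did keep it, a shared Env field on an embedded config struct (hypothetical shape and names, not the current API) might look something like this:

```go
// Package appliancespec is a hypothetical package name for this sketch.
package appliancespec

// StandardConfig holds config that every service spec would share.
type StandardConfig struct {
	// Env defines extra environment variables for the service.
	Env map[string]string `json:"env,omitempty"`
}

// SymbolsSpec then embeds the shared config instead of redeclaring Env.
type SymbolsSpec struct {
	StandardConfig `json:",inline"`

	// Default: 12Gi
	StorageSize string `json:"storageSize,omitempty"`
}
```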

@jdpleiness (Contributor):

> This might be necessary for some services. If it isn't necessary for some …

Again, I'd have to double-check, but I don't think many (possibly any) services actually use this in the Helm chart. I think we could remove it entirely to keep things simple; if needed, we can always add it back later.

@craigfurman (Member, Author):

It's hard to tell just from looking at the helm chart. The "proper" (but labour-intensive) way of doing this AFAICT is to look up the env var config for each service...

Do you reckon it's worth doing that, or just remove this now? (this can be done in a follow-on)

internal/appliance/symbols.go (resolved review thread)
Comment on lines +46 to +50

// TODO DRY out across all services
storageClassName := sg.Spec.StorageClass.Name
if storageClassName == "" {
	storageClassName = "sourcegraph"
}
@craigfurman (Member, Author):

Combine my other comment about looking more deeply at storage classes, and using struct-tag defaults, and a follow-on will make this go away too 🙏
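
To make "struct-tag defaults" concrete, here's a minimal sketch under assumed names (the real spec fields, tag name, and helper would differ):

```go
package main

import (
	"fmt"
	"reflect"
)

// StorageSpec is a hypothetical spec type: empty string fields get filled in
// from a `default:"..."` struct tag, so call sites no longer need the
// if-empty-then-fallback boilerplate.
type StorageSpec struct {
	StorageClassName string `default:"sourcegraph"`
}

// applyDefaults walks the struct's string fields and fills in tagged defaults.
func applyDefaults(v any) {
	rv := reflect.ValueOf(v).Elem()
	rt := rv.Type()
	for i := 0; i < rt.NumField(); i++ {
		f := rv.Field(i)
		if f.Kind() == reflect.String && f.String() == "" {
			if d, ok := rt.Field(i).Tag.Lookup("default"); ok {
				f.SetString(d)
			}
		}
	}
}

func main() {
	s := StorageSpec{}
	applyDefaults(&s)
	fmt.Println(s.StorageClassName) // sourcegraph
}
```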

@craigfurman (Member, Author):

I think #62286 and #62285 capture this (big stack incoming).

@craigfurman marked this pull request as ready for review on April 30, 2024 11:25
@craigfurman requested a review from a team on April 30, 2024 11:25
Base automatically changed from appliance-deployment-rm-default-replicas to main on April 30, 2024 13:39
@craigfurman force-pushed the appliance-deploy-symbols-service branch from 76ebe32 to e487b2b on April 30, 2024 13:40
@craigfurman force-pushed the appliance-deploy-symbols-service branch from e487b2b to 6af497d on April 30, 2024 15:08
@jdpleiness (Contributor) commented:

> Similar to the PR for repo-updater, we unconditionally create the service account.

I'd have to double-check, but I don't think there's a good reason to create service accounts for any of our services aside from perhaps frontend. I know these are in the Helm chart, but that assumes the Helm chart is always right, which is risky.

@craigfurman (Member, Author) commented:

@jdpleiness re: ServiceAccounts, good point. Shall we remove them from here, blobstore, and repo-updater?

@craigfurman (Member, Author) commented:

Let's do that in a follow-on actually

@craigfurman merged commit eb5123b into main on Apr 30, 2024
13 checks passed
@craigfurman deleted the appliance-deploy-symbols-service branch on April 30, 2024 17:23