Releases · kinvolk/lokomotive

This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

14 Sep 09:31

ipochi

v0.9.0

91da0c6

v0.9.0 Latest


We're happy to announce the release of Lokomotive v0.9.0 (Indian Pacific).

Changes in v0.9.0

Kubernetes and control plane component updates

Update Kubernetes to v1.21.4 (#1567).
Update etcd to v3.4.16 (#1493).
Update calico to v3.19.1 (#1521).
Replace Packet CCM with Cloud Provider Equinix Metal (#1545).

New components

Add component azure-arc-onboarding (#1473).
Add control plane component node-local-dns (#1524).

Component updates

Update external-dns to v0.8.0 (#1499).
Update cert-manager to v1.4.0 (#1501).
Update dex to v2.28.1 (#1503).
Update velero to v1.6.0 (#1505).
Update prometheus-operator charts to v0.48.1 (#1506).
Update openebs-operator to v2.10.0 (#1509).
Update node-problem-detector to v0.8.8 (#1507).
Update rook to v1.6.5 (#1495).
Update contour to v1.16.0 (#1508).
Update linkerd to v2.10.2 (#1522).
Update cluster-autoscaler to v1.21.0 (#1512).
Update metallb to v0.9.6 (#1555).

Terraform provider updates

Update Terraform providers to their latest versions (#1523).

Features

equinixmetal: Rename documentation, code and configuration from Packet to Equinix Metal (#1545).
baremetal: Users can now configure node specific labels (#1405).
rook-ceph: Add new parameter resources for resource requests and limits (#1483).
baremetal: Add new parameter wipe_additional_disks, which allows wiping any additional disks attached to
the machine (#1486).
baremetal: Automated (re-)provisioning of worker nodes (#1502).
Add new parameter enable_node_local_dns to enable node-local-dns support for clusters (#1524).
Add parameter tolerations for prometheus-operator and its components (#1540).
Define MaxHistory to clean up old Helm releases (#1549).
Add cpu_manager_policy flag to workers in Lokomotive clusters on Equinix Metal and AWS (#1406).
cli: Allow skipping control plane updates when the cluster is not successfully configured, using the
--skip-control-plane-update flag (#1482).

Documentation

Use new label and taints syntax for rook-ceph (#1474).
Add information about restic parameter require_volume_annotation (#1539).
Rename Packet to Equinix Metal (#1537).

Bug Fixes

baremetal: Fix certificate rotation (#1478).
baremetal: Configure and persist kernel args (#1489).
Equinix Metal ARM: Use HTTP instead of HTTPS for the iPXE URL, as HTTPS is unreliable with iPXE (#1498).
terraform: Fix ignored ConditionPathExists by moving it from the [Service] section to the [Unit] section (#1518).
cli: Honor --upgrade-kubelets option (#1516).
Fix pre-update health check potentially rolling back to an older release of a control plane component
(#1515, #1549).

Miscellaneous

cli: Enable upgrading kubelets by default. Starting with v0.9.0, the default value of the
--upgrade-kubelets flag changes from false to true (#1517).
baremetal: Let installer.service retry on failure (#1490).
baremetal: Set hostname from <cluster_name>-worker-<count_index> to controller_names<count_index> for
controllers and worker_names<count_index> for workers when set_standard_hostname is true
(#1488).
pkg/terraform: Increase the default parallelism (#1481).
cert-rotation: Print journal on error when restarting etcd (#1500).
Restart containers from systemd unit only, not from Docker daemon. This fixes possible race conditions while
rotating certificates (#1511).
Go module updates and cleanups (#1556).

Configuration syntax changes

Equinix Metal (formerly Packet)

A Lokomotive cluster deployed on Equinix Metal requires changing the cluster configuration block from packet to equinixmetal:

# old
cluster "packet" {
  ...
  ...
}

# new
cluster "equinixmetal" {
  ...
  ...
}
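The rename above could be scripted roughly as follows. This is a minimal sketch, not part of Lokomotive: the function name rename_packet_block and the file you pass to it are hypothetical, and it only rewrites the block label, so any other Packet-specific settings still need a manual review.

```shell
#!/usr/bin/env bash
# Sketch: rewrite the cluster block label in a Lokomotive configuration file.
# The function name and the file passed to it are hypothetical.
rename_packet_block() {
  local cfg="$1"
  # Only touch lines that open a block of the form: cluster "packet" {
  # A copy of the original file is kept as "$cfg.bak".
  sed -i.bak -E 's/^([[:space:]]*cluster[[:space:]]+)"packet"([[:space:]]*\{)/\1"equinixmetal"\2/' "$cfg"
}
```

Calling rename_packet_block on your cluster configuration file and then diffing it against the .bak copy shows exactly which line changed.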

Baremetal

The variable k8s_domain_name now takes only the domain name instead of <cluster_name>.<k8s_domain_name>.

Example:

# old
k8s_domain_name = "mercury.k8s.localdomain"

# new
k8s_domain_name = "k8s.localdomain"
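To derive the new value from an existing one, a small helper along these lines may be enough. It is only a sketch: strip_cluster_prefix is a hypothetical name, and it assumes the old value starts with the cluster name followed by a dot, as in the example above.

```shell
#!/usr/bin/env bash
# Sketch: derive the new k8s_domain_name from the old one.
# strip_cluster_prefix is a hypothetical helper, not a lokoctl feature.
strip_cluster_prefix() {
  local cluster_name="$1" old_domain="$2"
  # Remove a leading "<cluster_name>." if present; otherwise print the value unchanged.
  printf '%s\n' "${old_domain#"${cluster_name}".}"
}
```

For the example above, strip_cluster_prefix mercury mercury.k8s.localdomain prints k8s.localdomain.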

Prometheus-operator

Alertmanager and operator are now configured as a block.

# old
alertmanager_retention    = "360h"
alertmanager_external_url = "https://api.example.com/alertmanager"
alertmanager_config       = file("alertmanager-config.yaml")
alertmanager_node_selector = {
  "kubernetes.io/hostname" = "worker3"
}

# new
alertmanager {
  retention    = "360h"
  external_url = "https://api.example.com/alertmanager"
  config       = file("alertmanager-config.yaml")
  node_selector = {
    "kubernetes.io/hostname" = "worker3"
  }
}

# old
prometheus_operator_node_selector = {
  "kubernetes.io/hostname" = "worker3"
}

# new
operator {
  node_selector = {
    "kubernetes.io/hostname" = "worker3"
  }
}

Baremetal features: User data changes and reprovisioning of worker nodes

The baremetal platform now supports user data changes and reprovisioning of worker nodes based on user data
changes.

From Lokomotive v0.9.0 onwards, additional files are created in the cluster assets directory.
Each file is named after the MAC address of a machine and contains its domain name.

The following upgrade paths are supported:

No user data changes to the worker nodes

In such a scenario, the only thing that needs to be done is the above mentioned change in k8s_domain_name.
By default, user data changes are ignored.

User data changes but no PXE reprovisioning of worker nodes (reprovisioning happens via SSH):

In such a scenario, Lokomotive reboots the worker nodes and applies the user data changes. To bring about
such a change:

Make user data changes (if any).
Set ignore_worker_changes = false.

User data changes and reprovisioning of worker nodes:

In such a scenario, Lokomotive forces reinstallation of worker nodes via PXE and applies the user data
changes. This requires a meaningful pxe_commands value configured for automation.

To bring about such a change:

Make user data changes (if any).
Remove the file with worker node MAC address from cluster assets directory.
Set ignore_worker_changes = false in cluster configuration.
Set pxe_commands to appropriate value.
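Before removing the per-machine file mentioned in the steps above, it can help to locate it first. The sketch below only prints matches and deletes nothing. The helper name find_worker_marker is hypothetical, and the exact subdirectory and MAC address notation used for the file name are assumptions, so the search covers the whole assets directory.

```shell
#!/usr/bin/env bash
# Sketch: list files in the assets directory that are named after a MAC address.
# find_worker_marker is a hypothetical helper; it prints paths and removes nothing.
find_worker_marker() {
  local assets_dir="$1" mac="$2"
  # -name matches the file's base name exactly (a MAC contains no glob characters).
  find "$assets_dir" -type f -name "$mac" -print
}
```

Once the printed path has been reviewed, the file can be removed by hand before running lokoctl.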

NOTE: Reprovisioning will reinstall the operating system. If you have any stateful workloads running,
this step will result in data loss. Lokomotive does not taint or drain the worker nodes before
reprovisioning, so it is recommended to do this manually before initiating reprovisioning of the worker nodes.
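One possible way to prepare the manual drain is to generate the commands for review first. The sketch below only prints kubectl commands and does not run them. The helper name and the node names you pass in are hypothetical, and the flags shown are one common choice rather than a Lokomotive recommendation. Note that --delete-emptydir-data discards emptyDir volumes, and kubectl versions older than v1.20 call this flag --delete-local-data.

```shell
#!/usr/bin/env bash
# Sketch: print (not execute) cordon and drain commands for the given worker nodes.
# print_drain_commands is a hypothetical helper; review the output before running it.
print_drain_commands() {
  local node
  for node in "$@"; do
    # Stop new pods from being scheduled on the node.
    printf 'kubectl cordon %s\n' "$node"
    # Evict running pods; DaemonSet pods are skipped, emptyDir data is discarded.
    printf 'kubectl drain %s --ignore-daemonsets --delete-emptydir-data\n' "$node"
  done
}
```

The printed lines can be pasted into a shell one node at a time, which keeps the drain under manual control.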

Updating from v0.8.0

Cluster update steps

NOTE: Updating multiple Lokomotive versions at a time is not supported. If your cluster is running a
version older than v0.8.0, update to v0.8.0 first and only then proceed with the update to v0.9.0.

Execute the following steps in your cluster configuration directory:

Download and install the lokoctl binary by following the
v0.9.0 installation guide
and verify the version:

lokoctl version
v0.9.0

Update steps for Equinix Metal (formerly Packet)

Backup the Terraform state file:

cd $assets_dir/terraform

terraform state pull > backup.state

Update Terraform provider from packethost/packet to equinix/metal:
```
terraform state replace-pr...
```


26 May 15:38

surajssd

v0.8.0

deb07e8

v0.8.0

We're happy to announce the release of Lokomotive v0.8.0 (Hogwarts Express).

Changes in v0.8.0

Kubernetes updates

Update AKS to 1.18.17 (#1466).

Component updates

Update prometheus-operator to 0.46.0 (#1440).
Update contour to v1.13.1 (#1450).
Update calico to v3.18.1 (#1453).

Terraform provider updates

Update Terraform providers to their latest versions (#1451).

Features

Add a certificate rotation command: lokoctl cluster certificate rotate (#1435).
Add reclaim_policy field to components rook-ceph, openebs-storage-class and aws-ebs-csi-driver. Change the default behaviour of the default storage class to Retain from Delete (#1369).

Deprecation and Removal

Remove the webhook field from the cert-manager component (#1413).

Updating from v0.7.0

Configuration syntax changes

Reclaim Policy

This is an optional step and only applies if you use any of these storage components: rook-ceph, openebs-storage-class or aws-ebs-csi-driver.

If you are relying on the default values for the PersistentVolumes to have a reclaim policy as Delete, then please add the following field explicitly now:

reclaim_policy = "Delete"

Cluster update steps

NOTE: Updating multiple Lokomotive versions at a time is not supported. If your cluster is running a version older than v0.7.0, update to v0.7.0 first and only then proceed with the update to v0.8.0.

Execute the following steps in your cluster configuration directory:

Download and install the lokoctl binary by following the v0.8.0 installation guide and verify the version using lokoctl version:

v0.8.0

Download the release bundle:

curl -LO https://github.com/kinvolk/lokomotive/archive/v0.8.0.tar.gz
tar -xvzf v0.8.0.tar.gz

On all platforms except AKS, update Calico CRDs:

kubectl apply -f ./lokomotive-0.8.0/assets/charts/control-plane/calico/crds/

Update the control-plane:

lokoctl cluster apply -v

NOTE: If the update process gets interrupted, rerun the above command.

NOTE: If your cluster is running self-hosted kubelets, append --upgrade-kubelets to the above command.

NOTE: The command updates the cluster as well as any Lokomotive components
applied to it. Append --skip-components to the above command to avoid updating
the components. Components can then be updated individually using lokoctl component apply.

The update process typically takes about 10 minutes.
After the update, running lokoctl health should result in an output similar to the following:

Node                     Ready    Reason          Message

lokomotive-controller-0  True     KubeletReady    kubelet is posting ready status
lokomotive-1-worker-0    True     KubeletReady    kubelet is posting ready status
lokomotive-1-worker-1    True     KubeletReady    kubelet is posting ready status
lokomotive-1-worker-2    True     KubeletReady    kubelet is posting ready status
Name      Status    Message              Error

etcd-0    True      {"health":"true"}
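If you want to check this output in a script rather than by eye, something like the following sketch could work. It assumes the column layout shown above (status in the second column, header rows starting with Node or Name), which may differ between lokoctl versions. all_healthy is a hypothetical helper that reads the health output on standard input.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if every row of a health listing reports True in column 2.
# all_healthy is a hypothetical helper; the column layout is an assumption.
all_healthy() {
  awk '
    NF < 2 { next }                           # skip blank and separator lines
    $1 == "Node" || $1 == "Name" { next }     # skip the two header rows
    { seen++ }
    $2 != "True" { bad++; print "not healthy: " $1 > "/dev/stderr" }
    END { exit (bad > 0 || seen == 0) ? 1 : 0 }
  '
}
```

Piping the health output into all_healthy then gives an exit status that a wrapper script can act on. Empty input is treated as a failure.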

Updating Contour

Manually update the CRDs before updating the component contour:

kubectl apply -f https://raw.githubusercontent.com/projectcontour/contour/release-1.13/examples/contour/01-crds.yaml

Update the component:

lokoctl component apply contour

Updating Prometheus Operator

Manually update the CRDs before updating the component prometheus-operator:

kubectl apply -f ./lokomotive-0.8.0/assets/charts/components/prometheus-operator/crds/

Update the component:

lokoctl component apply prometheus-operator


16 Mar 17:38

johananl

v0.7.0

813e070

v0.7.0

We're happy to announce the release of Lokomotive v0.7.0 (Ghan).

Changes in v0.7.0

Kubernetes updates

Update Kubernetes to v1.20.4 (#1410).

New components

Add component node-problem-detector (#1384).

Component updates

Update aws-ebs-csi-driver from 0.7.0 to 0.9.0 (#1393).
Update web-ui from 0.1.3 to 0.2.1 (#1412).

Features

AWS EBS CSI Driver: Add node_affinity and tolerations (#1393).
EM: Add worker pool specific facility attribute (#1359).

Documentation

Use FLUO to update nodes (#1295).
How to add a worker pool in a different facility (#1361).
Refactor AWS quickstart guide (#1273).

Bug fixes

EM: Add Restart=on-failure and RestartSec=5s for the metadata service (#1362).
Fix wrong etcd settings, clean up leftovers from etcd move from rkt to docker based daemon (#1382).
contour: Fix hostPort regression (#1342).

Deprecation removal

baremetal: Remove enable_tls_bootstrap attribute (#1380).
Remove a deprecated cert-manager namespace label certmanager.k8s.io/disable-validation=true (#1372).

Miscellaneous

AWS EBS CSI Driver: Change the StorageClass' default ReclaimPolicy to Retain (#1393).
Include a "v" in version strings when releasing (#1417).

Updating from v0.6.1

Configuration syntax changes

Bare-metal

Delete the enable_tls_bootstrap parameter from your cluster configuration since it has been removed in this release.

Cluster update steps

NOTE: Updating multiple Lokomotive versions at a time is not supported. If your cluster is running a version older than v0.6.1, update to v0.6.1 first and only then proceed with the update to v0.7.0.

Execute the following steps in your cluster configuration directory:

Download and install the lokoctl binary by following the
v0.7.0 installation guide
and verify the version using lokoctl version:

v0.7.0

Update the control plane:

lokoctl cluster apply -v

NOTE: If the update process gets interrupted, rerun the above command.

NOTE: If your cluster is running self-hosted kubelets, append --upgrade-kubelets to the above command.

NOTE: The command updates the cluster as well as any Lokomotive components
applied to it. Append --skip-components to the above command to avoid updating
the components. Components can then be updated individually using lokoctl component apply.

The update process typically takes about 10 minutes.
After the update, running lokoctl health should result in an output similar to the following:

Node                     Ready    Reason          Message

lokomotive-controller-0  True     KubeletReady    kubelet is posting ready status
lokomotive-1-worker-0    True     KubeletReady    kubelet is posting ready status
lokomotive-1-worker-1    True     KubeletReady    kubelet is posting ready status
lokomotive-1-worker-2    True     KubeletReady    kubelet is posting ready status
Name      Status    Message              Error

etcd-0    True      {"health":"true"}

On all platforms except AKS, do the following:

Download the release bundle:

curl -LO https://github.com/kinvolk/lokomotive/archive/v0.7.0.tar.gz
tar -xvzf v0.7.0.tar.gz

Run the update script:

./lokomotive-0.7.0/scripts/update/0.6.1-0.7.0/update.sh


12 Feb 08:55

ipochi

v0.6.1

236b7a3

v0.6.1

This is a patch release which includes mainly bug fixes.

NOTE: Please read the updating guidelines here.

Changes in v0.6.1

Development

Velero: Add tolerations to Restic plugin (#1348).
Velero: Add e2e tests (#1353).
Update all Go dependencies (#1358).

Terraform Provider Updates

Update Packet (Equinix Metal) Terraform provider to 3.2.1, which fixes the provisioning failures of
n2.xlarge.x86 machines (#1349).

Bug fixes

Prefix ETCD_ for standard etcd environment variables only (#1308).
Update Restic TolerationSeconds type to integer and add conditional checks (#1365).

Docs

Add missing provider parameter (#1354).
Update RELEASING document to add steps to update the documentation website entry (#1326).
Improvements to the Lokomotive release process documentation (#1341).

Updating from v0.6.0

Cluster update steps

NOTE: Updating multiple Lokomotive versions at a time is not supported. If your cluster is running a
version older than v0.6.0, update to v0.6.0 first and only then proceed with the update to v0.6.1.

Please perform the following manual steps in your cluster configuration directory.

Download and install the lokoctl binary by following the v0.6.1 installation
guide.
```
lokoctl version
v0.6.1
```
Update control plane.

lokoctl cluster apply --skip-components -v

NOTE: If the update process gets interrupted, rerun the above command.

The update process typically takes about 10 minutes.
After the update, running lokoctl health should result in an output similar to the following:

Node                     Ready    Reason          Message

lokomotive-controller-0  True     KubeletReady    kubelet is posting ready status
lokomotive-1-worker-0    True     KubeletReady    kubelet is posting ready status
lokomotive-1-worker-1    True     KubeletReady    kubelet is posting ready status
lokomotive-1-worker-2    True     KubeletReady    kubelet is posting ready status
Name      Status    Message              Error

etcd-0    True      {"health":"true"}

Download the release bundle.

curl -LO https://github.com/kinvolk/lokomotive/archive/v0.6.1.tar.gz
tar -xvzf v0.6.1.tar.gz

Run the update script.

./lokomotive-0.6.1/scripts/update/0.6.0-0.6.1/update.sh


22 Jan 11:26

surajssd

v0.6.0

f7934f0

v0.6.0

We're happy to announce the release of Lokomotive v0.6.0 (Flying Scotsman).

This release includes several new features, many component updates, and a new platform - Tinkerbell.

Changes in v0.6.0

Kubernetes updates

Update Kubernetes to v1.19.4 and AKS to v1.18.10 (#1189).

Component updates

Update external-dns to v0.7.4 (#1115).
Update metrics-server to v2.11.2 (#1116).
Update cluster-autoscaler to version v1.1.0 (#1137).
Update rook to v1.4.6 (#1117).
Update velero to v1.5.2 (#1131).
Update openebs-operator to v2.2.0 (#1095).
Update contour to v1.10.0 (#1170).
Update experimental-linkerd to stable-2.9.0 (#1123).
Update web-ui to v0.1.3 (#1237).
Update prometheus-operator to v0.43.2 (#1162).
Update Calico to v3.17.0 (#1251).
Update aws-ebs-csi-driver to v0.7.0 (#1135).
Update etcd to 3.4.14 (#1309).

Terraform provider updates

Update Terraform providers to their latest versions (#1133).

New platforms

Add support for Tinkerbell platform (#392).

Bug fixes

Add new worker pools when TLS bootstrap is enabled without remaining stuck in the installation phase (#1181).
contour: Consistently apply node affinity and tolerations to all scheduled workloads (#1161).
Don't run control plane components as DaemonSets on single control plane node clusters (#1193).

Features

Add Packet CCM to Packet platform (#1155).
contour: Parameterize Envoy scraping interval (#1229).
Expose --conntrack-max-per-core kube-proxy flag (#1187).
Add require_volume_annotation for restic plugin (#1132).
Print bootkube journal if cluster bootstrap fails (#1166). This makes cluster bootstrap problems easier to debug.
aws-ebs-csi-driver: Add dynamic provisioning, resizing and snapshot options (#1277). Now the user has the ability to control the AWS EBS driver to enable or disable provisioning, resizing and snapshotting.
Expose the following parameters for the Lokomotive baremetal platform (#1317):
- install_disk: Disk device where Flatcar Container Linux is installed.
- install_to_smallest_disk: Installs Flatcar Container Linux to the smallest disk.
- kernel_args: Additional kernel args to provide at PXE boot.
- download_protocol: Protocol iPXE uses to download kernel and initrd.
- network_ip_autodetection_method: Method to detect host IPv4 address.

Security enhancements

calico-host-protection: Add custom locked down PSP configuration (#1274).

Documentation

Add openebs-operator update guide (#1163).
Add rook-ceph update guide (#1165).

Miscellaneous

Pull control plane images from Quay to avoid hitting Docker Hub pulling limits (#1226).
Bootkube now waits for all control plane charts to converge before exiting, which should make the bootstrapping process more stable (#1085).
Remove deprecated CoreOS mentions from AWS (#1245) and bare metal (#1246).
Improve hardware reservations validation rules on Equinix Metal (#1186).

Updating from v0.5.0

Configuration syntax changes

AWS

Removed the undocumented cluster.os_name parameter, since Lokomotive supports Flatcar Container Linux only.

Bare-metal

The cluster.os_channel parameter got simplified by removing the flatcar- prefix.

Old

os_channel = "flatcar-stable"

New

os_channel = "stable"

Velero

Velero requires an explicit provider field to select the provider.
Example:

component "velero" {
  provider = "openebs"

  openebs {
    ...
  }
}

Updating Prometheus Operator

Due to a change in the upstream Helm chart, updating the Prometheus Operator component incurs downtime. We do this before updating the cluster so no visibility is lost while the cluster update is happening.

Patch the PersistentVolumes created/used by the prometheus-operator component to use the Retain reclaim policy.

kubectl patch pv -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}' $(kubectl get pv -o jsonpath='{.items[?(@.spec.claimRef.name=="data-prometheus-prometheus-operator-prometheus-0")].metadata.name}')

kubectl patch pv -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}' $(kubectl get pv -o jsonpath='{.items[?(@.spec.claimRef.name=="data-alertmanager-prometheus-operator-alertmanager-0")].metadata.name}')

NOTE: To execute the above commands, the user must have cluster-wide permissions.

Uninstall the prometheus-operator release and delete the existing PersistentVolumeClaims, then verify that the PersistentVolumes become Released.

lokoctl component delete prometheus-operator

kubectl delete pvc data-prometheus-prometheus-operator-prometheus-0 -n monitoring
kubectl delete pvc data-alertmanager-prometheus-operator-alertmanager-0 -n monitoring

Remove current spec.claimRef values to change the PV's status from Released to Available.

kubectl patch pv --type json -p='[{"op": "remove", "path": "/spec/claimRef"}]' $(kubectl get pv -o jsonpath='{.items[?(@.spec.claimRef.name=="data-prometheus-prometheus-operator-prometheus-0")].metadata.name}')

kubectl patch pv --type json -p='[{"op": "remove", "path": "/spec/claimRef"}]' $(kubectl get pv -o jsonpath='{.items[?(@.spec.claimRef.name=="data-alertmanager-prometheus-operator-alertmanager-0")].metadata.name}')

NOTE: To execute the above commands, the user must have cluster-wide permissions.

Make sure that the prometheus-operator's storage_class and prometheus.storage_size are unchanged during the upgrade process.
Proceed to a fresh prometheus-operator component installation. The new release should now re-attach your previously released PV with its content.

lokoctl component apply prometheus-operator

NOTE: Etcd dashboard will only start showing data after the cluster is updated.

Delete the old kubelet service.

kubectl -n kube-system delete svc prometheus-operator-kubelet

If monitoring was enabled for rook, contour, metallb components, make sure you update them as well after the cluster is updated.

Cluster update steps

NOTE: Updating multiple Lokomotive versions at a time is not supported. If your cluster is running a version older than v0.5.0, update to v0.5.0 first and only then proceed with the update to v0.6.0.

Please perform the following manual steps in your cluster configuration directory.

Download the release bundle.

curl -LO https://github.com/kinvolk/lokomotive/archive/v0.6.0.tar.gz
tar -xvzf v0.6.0.tar.gz

Install the Packet CCM.

If you are running Lokomotive on Equinix Metal (formerly Packet), then install Packet CCM. Export your Packet cluster's project ID and API Key.

export PACKET_AUTH_TOKEN=""
export PACKET_PROJECT_ID=""

echo "apiKey: $PACKET_AUTH_TOKEN
projectID: $PACKET_PROJECT_ID" > /tmp/ccm-values.yaml

helm install packet-ccm --namespace kube-system --values=/tmp/ccm-values.yaml ./lokomotive-0.6.0/assets/charts/control-plane/packet-ccm/
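Because the two variables above are exported as empty strings until you fill them in, it is easy to write a values file with blank credentials. A slightly more defensive way to write the same file could look like this sketch. write_ccm_values is a hypothetical helper, and the output path is whatever you pass to it.

```shell
#!/usr/bin/env bash
# Sketch: write the CCM values file only when both credentials are set.
# write_ccm_values is a hypothetical helper. The body runs in a subshell so the
# umask change does not leak into the calling shell.
write_ccm_values() (
  out="$1"
  if [ -z "${PACKET_AUTH_TOKEN:-}" ] || [ -z "${PACKET_PROJECT_ID:-}" ]; then
    echo "PACKET_AUTH_TOKEN and PACKET_PROJECT_ID must be exported and non-empty" >&2
    return 1
  fi
  # Newly created files are readable by the current user only.
  umask 077
  cat > "$out" <<EOF
apiKey: ${PACKET_AUTH_TOKEN}
projectID: ${PACKET_PROJECT_ID}
EOF
)
```

Running write_ccm_values /tmp/ccm-values.yaml produces the same two keys as the echo command above, and the helm install step can then use that file unchanged.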

Update node config.

On Equinix Metal (formerly Packet), this script shipped with the release tarball will add permanent MetalLB labels and kubelet config to use CCM.

NOTE: Please edit this script to disable updating certain nodes. Modify the update_other_nodes function as required.

UPDATE_BOOTSTRAP_COMPONENTS=false
./lokomotive-0.6.0/scripts/update/0.5-0.6/update.sh $UPDATE_BOOTSTRAP_COMPONENTS

If you're using the self-hosted kubelet, apply the --cloud-provider flag to it.

NOTE: If you're unsure, you can run the command anyway, as it's harmless if you're not using the self-hosted kubelet.

kubectl -n kube-system get ds kubelet -o yaml | \
  sed '/client-ca-file.*/a \ \ \ \ \ \ \ \ \ \ --cloud-provider=external \\' | \
  kubectl apply -f -

Export assets directory.

export ASSETS_DIR="assets"
...


27 Oct 17:16

knrt10

v0.5.0

d4861e9

v0.5.0

We're happy to announce the release of Lokomotive v0.5.0 (Eurostar).

This release packs new features, bug fixes, code optimizations, platform updates and security hardening.

Changes in v0.5.0

Kubernetes updates

Update Kubernetes to v1.19.3 (#1030).

Platform updates

AKS

Update Kubernetes to 1.18.8 (#1071).

Baremetal

Expose CNI MTU on the baremetal platform (#977).

New components

Component web-ui (#981), (#1100) from headlamp.
Component inspektor-gadget (#1076) from inspektor-gadget.

Component updates

Update Velero component for Packet (OpenEBS and restic plugin support) (#881).
istio-operator: Update to 1.7.3 (#1086).
prometheus-operator: Update grafana, kube-state-metrics and node_exporter (#963).
cert-manager: Update to 1.0.3 (#1114).

Terraform updates

Update to Terraform 0.13 (#824).

Features

Support in-cluster pod traffic encryption (#911).
AWS, Packet, Baremetal: use Docker instead of rkt for host containers (#946).
Change labels and taints format from string to structured (#1042).
prometheus-operator: Add external_url (#964).

Docs

Concepts: add document for admission webhook (#943).
Coding style guide (#953).
MetalLB: Clarify address_pools knob (#996).
How to guide on backing up and restoring rook-ceph volumes with Velero (#1048).

Bug fixes

bootkube: feed output using local rather than local_file content (#1021).
Dex: fix pod reload on config change (#1040).
MetalLB: Add missing autodiscovery labels (#990).
Gangway: add a ServiceAccount (#1104).
If there is more than one component installed in a single namespace, lokoctl will now
refuse to remove the namespace while running lokoctl component --delete with the --delete-namespace flag (#1093).

Development

Fix error capitalization (#979).
pkg/terraform: unexport functions not used outside of package (#984).
pkg/components: remove unused List() function (#982).
docs/rook-ceph-storage: Use correct apply command (#1026).
pkg/assets/assets_generate: Fix copyright (#1020).
Cleanup Terraform providers before Terraform 0.13 upgrades (#860).
kubelet e2e: Enable the disruptive test (#1012).
.golangci.yml: Re-enable linters (#1029).
Fix scripts/find-updates.sh (#1034), (#1068), (#1080).
pkg/terraform: improvements (#1027).
cli/cmd: cleanups part 1 (#1013).
test/components/kubernetes: remove kubelet pod when testing node labels (#1052).
Remove usage of template_file (#1046).
test: de-duplicate value timeout and retryInterval (#1049).
Packet: Read BGP peer address from metadata service (#1010).
pkg/assets: cleanup exported API (#936).
Cobra updated to v1.1.1 (#1082), (#1091).
cli/cmd: cleanups part 2 (#1015).
Add github actions (#1074).
Makefile: use latest Go when building in Docker (#1083).
cli/cmd: cleanups part 3 (#1018).
Add new CI config for Packet based FLUO testing (#1110).

Updating from v0.4.1

Configuration syntax changes

There have been some minor changes to the configurations of worker nodes.

The data type of labels and taints has been changed from string to map(string) for the AWS and Packet platforms.

Old:

labels = "testing=true"

taints = "nodeType=storage:NoSchedule"

New:

labels = {
  "testing" = "true"
}

taints = {
  "nodeType" = "storage:NoSchedule"
}
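Converting existing values by hand is straightforward, but for many worker pools a helper like the sketch below may save some typing. kv_string_to_map is a hypothetical name. It assumes the old string holds comma-separated key=value pairs, which goes beyond the single-pair example above, so check the result against your configuration.

```shell
#!/usr/bin/env bash
# Sketch: print an old-style "key=value,key=value" string as a map attribute.
# kv_string_to_map is a hypothetical helper; comma-separated input is an assumption.
kv_string_to_map() {
  local name="$1" pairs="$2" pair
  printf '%s = {\n' "$name"
  # Split on commas; each item is split at its first "=" only,
  # so values such as "storage:NoSchedule" stay intact.
  printf '%s\n' "$pairs" | tr ',' '\n' | while IFS= read -r pair; do
    [ -n "$pair" ] || continue
    printf '  "%s" = "%s"\n' "${pair%%=*}" "${pair#*=}"
  done
  printf '}\n'
}
```

For example, kv_string_to_map taints "nodeType=storage:NoSchedule" prints the taints block in the new format shown above.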

This release also changes the default cluster.oidc.client_id value from gangway to clusterauth.

This setting must match gangway.client_id and dex.static_client.id.

If you use default settings for oidc you'll need to add client_id = "gangway" or change the static_client.id and client_id parameters for dex and gangway to clusterauth respectively.

Old:

packet {
  oidc {
    client_id = "gangway"
  }
}

New:

packet {
  oidc {
    client_id = "clusterauth"
  }
}

Cluster update steps

Ensure your cluster is in a healthy state by running lokoctl cluster apply using the v0.4.1 version.

Updating multiple versions at a time is not supported, so if your cluster is older, update to v0.4.1 and only then proceed with the update to v0.5.0.

Due to the Terraform and Kubernetes updates to v0.13+ and v1.19.3 respectively, some manual steps need to be performed when updating. In your cluster configuration directory, follow these steps:

Update local Terraform binary to version v0.13.X. You can follow this guide to do that.
Starting from your cluster directory, export your platform name and assets directory name used in your platform configuration. It will be used in next steps:

export PLATFORM="packet" && export ASSETS_DIR="assets"

Remove old asset files:

rm -f $ASSETS_DIR/terraform-modules/$PLATFORM/flatcar-linux/kubernetes/require.tf \
$ASSETS_DIR/terraform-modules/$PLATFORM/flatcar-linux/kubernetes/workers/require.tf \
$ASSETS_DIR/terraform-modules/dns/route53/require.tf

Go to the terraform directory:

cd $ASSETS_DIR/terraform

Replace the old providers:

terraform state replace-provider -auto-approve registry.terraform.io/-/ct registry.terraform.io/poseidon/ct && \
terraform state replace-provider -auto-approve registry.terraform.io/-/template registry.terraform.io/hashicorp/template

Return to original directory and use kubeconfig generated by lokomotive:

cd - && export KUBECONFIG=$ASSETS_DIR/cluster-assets/auth/kubeconfig

FelixConfiguration has been moved to calico charts. To avoid firewall interruption, label and annotate it so that it can be managed by Helm while updating:

kubectl label FelixConfiguration default app.kubernetes.io/managed-by=Helm --overwrite=true && \
kubectl annotate FelixConfiguration default meta.helm.sh/release-name=calico --overwrite=true && \
kubectl annotate FelixConfiguration default meta.helm.sh/release-namespace=kube-system --overwrite=true

Finally, run the following:

lokoctl cluster apply --skip-components -v

NOTE: On clusters with a single controller node, you need to delete the old kube-apiserver ReplicaSet during cluster update.

When lokoctl prints that kube-apiserver is being updated, run the following command:

kubectl delete rs -n kube-system $(kubectl get rs -n kube-system -l k8s-app=kube-apiserver --no-headers=true --sort-by=metadata.creationTimestamp | tac | tail -n +2 | awk '{print $1}') || true

NOTE: When this gets executed the update process will get interrupted. Re-run lokoctl cluster apply --skip-components -v to proceed.
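The ReplicaSet selection inside the kubectl delete command above can be hard to read. The sketch below replays the same tac, tail and awk filter on made-up input so the effect is visible without touching a cluster. The ReplicaSet names and columns are invented for illustration, and the listing is assumed to be sorted oldest first, as --sort-by=metadata.creationTimestamp should produce.

```shell
#!/usr/bin/env bash
# Stand-in for the sorted "kubectl get rs ... --no-headers=true" listing (oldest first).
# Names and columns below are made up for illustration.
sample_rs_listing() {
  printf '%s\n' \
    'kube-apiserver-5d9c7b6f4   1   1   1   30d' \
    'kube-apiserver-7f8b9c6d5   1   1   1   7d' \
    'kube-apiserver-84c6d7f9b   1   1   1   5m'
}

# Same filter as in the command above: reverse to newest first,
# skip the first (newest) row, and keep only the name column.
old_replicasets() {
  tac | tail -n +2 | awk '{print $1}'
}

sample_rs_listing | old_replicasets
```

With this sample input the filter prints the two older ReplicaSets and leaves out the newest one, which is the one that should survive the delete.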

The update process typically takes about 10 minutes.
After the update, running lokoctl health should result in an output similar to the following:

Node                     Ready    Reason          Message

lokomotive-controller-0  True     KubeletReady    kubelet is posting ready status
l...


16 Sep 11:49

surajssd

v0.4.1

23be6c2

v0.4.1

This is a patch release which includes mainly bug fixes.

NOTE: Please read the upgrading guidelines here.

Changes in v0.4.1

Component updates

Dex: Convert to Helm chart and update to v2.25.0 (#962).

Features

feat: add severity labels to MetalLB alerts (#925).

Bug fixes

Override memory limits of rook operator to 512Mi (#938).
Fix envoy grafana dashboard errors (#969).
MetalLB: Fix regressions of tolerations and nodeSelectors (#927).
Fix controlplane components update order (#937).
component/metallb: Fix controller tolerations (#931).
Increased the node-ready and cluster-ping timeouts (#952).

Docs

docs: fix etcd version upgrade sed expression (#921).
docs: fix rook version update command (#930).

Development

Fix output of convertNodeSelector in rook (#945).
httpbin convert to helm chart (#965).
FLUO: convert to Helm chart (#935).
Makefile: Don't build before linting and add new target lint-bin (#901).


07 Sep 15:35

surajssd

v0.4.0

a98d512

v0.4.0

We're happy to announce the release of Lokomotive v0.4.0 (Darjeeling Himalayan).

This release packs new features, bug fixes, code optimizations, better user interface, latest versions of components, security hardening and much more.

Changes in v0.4.0

Kubernetes Updates

Update Kubernetes version to v1.18.8 (#861).

Platform updates

AKS

Update Kubernetes version to 1.17.9 (#849).

AWS

AWS: Add support for custom taints and labels (#832).

New Components

Add component experimental-istio-operator (#686).
Add component experimental-linkerd (#690).

Component updates

Update etcd to v3.4.13 (#838).
Update Calico to v3.15.2 (#841).
Update Grafana to 7.1.4 and chart version 5.5.5 (#842).
Update Velero chart to 1.4.2 (#830).
Update ExternalDNS chart to 3.3.0 (#845).
Update Amazon Elastic Block Store (EBS) CSI driver to v0.6.0 (#856).
Update Cluster Autoscaler to v2 version 1.0.2 (#859).
Update cert-manager to v0.16.1 (#847).
Update OpenEBS to v1.12.0 (#781).
Update MetalLB to v0.1.0-789-g85b7a46a (#885).
Update Rook to v1.4.2 (#879).
Use new bootkube image at version v0.14.0-helm-7047a87 (#775), later updated to v0.14.0-helm-ec64535 as a part of (#704).
Update Prometheus operator to 0.41.0 and chart version 9.3.0 (#757).
Update Contour to v1.7.0 (#771).

Terraform Providers Updates

Update all Terraform providers to latest versions (#835).

UX

Add autocomplete for bash and zsh in lokoctl (#880).

Run the following command to start using auto-completion for lokoctl:
```
source <(lokoctl completion bash)
```
Add kubeconfig fallback to Terraform state (#701).
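
The completion command listed above only affects the shell session in which it is run. As a minimal sketch, assuming bash and a ~/.bashrc startup file, the line can be added idempotently with a small helper. The helper name ensure_line is hypothetical and not part of lokoctl.

```shell
# ensure_line FILE LINE: append LINE to FILE unless an identical line is already present.
# Hypothetical helper for illustration only; not part of lokoctl.
ensure_line() {
  touch "$1"
  # -F: fixed string, -x: match the whole line, -q: no output
  if ! grep -Fxq -- "$2" "$1"; then
    printf '%s\n' "$2" >> "$1"
  fi
}

# Usage (assumes bash reads ~/.bashrc on startup):
# ensure_line "$HOME/.bashrc" 'source <(lokoctl completion bash)'
```

Running the helper twice leaves a single copy of the line, so it is safe to call from a provisioning script.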

Features

Add label lokomotive.kinvolk.io/name: <namespace_name> to all namespaces (#646).
Add admission webhook to lokomotive, which disables automounting default service account token (#704).
[Breaking Change] Kubelet now joins the cluster using TLS bootstrapping. Set enable_tls_bootstrap = false to disable it (#618).
Add csi_plugin_node_selector and csi_plugin_toleration for rook-ceph's CSI plugin (#892).

Docs

Setting up third party OAuth for Grafana (#542).
Upgrading bootstrap kubelet (#592).
Upgrading etcd (#802).
How to add custom monitoring resources? (#554).
Kubernetes storage with Rook Ceph on Packet cloud (#494).

Bug fixes

aws: Add check for multiple worker pools with same LB ports (#889).
packet: ignore changes to plan and user_data on controller nodes (#907).
Introduce platform.PostApplyHook interface and implement it for AKS cluster (#886).
aws-ebs-csi-driver: add NetworkPolicy allowing access to metadata (#865).
pkg/components/cluster-autoscaler: fix checking device uniqueness (#768).

Development

Replace use of github.com/pkg/errors.Wrapf with fmt (#831, #877).
Refactor assets handling (#807).
cli/cmd: improve --kubeconfig-file flag help message formatting (#818).
Use host's /etc/hosts entries for bootkube (#409).
Refactor Terraform executor (#794).
Pass kubeconfig content around rather than a file path (#631).

Upgrading from v0.3.0

Lokoctl Host binary upgrades

terraform-provider-ct

Update the ct Terraform provider to v0.6.1, find the install instructions here.

Disable TLS Bootstrap

In this release we introduced TLS bootstrapping and we enable it by default. To avoid cluster recreation, disable it by adding the following attribute to the cluster ... block:

cluster "packet" {
  enable_tls_bootstrap = false
...
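
Before applying the upgrade, it may help to confirm that the attribute is really present in the configuration. The following is a rough sketch, assuming the cluster configuration is stored in a single file (the name cluster.lokocfg in the usage line is an assumption). The function tls_bootstrap_disabled is hypothetical and only matches text; it does not parse the configuration language.

```shell
# tls_bootstrap_disabled FILE: succeed if FILE contains a line that sets
# enable_tls_bootstrap to false. Text matching only; hypothetical helper.
tls_bootstrap_disabled() {
  grep -Eq '^[[:space:]]*enable_tls_bootstrap[[:space:]]*=[[:space:]]*false([[:space:]]|$)' "$1"
}

# Usage (the file name is an assumption):
# tls_bootstrap_disabled cluster.lokocfg || echo "enable_tls_bootstrap = false is not set" >&2
```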

Cluster upgrade steps

Go to your cluster's directory and run the following command:

lokoctl cluster apply --skip-components -v

The update process typically takes about 10 minutes.
After the update, running lokoctl health should result in an output similar to the following.

Node                     Ready    Reason          Message

lokomotive-controller-0  True     KubeletReady    kubelet is posting ready status
lokomotive-1-worker-0    True     KubeletReady    kubelet is posting ready status
lokomotive-1-worker-1    True     KubeletReady    kubelet is posting ready status
lokomotive-1-worker-2    True     KubeletReady    kubelet is posting ready status
Name      Status    Message              Error

etcd-0    True      {"health":"true"}
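
To check the health output in a script rather than by eye, the table can be filtered for entries that are not reporting True. This is a minimal sketch that assumes the output keeps the column layout shown above (name in the first column, status in the second, header rows starting with Node or Name). The function name not_ready is hypothetical.

```shell
# not_ready: read `lokoctl health` output on stdin and print the name of every
# node or etcd member whose second column is not "True".
# Assumes the column layout shown in the release notes.
not_ready() {
  awk 'NF == 0 { next }                          # skip blank lines
       $1 == "Node" || $1 == "Name" { next }     # skip the two header rows
       $2 != "True" { print $1 }'
}

# Usage:
# lokoctl health | not_ready
```

An empty result means every listed node and etcd member reported True.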

Cluster nodes component upgrade (optional)

Manually upgrade etcd following the steps mentioned in the doc here.
Manually upgrade the kubelet running on the nodes, by following the steps mentioned in the doc here.

Manual Cluster Changes

The latest version of MetalLB changes the labels of the ingress nodes. Label all the nodes that have an ASN set with the new labels:

kubectl label $(kubectl get nodes -o name -l metallb.universe.tf/my-asn) \
  metallb.lokomotive.io/my-asn=65000 metallb.lokomotive.io/peer-asn=65530

Find the peer address of each node and assign it to the new label:

for node in $(kubectl get nodes -o name -l metallb.universe.tf/peer-address); do
  peer_ip=$(kubectl get $node -o jsonpath='{.metadata.labels.metallb\.universe\.tf/peer-address}')
  kubectl label $node metallb.lokomotive.io/peer-address=$peer_ip
done
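
Before applying the component, it can be useful to confirm that no node was missed by the loop. The sketch below compares two saved node lists and prints the nodes that still carry only the old label. The function name still_unlabelled and the file names in the usage lines are hypothetical.

```shell
# still_unlabelled OLD NEW: print every line of file OLD that does not appear
# in file NEW. Hypothetical helper; both files hold one node name per line.
still_unlabelled() {
  awk 'FILENAME == ARGV[1] { seen[$0] = 1; next }  # first file: remember migrated nodes
       !($0 in seen)' "$2" "$1"                    # second file: print the rest
}

# Usage (file names are assumptions):
# kubectl get nodes -o name -l metallb.universe.tf/peer-address > old.txt
# kubectl get nodes -o name -l metallb.lokomotive.io/peer-address > new.txt
# still_unlabelled old.txt new.txt
```

No output means every node with the old peer-address label also has the new one.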

Now it is safe to update:

lokoctl component apply metallb

Ceph Upgrade steps

These steps are curated from the upgrade doc provided by rook: https://rook.io/docs/rook/master/ceph-upgrade.html.

Keep note of the CSI images:

kubectl --namespace rook get pod -o \
  jsonpath='{range .items[*]}{range .spec.containers[*]}{.image}{"\n"}' \
  -l 'app in (csi-rbdplugin,csi-rbdplugin-provisioner,csi-cephfsplugin,csi-cephfsplugin-provisioner)' | \
  sort | uniq

Ensure autoscale is on

In the toolbox pod, ensure that the output of the command ceph osd pool autoscale-status | grep replicapool says on (in the last column) and not warn. If it says warn, run the command ceph osd pool set replicapool pg_autoscale_mode on to set it to on. This avoids the issue described in rook/rook#5608.
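
The same check can be scripted. The following sketch reads the autoscale status from standard input and prints the last column for the given pool, relying only on the fact stated above that the mode is in the last column. The function name autoscale_mode is hypothetical.

```shell
# autoscale_mode POOL: read `ceph osd pool autoscale-status` output on stdin
# and print the last column of the row whose first column equals POOL.
autoscale_mode() {
  awk -v pool="$1" '$1 == pool { print $NF }'
}

# Usage inside the toolbox pod:
# mode=$(ceph osd pool autoscale-status | autoscale_mode replicapool)
# [ "$mode" = "on" ] || ceph osd pool set replicapool pg_autoscale_mode on
```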

Read more about the toolbox pod here: https://github.com/kinvolk/lokomotive/blob/v0.4.0/docs/how-to-guides/rook-ceph-storage.md#enable-and-access-toolbox.

NOTE: If you see the error [errno 5] RADOS I/O error (error connecting to the cluster) in the toolbox pod, pin the toolbox pod image to a specific version using this command: kubectl -n rook set image deploy rook-ceph-tools rook-ceph-tools=rook/ceph:v1.3.2.

Ceph Status

Run the following in the toolbox pod:
```
watch ceph status
```
Ensure that the output reports the health as HEALTH_OK. Compare the rest of the output with the healthy state described here: https://rook.io/docs/rook/master/ceph-upgrade.html#status-output.
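
As an alternative to watching the status interactively, a script can poll until the cluster reports HEALTH_OK. This is a sketch only: the function name wait_for_health_ok, the attempt count and the delay are assumptions, and the usage line relies on ceph health printing the overall health status.

```shell
# wait_for_health_ok ATTEMPTS DELAY CMD...: run CMD until its output starts
# with HEALTH_OK. Gives up after ATTEMPTS tries, sleeping DELAY seconds
# after each failed try. Hypothetical helper.
wait_for_health_ok() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    case "$("$@" 2>/dev/null)" in
      HEALTH_OK*) return 0 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage inside the toolbox pod (values are illustrative):
# wait_for_health_ok 30 10 ceph health
```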
Pods in rook namespace:

Watch the status of the pods in the rook namespace in another terminal window. Running the following is enough:
```
watch kubectl -n rook get pods -o wide
```
Watch for the rook version update

Run the following command to keep an eye on the rook version update as it rolls out to all the components:
```
watch --exec ...
```


31 Jul 18:23

ipochi

v0.3.0

b48e09e

v0.3.0

We're happy to announce the release of Lokomotive v0.3.0 (Coast Starlight).

This release packs new features and bugfixes. Some of the highlights are:

Kubernetes 1.18.6
For Lokomotive clusters running on top of AKS, Kubernetes 1.16.10 is installed.
Component updates

Changes in v0.3.0

Kubernetes updates

Update Kubernetes to v1.18.6 (#726).

Platform updates

Packet

Update the default machine type from t1.small.x86 to c3.small.x86, since t1.small.x86 machines are EOL and no longer available in new Packet projects (#612).

WARNING: If you haven't explicitly defined the controller_type and/or worker_pool.node_type configuration options, upgrading to this release will replace your controller and/or worker nodes with c3.small.x86 machines thereby losing all your cluster data. To avoid this, set these configuration options to the desired values.

Make sure that the below attributes are explicitly defined in your cluster configuration. This only applies to machine type t1.small.x86.
```
cluster "packet" {
  .
  .
  controller_type = "t1.small.x86"
  .
  .
  worker_pool "pool-name" {
    .
    node_type = "t1.small.x86"
    .
  }
}
```
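
Given the risk of data loss described in the warning, a quick pre-upgrade check for the two attributes may be worthwhile. The sketch below is text matching only: it does not parse the configuration and cannot tell whether every worker pool sets node_type. The function name missing_machine_types and the file name in the usage line are assumptions.

```shell
# missing_machine_types FILE: print each of controller_type and node_type
# that is not assigned anywhere in FILE. Hypothetical helper, text matching only.
missing_machine_types() {
  for attr in controller_type node_type; do
    grep -Eq "^[[:space:]]*${attr}[[:space:]]*=" "$1" || printf '%s\n' "$attr"
  done
}

# Usage (the file name is an assumption):
# missing_machine_types cluster.lokocfg
```

Any attribute name printed by the function is one that should be set explicitly before upgrading.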

AKS

Update Kubernetes version to 1.16.10 (#712).

Component updates

openebs: update to 1.11.0 (#673).
calico: update to v3.15.0 (#652).

UX

prometheus-operator: Organize Prometheus related attributes under a prometheus block in the configuration (#710).
Use prometheus.ingress.host to expose Prometheus instead of prometheus_external_url (#710).
contour: Remove ingress_hosts from contour configuration (#635).

Features

Add enable_toolbox attribute to rook-ceph component (#649). This allows managing and configuring Ceph using toolbox pod.
Add Prometheus feature external_labels for federated clusters to Prometheus operator component. This helps to identify metrics queried from different clusters (#710).

Docs

Add Type column to Attribute reference table in configuration references (#651).
Update contour configuration reference for usage with AWS (#674).
Add documentation related to the usage of clc_snippets for Packet and AWS (#657).
Improve documentation on using remote backends (#670).
How-to guide for setting up monitoring on Lokomotive (#480).
Add codespell section in development documentation (#700).
Include a demo GIF in the readme (#636).

Bugfixes

Remove contour ingress workaround (due to an upstream issue) for ExternalDNS (#635).

Development

Do not show Helm release values in terraform output (#627).
Remove Terraform provider aliases from platforms code (#617).

Miscellaneous

Following flatcar/Flatcar#123, Flatcar 2513.1.0 for ARM contains the dig binary so the workaround is no longer needed (#703).
Improve error message for wait-for-dns output (#735).
Add codespell to enable spell check on all PRs (#661).

Upgrading from v0.2.1

Configuration syntax changes

There have been some minor changes to the configuration of the following components:

contour
prometheus-operator

Please make sure the new configuration structure is in place before the upgrade.

Contour component

The optional ingress_hosts attribute has been removed.

old:

component "contour" {
  .
  .
  ingress_hosts = ["*.example.lokomotive-k8s.net"]
}

new:

component "contour" {
  .
  .
}

Prometheus-operator component

Prometheus specific attributes are now under a prometheus block.
A new optional prometheus.ingress sub-block is introduced to expose Prometheus over ingress.
The prometheus_external_url attribute has been removed; the host is now configured under prometheus.ingress.host. Remove the URL scheme (e.g. https://) and the URI path (e.g. /prometheus) when configuring it. A URI path is no longer supported and the protocol is always HTTPS.

old:

component "prometheus-operator" {
  .
  .
  prometheus_metrics_retention = "14d"
  prometheus_external_url      = "https://prometheus.example.lokomotive-k8s.net"
  prometheus_storage_size      = "50GiB"
  prometheus_node_selector = {
    "kubernetes.io/hostname" = "worker3"
  }
  .
  .
}

new:

component "prometheus-operator" {
  .
  .
  prometheus {
    metrics_retention = "14d"
    storage_size      = "50GiB"
    node_selector = {
      "kubernetes.io/hostname" = "worker3"
    }

    ingress {
      host = "prometheus.example.lokomotive-k8s.net"
    }
    .
    .
  }
  .
  .
}

Check out the new syntax in the Prometheus Operator configuration reference for details.
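
When migrating an existing external URL to the new ingress host value, the scheme and path have to be stripped. A small sketch using shell parameter expansion is shown below. The function name url_to_ingress_host is hypothetical.

```shell
# url_to_ingress_host URL: strip the scheme and any path from URL, leaving
# only the host part. Hypothetical helper for illustration.
url_to_ingress_host() {
  host=${1#*://}    # drop "https://" or any other scheme prefix
  host=${host%%/*}  # drop everything from the first "/" onwards
  printf '%s\n' "$host"
}

# Usage:
# url_to_ingress_host "https://prometheus.example.lokomotive-k8s.net/prometheus"
# prints: prometheus.example.lokomotive-k8s.net
```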

Upgrade steps

Go to your cluster's directory and run the following command.

lokoctl cluster apply

The update process typically takes about 10 minutes.

After the update, running lokoctl health should result in an output similar to the following.

Node                     Ready    Reason          Message

lokomotive-controller-0  True     KubeletReady    kubelet is posting ready status
lokomotive-1-worker-0    True     KubeletReady    kubelet is posting ready status
lokomotive-1-worker-1    True     KubeletReady    kubelet is posting ready status
lokomotive-1-worker-2    True     KubeletReady    kubelet is posting ready status
Name      Status    Message              Error

etcd-0    True      {"health":"true"}

Post upgrade steps

OpenEBS

OpenEBS control plane components and data plane components work independently. Even after the OpenEBS Control Plane components have been upgraded to 1.11.0, the Storage Pools and Volumes (both jiva and cStor) will continue to work with older versions.

Upgrade functionality is still under active development. It is highly recommended to schedule a downtime for the application using the OpenEBS PV while performing this upgrade. Also, make sure you have taken a backup of the data before starting the below upgrade procedure. - Openebs documentation.

Upgrade cStor Pools

Extract the SPC name using kubectl get spc:

NAME                          AGE
cstor-pool-openebs-replica1   24h
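
If the SPC names are needed in a script, for example to fill in the pool list of the upgrade Job, the NAME column can be extracted from the table. This is a minimal sketch assuming the two-column output shown above. The function name spc_names is hypothetical.

```shell
# spc_names: read `kubectl get spc` table output on stdin and print the NAME
# column, skipping the header row and blank lines. Hypothetical helper.
spc_names() {
  awk 'NR > 1 && NF > 0 { print $1 }'
}

# Usage:
# kubectl get spc | spc_names
```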

The Job spec for upgrading cStor pools is:

#This is an example YAML for upgrading cstor SPC.
#Some of the values below need to be changed to
#match your openebs installation. The fields are
#indicated with VERIFY
---
apiVersion: batch/v1
kind: Job
metadata:
  #VERIFY that you have provided a unique name for this upgrade job.
  #The name can be any valid K8s string for name. This example uses
  #the following convention: cstor-spc-<flattened-from-to-versions>
  name: cstor-spc-1001120

  #VERIFY the value of namespace is same as the namespace where openebs components
  # are installed. You can verify using the command:
  # `kubectl get pods -n <openebs-namespace> -l openebs.io/component-name=maya-apiserver`
  # The above command should return status of the openebs-apiserver.
  namespace: openebs
spec:
  backoffLimit: 4
  template:
    spec:
      #VERIFY the value of serviceAccountName is pointing to service account
      # created within openebs namespace. Use the non-default account.
      # by running `kubectl get sa -n <openebs-namespace>`
      serviceAccountName: openebs-operator
      containers:
      - name:  upgrade
        args:
        - "cstor-spc"

        # --from-version is the current version of the pool
        - "--from-version=1.10.0"

        # --to-version is the desired upgrade version
        - "--to-version=1.11.0"

        # Bulk upgrade is supported from 1.9
        # To make use of it, please provide the list of SPCs
        # as mentioned below
        - "cstor-pool-name"
        # For upgrades older than 1.9.0, use
        # '--spc-name=<spc_name> form...


25 Jun 08:10

invidian

v0.2.1

0e8b071

v0.2.1

This is a patch release to fix AKS platform deployments.

Changes in v0.2.1

Kubernetes updates

Updated Kubernetes version on AKS platform to 1.16.9 (#626). This fixes deploying AKS clusters, as the previously used version is not available anymore.

Security

Updated golang.org/x/text dependency to v0.3.3 (#648) to address CVE-2020-14040.

Bugfixes

Fixes example configuration for AKS platform (#626). Contour component configuration syntax changed and those files had not been updated.

Misc

Bootkube Docker images are now pulled using the Docker protocol, as quay.io plans to deprecate pulling images using ACI (#656).

Development

AKS platform is now being tested for every pull request and master branch changes in the CI.
Added script for finding available component updates in upstream repositories (#375).

