Document logging architecture (#2170)
* Document logging architecture

* Add why loki and which logs are stored

* Update src/content/vintage/getting-started/observability/logging/architecture/index.md

Co-authored-by: Zirko <64951262+QuantumEnigmaa@users.noreply.github.com>

* Update src/content/vintage/getting-started/observability/logging/architecture/index.md

Co-authored-by: Zirko <64951262+QuantumEnigmaa@users.noreply.github.com>

* Update src/content/vintage/getting-started/observability/logging/architecture/index.md

Co-authored-by: Zirko <64951262+QuantumEnigmaa@users.noreply.github.com>

* Update src/content/vintage/getting-started/observability/logging/architecture/index.md

Co-authored-by: Zirko <64951262+QuantumEnigmaa@users.noreply.github.com>

* Update src/content/vintage/getting-started/observability/logging/architecture/index.md

Co-authored-by: Zirko <64951262+QuantumEnigmaa@users.noreply.github.com>

* Update src/content/vintage/getting-started/observability/logging/architecture/index.md

Co-authored-by: Zirko <64951262+QuantumEnigmaa@users.noreply.github.com>

* Update src/content/vintage/getting-started/observability/logging/architecture/index.md

Co-authored-by: Zirko <64951262+QuantumEnigmaa@users.noreply.github.com>

* Add architecture diagram

* Add architecture diagram explanation

* Update src/content/vintage/getting-started/observability/logging/architecture/index.md

Co-authored-by: Zach Stone <zach@giantswarm.io>

* Update src/content/vintage/getting-started/observability/logging/architecture/index.md

Co-authored-by: Zach Stone <zach@giantswarm.io>

* Update src/content/vintage/getting-started/observability/logging/architecture/index.md

Co-authored-by: Zach Stone <zach@giantswarm.io>

* Update src/content/vintage/getting-started/observability/logging/architecture/index.md

Co-authored-by: Zach Stone <zach@giantswarm.io>

* Update src/content/vintage/getting-started/observability/logging/architecture/index.md

Co-authored-by: Zach Stone <zach@giantswarm.io>

* Update src/content/vintage/getting-started/observability/logging/architecture/index.md

Co-authored-by: Zach Stone <zach@giantswarm.io>

* Update src/content/vintage/getting-started/observability/logging/architecture/index.md

Co-authored-by: Zach Stone <zach@giantswarm.io>

* Update src/content/vintage/getting-started/observability/logging/architecture/index.md

Co-authored-by: Zach Stone <zach@giantswarm.io>

* Update src/content/vintage/getting-started/observability/logging/architecture/index.md

Co-authored-by: Zach Stone <zach@giantswarm.io>

* Update src/content/vintage/getting-started/observability/logging/architecture/index.md

Co-authored-by: Zach Stone <zach@giantswarm.io>

* Update src/content/vintage/getting-started/observability/logging/architecture/index.md

Co-authored-by: Jonas Zeiger <jonas@giantswarm.io>

* Update src/content/vintage/getting-started/observability/logging/architecture/index.md

* Update src/content/vintage/getting-started/observability/logging/architecture/index.md

---------

Co-authored-by: Zirko <64951262+QuantumEnigmaa@users.noreply.github.com>
Co-authored-by: Zach Stone <zach@giantswarm.io>
Co-authored-by: Jonas Zeiger <jonas@giantswarm.io>
4 people committed Mar 25, 2024
1 parent 0dfb305 commit 36d5d6e
Showing 5 changed files with 76 additions and 4 deletions.
@@ -9,7 +9,7 @@ menu:
parent: getting-started-observability
owner:
- https://github.com/orgs/giantswarm/teams/team-atlas
-last_review_date: 2024-02-28
+last_review_date: 2024-03-21
aliases:
- /getting-started/observability/logging
- /ui-api/observability/logs/
@@ -0,0 +1,72 @@
---
linkTitle: Logging architecture
title: Logging architecture
description: Documentation on the logging architecture deployed and maintained by Giant Swarm.
weight: 80
menu:
  main:
    identifier: getting-started-observability-logging-architecture
    parent: getting-started-observability-logging
user_questions:
- What is the logging architecture?
- Why is Giant Swarm using Loki?
- Why is Giant Swarm recommending Loki?
- Which logs are stored by Giant Swarm?
- Where are the logs stored by Giant Swarm?
aliases:
- /getting-started/observability/logging/architecture
owner:
- https://github.com/orgs/giantswarm/teams/team-atlas
last_review_date: 2024-03-21
---

Logging is an important pillar of observability, so Giant Swarm provides and manages a logging solution for operational purposes.

This document gives an overview of how logging is managed by Giant Swarm, including which logs are stored, which tools we use to ship and store them, as well as why we chose those tools in the first place.

## Overview of the logging platform

Here is an architecture diagram of our current logging platform:

![Logging pipeline architecture overview](logging-architecture.png)
<!-- Source: https://drive.google.com/file/d/1Gzl0mTdJcaui_zIC9QuHcgMX3QJygALo -->

In this diagram, you can see that we run the following tools in each management cluster as part of our logging platform:

- `Grafana Loki`, which is accessible through our managed Grafana instance.
- `multi-tenant-proxy`, a proxy component used to handle multi-tenancy for Loki.
- Two logging agents (`Grafana Promtail` and `Grafana Agent`) that run on the management cluster and your workload clusters alike. We currently need both because they serve different purposes:
  - Promtail retrieves the container logs and the Kubernetes audit logs.
  - Grafana Agent retrieves the Kubernetes events.

If you want to try out Loki, check out our guides explaining [how to access Grafana]({{< relref "/vintage/getting-started/observability/visualization/access" >}}) and how to [explore logs with LogQL]({{< relref "/vintage/getting-started/observability/visualization/log-exploration" >}}).
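
To make the multi-tenancy aspect more concrete, here is a minimal Python sketch of what a tenant-scoped range query against Loki's HTTP API can look like. Loki identifies tenants through the `X-Scope-OrgID` header and serves range queries under `/loki/api/v1/query_range`. The base URL, the tenant ID, the `namespace` label, and the timestamps are hypothetical placeholders, and how `multi-tenant-proxy` derives the tenant for a given caller is not covered here. The sketch only builds the request and sends nothing.

```python
from urllib.parse import urlencode


def build_loki_query(base_url, tenant, logql, start_ns, end_ns, limit=100):
    """Return (url, headers) for a Loki range query without sending it."""
    params = urlencode({
        "query": logql,     # LogQL expression
        "start": start_ns,  # start of the range, Unix time in nanoseconds
        "end": end_ns,      # end of the range, Unix time in nanoseconds
        "limit": limit,     # maximum number of entries to return
    })
    url = f"{base_url.rstrip('/')}/loki/api/v1/query_range?{params}"
    # Loki reads the tenant from this header when multi-tenancy is enabled.
    headers = {"X-Scope-OrgID": tenant}
    return url, headers


url, headers = build_loki_query(
    "http://loki.example.internal:3100",  # hypothetical address
    "example-tenant",                      # hypothetical tenant ID
    '{namespace="kube-system"}',           # assumes a `namespace` stream label
    1711324800000000000,                   # hypothetical timestamps
    1711328400000000000,
)
print(url)
print(headers)
```

In day-to-day use you would normally run such queries through the managed Grafana instance rather than calling Loki directly.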

## Logs stored by Giant Swarm

Kubernetes clusters produce a vast amount of machine and container logs.

The logging agents that we have deployed on management and workload clusters currently send the following logs to Loki:

- Kubernetes Pod logs from the `kube-system` and `giantswarm` namespaces.
- Kubernetes Events created in the `kube-system` and `giantswarm` namespaces.
- [Kubernetes audit logs]({{< relref "./audit-logs#kubernetes-audit-logs" >}})
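
As an illustration of how the Pod logs listed above could be selected, the following Python sketch builds a LogQL stream selector for a set of namespaces. It assumes that the agents attach a `namespace` label to each log stream, which is a common convention but is not confirmed by this page; the helper name is hypothetical.

```python
import re

# Kubernetes namespace names are DNS labels: lowercase alphanumerics and '-'.
_DNS_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def namespace_selector(namespaces, label="namespace"):
    """Build a LogQL stream selector that matches any of the given namespaces."""
    for name in namespaces:
        if not _DNS_LABEL.match(name):
            raise ValueError(f"not a valid namespace name: {name!r}")
    if len(namespaces) == 1:
        # A single namespace only needs an equality matcher.
        return "{" + f'{label}="{namespaces[0]}"' + "}"
    # Several namespaces are combined into one regex matcher.
    pattern = "|".join(namespaces)
    return "{" + f'{label}=~"{pattern}"' + "}"


print(namespace_selector(["kube-system", "giantswarm"]))
```

Validating the names first keeps regex metacharacters out of the generated matcher.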

In the future, we will also store the following logs:

- [Machine (Node) audit logs]({{< relref "./audit-logs#machine-audit-logs" >}})
- Teleport audit logs, tracked in https://github.com/giantswarm/roadmap/issues/3250
- Giant Swarm customer workload logs as part of our observability platform, tracked in https://github.com/giantswarm/roadmap/issues/2771

## Why we prefer Loki over its competitors

There are numerous reasons to use Grafana Loki in favor of its competitors.

First, we are **strong believers in open source**, so the full Elastic stack is out of the question.

Second, we are quite used to the Grafana ecosystem, where the **individual tools are made to work with one another without requiring a closed ecosystem**. Alternative logging solutions are either intended to work in isolation (like OpenDistro) or require adopting a full-fledged solution (that is, one able to collect and correlate all observability data), which is rarely open source (see the first point above).

Third, we are full-fledged users of Prometheus and PromQL. **LogQL, the Loki Query Language, is a natural extension to PromQL**, which makes it easy for our platform engineers to use and love.

The fourth reason is **cost and resource consumption.** Loki is cheaper to run than its competitors because it does not rely as heavily on persistent storage and uses object storage instead, which is typically cheaper in the cloud. The index is also cheaper to store for Loki because it uses label-based indexing, which produces a smaller index than the text-based indexing used by full-text search engines.
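
The difference in index size can be illustrated with a toy model. The Python sketch below is not how Loki or any full-text engine is implemented; it only compares the number of index keys when indexing by label set versus by token, using hypothetical sample data.

```python
from collections import defaultdict

# Hypothetical sample: (labels, log line) pairs.
SAMPLE = [
    ({"namespace": "kube-system", "pod": "coredns-1"}, "query A example.com NOERROR"),
    ({"namespace": "kube-system", "pod": "coredns-1"}, "query AAAA example.org NXDOMAIN"),
    ({"namespace": "giantswarm", "pod": "operator-1"}, "reconciled cluster abc12 in 35ms"),
    ({"namespace": "giantswarm", "pod": "operator-1"}, "reconciled cluster xyz34 in 41ms"),
]


def label_index(entries):
    """Map each unique label set (one stream) to the positions of its lines."""
    index = defaultdict(list)
    for position, (labels, _line) in enumerate(entries):
        index[tuple(sorted(labels.items()))].append(position)
    return dict(index)


def full_text_index(entries):
    """Map each unique lowercase token to the positions of lines containing it."""
    index = defaultdict(set)
    for position, (_labels, line) in enumerate(entries):
        for token in line.lower().split():
            index[token].add(position)
    return dict(index)


print("label index keys:", len(label_index(SAMPLE)))
print("full-text index keys:", len(full_text_index(SAMPLE)))
```

The label index grows with the number of distinct streams, while the token index grows with the vocabulary of the log content, which is usually far larger.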

Finally, the last reason stems from the history of Giant Swarm and mostly boils down to **operation and maintenance**. Before we decided to run Loki, we ran Elasticsearch as our logging solution. Elasticsearch is hard to operate, especially at scale, and even more so on Kubernetes because it is by nature a stateful application (and for good reasons). This was an especially important factor in our decision since we do not need the full capabilities of OpenDistro, such as full-text search.
@@ -5,7 +5,7 @@ description: A guide explaining how to interact with audit logs on Giant Swarm c
weight: 50
menu:
main:
-identifier: getting-started-observability-logs-auditlogging
+identifier: getting-started-observability-logging-auditlogging
parent: getting-started-observability-logging
user_questions:
- What are audit logs?
@@ -16,7 +16,7 @@ aliases:
- /ui-api/observability/logs/audit-logging
owner:
- https://github.com/orgs/giantswarm/teams/team-atlas
-last_review_date: 2024-02-28
+last_review_date: 2024-03-21
---

In this document you will learn what audit logs are, which kinds are available on Giant Swarm clusters, and how to access them or ship them to a remote location.
@@ -17,7 +17,7 @@ user_questions:
- What app do you recommend for monitoring?
- What app do you recommend for logging?
- What app do you recommend for tracing?
-last_review_date: 2024-02-28
+last_review_date: 2024-03-21
---

Observability is based on four main data sources: __logs__, __metrics__, __traces__ and __profiles__. To cover these needs, Giant Swarm provides its customers with a fully managed observability platform supported by 24/7 monitoring and alerting. The tools included in the observability platform are listed below.