Document logging architecture #2170

Merged
merged 24 commits into from Mar 25, 2024
Changes from 11 commits
Commits
24 commits
65d3c02
Document logging architecture
QuentinBisson Mar 21, 2024
89891e2
Add why loki and which logs are stored
QuentinBisson Mar 21, 2024
aa6839c
Update src/content/vintage/getting-started/observability/logging/arch…
QuentinBisson Mar 21, 2024
3afa04d
Update src/content/vintage/getting-started/observability/logging/arch…
QuentinBisson Mar 21, 2024
323347f
Update src/content/vintage/getting-started/observability/logging/arch…
QuentinBisson Mar 21, 2024
4e211e6
Update src/content/vintage/getting-started/observability/logging/arch…
QuentinBisson Mar 21, 2024
6bc6a6b
Update src/content/vintage/getting-started/observability/logging/arch…
QuentinBisson Mar 21, 2024
aae51ec
Update src/content/vintage/getting-started/observability/logging/arch…
QuentinBisson Mar 21, 2024
ba7c682
Update src/content/vintage/getting-started/observability/logging/arch…
QuentinBisson Mar 21, 2024
f10fd28
Add architecture diagram
QuentinBisson Mar 21, 2024
a9692fa
Add architecture diagram explaination
QuentinBisson Mar 21, 2024
1298db6
Update src/content/vintage/getting-started/observability/logging/arch…
QuentinBisson Mar 21, 2024
c51f772
Update src/content/vintage/getting-started/observability/logging/arch…
QuentinBisson Mar 21, 2024
876c534
Update src/content/vintage/getting-started/observability/logging/arch…
QuentinBisson Mar 21, 2024
b05a82d
Update src/content/vintage/getting-started/observability/logging/arch…
QuentinBisson Mar 21, 2024
2c869d1
Update src/content/vintage/getting-started/observability/logging/arch…
QuentinBisson Mar 21, 2024
5e50bce
Update src/content/vintage/getting-started/observability/logging/arch…
QuentinBisson Mar 21, 2024
14f1457
Update src/content/vintage/getting-started/observability/logging/arch…
QuentinBisson Mar 21, 2024
f007ec6
Update src/content/vintage/getting-started/observability/logging/arch…
QuentinBisson Mar 21, 2024
91b1a46
Update src/content/vintage/getting-started/observability/logging/arch…
QuentinBisson Mar 21, 2024
c7b4f52
Update src/content/vintage/getting-started/observability/logging/arch…
QuentinBisson Mar 21, 2024
998b987
Update src/content/vintage/getting-started/observability/logging/arch…
QuentinBisson Mar 21, 2024
ee68f43
Update src/content/vintage/getting-started/observability/logging/arch…
QuentinBisson Mar 22, 2024
803ebaf
Update src/content/vintage/getting-started/observability/logging/arch…
QuentinBisson Mar 22, 2024
@@ -1,16 +1,16 @@
---
linkTitle: Logging
title: Logging
description: A series of guides explaining how to interact with logs accessible within Giant Swarm clusters.
weight: 30
menu:
  main:
    identifier: getting-started-observability-logging
    parent: getting-started-observability
owner:
- https://github.com/orgs/giantswarm/teams/team-atlas
-last_review_date: 2024-02-28
+last_review_date: 2024-03-21
aliases:
- /getting-started/observability/logging
- /ui-api/observability/logs/
---

Check warning on line 16 in src/content/vintage/getting-started/observability/logging/_index.md

GitHub Actions / Front matter problems

Found 1 less severe problems

WARN - The page should have a last_review_date
@@ -0,0 +1,72 @@
---
linkTitle: Logging architecture
title: Logging architecture
description: Documentation on the logging architecture deployed and maintained by Giant Swarm.
weight: 80
menu:
  main:
    identifier: getting-started-observability-logging-architecture
    parent: getting-started-observability-logging
user_questions:
- What is the logging architecture?
- Why is Giant Swarm using loki?
- Why is Giant Swarm recommending loki?
- Which logs are stored by Giant Swarm?
- Where are the logs stored by Giant Swarm?
aliases:
- /getting-started/observability/logging/architecture
owner:
- https://github.com/orgs/giantswarm/teams/team-atlas
last_review_date: 2024-03-21
---

Check warning on line 21 in src/content/vintage/getting-started/observability/logging/architecture/index.md

GitHub Actions / Front matter problems

Found 1 less severe problems

WARN - The page should have a last_review_date

Logging is an important pillar of observability and it is thus only natural that Giant Swarm provides and manages a logging solution for operational purposes.

This documentation gives you an overview of how logging is managed by Giant Swarm: which logs are stored, which tools we use to ship and store them, and why we chose those tools in the first place.

## Overview of the logging platform

Here is an architecture diagram of our current logging platform:

![Logging pipeline architecture overview](logging-architecture.png)
<!-- Source: https://drive.google.com/file/d/1Gzl0mTdJcaui_zIC9QuHcgMX3QJygALo -->

In this diagram, you can see that we run the following tools in each management cluster as part of our logging platform:

- `Grafana Loki`, which is accessible through our managed Grafana instance.
- `multi-tenant-proxy`, a proxy component used to handle multi-tenancy for Loki.
- Two logging agents (`Grafana Promtail` and `Grafana Agent`) that run on the management cluster and on your workload clusters alike. We currently need two tools because they serve different purposes:
  - Promtail retrieves the container logs and the Kubernetes audit logs.
  - Grafana Agent retrieves the Kubernetes events.

If you want to try out Loki, check out our guides explaining [how to access Grafana]({{< relref "/vintage/getting-started/observability/visualization/access" >}}) and how to [explore logs with LogQL]({{< relref "/vintage/getting-started/observability/visualization/log-exploration" >}}).
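
To make the role of the multi-tenancy proxy more concrete, here is a minimal Python sketch of what a tenant-scoped query against Loki's HTTP API can look like. Loki identifies tenants through the `X-Scope-OrgID` header and exposes range queries under `/loki/api/v1/query_range`. The base URL, the tenant name and the `build_loki_query` helper are hypothetical and do not describe our actual setup. The request is only built, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_loki_query(base_url: str, tenant: str, logql: str, limit: int = 100) -> Request:
    """Build (but do not send) a tenant-scoped Loki range query."""
    params = urlencode({"query": logql, "limit": limit})
    url = f"{base_url.rstrip('/')}/loki/api/v1/query_range?{params}"
    # In multi-tenant mode, Loki reads the tenant from the X-Scope-OrgID header.
    return Request(url, headers={"X-Scope-OrgID": tenant})


# Hypothetical endpoint and tenant, for illustration only.
request = build_loki_query(
    "https://loki.example.com", "example-tenant", '{namespace="kube-system"}'
)
```

In a setup like the one in the diagram, a proxy in front of Loki would typically be the component that sets or enforces this header, so that each client only sees its own tenant's logs.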

## Logs stored by Giant Swarm

Kubernetes clusters produce a vast amount of logs, whether they come from machines or containers.

The logging agents that we have deployed on both management and workload clusters currently send the following logs to Loki:

- `Kubernetes container logs` in the `kube-system` and `giantswarm` namespaces.
- `Kubernetes events` happening in the `kube-system` and `giantswarm` namespaces.
- [`Kubernetes audit logs`]({{< relref "./audit-logs#kubernetes-audit-logs" >}})
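
As an illustration of the scope described above, the following Python sketch builds a LogQL stream selector limited to the two namespaces. The `namespace` label name and the `namespace_selector` helper are assumptions made for this example; the labels actually attached by the agents may differ.

```python
import re

STORED_NAMESPACES = ("kube-system", "giantswarm")
# Kubernetes namespace names are DNS labels, so valid names contain no
# regex metacharacters and can be joined with "|" safely.
DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def namespace_selector(namespaces=STORED_NAMESPACES, label="namespace"):
    """Return a LogQL stream selector matching any of the given namespaces."""
    for ns in namespaces:
        if not DNS_LABEL.fullmatch(ns):
            raise ValueError(f"not a valid namespace name: {ns!r}")
    return '{%s=~"%s"}' % (label, "|".join(namespaces))


selector = namespace_selector()
print(selector)  # {namespace=~"kube-system|giantswarm"}
```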

In the future, we will also store the following logs:

- [`Machine audit logs`]({{< relref "./audit-logs#machine-audit-logs" >}})
- `Teleport audit logs`, currently being worked on here: https://github.com/giantswarm/roadmap/issues/3250
- Giant Swarm `customer workload logs`, as part of our observability platform, being worked on here: https://github.com/giantswarm/roadmap/issues/2771

## Why we prefer Loki over its competitors

The reasons that led us to choose Grafana Loki over its competitors (which, in our case, boils down to OpenDistro) are numerous.

First, we are **strong believers in Open Source**, so the full Elastic stack is obviously out of the question.

Second, we are quite used to the Grafana ecosystem, whose tools are made to work with one another. The other logging solutions are either meant to work on their own (like OpenDistro) or require adopting a full-fledged solution (i.e. one able to **collect and correlate all observability data**), which is rarely open source (coming back to the first point above).

Third, we are full-fledged users of Prometheus and PromQL, and **LogQL, the Loki query language, is a natural extension of PromQL**, which makes it easy for our platform engineers to use and love.
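
To illustrate how close the two languages are, the following Python sketch renders the same aggregation once as a PromQL query and once as a LogQL metric query. The metric name `http_requests_total`, the `namespace` label and the `rate_by` helper are hypothetical and only serve the example.

```python
def rate_by(label: str, source: str, window: str = "5m") -> str:
    """Render `sum by (<label>) (rate(<source>[<window>]))`.

    The shape is identical in PromQL and LogQL; only the source differs:
    a metric selector in PromQL, a log stream selector (plus optional
    line filters) in LogQL.
    """
    return f"sum by ({label}) (rate({source}[{window}]))"


# PromQL: per-namespace rate of a (hypothetical) counter metric.
promql = rate_by("namespace", "http_requests_total")
# LogQL: per-namespace rate of log lines containing "error".
logql = rate_by("namespace", '{namespace="kube-system"} |= "error"')
```

The only LogQL-specific parts are the stream selector in curly braces and the `|=` line filter; the aggregation and range syntax carry over from PromQL.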

The fourth reason is **cost and resource consumption**. Loki is cheaper to run than its competitors because it relies less on persistent storage and uses object storage instead, which is typically cheaper in the cloud. Storing the index is also cheaper for Loki because it uses label-based indexing, which produces a much smaller index than the text-based indexing of a full-text search engine.
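
The difference in index size can be illustrated with a toy Python model. It is a deliberately simplified sketch, not how Loki or a full-text search engine is actually implemented: the label-based index keeps one key per distinct label set, while the full-text index keeps one key per distinct word found in the log lines. The sample logs and function names are made up for the example.

```python
def label_index(entries):
    """Index log entries by their label set only (one key per stream)."""
    index = {}
    for position, (labels, _line) in enumerate(entries):
        index.setdefault(frozenset(labels.items()), []).append(position)
    return index


def full_text_index(entries):
    """Index log entries by every distinct word in the line (inverted index)."""
    index = {}
    for position, (_labels, line) in enumerate(entries):
        for word in set(line.lower().split()):
            index.setdefault(word, []).append(position)
    return index


# Made-up sample: two streams, four log lines.
entries = [
    ({"namespace": "kube-system", "app": "coredns"}, "plugin reload succeeded"),
    ({"namespace": "kube-system", "app": "coredns"}, "upstream timeout while resolving"),
    ({"namespace": "giantswarm", "app": "operator"}, "reconciling cluster resources"),
    ({"namespace": "giantswarm", "app": "operator"}, "reconcile finished without errors"),
]

label_keys = len(label_index(entries))
word_keys = len(full_text_index(entries))
```

In this model the label index grows with the number of streams, whereas the full-text index grows with the vocabulary of the log content, which is why the former stays small even for large log volumes.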

Finally, the last reason comes down to the history of Giant Swarm and mostly concerns **operation and maintenance**. Before we decided to run Loki, we ran Elasticsearch as our logging solution. Elasticsearch is really hard to operate, especially at scale, and even more so on Kubernetes because it is by nature a stateful application (and for good reasons). This was an especially important factor in our decision since we do not need the full capabilities of OpenDistro, such as full-text search.
@@ -1,23 +1,23 @@
---
linkTitle: Audit logs
title: Audit logs
description: A guide explaining how to interact with audit logs on Giant Swarm clusters.
weight: 50
menu:
  main:
-    identifier: getting-started-observability-logs-auditlogging
+    identifier: getting-started-observability-logging-auditlogging
    parent: getting-started-observability-logging
user_questions:
- What are audit logs?
- What is audit logging?
- How can I access Kubernetes audit logs?
aliases:
- /getting-started/observability/logging/audit-logs
- /ui-api/observability/logs/audit-logging
owner:
- https://github.com/orgs/giantswarm/teams/team-atlas
-last_review_date: 2024-02-28
+last_review_date: 2024-03-21
---

Check warning on line 20 in src/content/vintage/getting-started/observability/logging/audit-logs/index.md

GitHub Actions / Front matter problems

Found 1 less severe problems

WARN - The page should have a last_review_date

In this document you will learn what audit logs are, which kinds are available on Giant Swarm clusters, and how to access them or ship them to a remote location.

@@ -1,24 +1,24 @@
---
linkTitle: Observability
title: Observability Features
description: Overview of the observability related platform features to help you operate and improve your platform and applications.
weight: 70
menu:
  main:
    parent: platform-overview
    identifier: platform-overview-observability
aliases:
- /platform-overview/observability
- /developer-platform/observability/
- /app-platform/apps/observability/
owner:
- https://github.com/orgs/giantswarm/teams/team-atlas
user_questions:
- What app do you recommend for monitoring?
- What app do you recommend for logging?
- What app do you recommend for tracing?
-last_review_date: 2024-02-28
+last_review_date: 2024-03-21
---

Check warning on line 21 in src/content/vintage/platform-overview/observability/_index.md

GitHub Actions / Front matter problems

Found 1 less severe problems

WARN - The page should have a last_review_date

Observability is based on four main data sources: __logs__, __metrics__, __traces__ and __profiles__. To cover these needs, Giant Swarm provides its customers with a fully managed observability platform supported by 24/7 monitoring and alerting. The tools included in the observability platform are listed below.
