
Catch and display service errors #249

Open
gardar opened this issue Nov 20, 2023 · 8 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@gardar
Member

gardar commented Nov 20, 2023

When the services fail due to misconfiguration we see the service failure, but no useful error message is displayed, which makes it difficult to identify what's causing the issue in CI.

It's debatable whether this should be added to role runs outside of molecule, as this issue usually doesn't come up outside of development.

I already added this to testinfra in 083ff4e, but that doesn't catch service errors that occur in Ansible tasks.

I see two possible options:

  • Turn on debug logging in molecule
This would not be the preferred option, as enabling debug output for all tasks would make the already long CI logs even longer and most likely slow them down as well.

  • Catch the error in a block.
This would add extra tasks in many places in the playbook and could get a bit messy, especially if we only want to capture the errors when running the molecule tests.
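
The second option could look roughly like the following. This is only a hypothetical sketch: the task names, the `prometheus` unit name, and the `__journal` variable are illustrative, not anything currently in the roles.

```yaml
# Sketch of option 2: wrap the service task in a block so that a failure
# triggers diagnostic tasks before the play aborts.
- name: Ensure prometheus service is started and enabled
  block:
    - name: Start prometheus
      ansible.builtin.systemd:
        name: prometheus
        state: started
        enabled: true
  rescue:
    - name: Collect the service journal for debugging
      ansible.builtin.command: journalctl -xeu prometheus.service
      changed_when: false
      register: __journal

    - name: Re-raise the original failure with the journal attached
      ansible.builtin.fail:
        msg: "{{ __journal.stdout_lines }}"
```

The `rescue` section only runs when a task in the `block` fails, so normal runs stay unchanged; the downside, as noted above, is repeating this wrapper around every service task.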

Related: #213 (comment)

@gardar gardar added enhancement New feature or request help wanted Extra attention is needed labels Nov 20, 2023
@wookietreiber
Contributor

Setting ANSIBLE_DIFF_ALWAYS=True in CI can be helpful. With that, you can see the content of the service file, but otherwise it can be extremely spammy 😅

@gardar
Member Author

gardar commented Nov 21, 2023

Oh interesting, that seems like the ideal solution, thanks! I've hardly ever used that option so it didn't cross my mind for this use case.

@wookietreiber
Contributor

wookietreiber commented Nov 23, 2023

@gardar It hugely depends on the quality of the diff mode. For #213 it helps a little, because you can see the content of the service file, but you still do not see what the service complains about.

It might be useful for CI to add additional tasks at the end to run systemctl status prometheus.service and/or journalctl -xeu prometheus.service. I don't know how to implement this with molecule, maybe with post_tasks or something like that which runs unconditionally at the end in the container.

@wookietreiber
Contributor

@gardar What I ended up using for #213 was this (at the end of each respective tasks/main.yml):

# currently last task 

- name: Ensure blackbox_exporter service is started and enabled
  ansible.builtin.systemd:
    ...

# my uncommitted additions to show the problems

- name: Check blackbox_exporter service status
  ansible.builtin.command: systemctl status blackbox_exporter.service
  changed_when: false
  failed_when: false
  register: status

- name: Show service status
  ansible.builtin.debug:
    var: status.stdout_lines

- name: Check blackbox_exporter journal
  ansible.builtin.command: journalctl -x -u blackbox_exporter.service
  changed_when: false
  failed_when: false
  register: journal

- name: Show journal
  ansible.builtin.debug:
    var: journal.stdout_lines

This could be added to each role setting up a systemd service and optionally to only run in CI. I decided against adding this to #213.

As for the diff mode, I wouldn't generally recommend using ANSIBLE_DIFF_ALWAYS=True in CI, as mentioned before, but we could enforce diff mode for the service template task by adding diff: yes.
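
Enforcing diff mode for a single task could look like this. A hypothetical sketch only — the template source and destination paths are illustrative, not the roles' actual file layout:

```yaml
# Force diff output for just the service template task, so the rendered
# unit file is shown even without --diff or ANSIBLE_DIFF_ALWAYS.
- name: Install prometheus systemd unit
  ansible.builtin.template:
    src: prometheus.service.j2
    dest: /etc/systemd/system/prometheus.service
    mode: "0644"
  diff: true
  notify: restart prometheus
```

This keeps the rest of the run quiet while still surfacing the one file most likely to be misconfigured.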

I could draft a PR for both of these changes for all roles.

@gardar
Member Author

gardar commented Dec 15, 2023

Yes, these tasks are along the lines of what I had in mind; my problem with them is just that it doesn't feel natural to include tasks like that in all the roles.

Ideally I would want something that we could have just when running the roles with molecule / in the CI.

I don't think it's necessary to display the output of failed services for users beyond what Ansible provides by default.
The chances of a user breaking the services are hopefully lower than the services breaking during development, as the CI tests should catch most of that. If the service breaks for a user, they can probably just jump onto the server and look at the service output, whereas we can't diagnose anything other than the logs from the CI output and have to run the CI again to debug, which can take a long time.

@wookietreiber
Contributor

What I do in some of my roles to limit tasks to CI / container is this:

# vars/main.yml
__locale_virtualized: >-
  {{
    ansible_facts.virtualization_role == "guest"
    and
    ansible_facts.virtualization_type == "docker"
  }}
# tasks/main.yml
- name: update package cache
  ansible.builtin.package:
    update_cache: yes
  become: yes
  when:
    - __locale_virtualized

This particular example is for containers that require package cache updates before you can use the package manager.


Taking a more general approach, there must be some GitHub workflow or molecule specific environment variables (or even define your own __PROMETHEUS_CI in the workflow vars) you could check on instead of just determining if you're running in a container. Then, you can put the service check commands from above inside a block and conditionally run that block based on these vars.
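
A sketch of that env-var-gated approach, assuming the `CI` variable that GitHub Actions sets to `true` at runtime (the task names and `__status` variable are illustrative):

```yaml
# Run the diagnostic tasks only when the CI environment variable is set,
# e.g. in GitHub Actions. Locally the lookup returns "" and the block is
# skipped.
- name: Show service diagnostics (CI only)
  when: lookup('ansible.builtin.env', 'CI') | bool
  block:
    - name: Check prometheus service status
      ansible.builtin.command: systemctl status prometheus.service
      changed_when: false
      failed_when: false
      register: __status

    - name: Show status
      ansible.builtin.debug:
        var: __status.stdout_lines
```

Compared with container detection, this keys the behavior to "running in CI" directly, so it also works when the CI tests run in VMs.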

@gardar
Member Author

gardar commented Dec 16, 2023

Taking a more general approach, there must be some GitHub workflow or molecule specific environment variables (or even define your own __PROMETHEUS_CI in the workflow vars) you could check on instead of just determining if you're running in a container. Then, you can put the service check commands from above inside a block and conditionally run that block based on these vars.

Yes indeed, there are Ansible tags that molecule looks for that could be used, as well as several env vars that both molecule and the GitHub Actions set at runtime, and in fact we are already using the CI vars here: 82d0c73

I wish we could do something similar to that on a task level to get the service output rather than adding extra tasks. Unfortunately simply increasing the verbosity does not help as it won't print the service log output 😞

@wookietreiber
Contributor

What's the harm in adding the extra tasks? We may even be able to DRY it up by using a role for these tasks with include_role.
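
The include_role idea could be sketched like this. The `_service_debug` role name and its `service_debug_unit` variable are invented here for illustration:

```yaml
# In each role's tasks/main.yml: one include instead of repeated tasks.
- name: Show service diagnostics
  ansible.builtin.include_role:
    name: _service_debug
  vars:
    service_debug_unit: blackbox_exporter.service
```

```yaml
# roles/_service_debug/tasks/main.yml
- name: Check {{ service_debug_unit }} status
  ansible.builtin.command: systemctl status {{ service_debug_unit }}
  changed_when: false
  failed_when: false
  register: __status

- name: Show status
  ansible.builtin.debug:
    var: __status.stdout_lines
```

Each role then passes only its own unit name, and any future improvement to the diagnostics happens in one place.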
