
Catch and display service errors #249

Open
gardar opened this issue Nov 20, 2023 · 8 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@gardar
Member

gardar commented Nov 20, 2023

When the services fail due to misconfiguration we see the service failure, but no useful error message is displayed, which makes it difficult to identify what's causing the issue in CI.

It's debatable whether this should be added to role runs outside of molecule, as this issue usually doesn't come up outside of development.

I already added this to testinfra in 083ff4e, but that doesn't catch service errors that occur in Ansible tasks.

I see two possible options:

  • Turn on debug logging in molecule
This would not be the preferred option, as enabling debug output for all tasks would make the already long CI logs even longer and most likely slow them down as well.

  • Catch the error in a block.
This would add extra tasks in many places in the playbook and could get a bit messy, especially if we only want to capture the errors when running the molecule tests.
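
The second option could look roughly like the following. This is only a hypothetical sketch: the task names, the `prometheus` unit name, and the `__journal` variable are illustrative, not anything currently in the roles.

```yaml
# Sketch of option 2: wrap the service task in a block so that a failure
# triggers diagnostic tasks before the play aborts.
- name: Ensure prometheus service is started and enabled
  block:
    - name: Start prometheus
      ansible.builtin.systemd:
        name: prometheus
        state: started
        enabled: true
  rescue:
    - name: Collect the service journal for debugging
      ansible.builtin.command: journalctl -xeu prometheus.service
      changed_when: false
      register: __journal

    - name: Re-raise the original failure with the journal attached
      ansible.builtin.fail:
        msg: "{{ __journal.stdout_lines }}"
```

The `rescue` section only runs when a task in the `block` fails, so normal runs stay unchanged; the downside, as noted above, is repeating this wrapper around every service task.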

Related: #213 (comment)

@gardar gardar added enhancement New feature or request help wanted Extra attention is needed labels Nov 20, 2023
@wookietreiber
Contributor

Setting ANSIBLE_DIFF_ALWAYS=True in CI can be helpful. With that, you can see the content of the service file, but otherwise it can be extremely spammy 😅

@gardar
Member Author

gardar commented Nov 21, 2023

Oh interesting, that seems like the ideal solution, thanks! I've hardly ever used that option so it didn't cross my mind for this use case.

@wookietreiber
Contributor

wookietreiber commented Nov 23, 2023

@gardar It hugely depends on the quality of the diff mode. For #213 it helps a little, because you can see the content of the service file, but you still do not see what the service complains about.

It might be useful for CI to add additional tasks at the end to run systemctl status prometheus.service and/or journalctl -xeu prometheus.service. I don't know how to implement this with molecule, maybe with post_tasks or something like that which runs unconditionally at the end in the container.

@wookietreiber
Contributor

@gardar What I ended up using for #213 was this (at the end of each respective tasks/main.yml):

# currently last task 

- name: Ensure blackbox_exporter service is started and enabled
  ansible.builtin.systemd:
    ...

# my uncommitted additions to show the problems

- name: Check blackbox_exporter service status
  ansible.builtin.command: systemctl status blackbox_exporter.service
  changed_when: false
  failed_when: false
  register: status

- name: Show service status
  ansible.builtin.debug:
    var: status.stdout_lines

- name: Check blackbox_exporter journal
  ansible.builtin.command: journalctl -x -u blackbox_exporter.service
  changed_when: false
  failed_when: false
  register: journal

- name: Show journal
  ansible.builtin.debug:
    var: journal.stdout_lines

This could be added to each role setting up a systemd service and optionally to only run in CI. I decided against adding this to #213.

As for the diff mode, I wouldn't generally recommend using ANSIBLE_DIFF_ALWAYS=True in CI, as mentioned before, but we could enforce diff mode for the service template task by adding diff: yes.
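
Enforcing diff mode for a single task could look like this. A hypothetical sketch only — the template source and destination paths are illustrative, not the roles' actual file layout:

```yaml
# Force diff output for just the service template task, so the rendered
# unit file is shown even without --diff or ANSIBLE_DIFF_ALWAYS.
- name: Install prometheus systemd unit
  ansible.builtin.template:
    src: prometheus.service.j2
    dest: /etc/systemd/system/prometheus.service
    mode: "0644"
  diff: true
  notify: restart prometheus
```

This keeps the rest of the run quiet while still surfacing the one file most likely to be misconfigured.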

I could draft a PR for both of these changes for all roles.

@gardar
Member Author

gardar commented Dec 15, 2023

Yes, these tasks are along the lines of what I had in mind; my problem with them is just that it doesn't feel natural to include tasks like that in all the roles.

Ideally I would want something that we could have just when running the roles with molecule / in the CI.

I don't think it's necessary to display the output of failed services for users beyond what Ansible provides by default.
The chances of a user breaking the services are hopefully lower than the services breaking during development, as the CI tests should catch most of that. If the service breaks for a user, they can probably just jump onto the server and look at the service output, whereas we can't diagnose anything other than the logs from the CI output and have to run the CI again to debug, which can take a long time.

@wookietreiber
Contributor

What I do in some of my roles to limit tasks to CI / container is this:

# vars/main.yml
__locale_virtualized: >-
  {{
    ansible_facts.virtualization_role == "guest"
    and
    ansible_facts.virtualization_type == "docker"
  }}
# tasks/main.yml
- name: update package cache
  ansible.builtin.package:
    update_cache: yes
  become: yes
  when:
    - __locale_virtualized

This particular example is for containers that require package cache updates before you can use the package manager.


Taking a more general approach, there must be some GitHub workflow or molecule specific environment variables (or even define your own __PROMETHEUS_CI in the workflow vars) you could check on instead of just determining if you're running in a container. Then, you can put the service check commands from above inside a block and conditionally run that block based on these vars.
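
A sketch of that env-var-gated approach, assuming the `CI` variable that GitHub Actions sets to `true` at runtime (the task names and `__status` variable are illustrative):

```yaml
# Run the diagnostic tasks only when the CI environment variable is set,
# e.g. in GitHub Actions. Locally the lookup returns "" and the block is
# skipped.
- name: Show service diagnostics (CI only)
  when: lookup('ansible.builtin.env', 'CI') | bool
  block:
    - name: Check prometheus service status
      ansible.builtin.command: systemctl status prometheus.service
      changed_when: false
      failed_when: false
      register: __status

    - name: Show status
      ansible.builtin.debug:
        var: __status.stdout_lines
```

Compared with container detection, this keys the behavior to "running in CI" directly, so it also works when the CI tests run in VMs.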

@gardar
Member Author

gardar commented Dec 16, 2023

Taking a more general approach, there must be some GitHub workflow or molecule specific environment variables (or even define your own __PROMETHEUS_CI in the workflow vars) you could check on instead of just determining if you're running in a container. Then, you can put the service check commands from above inside a block and conditionally run that block based on these vars.

Yes indeed, there are Ansible tags that molecule looks for that could be used, as well as several env vars that both molecule and the GitHub Actions set at runtime, and in fact we are already using the CI vars here: 82d0c73

I wish we could do something similar to that on a task level to get the service output rather than adding extra tasks. Unfortunately simply increasing the verbosity does not help as it won't print the service log output 😞

@wookietreiber
Contributor

What's the harm in adding the extra tasks? We may even be able to DRY it up by using a role for these tasks with include_role.
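
The include_role idea could be sketched like this. The `_service_debug` role name and its `service_debug_unit` variable are invented here for illustration:

```yaml
# In each role's tasks/main.yml: one include instead of repeated tasks.
- name: Show service diagnostics
  ansible.builtin.include_role:
    name: _service_debug
  vars:
    service_debug_unit: blackbox_exporter.service
```

```yaml
# roles/_service_debug/tasks/main.yml
- name: Check {{ service_debug_unit }} status
  ansible.builtin.command: systemctl status {{ service_debug_unit }}
  changed_when: false
  failed_when: false
  register: __status

- name: Show status
  ansible.builtin.debug:
    var: __status.stdout_lines
```

Each role then passes only its own unit name, and any future improvement to the diagnostics happens in one place.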
