[manala.telegraf] service unable to start during initial provisioning #650

Open
lisuml opened this issue Jan 17, 2023 · 5 comments

@lisuml
Contributor

lisuml commented Jan 17, 2023

manala.roles version: 3.2.0

During initial provisioning of a node with the manala.telegraf role attached, the service fails to start:

TASK [manala.roles.telegraf : Configs > Templates present] ****************************************************************************************************************************************************************************************************
changed: [d-test.euc1.XXX.lan] => (item={'state': 'present', 'template': 'configs/_default.j2', 'file': '/etc/telegraf/telegraf.d/os.conf', 'config': '[[inputs.cpu]]\n  totalcpu = true\n[[inputs.disk]]\n  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]\n[[inputs.diskio]]\n[[inputs.kernel]]\n[[inputs.mem]]\n[[inputs.net]]\n[[inputs.netstat]]\n[[inputs.processes]]\n[[inputs.system]]\n'})
changed: [d-test.euc1.XXX.lan] => (item={'state': 'present', 'template': 'configs/_default.j2', 'file': '/etc/telegraf/telegraf.d/output.conf', 'config': '[[outputs.influxdb]]\n  urls = [ "udp://metrix.euc1.XXX.lan:8089" ]\n  udp_payload = "1024B"\n'})

TASK [manala.roles.telegraf : Configs > Files absent] *********************************************************************************************************************************************************************************************************

TASK [manala.roles.telegraf : Services > Services] ************************************************************************************************************************************************************************************************************
failed: [d-test.euc1.XXX.lan] (item=telegraf) => {"ansible_loop_var": "item", "changed": false, "item": "telegraf", "msg": "Unable to start service telegraf: Job for telegraf.service failed because the control process exited with error code.\nSee \"systemctl status telegraf.service\" and \"journalctl -xe\" for details.\n"}

As you can see, the configs are defined properly, but they don't seem to be ready when the service starts.
The error I see in the systemd journal:

Jan 17 13:40:15 d-test.euc1.XXX.lan telegraf[8968]: 2023-01-17T13:40:15Z E! [telegraf] Error running agent: no outputs found, did you provide a valid config file?
Jan 17 13:40:15 d-test.euc1.XXX.lan systemd[1]: telegraf.service: Main process exited, code=exited, status=1/FAILURE

On the second provisioning attempt, the error is gone and the service starts normally.
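
A minimal sketch for pulling journal lines like the ones above from within a play, for anyone trying to reproduce this. The task names and the `telegraf_journal` variable are illustrative assumptions and not part of the manala role; only standard `journalctl` options are used:

```yaml
# Hypothetical debugging tasks; not part of manala.roles.telegraf.
- name: Collect the last telegraf journal entries
  ansible.builtin.command:
    cmd: journalctl --unit telegraf.service --no-pager --lines 20
  register: telegraf_journal  # illustrative variable name
  changed_when: false         # read-only command, never reports a change

- name: Print the collected journal entries
  ansible.builtin.debug:
    var: telegraf_journal.stdout_lines
```

Running these right after the failing "Services > Services" task (for example from a rescue block) should surface the "no outputs found" message without logging in to the node.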

@lisuml
Contributor Author

lisuml commented Jan 17, 2023

After more investigation, it seems the issue is only present with telegraf 1.25.0 (the most recent version at the moment).

The cause is that the official Debian packages provided by InfluxData automatically try to start the telegraf systemd service at installation time. A working outputs configuration is expected to be part of the config file at that point, but it is not there yet.

This looks like a bug in telegraf itself and/or in the official telegraf Debian packages. I'm going to file a GitHub issue on the official telegraf repository.

For me, the workaround was simply to pin a lower telegraf version in the Ansible playbook:

manala_telegraf_install_packages_default:
      - telegraf=1.24.4-1

@nervo
Member

nervo commented Jan 17, 2023

@lisuml we ran into the same issue on v1.25.0 and fixed our tests like this: #642

Could you provide all the values you pass to the role?

btw, use manala_telegraf_install_packages instead of manala_telegraf_install_packages_default :)

@lisuml
Contributor Author

lisuml commented Jan 17, 2023

@nervo: thanks for the follow-up!

Could you provide all the values you pass to the role?

These are my Ansible variables:

    manala_apt_preferences:
      - influxdb@influxdata
    manala_telegraf_install_packages:
      - telegraf=1.24.4-1
    manala_telegraf_config_template: config/telegraf/base/telegraf.conf.j2
    manala_telegraf_config:
      global_tags:
        environment: "{{ env }}"
    manala_telegraf_configs:
      - file: os.conf
        config: |
          [[inputs.cpu]]
            totalcpu = true
          [[inputs.disk]]
            ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
          [[inputs.diskio]]
          [[inputs.kernel]]
          [[inputs.mem]]
          [[inputs.net]]
          [[inputs.netstat]]
          [[inputs.processes]]
          [[inputs.system]]
      - file: output.conf
        config: |
          [[outputs.influxdb]]
            urls = [ "udp://metrix.euc1.XXX.lan:8089" ]
            udp_payload = "1024B"

use manala_telegraf_install_packages instead of manala_telegraf_install_packages_default

Roger that.

FYI: I created an issue in the telegraf GitHub repo: influxdata/telegraf#12514

@nervo
Member

nervo commented Jan 17, 2023

Ok, so let's wait for the next telegraf version :)

(btw, you should also use an explicit telegraf apt preference)

        manala_apt_preferences:
          - telegraf@influxdata
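
Combined with the version pin from the workaround earlier in this thread, the relevant variables would look roughly like this. This is a sketch assembled only from values already posted here, and the pin is presumably needed only until a fixed telegraf release is available:

```yaml
# Sketch combining the suggestions from this thread; adjust to your own inventory.
manala_apt_preferences:
  - telegraf@influxdata   # explicit telegraf preference, as suggested above
manala_telegraf_install_packages:
  - telegraf=1.24.4-1     # temporary pin to the last version reported to work
```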

@lisuml
Contributor Author

lisuml commented Jan 17, 2023

(btw, you should also use an explicit telegraf apt preference)

My bad. Thanks for pointing this out!


2 participants