Windows Cells not sending logs to syslog drain when posted via https. #39

Open
pusherofbrooms opened this issue Sep 10, 2020 · 8 comments

Comments

@pusherofbrooms

pusherofbrooms commented Sep 10, 2020

Since we upgraded from cf-deployment 12.45.0 to 13.7.0, and on through our current 13.10.0, our Windows application and cell logs no longer arrive at syslog drains that use HTTPS POST.

The loggr-syslog-agent-windows logs contain the following suspicious error:
2020/09/08 15:50:21 failed to write to syslogdain.cloud.pcftest.com, retrying in 15s, err: x509: certificate signed by unknown authority

syslogdain.cloud.pcftest.com does in fact have a valid cert signed by DigiCert.

I can work around this by setting drain_skip_cert_verify, but that doesn't seem optimal or desired.
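For readers unfamiliar with this failure mode, the following is a minimal Go sketch of how the same error arises in a TLS client. It is not the syslog agent's code: the helper names `postWithRoots`, `isUnknownAuthority`, and `demo` are hypothetical, the local `httptest` server stands in for the drain, and it is an assumption that `drain_skip_cert_verify` behaves like `InsecureSkipVerify`.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// postWithRoots sends an empty HTTPS POST to url and verifies the server
// certificate against roots, unless skipVerify is set.
func postWithRoots(url string, roots *x509.CertPool, skipVerify bool) error {
	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{RootCAs: roots, InsecureSkipVerify: skipVerify},
	}}
	defer client.CloseIdleConnections()
	resp, err := client.Post(url, "text/plain", nil)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

// isUnknownAuthority reports whether err wraps x509.UnknownAuthorityError,
// which prints as "x509: certificate signed by unknown authority".
func isUnknownAuthority(err error) bool {
	var uae x509.UnknownAuthorityError
	return errors.As(err, &uae)
}

// demo posts three times to a local TLS test server (a stand-in for a drain).
func demo() (rejected bool, trustedErr, skipErr error) {
	srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	// 1. The pool lacks the issuing CA, so verification fails.
	rejected = isUnknownAuthority(postWithRoots(srv.URL, x509.NewCertPool(), false))

	// 2. The pool contains the server's certificate, so verification succeeds.
	trusted := x509.NewCertPool()
	trusted.AddCert(srv.Certificate())
	trustedErr = postWithRoots(srv.URL, trusted, false)

	// 3. Verification is disabled entirely (assumed analogue of the workaround).
	skipErr = postWithRoots(srv.URL, x509.NewCertPool(), true)
	return rejected, trustedErr, skipErr
}

func main() {
	rejected, trustedErr, skipErr := demo()
	fmt.Println("empty pool rejected as unknown authority:", rejected)
	fmt.Println("pool with server cert, error:", trustedErr)
	fmt.Println("skip verify, error:", skipErr)
}
```

The third case shows why skipping verification is a poor long-term answer: the request succeeds for any certificate, trusted or not.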

Here is how I reproduce the issue:

  1. Deploy an application that can accept POSTs, like https://github.com/pusherofbrooms/syslogdrain (change the manifest if you actually use this)
  2. Set up the service with cf cups syslogdrain -l "https://MYLOGURL"
  3. Set up a Windows application like the one found at git@github.com:cloudfoundry-incubator/NET-sample-app.git and bind it to the syslog drain
  4. Restart the Windows application
  5. Observe that no CELL or APP/PROC/WEB logs are sent to the drain application
  6. Observe the above err: x509 cert error in the loggr-syslog-agent-windows logs
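
For step 1, any application that accepts HTTP POSTs will do. The following Go sketch is a hypothetical stand-in, not the code of the linked syslogdrain repository. It assumes the platform router terminates TLS in front of the app and that the listen port arrives in the `PORT` environment variable; `drainHandler` and `statusFor` are names made up for illustration.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"os"
	"strings"
)

// drainHandler accepts POSTed log lines, logs them, and replies 200 OK.
func drainHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "POST only", http.StatusMethodNotAllowed)
		return
	}
	// Cap the body at 1 MiB so a misbehaving sender cannot exhaust memory.
	body, err := io.ReadAll(io.LimitReader(r.Body, 1<<20))
	if err != nil {
		http.Error(w, "read error", http.StatusBadRequest)
		return
	}
	log.Printf("drain received %d bytes: %s", len(body), body)
	w.WriteHeader(http.StatusOK)
}

// statusFor runs one in-memory request through drainHandler and returns
// the response status code.
func statusFor(method, body string) int {
	req := httptest.NewRequest(method, "/", strings.NewReader(body))
	rec := httptest.NewRecorder()
	drainHandler(rec, req)
	return rec.Code
}

func main() {
	port := os.Getenv("PORT")
	if port == "" {
		// No PORT set: run a quick in-memory self-check instead of serving.
		fmt.Println("self-check POST status:", statusFor(http.MethodPost, "test log line"))
		return
	}
	log.Fatal(http.ListenAndServe(":"+port, http.HandlerFunc(drainHandler)))
}
```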
@cf-gitbot

We have created an issue in Pivotal Tracker to manage this. Unfortunately, the Pivotal Tracker project is private so you may be unable to view the contents of the story.

The labels on this github issue will be updated when the story is started.

@MasslessParticle

MasslessParticle commented Sep 10, 2020

Thanks for the report!

Is the syslogdrain.cloud.pcftest.com cert trusted by the golang cert store by default? What if you set the drain_ca_cert property?

drain_ca_cert:
  description: The CA certificate for key/cert verification.

Does this error only appear on loggr-syslog-agent-windows or does it also appear on the other syslog agents?

@pusherofbrooms
Author

This problem doesn't show up on our diego-cells.

We haven't set any drain_ca_cert. Is this a list, or a single cert? Our customers post drains to arbitrary URLs, so it might be problematic to get the whole set of CAs needed.

I can use PowerShell tools to get URLs without any cert errors. For instance:

wget -UseBasicParsing https://syslogdrain-boisterous-sable-or.cloud.pcftest.com


StatusCode        : 200
StatusDescription : OK
Content           :
RawContent        : HTTP/1.1 200 OK
                    X-Vcap-Request-Id: f181318d-ae80-4a0e-631d-d7a8e5b64f80
                    Connection: keep-alive
                    Content-Length: 0
                    Content-Type: text/html; charset=utf-8
                    Date: Fri, 11 Sep 2020 09:57:18 GMT
                    Server...
Forms             :
Headers           : {[X-Vcap-Request-Id, f181318d-ae80-4a0e-631d-d7a8e5b64f80], [Connection, keep-alive], [Content-Length, 0],
                    [Content-Type, text/html; charset=utf-8]...}
Images            : {}
InputFields       : {}
Links             : {}
ParsedHtml        :
RawContentLength  : 0

So it seems that at least the CAs used by Windows tools include DigiCert.

@pianohacker
Contributor

pianohacker commented Oct 13, 2020

@pusherofbrooms Hello! Sorry for this regression.

We believe that this is due to syslog drains being serviced by syslog agents in cf-deployment v13+, rather than syslog adapters. The syslog agents run within the Windows Diego cells, rather than separate Linux VMs, so we're affected by this Golang issue (golang/go#16736).

Our assumption is that you did not have syslog agents enabled in v12. Is this correct?

The fix for this will be somewhat complex, but we hope to have it released within the month.

@kkburr
Contributor

kkburr commented Nov 20, 2020

Before I close this issue, just want to make sure that you are no longer experiencing this issue @pusherofbrooms cc @pianohacker

@pusherofbrooms
Author

@pianohacker We had default settings in this area, so in v12, if the default was to not enable syslog agents, that's probably what we had.

@kkburr I don't see that the issue referenced by @pianohacker was closed or resolved. As of the latest version of v13 cf-deployment, we still have the issue. We won't be deploying v15 cf-deployment until January (barring critical security updates).

@acrmp
Member

acrmp commented Nov 27, 2021

Hi @pusherofbrooms,

Sorry for the delay in getting back to you regarding this issue.

As mentioned earlier this is due to Go not supporting loading the system cert pool on Windows: golang/go#16736

The good news is that there was recently a commit to Go master which restores the ability for Go to load the system cert pool on Windows. I tested compiling the syslog-agent with gotip and confirmed that the certificate was then trusted:

If this is not backported to earlier versions of Go, this would ship in Go 1.18 (expected in Feb 2022) and be picked up when loggregator-agent-release bumps its version of Go.

I think that it may also be possible with current versions of Go to have the syslog-agent not provide a cert pool on Windows when configuring TLS (and have Go fall back to using the platform support).
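
A minimal sketch of that idea, under stated assumptions: `newDrainTLSConfig` is a hypothetical helper, not the syslog-agent's actual code, and its parameters only loosely mirror the `drain_ca_cert` and `drain_skip_cert_verify` properties. It relies on the documented `crypto/tls` behaviour that a nil `RootCAs` makes the client use the host's root CA set.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"runtime"
)

// newDrainTLSConfig (hypothetical) builds a client TLS config for HTTPS drains.
// goos is passed in, rather than read from runtime.GOOS, to keep it testable.
func newDrainTLSConfig(goos string, caPEM []byte, skipVerify bool) (*tls.Config, error) {
	cfg := &tls.Config{InsecureSkipVerify: skipVerify}

	if len(caPEM) == 0 {
		// No operator-supplied CA: leave RootCAs nil so crypto/tls falls
		// back to the host's root CA set on every platform.
		return cfg, nil
	}

	// An extra CA bundle was supplied. Start from the system pool where it
	// can be loaded; on Windows start from an empty pool, since older Go
	// versions cannot load the system pool there.
	pool := x509.NewCertPool()
	if goos != "windows" {
		sys, err := x509.SystemCertPool()
		if err != nil {
			return nil, fmt.Errorf("loading system cert pool: %w", err)
		}
		pool = sys
	}
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no certificates found in CA bundle")
	}
	cfg.RootCAs = pool
	return cfg, nil
}

func main() {
	cfg, err := newDrainTLSConfig(runtime.GOOS, nil, false)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("RootCAs left nil:", cfg.RootCAs == nil)
}
```

One trade-off in this sketch: on Windows, supplying a bundle means only that bundle is trusted, so drains signed by other public CAs would still fail.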

As a workaround you can currently set drain_ca_cert on the loggr-syslog-agent-windows job. You can provide it a concatenated series of certificates:
https://pkg.go.dev/crypto/x509#CertPool.AppendCertsFromPEM
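
To illustrate what "a concatenated series of certificates" means for `AppendCertsFromPEM`, here is a small Go sketch. The two CAs are throwaway self-signed certificates generated on the spot, and `selfSignedCAPEM`, `countCerts`, and `demoBundle` are hypothetical helper names; a real `drain_ca_cert` value would hold the PEM text of the actual issuing CAs.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"time"
)

// selfSignedCAPEM returns a throwaway self-signed CA certificate in PEM form.
func selfSignedCAPEM(commonName string, serial int64) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(serial),
		Subject:               pkix.Name{CommonName: commonName},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

// countCerts returns how many parseable certificates a PEM bundle contains.
func countCerts(bundle []byte) int {
	n := 0
	for {
		var block *pem.Block
		block, bundle = pem.Decode(bundle)
		if block == nil {
			return n
		}
		if block.Type != "CERTIFICATE" {
			continue
		}
		if _, err := x509.ParseCertificate(block.Bytes); err == nil {
			n++
		}
	}
}

// demoBundle concatenates two CA certificates and loads them in one call.
func demoBundle() (ok bool, count int, err error) {
	a, err := selfSignedCAPEM("Example Root A", 1)
	if err != nil {
		return false, 0, err
	}
	b, err := selfSignedCAPEM("Example Root B", 2)
	if err != nil {
		return false, 0, err
	}
	bundle := append(append([]byte{}, a...), b...) // plain concatenation

	pool := x509.NewCertPool()
	ok = pool.AppendCertsFromPEM(bundle) // true if any certificate was added
	return ok, countCerts(bundle), nil
}

func main() {
	ok, count, err := demoBundle()
	fmt.Println("appended:", ok, "certificates in bundle:", count, "error:", err)
}
```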

Thanks,

Andrew.

@acrmp acrmp moved this from Reviewer Assigned to Review in Progress in DEPRECATED App Platform - Logging and Metrics Nov 27, 2021
@pusherofbrooms
Author

Greetings acrmp,
I'll look forward to a February golang bump then. Thanks for the information.

@acrmp acrmp moved this from Review in Progress to Issue - Triage complete. Needs fix. in DEPRECATED App Platform - Logging and Metrics Jan 20, 2022