Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tyk should have an option to shutdown in case a Go native plugin is not loaded #5319

Open
diegobernardes opened this issue Jul 12, 2023 · 5 comments

Comments

@diegobernardes
Copy link

Is your feature request related to a problem? Please describe.
We're using Go native plugins to implement critical logic that should be executed before hitting any of the internal services. The problem with Go plugins is that they can fail to load but Tyk will simply start without them.

Describe the solution you'd like
An option at the configuration file like this:

{
  "shutdownOnFailedGoNativePluginLoad": true
}

If true and the plugin fails to load, tyk should shutdown.

Describe alternatives you've considered
I've implemented a proxy application that gets executed instead of tyk and then the application starts tyk and parse the logs during the startup. If it founds a log entry with Could not load Go-plugin it kills Tyk and exit. This ensures that Tyk is always executing with the plugin.

Additional context
This is the proxy application that I've implemented.

package main

import (
	"bufio"
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"regexp"
	"syscall"
	"time"
)

func main() {
	cmd := exec.Command("./tyk")
	cmdReader, err := cmd.StdoutPipe()
	if err != nil {
		fmt.Fprintln(os.Stderr, "failed to create the pip to execute the command: %w", err)
		os.Exit(1)
	}
	cmd.Stderr = cmd.Stdout

	scanner := bufio.NewScanner(cmdReader)
	go func() {
		checkLog := true
		t := time.Now().Add(time.Minute)
		r, err := regexp.Compile("Could not load Go-plugin")
		if err != nil {
			fmt.Fprintln(os.Stderr, "failed to compile the regex: %w", err)
			os.Exit(1)
		}
		for scanner.Scan() {
			fmt.Fprintln(os.Stdout, scanner.Text())

			// This error happens just when tyk is starting so we don't need to analize the logs forever.
			if checkLog {
				if r.Find(scanner.Bytes()) != nil {
					fmt.Fprintln(os.Stderr, "tyk failed to load the plugin")
					os.Exit(1)
				}
				if time.Now().After(t) {
					checkLog = false
				}
			}
		}
	}()

	if err := cmd.Start(); err != nil {
		fmt.Fprintln(os.Stderr, "Error starting Cmd", err)
		os.Exit(1)
	}

	c := make(chan os.Signal, 1)
	signal.Notify(c, os.Interrupt, syscall.SIGTERM)
	<-c
	if err := cmd.Process.Signal(os.Interrupt); err != nil {
		fmt.Fprintln(os.Stderr, "Failed to send a signal to stop the process: %w", err)
		os.Exit(1)
	}

	if err := cmd.Wait(); err != nil {
		fmt.Fprintln(os.Stderr, "Error waiting for Cmd", err)
		os.Exit(1)
	}
}
@andyo-tyk
Copy link
Contributor

Hi @diegobernardes,

Thanks for getting in touch and taking the time to contribute your idea.

In Tyk 4.0.13 and 5.0.1 we introduced a change so that, if a Go Plugin fails to load correctly, Tyk will log an error but will continue to operate normally.

If a call is made to the API where the Go Plugin failed to load, then Tyk will return HTTP 500 and will not proxy the request upstream.

We felt that this was an appropriate behaviour for the Gateway - it protects your upstream service while not impacting your other APIs. By monitoring the error log during API load, you'd spot any incorrectly loaded plugins.

We've also, in v5.0.3, added the same behaviour for JS plugins.

https://github.com/TykTechnologies/tyk/releases/tag/v4.0.13
https://github.com/TykTechnologies/tyk/releases/tag/v5.0.1

I hope this helps - thanks again for supporting Tyk!

@diegobernardes
Copy link
Author

Hi @diegobernardes,

Thanks for getting in touch and taking the time to contribute your idea.

In Tyk 4.0.13 and 5.0.1 we introduced a change so that, if a Go Plugin fails to load correctly, Tyk will log an error but will continue to operate normally.

If a call is made to the API where the Go Plugin failed to load, then Tyk will return HTTP 500 and will not proxy the request upstream.

We felt that this was an appropriate behaviour for the Gateway - it protects your upstream service while not impacting your other APIs. By monitoring the error log during API load, you'd spot any incorrectly loaded plugins.

We've also, in v5.0.3, added the same behaviour for JS plugins.

https://github.com/TykTechnologies/tyk/releases/tag/v4.0.13 https://github.com/TykTechnologies/tyk/releases/tag/v5.0.1

I hope this helps - thanks again for supporting Tyk!

But do you think that the startup check is still an option? You can have Tyk running non stop and update it as the API definitions change from the management console or the k8s operator, but you can also, and that's our case, have a centralized repository containing the API definitions and when anything changes, redeploy Tyk. The problem with the 500 is that we're pushing to runtime something that can be validated, at some cases, during the startup. Another problem with the runtime checks is that once you have enough endpoints in Tyk you'll find that some have way more throughput then others and it starts to get harder to create proper alerts because 1k requests failing on a given endpoint is nothing, but on another could represent a 50% failure rate.

@buger
Copy link
Member

buger commented Aug 5, 2023

If this API with Go plugin is very important for you, then you can change the health check of your application group/pod group to point to that API directly. E.g. if API has not loaded and not respond properly, healthcheck has failed.

@andyo-tyk
Copy link
Contributor

Hi @diegobernardes,

Did you manage to try @buger's suggestion of using the healthcheck?

I'm interested to understand the use case further as to why you prefer to redeploy the Gateway (and tear it down again if a single Plugin fails to load).

Do you find it more efficient to re-deploy Tyk for each change, rather than using Tyk's hot-reload functionality?

@diegobernardes
Copy link
Author

Hi @diegobernardes,

Did you manage to try @buger's suggestion of using the healthcheck?

I'm interested to understand the use case further as to why you prefer to redeploy the Gateway (and tear it down again if a single Plugin fails to load).

Do you find it more efficient to re-deploy Tyk for each change, rather than using Tyk's hot-reload functionality?

Hey, sorry for the late reply, totally missed the notification from this issue. Yes, I changed to what @buger has suggested, but I still think that this should be an option at tyk and not something that we, as clients, need to work around. I've created a virtual endpoint at the Go plugin and mapped it as a health endpoint that returns a 200.

Regards to why we redeploy instead of hot-reload:

  • We use canary releases.
  • No complex build. I do not need to track what got modified or not, new PR merged? It was just the Go plugin? Redeploy everything.
  • We have lots of pre processors that change the API JSON file definitions before they're deployed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants