
Evaluate test frameworks #253

Closed
bobcatfish opened this issue Feb 27, 2018 · 9 comments
Labels: area/build, area/test-and-release

@bobcatfish
Contributor

Our initial conformance test was implemented using Ginkgo; however, we have had feedback from the k8s team that choosing Ginkgo for their conformance tests was the wrong decision, and in hindsight they would rather not be using it.

This task is to evaluate options for test frameworks before we commit to one for conformance tests going forward.

@bobcatfish added this to the M3 milestone Feb 27, 2018
@bobcatfish self-assigned this Feb 27, 2018
@sclevine
Contributor

sclevine commented Feb 28, 2018

If we're looking for a BDD-style testing library, I built spec to address some of the common criticisms of Ginkgo. I'm using it to test my port of the CF buildpacks to the Build CRD.

@bobcatfish
Contributor Author

Thanks @sclevine ! I'll try it out :D

@bobcatfish modified the milestones: M3, M4 Mar 15, 2018
@mattmoor added the area/build and area/test-and-release labels Mar 19, 2018
@bobcatfish
Contributor Author

I've gathered a list of test frameworks I want to compare. I'll try to narrow down the list a little, then take our existing conformance tests, rewrite them using each remaining framework, and compare the results.

I'll also see what it looks like to not use a framework at all. This article was very compelling re. not using any frameworks or even assertion libs, and I find myself agreeing with this article as well:

Go has a perfectly good testing framework built in. Frameworks are also one more barrier to entry for other developers contributing to your code.

These are the frameworks/libs I've found that I want to look into; if anyone knows of any other good candidates I should add to the list, please let me know ASAP (note I've ruled out any projects that haven't been active in the last 6 months):

BDD:

Other test frameworks/libs:

@bobcatfish
Contributor Author

Here is some feedback from the k8s team (@thockin and @ixdy) on troubles they have run into with Ginkgo:

  • Ginkgo is a large dependency vs. the value it provides
  • The control flow in Ginkgo tests is different from most Go code and can be hard to follow, especially when new to it (e.g. registering tests via `var _ =`, BeforeSuite, AfterSuite, Context, etc.); see the sketch at the end of this comment
  • Failures are hard to debug
  • The Ginkgo maintainers sometimes do not respond to issues for years (for example this issue was open for a year before anyone replied)
  • The Ginkgo directives (e.g. It, By, Describe, Context):
    • Are often misunderstood and misused, producing test names which don’t follow BDD best practices. For example:
      • [sig-node] Dockershim [Serial] [Disruptive] [Feature:Docker] When all containers in pod are missing should complete pod sandbox clean up based on the information in sandbox checkpoint
      • [k8s.io] Security Context when creating containers with AllowPrivilegeEscalation should allow privilege escalation when not explicitly set and uid != 0
      • [k8s.io] DynamicKubeletConfiguration [Feature:DynamicKubeletConfig] [Serial] [Disruptive] When a remote config becomes the new last-known-good, and then the Kubelet is updated to use a new, bad config the Kubelet should report a status and configz indicating that it rolled back to the new last-known-good
    • The resulting test names do not display well
    • The tags in the test names are hacks to work around the fact that Ginkgo does not provide any other way to easily filter and run tests, leading to running tests with args such as --ginkgo.focus=\\[Feature:Federation\\] --ginkgo.skip=\\[Slow\\]|\\[Serial\\]|\\[Disruptive\\]|\\[Flaky\\]|\\[NoCluster\\]

See also discussion in kubernetes/kubernetes#3837 and kubernetes/kubernetes#3131.
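
To make the control-flow point concrete, here is a minimal sketch of how a typical Ginkgo suite is wired together (the suite and spec names are illustrative, not taken from our codebase):

    package conformance_test

    import (
        "testing"

        . "github.com/onsi/ginkgo"
        . "github.com/onsi/gomega"
    )

    // The single go test entry point; every spec in the package runs inside it.
    func TestConformance(t *testing.T) {
        RegisterFailHandler(Fail)
        RunSpecs(t, "Conformance Suite")
    }

    // One-time setup, registered at package init time.
    var _ = BeforeSuite(func() {
        // e.g. build clients, wait for the cluster to be ready
    })

    // Specs are also registered at init time via the `var _ =` assignment.
    var _ = Describe("Route", func() {
        Context("when a Revision becomes ready", func() {
            It("serves traffic", func() {
                Expect(true).To(BeTrue())
            })
        })
    })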

@dprotaso
Member

dprotaso commented Apr 9, 2018

This is great feedback - I've relayed it to some of our k8s & ginkgo contributors to see if they can prioritize some of the issues.

This also helped tease out some criteria that we can apply to the other testing frameworks:

  • the ability to filter and run specific tests
  • test suite randomization
  • parallel test execution
  • custom reporting & readability of the test output

My comments/questions on the other points:

The control flow in Ginkgo tests is different from most Go code

Given this control flow, Ginkgo hooks into a single go test func (i.e. func TestXX(t *testing.T) {...}), which has the side effect of not working with subtests, which were introduced in Go 1.7.
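
For comparison, a minimal sketch of the Go 1.7+ subtest pattern that this single-entry-point model can't take advantage of (the test and subtest names are made up):

    package conformance

    import "testing"

    // Subtests let plain `go test` filter and parallelize individual cases,
    // e.g. `go test -run TestRoute/serves_traffic`.
    func TestRoute(t *testing.T) {
        t.Run("creates a revision", func(t *testing.T) {
            t.Parallel()
            // ...
        })
        t.Run("serves traffic", func(t *testing.T) {
            t.Parallel()
            // ...
        })
    }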

For example Expect(err).NotTo(HaveOccurred()) (which is everywhere in Ginkgo based tests) doesn’t provide enough context to debug compared to more thoughtful assertions

Both examples provided ultimately call Expect(err).NotTo(HaveOccurred()); what provides more context is the error being matched. I'm curious: does the k8s framework produce an error with more context automatically?

e.g. test names which don’t follow BDD best practices ... The resulting test names do not display well

I'm curious if this is a side-effect of BDD: would following best practices lead to test names that are digestible at a quick glance?

how ginkgo behaves when parallel tests hang
...
the tags in the test names are hacks

Hopefully I can poke the right people to get these prioritized. In the Cloud Foundry world they've used a config plus some Ginkgo skip logic to help with focusing/selecting tests; see https://github.com/cloudfoundry/cf-acceptance-tests. This config approach helps keep tags out of test names/descriptions.
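
Roughly, that cf-acceptance-tests style looks something like the sketch below; the suiteConfig struct is made up for illustration, while Describe, BeforeEach, It, and Skip are real Ginkgo API:

    package conformance_test

    import . "github.com/onsi/ginkgo"

    // Hypothetical suite-level config, populated from flags or a config file
    // instead of encoding tags like [Slow] into the spec names.
    var suiteConfig = struct {
        IncludeSlowTests bool
    }{}

    var _ = Describe("Autoscaling", func() {
        BeforeEach(func() {
            if !suiteConfig.IncludeSlowTests {
                Skip("slow tests are disabled for this run")
            }
        })

        It("scales to zero when idle", func() {
            // ...
        })
    })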

@ixdy

ixdy commented Apr 12, 2018

One additional useful thing Ginkgo provides (loosely mentioned above) is test output (junit XML) that's easily consumable by CI systems (e.g. Jenkins or Prow).
It's also worth noting that Go 1.10 added a -json output mode for go test (go test -json ./...), though I don't think anything has integrated with it yet, and Bazel natively produces junit, too.

The other aspect of "weird code flow" is that everything is created at init time with the var _ = pattern.
Some folks are also displeased with having to import github.com/onsi/ginkgo and github.com/onsi/gomega into the top-level namespace.

The Expect(err).NotTo(HaveOccurred()) pattern is probably my largest pet peeve, mostly in that it's too easy for it to produce a meaningless error.

Part of the issue is that I can't ever remember which line in the error is the one I care about; e.g. for

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/instrumentation/monitoring/stackdriver.go:68
Expected error:
    <*errors.errorString | 0xc42009b160>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
not to have occurred
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/instrumentation/monitoring/stackdriver.go:115

do I look at line 68 or 115?

Kubernetes additionally makes it hard on ourselves because we use k8s.io/apimachinery/pkg/util/wait approximately everywhere and it's the source of that useless timed out waiting for the condition error message.
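
A sketch of the kind of wrapping that would make those failures actionable; the waitForRevisionReady helper and its condition body are made up for illustration, while wait.PollImmediate is the real apimachinery call:

    package test

    import (
        "fmt"
        "time"

        "k8s.io/apimachinery/pkg/util/wait"
    )

    // Hypothetical helper: poll until a Revision is ready, and wrap the generic
    // "timed out waiting for the condition" error with the context we actually need.
    func waitForRevisionReady(name string) error {
        err := wait.PollImmediate(1*time.Second, 2*time.Minute, func() (bool, error) {
            // check the Revision's status conditions here
            return false, nil
        })
        if err != nil {
            return fmt.Errorf("revision %q did not become ready: %v", name, err)
        }
        return nil
    }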

I think the test spec names end up bad for a few reasons:

  • some developers don't know they're supposed to be readable phrases
  • we end up composing them from disparate code locations
  • the tags end up interfering with the readability

FWIW we proposed adding more structure to tests in onsi/ginkgo#144, but it was decided that it was too hard to do that in a backwards-compatible way, so we ended up with the tag hack.

@bobcatfish
Contributor Author

Thanks for addressing these issues @dprotaso ! I'm glad to hear that these criteria will help make Ginkgo better. After looking at a lot of Go test frameworks, I'm really happy with how mature and usable Ginkgo is, especially the great documentation (which it turns out is super rare).

Expect(err)

Both examples provided ultimately call Expect(err).NotTo(HaveOccurred()). What's providing more context is the error being matched. Curious does the k8s framework produce an error with more context automatically?

If instead of calling Expect(err).NotTo(HaveOccurred()) the user had to use t.Errorf or t.Fatalf, they (hopefully) would write an actionable error message. With the k8s libs we frequently poll resources until they get into a desired state, which often fails by timing out. The difference in errors would be something like:

With Expect(err).NotTo(HaveOccurred()):

Code:

	By("The Revision will be updated when it is ready to serve traffic")
	err := test.WaitForRevisionState(clients.Revisions, revisionName, test.IsRevisionReady(revisionName))
	Expect(err).NotTo(HaveOccurred())

Error:


    Expected error:
        <*errors.errorString | 0xc4200c76c0>: {
            s: "timed out waiting for the condition",
        }
        timed out waiting for the condition
    not to have occurred

    /usr/local/google/home/christiewilson/Code/go/src/github.com/elafros/elafros/test/conformance/route_test.go:99

With t.Errorf/t.Fatalf:

Code:

	log.Println("The Revision will be marked as Ready when it can serve traffic")
	err := test.WaitForRevisionState(clients.Revisions, revisionName, test.IsRevisionReady(revisionName))
	if err != nil {
		t.Fatalf("Revision %s did not become ready to serve traffic: %v", revisionName, err)
	}

Error:

2018/04/12 13:49:31 The Revision will be marked as Ready when it can serve traffic
--- FAIL: TestRouteCreation (120.95s)
	route_test.go:80: Revision prod-00001 did not become ready to serve traffic: timed out waiting for the condition
FAIL

I think the point @jgrafton and co. are making is that in the Expect(err).NotTo(HaveOccurred()) case, it is still possible to debug the error, but the first error you are presented with is missing context (since the timeout could have theoretically happened at any point in the test).

BDD

I'm curious if this a side-effect of BDD - would following best practices lead to resulting test names being digestible at a quick glance?

You might be right! That would mean this is (maybe) one more instance of the maintainers and contributors failing to follow BDD best practices.

This is evidence that following BDD best practices is a challenge, meaning:

  1. Potential contributors will have the additional barrier of needing to come up to speed on BDD
  2. Maintainers will need to enforce those standards.

This is doable but I think we need strong evidence in support of BDD to choose this route.

The main benefit I can see for these tests is that BDD gives us meaningful "step" output as the tests run, and I don't think that's a strong enough reason to commit to BDD.

Prototypes

Following up, I've created prototypes with various frameworks/libs:

  • Godog - A Go implementation of cucumber
  • Ginkgo - New and improved
  • spec - A library (not a framework!) for BDD
  • testify - A popular Go test library (not BDD)
  • No framework - aka just the testing lib

I decided not to prototype these frameworks/libs:

  • goblin - Seems to be almost identical to Ginkgo, but with fewer features.
  • goconvey - The main feature seems to be a web UI, which we don’t need; the maintainers don’t seem to be active anymore; and it doesn’t offer anything we need that Ginkgo doesn’t.
  • gogiven - The main feature seems to be generating output such as this (the inverse of Godog), which we don’t need.
  • endly - Has a far wider scope of features than what we need, e.g. source code downloading, DB initialization, Selenium testing

(If I had to describe my ideal framework/lib for these tests, what I'd really find useful is a way of describing the state transitions we expect custom resources to experience and assert against that.)
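
Purely as a thought experiment, using such a lib might look like the sketch below; AssertStateTransitions and the State values are entirely hypothetical:

    func TestRevisionLifecycle(t *testing.T) {
        // Hypothetical: declare the states a Revision should pass through and fail
        // with a descriptive error if the observed transitions diverge.
        err := test.AssertStateTransitions(clients.Revisions, revisionName,
            []test.State{test.Pending, test.Deploying, test.Ready})
        if err != nil {
            t.Fatalf("Revision %s did not go through the expected states: %v", revisionName, err)
        }
    }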

My conclusions are that:

  1. Ginkgo is the strongest of the BDD frameworks in Golang
  2. spec is a great way to mix BDD tests in with other tests (since it's a library and not a framework)
  3. We don't have a strong reason to use BDD.

∴ I think we should move forward with "no framework" aka using the testing lib.

@josephburnett
Contributor

josephburnett commented Apr 19, 2018 via email

@dprotaso
Member

@bobcatfish @ixdy FYI, there's some work being done on Ginkgo to help with debugging hanging tests.
See: onsi/ginkgo#461 (comment)

bobcatfish added a commit that referenced this issue May 1, 2018
As explored in the prototypes in #253, we have decided that although Ginkgo is the best of the Go test frameworks, we do not want to be tied to BDD and feel these tests will be cleaner and easier to contribute to if we use no framework.

I have also refactored the supporting logic of the tests into a `test` library which can be reused for other e2e tests, documented in test/adding_tests.md

* Improved docs on running tests: all docs on running are now in test/README.md, so there is only one place to look to see how to run all of the tests. Instructions on how to run conformance tests are hopefully clearer.
* Updated links that pointed at CONTRIBUTING.md since the content moved

Fixes #253
markusthoemmes pushed a commit to markusthoemmes/knative-serving that referenced this issue Oct 9, 2019
Fix create-release-branch script.
ReToCode added a commit to ReToCode/serving that referenced this issue Apr 5, 2023
…-pick-251-to-release-v1.8

[release-v1.8] Delete the created golang cache directory during docker build