Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: gRPC Unavailable errors are ambiguous #17701

Merged
merged 3 commits into from
Aug 17, 2017

Conversation

bdarnell
Copy link
Contributor

This error code is used for fail-fast errors (which can be retried
unambiguously), but it is also used in other cases (such as a server
draining) in which we cannot assume that the previous attempt was not
completed. (It's unclear whether this assumption was once true and
changed or if it's always been incorrect. The specific source of
ambiguous Unavailable errors we're seeing is grpc/grpc-go#1147)

This is expected to increase prevalence of AmbiguousResultErrors; this
will be fixed in a follow-up change.

Fixes #17491

@bdarnell bdarnell requested review from a team August 16, 2017 20:06
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@@ -73,3 +73,15 @@ variable "controller_root_disk_size" {
variable "controller_root_disk_type" {
default = "pd-standard" # can set this to 'pd-ssd' for persistent SSD
}

# Local path to the cockroach binary. An empty value downloads a
# pre-built binary using cockroach_sha.
Copy link
Contributor

@benesch benesch Aug 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment doesn't seem quite right.

}

# SHA of the cockroach binary to download if cockroach_binary is
# unset. Defaults to the latest master build.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here.

@tamird
Copy link
Contributor

tamird commented Aug 16, 2017

Reviewed 1 of 1 files at r1, 2 of 2 files at r2, 1 of 1 files at r3.
Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks pending.


build/jepsen/terraform/main.tf, line 184 at r2 (raw file):

      "mkdir -p /tmp/cockroach",
      "[ $(stat --format=%s cockroach) -ne 0 ] || curl -sfSL https://edge-binaries.cockroachdb.com/cockroach/cockroach.linux-gnu-amd64.${var.cockroach_sha} -o cockroach",
      "cp cockroach /tmp/cockroach/",

why not curl it directly to the destination? i'm confused about what goes on in this area of the code.


pkg/kv/dist_sender.go, line 1177 at r3 (raw file):

/ The Unavailable code is used by GRPC to indicate that a
// request fails fast and is not sent, so we can be sure there
// is no ambiguity on these errors. Note that these are common
// if a node is down.

Did you mean to remove this comment?


Comments from Reviewable

@bdarnell
Copy link
Contributor Author

Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


build/jepsen/terraform/main.tf, line 184 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

why not curl it directly to the destination? i'm confused about what goes on in this area of the code.

The destination path is mentioned twice on the preceding line, as well as in the file provisioner above, so it's easier to move/copy the file here than to adjust all of those. All of this is because jepsen expects as input a tarball containing an executable at cockroach/cockroach


build/jepsen/terraform/variables.tf, line 78 at r3 (raw file):

Previously, benesch (Nikhil Benesch) wrote…

This comment doesn't seem quite right.

What's wrong with it?


pkg/kv/dist_sender.go, line 1177 at r3 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

/ The Unavailable code is used by GRPC to indicate that a
// request fails fast and is not sent, so we can be sure there
// is no ambiguity on these errors. Note that these are common
// if a node is down.

Did you mean to remove this comment?

Done.


Comments from Reviewable

@benesch
Copy link
Contributor

benesch commented Aug 16, 2017

Review status: 3 of 4 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


build/jepsen/terraform/variables.tf, line 78 at r3 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

What's wrong with it?

Ah, never mind, I misread. I wish terraform had conditions.


Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Aug 16, 2017

Reviewed 1 of 1 files at r4.
Review status: all files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


build/jepsen/terraform/main.tf, line 146 at r4 (raw file):

      "curl https://raw.githubusercontent.com/technomancy/leiningen/stable/bin/lein > /home/ubuntu/lein",
      "chmod +x /home/ubuntu/lein",
      "cd /home/ubuntu && git clone -q https://github.com/cockroachdb/jepsen && cd jepsen && git checkout bdarnell/wip",

was this intentional? none of the commit messages mention it.


Comments from Reviewable

@bdarnell
Copy link
Contributor Author

Review status: 2 of 4 files reviewed at latest revision, 3 unresolved discussions.


build/jepsen/terraform/main.tf, line 146 at r4 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

was this intentional? none of the commit messages mention it.

Oops, fixed (the -q was intentional, the branch change was not)


Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Aug 16, 2017

Reviewed 2 of 2 files at r5.
Review status: 3 of 4 files reviewed at latest revision, 2 unresolved discussions.


Comments from Reviewable

Copied from the pre-cockroachdb#17457 version of the other terraform configs
This error code is used for fail-fast errors (which can be retried
unambiguously), but it is also used in other cases (such as a server
draining) in which we cannot assume that the previous attempt was not
completed. (It's unclear whether this assumption was once true and
changed or if it's always been incorrect. The specific source of
ambiguous Unavailable errors we're seeing is grpc/grpc-go#1147)

This is expected to increase prevalence of AmbiguousResultErrors; this
will be fixed in a follow-up change.

Fixes cockroachdb#17491
@bdarnell
Copy link
Contributor Author

Still waiting on an LGTM.

@tamird
Copy link
Contributor

tamird commented Aug 17, 2017

:lgtm:


Reviewed 2 of 2 files at r6, 2 of 2 files at r7, 1 of 1 files at r8.
Review status: all files reviewed at latest revision, 2 unresolved discussions, some commit checks pending.


Comments from Reviewable

@bdarnell bdarnell merged commit caa0565 into cockroachdb:master Aug 17, 2017
@bdarnell bdarnell deleted the jepsen-updates branch August 17, 2017 18:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

jepsen: Sets test flaky
4 participants