
Increase gRPC message size #759

Open · LahiLuk opened this issue Sep 20, 2023 · 5 comments

LahiLuk commented Sep 20, 2023

Hello,

I encountered the following error while running a task with TPI:

│ Error: Plugin error
│ 
│   with iterative_task.data-segmentation,
│   on main.tf line 6, in resource "iterative_task" "data-segmentation":
│    6: resource "iterative_task" "data-segmentation" {
│ 
│ The plugin returned an unexpected error from plugin.(*GRPCProvider).PlanResourceChange: rpc error: code =
│ ResourceExhausted desc = grpc: received message larger than max (9953646 vs. 4194304)

Due to the error, the provisioned EC2 instance stopped producing the expected outputs but kept running. I could not run terraform destroy and had to terminate the instance and all other resources manually.

When run on a smaller subset of the data, the same task completes successfully.

If I understand correctly, gRPC uses a default maximum message size of 4 MB unless it is configured to allow larger messages. Is there a way for TPI plugin users to configure this setting?
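
For reference, the 4194304 in the error message is gRPC's default 4 MiB receive limit. In plain google.golang.org/grpc terms, the knobs that raise it look roughly like the sketch below; this is illustrative only (the address and the 64 MiB value are placeholders), not TPI's actual code, since the Terraform plugin protocol configures these options internally.

package main

import (
    "log"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
)

// 64 MiB instead of gRPC's default 4 MiB (4194304-byte) receive limit.
const maxMsgSize = 64 << 20

func main() {
    // Server side: raise the receive/send limits for every RPC.
    server := grpc.NewServer(
        grpc.MaxRecvMsgSize(maxMsgSize),
        grpc.MaxSendMsgSize(maxMsgSize),
    )
    defer server.Stop()

    // Client side: raise the per-call receive limit via default call options.
    // "localhost:50051" is a placeholder address.
    conn, err := grpc.Dial("localhost:50051",
        grpc.WithTransportCredentials(insecure.NewCredentials()),
        grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(maxMsgSize)),
    )
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()
}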

Environment Details:

Terraform v1.5.7
on linux_amd64
+ provider registry.terraform.io/iterative/iterative v0.11.19
dacbd (Contributor) commented Sep 23, 2023

@LahiLuk I'll have to look into it. Can you share a minimal example that results in the error? If it is as simple as adding an option when we initialize the gRPC client, feel free to open a PR yourself. Currently I can't give you a timeline for when I can resolve this.

LahiLuk (Author) commented Sep 29, 2023

Hi @dacbd,

Anything that produces a log larger than 4 MB should result in the error. Here's an example:

terraform {
  required_providers { iterative = { source = "iterative/iterative" } }
}
provider "iterative" {}

resource "iterative_task" "grpc-error-example" {
  cloud   = "aws"
  machine = "t2.micro"
  spot    = -1
  image   = "ubuntu"
  region  = "eu-west-1"

  storage {
    workdir = ""
    output  = ""
  }
  script = <<-END
    #!/bin/bash
    while true; do
      echo "Hello, World!"
      sleep 0.01  # Slow down log creation a bit
    done
  END
}

I'm not sure if I'll be able to open a PR since I'm new to Terraform and have never used Go, but I'll try to look into it.
In any case, it seems that other providers have been able to increase the maximum message size; see, for example, terraform-plugin-go.

In the meantime, do you have a suggestion for a workaround? A crude one I came up with is to redirect all shell output to a file, but that complicates log monitoring. Since TPI is geared towards running ML experiments, where logs tend to be quite detailed and datasets large, I'm a bit surprised that no one has run into this issue before.

dacbd (Contributor) commented Sep 29, 2023 via email

LahiLuk (Author) commented Sep 29, 2023

@dacbd,

I also forgot to mention: it seems that the task itself actually keeps running after the error, and the logs keep being written to S3. It's just that any terraform commands run locally fail with the error.

dacbd (Contributor) commented Sep 29, 2023

@0x2b3bfa0 can you try and take a look at this? I'm hoping it might be as simple as updating our terraform-plugin-sdk, or we may need to add something more to the plugin.Serve call: https://github.com/iterative/terraform-provider-iterative/blob/763e7a1026bca3d31790727c52dacb5e02e98abf/main.go#L13C2-L17
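
For whoever picks this up, below is a rough sketch of where a larger limit could be injected at the underlying hashicorp/go-plugin layer (not the terraform-plugin-sdk wrapper that main.go actually calls). The HandshakeConfig and Plugins wiring that the SDK normally supplies is elided, so treat it as an illustration of the knob rather than a drop-in patch.

package main

import (
    "github.com/hashicorp/go-plugin"
    "google.golang.org/grpc"
)

// grpcServerWithLargerLimits appends message-size options to whatever
// options go-plugin passes in, then builds the default gRPC server.
func grpcServerWithLargerLimits(opts []grpc.ServerOption) *grpc.Server {
    opts = append(opts,
        grpc.MaxRecvMsgSize(64<<20), // 64 MiB instead of the default 4 MiB
        grpc.MaxSendMsgSize(64<<20),
    )
    return plugin.DefaultGRPCServer(opts)
}

func main() {
    // Illustration only: a real provider would also supply the
    // HandshakeConfig and Plugins that terraform-plugin-sdk normally
    // wires up, and would be launched by Terraform itself.
    plugin.Serve(&plugin.ServeConfig{
        GRPCServer: grpcServerWithLargerLimits,
    })
}

If a newer terraform-plugin-sdk / terraform-plugin-go release already raises these limits, bumping the dependency would of course be the simpler fix.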
