Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seahorse http connection pool may crash after fork(2) #3020

Closed
yokonao opened this issue May 12, 2024 · 7 comments · Fixed by #3021
Closed

Seahorse http connection pool may crash after fork(2) #3020

yokonao opened this issue May 12, 2024 · 7 comments · Fixed by #3021
Assignees
Labels
bug This issue is a bug. in-progress Work is in progress to resolve the issue.

Comments

@yokonao
Copy link

yokonao commented May 12, 2024

Describe the bug

If you call fork(2) when multi-threaded AWS API calls, some error may occur in the child process.
RuntimeError: can't add a new key into hash during iteration occurs as reported in #1331.

I think calling Aws.empty_connection_pools! after fork will prevent this crash, but resetting the connection pool using Aws.empty_connection_pools! may be incomplete.
In Aws.empty_connection_pools! we call Hash#clear on the hash that is pooling the http connection, but Hash#clear does not reset the state that it is iterating.
ref. https://github.com/ruby/ruby/blob/v3_1_5/hash.c#L2867-L2869
So, it may crash even if Aws.empty_connection_pools! is executed immediately after fork(2).

Expected Behavior

We can prevent RuntimeError: can't add a new key into hash during iteration by discarding the seahorse connection pool after fork(2).

Current Behavior

RuntimeError: can't add a new key into hash during iteration may occurs even if Aws.empty_connection_pools! is executed immediately after fork(2).

Reproduction Steps

A simple ruby script for reproduction.

require 'aws-sdk-s3'
require 'aws-sdk-dynamodb'

Aws::S3::Client.new.list_buckets

# iterate Seahorse::Client::NetHttp::ConnectionPool @pool forever
Thread.new { loop { Seahorse::Client::NetHttp::ConnectionPool.pools.first.size } }

sleep 0.1 # wait a miniute for the thread to start

fork do
  Aws.empty_connection_pools!

  Aws::DynamoDB::Client.new.list_tables
end
  • In the parent process, the connection pool hash is iterated in another thread
  • Aws.empty_connection_pools! is called after fork, and call AWS API (push a new connection into the pool internally)

Executing the above script may causes RuntimeError: can't add a new key into hash during iteration.

I checked with the following versions.

  • aws-sdk-core 3.170.0
  • aws-sdk-s3 1.106.0
  • aws-sdk-dynamodb 1.84.0

Possible Solution

More radical destruction is required to ensure that the connection pooling Hash is not reused in child processes.

I have confirmed that methods like below are effective to prevent the error

class Seahorse::Client::NetHttp::ConnectionPool
  def self.discard_connection_pools
    @pools_mutex.synchronize { @pools = {} }
  end
end

Adding this method to the reproduction script:

require 'aws-sdk-s3'
require 'aws-sdk-dynamodb'

class Seahorse::Client::NetHttp::ConnectionPool
  def self.discard_connection_pools
    @pools_mutex.synchronize { @pools = {} }
  end
end

Aws::S3::Client.new.list_buckets

# iterate Seahorse::Client::NetHttp::ConnectionPool @pool forever
Thread.new { loop { Seahorse::Client::NetHttp::ConnectionPool.pools.first.size } }

sleep 0.1 # wait a miniute for the thread to start

fork do
  Seahorse::Client::NetHttp::ConnectionPool.discard_connection_pools

  Aws::DynamoDB::Client.new.list_tables
end

I have confirmed that no errors occur even after several hundred runs.

Additional Information/Context

No response

Gem name ('aws-sdk', 'aws-sdk-resources' or service gems like 'aws-sdk-s3') and its version

aws-sdk-core

Environment details (Version of Ruby, OS environment)

Ruby 3.1 (Our environment)

@yokonao yokonao added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels May 12, 2024
@mullermp
Copy link
Contributor

Thanks for opening an issue. I think we could fix the empty! method to not use iteration as well. We will investigate.

@jterapin jterapin added in-progress Work is in progress to resolve the issue. and removed needs-triage This issue or PR still needs to be triaged. labels May 13, 2024
@jterapin jterapin self-assigned this May 13, 2024
@jterapin
Copy link
Contributor

Hi! We have a pending PR in efforts to resolve this issue. I will let you know when it is merged.

I did noticed that you are using an older version of the aws-sdk-core (3.170.0). That version of the core will not have changes that was done for the previously mentioned fix.

Copy link

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.

@jterapin
Copy link
Contributor

This fix will be released with tomorrow's update

@yokonao
Copy link
Author

yokonao commented May 13, 2024

Thank you!
Does this mean that there is no possibility of the connection pool hash being iterated?

@jterapin
Copy link
Contributor

Yes, that was our intention. We did an audit of the ConnectionPool but let us know if you come across any issues after using tomorrow's release.

@mullermp
Copy link
Contributor

Does this mean that there is no possibility of the connection pool hash being iterated?

I think it will always be iterated, but I think the difference of whether it happens in C or Ruby when using fork is what matters. I think this should fix it for you!

Thanks @jterapin for picking this up while I am away at RubyKaigi :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. in-progress Work is in progress to resolve the issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants