unable to roll over quickly with multiple hosts in green mode #1253

Open
zhangyangyu opened this issue Mar 30, 2021 · 3 comments

Comments

@zhangyangyu

zhangyangyu commented Mar 30, 2021

Using psycopg2, I find that when you provide multiple hosts to connect to, it is unable to fail over quickly past unconnectable hosts.

For example, if I specify four hosts (host1, host2, host3, host4) in the DSN, and host1 and host2 are not connectable, I find that in green mode I must wait out the whole TCP timeout for host1 and then for host2 before connecting to host3 successfully. I'll explain the reason below; correct me if I'm wrong.
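
For illustration, a multi-host DSN of the kind described above could be assembled as follows. libpq accepts a comma-separated host list in the host keyword; the host names, port and database name are placeholders, and make_multihost_dsn is a hypothetical helper, not part of psycopg2.

```python
def make_multihost_dsn(hosts, port=5432, dbname="postgres"):
    """Build a libpq keyword/value DSN listing several candidate hosts."""
    if not hosts:
        raise ValueError("at least one host is required")
    # libpq tries the comma-separated hosts in the order given;
    # a single port value applies to every host in the list.
    return "host={} port={} dbname={}".format(",".join(hosts), port, dbname)


dsn = make_multihost_dsn(["host1", "host2", "host3", "host4"])
print(dsn)
```

The resulting string would then be passed to psycopg2.connect(dsn).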

For a typical green Python application using PostgreSQL, we use psycopg2's set_wait_callback, for example with eventlet's callback:

import eventlet.hubs
import psycopg2
from psycopg2 import extensions


def eventlet_wait_callback(conn, timeout=-1):
    """A wait callback useful to allow eventlet to work with Psycopg."""
    while 1:
        # poll() advances the libpq state machine and says what to wait for.
        state = conn.poll()
        if state == extensions.POLL_OK:
            break
        elif state == extensions.POLL_READ:
            eventlet.hubs.trampoline(conn.fileno(), read=True)
        elif state == extensions.POLL_WRITE:
            eventlet.hubs.trampoline(conn.fileno(), write=True)
        else:
            raise psycopg2.OperationalError(
                "Bad result from poll: %r" % state)

conn.poll uses libpq's PQconnectPoll to obtain a socket, but in nonblocking mode the returned socket is not yet connected, so we have to poll the socket to see when it's ready. This is how it's designed to work. The poll blocks until the connect fails, typically after 127s. That is too long to accept! I could hack the trampoline to give it a timeout, but that timeout would only take effect at the application level; psycopg2 and libpq won't know about it. There is no way to tell libpq "don't stick to this host, I think it has already failed; please try the next one". Inside libpq, if the connection is still not made and you call PQconnectPoll again, it moves its state machine on to the next stage.
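
A minimal sketch of the application-level timeout mentioned above, using select instead of eventlet's trampoline so that it runs with the standard library alone. The name wait_with_deadline and the 5-second default are assumptions for illustration. The POLL_* constants are redefined locally with the same values as psycopg2.extensions so the sketch does not need psycopg2 installed.

```python
import select
import time

# Same values as psycopg2.extensions.POLL_OK / POLL_READ / POLL_WRITE,
# redefined here only so the sketch runs without psycopg2.
POLL_OK, POLL_READ, POLL_WRITE = 0, 1, 2


def wait_with_deadline(conn, timeout=5.0):
    """Wait callback that gives up after `timeout` seconds overall."""
    deadline = time.monotonic() + timeout
    while True:
        state = conn.poll()
        if state == POLL_OK:
            return
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # Only the application gives up here; libpq is never told
            # to abandon the current host and try the next one.
            raise TimeoutError("connection attempt timed out")
        if state == POLL_READ:
            select.select([conn.fileno()], [], [], remaining)
        elif state == POLL_WRITE:
            select.select([], [conn.fileno()], [], remaining)
        else:
            raise RuntimeError("Bad result from poll: %r" % state)
```

Such a callback would be installed with psycopg2.extensions.set_wait_callback(wait_with_deadline). It shows the limitation described above: the timeout aborts the whole connection attempt rather than moving libpq on to the next host.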

The only way I can work around this is to tweak the tcp_syn_retries TCP option so that the connect stage fails more quickly (connect_timeout and tcp_user_timeout do not work). But of course that is far from ideal. I think psycopg2 could provide a function that tweaks the pgconn's internal status to accomplish this.
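
The 127s figure and the effect of lowering the SYN retry count can be worked out with a little arithmetic. This sketch assumes Linux behaviour with a 1-second initial retransmission timeout that doubles after every unanswered SYN; syn_connect_timeout is a hypothetical helper, and other platforms may behave differently.

```python
def syn_connect_timeout(syn_retries, initial_rto=1.0):
    """Seconds until connect() gives up when every SYN goes unanswered."""
    # One initial SYN plus `syn_retries` retransmissions; the wait after
    # each one doubles, so the waits are rto, 2*rto, 4*rto, ...
    return sum(initial_rto * 2 ** i for i in range(syn_retries + 1))


for retries in (6, 3, 1):
    print(retries, syn_connect_timeout(retries))
```

With the Linux default of 6 retries this gives 1+2+4+8+16+32+64 = 127 seconds per unreachable host; 3 retries gives 15 seconds and 1 retry gives 3 seconds.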

@dvarrazzo
Member

Hello,

I am unsure of what to do: it seems to be a problem that should be solved in libpq? It also seems that the solutions are not portable: I assume different platforms have different ways to customise the TCP/IP behaviour of the socket.

connection.fileno() returns the connection socket number, which can be used to customise the socket behaviour. Have you used it already to tweak and test what the right way to configure it should be?
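
A minimal sketch of customising a socket through its file descriptor, as suggested above. The helper name enable_keepalive and the choice of SO_KEEPALIVE are only illustrative, and AF_INET is an assumption about the socket family. Options that govern the connect phase itself only help if they are set before connect() is issued.

```python
import socket


def enable_keepalive(fd):
    """Turn on SO_KEEPALIVE for the socket behind file descriptor `fd`."""
    # fromfd() duplicates the descriptor, so closing the wrapper below
    # leaves the original descriptor (owned by libpq) open.
    wrapper = socket.fromfd(fd, socket.AF_INET, socket.SOCK_STREAM)
    try:
        wrapper.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    finally:
        wrapper.close()
```

With a psycopg2 connection this would be called as enable_keepalive(conn.fileno()).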

I am open to improving things in this area: your help would be welcome to understand what to do and how.

@zhangyangyu
Author

By the time you can get connection.fileno(), connect has already been called on the socket. I think it's too late to configure the socket behaviour at that point. The socket is created and configured inside PQconnectPoll.

The reason I raise the problem in psycopg2 is that I think there is an opportunity to solve it at the C level. If psycopg2 could expose a method that correctly configures the pgconn's internal state, it could drive PQconnectPoll to move on correctly and try the next host.
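
The behaviour being asked for can be illustrated outside libpq with plain nonblocking sockets: give each host a bounded time to answer, then move on to the next one. This is only a sketch of the desired semantics, not how libpq or psycopg2 is implemented; connect_first_reachable and per_host_timeout are hypothetical names, and IPv4 is assumed.

```python
import errno
import os
import select
import socket


def connect_first_reachable(addresses, per_host_timeout=2.0):
    """Return a socket connected to the first address that answers in time."""
    last_error = None
    for host, port in addresses:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setblocking(False)
        try:
            # A nonblocking connect returns immediately, usually EINPROGRESS.
            rc = sock.connect_ex((host, port))
            if rc not in (0, errno.EINPROGRESS):
                raise OSError(rc, os.strerror(rc))
            # Wait for the handshake, but only for per_host_timeout seconds.
            _, writable, _ = select.select([], [sock], [], per_host_timeout)
            if not writable:
                raise TimeoutError("no answer from %s:%s" % (host, port))
            # Writable does not mean connected: check the pending error.
            err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            if err:
                raise OSError(err, os.strerror(err))
            return sock
        except OSError as exc:
            last_error = exc
            sock.close()  # give up on this host and try the next one
    raise OSError("no host reachable") from last_error
```

The returned socket is still in nonblocking mode. The key point is the per-host deadline: an unreachable host costs at most per_host_timeout seconds instead of the full TCP connect timeout.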

@dvarrazzo
Member

Wouldn't psycopg have the same problem? Psycopg calls libpq to get a socket: by the time it is created, isn't it too late to configure it? If it isn't, I may be wrong, but I think that socket is passed to Python when psycopg is in green mode (i.e. there should be a first poll between PQconnectStart and PQconnectPoll): isn't that a good moment to configure the connection?

I think you are seeing the problem in more detail than I do. Please know that if there is a solution that is generic enough and portable enough, it would be good to have it in psycopg2, but I think you are the best person to suggest it.
