Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using Request.Sessions for faster and less open connections #923

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

LGXerxes
Copy link

@LGXerxes LGXerxes commented Feb 2, 2024

Pull Request

Related issue

Fixes #<issue_number>

What does this PR do?

  • Changes all requests.get/etc methods to session.get/etc, in order to not create new connections for each request

PR checklist

Please check if your PR fulfills the following requirements:

  • Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
  • Have you read the contributing guidelines?
  • Have you made sure that the title is accurate and descriptive of the changes?

@LGXerxes
Copy link
Author

LGXerxes commented Feb 2, 2024

Do need some help as to why this test doesn't pass:

==================================================================== short test summary info =====================================================================
FAILED tests/errors/test_timeout_error_meilisearch.py::test_client_timeout_error - Failed: DID NOT RAISE <class 'meilisearch.errors.MeilisearchTimeoutError'>
=========================================================== 1 failed, 243 passed, 1 xfailed in 33.49s ============================================================

@sanders41
Copy link
Collaborator

sanders41 commented Feb 2, 2024

Do need some help as to why this test doesn't pass:

It fails because the test patches requests.get, but you changed the code to use session.get so the patch isn't running.

I understand why in theory this could be faster in the right situation, but did you do any benchmarking? In my extreamly quick and crude benchmark there was no improvement.

@LGXerxes
Copy link
Author

LGXerxes commented Feb 2, 2024

Do need some help as to why this test doesn't pass:

It fails because the test patches requests.get, but you changed the code to use session.get so the patch isn't running.

I understand why in theory this could be faster in the right situation, but did you do any benchmarking. In my extreamly quick and crude benchmark there was no improvement.

Haven't done any benchmarks, other than knowing that requests.get() needs to open a new TCP connection and session.get() can re-use the same connection.

The bigger "issue" which arose in my setup is Connection timeout's due to the quantity of connections were being made to my Meilisearch Server, probably nginx settings. But I also got There is no reason to not reuse a TCP connection when possible.
Other SDK's are using the same logic, by sharing the client/Session between meilisearch.Client and meilisearch.Index:
https://github.com/meilisearch/meilisearch-go/blob/main/index.go#L82-L88

Copy link
Collaborator

@sanders41 sanders41 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have an issue with it in principal. It's just that it is a change for the sake of a change if we can't show it makes any difference. @curquiza I'll let you make the call here, I don't have a preference either way.

If we do implement it I think there are a few changes we should make.

@@ -31,6 +31,7 @@ def __init__(
primary_key: Optional[str] = None,
created_at: Optional[Union[datetime, str]] = None,
updated_at: Optional[Union[datetime, str]] = None,
http: Optional[HttpRequests] = None,
Copy link
Collaborator

@sanders41 sanders41 Feb 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since HttpRequests comes from a private module it's not great to pass it around like this. I think it would be better to pass around the session because of this.

session also needs to be added to the docstring

Suggested change
http: Optional[HttpRequests] = None,
session: Optional[Session] = None,

@@ -43,7 +44,7 @@ def __init__(
Primary-key of the index.
"""
self.config = config
self.http = HttpRequests(config)
self.http = http if http is not None else HttpRequests(config)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.http = http if http is not None else HttpRequests(config)
self.session = session if session else Session()

@@ -130,6 +130,7 @@ def get_indexes(self, parameters: Optional[Mapping[str, Any]] = None) -> Dict[st
index["primaryKey"],
index["createdAt"],
index["updatedAt"],
self.http,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.http,
self.session = Session()

@@ -178,7 +179,7 @@ def get_index(self, uid: str) -> Index:
MeilisearchApiError
An error containing details about why Meilisearch can't process your request. Meilisearch error codes are described here: https://www.meilisearch.com/docs/reference/errors/error_codes#meilisearch-errors
"""
return Index(self.config, uid).fetch_info()
return Index(self.config, uid, http=self.http).fetch_info()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return Index(self.config, uid, http=self.http).fetch_info()
return Index(self.config, uid, http=self.session).fetch_info()

@@ -216,7 +217,7 @@ def index(self, uid: str) -> Index:
An Index instance.
"""
if uid is not None:
return Index(self.config, uid=uid)
return Index(self.config, uid=uid, http=self.http)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return Index(self.config, uid=uid, http=self.http)
return Index(self.config, uid=uid, http=self.session)

@@ -22,6 +22,7 @@ def __init__(self, config: Config) -> None:
"Authorization": f"Bearer {self.config.api_key}",
"User-Agent": _build_user_agent(config.client_agents),
}
self.session = requests.Session()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And add session: Session to the __init__ parameters.

Suggested change
self.session = requests.Session()
self.session = session

@LGXerxes
Copy link
Author

LGXerxes commented Feb 2, 2024

Currently trying to apply your changes, but not passing around the HttpRequests also affects how tasks are created.
As each task also creates a HttpReqeusts object.
https://github.com/LGXerxes/meilisearch-python/blob/lgx/requests-session/meilisearch/task.py#L14-L29

Perhaps it is easier to:

global_session = None

def get_global_session():
    global global_session
    if global_session:
        return global_session
    global_session = requests.Session()
    return global_session

class HttpRequests:
    def __init__(self, config: Config) -> None:
        self.config = config
        self.headers = {
            "Authorization": f"Bearer {self.config.api_key}",
            "User-Agent": _build_user_agent(config.client_agents),
        }
        self.session = get_global_session()

This way when creating HttpRequests it just fetches the session.
And makes it also easier to exchange the session for other libraries etc

@sanders41
Copy link
Collaborator

Good point. I think what we could do is have the TaskHandler __init__ take an optional session like in the Index and create the handler the same way as index. This is an "off the top of my head" solution that I haven't tested yet so feel free to point out any issues if you see some.

@sanders41
Copy link
Collaborator

Another option would be to create a http sigelton in _httprequests.py and have everything just use that.

@LGXerxes
Copy link
Author

@sanders41 I think the singleton pattern is probably the best choice here.

Don't see any issues with this implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants