Result of `query_entities` of `TableClient` cannot be returned in `with` keyword successfully #25640

matt-mi · 2022-08-10T03:03:05Z

Package Name: azure-data-tables
Package Version: 12.4.0
Operating System: macos
Python Version: 3.9.13

Describe the bug
Result of query_entities of TableClient cannot be returned in with keyword

To Reproduce
Steps to reproduce the behavior:

create a record in table store. E.g., a row has value cluster4 under RowKey.
Query the record from table store by following

def get_cluster():
    cluster_name_filter = "RowKey eq 'cluster4'"
    with TableClient(endpoint=sta_endpoint, table_name=table_name, credential=credential) as table_client:
        return table_client.query_entities(query_filter=cluster_name_filter)

results = get_cluster()
for i in results:
    print(i['RowKey'])

Execute code snippet above and got exception below

(.venv) ➜  /cronjob/.venv/bin/python scheduler.py
Traceback (most recent call last):
  File "/cronjob/scheduler/scheduler.py", line 135, in <module>
    for i in results:
  File "/cronjob/.venv/lib/python3.9/site-packages/azure/core/paging.py", line 128, in __next__
    return next(self._page_iterator)
  File "/cronjob/.venv/lib/python3.9/site-packages/azure/core/paging.py", line 76, in __next__
    self._response = self._get_next(self.continuation_token)
  File "/cronjob/.venv/lib/python3.9/site-packages/azure/data/tables/_models.py", line 363, in _get_next_cb
    return self._command(
  File "/cronjob/.venv/lib/python3.9/site-packages/azure/data/tables/_generated/operations/_table_operations.py", line 380, in query_entities
    pipeline_response = self._client._pipeline.run(request, stream=False, **kwargs)
  File "/cronjob/.venv/lib/python3.9/site-packages/azure/core/pipeline/_base.py", line 211, in run
    return first_node.send(pipeline_request)  # type: ignore
  File "/cronjob/.venv/lib/python3.9/site-packages/azure/core/pipeline/_base.py", line 71, in send
    response = self.next.send(request)
  File "/cronjob/.venv/lib/python3.9/site-packages/azure/core/pipeline/_base.py", line 71, in send
    response = self.next.send(request)
  File "/cronjob/.venv/lib/python3.9/site-packages/azure/core/pipeline/_base.py", line 71, in send
    response = self.next.send(request)
  [Previous line repeated 3 more times]
  File "/cronjob/.venv/lib/python3.9/site-packages/azure/core/pipeline/policies/_redirect.py", line 158, in send
    response = self.next.send(request)
  File "/cronjob/.venv/lib/python3.9/site-packages/azure/core/pipeline/_base.py", line 71, in send
    response = self.next.send(request)
  File "/cronjob/.venv/lib/python3.9/site-packages/azure/data/tables/_policies.py", line 201, in send
    response = self.next.send(request)
  File "/cronjob/.venv/lib/python3.9/site-packages/azure/core/pipeline/_base.py", line 71, in send
    response = self.next.send(request)
  File "/cronjob/.venv/lib/python3.9/site-packages/azure/core/pipeline/_base.py", line 71, in send
    response = self.next.send(request)
  File "/cronjob/.venv/lib/python3.9/site-packages/azure/core/pipeline/_base.py", line 71, in send
    response = self.next.send(request)
  [Previous line repeated 1 more time]
  File "/cronjob/.venv/lib/python3.9/site-packages/azure/core/pipeline/_base.py", line 103, in send
    self._sender.send(request.http_request, **request.context.options),
  File "/cronjob/.venv/lib/python3.9/site-packages/azure/core/pipeline/transport/_requests_basic.py", line 327, in send
    response = self.session.request(  # type: ignore
AttributeError: 'NoneType' object has no attribute 'request'

Expected behavior
Results can be returned successfully from with keyword

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
If I'm not using with to automatically create and close TableClient, results can be returned and lopped successfully.

The text was updated successfully, but these errors were encountered:

azure-sdk · 2022-08-10T03:03:29Z

Label prediction was below confidence level 0.6 for Model:ServiceLabels: 'Tables:0.4404969,Storage:0.094060935,Azure.Core:0.06135202'

annatisch · 2022-08-10T04:05:24Z

Thanks @matt-mi!
This is an interesting scenario :)

The context manager of the TableClient will close the connection - though you are correct that this will leave the auto-iterating query response with a severed connection that will fail when it needs to collect the next page of results.
On the one hand, this is the expected behaviour.
On the other hand - that is a pretty awful stacktrace/error message, so we should look at fixing that!

There are two ways you could resolve this.
The first and easiest is to simply use list() to fully iterate the query before exiting the TableClient context:

def get_cluster():
    cluster_name_filter = "RowKey eq 'cluster4'"
    with TableClient(endpoint=sta_endpoint, table_name=table_name, credential=credential) as table_client:
        return list(table_client.query_entities(query_filter=cluster_name_filter))

results = get_cluster()
for i in results:
    print(i['RowKey'])

However the above solution will lose the incremental nature of the query, and if there's a lot of data to return this could be very slow. So the second option is to turn the entire function into a generator that will yield the results and therefore both iteratively download the data, but also not close the connection until the iteration is complete:

def get_cluster():
    cluster_name_filter = "RowKey eq 'cluster4'"
    with TableClient(endpoint=sta_endpoint, table_name=table_name, credential=credential) as table_client:
        for result in table_client.query_entities(query_filter=cluster_name_filter):
            yield result

results = get_cluster()
for i in results:
    print(i['RowKey'])

Hopefully one of these two options will fix your current error, and we will look at improving the failure experience for when the connection is closed on an orphaned iterator.

@iscai-msft, @xiangyan99 - do you know if we have tests for this scenario in the azure-core tests for ItemPaged?

matt-mi · 2022-08-10T04:25:23Z

Thanks @annatisch for your prompt response on this!

There's one thing I'm kind of confused about, which is for example if I was trying to manually create and close the TableClient connection, it's also working ..

def get_cluster():
    cluster_name_filter = "RowKey eq 'cluster4'"
    table_client = TableClient(endpoint=sta_endpoint, table_name=table_name, credential=credential)
    results = table_client.query_entities(query_filter=cluster_name_filter)
    table_client.close()
    return results

for i in get_cluster():
    print(i['RowKey'])

I'm not quite sure what's the difference and forgive me about my limited knowledge in python ...

annatisch · 2022-08-10T04:40:14Z

Thanks @matt-mi - that is.... very interesting indeed.
I've repro-ed the same behaviour and having a dig around now.

annatisch · 2022-08-10T05:01:35Z

Ah - I have discovered the reason - and there's possibly something we could look at fixing as well... though might need more discussion.

In this case, using the context manager - i.e. the with syntax, you are pre-emptively opening the connection pool, even though no request has yet been made. And in fact, this get_cluster function is not making a single request to the service, which is not done until we attempt to iterate on the results.
In other words, when opened in a context manager, that connection is then severed with the return statement.

However in the second example, simply creating the TableClient does not open the connection - this is done lazily on the first request. So the client.close() that's being called, isn't actually closing anything, because it was never opened. This call to close is being completely disregarded when the first actual service call is made outside of the function, and the client is treating this as being opened for the first time. Therefore - at the end of the script, the connection pool is not being cleaned up and the connection remains open.

@xiangyan99 - it looks like we have two questions for core here:

We should improve the error message when we attempt to use a transport that has been closed.
When we call close on a client that hasn't yet been opened - that should probably prevent it from ever being opened so that cases like this don't result in uncleaned up resources. Though we might need to look into whether this could break anyone.

matt-mi · 2022-08-10T06:23:59Z

Thank you for your detailed explanation Anna! I thought the connection was already initiated when a TableClient spawned, turns out when the item is being iterated then connection is initiated ...

So in this case if I don't want to use with keyword then I would need to close the client (table_client.close()) after everything is iterated right?

annatisch · 2022-08-10T06:47:35Z

Yes @matt-mi - that is correct, so the following (adding list() - which forces the full iteration up-front) would close the connection:

def get_cluster():
    cluster_name_filter = "RowKey eq 'cluster4'"
    table_client = TableClient(endpoint=sta_endpoint, table_name=table_name, credential=credential)
    results = list(table_client.query_entities(query_filter=cluster_name_filter))
    table_client.close()
    return results

for i in get_cluster():
    print(i['RowKey'])

Edit:
Though I should add that using the with syntax is still preferable - as it means the connection will be closed even in the case of a failure (for example, an intermittent network failure, or, say, a typo in the table name resulting in a ResourceNotFound error etc).

matt-mi · 2022-08-10T06:51:58Z

Awesome, I will try to adapt with syntax to my code. Thanks again @annatisch, you save my day!

annatisch · 2022-08-10T06:54:07Z

You're very welcome @matt-mi!
And thank you so much for reporting this!

martinitus · 2023-07-19T14:56:04Z

I found this issue after I got confused by your documentation.

your toplevel readme doesn't use with contexts in the examples.
your examples use the with statements.

As a user, I now don't know which way would be correct (with with or without with), and whether there actually is some stuff that needs to be cleaned up - or if the exiting the context has no effect in general. Also, it would be nice to know if above changes when switching from sync to async.

Having to use with statements or not can have quite an impact on the design of our application.

IMHO: If there is stuff that needs to be cleaned up, then one should have to use the with statement - otherwise users will experience weird bugs, unsubmitted requests or whatever that are hard to debug. I would "simply" prevent usage of the clients without a context. Same as with open its just really bad practice to not use a context for that. - On the other hand this will probably break the code of thousands of users xD

ghost added customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Aug 10, 2022

github-actions bot added the needs-triage This is a new issue that needs to be triaged to the appropriate team. label Aug 10, 2022

annatisch added Client This issue points to a problem in the data-plane of the library. Tables and removed needs-triage This is a new issue that needs to be triaged to the appropriate team. labels Aug 10, 2022

ghost added the needs-team-attention This issue needs attention from Azure service team or SDK team label Aug 10, 2022

annatisch added the Azure.Core label Aug 10, 2022

xiangyan99 mentioned this issue Aug 10, 2022

preventing opening transport if it is closed #25651

Closed

xiangyan99 added feature-request This issue requires a new behavior in the product in order be resolved. and removed question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Apr 14, 2023

lmazuel linked a pull request May 8, 2024 that will close this issue

More explicit error message if transport is already closed #35559

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Result of `query_entities` of `TableClient` cannot be returned in `with` keyword successfully #25640

Result of `query_entities` of `TableClient` cannot be returned in `with` keyword successfully #25640

matt-mi commented Aug 10, 2022

azure-sdk commented Aug 10, 2022

annatisch commented Aug 10, 2022

matt-mi commented Aug 10, 2022 •

edited

annatisch commented Aug 10, 2022

annatisch commented Aug 10, 2022

matt-mi commented Aug 10, 2022 •

edited

annatisch commented Aug 10, 2022 •

edited

matt-mi commented Aug 10, 2022

annatisch commented Aug 10, 2022 •

edited

martinitus commented Jul 19, 2023 •

edited

Result of query_entities of TableClient cannot be returned in with keyword successfully #25640

Result of query_entities of TableClient cannot be returned in with keyword successfully #25640

Comments

matt-mi commented Aug 10, 2022

azure-sdk commented Aug 10, 2022

annatisch commented Aug 10, 2022

matt-mi commented Aug 10, 2022 • edited

annatisch commented Aug 10, 2022

annatisch commented Aug 10, 2022

matt-mi commented Aug 10, 2022 • edited

annatisch commented Aug 10, 2022 • edited

matt-mi commented Aug 10, 2022

annatisch commented Aug 10, 2022 • edited

martinitus commented Jul 19, 2023 • edited

Result of `query_entities` of `TableClient` cannot be returned in `with` keyword successfully #25640

Result of `query_entities` of `TableClient` cannot be returned in `with` keyword successfully #25640

matt-mi commented Aug 10, 2022 •

edited

matt-mi commented Aug 10, 2022 •

edited

annatisch commented Aug 10, 2022 •

edited

annatisch commented Aug 10, 2022 •

edited

martinitus commented Jul 19, 2023 •

edited