Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async select not returning all rows #331

Open
Subdued5455 opened this issue Apr 30, 2024 · 5 comments
Open

Async select not returning all rows #331

Subdued5455 opened this issue Apr 30, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@Subdued5455
Copy link

  1. What versions are you using?
    Oracle DB 19.22.0.0.0
    platform.platform: Linux-6.5.0-28-generic-x86_64-with-glibc2.35
    sys.maxsize > 2**32: True
    platform.python_version: 3.11.0rc1
    oracledb.version: 2.1.2
  1. Is it an error or a hang or a crash?
    An error.

  2. What error(s) or behavior you are seeing?
    When using async mode, fetches seem to end early. The issue seems to occur on every execution, though the exact number of rows returned is inconsistent. It works properly when using synchronous mode. I am relatively new to python, but believe the basic test code is correct - would be happy if this is just a problem of me doing something dumb. I am connecting to a database I have no control over.

Here is my script call and output:
python simple_test.py
Password:
async starting...
async connection acquired...
async cursor created...
async getting count(1)...
async getting rows...
async Elapsed Time: 35.00 seconds
async COUNT(1): 9375506
async total_fetched_rows: 32802
async cursor.rowcount: 32802

  1. Does your application call init_oracle_client()?
    No.
  1. Include a runnable Python script that shows the problem.
import asyncio
import oracledb
import getpass
import time


ORACLE_SOURCE_DSN=""
ORACLE_SOURCE_USER=""
ORACLE_SOURCE_PASS=""
    
async def async_test(table_name,pool):
    print(f"async starting...")
    start = time.time()
    total_fetched_rows = 0
    cursor_rowcount = 0
    async with oracledb.connect_async(user=ORACLE_SOURCE_USER,password=ORACLE_SOURCE_PASS,dsn=ORACLE_SOURCE_DSN) as connection:
        print(f"async connection acquired...")
        async with connection.cursor() as cursor:
            print(f"async cursor created...")
            print(f"async getting count(1)...")
            await cursor.execute(f"SELECT COUNT(1) FROM {table_name}")
            (row_count,) = await cursor.fetchone()
            print(f"async getting rows...")
            await cursor.execute(f'SELECT * FROM {table_name}')
            while True:
                rows = await cursor.fetchmany()
                if not rows:
                    break
                total_fetched_rows = total_fetched_rows + len(rows)
            cursor_rowcount = cursor.rowcount
    elapsed = time.time() - start
    print(f"async Elapsed Time: {elapsed:04.2f} seconds")
    print(f"async COUNT(1): {row_count}")
    print(f"async total_fetched_rows: {total_fetched_rows}")
    print(f"async cursor.rowcount: {cursor_rowcount}")

async def main():
    table_name = ""
    await async_test(table_name,pool=None)

ORACLE_SOURCE_PASS=getpass.getpass()
asyncio.run(main())
@Subdued5455 Subdued5455 added the bug Something isn't working label Apr 30, 2024
@anthony-tuininga
Copy link
Member

I haven't seen this behavior myself, but I also don't see anything obviously wrong with your code. Can you supply a full test case? It is possible that the data being fetched is important in some fashion. Ideally you have a create table statement and a PL/SQL block (or Python script) that populates the table with the data. Then the script posted above can be run to demonstrate the issue.

As an aside, why are you using the release candidate for Python 3.11? :-)

@Subdued5455
Copy link
Author

I will work on seeing if I can recreate the issue in a database I have more permissions on, with data that is generated and shareable.

Your question re: Python 3.11 prompted me to go back and check. At some point in the past as I was trying to figure this out I was just trying the different versions of python available in my package manager to see if that had any impact. I guess I just ended up on 3.11 and my version of Ubuntu has a release candidate as the current python 3.11 version:

Package: python3.11
Version: 3.11.0~rc1-1~22.04

Anyway, I will hopefully get back to you relatively soon. Thank you for all of your work on this!

@Subdued5455
Copy link
Author

Subdued5455 commented May 1, 2024

I have this test case which for me is working. I have dropped and recreated the table several times and each time it has successfully brought forth the error, however since the data load is using random data I do not know that I can guarantee it will always work. I also do not know what part of the test case is actually important - I took one of the tables that was causing issues and used its basic schema, loading it with random data.
New Table:

create table ASYNC_TEST
(
  a                 NUMBER(8) not null,
  b            VARCHAR2(6) not null,
  c               NUMBER(3) not null,
  d                  VARCHAR2(5) not null,
  e            VARCHAR2(4) not null,
  f VARCHAR2(5) not null,
  g VARCHAR2(2),
  h VARCHAR2(3),
  i VARCHAR2(4),
  j VARCHAR2(4),
  k VARCHAR2(1),
  l VARCHAR2(30),
  m NUMBER(4),
  n VARCHAR2(30),
  o VARCHAR2(1),
  p DATE not null,
  q VARCHAR2(3),
  r VARCHAR2(3),
  s DATE,
  t DATE,
  u NUMBER(7,3),
  v VARCHAR2(3),
  w VARCHAR2(1),
  x DATE,
  y DATE,
  z NUMBER(3),
  aa VARCHAR2(100),
  bb NUMBER(2),
  cc NUMBER(19) not null,
  dd NUMBER(19) not null,
  ee VARCHAR2(30),
  ff VARCHAR2(30),
  gg VARCHAR2(6),
  PRIMARY KEY(a,b,c)
);

Data Load:

INSERT INTO async_test
SELECT
dbms_random.value(0,99999999),
dbms_random.string('p',dbms_random.value(1,6)),
dbms_random.value(0,999),
dbms_random.string('p',dbms_random.value(1,5)),
dbms_random.string('p',dbms_random.value(1,4)),
dbms_random.string('p',dbms_random.value(1,5)),
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',2) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',3) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',4) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',4) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',1) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',30) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.value(0,9999) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',30) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',1) END,
sysdate + dbms_random.value(-10000,10000),
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',3) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',3) END,
sysdate + dbms_random.value(-10000,10000),
sysdate + dbms_random.value(-10000,10000),
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.value(0,9999) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',3) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',1) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN sysdate + dbms_random.value(-10000,10000) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN sysdate + dbms_random.value(-10000,10000) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.value(0,999) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',100) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.value(0,99) END,
dbms_random.value(0,9999999999999999999),
dbms_random.value(0,9999999999999999999),
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',30) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',30) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',6) END
FROM DUAL
connect by LEVEL < 1000000;

This is some example output I got:
python simple_test.py
Password:
async starting...
async connection acquired...
async cursor created...
async getting count(1)...
async getting rows...
async Elapsed Time: 153.27 seconds
async COUNT(1): 999999
async fetch_count: 7416
async total_fetched_rows: 741402
async cursor.rowcount: 741402

I had also tried originally with loading 10million records. This takes a very long time though (at least on my server):
python simple_test.py
Password:
async starting...
async connection acquired...
async cursor created...
async getting count(1)...
async getting rows...
async Elapsed Time: 115.61 seconds
async COUNT(1): 9999999
async fetch_count: 5545
async total_fetched_rows: 554302
async cursor.rowcount: 554302

Anyway, I really hope you are able to reproduce.

@anthony-tuininga
Copy link
Member

I just tried your script with Python 3.12 and did not notice any difficulties and I ran it multiple times. Asyncio is still relatively new and there have been issues reported. Can you try with Python 3.12 and the newly released python-oracledb 2.2.0 and see if you are still able to reproduce in your setup? I also used the recently released Oracle Database 23ai but that shouldn't have any bearing on this issue!

Is the database local? Or is there significant latency between the client and the database?

@Subdued5455
Copy link
Author

I just tried with Python 3.12 and the latest oracledb 2.2.0 and I'm still getting the same issue. I'm running on a virtualbox guest & the database is not local. I have not noticed any particularly noticeable latency before between the database, but I have never actually tried and measured such a thing either. It is both geographically and network-topographically quite close.

I do a lot of oracle development (SQL & PL/SQL), but I am not really much of an administrator - are there any database parameters that could possibly be of interest here? I will try a few more things and see if I can figure out anything else.

If I can figure out anything I will update here. Thank you again for all of your efforts on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants