Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Database is locked error in kypher #657

Open
saggu opened this issue Jul 5, 2022 · 4 comments
Open

Database is locked error in kypher #657

saggu opened this issue Jul 5, 2022 · 4 comments
Assignees
Labels
bug Something isn't working

Comments

@saggu
Copy link
Member

saggu commented Jul 5, 2022

Describe the bug
Running this command,

kgtk("""
    query -i "$OUT"/derived.class.P1963computed.subclass.tsv.gz -i subclasscountstar
        --match '
            computed:
                (class)-[edge_id:P1963subclass_star]->(property),
                (edge_id)-[:P1114]->(property_count),
            count:
                (class)-[]->(instance_count)'
        --return 'edge_id as node1, "P1107" as label, cast(property_count, float)/cast(instance_count, float) as node2'
    / add-id --id-style wikidata
    / cat -i - -i "$OUT"/derived.class.P1963computed.subclass.tsv.gz
    / sort
    -o $OUT/derived.P1963computed.subclass.count.star.tsv.gz
""")

throws an error,

Exception in thread background thread for pid 108149:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env-ckg07/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env-ckg07/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env-ckg07/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env-ckg07/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env-ckg07/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env-ckg07/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_2: 

  RAN: /usr/bin/bash -c 'kgtk      query -i "$OUT"/derived.class.P1963computed.subclass.tsv.gz -i subclasscountstar         --match '"'"'             computed:                 (class)-[edge_id:P1963subclass_star]->(property),                 (edge_id)-[:P1114]->(property_count),             count:                 (class)-[]->(instance_count)'"'"'         --return '"'"'edge_id as node1, "P1107" as label, cast(property_count, float)/cast(instance_count, float) as node2'"'"'     / add-id --id-style wikidata     / cat -i - -i "$OUT"/derived.class.P1963computed.subclass.tsv.gz     / sort     -o $OUT/derived.P1963computed.subclass.count.star.tsv.gz'

  STDOUT:


  STDERR:

INTERNAL ERROR: sqlite3.database is locked

[Errno 32] Broken pipe
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
BrokenPipeError: [Errno 32] Broken pipe

Running this command second time works, this is in a cell in a notebook, I dont know what triggers this error. But it seems to be happening consistently.

Creating this issue to debug in future

@versant2612
Copy link

I got this error too.

I tried to run a simpler query while a more complex and time consuming one is running in the background.

(base) root@vm096:/app/kgtk/temp# kgtk query -i $GRAPH_CLAIMS --match '(item)-[:P1340]->(value1), (item)-[:P1340]->(value2)' --where 'value1 < value2' --return 'item, value1, value2' --limit 50 > prop_P1340.txt &
[1] 54515
(base) root@vm096:/app/kgtk/temp# database is locked
[1]+ Exit 1 kgtk query -i $GRAPH_CLAIMS --match '(item)-[:P1340]->(value1), (item)-[:P1340]->(value2)' --where 'value1 < value2' --return 'item, value1, value2' --limit 50 > prop_P1340.txt

(base) root@vm096:/app/kgtk/temp# ps -ef | grep kgtk
root 53995 52878 31 17:36 pts/0 01:52:25 /app/kgtk/miniconda3/bin/python /app/kgtk/miniconda3/bin/kgtk query -i /app/kgtk/data/wikidata/claims.tsv.gz --as claims -i /app/kgtk/data/wikidata/alias.en.tsv.gz --as aliases --match c: (item)-[p1]->(value1), c: (item)-[p2]->(value2), c: (prop)-[:P31]->(:Q18608871), a: (prop)-[:alias]->(value3) --where value1 < value2 and p1.label = p2.label and p1.label = prop --return prop, value3

root 54525 54423 0 23:38 pts/1 00:00:00 grep --color=auto kgtk

@versant2612
Copy link

I got this error just trying to check database cache while a query was runing

(base) root@vm096:/app/kgtk/temp# kgtk query --show-cache
database is locked

@chalypso
Copy link
Collaborator

The SQLite3 database uses a rollback journal by default which uses an exclusive lock when the database is written. That prevents any concurrent reading activity, thus the error when you try to read something from the DB if another query currently imports data in the background or tries to create an index or any other update activity. --show-cache also needs to read the database and fails for similar reasons. See here for some background on DB locking: https://www.sqlite.org/lockingv3.html

There is a disclaimer in the query manual against the use of query pipes for these reasons, but it might not be clear that that also applies to other scenarios where the DB gets written.

I am not quite sure why the pipe at the beginning of this thread fails, but some of the generic KGTK commands also use or try to read the DB, which would explain why it fails with the same error.

There is a write-ahead-logging (or WAL) journal mode that I could enable by default. It does allow concurrent readers (but only one writer) which would fix the problem. I've played with that in the past. There are some advantages and disadvantages (see here: https://www.sqlite.org/wal.html). Let me look into that again to see if it's safe to change to that.

@CraigMiloRogers CraigMiloRogers added this to To do in KGTK Development via automation Nov 21, 2022
@CraigMiloRogers CraigMiloRogers added the bug Something isn't working label Nov 21, 2022
@chalypso
Copy link
Collaborator

I pushed a fix to dev that changes the default journal mode to WAL which allows one writer and multiple parallel readers which should solve the various problems above. Unfortunately, that does slow down import of large data files significantly, so I added a new --single-user mode option that switches back to the old journal mode and behavior to get the faster import times for those files if needed (see manual). It's ok to switch back and forth between the two modes without causing problems for the DB.

In the process I also upgraded query pipelines which are working nicely now, before they were very limited.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Development

No branches or pull requests

4 participants