My TinyDB JSON file grows by about 5 GB per week if I do not delete anything.
Currently we just load the TinyDB data.json and delete all internalids below a given threshold.
But the major problem is that we need to close the TinyDB handle to do this, which does not work well in a multiprocessing asyncio FastAPI app.
I would like to keep at most 1000 entries and delete everything below the 1000 highest.
Any guidelines on how to do this while keeping the app running would be great.
One suggestion I got was to add an epoch timestamp to each entry and delete everything below the 1000 highest timestamps, but that bloats the table and adds extra logic.
Thanks
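For reference, a minimal sketch of the threshold-style deletion described above ('internalid' is the field name from the description; how the threshold is computed is assumed to live elsewhere):

from tinydb import TinyDB, where

def delete_below(db: TinyDB, threshold: int) -> list:
    # Removes every document whose internalid is below the threshold
    # and returns the doc_ids that were removed.
    return db.remove(where('internalid') < threshold)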
fenchu changed the title from "Feature request: get all the internalids and delete by internalid" to "is there any way to speed up deletion" on Oct 18, 2023
This can be done using db.max(), but it is slow:
from typing import List, Optional
from tinydb import TinyDB, where

db_path = 'data.json'  # module-level path and handle, opened lazily
db = None

def keep_newest(key: str = 'jobid', maxlen: int = 1000) -> Optional[List]:
    """Keep the newest maxlen entries in the database."""
    global db
    if not db:
        db = TinyDB(db_path)
    currlen = len(db.all())
    if currlen <= maxlen:
        # log.warning(f"database size is:{currlen} which is less than {maxlen} - no deletion")
        return None
    ids = []
    # NOTE: the + 1 prunes one extra document, leaving maxlen - 1 entries
    # (hence the 999 in the output below).
    for d in db.all()[:currlen - maxlen + 1]:
        removed = db.remove(where(key) == d[key])  # full table scan per call
        if removed:
            ids.append(removed)
            # log.info(f"removed {d} with index {removed}")
    return ids
number of entries in database: 10000
number of entries in database: 999
deleting 9001 took 450.83sec
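Most of those 450 seconds likely go into the per-document db.remove(where(key) == d[key]) calls, each of which scans the table and rewrites the file. A sketch of the same pruning done as a single remove() call on doc_ids instead (assuming the default auto-incrementing doc_ids, so the smallest ids are the oldest entries):

from tinydb import TinyDB

def keep_newest_by_doc_id(db: TinyDB, maxlen: int = 1000) -> list:
    # Collect all doc_ids, oldest first, and drop everything but the newest
    # maxlen in one remove() call: one storage write instead of one per document.
    doc_ids = sorted(doc.doc_id for doc in db.all())
    stale = doc_ids[:-maxlen] if len(doc_ids) > maxlen else []
    if stale:
        db.remove(doc_ids=stale)
    return stale

This should close most of the gap to the raw-JSON version while staying inside the TinyDB API.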
The direct JSON version is way faster: roughly 1875 times faster (450.83 s vs 0.24 s).
import json
import logging
from typing import List, Optional

log = logging.getLogger(__name__)

def keep_newest_json(fname: str, maxlen: int = 1000, table: str = '_default') -> Optional[List]:
    """Keep the newest maxlen entries in the database file."""
    with open(fname, 'r', encoding='utf8') as FR:
        dat = json.load(FR)
    if table not in dat:
        log.fatal(f"table:{table} not found in dat:{list(dat.keys())}")
        return None
    currlen = len(dat[table].keys())
    if currlen <= maxlen:
        log.info(f"table:{table} has {currlen} entries, less than maxlen:{maxlen}")
        return None
    ids = []
    # Keys are TinyDB doc ids in insertion order, so the first ones are oldest.
    # Same off-by-one as above: the + 1 leaves maxlen - 1 entries.
    for doc_id in list(dat[table].keys())[:currlen - maxlen + 1]:
        del dat[table][doc_id]
        ids.append(doc_id)
        # log.info(f"removed index {doc_id} from {table}")
    with open(fname, 'w', encoding='utf8') as FW:
        FW.write(json.dumps(dat, indent=2, sort_keys=True))
    return ids
number of entries in database: 10000
number of entries in database: 999
deleting 9001 took 0.24sec
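As for doing this while the app keeps running: one option is to push the blocking rewrite off the event loop and serialise it behind a lock. A sketch, assuming the keep_newest_json function above and Python 3.9+ for asyncio.to_thread; note that this only guards against concurrent prunes within one process, not against other worker processes writing data.json at the same time:

import asyncio

prune_lock = asyncio.Lock()  # serialises pruning within this worker process

async def prune_database() -> None:
    # Run the blocking file rewrite in a thread so the event loop keeps serving.
    async with prune_lock:
        await asyncio.to_thread(keep_newest_json, 'data.json', 1000)

In a FastAPI app this could be triggered from a periodic background task; for true cross-process safety an OS-level file lock or a single dedicated pruning process would still be needed.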