Arithmetic Overflow Error with MSSQL Backend due to Integer 'id' Field Limitation #8634

Open

dxdc opened this issue Nov 11, 2023 · 6 comments · May be fixed by #8920
Comments

dxdc commented Nov 11, 2023

Checklist

  • I have verified that the issue exists against the main branch of Celery.
  • This has already been asked to the discussions forum first.
  • I have read the relevant section in the
    contribution guide
    on reporting bugs.
  • I have checked the issues list
    for similar or identical bug reports.
  • I have checked the pull requests list
    for existing proposed fixes.
  • I have checked the commit log
    to find out if the bug was already fixed in the main branch.
  • I have included all related issues and possible duplicate issues
    in this issue (If there are none, check this box anyway).

Mandatory Debugging Information

  • I have included the output of celery -A proj report in the issue.
    (if you are not able to do this, then at least specify the Celery
    version affected).
  • I have verified that the issue exists against the main branch of Celery.
  • I have included the contents of pip freeze in the issue.
  • I have included all the versions of all the external dependencies required
    to reproduce this bug.

Optional Debugging Information

  • I have tried reproducing the issue on more than one Python version
    and/or implementation.
  • I have tried reproducing the issue on more than one message broker and/or
    result backend.
  • I have tried reproducing the issue on more than one version of the message
    broker and/or result backend.
  • I have tried reproducing the issue on more than one operating system.
  • I have tried reproducing the issue on more than one workers pool.
  • I have tried reproducing the issue with autoscaling, retries,
    ETA/Countdown & rate limits disabled.
  • I have tried reproducing the issue after downgrading
    and/or upgrading Celery and its dependencies.

Related Issues and Possible Duplicates

Related Issues

Possible Duplicates

  • None

Environment & Settings

Celery version: 5.3.5

celery report Output:

software -> celery:5.3.5 (emerald-rush) kombu:5.3.3 py:3.11.6
            billiard:4.2.0 redis:5.0.1
platform -> system:Linux arch:64bit
            kernel version:6.4.16-linuxkit imp:CPython
loader   -> celery.loaders.app.AppLoader
settings -> transport:redis results:db+mssql+pyodbc://sa:**@mssql:1433/docker

broker_url: 'redis://redis:6379/0'
result_backend: 'db+mssql+pyodbc://sa:********@mssql:1433/docker'
deprecated_settings: None
result_extended: True
result_expires: None
result_backend_always_retry: True
result_backend_thread_safe: True
database_table_names: {
 'group': 'results_groupmeta', 'task': 'results_taskmeta'}

Steps to Reproduce

  • Use MSSQL Server as database backend

Required Dependencies

  • Minimal Python Version: N/A or Unknown
  • Minimal Celery Version: N/A or Unknown
  • Minimal Kombu Version: N/A or Unknown
  • Minimal Broker Version: N/A or Unknown
  • Minimal Result Backend Version: N/A or Unknown
  • Minimal OS and/or Kernel Version: N/A or Unknown
  • Minimal Broker Client Version: N/A or Unknown
  • Minimal Result Backend Client Version: N/A or Unknown

Python Packages

pip freeze Output:

SQLAlchemy==2.0.23
kombu==5.3.3
celery==5.3.5

Other Dependencies

N/A

Minimally Reproducible Test Case

celery = Celery(
    "tasks",
    broker=broker_url,
    backend=backend_url,  # must be an MSSQL URL, e.g. db+mssql+pyodbc://...
)

Expected Behavior

Celery crashes when creating the schema for the results backend table when using SQLAlchemy. The problem appears to be a type mismatch between the sa.Sequence and the sa.Integer id column on Task and TaskSet.

You can see a mismatch between the id column type (int) and the task_id_sequence (bigint).

This simple monkey patch solves the issue. While this could be a bug in SQLAlchemy, I suspect it may be an issue with the schema definition itself, so I wanted to ask here first. I've linked (above) what appears to be a related bug in SQLAlchemy that was recently fixed.

from celery.backends.database import models
from sqlalchemy import BigInteger

# Monkey Patch for MSSQL Compatibility in Celery's Result Backend
# ---------------------------------------------------------------
# This code addresses a compatibility issue when using Microsoft SQL Server (MSSQL)
# as the backend for Celery's task results. In MSSQL, the 'id' field in the results
# tables can exceed the maximum value for an Integer data type, leading to overflow errors.
# To prevent this, the data type of the 'id' column in both the Task and TaskSet models
# is changed from Integer to BigInteger. This alteration ensures that the 'id' field can
# accommodate larger values, aligning with MSSQL's capacity for handling big integers.
# This patch modifies the column type directly in the SQLAlchemy table definition
# for Celery's backend models before any tables are created in the database.
# It is a simple yet effective solution to avoid overflow issues in MSSQL
# while using Celery's database backend for storing task results.

models.Task.__table__.c.id.type = BigInteger()
models.TaskSet.__table__.c.id.type = BigInteger()
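
As a sanity check without a live SQL Server, the effect of the patch can be confirmed by compiling DDL against the MSSQL dialect. This is a standalone sketch (assumes only SQLAlchemy is installed); the table here is a stand-in mirroring the patched id column, not Celery's actual model:

```python
# Standalone sketch: confirm a BigInteger id column compiles to BIGINT
# under the MSSQL dialect. The table is a stand-in, not Celery's model.
from sqlalchemy import BigInteger, Column, MetaData, String, Table
from sqlalchemy.dialects import mssql
from sqlalchemy.schema import CreateTable

meta = MetaData()
stand_in = Table(
    "results_taskmeta",
    meta,
    Column("id", BigInteger(), primary_key=True),
    Column("task_id", String(155)),
)
ddl = str(CreateTable(stand_in).compile(dialect=mssql.dialect()))
print(ddl)  # the id column is emitted as BIGINT rather than INT
```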

Actual Behavior

sqlalchemy.exc.DataError: (pyodbc.DataError) ('22003', '[22003] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Arithmetic overflow error converting expression to data type int. (8115) (SQLExecDirectW)')
[SQL: INSERT INTO results_taskmeta (id, task_id, status, result, date_done, traceback, name, args, kwargs, worker, retries, queue) OUTPUT inserted.id VALUES (NEXT VALUE FOR task_id_sequence, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]
[parameters: ('c8c7432b-255f-49c4-b546-c7954d935e40', 'PENDING', <pyodbc.NullParam object at 0x7fea2ec0e610>, datetime.datetime(2023, 11, 11, 22, 32, 27, 680391), None, None, <pyodbc.NullParam object at 0x7fea2ec0e610>, <pyodbc.NullParam object at 0x7fea2ec0e610>, None, None, None)]
auvipy (Member) commented Nov 12, 2023

We never got any reports regarding MSSQL bugs, to be honest. Are you suggesting that we should adjust the SQLAlchemy data types in Celery to make it work properly with all backends, or with MSSQL only?

dxdc (Author) commented Nov 12, 2023

It might make sense to have a MSSQL specific change, at least for now. I think the cause may be here:

id = sa.Column(sa.Integer, sa.Sequence('task_id_sequence'),
               primary_key=True, autoincrement=True)

Sequence is not supported in all dialects (MySQL, for example), and I suspect a type mismatch between the Sequence type and the Integer type is causing the issue. In the case of MSSQL, it appears to use bigint for the Sequence.

Would something like this work?

import sqlalchemy as sa

from celery.backends.database.models import Task

@sa.event.listens_for(Task.__table__, "before_create")
def widen_id_for_mssql(target, connection, **kwargs):
    # Widen the id column to match the BIGINT sequence MSSQL creates by default.
    if connection.dialect.name == 'mssql':
        target.columns['id'].type = sa.BigInteger()

## or, maybe it would be better to eliminate the sequence and only
## add it for non-MSSQL dialects. That way, MSSQL uses a standard int
## column for the PK, which should be more than enough.

@sa.event.listens_for(Task.__table__, "before_create")
def add_sequence_for_non_mssql(target, connection, **kwargs):
    if connection.dialect.name != 'mssql':
        target.columns['id'].default = sa.Sequence('task_id_sequence')
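
A third option might be to pin the sequence's server-side type to match the column. This is only a sketch, assuming SQLAlchemy >= 1.4 (which added the `data_type` argument to `Sequence`); with it, SQL Server should create the sequence as INT instead of its BIGINT default:

```python
import sqlalchemy as sa
from sqlalchemy.dialects import mssql
from sqlalchemy.schema import CreateSequence

# Give the sequence an explicit data type so MSSQL does not default
# to BIGINT when creating it server-side.
seq = sa.Sequence("task_id_sequence", data_type=sa.Integer())
ddl = str(CreateSequence(seq).compile(dialect=mssql.dialect()))
print(ddl)  # the emitted DDL types the sequence as INTEGER
```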

Btw, is there a preferred way to customize the backend schema? For example, several of the columns are binary data (result, args, kwargs, etc.), which makes them unsearchable directly.

If I modify the result column type for example to JSON, e.g.,

from celery.backends.database import models
from sqlalchemy import JSON

models.Task.__table__.c.result.type = JSON()

That works, and has several advantages for me, because I can now query the result backend db directly.
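
To illustrate the benefit, here is a hypothetical, standalone sketch using an in-memory SQLite table as a stand-in for `results_taskmeta` (column names mirror Celery's schema, but nothing here touches Celery itself): once the column type is JSON, fields inside the stored result become directly queryable.

```python
# Standalone sketch: querying inside a JSON result column.
# The table is a stand-in for results_taskmeta, using SQLite in memory.
from sqlalchemy import (JSON, Column, Integer, MetaData, String, Table,
                        create_engine, select)

meta = MetaData()
taskmeta = Table(
    "results_taskmeta",
    meta,
    Column("id", Integer, primary_key=True),
    Column("task_id", String(155)),
    Column("result", JSON),
)

engine = create_engine("sqlite://")
meta.create_all(engine)

with engine.begin() as conn:
    conn.execute(taskmeta.insert(), [
        {"task_id": "a", "result": {"value": 1}},
        {"task_id": "b", "result": {"value": 2}},
    ])
    # Filter on a field inside the JSON result column.
    rows = conn.execute(
        select(taskmeta.c.task_id).where(
            taskmeta.c.result["value"].as_integer() == 2
        )
    ).all()
```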

auvipy (Member) commented Nov 12, 2023

Good points. We can learn from and incorporate some of the suggestions you are sharing here into the SQLAlchemy model code base.

@auvipy auvipy added this to the 5.4 milestone Nov 12, 2023
trianglesis commented Dec 6, 2023

That works, and has several advantages for me, because I can now query the result backend db directly.

This feature would be a big deal.

Right now I convert the binary fields to JSON manually (similar to Django REST serializers) to see which tasks are in progress and what data was passed to them.

UPD: currently I do it even more crudely, via pickle.loads(string).

Having task args, kwargs, and results in JSON would be a nice feature.
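
For reference, the manual decoding mentioned above looks roughly like this. A sketch only: it assumes the default pickle result serializer, and the bytes here merely stand in for a value read from the binary result column.

```python
import pickle

# Stand-in for raw bytes read from results_taskmeta.result; with the
# pickle serializer, the column holds a pickled Python object.
raw = pickle.dumps({"status": "SUCCESS", "result": 42})

# Manually decode the stored value, as described in the comment above.
decoded = pickle.loads(raw)
print(decoded["result"])
```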

twacn commented Feb 26, 2024

@dxdc If I want to use the monkeypatch you suggested while waiting for the long-term fix, where does that code need to be added (in which module)?

dxdc (Author) commented Feb 26, 2024

@twacn you add it in the same module where you initialize Celery, just before the Celery() call, e.g.

from celery.backends.database import models
from celery import Celery
from sqlalchemy import BigInteger

# add monkey patch

models.Task.__table__.c.id.type = BigInteger()
models.TaskSet.__table__.c.id.type = BigInteger()

# initialize celery

celery = Celery("tasks", broker=...)

@auvipy auvipy modified the milestones: 5.4, Future Feb 26, 2024
feiticeiro-tec added a commit to feiticeiro-tec/celery that referenced this issue Mar 22, 2024