Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add expected errors in MariaDB #834

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

ZhengLin-Li
Copy link
Contributor

fix: add expected errors in MariaDB

@bajinsheng
Copy link
Collaborator

It seems SQLancer triggers another issue in MariaDB?

@ZhengLin-Li
Copy link
Contributor Author

ZhengLin-Li commented Jun 21, 2023

I reproduced the queries on Ubuntu 22.04 and found the following:

image

The statements I run are:

DROP DATABASE IF EXISTS database11;
CREATE DATABASE database11;
USE database11;
CREATE OR REPLACE TABLE t0(c0 MEDIUMINT  UNIQUE, c1 VARCHAR(100) , c2 REAL  UNIQUE PRIMARY KEY);
CREATE OR REPLACE TABLE t1(c0 VARCHAR(100)  NOT NULL) engine=Aria;
CREATE OR REPLACE TABLE t2 LIKE t1;
SET SESSION autocommit = 1;
TRUNCATE t0  NOWAIT;
OPTIMIZE TABLE t0 NOWAIT;

Question 1: It seams that table t0 does not support optimize. How do we deal with this?
Question 2: I can suucessfully run TRUNCATE t0 NOWAIT; on my Ubuntu 22.04 but it caused an error in CI, what's the reason?

@bajinsheng
Copy link
Collaborator

1: I think it can be solved by checking whether the table uses the InnoDB storage engine. If it is difficult, we can simply ignore it as it is not a significant issue.
2. Perhaps you can try to run the full log, like database0-cur.log? I guess some statements that are ignored in the crash log, may incur this issue.

@ZhengLin-Li
Copy link
Contributor Author

ZhengLin-Li commented Jun 21, 2023

1: I think it can be solved by checking whether the table uses the InnoDB storage engine. If it is difficult, we can simply ignore it as it is not a significant issue. 2. Perhaps you can try to run the full log, like database0-cur.log? I guess some statements that are ignored in the crash log, may incur this issue.

  1. Yes, it uses InnoDB engine. But how which engine will affact optimize statement?
  2. Below is the database11.log. In table t0, we have only one boolean column, but we insert a value of Integer? It does not meet expectation.
--java.lang.AssertionError: INSERT INTO t0 VALUES (-1497191741);
--	at sqlancer.common.query.SQLQueryAdapter.checkException(SQLQueryAdapter.java:111)
--	at sqlancer.common.query.SQLQueryAdapter.execute(SQLQueryAdapter.java:93)
--	at sqlancer.Main$QueryManager.execute(Main.java:265)
--	at sqlancer.GlobalState.executeStatement(GlobalState.java:108)
--	at sqlancer.mariadb.MariaDBProvider.generateDatabase(MariaDBProvider.java:151)
--	at sqlancer.mariadb.MariaDBProvider.generateDatabase(MariaDBProvider.java:1)
--	at sqlancer.ProviderAdapter.generateAndTestDatabase(ProviderAdapter.java:52)
--	at sqlancer.Main$DBMSExecutor.run(Main.java:364)
--	at sqlancer.Main$2.run(Main.java:559)
--	at sqlancer.Main$2.runThread(Main.java:541)
--	at sqlancer.Main$2.run(Main.java:532)
--	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
--	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
--	at java.base/java.lang.Thread.run(Thread.java:829)
--Caused by: java.sql.SQLSyntaxErrorException: (conn=570) Table 'database11.t0' doesn't exist
--	at org.mariadb.jdbc.export.ExceptionFactory.createException(ExceptionFactory.java:282)
--	at org.mariadb.jdbc.export.ExceptionFactory.create(ExceptionFactory.java:370)
--	at org.mariadb.jdbc.message.ClientMessage.readPacket(ClientMessage.java:134)
--	at org.mariadb.jdbc.client.impl.StandardClient.readPacket(StandardClient.java:855)
--	at org.mariadb.jdbc.client.impl.StandardClient.readResults(StandardClient.java:794)
--	at org.mariadb.jdbc.client.impl.StandardClient.readResponse(StandardClient.java:713)
--	at org.mariadb.jdbc.client.impl.StandardClient.execute(StandardClient.java:637)
--	at org.mariadb.jdbc.Statement.executeInternal(Statement.java:941)
--	at org.mariadb.jdbc.Statement.execute(Statement.java:1067)
--	at org.mariadb.jdbc.Statement.execute(Statement.java:458)
--	at sqlancer.common.query.SQLQueryAdapter.execute(SQLQueryAdapter.java:87)
--	... 12 more
---- Time: 2023/06/21 12:30:12
-- Database: database11
-- Database version: 10.3.38-MariaDB-0ubuntu0.20.04.1
-- seed value: 11
DROP DATABASE IF EXISTS database11;
CREATE DATABASE database11;
USE database11;
CREATE OR REPLACE TABLE t0(c0 BOOLEAN  UNIQUE);
SET SESSION myisam_stats_method = nulls_unequal;
SET SESSION join_cache_level = 5;
INSERT INTO t0 VALUES (299405238);
CHECKSUM TABLE t0;
INSERT INTO t0 VALUES (-486923179);
INSERT INTO t0 VALUES (-471262417);
SET SESSION optimizer_search_depth = 4;
INSERT INTO t0 VALUES (618607252);
INSERT INTO t0 VALUES (-433413318);
INSERT INTO t0 VALUES (266645617);
SET SESSION max_length_for_sort_data = 3932438;
SET SESSION sql_auto_is_null = ON;
INSERT INTO t0 VALUES (916765166);
SET SESSION sql_log_off = ON;
SET SESSION max_sort_length = 6112935;
INSERT INTO t0 VALUES (550488471);
SET SESSION optimizer_prune_level = 0;
INSERT INTO t0 VALUES (499498418);
INSERT INTO t0 VALUES (1540761514);
SET SESSION autocommit = 1;
SET SESSION unique_checks = ON;
SET SESSION join_cache_level = 5;
INSERT INTO t0 VALUES (1476203969);
INSERT INTO t0 VALUES (-2079398998);
SET SESSION join_cache_level = 1;
INSERT INTO t0 VALUES (-839316519);
TRUNCATE t0  WAIT 1;
SET SESSION sql_buffer_result = OFF;
INSERT INTO t0 VALUES (1712230220);
SET SESSION sql_quote_show_create = OFF;
INSERT INTO t0 VALUES (1984650463);
INSERT INTO t0 VALUES (NULL);

@bajinsheng
Copy link
Collaborator

1 I guess only innodb engines can not use optimization, so perhaps you can use storage engine to identify it.
2 Is it the full log? Sometimes the column data types are weak, and accept different data types. Or the staement is rejected by database and sqlancer does not show the error information.

@ZhengLin-Li
Copy link
Contributor Author

ZhengLin-Li commented Jun 22, 2023

It seams that the error message in CI is different from before (after force push), now the only CI error message is:
AssertionError: Lock wait timeout exceeded; try restarting transaction

How will we deal with it? Still add to expected errors?

@bajinsheng
Copy link
Collaborator

Seems it is a flaky issue that cannot be reproduced stably.
Can you look into the root reason of this issue? It's definitely ideal to avoid this error message. If it is impossible or too hard, we can consider adding the error message to the expected error class.

@ZhengLin-Li
Copy link
Contributor Author

ZhengLin-Li commented Jun 22, 2023

I tested it on ubuntu 22.04 and it result in build success FOR TWO TIMES.

First run:
image
Second run:
image

I will test for serveral times and see what's the result.

@ZhengLin-Li
Copy link
Contributor Author

@bajinsheng could you return this workflow accroding to https://docs.github.com/en/actions/managing-workflow-runs/re-running-workflows-and-jobs. I can not see the re-run bottom.

@bajinsheng
Copy link
Collaborator

Done.

@ZhengLin-Li
Copy link
Contributor Author

I can not reproduce the error besides GitHub Action.

Acutally I found there are too many errors in the SQL queries we excuted. For example, we intern a null value to a NOT NULL column. And we optimize a InnoDB engine table.

I think the optimize operation is the cause but can not determine the root cause. How about adding one SHOW ENGINE INNODB STATUS statement before the optimize operation, this will let us see the InnoDB engine status? Accroding to https://stackoverflow.com/a/5837308

@mrigger
Copy link
Contributor

mrigger commented Jun 23, 2023

Acutally I found there are too many errors in the SQL queries we excuted. For example, we intern a null value to a NOT NULL column. And we optimize a InnoDB engine table.

I think we are making a deliberate tradeoff here. Checking for all these different things is quite tedious, and by inserting invalid values, we can also test the DBMSs. Also see https://github.com/sqlancer/sqlancer/blob/main/CONTRIBUTING.md#expected-errors regarding data types.

@mrigger
Copy link
Contributor

mrigger commented Jun 23, 2023

I tested it on ubuntu 22.04 and it result in build success FOR TWO TIMES.

Did you use the same MariaDB version we use in the CI? I assume you also copied the statements for running the tests?

@ZhengLin-Li
Copy link
Contributor Author

ZhengLin-Li commented Jun 23, 2023

I tested it on ubuntu 22.04 and it result in build success FOR TWO TIMES.

Did you use the same MariaDB version we use in the CI? I assume you also copied the statements for running the tests?

Yes, the same version. Yes, copied the logs from CI.

I tested again and found the reason for these two seccuss #834 (comment) here is that, SQLancer on my local mechine ran 0 queries and 0 was succesful, resulting the build success.

So, the conclusion is that, the problem still exists. It can be reproduced in CI, although it can not be reproduced on my local ubuntu.

I will try to reproduce it on my local ubuntu again.

@mrigger
Copy link
Contributor

mrigger commented Jun 23, 2023

SQLancer on my local mechine ran 0 queries and 0 was succesful, resulting the build success.

It seems like some kind of deadlock is triggered?

@ZhengLin-Li
Copy link
Contributor Author

ZhengLin-Li commented Jun 23, 2023

I tested again in GitHub's Codespace (Ubuntu 22.04) and got 0 queries executed again. I am wondering if there are any errors with my commands. Could anyone help take a look?

The commands I run are, (same as they in ci file):

sudo apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xF1656F24C74CD1D8
sudo add-apt-repository 'deb [arch=amd64,arm64,ppc64el] http://sfo1.mirrors.digitalocean.com/mariadb/repo/10.3/ubuntu bionic main'   
sudo apt update
sudo apt install mariadb-server
sudo service mysql start
sudo mysql -uroot -proot -e "CREATE USER 'sqlancer'@'localhost' IDENTIFIED BY 'sqlancer'; GRANT ALL PRIVILEGES ON * . * TO 'sqlancer'@'localhost';"

git clone https://github.com/sqlancer/sqlancer.git
cd sqlancer
MARIADB_AVAILABLE=true mvn -Dtest=TestMariaDB test

@ZhengLin-Li
Copy link
Contributor Author

SQLancer on my local mechine ran 0 queries and 0 was succesful, resulting the build success.

It seems like some kind of deadlock is triggered?

Not sure what's the reason. Could you help take a look at the commands I ran above?

@bajinsheng
Copy link
Collaborator

It looks like a deadlock bug. If so, it is hard to bypass this bug in the CI.

@mrigger
Copy link
Contributor

mrigger commented Jun 24, 2023

According to

"--num-queries", "0", "mariadb" }));
, it seems expected that we don't generate any queries. I remember that queries often crashed MariaDB, so this might be the reason why we disabled it. So perhaps this is an issue with multiple threads and generating databases? @ZhengLin-Li, perhaps you could try to set the number of threads to 1 here, to see if we can avoid the deadlock issue?

@ZhengLin-Li
Copy link
Contributor Author

It looks like a deadlock bug. If so, it is hard to bypass this bug in the CI.

You mean the 0 query bug? I solved it by running setting --num-threads 1. This bug only happens on my Ubuntu, and it did not occur in CI.

According to
 

"--num-queries", "0", "mariadb" }));

 
, it seems expected that we don't generate any queries. I remember that queries often crashed MariaDB, so this might be the reason why we disabled it. So perhaps this is an issue with multiple threads and generating databases? @ZhengLin-Li, perhaps you could try to set the number of threads to 1 here, to see if we can avoid the deadlock issue?

Oh, yes, setting --num-threads 1 solved the issue. I tested it again with the command java -jar sqlancer-*.jar --num-threads 1 mariadb and it can successfully run queries. 

But the problem is that I still cannot reproduce the Lock wait timeout exceeded errors as those in CI.

I analyzed the CI log and found all assertion errors were triggered by the optimize table statement, but all optimize table statements on my Ubuntu work fine.

@mrigger
Copy link
Contributor

mrigger commented Jun 24, 2023

In this case, what about disabling OPTIMIZE for now using the pattern described in https://github.com/sqlancer/sqlancer/blob/main/CONTRIBUTING.md#unfixed-bugs?

@ZhengLin-Li
Copy link
Contributor Author

In this case, what about disabling OPTIMIZE for now using the pattern described in https://github.com/sqlancer/sqlancer/blob/main/CONTRIBUTING.md#unfixed-bugs?

So in this case, we need to report this issue to the MariaDB team and refer to their issue page?

@bajinsheng
Copy link
Collaborator

If you can produce a minimal test case to reproduce the issue, say executing several SQL statements triggers this deadlock, you can report it to MySQL.

@mrigger
Copy link
Contributor

mrigger commented Jun 26, 2023

In this case, what about disabling OPTIMIZE for now using the pattern described in https://github.com/sqlancer/sqlancer/blob/main/CONTRIBUTING.md#unfixed-bugs?

So in this case, we need to report this issue to the MariaDB team and refer to their issue page?

This would be ideal, as suggested by @bajinsheng, but we could also just refer to this PR for now (or create a separate issue in the repo so someone can investigate it in the future).

@ZhengLin-Li
Copy link
Contributor Author

Do not know why it is still reporting errors after we using https://github.com/sqlancer/sqlancer/blob/main/CONTRIBUTING.md#unfixed-bugs

@bajinsheng
Copy link
Collaborator

This error happens in the TRUNCATE statement, so you should add expected errors to https://github.com/sqlancer/sqlancer/blob/main/src/sqlancer/mariadb/gen/MariaDBTruncateGenerator.java
Otherwise, the error can not be caught/


if (MariaDBBugs.bug835) {
errors.add("Specified key was too long; max key length is 2300 bytes");
errors.add("Lock wait timeout exceeded; try restarting transaction");
Copy link
Contributor

@mrigger mrigger Jun 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought rather than setting the deadlock as expected, we could just skip generating the statement(s) causing it. Also, the ("Specified key was too long; max key length is 2300 bytes" statement doesn't seem to be part of the bug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. avoiding the generation of such statements also make sense. will try to do this
  2. Specified key was too long; max key length is 2300 bytes is a bug here: https://github.com/sqlancer/sqlancer/actions/runs/5357089828/jobs/9717469891
image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks like a new limitation? In this case, we shouldn't consider it as a bug, but just add it as an expected error.

@bajinsheng
Copy link
Collaborator

You disabled the TRUNCATE statements? I meant adding expected errors to the generator of `TRUNCATE1

@mrigger
Copy link
Contributor

mrigger commented Jun 30, 2023

You disabled the TRUNCATE statements? I meant adding expected errors to the generator of `TRUNCATE1

Does this statement result in a deadlock? If so, we might want to avoid generating the statement, since otherwise, SQLancer cannot generate any additional tests. If not, adding an expected error might be better.

@@ -126,6 +129,9 @@ public void generateDatabase(MariaDBGlobalState globalState) throws Exception {
query = MariaDBInsertGenerator.insert(globalState.getSchema(), globalState.getRandomly());
break;
case OPTIMIZE:
if (MariaDBBugs.bug834) {
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic here looks and below quite complex. Can we just set nrPerformed to 0?

@ZhengLin-Li
Copy link
Contributor Author

You disabled the TRUNCATE statements? I meant adding expected errors to the generator of `TRUNCATE1

Does this statement result in a deadlock? If so, we might want to avoid generating the statement, since otherwise, SQLancer cannot generate any additional tests. If not, adding an expected error might be better.

After analsis, I think such a TRUNCATE statement will not result in deadlock, so maybe we can add it to the expected errors.

@ZhengLin-Li
Copy link
Contributor Author

ZhengLin-Li commented Jun 30, 2023

From the CI, it turned out that, presently, we got:

  • 198 java.lang.AssertionError:
    • 46 java.lang.AssertionError: Got error -2 "Internal error < 0 (Not system error)" from storage engine InnoDB
    • 148 java.lang.AssertionError: Operation failed
    • 4 java.lang.AssertionError: Error updating stats for table 't0' after table rebuild: Lock wait timeout

I think we also need add this errors into exected errors?

@mrigger
Copy link
Contributor

mrigger commented Jun 30, 2023

Some of these seem like potential bugs that we should look into and potentially report. I guess this might take a while, so not sure if you might want to focus on StoneDB first?

@bajinsheng
Copy link
Collaborator

I asked @ZhengLin-Li to fix this ERROR in CI so that I can also merge his PR. Since this bug is hard to fix and you can help to merge pr, I think we can lower its priority now.

@ZhengLin-Li
Copy link
Contributor Author

ok, I will just add it to the todo list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants