Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

query with distinct gets a unexpected result #51109

Closed
wjhuang2016 opened this issue Feb 18, 2024 · 9 comments · Fixed by #53129
Closed

query with distinct gets a unexpected result #51109

wjhuang2016 opened this issue Feb 18, 2024 · 9 comments · Fixed by #53129

Comments

@wjhuang2016
Copy link
Member

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

CREATE TABLE `t6bc5e086` (
  `col_70` bigint(20) unsigned DEFAULT '9223372036854775808',
  `col_71` tinyint(4) NOT NULL,
  `col_72` float DEFAULT NULL,
  `col_73` time NOT NULL,
  UNIQUE KEY `idx_20` (`col_71`,`col_70`,`col_72`),
  KEY `idx_21` (`col_71`,`col_70`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

INSERT INTO `t6bc5e086` VALUES(15242073906335789640,-102,9263.75,'21:28:26'),(15242073906335789640,63,8911.2295,'18:55:18'),(5650879118438068915,18,2417.9355,'03:07:39'),(2391437967716741992,72,1268.0848,'12:00:54'),(16842291581975134164,23,5269.977,'07:01:46'),(15242073906335789640,29,8801.557,'12:36:50'),(15242073906335789640,-52,7716.05,'13:01:15'),(15507298043385957721,49,927.7105,'10:29:00'),(15242073906335789640,26,6904.7275,'12:48:42'),(15242073906335789640,2,4869.162,'08:53:37'),(12140733497004968952,-73,1217.2577,'00:59:11'),(15242073906335789640,-50,9006.439,'01:28:43'),(0,84,6401.6997,'09:41:30'),(968832891468028983,-127,4482.356,'22:01:54'),(15242073906335789640,-8,3361.4575,'09:59:04'),(6536341120790540942,30,5149.999,'09:27:10'),(8216174125157395407,57,5763.5903,'17:59:59'),(15242073906335789640,-3,4413.678,'01:17:10'),(7129924612827952577,22,787.43146,'22:53:42'),(15242073906335789640,-72,8438.392,'09:51:47'),(12016244632742842738,10,1073.6714,'10:33:50'),(15242073906335789640,-7,2284.8687,'17:46:00'),(15242073906335789640,-84,8953.24,'13:37:23'),(1,-67,3319.354,'03:18:39'),(2611515382005583796,-103,2514.688,'19:36:58'),(15242073906335789640,-123,8663.264,'10:44:00'),(2,98,712.35175,'08:09:22'),(6520724307455039932,-67,7650.7134,'05:59:48'),(1028328711778178426,-109,NULL,'11:33:01'),(5030147228288485821,85,504.75803,'17:12:14'),(15787361145984624429,31,5070.2656,'23:48:56'),(17919038050022880520,2,NULL,'12:09:06'),(17021112764584971712,29,8771.466,'08:58:35'),(3874752878014189615,15,2199.217,'12:29:25'),(3065949335007035965,119,1722.856,'22:00:39'),(6485362317771755205,22,2851.3997,'05:16:32'),(16388042458896678883,58,1653.477,'00:17:59'),(4727172558070321496,94,4474.7812,'02:31:36'),(NULL,40,1003.89417,'23:12:56'),(5285172573024762477,-46,2761.6448,'17:34:17'),(17199550678785079218,0,2003.1162,'09:56:10'),(7422276380179817095,25,454.63428,'22:23:56');
(SELECT DISTINCT SUBSTR(`t6bc5e086`.`col_72`, 9) AS `r0`,`t6bc5e086`.`col_71` AS `r1` FROM `t6bc5e086`) except (SELECT SUBSTR(`t6bc5e086`.`col_72`, 9) AS `r0`,`t6bc5e086`.`col_71` AS `r1` FROM `t6bc5e086`) ;

2. What did you expect to see? (Required)

Empty result

3. What did you see instead (Required)

mysql> (SELECT DISTINCT SUBSTR(`t6bc5e086`.`col_72`, 9) AS `r0`,`t6bc5e086`.`col_71` AS `r1` FROM `t6bc5e086`) except (SELECT SUBSTR(`t6bc5e086`.`col_72`, 9) AS `r0`,`t6bc5e086`.`col_71` AS `r1` FROM `t6bc5e086`) ;
+------+----+
| r0   | r1 |
+------+----+
| 3    | 94 |
+------+----+
1 row in set (0.00 sec)

4. What is your TiDB version? (Required)

4e41699

@windtalker
Copy link
Contributor

The root case is in TiKV cast(float as string) returns wrong result


mysql>  select cast(col_72 as char(10)),col_72 from t6bc5e086 where col_71=94;
+--------------------------+-----------+
| cast(col_72 as char(10)) | col_72    |
+--------------------------+-----------+
| 4474.7812                | 4474.7812 |
+--------------------------+-----------+
1 row in set (0.00 sec)

mysql> set tidb_opt_projection_push_down=1;
Query OK, 0 rows affected (0.00 sec)

mysql>  select cast(col_72 as char(10)),col_72 from t6bc5e086 where col_71=94;
+--------------------------+-----------+
| cast(col_72 as char(10)) | col_72    |
+--------------------------+-----------+
| 4474.7813                | 4474.7812 |
+--------------------------+-----------+
1 row in set (0.00 sec)

But since float itself is not an accurate type, I think the severity can be changed to major? @jebter

@kennedy8312
Copy link

The same testcase returns empty in mysql v8.3.0.

6) except (SELECT SUBSTR(t6bc5e086.col_72, 9) AS r0,t6bc5e086.col_71ASr1FROMt6bc5e086`) ;
Empty set (0.00 sec)
mysql> select version();
+-----------+
| version() |
+-----------+
| 8.3.0 |
+-----------+

@solotzg
Copy link
Contributor

solotzg commented May 7, 2024

/assign @solotzg

@AilinKid
Copy link
Contributor

AilinKid commented May 7, 2024

package main

import (
	"fmt"
)

func main() {
	var a float32 = 4474.7812
	fmt.Printf("%b\n", a)
	fmt.Printf("%e\n", a)
	fmt.Printf("%f\n", a)
	fmt.Printf("%g\n", a)
	fmt.Printf("%.15f\n", a)
	var b float64 = 4474.7812
	fmt.Println("***************")
	fmt.Printf("%b\n", b)
	fmt.Printf("%e\n", b)
	fmt.Printf("%f\n", b)
	fmt.Printf("%g\n", b)
	fmt.Printf("%.15f\n", b)
	fmt.Println("***************")
	var c float64 = 4474.7815
	fmt.Printf("%b\n", c)
	fmt.Printf("%e\n", c)
	fmt.Printf("%f\n", c)
	fmt.Printf("%g\n", c)
	fmt.Printf("%.15f\n", c)
}

9164352p-11
4.474781e+03
4474.781250
4474.7812
4474.781250000000000
***************
4920073961153443p-40
4.474781e+03
4474.781200
4474.7812
4474.781200000000354
***************
4920074291006931p-40
4.474782e+03
4474.781500
4474.7815
4474.781500000000051

// The default precision for %e and %f is 6

if you convert float32(b) to float32, and then get float64 for comparison of original one later. you will get 4474.78125
as you see, the nearest IEEE754 double value of 4474.7812 is 4474.781200000000354, and its explicit cast
as float32 will make the truncated representation as 4474.7812, next when you try to get float64 from that, you
will get the nearest value 4474.781250000000000 in bits representation, of which the value of GetFloat64: 4474.78125 is not equivalent to the original one 4474.7812 gotten from the 4474.781200000000354.GetFloat64.

So value float64 -> float32 is not always the same and safe as we thought unless you deliberately compare it with the original field type: float32. That means you need to do the comparison based on: 4474.781250000000000.GetFloat32 And 4474.781200000000354.GetFloat32, where you will get 4474.7812 and 4474.7812.
From

ret.SetFloat32(float32(f))

downcast is dangerous or not is up to what you do next, in ranger builder, you may ganna compare it with the original thing to decide range exclusion or not even based on float64. That's definitely a dangerous thing.

@solotzg
Copy link
Contributor

solotzg commented May 8, 2024

The default behaviors of casting real type to string are different between tidb and tikv.
TODO: implement unified behavior.

@yibin87
Copy link
Contributor

yibin87 commented May 14, 2024

/remove-severity major

@yibin87
Copy link
Contributor

yibin87 commented May 14, 2024

/severity minor

@solotzg
Copy link
Contributor

solotzg commented May 15, 2024

Bug fix #53129 has been reverted because tidb-tools (such as lightning) rely on crc function of tidb. crc module needs to cast basic type into string at first, which means all behavior about casting into str cannot be modified in order to keep backward compatibility.

@solotzg
Copy link
Contributor

solotzg commented May 15, 2024

For float type, when more than 6 significant digits, the precision cannot be guaranteed in ieee rules. This issue is able to be regarded as corner case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants