[VL] Result mismatch issues Tracker #4652

Open
8 of 15 tasks
zhouyuan opened this issue Feb 5, 2024 · 28 comments
Labels
bug Something isn't working triage

Comments

@zhouyuan
Contributor

zhouyuan commented Feb 5, 2024

Backend

VL (Velox)

Bug description

There are several data mismatch issues related to either operators or functions. Some of the fixes have landed in Gluten, and some are in the Velox repo.
We will use this issue to track their status, as these are critical for production environments.

@zhouyuan zhouyuan added bug Something isn't working triage labels Feb 5, 2024
@FelixYBW
Contributor

FelixYBW commented Feb 8, 2024

#4678 issue in hashagg

@FelixYBW
Contributor

#4587

Currently we have disabled all complex data reads.

@zhouyuan
Contributor Author

zhouyuan commented Mar 5, 2024

#4818

@zhouyuan
Contributor Author

zhouyuan commented Mar 7, 2024

#4872

@kecookier
Contributor

#4891

@kecookier
Contributor

#4928

@kecookier
Contributor

#4930

@kecookier
Contributor

#4947

@FelixYBW
Contributor

FelixYBW commented Mar 20, 2024

3 issues we met:

  1. parquet scan + filter pushdown wrongly returns "" when it should return null. Fixed by "Fix read parquet for different encodings across row groups" (facebookincubator/velox#9129)
  2. distinct hash agg + spill returned duplicated keys. See "distinct hash aggregate returned duplicated value if spill happens" (facebookincubator/velox#9219)
  3. max_by function returns wrong result

@FelixYBW FelixYBW pinned this issue Mar 20, 2024
@FelixYBW FelixYBW changed the title [VL] Umbrella tracker for data mismatch issues [VL] Result mismatch issues Tracker Mar 20, 2024
@ulysses-you
Contributor

  1. distinct hash agg + spill returned duplicated keys.

@FelixYBW Has this issue not been fixed by #4443 ?

@FelixYBW
Contributor

@FelixYBW Has this issue not been fixed by #4443 ?

No, it was tested on the main branch. It's a new issue.

@FelixYBW
Contributor

No, it was tested on the main branch. It's a new issue.

facebookincubator/velox#9219

@FelixYBW
Contributor

  1. max_by function returns wrong result

@yma11 Did you submit a fix to the issue?

@apache apache deleted a comment from rui-mo Mar 22, 2024
@apache apache deleted a comment from rui-mo Mar 22, 2024
@yma11
Contributor

yma11 commented Mar 23, 2024

  1. max_by function returns wrong result

@yma11 Did you submit a fix to the issue?

Not yet. I have only pushed it to the golden branch and will submit one to Velox upstream.

@NEUpanning
Contributor

#5253

@FelixYBW
Contributor

FelixYBW commented Apr 2, 2024

#5253

Looks like an issue with get_json_object. @PHILO-HE maybe we need full tests of the JSON functions, like we did for re2.

@PHILO-HE
Contributor

PHILO-HE commented Apr 3, 2024

#5253

Looks like an issue with get_json_object. @PHILO-HE maybe we need full tests of the JSON functions, like we did for re2.

@FelixYBW, I will do that. Thanks!

@FelixYBW
Contributor

FelixYBW commented Apr 3, 2024

#5248

@kecookier
Contributor

#5366

@FelixYBW
Contributor

#5366

Updated the description, thank you. Do you know which function (cast, avg, round) caused the issue?

@FelixYBW
Contributor

#5372

@yma11
Contributor

yma11 commented Apr 28, 2024

  1. max_by function returns wrong result

@yma11 Did you submit a fix to the issue?

Not yet. I have only pushed it to the golden branch and will submit one to Velox upstream.

@FelixYBW This fix should be done on the cpp side. The formal fix is in PR. Can you help review it?

@FelixYBW
Contributor

@FelixYBW This fix should be done on the cpp side. The formal fix is in PR. Can you help review it?

Is it a Gluten issue? I'd think Velox has some bug here.

@yma11
Contributor

yma11 commented Apr 30, 2024

@FelixYBW This fix should be done on the cpp side. The formal fix is in PR. Can you help review it?

Is it a Gluten issue? I'd think Velox has some bug here.

Yes, I think so. It's caused by the additional projects we added before/after shuffle. The partial/final handling logic in Velox upstream has no problem. The ideal way is to add struct support for shuffle in Gluten so that we can remove the hack.

@FelixYBW
Contributor

FelixYBW commented May 7, 2024

@PHILO-HE Any update on the issues here?

@zjuwangg
Contributor

#5682

@NEUpanning
Contributor

#5701

@PHILO-HE
Contributor

@PHILO-HE Any update on the issues here?

@FelixYBW, some were actually fixed. I just updated the list. I will fix, or seek help to fix, the other issues.


8 participants