Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change image comparison to PSNR #1051

Open
wants to merge 1,383 commits into
base: master
Choose a base branch
from
Open

Change image comparison to PSNR #1051

wants to merge 1,383 commits into from

Conversation

Meakk
Copy link
Contributor

@Meakk Meakk commented Oct 13, 2023

No description provided.

@Meakk Meakk self-assigned this Oct 13, 2023
@github-actions
Copy link

You are modifying libf3d public API! ⚠️Please update bindings accordingly⚠️!
You can find them in their respective directories: python, java, webassembly.

@codecov
Copy link

codecov bot commented Oct 13, 2023

Codecov Report

Merging #1051 (3ceadb2) into master (07d6c6e) will decrease coverage by 0.03%.
Report is 2 commits behind head on master.
The diff coverage is 98.48%.

@@            Coverage Diff             @@
##           master    #1051      +/-   ##
==========================================
- Coverage   96.22%   96.19%   -0.03%     
==========================================
  Files         120      120              
  Lines        7015     7069      +54     
==========================================
+ Hits         6750     6800      +50     
- Misses        265      269       +4     
Files Coverage Δ
application/F3DStarter.cxx 97.83% <100.00%> (ø)
...VTKExtensions/Rendering/vtkF3DOpenGLGridMapper.cxx 99.00% <100.00%> (+0.03%) ⬆️
...y/VTKExtensions/Rendering/vtkF3DOpenGLGridMapper.h 100.00% <100.00%> (ø)
library/VTKExtensions/Rendering/vtkF3DRenderer.cxx 99.86% <100.00%> (+<0.01%) ⬆️
library/src/image.cxx 96.29% <97.95%> (-1.61%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@mwestphal
Copy link
Contributor

Im a bit confused as to why you could remove hight thresold for vtk comparison ? is PSNR less sensible than what VTK is doing ?

Looking at the code, PSNR seems a much simpler algorithm, maybe it is better but how does it change our testing, what makes this an improvement ?

@mwestphal
Copy link
Contributor

If Im not mistaken VTK is implementing a form of SSIM. Why do you thing PNSR is a better metric ?

@Meakk
Copy link
Contributor Author

Meakk commented Oct 14, 2023

Honestly, VTK's algorithm is a bit confusing, I have no idea what it is currently doing, but it does not look like SSIM since it is supposed to return a value between 0 and 1.
Moreover, we had example in the past where VTK's comparison had failed to detect a rendering mismatch.
PSNR is quite standard (SSIM too I agree), simple to implement (I've check my implementation returns the same value than image magick), so it is returning a meaningful value for the user.
Moreover, I think the API is cleaner by splitting the metric computation and the difference image in two distinct APIs.
It happened that on our test base, a large majority of tests return an expected value (above 30, often above 50), and only some of them need a smaller threshold.

From this paper: https://projet.liris.cnrs.fr/imagine/pub/proceedings/ICPR-2010/data/4109c366.pdf

The study has revealed that the PSNR is more sensitive to additive Gaussian noise
than the SSIM, while the opposite is observed for jpeg compression.

That explains why I had to lower the threshold for raytracing tests since I think raytracing introduced noise similar to gaussian noise.

In the end, I think it would be great to have different metrics, and choose it in the tests (use SSIM for raytracing maybe?) but I dislike VTK's metric.

@mwestphal
Copy link
Contributor

VTK's algorithm is a bit confusing,

Agreed

I have no idea what it is currently doing, but it does not look like SSIM since it is supposed to return a value between 0 and 1.

It may be related in the sense that it looks for similar pixel in a square of pixels, but it is definitely not a specific SSIM algorithm we can pin point.Keep in mind it was written probably before these concepts were fully created.

Moreover, we had example in the past where VTK's comparison had failed to detect a rendering mismatch.

It was when I broke it, but I fixed it since then.

I dislike VTK's metric.

Me too, and I also agree that having a standard metric would be much beneficial.

I also read that PSNR is specialized to evalauted the quality of lossy compression, which is not at all what we are working with.

At the end of the day, what matters to me is that our CI do not become less sensitive AND do not produce more false positive.
In that objective, could you show image difference that required you to set a lower PSNR thresh ? Also Some image were very different and required a high VTK threshold, but it doesnt seem needed anymore, could you share such images that are not flagged as different by PSNR ?

@mwestphal
Copy link
Contributor

mwestphal commented Oct 15, 2023

BTW Yohann is working on implementing proper SSIM in VTK

https://gitlab.kitware.com/vtk/vtk/-/merge_requests/10508

I tagged you on it but I should have discussed directly before you started on this I suppose

@Meakk
Copy link
Contributor Author

Meakk commented Oct 15, 2023

At the end of the day, what matters to me is that our CI do not become less sensitive AND do not produce more false positive.

I agree, let me write a script to generate a table of generated+ref+diff+psnr for each test in order to evaluate PSNR properly.

@mwestphal
Copy link
Contributor

@Meakk with SSIM being added to VTK, I'm not sure we want to move forward on this one.

@Meakk
Copy link
Contributor Author

Meakk commented Nov 11, 2023

Ok, let me know when everything is in place in VTK and we can discuss about the next steps.
We can keep this PR opened for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants