Privacy Bug: Over-collection in Opt-in Performance Statistics #161

Open
David-Fryd opened this issue Apr 15, 2024 · 3 comments

@David-Fryd

Performance analytics are collected and written to mcaptcha_pow_analytics, even if the user has not opted in, or if they have opted out after previously opting in.

The handling of every verify_pow request results in the generation of a CreatePerformanceAnalytics and subsequent call to analysis_save, where the information is collected regardless of the opt-in status of the sitekey.
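A minimal sketch of the gating this report expects, in which the save step runs only for sitekeys that have opted in. Every name here (`AnalyticsStore`, `PerformanceRecord`, `save_if_opted_in`) is hypothetical and stands in for the real `CreatePerformanceAnalytics` / `analysis_save` path. It is not the project's actual code.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the record built while handling `verify_pow`.
#[derive(Debug, Clone, PartialEq)]
pub struct PerformanceRecord {
    pub time_ms: u32,
    pub difficulty_factor: u32,
    pub worker_type: String,
}

/// Hypothetical in-memory stand-in for the `mcaptcha_pow_analytics` table.
#[derive(Default)]
pub struct AnalyticsStore {
    opted_in: HashSet<String>,
    rows: Vec<(String, PerformanceRecord)>,
}

impl AnalyticsStore {
    /// Marks a sitekey as having opted in to performance statistics.
    pub fn opt_in(&mut self, sitekey: &str) {
        self.opted_in.insert(sitekey.to_string());
    }

    /// Withdraws the opt-in for a sitekey.
    pub fn opt_out(&mut self, sitekey: &str) {
        self.opted_in.remove(sitekey);
    }

    /// Stores the record only if the sitekey is currently opted in.
    /// Returns `true` when a row was written.
    pub fn save_if_opted_in(&mut self, sitekey: &str, record: PerformanceRecord) -> bool {
        if !self.opted_in.contains(sitekey) {
            return false;
        }
        self.rows.push((sitekey.to_string(), record));
        true
    }

    /// Number of rows currently stored.
    pub fn row_count(&self) -> usize {
        self.rows.len()
    }
}
```

With this shape, a request for a sitekey that never opted in, or that opted out later, leaves the store untouched.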

@David-Fryd changed the title from "Privacy Over-collection Bug: Opt-in Performance Statistics" to "Privacy Bug: Over-collection in Opt-in Performance Statistics" on Apr 16, 2024

@realaravinth
Member

Hello 👋

This is not a bug; it is the intended behavior. The checkbox controls whether the collected performance data should be published to the larger mCaptcha net.

PoW in CAPTCHAs is new, so it is hard to determine the correct range of difficulty factors that offer effective protection while also being accessible to a wide range of devices. This data tells us how each difficulty factor performs.

Also, right now we describe CAPTCHAs in terms of difficulty factors, which is not very intuitive. I think if we start describing them in terms of seconds (as in time), not only will the protection be more effective, but the UX will also improve significantly. To switch to a seconds model, we'll have to know how PoW performs. We have a cronjob that takes the average of all recorded times for each difficulty factor and offers instance-local statistics. This stuff isn't shared with other mCaptcha instances when the checkbox is enabled though.
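The averaging step described above can be sketched roughly as follows. The function name, the `(difficulty_factor, time_ms)` sample shape, and the millisecond unit are assumptions made for illustration. This is not the actual cronjob.

```rust
use std::collections::BTreeMap;

/// Averages recorded solve times per difficulty factor.
/// Each sample is a hypothetical `(difficulty_factor, time_ms)` pair.
pub fn average_time_per_difficulty(samples: &[(u32, u32)]) -> BTreeMap<u32, f64> {
    // Accumulate (sum of times, sample count) for every difficulty factor.
    let mut acc: BTreeMap<u32, (u64, u64)> = BTreeMap::new();
    for &(difficulty, time_ms) in samples {
        let entry = acc.entry(difficulty).or_insert((0, 0));
        entry.0 += u64::from(time_ms);
        entry.1 += 1;
    }
    // Turn each (sum, count) pair into a mean; count is always at least 1.
    acc.into_iter()
        .map(|(difficulty, (sum, count))| (difficulty, sum as f64 / count as f64))
        .collect()
}
```

A table like this is what a seconds-based model would need: given a target solve time, pick the difficulty factor whose average is closest to it.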

These are the reasons why I've enabled performance statistics collection by default. I'm open to discussion on this matter :)

@David-Fryd
Author

David-Fryd commented Apr 17, 2024

Hello! 👋

Thank you for the prompt response! :)

Why is data deleted from mcaptcha_pow_analytics when users revoke their consent for publishing data? This seems to imply that the collection of user data is somehow related to the opt-in behavior. From the system you described, it would seem as if you would only want to remove the key's pseudo_id, and not the data stored in mcaptcha_pow_analytics.
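To make the expectation concrete, here is a rough sketch of a revoke step that drops only the publishing pseudonym and leaves the locally collected rows in place. The `Instance` type, its fields, and `revoke_publishing` are hypothetical names and do not come from the codebase.

```rust
use std::collections::HashMap;

/// Hypothetical model of the two pieces of state discussed here.
#[derive(Default)]
pub struct Instance {
    /// sitekey -> pseudo_id used when publishing (assumed mapping).
    pub pseudo_ids: HashMap<String, String>,
    /// Locally collected rows: (sitekey, time_ms), standing in for
    /// `mcaptcha_pow_analytics`.
    pub analytics: Vec<(String, u32)>,
}

impl Instance {
    /// Revokes consent to publish: removes the pseudo_id only.
    /// Locally collected rows are intentionally left untouched.
    /// Returns `true` if a pseudo_id existed and was removed.
    pub fn revoke_publishing(&mut self, sitekey: &str) -> bool {
        self.pseudo_ids.remove(sitekey).is_some()
    }
}
```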

As a side note, I wasn't able to find the cronjob that you referenced. Is it in the repo somewhere?

@realaravinth
Member

> Why is data deleted from mcaptcha_pow_analytics when users revoke their consent for publishing data? This seems to imply that the collection of user data is somehow related to the opt-in behavior. From the system you described, it would seem as if you would only want to remove the key's pseudo_id, and not the data stored in mcaptcha_pow_analytics.

Oh. Good catch, that is a bug. There's also another bug in the same feature that I just noticed: we are not recalculating the PoW difficulty factor from the new parameters when the CAPTCHA configuration is updated. The infra is there, but I seem to have left it out of the update subroutine. Will send a patch by the end of the week.

The cronjob that analyzes the performance stats. The codebase needs some cleanup; it shouldn't be this difficult to find stuff 😅
