Other diarization metrics #3

Open · 1 of 6 tasks
desh2608 opened this issue Mar 9, 2021 · 5 comments
Labels: enhancement (New feature or request)

Comments
desh2608 commented Mar 9, 2021

The following metrics (from pyannote and dscore) may be implemented (a rough sketch of the per-speaker JER term follows the list):

  • Diarization error rate (DER)
  • Jaccard error rate (JER)
  • Purity and coverage
  • Bcubed precision/recall
  • Goodman-Kruskal Tau
  • Mutual information
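
For reference, the per-speaker quantity behind JER (as defined in dscore) is the Jaccard distance between a reference speaker's speech regions and those of the system speaker optimally mapped to it. Below is a minimal, hypothetical sketch of just that term; the helper names are made up for illustration, and the optimal speaker mapping (e.g. via the Hungarian algorithm) and the averaging over reference speakers are omitted.

```python
# Hypothetical helpers, not spyder/dscore code: the per-speaker Jaccard error
# that JER averages over reference speakers. Assumes the intervals within each
# list are non-overlapping (true for a single speaker in a typical RTTM).

def total_duration(intervals):
    """Total duration covered by (start, end) intervals, merging overlaps."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return sum(end - start for start, end in merged)


def intersection_duration(ref_intervals, sys_intervals):
    """Total duration shared by two lists of non-overlapping intervals."""
    shared = 0.0
    for r_start, r_end in ref_intervals:
        for s_start, s_end in sys_intervals:
            shared += max(0.0, min(r_end, s_end) - max(r_start, s_start))
    return shared


def speaker_jaccard_error(ref_intervals, sys_intervals):
    """1 - |intersection| / |union| for one reference/system speaker pair."""
    union = total_duration(list(ref_intervals) + list(sys_intervals))
    if union == 0.0:
        return 0.0
    return 1.0 - intersection_duration(ref_intervals, sys_intervals) / union


# Reference speaker speaks 0-10 s, mapped system speaker speaks 2-12 s:
# intersection = 8 s, union = 12 s, Jaccard error = 1 - 8/12 ≈ 0.333.
print(speaker_jaccard_error([(0.0, 10.0)], [(2.0, 12.0)]))
```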

nryant commented Apr 5, 2023

Are JER/clustering metrics still of interest? I'd be up for adding them if I knew the PRs would get accepted.


desh2608 commented Apr 5, 2023

Hi Neville! Yeah, that would be awesome. JER is at the top of the list, but I can imagine people would be interested in the other metrics as well.

(@popcornell and I want to switch from dscore to spyder in CHiME-7 DASR, but it is blocked by JER not being implemented yet.)


nryant commented Apr 5, 2023

OK, I can add this to the TODO list. I'm in the process of rewriting dscore to eliminate the md-eval dependency and produce more detailed reports. The initial version is based on pyannote.metrics, but between the penalty of Python being an interpreted language and the repeated calls to uemify, it's not particularly quick. So it's in my interest to get faster implementations of the various metrics, and I'd rather contribute to an existing project if possible.


desh2608 commented Apr 6, 2023

Cool! Your contributions would be very welcome. In my benchmarking, I found pyannote.metrics to be an order of magnitude slower than md-eval.pl; pyannote is a great tool overall, just not suitable for fast DER evaluation :)

I'm sure spyder would benefit immensely from your expertise. Please use this thread for any questions/discussions once you get around to implementing the metrics.

nryant commented Apr 6, 2023

That sounds about right. When I benchmarked on the DIHARD III eval (full) condition, just the DER computation (with IO excluded and the Annotation/Timeline instances already built in memory) averaged over 13 seconds, compared to 3.5 seconds for running md-eval. Most of this comes from the call to IdentificationErrorRate.uemify, which constructs the equivalent of your get_eval_regions. Specifically, this block, which accounts for 10 seconds of that run time.
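
For illustration, a rough harness in the spirit of that benchmark could look like the following, with synthetic in-memory annotations standing in for the DIHARD data and a made-up make_annotation helper; only the DER call itself is timed.

```python
# Hypothetical timing sketch: build reference/hypothesis Annotations in memory
# first, then time only the DER computation so that IO is excluded.
import time

from pyannote.core import Annotation, Segment
from pyannote.metrics.diarization import DiarizationErrorRate


def make_annotation(num_segments=1000, speakers=("A", "B", "C"), seg_dur=2.0):
    """Synthetic single-file annotation with back-to-back 2-second turns."""
    ann = Annotation()
    for i in range(num_segments):
        start = i * seg_dur
        ann[Segment(start, start + seg_dur)] = speakers[i % len(speakers)]
    return ann


reference = make_annotation(speakers=("A", "B", "C"))
hypothesis = make_annotation(speakers=("spk0", "spk1", "spk2"))

metric = DiarizationErrorRate()
tic = time.perf_counter()
der = metric(reference, hypothesis)
print(f"DER = {der:.4f}, computed in {time.perf_counter() - tic:.2f} s")
```

Calling the same DiarizationErrorRate instance once per recording accumulates the per-file components, and abs(metric) then gives the aggregate DER over the whole set, so in a real evaluation the repeated per-file uemify cost is where the time goes.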

I've been updating dscore off and on for the past week for an LDC-internal project and want to finish that work first, but I will look into implementing JER in spyder after that. I think it should be relatively straightforward.
