Does mdeval compute DER taking overlapped speech into account? #62

Open
AntoineBlanot opened this issue Feb 6, 2024 · 9 comments

@AntoineBlanot

AntoineBlanot commented Feb 6, 2024

Thank you very much for sharing this repository. It is very useful to have a single repo with many audio metrics :)

Tools like pyannote allow us to choose the collar and whether or not to compute DER on overlapped speech regions.

With mdeval, we can specify the collar, but there seems to be no option for including or excluding overlapped speech in the metric.
Does that mean that it computes over overlapped regions by default? Or are they excluded from the calculation?

Thank you for your answer !

@thequilo
Member

thequilo commented Feb 6, 2024

Hi @AntoineBlanot! I had to check that first. md-eval-22.pl has an option -o:

-o to include overlapping speech in MD evaluation. With this option, separate recognition passes are made for each reference speaker.

This option is, however, currently not set by our wrapper.

Even if the option is not set, I found the following things:

  • md_eval ignores self-overlap and treats these regions as if the speaker was active continuously. It does warn about self-overlap though.
import meeteval

meeteval.der.md_eval_22(
    meeteval.io.asseglst([
        {'speaker': 'A', 'start_time': 0, 'end_time': 7, 'session_id': 'X', 'words': ''},
        {'speaker': 'A', 'start_time': 4, 'end_time': 10, 'session_id': 'X', 'words': ''},
    ]),
    meeteval.io.asseglst([{'speaker': 'A', 'start_time': 0, 'end_time': 10, 'session_id': 'X', 'words': ''}]),
)
# WARNING:  speaker A speaking more than once at time 4
# WARNING:  speaker A speaking more than once at time 4
# WARNING:  speaker A speaking more than once at time 4
# WARNING:  speaker A speaking more than once at time 4
# {'X': DiaErrorRate(error_rate=Decimal('0.00'), scored_speaker_time=Decimal('10.000000'), missed_speaker_time=Decimal('0.000000'), falarm_speaker_time=Decimal('0.000000'), speaker_error_time=Decimal('0.000000'))}
  • md_eval seems to compute DER for overlapping regions even if -o is not set (scored_speaker_time is 20, so both speakers are counted).
meeteval.der.md_eval_22(
    meeteval.io.asseglst([
        {'speaker': 'A', 'start_time': 0, 'end_time': 10, 'session_id': 'X', 'words': ''},
        {'speaker': 'B', 'start_time': 0, 'end_time': 10, 'session_id': 'X', 'words': ''},
    ]),
    meeteval.io.asseglst([
        {'speaker': 'A', 'start_time': 0, 'end_time': 10, 'session_id': 'X', 'words': ''},
        {'speaker': 'B', 'start_time': 0, 'end_time': 10, 'session_id': 'X', 'words': ''},
    ]),
)
# {'X': DiaErrorRate(error_rate=Decimal('0.00'), scored_speaker_time=Decimal('20.000000'), missed_speaker_time=Decimal('0.000000'), falarm_speaker_time=Decimal('0.000000'), speaker_error_time=Decimal('0.000000'))}
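
To make the two observations above concrete, here is a minimal pure-Python sketch of how scored speaker time behaves with and without overlapped regions. It is a simplified model written for illustration, not md-eval's actual algorithm; the function name `scored_speaker_time` and the `skip_overlap` parameter are assumptions and not part of MeetEval or md-eval.

```python
def scored_speaker_time(segments, skip_overlap=False):
    """Sum reference speaker time over a list of (speaker, start, end) tuples.

    Simplified illustration only, not md-eval's implementation.
    With skip_overlap=True, regions with more than one active speaker are not scored.
    """
    # Split the timeline into elementary intervals at every segment boundary.
    boundaries = sorted({t for _, start, end in segments for t in (start, end)})
    total = 0
    for left, right in zip(boundaries, boundaries[1:]):
        # Distinct speakers active in [left, right); using a set collapses self-overlap.
        active = {spk for spk, start, end in segments if start <= left and right <= end}
        if skip_overlap and len(active) > 1:
            continue
        total += (right - left) * len(active)
    return total


two_speakers = [("A", 0, 10), ("B", 0, 10)]
self_overlap = [("A", 0, 7), ("A", 4, 10)]

print(scored_speaker_time(two_speakers))                     # 20
print(scored_speaker_time(two_speakers, skip_overlap=True))  # 0
print(scored_speaker_time(self_overlap))                     # 10
```

In this model, the fully overlapped two-speaker example contributes 20 when overlap is scored and nothing when it is skipped, while self-overlap of a single speaker is counted only once.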

@boeddeker Do you know anything about the -o option?

@thequilo
Member

thequilo commented Feb 6, 2024

We looked through md-eval-22.pl and found that -o is ignored unless -w (word-mediated alignment) is set, which we currently do not support. md-eval-22.pl evaluates overlap by default. This can be deactivated with -1, ignoring overlapping regions.

MeetEval currently does not set -1, so overlap is always evaluated. Do you need an option to deactivate scoring in overlapped regions?

@AntoineBlanot
Author

@thequilo Thank you very much for your responses !

Your comments were very clear, thank you for your insights!

Being able to deactivate scoring in overlapped regions would help, yes, as it can indicate whether a model performs well on non-overlapped regions.
If this is something that can be implemented, I think it would be very nice! :)

@thequilo
Member

thequilo commented Feb 7, 2024

Sure, it's just an option that has to be passed to md-eval. Naming such an option is not that easy, though: md-eval uses -1, dscore uses ignore_overlap, pyannote uses skip_overlap, and spyder uses -r, --regions [all|single|overlap|nonoverlap].

I currently prefer --skip-overlap or --ignore-overlap
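
As a rough sketch of what "just an option that has to be passed to md-eval" could look like in a wrapper: the helper below only builds the argument list and does not run anything. The function name `build_md_eval_args` and its parameters are hypothetical and not MeetEval's actual wrapper code; it assumes the script takes the reference and system RTTM files via -r and -s and the collar via -c.

```python
def build_md_eval_args(ref_rttm, hyp_rttm, collar=0.0, ignore_overlap=False):
    """Build the command line for a md-eval-22.pl call (hypothetical helper)."""
    # Assumed flags: -r reference RTTM, -s system RTTM, -c collar.
    args = ["perl", "md-eval-22.pl", "-r", ref_rttm, "-s", hyp_rttm, "-c", str(collar)]
    if ignore_overlap:
        # -1 is md-eval's native flag for not scoring overlapped regions.
        args.append("-1")
    return args


print(build_md_eval_args("ref.rttm", "hyp.rttm", collar=0.25, ignore_overlap=True))
```

Whatever user-facing name is chosen, the wrapper would only need to map it to -1 at this point.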

@boeddeker
Member

I think we have two options (in the future, we may add more DER backends, e.g. pyannote and/or spyder):

  • Use the native options
  • Use an option that will most likely work with all backends that we introduce in the future

I am against --skip-overlap and --ignore-overlap because they don't work with spyder, and they aren't in the md-eval help.
Given that md-eval has plenty of options and it is not clear which of them should be supported in the future, I have a small preference for native names, i.e. keep the name from the tool and don't rename anything.

@thequilo
Member

thequilo commented Feb 7, 2024

I would agree with you if the native name weren't -1, which is pretty unclear if you don't already know what the option does. (The first time I saw it, I thought: why does the script take a negative integer as input, and what does it do? I then looked at the signature, which only mentions files as arguments, and was even more confused.) I prefer a more verbose name that people understand without reading the docs first.

If you are against --skip-overlap and --ignore-overlap, I'd prefer spyder's approach with -r [all|nooverlap] where we can add options when we wrap other backends.

@boeddeker
Member

There is always a trade-off between keeping the original name and introducing a new name.
While a new name could be cleaner, it can introduce confusion (e.g. see all those ideological changes in PyTorch where it differs from NumPy, so that in the end you have to know both).

While -1 is not obvious at first sight, it is clear once you know it.
When we provide a wrapper, we have to think about which options we simply want to forward to the user and which we want to sync across all the different implementations.

IMHO, -1 is a special option and not worth making a standard option that we sync (@TCord mentioned some cases where you have to be careful with that option).
While it is IMHO not worth making a standard option, it is still useful for a user to be able to set this flag.
That is the reason why I would forward this option unmodified.

If we rename that option, yes, the long form of spyder is probably the best (short form -r is already blocked).

@TCord
Member

TCord commented Feb 9, 2024

I also think that keeping the -1 option from md-eval is not necessary. With md-eval, the dscore python wrapper, pyannote and spyder, there are already at least four different option names out there.
Simply using a verbose option name (e.g. ignore_overlap) should be the best solution here. nooverlap could be misleading, however, since it could also mean that it enforces that there is no overlap.

@thequilo
Member

thequilo commented Feb 9, 2024

We could also define a long and short form for the option, like --ignore-overlap, -1, so that we have the more verbose variant and the option from md_eval. I think I could live with that.

Some tools also have the following interface (not for this particular option), but I feel like they pollute the interface namespace:

  • --evaluate-overlap evaluate overlap regions (default)
  • --no-evaluate-overlap don't evaluate overlap regions (mutually exclusive with --evaluate-overlap)

Or one of these?

  • --evaluate-overlap [true|false] with the default true.
  • --exclude-overlap
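
For comparison, here is a small argparse sketch of two of the variants discussed above: a verbose long form combined with md-eval's -1, and the paired --evaluate-overlap / --no-evaluate-overlap form. It assumes an argparse-based CLI; the parser names, `prog`, and help texts are made up for illustration and are not MeetEval's actual interface. `argparse.BooleanOptionalAction` needs Python 3.9 or newer.

```python
import argparse

# Variant 1: verbose long form plus md-eval's native short form, same destination.
combined = argparse.ArgumentParser(prog="der-sketch")
combined.add_argument(
    "--ignore-overlap", "-1",
    dest="ignore_overlap",
    action="store_true",
    help="do not score regions with more than one reference speaker",
)

# Variant 2: paired flags generated by argparse, overlap evaluated by default.
paired = argparse.ArgumentParser(prog="der-sketch")
paired.add_argument(
    "--evaluate-overlap",
    action=argparse.BooleanOptionalAction,
    default=True,
    help="evaluate overlap regions",
)

print(combined.parse_args([]).ignore_overlap)                      # False
print(combined.parse_args(["-1"]).ignore_overlap)                  # True
print(combined.parse_args(["--ignore-overlap"]).ignore_overlap)    # True
print(paired.parse_args([]).evaluate_overlap)                      # True
print(paired.parse_args(["--no-evaluate-overlap"]).evaluate_overlap)  # False
```

argparse accepts -1 as an option name once it is registered, so the combined variant works without special handling. The paired variant adds two names to the interface, which is the namespace concern raised above.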
