Go Wrapper for SCTK

  • SCTK is a toolkit developed by NIST for evaluating the output of automatic speech recognition (ASR) systems. It can be used to:

    • Calculate Word Error Rate (WER) and Character Error Rate (CER)

    • Analyze different types of errors made by ASR systems: Substitutions, Insertions and Deletions.

    • Generate alignments between multiple sets of transcripts (reference and hypotheses)

    • Use statistical tests to evaluate the significance of performance differences between ASR systems.

  • This repository offers a single binary with a command line interface (CLI) that wraps the different tools in SCTK behind one simple interface.

  • This repo is a work-in-progress.


Usage Examples

Evaluating WER

  • This example evaluates the word error rate (WER) between reference transcripts and hypothesis transcripts generated by an ASR system. The example uses Bengali text, but SCTK supports most languages since it expects UTF-8 encoded text.
# Creating dummy reference transcript file in CSV format.
cat << EOF > reference.csv
utterance_id,transcript
spk01-utt01,এর মূল্য বার্ষিক দশ লক্ষ ইউরো।
spk02-utt02,খেলাটি চার টেস্ট সিরিজের চূড়ান্ত ছিল।
EOF

# Creating dummy hypothesis transcript from an ASR system, in CSV format.
cat << EOF > hypothesis.csv
utterance_id,transcript
spk01-utt01,এর মূল্য বার্ দশ লক ইউর।
spk02-utt02,খেলা ছার টেস্ট শিরিজের চূড়ান্ত ছিল।
EOF

# Getting the sctk CLI tool from this repository and giving it executable permissions.
version=v0.3.0
wget -O sctk https://github.com/shahruk10/go-sctk/releases/download/${version}/sctk
chmod +x sctk

# Using sctk CLI to evaluate WER and check errors.
#
# Setting `--ignore-first=true` to ignore header row.
# Check `sctk score --help` for documentation of each argument.
#
# To compare characters instead of words, and calculate the
# character error rate (CER) instead of WER, set --cer=true.
./sctk score \
  --ignore-first=true \
  --delimiter="," \
  --col-id=0 \
  --col-trn=1 \
  --normalize-unicode=true \
  --cer=false \
  --out=./report \
  --ref=reference.csv \
  --hyp=hypothesis.csv
  • Now we can check generated reports in the ./report directory.
  report/
  ├── hyp1.trn
  ├── hyp1.trn.dtl
  ├── hyp1.trn.raw
  ├── hyp1.trn.sgml
  ├── hyp1.trn.sys
  ├── hyp1.trn.pra.html
  ├── hyp1.trn.pra.md
  ├── hyp1.trn.pra.csv
  ├── hyp1.trn.pra.json
  ├── hyp1.trn.pra
  └── ref.trn
  • The *.sys file contains a table showing a breakdown of the different types of errors.

    • The results are aggregated for each speaker. Corr, Sub, Del and Ins stand for the percentage of words (characters in the case of CER) that were correctly decoded, substituted, deleted and inserted in the hypothesis, respectively.
                       SYSTEM SUMMARY PERCENTAGES by SPEAKER                      
    
         ,----------------------------------------------------------------.
         |                              hyp1                              |
         |----------------------------------------------------------------|
         | SPKR   | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
         |--------+-------------+-----------------------------------------|
         | spk01  |    1      6 | 50.0   50.0    0.0    0.0   50.0  100.0 |
         |--------+-------------+-----------------------------------------|
         | spk02  |    1      6 | 50.0   50.0    0.0    0.0   50.0  100.0 |
         |================================================================|
         | Sum/Avg|    2     12 | 50.0   50.0    0.0    0.0   50.0  100.0 |
         |================================================================|
         |  Mean  |  1.0    6.0 | 50.0   50.0    0.0    0.0   50.0  100.0 |
         |  S.D.  |  0.0    0.0 |  0.0    0.0    0.0    0.0    0.0    0.0 |
         | Median |  1.0    6.0 | 50.0   50.0    0.0    0.0   50.0  100.0 |
         `----------------------------------------------------------------'
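
As a worked check of the table above, the `spk01` row can be reproduced by hand, assuming the `Err` column follows the usual word error rate definition: substitutions, deletions and insertions divided by the number of reference words. The symbols S, D, I and N are introduced here for illustration and do not appear in the report. The reference for `spk01` has N = 6 words, of which 3 were substituted and none were deleted or inserted:

```latex
\begin{align*}
\mathrm{Err} &= \frac{S + D + I}{N} \times 100\% \\
             &= \frac{3 + 0 + 0}{6} \times 100\% \\
             &= 50.0\%
\end{align*}
```

The `S.Err` column appears to be the sentence-level counterpart: both utterances contain at least one error, which gives 100.0.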
    
  • The *.pra.md and *.pra.html files show alignments between the reference and hypothesis text in Markdown and HTML format respectively. These alignment files make it easy to see errors in context. In the table below, taken from hyp1.trn.pra.md, S indicates a substitution; D and I would indicate deletions and insertions respectively.

    | REF  | খেলাটি | চার | টেস্ট | সিরিজের | চূড়ান্ত | ছিল। |
    | HYP1 | খেলা   | ছার | টেস্ট | শিরিজের | চূড়ান্ত | ছিল। |
    | EVAL | S      | S   |       | S       |          |       |
  • These alignments are also available in JSON format in the *.pra.json file, which can be loaded into other programs for analysis or for combining the results of different ASR systems.
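
Since the layout of the JSON report is not described here, a reasonable first step is to look at its structure before writing any analysis code. The sketch below makes no assumption about the schema; `preview_json` is a hypothetical helper name, and it relies on `python3` (3.9 or newer, for `--no-ensure-ascii`) being installed.

```shell
# Pretty-print the first lines of a JSON file so its structure can be inspected.
# Usage: preview_json FILE [MAX_LINES]
# --no-ensure-ascii keeps non-ASCII text (e.g. Bengali) readable instead of
# escaping it as \uXXXX sequences.
preview_json() {
  if [ ! -f "$1" ]; then
    echo "file not found: $1" >&2
    return 1
  fi
  python3 -m json.tool --no-ensure-ascii "$1" | head -n "${2:-40}"
}
```

For example, `preview_json report/hyp1.trn.pra.json 60` would print the first 60 lines of the alignment report generated above.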

  • Furthermore, multiple ASR systems can be evaluated together by passing the --hyp flag once per hypothesis file when running the sctk CLI.
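
A minimal sketch of scoring two systems in one run, reusing the arguments from the example above. The second hypothesis file, its transcripts and the `./report_multi` output directory are made up for illustration; only the repeated `--hyp` flag comes from the description above.

```shell
# Hypothetical output of a second ASR system, in the same CSV layout.
cat << EOF > hypothesis_b.csv
utterance_id,transcript
spk01-utt01,এর মূল্য বার্ষিক দশ লক্ষ ইউরো।
spk02-utt02,খেলাটি চার টেস্ট শিরিজের চূড়ান্ত ছিল।
EOF

# Same arguments as the single-system run, with one --hyp flag per system.
# The command only runs if the sctk binary downloaded earlier is present.
if [ -x ./sctk ]; then
  ./sctk score \
    --ignore-first=true \
    --delimiter="," \
    --col-id=0 \
    --col-trn=1 \
    --normalize-unicode=true \
    --cer=false \
    --out=./report_multi \
    --ref=reference.csv \
    --hyp=hypothesis.csv \
    --hyp=hypothesis_b.csv
else
  echo "sctk binary not found; download it as shown in the example above" >&2
fi
```

Writing to a separate output directory keeps the reports from the single-system run intact for comparison.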

  • The *.dtl file shows further details of each type of error. This can reveal systematic errors and patterns in how the ASR system is transcribing the audio. When evaluating CER, this file shows character-level information instead of word-level information.

    ... (other useful stuff)
    
    CONFUSION PAIRS                  Total                 (6)
                                   With >=  1 occurrences (6)
     1:    1  ->  ইউরো। ==> ইউর।
     2:    1  ->  খেলাটি ==> খেলা
     3:    1  ->  চার ==> ছার
     4:    1  ->  বার্ষিক ==> বার্
     5:    1  ->  লক্ষ ==> লক
     6:    1  ->  সিরিজের ==> শিরিজের
       -------                                                                                          
           6  
    
    ... (other useful stuff)
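
The confusion pairs can also be pulled out of the report for further processing. The sketch below assumes every pair is printed on one line shaped like `1:    1  ->  চার ==> ছার`, as in the excerpt above, with a single token on each side of `==>`; `confusion_pairs` is a hypothetical helper name, not part of the sctk CLI.

```shell
# Print "count<TAB>reference<TAB>hypothesis" for each confusion pair line.
# After whitespace splitting, a pair line has the fields:
#   $1 = rank ("1:"), $2 = count, $3 = "->", $4 = reference token,
#   $5 = "==>", $6 = hypothesis token.
# Header and total lines do not match this shape and are skipped.
confusion_pairs() {
  awk '$3 == "->" && $5 == "==>" { print $2 "\t" $4 "\t" $6 }' "$1"
}
```

For example, `confusion_pairs report/hyp1.trn.dtl | sort -rn | head` would list the most frequent pairs first.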
    

License

Apache License 2.0
