
PyMETEOR

A Python implementation of METEOR, an automatic metric to evaluate machine translations.

Installation

The easiest way to install pymeteor is from the Test PyPI repository:
pip install --index-url https://test.pypi.org/simple/ pymeteor

This module depends on the following package:

  • nose (for testing only)

If you are not installing pymeteor via pip, install the dependencies using the requirements.txt file:
pip install -r requirements.txt

To run the test cases, nose is required: pip install nose

Usage

The meteor function within the pymeteor package takes reference and candidate strings in which each word, or unigram, is separated by whitespace, and returns the METEOR score as a float.

import pymeteor.pymeteor as pymeteor

reference = 'the cat sat on the mat'
candidate = 'on the mat sat the cat'

meteor_score = pymeteor.meteor(reference, candidate)
print(meteor_score)
>>> 0.5

The meteor function also allows more details to be printed while calculating the scores:

meteor_score = pymeteor.meteor(reference, candidate, print_details=True)
>>> Reference: "the cat sat on the mat"
>>> Candidate: "on the mat sat the cat"
>>> Score: 0.5000 = Fmean: 1.0000 * (1 - Penalty: 0.5000)
>>> Fmean: 1.0000 = 10 * Precision: 1.0000 * Recall: 1.0000 / (Recall: 1.0000 + 9 * Precision: 1.0000)
>>> Penalty: 0.5000 = 0.5 * (Fragmentation: 1.0000 ^3)
>>> Fragmentation: 1.0000 = Chunks: 6.0000 / Matches: 6.0000 

Formulas

This program implements the formulas described in the original METEOR paper (see References and credits).

Unigram precision P is calculated as:

P = m / wt

Where m is the number of unigrams in the candidate string that are also found in the reference string, and wt is the number of unigrams in the candidate string.

Unigram recall R is calculated as:

R = m / wr

Where m is the same as above, and wr is the number of unigrams in the reference string.
In the program this is covered by _unigram_precision_and_recall(reference, candidate)
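
For illustration, here is a minimal sketch of how both values can be computed in one pass. The clipped-count matching for repeated words is an assumption; pymeteor's own _unigram_precision_and_recall may differ in its details.

from collections import Counter

def unigram_precision_and_recall(reference, candidate):
    ref_counts = Counter(reference.split())
    cand_counts = Counter(candidate.split())
    # m: candidate unigrams also found in the reference, clipped so a
    # word cannot match more times than it appears in the reference
    m = sum(min(count, ref_counts[word]) for word, count in cand_counts.items())
    precision = m / sum(cand_counts.values())  # P = m / wt
    recall = m / sum(ref_counts.values())      # R = m / wr
    return precision, recall

print(unigram_precision_and_recall('the cat sat on the mat', 'on the mat sat the cat'))
>>> (1.0, 1.0)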

Unigram precision and recall are combined using the harmonic mean, with recall weighted 9 times more than precision, giving us Fmean as follows:

Fmean = (10 * P * R) / (R + 9 * P)

In the program this is covered by _harmonic_mean(P, R)
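
As a sketch (the function body here is illustrative, not pymeteor's actual implementation):

def harmonic_mean(precision, recall):
    # Fmean: harmonic mean with recall weighted 9 times more than precision
    return (10 * precision * recall) / (recall + 9 * precision)

print(harmonic_mean(1.0, 1.0))
>>> 1.0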

The penalty p is calculated as:

p = 0.5 * (c / um)^3

Where c is the number of chunks, and um is the number of unigrams that have been mapped.
In the program this is covered by _calculate_penalty(reference, candidate)
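
A sketch of the arithmetic once c and um are known (a hypothetical helper; pymeteor's _calculate_penalty derives these counts from the strings themselves, as described in the next section):

def penalty_from_counts(chunks, mapped_unigrams):
    # p = 0.5 * (c / um)^3
    return 0.5 * (chunks / mapped_unigrams) ** 3

print(penalty_from_counts(6, 6))
>>> 0.5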

The final score, M, is calculated as:

M = Fmean * (1 - p)

In the program this is covered by _calculate_meteor(f_mean, penalty)
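
Putting the pieces together for the example from the Usage section, Fmean = 1.0 and p = 0.5 give the final score of 0.5. A sketch:

def calculate_meteor(f_mean, penalty):
    # M = Fmean * (1 - p)
    return f_mean * (1 - penalty)

print(calculate_meteor(1.0, 0.5))
>>> 0.5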

Mappings and chunks

The values for c and um, used above in the penalty, are calculated in the function _calculate_chunks(reference, candidate)

To do this, the program creates an alignment: a set of mappings between unigrams in the two strings. um is the size of this set of mappings, and c is the number of chunks in the alignment, where a chunk is a set of mappings whose unigrams are adjacent in both strings.
Below is an example of a reference and candidate string and their unigrams' corresponding indices.

index      0    1    2    3    4    5
reference  the  cat  sat  on   the  mat
candidate  on   the  mat  sat  the  cat

Two possible alignments between these strings are:

Alignment A and Alignment B (images linked from Wikipedia)

The difference between these alignments is which "the" in the reference string is mapped to which "the" in the candidate string.

The METEOR algorithm calls for the alignment with the fewest mappings to be chosen. Since both of these alignments have 6 mappings, we must choose the one with the fewest intersections. This is done by plotting a point for each word, with its X-value being its index in the string and its Y-value indicating whether it belongs to the reference or candidate string (0 or 1, for example). Illustrated above are the alignments with their mappings drawn as lines, and we can then count how many of these lines intersect.

In the above example, the first alignment will be chosen as it has 8 intersections, while the second alignment has 11 intersections. This gives us a final c value of 6, and a um value of 6.
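
A minimal sketch of both counts, assuming the alignments above are represented as (reference_index, candidate_index) pairs. The tuples below are written out by hand to match the diagrams; pymeteor's _calculate_chunks builds and compares the alignments itself.

def count_crossings(alignment):
    # Two mappings (r1, c1) and (r2, c2) intersect exactly when their
    # order in the reference disagrees with their order in the candidate
    crossings = 0
    for i, (r1, c1) in enumerate(alignment):
        for r2, c2 in alignment[i + 1:]:
            if (r1 - r2) * (c1 - c2) < 0:
                crossings += 1
    return crossings

def count_chunks(alignment):
    # A chunk is a run of mappings adjacent in both strings; walk the
    # mappings in reference order and split whenever adjacency breaks
    ordered = sorted(alignment)
    chunks = 1
    for (r1, c1), (r2, c2) in zip(ordered, ordered[1:]):
        if not (r2 == r1 + 1 and c2 == c1 + 1):
            chunks += 1
    return chunks

# (reference_index, candidate_index) pairs matching the example above
alignment_a = [(0, 1), (1, 5), (2, 3), (3, 0), (4, 4), (5, 2)]
alignment_b = [(0, 4), (1, 5), (2, 3), (3, 0), (4, 1), (5, 2)]

print(count_crossings(alignment_a), count_crossings(alignment_b))
>>> 8 11
print(count_chunks(alignment_a))
>>> 6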

References and credits

METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments by Satanjeev Banerjee and Alon Lavie.
Wikipedia article

License

This project is licensed under the MIT License; please see the LICENSE file for details.

Author's note

Please feel free to contribute to the project, or to notify me of any bugs or possible improvements. Thanks.
