Use Peter Novig spell corrector for similar_names #300

wangkuiyi · 2024-02-01T02:13:49Z

Guoli Yin pointed me to this code snippet for field name correction. I instantly thought of Peter Norvig's spelling corrector, which is described at https://norvig.com/spell-correct.html. This pull request takes advantage of Peter Norvig's technique for name suggestions.

I'm preserving the original function name, 'similar_names'. However, the new algorithm suggests only one name. I think this makes more sense than suggesting multiple choices, because a class's set of properties is typically far smaller than the English vocabulary, and users may prefer a single accurate correction when they misspell a name.

Please let me know if I have misinterpreted the problem here. Thanks!

gyin94 · 2024-02-01T19:16:32Z

axlearn/common/config.py

+    """Use Peter Novig's spell correcter at https://norvig.com/spell-correct.html"""
+    word_count = Counter([_ for _ in candidates])
+
+    def P(word, N=sum(word_count.values())):


Consider adding type hints and a return type?

wangkuiyi · 2024-02-02

I agree that we should add the type hints if we merge this pull request; however, after more thought, I am no longer sure that we should merge Peter's algorithm.

This algorithm expands the misspelled word into a small set of similar words and filters out those that are not in the vocabulary. It works this way because the vocabulary is too large to compute the edit distance between the misspelled word and every word in it.
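
For readers following the thread, here is a minimal sketch of that expand-and-filter step. It is not the code in this pull request: the names `edits1` and `suggest` are hypothetical, the alphabet (lowercase letters plus underscore) is an assumption about what field names look like, and ties are broken alphabetically because every field name occurs exactly once, so there is no word frequency to rank by.

```python
import string
from typing import Iterable, Optional, Set

# Assumed alphabet for field names; widen it if names contain digits or capitals.
_LETTERS = string.ascii_lowercase + "_"


def edits1(word: str) -> Set[str]:
    """Return every string one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1
    ]
    replaces = [left + c + right[1:] for left, right in splits if right for c in _LETTERS]
    inserts = [left + c + right for left, right in splits for c in _LETTERS]
    return set(deletes + transposes + replaces + inserts)


def suggest(word: str, candidates: Iterable[str]) -> Optional[str]:
    """Suggest one known name within two edits of word, or None if there is none."""
    known = set(candidates)
    if word in known:
        return word
    # Expand the misspelled word, then keep only the expansions that are known names.
    one_away = edits1(word) & known
    if one_away:
        return min(one_away)  # alphabetical tie-break, an arbitrary choice
    two_away = {e2 for e1 in edits1(word) for e2 in edits1(e1)} & known
    return min(two_away) if two_away else None
```

The cost of this approach depends on the length of the misspelled word and the alphabet size, not on the number of known names, which is why it suits a large vocabulary.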

However, a Python class has far fewer candidate symbols than a natural-language vocabulary has words, so it may be affordable to compute the edit distance between the misspelled name and each candidate directly. What do you think?
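
A rough sketch of that alternative, assuming plain Levenshtein distance and a hypothetical helper named `closest_names` with an illustrative `max_distance` cutoff of 2. None of these names come from axlearn. Note that plain Levenshtein counts a swap of two adjacent letters as two edits, whereas Norvig's corrector treats a transposition as one.

```python
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed row by row."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete ca
                    curr[j - 1] + 1,     # insert cb
                    prev[j - 1] + cost,  # substitute ca with cb, or match
                )
            )
        prev = curr
    return prev[-1]


def closest_names(word: str, candidates: Iterable[str], max_distance: int = 2) -> List[str]:
    """Return candidates within max_distance edits of word, closest first."""
    scored = sorted((edit_distance(word, name), name) for name in candidates)
    return [name for distance, name in scored if distance <= max_distance]
```

With a few dozen field names per class, this does one small dynamic-programming table per candidate, and it can return several ranked suggestions or only the first one.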

gyin94 · 2024-02-06

Makes sense to me.

Use Peter Novig spell corrector for similar_names

209bc18

gyin94 requested a review from markblee February 1, 2024 19:15


