Analyse() should return the score as well #866

Open
1 task done
walles opened this issue Oct 6, 2023 · 4 comments

Comments

@walles
Contributor

walles commented Oct 6, 2023

Is there an existing issue for this?

  • I have searched the existing issues

What problem does this feature solve?

Chroma can classify text by its contents:

lexer := lexers.Analyse("package main\n\nfunc main()\n{\n}\n")

With this API, though, I have no way of telling whether even the best match is a poor one.

I would like to know that, so that I can skip highlighting when the detected language is uncertain.

What feature do you propose?

One possible suggestion would be to change the API to this...

lexer, certainty := lexers.Analyse("package main\n\nfunc main()\n{\n}\n")

... where certainty is a number on a well-defined, documented scale.

Then, if I feel this number is too low, I could choose not to highlight anything.
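
To make the caller side concrete, here is a minimal sketch of how such a certainty value might be used. Everything in it is hypothetical: the `guess` type, the 0.0 to 1.0 scale, and the "plaintext" fallback name are assumptions for illustration, not part of Chroma.

```go
package main

// guess models the proposed two-value result: a lexer name plus a certainty.
// The 0.0 to 1.0 scale is an assumption; the proposal only asks for a
// well-defined, documented scale.
type guess struct {
	lexer     string
	certainty float32
}

// shouldHighlight reports whether the guess names a lexer and its
// certainty reaches the caller's threshold.
func shouldHighlight(g guess, minCertainty float32) bool {
	return g.lexer != "" && g.certainty >= minCertainty
}

// lexerOrPlain returns the lexer name to use, falling back to the
// hypothetical name "plaintext" when the guess is missing or too weak.
func lexerOrPlain(g guess, minCertainty float32) string {
	if shouldHighlight(g, minCertainty) {
		return g.lexer
	}
	return "plaintext"
}
```

The threshold stays with the caller, so each application can decide for itself how low is too low.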

@alecthomas
Owner

Seems reasonable, how could we do this in a backwards compatible manner?

@walles
Contributor Author

walles commented Oct 6, 2023

Maybe this?

lexer, certainty := lexers.AnalyseScore("package main\n\nfunc main()\n{\n}\n")

Possibly in combination with deprecating the existing function since it's sort of unpredictable.
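
A rough sketch of how the two functions could coexist without breaking existing callers. This is a self-contained model, not Chroma's code: the `scorer` type and both function names are made up for illustration, and it assumes each lexer can rate a text with a float32 score (as far as I know, Chroma's AnalyseText works along those lines).

```go
package main

// scorer is a stand-in for a lexer that can rate how well it matches a text.
// The type is hypothetical and exists only for this sketch.
type scorer struct {
	name  string
	score func(text string) float32
}

// analyseScore returns the name of the best-scoring scorer and its score.
// It returns "" and 0 when no scorer rates the text above zero.
func analyseScore(scorers []scorer, text string) (string, float32) {
	best, bestScore := "", float32(0)
	for _, s := range scorers {
		if sc := s.score(text); sc > bestScore {
			best, bestScore = s.name, sc
		}
	}
	return best, bestScore
}

// analyse keeps the existing single-value shape by delegating to
// analyseScore and discarding the score, so current callers are unaffected.
func analyse(scorers []scorer, text string) string {
	name, _ := analyseScore(scorers, text)
	return name
}
```

With this shape the old function becomes a thin wrapper, so it could be kept or deprecated later without changing its behaviour.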

@alecthomas
Owner

In what way is it unpredictable?

@walles
Contributor Author

walles commented Oct 8, 2023

Not sure if "unpredictable" is the right word, but let's say:

  1. I get a file
  2. lexers.Analyse() says it's a C program, with 1% confidence

This means that even though C is the "best" guess, it's still a bad guess, and it might be better to not highlight at all.

That's why I'd like to have the confidence number as well to be able to make this judgement.

2 participants