
Corpus compact #112

Answered by JasonKessler
albin02t asked this question in Q&A
It accumulates the highest-scoring (most associated) words for each category, as selected by a set frequency rank applied to all classes, until no more than 2000 words (or whatever number is specified) have been collected.

By default, Scaled F-Score is used, but any TermScorer, such as RankDifference or DeltaJSDivergence, could be used.
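
To make the selection rule concrete, here is a minimal pure-Python sketch of the idea described above. It is not the library's code: `compact_terms`, the `scores` layout, and the toy numbers are all hypothetical, and the per-category scores stand in for whatever scorer is chosen (Scaled F-Score or another). It assumes the cutoff is a single rank shared by every category, raised one step at a time while the union of kept terms stays within the budget.

```python
from typing import Dict, Set


def compact_terms(scores: Dict[str, Dict[str, float]], max_terms: int) -> Set[str]:
    """Keep each category's top-ranked terms under one shared rank cutoff.

    `scores` maps category -> {term: association score}, where a higher
    score means the term is more associated with that category.
    (Hypothetical helper for illustration; not part of any library.)
    """
    # Order each category's terms from most to least associated.
    ranked = {
        category: sorted(term_scores, key=term_scores.get, reverse=True)
        for category, term_scores in scores.items()
    }
    deepest = max((len(terms) for terms in ranked.values()), default=0)

    kept: Set[str] = set()
    for cutoff in range(1, deepest + 1):
        # Union of the top `cutoff` terms of every category.
        candidate: Set[str] = set()
        for terms in ranked.values():
            candidate.update(terms[:cutoff])
        if len(candidate) > max_terms:
            break  # going one rank deeper would exceed the budget
        kept = candidate
    return kept


# Toy scores, invented for this example only.
scores = {
    "pos": {"great": 0.9, "good": 0.7, "film": 0.2, "the": 0.1},
    "neg": {"awful": 0.95, "bad": 0.8, "film": 0.3, "the": 0.1},
}
print(sorted(compact_terms(scores, max_terms=4)))
# → ['awful', 'bad', 'good', 'great']
```

Swapping the scorer only changes the numbers in `scores`; the rank-cutoff step stays the same, which is why any scorer that yields a per-category association score could be plugged in.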

Answer selected by JasonKessler