Offsets (spans) for tokens #80

Hey @andrefs

Here is some sample code to get you started. It is based on the code you mentioned in the discussion, with doc.sentences replaced by doc.tokens:

const winkNLP = require( 'wink-nlp' );
const model = require( 'wink-eng-lite-web-model' );
const nlp = winkNLP( model );
const its = nlp.its;

const text = '      It is a fox.   It    is a fox.Is that a fox? Is it?';
const doc = nlp.readDoc( text );
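let curIndex = 0; // running character offset into text
const spans = [];
doc.tokens().each( ( s, k ) => {
  const sk = s.out();
  // Skip the whitespace that precedes this token.
  curIndex += s.out( its.precedingSpaces ).length;
  const start = curIndex;
  // end is inclusive: the index of the token's last character.
  const end = start + sk.length - 1;
  curIndex = end + 1;
  spans.push( { text: sk, start: start, end: end } );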
}…
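
To make the offset arithmetic above easier to check, here is a minimal sketch that runs without winkNLP. It assumes plain objects with `value` and `precedingSpaces` fields as stand-ins for the tokens; `computeSpans` and `spansMatchText` are hypothetical helper names, not part of the winkNLP API. The second helper slices the original text with each span and confirms it gives back the token text, which only holds when each token's text appears verbatim in the input.

```javascript
// Hypothetical helper: `tokens` are plain objects standing in for winkNLP
// tokens, each with `value` (token text) and `precedingSpaces` (the
// whitespace string found immediately before the token).
function computeSpans( tokens ) {
  let curIndex = 0; // running character offset into the original text
  const spans = [];
  for ( const t of tokens ) {
    curIndex += t.precedingSpaces.length; // skip leading whitespace
    const start = curIndex;
    const end = start + t.value.length - 1; // inclusive end offset
    curIndex = end + 1;
    spans.push( { text: t.value, start, end } );
  }
  return spans;
}

// Sanity check: slicing the original text with each span should give back
// the token text (assumes token text is unchanged from the input).
function spansMatchText( text, spans ) {
  return spans.every( ( s ) => text.slice( s.start, s.end + 1 ) === s.text );
}

const sampleText = '  It is a fox.';
const sampleTokens = [
  { value: 'It', precedingSpaces: '  ' },
  { value: 'is', precedingSpaces: ' ' },
  { value: 'a', precedingSpaces: ' ' },
  { value: 'fox', precedingSpaces: ' ' },
  { value: '.', precedingSpaces: '' }
];
const sampleSpans = computeSpans( sampleTokens );
console.log( sampleSpans );
console.log( spansMatchText( sampleText, sampleSpans ) );
```

Because `end` is inclusive, use `text.slice( start, end + 1 )` to recover a token from its span.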

Answer selected by andrefs