Offsets (spans) for tokens #80

andrefs · 2022-05-26T21:19:29Z

Hello, is it possible to get spans for tokens? 😎

From the documentation, it seems spans are available for sentences only.
This discussion mentions tokens, however, and I still don't understand how to get them.

Many thanks! 🙏

Answered by rachnachakraborty

rachnachakraborty · 2022-05-27T05:12:42Z

Hey @andrefs

Here is some sample code to get you started. It is based on the code from the discussion you mentioned; we have replaced doc.sentences with doc.tokens.

const winkNLP = require( 'wink-nlp' );
const model = require( 'wink-eng-lite-web-model' );
const nlp = winkNLP( model );
const its = nlp.its;

const text = '      It is a fox.   It    is a fox.Is that a fox? Is it?';
const doc = nlp.readDoc( text );
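// Running character offset into `text`.
let curIndex = 0;
const spans = [];
doc.tokens().each( ( token ) => {
  const value = token.out();
  // Skip the whitespace that precedes this token.
  curIndex += token.out( its.precedingSpaces ).length;
  const start = curIndex;
  // `end` is inclusive: the index of the token's last character.
  const end = start + value.length - 1;
  curIndex = end + 1;
  spans.push( { text: value, start: start, end: end } );
} );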
console.log( spans );

It will produce the following output:

Array (18 items)
0: Object {text: "It", start: 6, end: 7}
1: Object {text: "is", start: 9, end: 10}
2: Object {text: "a", start: 12, end: 12}
3: Object {text: "fox", start: 14, end: 16}
4: Object {text: ".", start: 17, end: 17}
5: Object {text: "It", start: 21, end: 22}
6: Object {text: "is", start: 27, end: 28}
7: Object {text: "a", start: 30, end: 30}
8: Object {text: "fox", start: 32, end: 34}
9: Object {text: ".", start: 35, end: 35}
10: Object {text: "Is", start: 36, end: 37}
11: Object {text: "that", start: 39, end: 42}
12: Object {text: "a", start: 44, end: 44}
13: Object {text: "fox", start: 46, end: 48}
14: Object {text: "?", start: 49, end: 49}
15: Object {text: "Is", start: 51, end: 52}
16: Object {text: "it", start: 54, end: 55}
17: Object {text: "?", start: 56, end: 56}

Try it on RunKit with Node v16.
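
Because `end` in the spans above is inclusive, slicing the original text needs `end + 1`. The following is a minimal sketch for sanity-checking spans against the source string; `findMismatchedSpans` and `toExclusiveEnd` are hypothetical helper names, not part of wink-nlp, and the sample spans are copied by hand from the output above so the snippet runs without the library. Offsets here are plain JavaScript string indices.

```javascript
// Hypothetical helper (not part of wink-nlp): returns every span whose
// inclusive [start, end] range does not reproduce its recorded text.
function findMismatchedSpans(text, spans) {
  return spans.filter(
    (span) => text.slice(span.start, span.end + 1) !== span.text
  );
}

// Hypothetical helper: converts inclusive ends to exclusive ends, the
// convention that String.prototype.slice expects.
function toExclusiveEnd(spans) {
  return spans.map((span) => ({ ...span, end: span.end + 1 }));
}

const text = '      It is a fox.   It    is a fox.Is that a fox? Is it?';

// A few spans copied from the output above.
const sampleSpans = [
  { text: 'It', start: 6, end: 7 },
  { text: 'fox', start: 14, end: 16 },
  { text: 'It', start: 21, end: 22 },
  { text: 'that', start: 39, end: 42 },
  { text: '?', start: 56, end: 56 }
];

// An empty array means every sampled span matches the source text.
console.log(findMismatchedSpans(text, sampleSpans));

// With exclusive ends, text.slice(start, end) yields the token directly.
const exclusive = toExclusiveEnd(sampleSpans);
console.log(exclusive.map((span) => text.slice(span.start, span.end)));
```

If you pass the full `spans` array from the code above to `findMismatchedSpans`, any non-empty result would indicate that the running offset has drifted from the original text.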

Cheers,
Rachna

andrefs May 27, 2022
Author

Ah, I see, perfect! 👌
Thank you for replying so quickly 👍
