Break after footnote markers #17

Open
alxtsg opened this issue Oct 30, 2019 · 1 comment
Labels
Status: Proposal Request for comments

Comments


alxtsg commented Oct 30, 2019

Given the following text extracted from a Wikipedia page:

Wikipedia is a multilingual online encyclopedia created and maintained as an open collaboration project[3] by a community of volunteer editors using a wiki-based editing system.[4] It is the largest and most popular general reference work on the World Wide Web,[5][6][7] and is one of the most popular websites ranked by Alexa as of October 2019.

I get the following result when using sentence-splitter to split the text:

Sentence #0: Wikipedia is a multilingual online encyclopedia created and maintained as an open collaboration project[3] by a community of volunteer editors using a wiki-based editing system.[4] It is the largest and most popular general reference work on the World Wide Web,[5][6][7] and is one of the most popular websites ranked by Alexa as of October 2019.

Does it make sense to split the text after the footnote marker [4]? The result would become:

Sentence #0: Wikipedia is a multilingual online encyclopedia created and maintained as an open collaboration project[3] by a community of volunteer editors using a wiki-based editing system.[4]
Sentence #1: It is the largest and most popular general reference work on the World Wide Web,[5][6][7] and is one of the most popular websites ranked by Alexa as of October 2019.
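
For illustration only, here is a minimal standalone sketch of the behaviour requested above. It does not use or modify sentence-splitter; `splitKeepingFootnotes` is a hypothetical helper, and it assumes footnote markers are purely numeric like `[4]` and that sentences end with `.`, `!` or `?`. It is naive about abbreviations such as "e.g.".

```typescript
// Hypothetical helper, not part of sentence-splitter: split plain text after
// sentence-ending punctuation, keeping trailing footnote markers such as "[4]"
// attached to the sentence they follow.
function splitKeepingFootnotes(text: string): string[] {
    // A boundary is ., ! or ?, then zero or more markers like [4][5],
    // followed by whitespace or the end of the input.
    const boundary = /[.!?](?:\[\d+\])*(?=\s|$)/g;
    const sentences: string[] = [];
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(text)) !== null) {
        const end = match.index + match[0].length;
        sentences.push(text.slice(start, end).trim());
        start = end;
    }
    // Keep any trailing text that has no closing punctuation.
    const rest = text.slice(start).trim();
    if (rest.length > 0) {
        sentences.push(rest);
    }
    return sentences;
}
```

Markers that follow a comma, such as `,[5][6][7]`, do not end a sentence in this sketch because only `.`, `!` and `?` are treated as boundaries.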

If this makes sense, where should I start changing the code?

azu added the Status: Proposal Request for comments label Oct 30, 2019

azu commented Oct 30, 2019

Thanks for the report!

Abstract text:

[Str][Punctuation][Footnote][WhiteSpace][Str][Punctuation]

We have three options.

Option 1: No Support

Sentence 0: [TEXT][Punctuation][Footnote][WhiteSpace][TEXT][Punctuation]

Option 2: Sentence 0 includes the Footnote (suggested by @alxtsg)

Sentence 0: [TEXT][Punctuation][Footnote]
[WhiteSpace]
Sentence 1: [TEXT][Punctuation]

Option 3: Put [Footnote] outside the sentence

Sentence 0: [TEXT][Punctuation]
[Footnote] <- Meta Node? new Node type?
[WhiteSpace]
Sentence 1: [TEXT][Punctuation]

I agree with @alxtsg, so I choose Option 2.

The [Footnote] belongs to Sentence 0, so it should stay with it.
With Option 3, it is unclear which sentence the [Footnote] refers to: Sentence 0 or Sentence 1?
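
A rough sketch of how Option 2 could group the abstract tokens, for discussion only. The `Token`, `SentenceNode` and `groupWithFootnotes` names are hypothetical and are not the real TxtAST node types or the sentence-splitter implementation; `Punctuation` is assumed to mean sentence-ending punctuation.

```typescript
// Hypothetical token and node shapes for illustration only.
type TokenType = "Str" | "Punctuation" | "Footnote" | "WhiteSpace";

interface Token {
    type: TokenType;
    value: string;
}

interface SentenceNode {
    type: "Sentence";
    children: Token[];
}

type OutputNode = SentenceNode | Token;

// Option 2: a sentence closes after its punctuation and any footnotes that
// directly follow it; whitespace between sentences stays outside of them.
function groupWithFootnotes(tokens: Token[]): OutputNode[] {
    const result: OutputNode[] = [];
    let current: Token[] = [];
    let sawPunctuation = false;

    const flush = (): void => {
        if (current.length > 0) {
            result.push({ type: "Sentence", children: current });
            current = [];
        }
        sawPunctuation = false;
    };

    for (const token of tokens) {
        // After punctuation, only footnotes may still join the sentence.
        if (sawPunctuation && token.type !== "Footnote") {
            flush();
        }
        // Whitespace between sentences is emitted as a top-level node.
        if (current.length === 0 && token.type === "WhiteSpace") {
            result.push(token);
            continue;
        }
        current.push(token);
        if (token.type === "Punctuation") {
            sawPunctuation = true;
        }
    }
    flush();
    return result;
}
```

With the abstract text above as input, this sketch yields a Sentence with three children (including the Footnote), a top-level WhiteSpace, and a second Sentence with two children.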

As another example, remark (a Markdown parser) parses [1] as a LinkReference node.
I think the LinkReference (footnote) should be included in the sentence.
https://astexplorer.net/#/gist/708be10b31ad657cdfd9f03d8f65eda3/64101d95f31191d062240492634f675916468407
(Note: TxtAST uses remark internally: https://textlint.github.io/astexplorer/)

Sentence 0: [TEXT][Punctuation][LinkReference(footnote)]
[WhiteSpace]
Sentence 1: [TEXT][Punctuation]

I'm OK with splitting the text after the footnote marker.
