How to process a raw string input? #178

Open
qiubinyang opened this issue Feb 21, 2022 · 2 comments

qiubinyang commented Feb 21, 2022

Nice job, dude. I want to process a raw string input rather than a PDF or txt file. I found some related code in the AnyStyle and wapiti-ruby source code, but it didn't work. Can you give me an example?

require 'anystyle'

# `str` is the full text of the PDF
str = "xxx"

finder = AnyStyle.finder

dataset = Wapiti::Dataset.prepare(str)

output = finder.model.label dataset

result = Wapiti::Dataset.new(dataset.map.with_index { |doc, idx|
  doc.label(output[idx])
})

print result[0].references

It causes an error like:

Traceback (most recent call last):
        2: from parser_string.rb:1102:in `<main>'
        1: from /Library/Ruby/Gems/2.6.0/gems/wapiti-1.0.7/lib/wapiti/model.rb:44:in `label'
/Library/Ruby/Gems/2.6.0/gems/wapiti-1.0.7/lib/wapiti/model.rb:44:in `label': missing tokens at 1: 11/8, cannot apply pattern (Wapiti::NativeError)
@inukshuk
Owner

You can use AnyStyle::Document, which is a wrapper around the Wapiti classes intended for use with the Finder model. For an example, take a look at Document.open, which can open plain text files.
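
A minimal sketch of that suggestion for a string that is already in memory. It assumes that `AnyStyle::Document.parse` (the method `Document.open` appears to use once it has read a text file) accepts a raw string, and that `AnyStyle.finder.find` accepts the resulting document; neither call is confirmed in this thread, so check both against your installed anystyle version. The helper name `references_from_string` is hypothetical.

```ruby
# Sketch: extract references from an in-memory string with the finder model.
# Assumption: AnyStyle::Document.parse and AnyStyle::Finder#find are
# available with these signatures in the installed anystyle gem.
def references_from_string(text)
  # Loaded lazily so that defining this helper does not need the gem.
  require 'anystyle'

  # Build a document with one token per line of the raw text,
  # instead of reading the lines from a file on disk.
  doc = AnyStyle::Document.parse(text)

  # Let the finder prepare the document, label each line, and
  # return the lines it classified as references.
  AnyStyle.finder.find(doc, format: :references)
end
```

Calling `references_from_string(str)` should then return the detected references, grouped per document. Handing the document to the finder, rather than calling `finder.model.label` on a `Wapiti::Dataset.prepare` result, leaves feature preparation to the finder, which is likely what the "missing tokens" error above points to.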

@qiubinyang
Author

> You can use AnyStyle::Document, which is a wrapper around the Wapiti classes intended for use with the Finder model. For an example, take a look at Document.open, which can open plain text files.

Thanks
