rewrite: make it modular #6

Merged
merged 27 commits into from
Jul 10, 2023

Conversation

gamemaker1
Owner

@gamemaker1 commented Jun 8, 2023

Checklist

  • The issues that this pull request is related to/fixes are mentioned.
  • The purpose of this pull request is explained.
  • The changes made to the code, as well as the documentation have been explained.
  • The tests, lint & style checks all pass.
  • The documentation has been updated to reflect the changes made in this pull request.

Related Issues

Overview

This pull request rewrites the library to make it easier to add more text extraction methods. It also allows extraction from a file, a buffer, or a URL.

The API is now as follows:

import { getTextExtractor } from 'office-text-extractor'

const extractor = getTextExtractor()

const location = 'https://raw.githubusercontent.com/gamemaker1/office-text-extractor/rewrite/test/fixtures/test-pdf.pdf'
const text = await extractor.extractText({
	input: location,
	type: 'url'
})

console.log(text)
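
One way to picture the "easier to add more text extraction methods" goal is a small registry of pluggable methods. The sketch below is only an illustration under stated assumptions, not the library's actual implementation: `ExtractionMethod`, `ModularExtractor`, `addMethod`, and the plain-text method are hypothetical names that do not appear in this pull request, and the code is kept synchronous for brevity even though the real `extractText` call above is awaited.

```typescript
// Hypothetical sketch: none of these names are confirmed by the pull request.

// An extraction method declares which types it handles and how to read them.
interface ExtractionMethod {
  types: string[]
  apply(input: Uint8Array): string
}

class ModularExtractor {
  private methods: ExtractionMethod[] = []

  // Register a new method; returns `this` so calls can be chained.
  addMethod(method: ExtractionMethod): this {
    this.methods.push(method)
    return this
  }

  // Use the first registered method that supports the given type.
  extractText(input: Uint8Array, type: string): string {
    for (const method of this.methods) {
      if (method.types.indexOf(type) !== -1) {
        return method.apply(input)
      }
    }
    throw new Error('No extraction method registered for type: ' + type)
  }
}

// A trivial method for ASCII plain text, used only to exercise the registry.
const plainText: ExtractionMethod = {
  types: ['text/plain'],
  apply(input) {
    let text = ''
    for (let i = 0; i < input.length; i++) {
      text += String.fromCharCode(input[i])
    }
    return text
  },
}

const sketch = new ModularExtractor().addMethod(plainText)
const bytes = new Uint8Array([104, 105]) // ASCII bytes for "hi"
const result = sketch.extractText(bytes, 'text/plain')
console.log(result)
```

With this shape, supporting a new format means registering one more object rather than editing the extractor itself; an unregistered type fails loudly instead of returning empty text.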

@kibertoad

Looking forward to this being finished!

@gamemaker1
Owner Author

It should be done in the next few days :)

@kibertoad

@gamemaker1 Any chance you could publish an RC/beta version, if it's already working to some extent? We are doing a feasibility study right now, and if we can confirm that this works, we can stop exploring other libraries.

@gamemaker1
Owner Author

Sure, I could release a beta version right now.

It only works on pdf and docx files so far, though.

@kibertoad

@gamemaker1 Perfect, these are exactly the formats that we are interested in!

@gamemaker1
Owner Author

@kibertoad I published the version in this PR as v3.0.0-beta.1. You can install it with npm install office-text-extractor@next.

@kibertoad

much appreciated!

@kibertoad

@gamemaker1 Types don't seem to be included in the bundle. Was that intentional?

@gamemaker1
Owner Author

No, let me check it out

@gamemaker1
Owner Author

Re-published with types! I hadn't added the declaration: true option to the tsconfig file, so TypeScript didn't generate types. Sorry about that :(
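
For readers hitting the same problem, the fix described above amounts to one compiler option. This is a minimal fragment, not the project's actual tsconfig file, and every other setting the project uses is omitted:

```jsonc
{
  "compilerOptions": {
    // Emit .d.ts declaration files alongside the compiled JavaScript.
    "declaration": true
  }
}
```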

@kibertoad

It works! thank you so much

@gamemaker1
Owner Author

Yay!

@wllvns

wllvns commented Jul 9, 2023

This is great! Any timeline on pptx (etc.) support in v3? Thank you!

@gamemaker1 marked this pull request as ready for review July 10, 2023 15:56
@gamemaker1 merged commit bbae8d1 into main Jul 10, 2023
8 checks passed
@gamemaker1 deleted the rewrite branch July 10, 2023 16:22
@gamemaker1
Owner Author

@wllvns I just released v3.0.1, with pdf, docx, pptx and xlsx support.

Development

Successfully merging this pull request may close these issues.

Cant use external url/path
3 participants