Support .docx as /content Documents #6

Open
escv opened this issue Jul 4, 2018 · 2 comments

Comments

@escv
Owner

escv commented Jul 4, 2018

Add an OSGi IResourceConverter component to the content bundle to support reading Word .docx files.

Only basic style information should be extracted from the Word document, so that the actual styling is done in CSS files.

Instead of using an external library such as Apache POI, it may be possible to read the .docx XML directly with an XML parser (a .docx file is a ZIP archive of XML files) and extract the text from there.

The result of the ResourceConverter should be a clean and simple XHTML snippet representing the Word document.
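
A rough sketch of the "no external library" idea, using only JDK classes (`java.util.zip` and StAX). It is not the actual younic implementation: the class name `DocxToXhtmlSketch` and its methods are made up for illustration, and it deliberately does not implement `IResourceConverter`, whose signature is not shown in this issue. It assumes the main body lives in the `word/document.xml` entry and only maps `w:p` paragraphs and `w:t` text runs to `<p>` elements, ignoring all styling, tables, and images.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: turns the paragraphs of a .docx into a plain XHTML snippet. */
final class DocxToXhtmlSketch {

    /** WordprocessingML main namespace (the "w:" prefix in document.xml). */
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Scans the ZIP container for word/document.xml and converts it. */
    static String convert(InputStream docx) throws IOException, XMLStreamException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    return toXhtml(zip);
                }
            }
        }
        throw new IOException("word/document.xml not found in container");
    }

    /** Streams over document.xml: each w:p becomes a <p>, w:t content becomes text. */
    static String toXhtml(InputStream documentXml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Word documents need no DTD; switch DTD support off as a precaution.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        XMLStreamReader reader = factory.createXMLStreamReader(documentXml);

        StringBuilder out = new StringBuilder();
        boolean inText = false;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT
                    && W_NS.equals(reader.getNamespaceURI())) {
                if ("p".equals(reader.getLocalName())) {
                    out.append("<p>");
                } else if ("t".equals(reader.getLocalName())) {
                    inText = true;
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && W_NS.equals(reader.getNamespaceURI())) {
                if ("p".equals(reader.getLocalName())) {
                    out.append("</p>\n");
                } else if ("t".equals(reader.getLocalName())) {
                    inText = false;
                }
            } else if (inText && (event == XMLStreamConstants.CHARACTERS
                    || event == XMLStreamConstants.CDATA)) {
                out.append(escape(reader.getText()));
            }
        }
        return out.toString();
    }

    /** Minimal escaping so the extracted text stays well-formed XHTML. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Calling `DocxToXhtmlSketch.convert(inputStream)` on a .docx stream would return one `<p>…</p>` line per Word paragraph. Basic style information (for example mapping `w:b` to `<strong>`, or the paragraph style name to a CSS class) could be added in the same loop, but that mapping is left open here.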

@escv
Owner Author

escv commented Jul 4, 2018

Sample document.xml from inside a .docx ZIP container:
document.xml.txt

@escv
Owner Author

escv commented Jul 11, 2018

Branch: docx-support (https://github.com/escv/younic/tree/docx-support)
