
turnerguo/pptx2html

 
 


This project reads PowerPoint .pptx files and extracts their contents as HTML. It is built on ANTLR 4 (ANother Tool for Language Recognition); an ANTLR 3 branch is available as well.
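
For orientation, here is a minimal sketch of the kind of input involved. It is not this project's implementation: it uses the JDK's built-in ZIP and DOM APIs rather than ANTLR, and the class and method names (`PptxTextSketch`, `extractTextRuns`, `extractTextRunsFromString`) are made up for illustration. It relies on two facts about the format: a .pptx file is a ZIP package whose slides are stored as `ppt/slides/slideN.xml`, and slide text lives in `<a:t>` elements in the DrawingML namespace. The sketch needs Java 7 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical illustration only; not part of pptx2html.
public class PptxTextSketch {
    // DrawingML namespace of the <a:t> text-run elements inside slide parts.
    private static final String DRAWINGML_NS =
            "http://schemas.openxmlformats.org/drawingml/2006/main";

    // Returns the text of every <a:t> element in one slide XML stream,
    // in document order.
    static List<String> extractTextRuns(InputStream slideXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for namespace-based lookup
        Document doc = factory.newDocumentBuilder().parse(slideXml);
        NodeList runs = doc.getElementsByTagNameNS(DRAWINGML_NS, "t");
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < runs.getLength(); i++) {
            texts.add(runs.item(i).getTextContent());
        }
        return texts;
    }

    // Convenience wrapper for in-memory XML; wraps checked exceptions.
    static List<String> extractTextRunsFromString(String slideXml) {
        try {
            return extractTextRuns(
                    new ByteArrayInputStream(slideXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse slide XML", e);
        }
    }

    // Usage: java PptxTextSketch deck.pptx
    public static void main(String[] args) throws Exception {
        try (ZipFile pptx = new ZipFile(args[0])) {
            Enumeration<? extends ZipEntry> entries = pptx.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                // Slide parts only; relationship files under _rels/ are skipped.
                if (name.startsWith("ppt/slides/slide") && name.endsWith(".xml")) {
                    try (InputStream in = pptx.getInputStream(entry)) {
                        System.out.println(name + ": " + extractTextRuns(in));
                    }
                }
            }
        }
    }
}
```

ZIP entries are not guaranteed to appear in slide order, so a real converter would need to sort the parts or follow the presentation's relationship files.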

Limitations

  1. This version does not preserve text formatting or slide layouts.
  2. This version ignores shapes drawn with PowerPoint (that's a complex little drawing language) and might not catch all pictures.
  3. The output is HTML formatted for an S6 slideshow.

Building

Install Maven and JDK 6 or later, then build using the standard Maven lifecycle phases (clean, compile, test, package).

Wishlist / roadmap

  1. Other output templates (e.g. Markdown, Textile)
  2. Capture inline formatting
  3. Capture more of the layout options (titles, header/footer, text block positioning, picture positioning)

About

PowerPoint OOXML (2007) to HTML conversion via ANTLR
