Direct export into more formats #9
Comments
As for PDF, I think we need a little roadmap here, as there are tradeoffs. I tried weasyprint, xhtml2pdf and wkhtmltopdf (or pdfkit, which is a wrapper for the same engine).
We can also put a little section "what you can do next with html" in the README or docs and let the PDF export wait until a better choice emerges. Export pagination may be a barrier, might need a method for
For discussion of tools I referred to manubot. They have a fallback procedure for generating the PDF based on the tools that are available on the system (athena is there if Docker is). The relation of tools to engines is as below:
|
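The fallback idea mentioned in the previous comment (pick whichever PDF tool happens to be installed) could be sketched roughly as follows. This is not manubot's code: the function names, the candidate list and the preference order are all assumptions made for illustration, and only standard-library probes are used.

```python
import importlib.util
import shutil

# (display name, kind of probe, module or binary to look for).
# The order is an assumed preference, not something decided in this thread.
CANDIDATES = [
    ("weasyprint", "module", "weasyprint"),
    ("wkhtmltopdf", "binary", "wkhtmltopdf"),
    ("athena (via Docker)", "binary", "docker"),
]


def available_pdf_engines(find_module=importlib.util.find_spec,
                          find_binary=shutil.which):
    """Return the names of the PDF engines that look usable on this system."""
    found = []
    for name, kind, target in CANDIDATES:
        probe = find_module if kind == "module" else find_binary
        if probe(target) is not None:
            found.append(name)
    return found


def pick_pdf_engine(**probes):
    """Return the first available engine, or None to stay with plain HTML."""
    engines = available_pdf_engines(**probes)
    return engines[0] if engines else None
```

The probes are parameters so the selection logic can be tested without installing any of the tools. A caller would fall back to HTML-only output when `pick_pdf_engine()` returns `None`.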
I thought a bit about exporting into more formats. There will be some formats that HTML can easily be converted to and some that it can't. Converting from HTML won't work if the user is interested in the exported source rather than just the rendered result. This is the case for e.g. LaTeX, which you may want to include in a larger LaTeX document and keep editing. Another example could be exporting to Jupyter notebooks.
For this, we need to support multiple exporters. HTML, LaTeX, and ipynb come to mind as formats for which this would be useful. Display formats like PDF are easy to generate from any of the three.
To convert to LaTeX, we'll need a Markdown to LaTeX converter to format multi-line comments. I would like to keep all base features available for all output formats, e.g. we should robustly embed videos into LaTeX.
It will take a little bit of planning to decide how to structure the code for this. For example, we may want to make the blocks (
class Exporter:
    def __init__(self, directory): ...
    def visit_comment(self, block): ...  # e.g. block.text
    def visit_text(self, block): ...  # e.g. block.text
    def visit_image(self, block): ...  # e.g. block.filename, block.width
    def visit_video(self, block): ...  # e.g. block.filename, block.width
    def save(self): ...
What do you think? Do you have other ideas for how the code should be structured? |
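To give a feel for the Markdown to LaTeX step mentioned above, here is a deliberately tiny sketch. It is an assumption about what such a converter might start as, not a proposal for the real one: it only handles bold, italic and inline code, escapes a handful of special characters, and ignores backslashes, braces, lists, links and headings. The name `markdown_to_latex` is hypothetical.

```python
import re

# A few characters LaTeX treats specially. Backslash, braces, tilde and
# caret are intentionally left out of this sketch.
_LATEX_SPECIAL = re.compile(r"([&%$#_])")


def markdown_to_latex(text):
    """Convert a very small Markdown subset to LaTeX markup."""
    text = _LATEX_SPECIAL.sub(r"\\\1", text)                # & -> \&, _ -> \_
    text = re.sub(r"`([^`]+)`", r"\\texttt{\1}", text)      # `code`
    text = re.sub(r"\*\*(.+?)\*\*", r"\\textbf{\1}", text)  # **bold**
    text = re.sub(r"\*(.+?)\*", r"\\emph{\1}", text)        # *italic*
    return text
```

A real implementation would more likely delegate to an existing Markdown parser and emit LaTeX from its syntax tree; the regex approach above breaks down quickly on nested or multi-line constructs.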
I agree with the sequence of formats: there are 'immediate output' formats like html, ipynb and latex, and display formats like pdf, which is based on processing either html or latex. As I do not fully understand
Source script processing and
Then there is a render_html(), render_latex(), render_markdown() function or method that converts each block type into a new format. Finally, there is functionality that assembles the converted blocks into an html, latex, ipynb or markdown document.
# Handout class is exposed to the user:
# - the user inits a handout in a script to display script comments and code in output
# - the user adds elements such as images and video to the output inside a script
# - alternatively, the user plays with an instance in an interactive session,
#   just using the add_x() methods
class Handout:
    def __init__(self, directory, title, interactive=False):
        pass

# blocks represent report content units
# they hold values and display configurations
# maybe blocks can be dataclasses, to make the constructors cleaner
class Block:
    pass

class Message(Block):
    pass

class Text(Block):  # this is for multiline comments
    pass

class Image(Block):
    pass

class Video(Block):
    pass

class Code(Block):
    pass

# something is done to produce an internal representation of the document
# as a list of blocks. this is what the Handout class does now, but it is tightly
# bundled with html output
class Document:
    title: str
    blocks: list[Block]  # built-in generic syntax needs Python 3.9+

# document can be exported to different formats
def to_html(doc: Document) -> str:
    pass

def to_latex(doc: Document) -> str:
    pass

def to_notebook(doc: Document) -> str:
    pass

def to_markdown(doc: Document) -> str:
    pass
|
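One way the `Document` plus `to_markdown()` idea from the outline above might be filled in is sketched below. The field names on the blocks (`string`, `source`, `filename`) are assumptions, as is the choice to emit code as an indented Markdown block; the outline itself leaves all of that open.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Text:
    string: str


@dataclass
class Code:
    source: str


@dataclass
class Image:
    filename: str


@dataclass
class Document:
    title: str
    blocks: List[object] = field(default_factory=list)


def to_markdown(doc):
    """Assemble the document's blocks into a single Markdown string."""
    parts = ["# " + doc.title]
    for block in doc.blocks:
        if isinstance(block, Text):
            parts.append(block.string)
        elif isinstance(block, Code):
            # Four-space indentation marks a code block in Markdown.
            parts.append("\n".join("    " + line
                                   for line in block.source.splitlines()))
        elif isinstance(block, Image):
            parts.append("![](" + block.filename + ")")
        else:
            raise TypeError("unsupported block: %r" % (block,))
    return "\n\n".join(parts) + "\n"
```

The `isinstance` chain is the part that grows with every new block type and every new format, which is the trade-off the rest of the thread discusses.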
As a side note, the role of Markdown is still to be discussed:
We can start with the simplest type of Markdown. |
Thanks for your example. What I had in mind is the visitor pattern, which seems like a better solution to me. What do you think?
from collections import namedtuple

# The user API gets a new constructor argument:
class Handout:
    def __init__(self, directory, title='Handout', format='html', source=True):
        self._blocks = []
    def add_text(self, string): ...  # Add Text block.
    def add_image(self, tensor, width=None, format='png'): ...  # Save to disk and add Image block.
    def add_html(self, string): ...  # Add HTML block.
    def show(self): ...  # Iterate over blocks and call the corresponding exporter methods.
    def _find_source(self): ...  # Find user's Python source; can be extracted out some day.

# Blocks are independent of output format:
Text = namedtuple('Text', 'string')
Image = namedtuple('Image', 'filename, width')
HTML = namedtuple('HTML', 'string')

# Exporters are visitors:
class HTMLExporter:
    def __init__(self, directory, source, title):
        self.lines = []
    def visit_text(self, text): ...  # Add lines to self.lines
    def visit_image(self, image): ...
    def visit_html(self, html): ...
    def save(self): ...  # Save lines to index.html.

class LaTeXExporter:
    def __init__(self, directory, source, title): ...
    def visit_text(self, text): ...
    def visit_image(self, image): ...
    def visit_html(self, html): ...  # No-op.
    def save(self): ...
One question is where to specify the export type. It shouldn't be in
By the way, do you have a preference for how to name the exporters? I can think of exporter, output, target, backend. |
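The dispatch at the heart of the visitor outline above could work along these lines. This is a sketch under assumptions: the exporters here keep their output in memory and expose a hypothetical `render()` instead of `save()`, the constructor arguments are dropped, and the `export()` helper and `EXPORTERS` table are invented for the example.

```python
from collections import namedtuple
from html import escape

Text = namedtuple('Text', 'string')
Image = namedtuple('Image', 'filename, width')
HTML = namedtuple('HTML', 'string')


class HTMLExporter:
    def __init__(self):
        self.lines = []

    def visit_text(self, text):
        self.lines.append('<p>' + escape(text.string) + '</p>')

    def visit_image(self, image):
        width = '' if image.width is None else ' width="%d"' % image.width
        self.lines.append('<img src="%s"%s>' % (image.filename, width))

    def visit_html(self, html):
        self.lines.append(html.string)  # Raw HTML passes through unchanged.

    def render(self):
        return '\n'.join(self.lines)


class LaTeXExporter(HTMLExporter):
    def visit_text(self, text):
        self.lines.append(text.string)

    def visit_image(self, image):
        self.lines.append('\\includegraphics{' + image.filename + '}')

    def visit_html(self, html):
        pass  # No-op: raw HTML has no LaTeX counterpart.


EXPORTERS = {'html': HTMLExporter, 'latex': LaTeXExporter}


def export(blocks, format='html'):
    """Call visit_<blockname> on the chosen exporter for every block."""
    exporter = EXPORTERS[format]()
    for block in blocks:
        visit = getattr(exporter, 'visit_' + type(block).__name__.lower())
        visit(block)
    return exporter.render()
```

With this shape, adding a format means adding one exporter class, while adding a block type means adding one `visit_` method to each exporter. `LaTeXExporter` subclasses `HTMLExporter` here only to reuse `__init__` and `render()` in a short example.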
from dataclasses import dataclass

class Block:
    pass

@dataclass
class Message(Block):
    string: str

    def html(self):
        return '<pre class="message">' + self.string + '</pre>'

    def markdown(self):
        return self.string

    def latex(self):
        pass

assert Message('Some text').html() == '<pre class="message">Some text</pre>'
This way we keep the data and the block conversion functions closer together, which makes testing much easier. Later in code you can have a
|
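For comparison with the visitor approach, the block-methods design from the previous comment might be assembled into a document along these lines. The `render()` helper and the `Image` block are hypothetical, and HTML escaping is an addition of this sketch rather than part of the example above.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Message:
    string: str

    def html(self):
        return '<pre class="message">' + escape(self.string) + '</pre>'

    def markdown(self):
        return self.string


@dataclass
class Image:
    filename: str

    def html(self):
        return '<img src="' + escape(self.filename) + '">'

    def markdown(self):
        return '![](' + self.filename + ')'


def render(blocks, fmt):
    """Join the output of each block's method named after the format."""
    return '\n'.join(getattr(block, fmt)() for block in blocks)
```

Here each block can be tested in isolation, as the comment points out, but supporting a new format means touching every block class.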
What is does |
As for As an extra feature can add |
If we provide |
'Exporter' seems quite natural; I think it stresses that we are doing a one-directional conversion. |
In addition to LaTeX export, it would make sense to export to Markdown. This might also be easier for users to further convert into other formats downstream. |
@danijar, what is inside |
The output format is currently HTML, which can then be printed as PDF from the browser. It could be nice to directly export to PDF, ipynb and other formats.