Is it possible to export Arabic and Chinese characters in pdf export #234

Closed
sirishayalavarthi opened this issue May 22, 2018 · 2 comments

@sirishayalavarthi

I tried to export a document containing Arabic and Chinese characters to PDF, but those characters show up as # in the output.

This is the code in the parser:

public byte[] generateFile(final String markdown) {

    // Parse the markdown and render it to an HTML fragment
    Node document = getParser().parse(markdown);
    String html = getRenderer().render(document);
    // Wrap the fragment in a full HTML document and apply the CSS
    String htmlDoc = applyCssStyling(appendHtmlAndCssTag(html));

    // Debug output of the HTML that is handed to the PDF converter
    System.out.print(htmlDoc);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    PdfConverterExtension.exportToPdf(out, htmlDoc, null, getFlexOptions());
    return out.toByteArray();
}

Here is the HTML:

Disclaimer: By reading this you agree that the template is working as per requirement
I’mmm ڛ ڛ 中文字符 ـش ـص Agree/Disagree bla bla bla

The output PDF is attached:
download.pdf

This is how the text looks in the PDF:
Disclaimer: By reading this you agree that the template is working as per requirement
I’mmm # # #### ## ## Agree/Disagree bla bla bla

@vsch
Owner

vsch commented May 22, 2018

@sirishayalavarthi, the PDF converter uses default fonts for conversion, and the default fonts do not support all Unicode code points. I have not yet resolved this issue or figured out how to get around it. See #181
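
To see which fonts a document would need, it can help to list the Unicode scripts that occur in the text before converting it. The following is only a sketch using the JDK's `Character.UnicodeScript`; the class name `ScriptScan` and the method `scriptsIn` are hypothetical and not part of flexmark-java.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical helper (not part of flexmark-java): lists the scripts used in a text. */
final class ScriptScan {

    /** Returns the Unicode scripts found in the text, skipping COMMON and INHERITED. */
    static Set<Character.UnicodeScript> scriptsIn(String text) {
        Set<Character.UnicodeScript> scripts = EnumSet.noneOf(Character.UnicodeScript.class);
        text.codePoints().forEach(cp -> {
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            // Punctuation, digits and spaces are COMMON; combining marks are INHERITED
            if (script != Character.UnicodeScript.COMMON
                    && script != Character.UnicodeScript.INHERITED) {
                scripts.add(script);
            }
        });
        return scripts;
    }

    public static void main(String[] args) {
        // Part of the sample text from the report, written with Unicode escapes
        String sample = "I\u2019mmm \u069B \u4E2D\u6587\u5B57\u7B26 Agree/Disagree";
        System.out.println(scriptsIn(sample));
    }
}
```

A result containing scripts such as ARABIC or HAN suggests that the stylesheet needs a font covering those scripts.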

@vsch
Owner

vsch commented Jan 24, 2019

A solution to the font problem is to define an embedded TrueType font in the style or stylesheet and set the body tag to use this font. OpenHtmlToPDF will then take each character from a font that has it defined.

For example, include the Noto Serif/Sans/Mono fonts and add noto-serif, noto-sans and noto-mono families to the CSS so that the PDF can use them for rendering text.

However, the PDF converter requires TrueType fonts, and the Noto CJK fonts are OpenType fonts, which cannot be used. The solution is to download a TrueType Unicode font that supports the CJK character set and add it to the custom rendering profile used for PDF.
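
Whether a font file holds TrueType or OpenType (CFF) outlines can be checked from its first four bytes before it is added to the stylesheet. This is a rough sketch only: `FontSniffer` and its methods are hypothetical names, it needs Java 11 or later for `readNBytes(int)`, and it says nothing about whether the converter will actually accept the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: classifies a font file by its leading 4-byte sfnt tag. */
final class FontSniffer {

    enum Kind { TRUETYPE, OPENTYPE_CFF, COLLECTION, UNKNOWN }

    /** Classifies a font from the first bytes of the file. */
    static Kind classify(byte[] header) {
        if (header == null || header.length < 4) {
            return Kind.UNKNOWN;
        }
        // 0x00010000 marks a font with TrueType outlines
        if (header[0] == 0 && header[1] == 1 && header[2] == 0 && header[3] == 0) {
            return Kind.TRUETYPE;
        }
        String tag = new String(header, 0, 4, StandardCharsets.ISO_8859_1);
        switch (tag) {
            case "true": return Kind.TRUETYPE;      // legacy Apple TrueType tag
            case "OTTO": return Kind.OPENTYPE_CFF;  // OpenType with CFF outlines
            case "ttcf": return Kind.COLLECTION;    // font collection (.ttc/.otc)
            default:     return Kind.UNKNOWN;
        }
    }

    /** Reads the first four bytes of the file and classifies them. */
    static Kind classifyFile(Path fontFile) throws IOException {
        try (InputStream in = Files.newInputStream(fontFile)) {
            return classify(in.readNBytes(4));
        }
    }
}
```

For instance, `FontSniffer.classifyFile(Paths.get("/usr/local/fonts/arialuni.ttf"))` would be expected to report `TRUETYPE` for a usable file, while an `OPENTYPE_CFF` result would point to the restriction described above.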

For my test I used arialuni.ttf from https://www.wfonts.com/font/arial-unicode-ms

If the fonts are installed in /usr/local/fonts/, add the following to the stylesheet:

@font-face {
  font-family: 'noto-cjk';
  src: url('file:///usr/local/fonts/arialuni.ttf');
  font-weight: normal;
  font-style: normal;
}

@font-face {
  font-family: 'noto-serif';
  src: url('file:///usr/local/fonts/NotoSerif-Regular.ttf');
  font-weight: normal;
  font-style: normal;
}

@font-face {
  font-family: 'noto-serif';
  src: url('file:///usr/local/fonts/NotoSerif-Bold.ttf');
  font-weight: bold;
  font-style: normal;
}

@font-face {
  font-family: 'noto-serif';
  src: url('file:///usr/local/fonts/NotoSerif-BoldItalic.ttf');
  font-weight: bold;
  font-style: italic;
}

@font-face {
  font-family: 'noto-serif';
  src: url('file:///usr/local/fonts/NotoSerif-Italic.ttf');
  font-weight: normal;
  font-style: italic;
}

@font-face {
  font-family: 'noto-sans';
  src: url('file:///usr/local/fonts/NotoSans-Regular.ttf');
  font-weight: normal;
  font-style: normal;
}

@font-face {
  font-family: 'noto-sans';
  src: url('file:///usr/local/fonts/NotoSans-Bold.ttf');
  font-weight: bold;
  font-style: normal;
}

@font-face {
  font-family: 'noto-sans';
  src: url('file:///usr/local/fonts/NotoSans-BoldItalic.ttf');
  font-weight: bold;
  font-style: italic;
}

@font-face {
  font-family: 'noto-sans';
  src: url('file:///usr/local/fonts/NotoSans-Italic.ttf');
  font-weight: normal;
  font-style: italic;
}


@font-face {
  font-family: 'noto-mono';
  src: url('file:///usr/local/fonts/NotoMono-Regular.ttf');
  font-weight: normal;
  font-style: normal;
}

body {
    font-family: 'noto-sans', 'noto-cjk', sans-serif;
    overflow: hidden;
    word-wrap: break-word;
    font-size: 14px;
}

var,
code,
kbd,
pre {
    font: 0.9em 'noto-mono', Consolas, "Liberation Mono", Menlo, Courier, monospace;
}
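
If the font directory differs between machines, the @font-face rules above could be generated instead of hard-coded. The sketch below assumes the same file naming as the stylesheet (`<prefix>-Regular.ttf`, `<prefix>-Bold.ttf`, and so on); `FontFaceCss` and its methods are hypothetical names, and the `file:` URL comes from `Path.toUri()`, so its exact form depends on the platform.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: builds @font-face rules for fonts in a local directory. */
final class FontFaceCss {

    /** Builds one @font-face rule pointing at a local font file. */
    static String fontFace(String family, Path fontFile, boolean bold, boolean italic) {
        return "@font-face {\n"
                + "  font-family: '" + family + "';\n"
                + "  src: url('" + fontFile.toUri() + "');\n"
                + "  font-weight: " + (bold ? "bold" : "normal") + ";\n"
                + "  font-style: " + (italic ? "italic" : "normal") + ";\n"
                + "}\n";
    }

    /** Builds the regular, bold, italic and bold-italic rules for one family. */
    static String family(String family, Path dir, String filePrefix) {
        return fontFace(family, dir.resolve(filePrefix + "-Regular.ttf"), false, false) + "\n"
                + fontFace(family, dir.resolve(filePrefix + "-Bold.ttf"), true, false) + "\n"
                + fontFace(family, dir.resolve(filePrefix + "-Italic.ttf"), false, true) + "\n"
                + fontFace(family, dir.resolve(filePrefix + "-BoldItalic.ttf"), true, true);
    }

    public static void main(String[] args) {
        Path dir = Paths.get("/usr/local/fonts");
        System.out.println(fontFace("noto-cjk", dir.resolve("arialuni.ttf"), false, false));
        System.out.println(family("noto-sans", dir, "NotoSans"));
    }
}
```

The body and code rules from the stylesheet can then be appended to the generated text unchanged.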

The sample PdfConverter.java has been updated, and a wiki page with this information has been added: PDF-Renderer-Converter
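
For anyone wiring this into their own code, here is a minimal sketch of embedding the stylesheet into the HTML document before it is handed to the converter. `PdfHtml.wrap` is a hypothetical helper, standing in for whatever `appendHtmlAndCssTag` and `applyCssStyling` do in the original report; it is not taken from the sample or the wiki page.

```java
/** Hypothetical helper: wraps rendered HTML and a stylesheet into one document. */
final class PdfHtml {

    /** Returns a full HTML document with the CSS embedded in a style element. */
    static String wrap(String bodyHtml, String css) {
        return "<!DOCTYPE html>\n"
                + "<html>\n"
                + "<head>\n"
                + "<meta charset=\"UTF-8\"/>\n"
                + "<style>\n" + css + "\n</style>\n"
                + "</head>\n"
                + "<body>\n" + bodyHtml + "\n</body>\n"
                + "</html>\n";
    }

    public static void main(String[] args) {
        // The font families are the ones defined by the @font-face rules above
        String css = "body { font-family: 'noto-sans', 'noto-cjk', sans-serif; }";
        System.out.println(wrap("<p>\u4E2D\u6587\u5B57\u7B26</p>", css));
    }
}
```

The resulting string would take the place of htmlDoc in the call from the original report: PdfConverterExtension.exportToPdf(out, htmlDoc, null, getFlexOptions()).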

@vsch vsch pinned this issue Jan 24, 2019
@vsch vsch closed this as completed Jan 24, 2019