Line numbers for rtf output #1217

Closed
Anteru opened this issue Aug 31, 2019 · 25 comments · Fixed by #2654
Labels
A-formatting area: changes to formatters help wanted Community help appreciated!
Milestone

Comments

@Anteru
Collaborator

Anteru commented Aug 31, 2019

(Original issue 1513 created by jonascj on 2019-05-06T12:56:56.977289+00:00)

I suggest adding a crude line-numbering option to the RTF formatter available in pygments/pygmentize.

Basically, after the RTF output is generated, numbers padded with spaces could be added to the RTF document, in front of every line of highlighted code.

{ 2 } and {10 } serve very well as line numbers when put in front of every line of highlighted code (with a monospaced font, of course). A script to do this is easily crafted, but it would be nice to add it to the RTF formatter. See the attached RTF document as an example of my suggestion.
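
A minimal sketch of the padding idea in Python, working on plain text lines rather than real RTF. The function name `prefix_line_numbers` and the choice to size the padding from the largest number are assumptions for illustration, not part of Pygments.

```python
def prefix_line_numbers(lines, start=1):
    """Prefix each line with a right-aligned, space-padded line number."""
    lines = list(lines)
    # Width of the largest number, so " 2" and "10" line up in a monospaced font.
    width = len(str(start + len(lines) - 1)) if lines else 1
    return [f"{number:>{width}} {line}" for number, line in enumerate(lines, start)]


# Ten lines of source, so the numbers need two columns.
numbered = prefix_line_numbers(["import sys", "print(sys.argv)"] + ["pass"] * 8)
print("\n".join(numbered))
```

As noted above, text numbered this way is not copy friendly, because the numbers become part of each line.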

The only downside is that the code is not copy friendly in the final document (i.e. you will copy the line numbers as well), but who copies code from RTF/Word/LibreOffice documents anyway? My use case is students who do not write reports using LaTeX. This formatting option would make it easy to generate nice looking code snippets to paste into their reports (without resorting to taking screenshots).

I might get round to looking at submitting a patch during the summer, but maybe someone else will be hooked by this. In any case, now it is here for reference.

@Anteru Anteru added T-feature type: a new feature X-imported imported from Bitbucket S-major severity: major labels Aug 31, 2019
@Anteru
Collaborator Author

Anteru commented Aug 31, 2019

(Original comment by jonascj on 2019-05-11T12:07:35.390323+00:00)

I’ve made a patch / changes to `pygments-main/pygments/formatters/rtf.py` which implements this.
Do I need permission to submit a PR?

@Anteru
Collaborator Author

Anteru commented Aug 31, 2019

(Original comment by jonascj on 2019-05-12T19:13:08.101899+00:00)

Never mind, I found the way to make a pull request. As per [1] you have to click the “Create pull request”-button from within your own fork of the project, if you do not have write access to the project you wish to make a pull request to.

So I’ll just polish the patch a bit and learn basic hg, then submit the PR.

[1] https://bitbucket.org/site/master/issues/6986/access-denied-when-sending-pull-request

@birkenfeld birkenfeld added the A-formatting area: changes to formatters label Dec 1, 2019
@galeo

galeo commented Aug 9, 2021

Any progress?

@Anteru
Collaborator Author

Anteru commented Aug 9, 2021

Not that I'm aware of, sorry.

@Anteru Anteru added help wanted Community help appreciated! and removed T-feature type: a new feature X-imported imported from Bitbucket S-major severity: major labels Aug 9, 2021
@galeo

galeo commented Aug 9, 2021

@jonascj ping :)

@jonascj
Contributor

jonascj commented Aug 9, 2021

I made something back then which could produce outputs such as the following (as rtf):

[Image: scrot_2021-08-09_230354_400x294]

I remember almost submitting the patch, but then discovering something during testing that stopped me. Some cases would fail: maybe lines that wrap (one line in the source code, but two lines in the highlighted RTF document), or maybe a problem with the justification of line numbers above 9.

I'll see what I can find and submit it here, at least as a patch for discussion.

@galeo

galeo commented Aug 10, 2021

👍 :-)

@jonascj
Contributor

jonascj commented Aug 12, 2021

My work from back then can be seen here: https://github.com/jonascj/pygments

If used to highlight this python code test-rtf.py.txt it produces this output (which has some shortcomings):
[Image: scrot_2021-08-11_234645_513x555]

You can test it as follows. Edit `pygments/formatters/rtf.py` to change my implementation; modifications take effect immediately because of `pip install -e .`.

git clone https://github.com/jonascj/pygments
cd pygments
git checkout rtf-linenos
python -m venv venv
venv/bin/pip install -e .
venv/bin/python -m pygments -O linenos=1 -l python -o path/to/output.rtf path/to/source-code.ext

@galeo

galeo commented Aug 12, 2021

Thanks so much. I simply tested it and haven't looked at the code. Is it an implementation problem to display consecutive strings as one line? I think the correct one should look like this:
[Image: Screen Shot 2021-08-12 at 4 09 11 PM]
As shown in an editor.

@galeo

galeo commented Aug 12, 2021

Can it be processed like the HtmlFormatter implementation? The tokensource value in line 137 should be split on '\n'.
You can have a look at the HtmlFormatter._translate_parts method.

@jonascj
Contributor

jonascj commented Aug 12, 2021

@galeo Your rendering (from an editor with source highlighting) is indeed how it should be rendered. And the problem is indeed single tokens which contain \n newline characters.

The simplest solution, which I contemplated back then, would be to produce the entire RTF code without line numbers, then afterwards prefix all produced RTF lines (terminated by \par) with a line number. Loosely speaking, outfile.write should be replaced by some byteio_object.write so the output could be parsed before the whole thing is written to the outfile.
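
A rough sketch of that buffer-then-prefix approach, assuming every output line ends with `\par` followed by a newline. The helper `number_rtf_lines` and the fixed padding width are hypothetical, and `io.StringIO` stands in for the byteio_object mentioned above.

```python
import io


def number_rtf_lines(rtf_body, start=1, width=3):
    """Prefix every \\par-terminated chunk of rtf_body with a padded line number."""
    chunks = rtf_body.split("\\par\n")
    # Whatever follows the final \par is not a numbered line of its own.
    tail = chunks.pop()
    out = io.StringIO()
    for number, chunk in enumerate(chunks, start):
        out.write(f"{number:>{width}} {chunk}\\par\n")
    out.write(tail)
    return out.getvalue()


# The buffer collects the generated body instead of writing it to outfile directly.
buffer = io.StringIO()
buffer.write("{\\cf1 import} sys\\par\n")
buffer.write("print(sys.argv)\\par\n")
numbered = number_rtf_lines(buffer.getvalue())
print(numbered)
```

Splitting on a literal string is naive: an escaped backslash followed by "par" in the highlighted text would be misread as a line end, so a real implementation would need to be more careful.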

The reason I abandoned it two years ago was that the HTMLFormatter is much more advanced, RTF is an aging format, and my use case (students using Microsoft Word or Libre Office to typeset technical reports with source code) is better solved by using e.g. http://hilite.me/, which produces HTML output that can be copied and pasted into WYSIWYG editors.

I'll give the logic another try. If the multi-line string problem can be solved, it would be a valuable addition to the RTF formatter.

@galeo

galeo commented Aug 12, 2021

Good luck :-) I think the RTF formatter should offer roughly the same options and display behavior for line numbers as the HTML formatter. I will also consider doing it if I have time, though it may not be soon.

@jonascj
Contributor

jonascj commented Aug 12, 2021

@galeo Out of curiosity, what is your use case? One where HTML is not satisfactory / an option...

@jonascj
Contributor

jonascj commented Aug 12, 2021

How about a table-based implementation like the HTMLFormatter's? It would produce output like this (with lines handled correctly, of course):

test-table.rtf.txt
[Image: scrot_2021-08-12_114004_691x280]

That implementation would potentially be easier, since it only requires counting the lines correctly, not splitting them and prefixing line numbers: all the line numbers would simply go in the first cell/column.

@galeo

galeo commented Aug 12, 2021

To copy syntax-highlighted code into Keynote/PowerPoint. The highlight tool can do this. I want to switch from highlight to pygmentize, and found that the RTF formatter does not support line number output.

@jonascj
Contributor

jonascj commented Aug 13, 2021

@galeo Please test again. I've added logic to handle numbering of multi-line strings and docstrings correctly. I've also implemented options to number only every n-th line and to specify the starting value for the line numbering: https://github.com/jonascj/pygments (branch rtf-linenos).

pygvenv/bin/python -m pygments -O linenos=1 -O linenostart=5 -O linenostep=5 -l python -o /tmp/out3.rtf test-rtf.py

Edit Dec 2023: s/login/logic
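
To illustrate how a start value and a step might interact, here is a small sketch in plain Python. It assumes a number is printed only when it is a multiple of the step, which may not match what the branch actually does; the function name `line_number_gutter` is made up for the example.

```python
def line_number_gutter(line_count, linenostart=1, linenostep=1):
    """Return one gutter string per line; skipped numbers become blank padding."""
    numbers = range(linenostart, linenostart + line_count)
    width = len(str(numbers[-1])) if line_count else 1
    gutter = []
    for n in numbers:
        # Assumed rule: show the number only when it is a multiple of the step.
        label = str(n) if n % linenostep == 0 else ""
        gutter.append(label.rjust(width))
    return gutter


# Seven lines, numbering starts at 5, only every fifth number is shown.
gutter = line_number_gutter(7, linenostart=5, linenostep=5)
print(gutter)
```

Blank entries keep the same width as printed numbers, so the code column stays aligned in a monospaced font.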

@galeo

galeo commented Aug 13, 2021

@jonascj Thanks so much. It works well. Good job. 👍
For the table-based implementation you mentioned earlier, I think it can also be added, allowing users to choose between the two.

@jonascj
Contributor

jonascj commented Aug 14, 2021

@galeo Indeed, users could choose between the two, like with the HTML-formatter.

Does anyone have ideas or opinions on highlighting lines? RTF has a {\highlightN } directive which could be used, but it only highlights the part of a line that contains characters:

[Image: scrot_2021-08-14_221457_784x580]
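
A small sketch of wrapping selected lines in a `{\highlightN }` group, assuming N refers to an entry in the document's color table and that each line is already a finished piece of RTF. The helper name and the `hl_lines` set are illustrative only.

```python
def highlight_rtf_line(rtf_line, color_index):
    """Wrap one line of RTF in a {\\highlightN ...} group."""
    return "{\\highlight%d %s}" % (color_index, rtf_line)


lines = ["{\\cf1 def} f():\\par", "    return 1\\par"]
hl_lines = {2}  # 1-based numbers of the lines to highlight (hypothetical option)
out = [
    highlight_rtf_line(line, 3) if number in hl_lines else line
    for number, line in enumerate(lines, 1)
]
print("\n".join(out))
```

As described above, a viewer only paints the background behind the characters of the wrapped line, not across the full page width.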

@galeo

galeo commented Aug 15, 2021

This is also pretty good. In some editors, the line highlighting can choose not to exceed the end of the line. I think this is acceptable.

@akraus53

akraus53 commented Dec 7, 2023

@jonascj this is really cool, would you mind creating a PR for this?

@jonascj
Contributor

jonascj commented Dec 19, 2023

@akraus53 I'd like to finish the project and submit a PR!

Judging by the comments above, a few people, myself included, seem to have solved most problems or come up with ideas for solving them. I'll see if I can pick up the threads!

@jonascj
Contributor

jonascj commented Jan 4, 2024

@akraus53 An update: I'm just about ready to make the PR, but what appears to be a bug in Libre Office (which I used to view/render the RTF output) held me up for a long time. It appears Libre Office renders the space characters within a run of spaces at varying widths: https://bugs.documentfoundation.org/show_bug.cgi?id=144050#c14

That is a headache because Libre Office renders RTF output with a line number step > 1 horribly; see the image below. Maybe that will just have to be noted in the documentation: Libre Office v. 7+ renders spaces incorrectly, and hence line number steps > 1 are discouraged if you use Libre Office for typesetting.

Alternatively, it appears that a single control word / destination, {\*\generator anystring}, can be added to the RTF output, which causes Libre Office to render the spaces correctly. But that would be working around a bug in LibreOffice which was introduced (it appears) as a workaround for something space-related: https://git.libreoffice.org/core/+/24b04db5a63b57a74e58a7616091437ad68548ac%5E%21.

[Image: wordpad-manual2-rtf]

@akraus53

akraus53 commented Jan 22, 2024

Isn't it nice, trying to fix an issue with your code for hours and finding out in the end the error is in testing? :D

I will have a look into your multiline-handling because it appears there is still an issue with it. It won't detect multi-line comments, as far as I can tell.

[Image]

EDIT: Problem found, you need to also split up the token type "Token.Comment.Multiline". I've added a comment to your commit here

@jonascj
Contributor

jonascj commented Feb 25, 2024

@akraus53 Cheers - I decided to split all tokens containing \n (unless the token only contained one \n character). Otherwise we would have to test for every token type able to contain \n in all languages.
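
A sketch of what such a splitting step could look like; this is not necessarily the code in the PR. It reads "only contained one \n character" as "the token value is exactly one newline", which is an assumption, and it uses plain strings in place of Pygments token types.

```python
def split_tokens_on_newlines(tokensource):
    """Yield (ttype, value) pairs so that no value spans more than one line."""
    for ttype, value in tokensource:
        # A bare newline, or a value without any newline, passes through unchanged.
        if value == "\n" or "\n" not in value:
            yield ttype, value
            continue
        lines = value.split("\n")
        # Every piece except the last keeps the newline that terminated it.
        for line in lines[:-1]:
            yield ttype, line + "\n"
        if lines[-1]:
            yield ttype, lines[-1]


tokens = [
    ("Comment.Multiline", '"""a\nb\n"""'),
    ("Text", "\n"),
    ("Name", "x"),
]
pieces = list(split_tokens_on_newlines(tokens))
print(pieces)
```

Because the split is driven by the value alone, it covers multi-line comments, docstrings and any other token type without a per-language list.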

I finally made the PR #2654 .

[Image: Screenshot from 2024-02-25 23-56-40]

@Anteru Anteru linked a pull request Apr 27, 2024 that will close this issue
@Anteru Anteru added this to the 2.18.0 milestone Apr 27, 2024
@Anteru
Collaborator Author

Anteru commented Apr 27, 2024

Merged!

@Anteru Anteru closed this as completed Apr 27, 2024

5 participants