Two backslashes gets converted to 3 backslashes #404

Open
tomgoddard opened this issue Nov 29, 2023 · 0 comments
With the current PyPI release of html2text, converting a single backslash in HTML produces a single backslash in plain text, which seems right. But converting two backslashes in HTML produces three backslashes in plain text, where I would expect two. I am seeing this with HTML that shows Windows file paths with doubled backslashes to indicate that the backslash is escaped. When our ChimeraX application converts that HTML to plain text for bug reporting, the file names appear with three backslashes (https://www.rbvi.ucsf.edu/trac/ChimeraX/ticket/10252).

Note that in the test script below, two backslashes in a Python string literal mean just one backslash, since "\\" is an escape sequence for a single-character string containing one backslash.
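
To make the escaping note concrete, here is a small standard-library sketch showing how many characters each literal actually holds. The variable names (`one`, `two`, `raw_two`) are only illustrative and are not part of the original test script.

```python
# In Python source, '\\' is an escape sequence for ONE backslash character.
one = '\\'        # one backslash
two = '\\\\'      # two backslashes
raw_two = r'\\'   # raw string: both backslashes are kept literally

print(len(one))          # number of characters in `one`
print(len(two))          # number of characters in `two`
print(two == raw_two)    # the escaped and raw spellings are the same string
print(repr(two))         # repr() doubles each backslash again when displaying
```

So the `'<p>\\\\</p>'` literal in the test script is HTML containing two backslashes, and the repr-style outputs shown there are doubled in the same way.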

  • Version (from html2text --version)
    2020.1.16

  • Test script

import html2text
h = html2text.HTML2Text()
h.handle('<p>\\</p>')      # HTML contains 1 backslash
    '\\\n\n'   # Seems right, 1 backslash in the output.
h.handle('<p>\\\\</p>')    # HTML contains 2 backslashes
    '\n\n\\\\\\\n\n'  # Seems wrong, 3 backslashes in the output.
html2text.__version__
    (2020, 1, 16)
  • Python version (from python --version)
    Python 3.10.9
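
Because the repr-style output above doubles every backslash, the counts are easy to misread. The sketch below counts the real backslash characters in the inputs and in the outputs reported in this issue. It does not call html2text; the output strings are copied from the test script, and `count_backslashes` is a hypothetical helper introduced only for illustration.

```python
def count_backslashes(text: str) -> int:
    """Return the number of actual backslash characters in text."""
    return text.count('\\')

# (HTML input, output reported in this issue), copied from the test script.
cases = [
    ('<p>\\</p>', '\\\n\n'),
    ('<p>\\\\</p>', '\n\n\\\\\\\n\n'),
]

for html_in, text_out in cases:
    n_in = count_backslashes(html_in)
    n_out = count_backslashes(text_out)
    status = 'ok' if n_in == n_out else 'mismatch'
    print(f'{n_in} in, {n_out} out: {status}')
```

The same helper could be applied to live `h.handle(...)` results to check whether a later html2text release still shows the mismatch.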