Unwanted unescaping of http url strings #355

NicoHood · 2022-06-12T09:44:21Z

Hi!
I have a URL in my iCal event description that is already percent-encoded (URL-encoded). Here is an example:

https://www.facebook.com/events/756119502186737/?acontext=%7B%22source%22%3A5%2C%22action_history%22%3A[%7B%22surface%22%3A%22page%22%2C%22mechanism%22%3A%22main_list%22%2C%22extra_data%22%3A%22%5C%22[]%5C%22%22%7D]%2C%22has_source%22%3Atrue%7D

This is how it looks in the ical file:

DESCRIPTION:https://www.facebook.com/events/7561195021867
 37/?acontext=%7B%22source%22%3A5%2C%22action_history%22%3A[%7B%22surface%22
 %3A%22page%22%2C%22mechanism%22%3A%22main_list%22%2C%22extra_data%22%3A%22%
 5C%22[]%5C%22%22%7D]%2C%22has_source%22%3Atrue%7D
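For readers unfamiliar with the folded form: RFC 5545 lets long content lines be split by inserting a line break followed by a single space or tab, and readers are expected to remove exactly that sequence again. A minimal sketch of the unfolding step (the helper name `unfold` is made up for illustration, and accepting a bare `\n` in addition to `\r\n` is a leniency I am assuming here):

```python
import re


def unfold(ical_text):
    """Join folded content lines.

    A line break followed by exactly one space or tab marks a fold;
    both the break and that single whitespace character are dropped.
    """
    return re.sub(r"\r?\n[ \t]", "", ical_text)


folded = (
    "DESCRIPTION:https://www.facebook.com/events/7561195021867\r\n"
    " 37/?acontext=%7B"
)
print(unfold(folded))
```

Unfolding only joins the pieces; it does not touch any `%XX` sequences, so the folding itself is not what changes the URL.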

But after parsing, some characters are unescaped for no apparent reason:

https://www.facebook.com/events/756119502186737/?acontext=%7B%22source%22:5,%22action_history%22:[%7B%22surface%22:%22page%22,%22mechanism%22:%22main_list%22,%22extra_data%22:%22\%22[]\%22%22%7D],%22has_source%22:true%7D

It turns out that this code causes the issue:
https://github.com/collective/icalendar/blob/master/src/icalendar/parser.py#L273

It converts %3A to :, which is NOT wanted in my case. The URL is then broken.

Why was this percent-unescaping introduced, and how can we fix it?


NicoHood · 2022-06-12T09:47:16Z

Wouldn't it make more sense to escape them with the character-escaping function linked below, using backslashes instead of percent codes?
https://github.com/collective/icalendar/blob/master/src/icalendar/parser.py#L25

NicoHood · 2022-06-12T09:57:14Z

I see what the issue is: escape_string is used as a workaround to hide \: characters, so that the search only finds "real" : separators. However, what nobody considered is that this modifies the final output if the input already contains such escaped characters. This solution is rather wonky and error-prone and should be fixed. The splitting should be done by a smarter parser, not by escaping and unescaping, which will always lead to errors when implemented like this. It also makes the code very hard to understand, because the percent-escape codes are completely out of place and have nothing to do with the code itself.
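To illustrate why the round trip is lossy, here is a simplified reconstruction. The two helpers below are my assumption of what the escaping workaround roughly does (the `_sketch` names are made up and this is not the library's actual code): backslash-escaped delimiters are hidden behind percent codes before splitting, and every percent code is turned back into a plain character afterwards.

```python
def escape_string_sketch(val):
    """Hide backslash-escaped delimiters behind percent codes (assumed behavior)."""
    return (
        val.replace(r"\,", "%2C")
        .replace(r"\:", "%3A")
        .replace(r"\;", "%3B")
        .replace(r"\\", "%5C")
    )


def unescape_string_sketch(val):
    """Turn the percent codes back into plain characters (assumed behavior)."""
    return (
        val.replace("%2C", ",")
        .replace("%3A", ":")
        .replace("%3B", ";")
        .replace("%5C", "\\")
    )


# A value that already contains percent codes before any escaping happens.
url = "https://example.com/?q=%7B%22a%22%3A5%2C%22b%22%3Atrue%7D"

escaped = escape_string_sketch(url)        # nothing to hide, so unchanged
restored = unescape_string_sketch(escaped)  # but the existing codes get decoded

print(restored == url)  # False: %3A and %2C were replaced by ':' and ','
```

The unescape step cannot tell codes it introduced itself from codes that were already part of the value, which is exactly the pattern seen in the broken URL above.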

NicoHood · 2022-06-12T10:34:48Z

This is my tested rewrite of this function. It completely removes the escaping functions, as they are no longer required.

What I am not sure about is whether, inside quotes, a backslash should also escape the quote mark itself. Example:
"this text is quote and it even contains a \" quote mark"

I can implement that, but it was not implemented before, and I am not sure whether the spec even allows it. This document states that it is not implemented very often, so we could just stick with that?
https://tools.ietf.org/id/draft-daboo-ical-vcard-parameter-encoding-02.html#rfc.appendix.Appendix%20A

def parts(self):
    """
    Split the content line up into (name, parameters, values) parts.
    
    Example with parameter:
    DESCRIPTION;ALTREP="cid:part1.0001@example.org":The Fall'98 Wild

    Example without parameters:
    DESCRIPTION:The Fall'98 Wild
    
    https://icalendar.org/iCalendar-RFC-5545/3-2-property-parameters.html
    """
    try:
        st = self
        name_split = None
        value_split = None
        in_quotes = False
        # Any character can be escaped using a backslash, e.g.: "test\:test"
        quote_character = False
        for i, ch in enumerate(st):
            # We can also quote using quotation marks. This ignores any output, until another quote appears.
            if ch == '"':
                in_quotes = not in_quotes
                continue
                
            # Ignore input, as we are currently in quotation mark quotes
            if in_quotes:
                continue
            
            # Skip quoted character
            if quote_character:
                quote_character = False
                continue

            # The next character should be ignored
            if ch == '\\':
                quote_character = True
                continue

            # The name ends either after the parameter or value delimiter
            if ch in ':;' and not name_split:
                name_split = i

            # The value starts after the value delimiter
            if ch == ':' and not value_split:
                value_split = i

        # Get name
        name = st[:name_split]
        if not name:
            raise ValueError('Key name is required')
        validate_token(name)

        # Reject lines without a value delimiter or with an empty parameter section
        if not name_split or not value_split or name_split + 1 == value_split:
            raise ValueError('Invalid content line')

        # Get parameters (text between ; and :)
        params = Parameters.from_ical(st[name_split + 1: value_split],
                                      strict=self.strict)

        # Get the value after the :
        values = st[value_split + 1:]
        return (name, params, values)
    except ValueError as exc:
        raise ValueError(
            "Content line could not be parsed into parts: '%s': %s"
            % (self, exc)
        )
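To experiment with the scanning logic without icalendar installed, the loop can be pulled out into a standalone function. This is only a sketch: `find_splits` is a made-up name, it returns the two split positions instead of the parsed parts, and it uses `is None` checks where the code above uses `not ...`.

```python
def find_splits(line):
    """Return (name_split, value_split) positions for a content line.

    Delimiters inside double quotes or directly after a backslash are
    ignored, mirroring the loop in the proposed parts() function.
    """
    name_split = None
    value_split = None
    in_quotes = False
    escaped = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_quotes = not in_quotes
            continue
        if in_quotes:
            continue
        if escaped:
            escaped = False
            continue
        if ch == "\\":
            escaped = True
            continue
        if ch in ":;" and name_split is None:
            name_split = i
        if ch == ":" and value_split is None:
            value_split = i
    return name_split, value_split


line = 'DESCRIPTION;ALTREP="cid:part1.0001@example.org":The Fall\'98 Wild'
name_split, value_split = find_splits(line)
print(line[:name_split])                  # property name
print(line[name_split + 1:value_split])   # parameter section
print(line[value_split + 1:])             # value

# The open question from above: a backslash does not protect a quote mark
# inside quotes, so the quoted section ends early at the \" sequence.
print(find_splits('NAME;P="a \\" b:c":v'))
```

With this logic the last call treats the colon after `b` as the value delimiter, which shows what "not implemented" means in practice for backslash-escaped quote marks.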

NicoHood · 2022-06-12T10:37:22Z

Here is a VEVENT to test with:

BEGIN:VEVENT
DTSTART:20220305T200000Z
DTSTAMP:20220612T093000Z
UID:6co62d1l6cs3eb9lcgp3cb9k6ssm6b9ochim8b9g71hjedb4c8pj6p9pc4@google.com
CREATED:20220223T074954Z
DESCRIPTION:<html-blob>Feier Deine Jugend! <a href="https://www.facebook.co
 m/events/1213722619037860?acontext=%7B%22event_action_history%22%3A[%7B%22s
 urface%22%3A%22page%22%7D]%7D">https://www.facebook.com/events/121372261903
 7860?acontext=%7B%22event_action_history%22%3A[%7B%22surface%22%3A%22page%2
 2%7D]%7D</a></html-blob>
LAST-MODIFIED:20220225T121837Z
LOCATION:Removed
SEQUENCE:1
STATUS:CONFIRMED
SUMMARY:BRAVO HITS Party
TRANSP:OPAQUE
END:VEVENT
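A possible way to turn this into a regression check is sketched below. It uses a shortened DESCRIPTION (same URL, but with the link text replaced by `link`) and builds both the folded event and the expected value from the same pieces. `parsed_description` assumes that the third-party icalendar package is installed; the constant and function names are made up for illustration.

```python
# Shortened version of the DESCRIPTION from the VEVENT above, split at the
# places where the calendar file folds the line.
PIECES = [
    'DESCRIPTION:<a href="https://www.facebook.co',
    "m/events/1213722619037860?acontext=%7B%22event_action_history%22%3A[%7B%22s",
    'urface%22%3A%22page%22%7D]%7D">link</a>',
]

# Expected value: the pieces joined without folds, minus the property name.
EXPECTED = "".join(PIECES)[len("DESCRIPTION:"):]

# Folded event text: each fold is a CRLF followed by a single space.
VEVENT = "\r\n".join(["BEGIN:VEVENT", "\r\n ".join(PIECES), "END:VEVENT", ""])


def parsed_description(ical_text):
    """Parse the event with icalendar and return its DESCRIPTION as str."""
    from icalendar import Event  # third-party; assumed to be installed

    return str(Event.from_ical(ical_text)["DESCRIPTION"])


def description_is_preserved(ical_text=VEVENT, expected=EXPECTED):
    """True if parsing leaves the percent codes in the URL untouched."""
    return parsed_description(ical_text) == expected
```

Calling `description_is_preserved()` with an affected version of the library should return False if the behavior matches what is described in this issue, because `%3A` comes back as `:`.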

NicoHood · 2022-06-12T11:12:46Z

And this is my code as a monkey patch, so people coming here via Google can hotfix their library right now:

# Monkey patch icalendar bug
# https://medium.com/@chipiga86/python-monkey-patching-like-a-boss-87d7ddb8098e
# https://github.com/collective/icalendar/issues/355
from icalendar.parser import validate_token
from icalendar.parser import Parameters

def parts_patched(self):
    """
    Split the content line up into (name, parameters, values) parts.

    Example with parameter:
    DESCRIPTION;ALTREP="cid:part1.0001@example.org":The Fall'98 Wild

    Example without parameters:
    DESCRIPTION:The Fall'98 Wild

    https://icalendar.org/iCalendar-RFC-5545/3-2-property-parameters.html
    """
    try:
        st = self
        name_split = None
        value_split = None
        in_quotes = False
        # Any character can be escaped using a backslash, e.g.: "test\:test"
        quote_character = False
        for i, ch in enumerate(st):
            # We can also quote using quotation marks. This ignores any output, until another quote appears.
            if ch == '"':
                in_quotes = not in_quotes
                continue

            # Ignore input, as we are currently in quotation mark quotes
            if in_quotes:
                continue

            # Skip quoted character
            if quote_character:
                quote_character = False
                continue

            # The next character should be ignored
            if ch == '\\':
                quote_character = True
                continue

            # The name ends either after the parameter or value delimiter
            if ch in ':;' and not name_split:
                name_split = i

            # The value starts after the value delimiter
            if ch == ':' and not value_split:
                value_split = i

        # Get name
        name = st[:name_split]
        if not name:
            raise ValueError('Key name is required')
        validate_token(name)

        # Reject lines without a value delimiter or with an empty parameter section
        if not name_split or not value_split or name_split + 1 == value_split:
            raise ValueError('Invalid content line')

        # Get parameters (text between ; and :)
        params = Parameters.from_ical(st[name_split + 1: value_split],
                                      strict=self.strict)

        # Get the value after the :
        values = st[value_split + 1:]
        return (name, params, values)
    except ValueError as exc:
        raise ValueError(
            "Content line could not be parsed into parts: '%s': %s"
            % (self, exc)
        )

from icalendar import parser
parser.Contentline.parts = parts_patched
# End of monkey patch
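If a permanent, process-wide patch is too invasive, the replacement can also be applied temporarily. The helper below is a generic sketch using only the standard library; `patched_attribute` is a made-up name and not part of icalendar.

```python
import contextlib


@contextlib.contextmanager
def patched_attribute(owner, name, replacement):
    """Temporarily replace owner.<name> and restore the original afterwards."""
    original = getattr(owner, name)
    setattr(owner, name, replacement)
    try:
        yield original
    finally:
        # Always restore, even if the body of the with-block raises.
        setattr(owner, name, original)


# Intended usage with the patch above (requires icalendar to be installed):
#
#     from icalendar import parser
#     with patched_attribute(parser.Contentline, "parts", parts_patched):
#         ...  # parse calendars here
```

Outside the `with` block the library behaves as before, which makes it easier to compare patched and unpatched results side by side.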

niccokunzmann · 2022-06-15T16:57:19Z

@NicoHood Would it be ok for you to create a pull request for this? It could be just the code. I am not a contributor to this project but willing to look at it.

NicoHood · 2022-06-15T20:32:18Z

Sure. #356

see #356 see #355

NicoHood added a commit to NicoHood/icalendar that referenced this issue Jun 15, 2022

Fix collective#355 url escaping

757cea3

NicoHood linked a pull request Jun 15, 2022 that will close this issue

Fix #355 url escaping #356

Open

niccokunzmann mentioned this issue Aug 14, 2022

Fix ical printing doc #352

Merged

niccokunzmann changed the title ~~Unwanted unescaping if http url strings~~ Unwanted unescaping of http url strings Aug 21, 2022

jacadzaca linked a pull request Sep 5, 2022 that will close this issue

Fix #355 url escaping #356

Open

niccokunzmann added a commit that referenced this issue Oct 3, 2022

add tests for issue #355

f2c0d23

see #356 see #355

niccokunzmann mentioned this issue Oct 3, 2022

add tests for issue #355 #426

Open

niccokunzmann pushed a commit that referenced this issue Oct 3, 2022

Fix #355 url escaping

2e8430a

niccokunzmann mentioned this issue Oct 3, 2022

New Release 5.0.0 #429

Closed
