Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix path_template variable parsing containing dots #2440

Open
wants to merge 2 commits into
base: dev
Choose a base branch
from

Conversation

jowlo
Copy link
Contributor

@jowlo jowlo commented Mar 2, 2023

Using dots (.) and other special characters as part of a path_template for pages is currently not supported. This is due to the fact that path_template is passed to Python's regex module without escaping first, see #2437.

The PR escapes special regex characters in the path_template before parsing it. It also simplifies the way the parsing is done by using re.match.

As re.escape on Python 3.6 internally compiles the to-be-escaped string as regex pattern and throws errors for invalid patterns (which we have here), the function keeps a local copy of re._special_chars_map to escape these characters.

Contributor Checklist

  • I have broken down my PR scope into the following TODO tasks
    • Fix the path_template parsing
  • I have run the tests locally and they passed. (refer to testing section in contributing)
  • I have added tests, or extended existing tests, to cover any new features or bugs fixed in this PR

@jowlo jowlo force-pushed the fix/dots-in-path-template branch 2 times, most recently from 4811dd2 to 995ec65 Compare March 3, 2023 09:05
@jowlo jowlo marked this pull request as ready for review March 3, 2023 09:05
@jowlo jowlo requested a review from alexcjohnson as a code owner March 3, 2023 09:05
@jowlo jowlo force-pushed the fix/dots-in-path-template branch from 995ec65 to cf185ae Compare March 7, 2023 07:38
@jowlo
Copy link
Contributor Author

jowlo commented Mar 7, 2023

Maybe I am missing something here, but it seems that the CI tests are quite flaky...

@AnnMarieW
Copy link
Contributor

I don't think it's a bug that you can't use special characters as variable separators as described in #2437.

The Pages path template follows the Flask convention for path variables and at this time only supports strings (any text without a slash) in each path segment. The issue with allowing special characters such as a dot or a dash as separators, is that those special characters cannot be used in the variable.

In your example, what would happen if abc was 1.23?

When using /dot/<foo>.<bar>/ as path template, a request to /dot/abc.def should result in a call to layout(foo='abc', bar='def').

Rather than this custom parsing being handled by dash, I think it would be better to use a path template like /dot/<foobar>/ then split foobar into two variables inside the layout function of the app.

@jowlo
Copy link
Contributor Author

jowlo commented Mar 7, 2023

I don't think it's a bug that you can't use special characters as variable separators as described in #2437.

The Pages path template follows the Flask convention for path variables and at this time only supports strings (any text without a slash) in each path segment. The issue with allowing special characters such as a dot or a dash as separators, is that those special characters cannot be used in the variable.

But flask does support this, you can craft arbitrary URLs using flask path variables, the following splits the parameters just fine:

from flask import Flask

app = Flask(__name__)

@app.route('/user/<first>.<last>')
def show_user_profile(first, last):
    # show the user profile for that user
    return f'Hello {first} {last}'

In your example, what would happen if abc was 1.23?

Well, you can always craft URLs that won't match to a template. Thats the same with all separators, including /.

Rather than this custom parsing being handled by dash, I think it would be better to use a path template like /dot/<foobar>/ then split foobar into two variables inside the layout function of the app.

Yes, that is possible, but leads to a lot of duplicated code if you split that parameter in many routes. In my application I have a few flask endpoints and some dash endpoints. It just struck me as bug that they are not behaving the same.

Maybe the dash pages path_template parsing should leverage flask's parsing under the hood?

@BSd3v
Copy link
Contributor

BSd3v commented Mar 7, 2023

What happens in your tests when you pass foo.bar.test in the url?

In flask, you get:
image

@app.route('/testing/<foo>.<bar>', methods=['GET'])
def testing(foo, bar):
    return jsonify([foo, bar])

It looks like Flask sticks all the extra ones in the first variable, and everything else is in reverse order based upon a split of the designated separator.

@jowlo
Copy link
Contributor Author

jowlo commented Mar 7, 2023

What happens in your tests when you pass foo.bar.test in the url?

The code in the PR behaves the same as flask here: It replaces the <variable> blocks with greedy capture groups, so the first group will try to grab everything it can.


On second thought i dug a bit more into how flask handles routing, and flask uses werkzeugs routing underneath. Using that as described here in the docs is probably much less error prone (and faster as well). However, dash does not have an explicit dependency on werkzeug atm, only indirectly via Flask.

@BSd3v
Copy link
Contributor

BSd3v commented Mar 7, 2023

I'm wondering if switching to werkzeug might also help with parsing variables that have been url encoded as well. Right now, this is a test -> this%20is%20a%20test in the variable that dash receives.

Flask can also handle mix and matched separators, heaven forbid, haha.

@jowlo
Copy link
Contributor Author

jowlo commented Mar 7, 2023

It would also make it possible to use converters such as <pagenum:int> further reducing repeated manual parsing at the top of each pages layout.

@alexcjohnson
Copy link
Collaborator

I'm in favor of reusing Werkzeug behavior as much as possible. Just be aware that Flask / Werkzeug have a pretty bad record of backward compatibility in areas that we assumed were part of the public API but they choose to remove or alter in minor releases, requiring us to scramble to patch.

So if the code being reused is substantial by all means call Werkzeug, but try to find the highest-level reference possible to minimize the likelihood it moves later. If it's small, like just a regex or 2-line function, please inline it so we match their behavior without a risk of later breakage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants