Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: allow non-latin characters in old-path redirects #3283

Closed
wants to merge 3 commits into from

Conversation

JayaKrishnaNamburu
Copy link
Contributor

Description

fixes #3225

  1. What is this PR about (link the issue and add a short description)

Steps for reproduction

  1. click button
  2. expect xyz

Code Review

  • hi @kof, I need you to do
    • conceptual review (architecture, feature-correctness)
    • detailed review (read every line)
    • test it on preview

Before requesting a review

  • made a self-review
  • added inline comments where things may be not obvious (the "why", not "what")

Before merging

  • tested locally and on preview environment (preview dev login: 5de6)
  • updated test cases document
  • added tests
  • if any new env variables are added, added them to .env.example and the builder/env-check.js if mandatory

@kof
Copy link
Member

kof commented May 2, 2024

Why only redirects? I think all page paths can support non-latin characters

.string()
.refine(
(path) =>
new RegExp(/^(?:\/[-_a-zA-Z0-9*:?\\\p{L}\\p{N}\\p{M}\\\/]+)+$/gmu).test(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there an original implementation of this? I think its a non-trivial thing to write and we should be able to use something battle-tested

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just extended \\p{L}\\p{N}\\p{M}\\ to the original one to allow non-latin characters though

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw is not possible to use any of the standard api for this? e.g. URL api

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is the original coming from? is it worth linking to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will see, ideally should work though. Will check about the +/-'s of both approaches.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is the original coming from? is it worth linking to?

Original regex is from PagesPath, just extending with latin characters. Until there is a decision to support non-latin across all the paths inside webstudio projects. Maybe checking only for old paths (projects that are designed in another platforms) would be safe way to go.

@JayaKrishnaNamburu
Copy link
Contributor Author

Yup, thought of it. But didn't want to change without cross checking with the team once.
So just added as a safe option for old redirects. As, urls from old projects can still be used. Will test a little bit, and change it to all page paths across the project 👍

@kof
Copy link
Member

kof commented May 2, 2024

I just checked webflow and when I put japanese "日本語" in there it gets transformed to "ri-ben-yu", do you know why?
When I put "тест" it converts to "test"

i am very surrprised and need to know why :)

@JayaKrishnaNamburu
Copy link
Contributor Author

They might be using translations, and i really wonder why they are being converted to simple english though. Is non-latin in urls is not a standard practice 🤔

@kof
Copy link
Member

kof commented May 2, 2024

Lets see if twitter knows why Webflow converts this way

https://x.com/oleg008/status/1786101402869063997

@kof
Copy link
Member

kof commented May 2, 2024

  • investigate how mature projects are treating page urls and what they use for validation

@kof
Copy link
Member

kof commented May 2, 2024

As an idea, check what wordpress is doing, I think they had 20 years to figure this out

@JayaKrishnaNamburu
Copy link
Contributor Author

Here are my observations.

  • Wordpress don't have a first hand support for redirects. But there is a plugin that brings in support for the same. While pasting, the non-latin characters are not handled. So any old url's with non-latin characters in them can't be used.
Screenshot 2024-05-03 at 18 29 53
  • Webflow seems to change them to translate them in the page urls. So, ideally looks like they don't support non-latin characters for the pages in the project. Can't test the 301 in webflow, as it needs a premium account.
Screenshot 2024-05-03 at 18 33 26

And one more thing, we can't directly use the URL to validate this because, there is a domain that is missing. So, the URL will always throw error unless we prepend with something like www.example.com/product-page/商品名-7 for validation purposes. We can just stick with regex as it is now.

The decent middle ground can be, just allow support for non-latin characters in the redirects manager and that too only for old-paths. So existing users like the one from GitHub discussions can still use the manager to move their projects. And later the support for non-latin on project wide routes can be added.

@JayaKrishnaNamburu JayaKrishnaNamburu self-assigned this May 3, 2024
@JayaKrishnaNamburu JayaKrishnaNamburu marked this pull request as ready for review May 3, 2024 13:17
@TrySound
Copy link
Member

TrySound commented May 3, 2024

And one more thing, we can't directly use the URL to validate this because, there is a domain that is missing. So, the URL will always throw error unless we prepend with something like www.example.com/product-page/商品名-7 for validation purposes. We can just stick with regex as it is now.

there is nothing bad with prepending for new URL, we do this all the time. You can even pass fake origin as second argument.

@JayaKrishnaNamburu
Copy link
Contributor Author

Closing this PR, as the support for non-latin characters is a little bit bigger topic than just enabling in regex. Follow #3225 for further updates.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support non latin characters in page path
3 participants