Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

composebox_typeahead: Avoid generating broken links. #30071

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

kuv2707
Copy link
Collaborator

@kuv2707 kuv2707 commented May 13, 2024

The #**stream>topic** syntax generates broken links for topics containing two backticks or starting/ending with *, because of architectural flaws in the backend markdown processor. So we avoid generating the syntax for such topics and instead generate the normal link syntax in markdown.

Fixes #19873

CZO discussion

Screenshots and Screen Captures

chrome_M0x0tB29yL

Self-review checklist
  • Self-reviewed the changes for clarity and maintainability
    (variable names, code reuse, readability, etc.).

Communicate decisions, questions, and potential concerns.

  • Explains differences from previous plans (e.g., issue description).
  • Highlights technical choices and bugs encountered.
  • Calls out remaining decisions and concerns.
  • Automated tests verify logic where appropriate.

Individual commits are ready for review (see commit discipline).

  • Each commit is a coherent idea.
  • Commit message(s) explain reasoning and motivation for changes.

Completed manual review and testing of the following:

  • Visual appearance of the changes.
  • Responsiveness and internationalization.
  • Strings and tooltips.
  • End-to-end functionality of buttons, interactions and flows.
  • Corner cases, error conditions, and easily imagined bugs.

@zulipbot zulipbot added size: S area: markdown (mentions) Mentions for users/groups/stream/topic/time. bug rust community request Issues of interest to the Rust community labels May 13, 2024
Copy link
Collaborator Author

@kuv2707 kuv2707 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made a preliminary implementation and highlighted the reasons behind some technical choices.
Once we decide the correct logic to detect faulty topic names, I'll refactor the functions and put them in a sensible place.

@@ -995,8 +997,29 @@ export function content_typeahead_selected(item, query, input_element, event) {
// Stream + topic mention typeahead; close the stream+topic mention syntax
// with the topic and the final **. Note that token.length can be 0
// if we are completing from `**streamname>`.
function will_produce_broken_link(topic) {
Copy link
Collaborator Author

@kuv2707 kuv2707 May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, I have considered only two cases which produced buggy links:

  • When there are two or more backticks
  • When the topic name begins or ends with asterisks.

There can still be topic names which cause broken links. I had earlier tried to parse the syntax text using the frontend markdown parser and see if the link in the parsed html points to the correct stream and topic. But double-backtick topic names gave different urls through frontend and backend markdown processors (in fact, the frontend parser correctly parsed the double-backtick topic name, while the backend processor generated an incorrect url).

This means that checking if the frontend markdown processor parses the syntax correctly gives no guarantee about the correctness of the link in the final message. So I thought the best option would be to check for the individual reported cases which produce broken urls.

I infer from CZO discussions that we will eventually move to a different markdown processor, and this will no longer be an issue then.

web/src/composebox_typeahead.js Outdated Show resolved Hide resolved
@alya
Copy link
Contributor

alya commented May 14, 2024

@timabbott Do you want to take a look and provide some feedback here?

@alya alya added the integration review Added by maintainers when a PR may be ready for integration. label May 14, 2024
beginning.slice(0, syntax_start_index) +
url_syntax(get_stream_name(beginning), item.topic) +
" ";
}
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you extract this logic to a function and/or add unit tests for it? It's complicated enough that it's very much worth having a test for.

Copy link
Collaborator Author

@kuv2707 kuv2707 May 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have extracted the logic and added some unit tests.

@kuv2707
Copy link
Collaborator Author

kuv2707 commented May 20, 2024

@timabbott Addressed the suggestion. Please take a look.

);
assert.equal(
ct.stream_topic_link_syntax("#**stream>t", "*asterisk"),
"[#stream>*asterisk](#narrow/stream/-1-unknown/topic/*asterisk) ",
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do all of these have -1 as the stream ID in the URLs? The first couple don't, which makes me think you've got some sort of weird bug in either one test case or the other.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stream ids are only required when we generate the markdown link syntax (since we need to generate the url ourselves).
That is why the first two didn't need them.

@@ -1180,6 +1185,31 @@ export function content_typeahead_selected(
return beginning + rest;
}

function will_produce_broken_stream_topic_link(topic_name: string): boolean {
return /(\*+)|(.*`.*`.*)/.test(topic_name);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The links for topic names with double backticks are now generated correctly, but the double backticks cause the enclosed part of the topic name to render as an inline code block.
I don't know what to do in this case since backslash escaping is disabled.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think there's no great answer for that at present. This is #18202.

As long as the link works correctly, though, that's a much less severe issue than #19873 is.

A possible workaround is to use HTML character references, as suggested here: #18202 (comment)
I'd definitely leave that as a separate later commit, though.

@kuv2707 kuv2707 force-pushed the 19873_topic_name_fix branch 2 times, most recently from a3c5613 to 8c243b1 Compare May 21, 2024 12:16
@kuv2707 kuv2707 requested a review from timabbott May 21, 2024 15:38
Copy link
Member

@gnprice gnprice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kuv2707! This is an old issue people have repeatedly run into, and I'm happy to see progress being made on it.

function will_produce_broken_stream_topic_link(topic_name: string): boolean {
return /(\*+)|(.*`.*`.*)/.test(topic_name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should handle several other cases too, as discussed at #19873 (comment) :

  • containing $$ (rather than backticks or asterisks)
  • containing any of the above in the stream/channel name (rather than the topic)

And checking for two backticks is probably trickier than necessary — can just check if there's any backticks at all. That's conservative in that it'll cover all the same cases and more, and it simplifies thinking about whether there are relevant cases this misses. (In particular if there's one backtick in the stream/channel name and another in the topic, that's probably enough to mess things up.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this regexp can be simplified:

  • we're not using the capture groups, so remove the ( and )
  • the leading and trailing .* are redundant
  • if it matches \*+, it'll match just \*, and vice versa

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I've simplified the regex and handled the $$ case.
(I noticed that a single $ doesn't actually cause problems in any case)

Additionally, stream names are also being checked now.

@@ -1180,6 +1185,31 @@ export function content_typeahead_selected(
return beginning + rest;
}

function will_produce_broken_stream_topic_link(topic_name: string): boolean {
return /(\*+)|(.*`.*`.*)/.test(topic_name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think there's no great answer for that at present. This is #18202.

As long as the link works correctly, though, that's a much less severe issue than #19873 is.

A possible workaround is to use HTML character references, as suggested here: #18202 (comment)
I'd definitely leave that as a separate later commit, though.

Comment on lines 1197 to 1198
const stream = stream_data.get_sub(stream_name);
const stream_id = stream?.stream_id ?? -1;
return `[#${stream_name}>${topic_name}](${hash_util.by_stream_topic_url(stream_id, topic_name)}) `;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does -1 come from here?

I believe that will end up producing a link that's broken, and doesn't even have the stream name the user typed. In general, when we don't have a meaningful value to supply somewhere, it's best not to make up a bogus value like -1 — instead, either do something that explicitly means "there's no known value", or if that's not possible then throw an exception.

(And if throwing an exception feels like it'd be a real problem for users, then that's a sign we need to do some more work so as to actually handle the case where we don't have a real value: either get the value after all, or gracefully handle saying "there's no known value", or something else.)

Copy link
Collaborator Author

@kuv2707 kuv2707 Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this code path is only accessed through typeahead, it is improbable that stream_id would ever be -1. I think we can throw an exception.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, an exception sounds correct to me.

The #**stream>topic** syntax generates broken links for
topics containing two backticks or ending with *, because of
architectural flaws in the backend markdown processor.
So we avoid generating the syntax for such topics and instead
generate the normal link syntax in markdown.

Fixes zulip#19873
@kuv2707
Copy link
Collaborator Author

kuv2707 commented Jun 4, 2024

@alya Does this PR have to go through the buddy and mentor review stages? (Since it's already reviewed by maintainers once)

@alya
Copy link
Contributor

alya commented Jun 5, 2024

I think it's OK to skip those at this stage. Has all the feedback above been addressed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: markdown (mentions) Mentions for users/groups/stream/topic/time. bug integration review Added by maintainers when a PR may be ready for integration. rust community request Issues of interest to the Rust community size: L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix linking to topic names containing multiple backticks with in-message markdown
5 participants