-
-
Notifications
You must be signed in to change notification settings - Fork 7.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
composebox_typeahead: Avoid generating broken links. #30071
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made a preliminary implementation and highlighted the reasons behind some technical choices.
Once we decide the correct logic to detect faulty topic names, I'll refactor the functions and put them in a sensible place.
web/src/composebox_typeahead.js
Outdated
@@ -995,8 +997,29 @@ export function content_typeahead_selected(item, query, input_element, event) { | |||
// Stream + topic mention typeahead; close the stream+topic mention syntax | |||
// with the topic and the final **. Note that token.length can be 0 | |||
// if we are completing from `**streamname>`. | |||
function will_produce_broken_link(topic) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, I have considered only two cases which produced buggy links:
- When there are two or more backticks
- When the topic name begins or ends with asterisks.
There can still be topic names which cause broken links. I had earlier tried to parse the syntax text using the frontend markdown parser and see if the link in the parsed html points to the correct stream and topic. But double-backtick topic names gave different urls through frontend and backend markdown processors (in fact, the frontend parser correctly parsed the double-backtick topic name, while the backend processor generated an incorrect url).
This means that checking if the frontend markdown processor parses the syntax correctly gives no guarantee about the correctness of the link in the final message. So I thought the best option would be to check for the individual reported cases which produce broken urls.
I infer from CZO discussions that we will eventually move to a different markdown processor, and this will no longer be an issue then.
@timabbott Do you want to take a look and provide some feedback here? |
07cab3f
to
dbcdfa0
Compare
web/src/composebox_typeahead.ts
Outdated
beginning.slice(0, syntax_start_index) + | ||
url_syntax(get_stream_name(beginning), item.topic) + | ||
" "; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you extract this logic to a function and/or add unit tests for it? It's complicated enough that it's very much worth having a test for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have extracted the logic and added some unit tests.
dbcdfa0
to
0a026be
Compare
0a026be
to
ff0ba14
Compare
@timabbott Addressed the suggestion. Please take a look. |
ff0ba14
to
7455824
Compare
); | ||
assert.equal( | ||
ct.stream_topic_link_syntax("#**stream>t", "*asterisk"), | ||
"[#stream>*asterisk](#narrow/stream/-1-unknown/topic/*asterisk) ", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do all of these have -1
as the stream ID in the URLs? The first couple don't, which makes me think you've got some sort of weird bug in either one test case or the other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stream ids are only required when we generate the markdown link syntax (since we need to generate the url ourselves).
That is why the first two didn't need them.
web/src/composebox_typeahead.ts
Outdated
@@ -1180,6 +1185,31 @@ export function content_typeahead_selected( | |||
return beginning + rest; | |||
} | |||
|
|||
function will_produce_broken_stream_topic_link(topic_name: string): boolean { | |||
return /(\*+)|(.*`.*`.*)/.test(topic_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The links for topic names with double backticks are now generated correctly, but the double backticks cause the enclosed part of the topic name to render as an inline code block.
I don't know what to do in this case since backslash escaping is disabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think there's no great answer for that at present. This is #18202.
As long as the link works correctly, though, that's a much less severe issue than #19873 is.
A possible workaround is to use HTML character references, as suggested here: #18202 (comment)
I'd definitely leave that as a separate later commit, though.
a3c5613
to
8c243b1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kuv2707! This is an old issue people have repeatedly run into, and I'm happy to see progress being made on it.
web/src/composebox_typeahead.ts
Outdated
function will_produce_broken_stream_topic_link(topic_name: string): boolean { | ||
return /(\*+)|(.*`.*`.*)/.test(topic_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should handle several other cases too, as discussed at #19873 (comment) :
- containing
$$
(rather than backticks or asterisks) - containing any of the above in the stream/channel name (rather than the topic)
And checking for two backticks is probably trickier than necessary — can just check if there's any backticks at all. That's conservative in that it'll cover all the same cases and more, and it simplifies thinking about whether there are relevant cases this misses. (In particular if there's one backtick in the stream/channel name and another in the topic, that's probably enough to mess things up.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also this regexp can be simplified:
- we're not using the capture groups, so remove the
(
and)
- the leading and trailing
.*
are redundant - if it matches
\*+
, it'll match just\*
, and vice versa
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I've simplified the regex and handled the $$
case.
(I noticed that a single $
doesn't actually cause problems in any case)
Additionally, stream names are also being checked now.
web/src/composebox_typeahead.ts
Outdated
@@ -1180,6 +1185,31 @@ export function content_typeahead_selected( | |||
return beginning + rest; | |||
} | |||
|
|||
function will_produce_broken_stream_topic_link(topic_name: string): boolean { | |||
return /(\*+)|(.*`.*`.*)/.test(topic_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think there's no great answer for that at present. This is #18202.
As long as the link works correctly, though, that's a much less severe issue than #19873 is.
A possible workaround is to use HTML character references, as suggested here: #18202 (comment)
I'd definitely leave that as a separate later commit, though.
web/src/composebox_typeahead.ts
Outdated
const stream = stream_data.get_sub(stream_name); | ||
const stream_id = stream?.stream_id ?? -1; | ||
return `[#${stream_name}>${topic_name}](${hash_util.by_stream_topic_url(stream_id, topic_name)}) `; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does -1 come from here?
I believe that will end up producing a link that's broken, and doesn't even have the stream name the user typed. In general, when we don't have a meaningful value to supply somewhere, it's best not to make up a bogus value like -1 — instead, either do something that explicitly means "there's no known value", or if that's not possible then throw an exception.
(And if throwing an exception feels like it'd be a real problem for users, then that's a sign we need to do some more work so as to actually handle the case where we don't have a real value: either get the value after all, or gracefully handle saying "there's no known value", or something else.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this code path is only accessed through typeahead, it is improbable that stream_id
would ever be -1
. I think we can throw an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, an exception sounds correct to me.
8c243b1
to
7704b04
Compare
The #**stream>topic** syntax generates broken links for topics containing two backticks or ending with *, because of architectural flaws in the backend markdown processor. So we avoid generating the syntax for such topics and instead generate the normal link syntax in markdown. Fixes zulip#19873
7704b04
to
d15f3fd
Compare
@alya Does this PR have to go through the buddy and mentor review stages? (Since it's already reviewed by maintainers once) |
I think it's OK to skip those at this stage. Has all the feedback above been addressed? |
The
#**stream>topic**
syntax generates broken links for topics containing two backticks or starting/ending with *, because of architectural flaws in the backend markdown processor. So we avoid generating the syntax for such topics and instead generate the normal link syntax in markdown.Fixes #19873
CZO discussion
Screenshots and Screen Captures
Self-review checklist
(variable names, code reuse, readability, etc.).
Communicate decisions, questions, and potential concerns.
Individual commits are ready for review (see commit discipline).
Completed manual review and testing of the following: