Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken link to a ProGit book page presented via search #1642

Open
sivaraam opened this issue Oct 24, 2021 · 8 comments
Open

Broken link to a ProGit book page presented via search #1642

sivaraam opened this issue Oct 24, 2021 · 8 comments

Comments

@sivaraam
Copy link
Member

Searching for keyword presented a result which turned out to be a broken link.

URL for broken page

https://git-scm.com/book/en/Appendix-A:-Git-in-Other-Environments-Git-in-IntelliJ-/-PyCharm-/-WebStorm-/-PhpStorm-/-RubyMine

Problem

When I searched for "intelliJ" in the search bar, I got the result for "Appendix A: Git in Other Environments - Git in IntelliJ / PyCharm / WebStorm / PhpStorm / RubyMine" section of the ProGit book. When I clicked on it though, it resulted in a 404 for the following URL:

https://git-scm.com/book/en/Appendix-A:-Git-in-Other-Environments-Git-in-IntelliJ-/-PyCharm-/-WebStorm-/-PhpStorm-/-RubyMine

I guess the culprit is that the URL should be encoded but it isn't. When I try to access the section by going to https://git-scm.com/book/en/v2 and clicking on the link to the same section, it works fine.

Operating system and browser

Firefox on Windows 10

Steps to reproduce

  1. Open https://git-scm.com
  2. Search for "intellij"
  3. Click on the result under the "Book" category.
@pedrorijo91
Copy link
Member

thanks for reporting the issue @sivaraam . would you be interested in helping to solve the issue?

@sivaraam
Copy link
Member Author

Thanks for asking @pedrorijo91 !
I would love to but I lack the knowledge (ruby) required to fix this issue myself. I hope someone else would be able to take up the task of fixing this issue 🙂

@C-Lion
Copy link

C-Lion commented Oct 25, 2021

I would like to work on this please, but I need a bit if guidance. I reproduced the issue but there based on the URL there should be a https://git-scm.com/book/en in the repo and I am unable to find this folder in this repo. Can some one please give me a hint where to find the git-scm.com/book/en/ document folder?

@pedrorijo91
Copy link
Member

sure @C-Lion !

so, the book routing is made through https://github.com/git/git-scm.com/blob/main/app/controllers/books_controller.rb#L6

if we look into the setup at https://github.com/git/git-scm.com/blob/main/README.md#setup, we'll see how is the book content imported - it uses the rake remote_genbook2 command

now we need to dig into https://github.com/git/git-scm.com/blob/main/lib/tasks/book2.rake#L32 to find where are we adding the chapter title to the search index, and make sure we URL encode it

@peff
Copy link
Member

peff commented Oct 26, 2021

I suspect the problem is actually in the search code, not the book importer. In the model for a book section, we index the content and provide an id field:

client.index index: ELASTIC_SEARCH_INDEX,
type: "book",
id: "#{code}---#{self.slug}",
body: {
chapter: self.chapter.title,
section: self.title,
number: self.cs_number,
lang: code,
html: self.html
}

And then when we do a search, the section model inherits from Searchable, which formats the results:

results.each do |result|
source = result["_source"]
name = source["section"] || source["chapter"] || source["name"]
slug = result["_id"].gsub("---", "/")
highlight = if search_type == "book"
result["highlight"]["html"].first rescue nil
else
result["highlight"]["text"].first rescue nil
end
hit = {
name: name,
score: result["_score"],
highlight: highlight,
url: (search_type == "book" ? "/book/#{slug}" : "/docs/#{name}")
}
ref_hits << hit

So one of those probably needs to be URL-encoding things (you can see in the section model that we URL-encode the slug in other contexts when generating links). I'm not sure which place makes more sense. It doesn't look like we use that id for anything else, but maybe there are rules we need to follow for ElasticSearch (otherwise why would it do that weird --- replacement and not just insert a slash).

@C-Lion
Copy link

C-Lion commented Oct 30, 2021

I think I understand the issue but it is not clear from your information which of the above needs to be corrected or what exactly you want changed.
Seems like a lot of code just to display a URL.
So just confirming this is not just a "correct the URL on the page issue"?
I have basic Ruby skills & will try the correct rake command to see if I can get this set up locally.

@peff
Copy link
Member

peff commented Nov 2, 2021

So just confirming this is not just a "correct the URL on the page issue"?

Right. The problem is that the URL is not found in this repository at all. It is in content which is imported from another repository (the book code). Hence the complexity. A cron job pulls in updated book content into the sql database nightly, and then we run a search index on that content, putting the result into the elasticsearch database. And then incoming search requests query the elasticsearch database. So if we are going to add a layer of quoting to the URL, it needs to happen either when we index the content and stick it into elasticsearch, or when we pull it out and return it to the browser.

@Nikolai48Nick
Copy link

Searching for keyword presented a result which turned out to be a broken link.

URL for broken page

https://git-scm.com/book/en/Appendix-A:-Git-in-Other-Environments-Git-in-IntelliJ-/-PyCharm-/-WebStorm-/-PhpStorm-/-RubyMine

Problem

When I searched for "intelliJ" in the search bar, I got the result for "Appendix A: Git in Other Environments - Git in IntelliJ / PyCharm / WebStorm / PhpStorm / RubyMine" section of the ProGit book. When I clicked on it though, it resulted in a 404 for the following URL:

https://git-scm.com/book/en/Appendix-A:-Git-in-Other-Environments-Git-in-IntelliJ-/-PyCharm-/-WebStorm-/-PhpStorm-/-RubyMine

I guess the culprit is that the URL should be encoded but it isn't. When I try to access the section by going to https://git-scm.com/book/en/v2 and clicking on the link to the same section, it works fine.

Operating system and browser

Firefox on Windows 10

Steps to reproduce

  1. Open https://git-scm.com
  2. Search for "intellij"
  3. Click on the result under the "Book" category.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants