Exclude specific directory pattern from search results #179

Open
opt9 opened this issue Apr 10, 2018 · 10 comments

Comments

@opt9
Contributor

opt9 commented Apr 10, 2018

When I search for a keyword such as "some", a great many results are returned, and almost all of them are in ".../doc/..." directories.

I found a "Path Filter" in the UI, but it is disabled and I can't use it.

How can I exclude a specific directory pattern from search results?

@boyter
Owner

boyter commented Apr 10, 2018

Currently you cannot. However, it raises an interesting question. Some may want to remove certain portions of a file from the general search logic. Currently the indexing pipeline is hard-coded to add certain things, such as the path and the like. Being able to control this would allow you to configure the above.

@boyter
Owner

boyter commented Apr 26, 2018

So this is now sitting in the issue179 branch.

You can now control what makes its way into the "all" database. By default this would be the following:

index_all_fields=content,filename,filenamereverse,path,interesting

However, in your case you would want to remove path from the above. The path information will then not be added to the general index. You will still be able to click on a path inside the UI to narrow to just that directory, but if, say, you have ./examples/ in your codebase, a search for examples will only match files which actually have the name or text examples inside them.
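To illustrate the effect of dropping path from the list, here is a minimal sketch in Python. It is not searchcode server's actual indexing code: `parse_fields`, `build_all_text`, and the sample document are hypothetical, and the sketch simply assumes that the "all" index is built by joining the text of whichever fields are listed.

```python
# Hypothetical sketch: how a configurable field list could decide what text
# ends up in a general "all" index. Not searchcode server's real code.

DEFAULT_FIELDS = "content,filename,filenamereverse,path,interesting"
WITHOUT_PATH = "content,filename,filenamereverse,interesting"


def parse_fields(value):
    """Split a comma-separated property value into a list of field names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def build_all_text(doc, fields):
    """Join the values of the listed fields; fields missing from doc are skipped."""
    return " ".join(str(doc[name]) for name in fields if name in doc)


# Sample document (assumed shape): only three of the fields are populated here.
doc = {
    "content": "print('hello')",
    "filename": "hello.py",
    "path": "./examples/hello.py",
}

with_path = build_all_text(doc, parse_fields(DEFAULT_FIELDS))
no_path = build_all_text(doc, parse_fields(WITHOUT_PATH))

print("examples" in with_path)  # the path contributes the word "examples"
print("examples" in no_path)    # without path, only content and filename count
```

With path in the list, the file matches a search for examples only because of its directory name; without it, the same file no longer matches.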

@boyter
Owner

boyter commented Apr 27, 2018

Resolved all unit tests. Verified that this works by setting index_all_fields=filename so that only the filename was indexed. Worked as expected. Going to merge into master.

@opt9 For your case, pull from master, build, and set your searchcode.properties file to have the following: index_all_fields=content,filename,filenamereverse,interesting and then start. Path will be removed from the index. A start triggers a full index, so it should take effect once indexing has finished.

@boyter boyter mentioned this issue Apr 27, 2018
@boyter boyter moved this from In Progress to Ready for Release in Release 1.3.13 Apr 27, 2018
@boyter boyter closed this as completed Apr 27, 2018
@opt9
Contributor Author

opt9 commented Apr 27, 2018

Thanks for your efforts. :-)

There seems to be a miscommunication.
Let me clarify my use case.

I use a lot of open source libraries, including AngularJS.
When I search for the keyword "Foo", I get a great many results, because I have about 600 repositories.
Almost all of the results are AngularJS documents inside the "my_repo/resources/angularjs/doc/*" directory.

I'm not interested in the AngularJS documents.
I just want the results from my own code inside the "my_repo/src/*" directory.

So I want to exclude the "my_repo/resources/angularjs/doc/*" directory from the results.

For example, if I search for "{{{" in our code, I get a lot of AngularJS example code from the AngularJS documents, which is not what I want and is annoying.

In other words, I want to narrow the search scope down to my own code, excluding 3rd party libraries.

If you have any questions, please do not hesitate to reply.

Thanks ;-)

@opt9
Contributor Author

opt9 commented Apr 27, 2018

If you don’t mind, would you please reopen #179?

@boyter
Owner

boyter commented Apr 27, 2018

I think you can already do that actually based on what you have described.

When you run a search, have a look at the results. For each file there is a line such as pig.go in go /doc/codewalk/pig.go | 121 lines | Go, and each of the folders in it (other than the last one) is clickable. Click on one and the results will be filtered down to that directory.

For example, http://demo.searchcodeserver.com/?q=copyright&repo=go&fl=doc_codewalk is filtered down to the /doc/codewalk directory inside Go.

In your case you would need to use something like,

~/?q=foo&fl=my_repo_src

Which should produce what you want. Admittedly there could be a better way to allow this filtering to happen in the UI, rather than just allowing you to filter down through the click. Something I will have a think about.
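For anyone who wants to build such a link by hand, here is a minimal Python sketch. The mapping from a directory path to the fl value (slashes replaced by underscores) is inferred only from the two examples above and is an assumption, not documented behaviour; `path_to_filter` and `search_url` are hypothetical helper names.

```python
from urllib.parse import urlencode


def path_to_filter(path):
    """Turn a directory path into an fl value by joining its parts with '_'.

    Assumption: inferred from /doc/codewalk -> doc_codewalk in this thread.
    """
    parts = [part for part in path.split("/") if part and part != "."]
    return "_".join(parts)


def search_url(base, query, path, repo=None):
    """Build a search URL that narrows the results to one directory."""
    params = {"q": query}
    if repo:
        params["repo"] = repo
    params["fl"] = path_to_filter(path)
    return base + "?" + urlencode(params)


print(search_url("http://demo.searchcodeserver.com/", "copyright", "/doc/codewalk", repo="go"))
print(path_to_filter("my_repo/src"))
```

Note that under this assumed mapping a directory name that itself contains an underscore cannot be told apart from a nested directory, so the real server may treat such paths differently.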

Is your real intention to just limit the search, or to never index those files?

@boyter
Owner

boyter commented Apr 29, 2018

@opt9 Does that work for you? I have a few ideas on how to make this easier to use through the filters, but I'm wondering if it at least unblocks you.

@opt9
Contributor Author

opt9 commented Apr 29, 2018

IMO, it’s not related to indexes.

Also, including a specific directory is different from excluding a specific directory pattern.

If I could search with “?q=foo&fl<>/doc/”, that would be perfect.

I want to search all repositories that contain the “foo” keyword but exclude all documentation directories.
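The exclusion semantics requested here can be sketched in Python as a post-filter over a list of results. This is only an illustration: the `fl<>` syntax is a proposal in this thread, not an existing feature, and `exclude_paths`, the result shape, and the glob pattern `*/doc/*` are all assumptions.

```python
from fnmatch import fnmatchcase


def exclude_paths(results, pattern):
    """Keep only the results whose path does NOT match the glob pattern."""
    return [r for r in results if not fnmatchcase(r["path"], pattern)]


# Hypothetical search results for the keyword "foo" across repositories.
results = [
    {"repo": "my_repo", "path": "my_repo/resources/angularjs/doc/guide.html"},
    {"repo": "my_repo", "path": "my_repo/src/app.js"},
    {"repo": "other_repo", "path": "other_repo/doc/readme.txt"},
    {"repo": "other_repo", "path": "other_repo/src/doc.js"},
]

for result in exclude_paths(results, "*/doc/*"):
    print(result["path"])
```

Unlike the fl filter, which includes a single directory, this removes every directory matching the pattern across all repositories and keeps everything else.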

@opt9
Contributor Author

opt9 commented Apr 29, 2018

Oops, TIL: GitHub markdown changed my single asterisks to italics and my double asterisks to bold.

@boyter
Owner

boyter commented Apr 30, 2018

Hmm, I'll have a think about whether that's easily possible. It should be easy to do through the current search, I think. It just requires some thought.

@boyter boyter reopened this Apr 30, 2018