Searching for text in multiple unopen binary files fails #3926

Open
mjross opened this issue Mar 19, 2021 · 8 comments

Comments

mjross commented Mar 19, 2021

Summary

When searching for a string of plain text in a single binary file that is open in the current window, Komodo is able to find the string. Similarly, Komodo is able to find the string in multiple binary files if they are open in Komodo, even if none of them currently have focus. However, when searching across multiple files that are not open – specifically, when using Edit > Find in Files, with the “Where” option set to “Files” – Komodo either does not open the files or does open them but fails to find the string of text.

Note that the string of text is contained within a single paragraph, and does not contain any binary content. For instance, in a binary Microsoft Word document, the string of plain text does not contain any Word field codes or other formatting information, which, if present, would understandably prevent Komodo from finding the string, regardless of the method of searching for it.

Steps to Reproduce

  1. Help > Troubleshooting > Restart in Safe-Mode.
  2. Create two example Word ".doc" files containing the string “teststring” (just the word, no quote marks). They must be true binary files (created with, e.g., Word 2000), which a ".docx" file apparently is not. In this example, the Word files are located in a temporary directory, “D:\tmp”. Attached is a screenshot of the test string in a Word document, when viewed in Komodo IDE.
  3. Open one of the files in Komodo and perform a standard Edit > Find operation, searching for the string "teststring". Komodo finds the string.
  4. Open a second Word file containing the same string.
  5. Open a non-binary file, and leave it as the file in the current window.
  6. Perform an Edit > Find in Files, with the “Where” option set to “Open Files”. Komodo again finds the string - this time in both binary files.
  7. Close all the files.
  8. Search for the test string again, but this time performing Edit > Find in Files, with the “Where” option set to “Files” and the "Directories" option set to the directory containing the example Word files. Komodo fails to find the string.
  9. Komodo does see that the binary files exist, as evidenced by the number of reported files found, as seen in the second screenshot attached.

Note that this issue report is a (belated) follow-up to an earlier forum post.

Expected Results

Komodo IDE should find the string regardless of which method of searching is performed.

Actual Results

Komodo IDE only finds the string if the binary document(s) is already open.

Platform Information

Komodo IDE 11.1.1 build 91089 (on Windows Home 10.0.19041.746)

Attachments

Two screenshots:

[Screenshot: Clipboard01]

[Screenshot: Clipboard02]

Member

th3coop commented Mar 19, 2021

I found a ticket relating to this: #467. It looks like we decided to still not allow binary file editing in this context.

Unfortunately, the chances of this changing are very low. There are no plans to make updates to Komodo at this point. Looking at the fix for the above ticket, though, it looks like you could make a fairly trivial change to a Python file to allow what you want: 455cfa7

Author

mjross commented Mar 19, 2021

@th3coop, thanks for the info. I can certainly understand any reticence toward allowing users to do a global search and replace in binary files that have not been opened. Can you imagine the damage someone could do with a global search and replace of strings within binary files that make up their operating system? :-) In my case, I'm not interested in doing any replacements, but instead just need a way to do regex searches through folders of Word documents – seeing as the gnomes in Redmond decided that we don't need that capability built into Word.

Dumb question here: If I make those changes to those Python files, do the changes take effect immediately, or do I need to restart Komodo, or, even worse, do I need to recompile anything?

Author

mjross commented Mar 21, 2021

I followed the code changes listed for that fix as best I could. (I'll include my notes below, just for reference.) I couldn't find the file confirmrepl.js anywhere, but I found and edited the three other files, even though not all the grep line numbers shown on that page matched what I found in the Komodo 11.1 code.

I restarted Komodo normally, but the code changes didn't fix the problem. I restarted Komodo in safe mode, and again it didn't work.

I then upgraded to version 12.0.1 of Komodo, but it didn't make any difference.

After trying the code changes, I noticed in the title of that post, the author specifies "in non-binary (text) file". So would those code changes apply to my situation (of searching through binary files)?


lib\mozilla\python\komodo\findlib2.py
132 + skip_unknown_lang_paths=False,
132 - skip_binary_files=False,
191 + if skip_unknown_lang_paths and ti.lang is None:
192 + yield SkipUnknownLangPath(path)
191 - if not ti.is_text and skip_binary_files:
192 - yield SkipBinaryPath(path)
295 + grepper = grep(regex, paths, skip_unknown_lang_paths=True,
295 - grepper = grep(regex, paths, skip_binary_files=True,
590 + class SkipUnknownLangPath(SkipPath):
591 + def __init__(self, path):
592 + SkipPath.__init__(self, path, "unknown language")

lib\mozilla\components\koFinder.py
284 + if isinstance(event, findlib2.SkipUnknownLangPath):
285 + self.num_paths_skipped += 1
286 + self._cache_skipped_path(event)
287 + elif isinstance(event, findlib2.SkipBinaryPath):
284 - if isinstance(event, findlib2.SkipBinaryPath):
572 + if isinstance(event, findlib2.SkipUnknownLangPath):
573 + self._add_skipped_path(event)
574 + elif isinstance(event, findlib2.SkipLargeFilePath):
572 - if isinstance(event, findlib2.SkipLargeFilePath):
601 + if isinstance(event, findlib2.SkipUnknownLangPath):
602 + return components.interfaces.koIConfirmReplacerInFiles.SKIPPED_UNKNOWN_LANG

lib\sdk\idl\koIFinder.idl
46 + const long SKIPPED_UNKNOWN_LANG = 3;

Member

th3coop commented Mar 26, 2021

> Dumb question here: If I make those changes to those Python files, do the changes take effect immediately, or do I need to restart Komodo, or, even worse, do I need to recompile anything?

Sorry for the delay, missed this. You need to restart.

I'm so sorry, I missed that you were not doing find-replace. You're just trying to do find (head to desk). Looking at this again, the code should ALREADY be returning results for binary files if you're just doing a search, but if I search in a Word doc I get no results for strings that I can see. I followed the breadcrumbs from the find pane down to this findlib2.py code. Nothing is changing skip_binary_files to True except the replace function. treat_binary_files_as_text sounded suspicious though, so I changed that default to True. I can now search binary files. It's a shame that all the options that can be passed to the underlying find library weren't exposed to the higher-level libs that actually call it, so we could just write a JS toolbox Userscript to perform the search.
This is the change I made. I believe you've already found the file, but it's at [install dir]/lib/mozilla/python/komodo/findlib2.py

diff --git a/findlib2.py b/findlib2.py
index 7c229caa7..b28122d03 100644
--- a/findlib2.py
+++ b/findlib2.py
@@ -128,7 +128,7 @@ def find(paths, includes=None, excludes=None, env=None):
 
 
 def grep(regex, paths, files_with_matches=False,
-         treat_binary_files_as_text=False,
+         treat_binary_files_as_text=True,
          skip_binary_files=False,
          skip_filesizes_larger_than=None,
          first_on_line=False,
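
For readers wondering why flipping this one default matters, here is a minimal sketch of how the two flags might interact. It is not the actual findlib2.py code: `FileInfo` and `binary_file_action` are hypothetical names, and the "ignore" branch is an assumption based only on the behaviour described in this thread (binary files are counted, but no hits are reported).

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    """Hypothetical stand-in for the per-file info a grep loop inspects."""
    path: str
    is_text: bool


def binary_file_action(info, treat_binary_files_as_text=False,
                       skip_binary_files=False):
    """Return what a grep-like loop might do with one file (assumed logic)."""
    if info.is_text:
        return "search"
    if skip_binary_files:
        # Per the thread, only the replace function turns this flag on.
        return "skip"
    if treat_binary_files_as_text:
        # With the patched default, binary content is scanned like text.
        return "search"
    # Assumption: the file is seen, but no line-level hits are reported.
    return "ignore"
```

Under these assumptions, `binary_file_action(FileInfo("a.doc", False))` gives "ignore" with the original defaults and "search" once `treat_binary_files_as_text` is True.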

To apply this change, I'd first revert any changes you made (just re-install Komodo). You'll need a patch tool installed for this (git has one in [git install dir]/usr/bin/patch.exe):

  1. Save the above diff as a fix-binary-search.patch file in the same folder as findlib2.py
  2. In a cmd window, navigate to that folder
  3. Run patch -i fix-binary-search.patch

That should apply the change to that file. Keep the patch file around so you know what changes you've made and can re-apply later if needed.
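
If no patch tool is available, the same one-line change could be made with a short Python script instead. This is only a sketch: `flip_default` is a hypothetical helper, and it assumes the `grep()` signature in your copy of findlib2.py matches the diff above. It works on raw bytes so that line endings are left untouched, and it saves a backup next to the file with an added `.orig` suffix.

```python
import shutil
import sys
from pathlib import Path

OLD = b"treat_binary_files_as_text=False,"
NEW = b"treat_binary_files_as_text=True,"


def flip_default(source_path):
    """Flip the default in the grep() signature; return True if changed."""
    path = Path(source_path)
    data = path.read_bytes()
    start = data.find(b"def grep(")
    pos = data.find(OLD, start) if start != -1 else -1
    if pos == -1:
        # Already patched, or the file differs from the diff in this thread.
        return False
    # Keep a copy of the original so the change is easy to undo.
    shutil.copy2(path, path.with_name(path.name + ".orig"))
    path.write_bytes(data[:pos] + NEW + data[pos + len(OLD):])
    return True


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python flip_default.py "[install dir]/lib/mozilla/python/komodo/findlib2.py"
    print("patched" if flip_default(sys.argv[1]) else "nothing changed")
```

As with the patch, Komodo needs a restart afterwards for the change to take effect.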

Author

mjross commented Mar 26, 2021

> Sorry for the delay, missed this. You need to restart.

No worries. I of course tried restarting anyway, just to be sure, but it didn't make any difference.

> I'm so sorry, I missed that you were not doing find-replace. You're just trying to do find (head to desk).

No worries. I should have detected the misunderstanding, given that the code changes listed on that forum page appeared to be focusing on replacing and not just searching.

> That should apply the change to that file.

It worked! Thank you so much for taking the time to understand all of this (I'm afraid it's beyond me, and probably always will be, given how perplexing I found the Komodo IDE source code when I looked through it a couple years ago), and for creating this patch file. The only difficulty for me – a (somewhat reluctant) Windows user – was finding a "patch" utility to run on my laptop. Windows Bash used to contain "patch", according to my notes, but it no longer does. No problem. I simply used UnxUtils. I mention this here in case any other Windows users read these notes in the future and, like me, have found problems with the "git" utilities.

As you probably noticed when you tested it, the output of the search command on binary files can be a bit ugly (example screenshot below), but that's to be expected. Also, I certainly won't complain, given how useful this could be in the future when searching through thousands of pages of text contained in dozens of Microsoft Word documents.

[Screenshot: Clipboard01]
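
For anyone who wants the same kind of search outside Komodo, a rough standalone sketch follows. It assumes the text in a binary ".doc" file is stored either as 8-bit characters or as UTF-16LE, and that the regex is plain ASCII; `search_bytes` and `search_folder` are hypothetical helper names, not part of Komodo. Broad patterns such as `\w+` can also match decoding noise, so specific patterns work best.

```python
import re
from pathlib import Path


def search_bytes(data, pattern):
    """Return matches of an ASCII regex in raw bytes, trying two layouts."""
    hits = []
    # Layout 1: 8-bit text, so match the pattern directly against the bytes.
    for m in re.finditer(pattern.encode("ascii"), data):
        hits.append(m.group().decode("latin-1"))
    # Layout 2: UTF-16LE text, which may start at an even or odd offset.
    for offset in (0, 1):
        text = data[offset:].decode("utf-16-le", errors="replace")
        hits.extend(m.group() for m in re.finditer(pattern, text))
    return hits


def search_folder(folder, pattern, glob="*.doc"):
    """Map each matching file under folder to the list of matched strings."""
    results = {}
    for path in sorted(Path(folder).rglob(glob)):
        hits = search_bytes(path.read_bytes(), pattern)
        if hits:
            results[str(path)] = hits
    return results
```

For the example in this issue, `search_folder(r"D:\tmp", "teststring")` would list the ".doc" files under that directory that contain the test string in either layout.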

Member

th3coop commented Mar 29, 2021

It's awesome that this will be so helpful to you.

Author

mjross commented Mar 29, 2021

Thank you, @th3coop!

Member

th3coop commented Mar 29, 2021

You are super duper welcome @mjross!
