Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support archive inspection #797

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

taminob
Copy link
Contributor

@taminob taminob commented Jan 24, 2024

Description

This PR introduces a new flag (currently -q and --quasi as suggested in the linked issue) to allow to list the content of archives. If a file is an archive is solely determined based on the file extension for now.

For now, I only added support for .tar archives, but the changes aim to lay the foundation for the future support of additional archive formats.

Resolves #600

How Has This Been Tested?

TODO

@daviessm
Copy link
Contributor

Ref #756 - apparently reading any files is a no-no

@taminob
Copy link
Contributor Author

taminob commented Jan 24, 2024

Ref #756 - apparently reading any files is a no-no

Thanks for bringing that up - that would, of course, make this feature impossible to implement.
Only compromise I could offer would be that on any error (invalid archive, IO error, etc.), it will fail silently and the file will simply be listed like any regular file. The atime could be manually reset to the previous atime, but not sure if I'm a fan of that solution.

Since I'd propose to hide this feature behind a special flag anyway, we could also say that the atime will not be modified except if a feature flag is given that has to read a file.

@cafkafk @MartinFillon ?

@taminob taminob force-pushed the feat/600/support-archive-inspection branch from 2a461ab to 68fe930 Compare January 24, 2024 13:03
@MartinFillon
Copy link
Contributor

I personnally doesn't care that much about that I was just saying back what was said on previous issues. But maybe feature flagging options that read files content (except for git) is a good idea. This is also debatable, do we prefer having it under a feature flag but still in the codebase or having it under a separate crate (still feature flagged). As for the silent fail, is it possible to replicate the broken link case to atleast show the error message on it ?

@taminob
Copy link
Contributor Author

taminob commented Jan 26, 2024

I personnally doesn't care that much about that I was just saying back what was said on previous issues. But maybe feature flagging options that read files content (except for git) is a good idea. This is also debatable, do we prefer having it under a feature flag but still in the codebase or having it under a separate crate (still feature flagged). As for the silent fail, is it possible to replicate the broken link case to atleast show the error message on it ?

Thanks for your thoughts on the topic - since I'm still early in development and don't really have a final design in mind yet, I can't really say much how easy that broken link/error message thing would be, but it sounds like a good idea.

So then I'll go on and we can see later if we want to move the archive stuff into a separate crate.

@johnsonjh
Copy link

johnsonjh commented Jan 29, 2024

I'm following along...

I just wanted to mention that I think using -q and --quasi are acceptable for the flags, not just because that's what li used, but also because it doesn't clash too terribly with the syntax of lsd (no -q) nor ls (where on GNU and most BSD systems, -q is short for --hide-control-chars).

I also think it would be a useful future addition to attempt identification of archives by contents (i.e. file magic) if a regular file is specified with a trailing slash.

@taminob taminob force-pushed the feat/600/support-archive-inspection branch 2 times, most recently from 7bb11c7 to 40c78d2 Compare February 6, 2024 20:27
@taminob taminob marked this pull request as ready for review February 6, 2024 20:29
@taminob
Copy link
Contributor Author

taminob commented Feb 6, 2024

I finished a first proposal, if you want to have a look at it - it currently only consists of .tar archive inspection and only uses the file extension to determine its file type. To at least somewhat reduce the complexity of this PR, I'd propose to include detection based on file content etc. in follow-up features - I already kept that in mind and the code should be easily extendable regarding that.

To avoid code duplication, I introduced a common trait for filesystem files (File) and the files in an archive (ArchiveEntry) which I called Filelike for now (better name suggestions are welcome).
Unfortunately, this requires to touch a lot of different places in the code base (especially since File had some public members which were used in different places which had to be wrapped into trait functions), but without a lot of code copying and specializing for both File and ArchiveEntry I didn't see any other way to implement this.

I left a couple of TODOs in the code with small remarks, feel free to leave a comment if you have an opinion. Also, tests are still missing and Windows fails to compile - that's still in progress
I also already looked at how to support ZIP archives, but would also leave that to a separate PR.

Looking at the current implementation in src/fs/archive.rs I think that splitting it off into a separate crate could actually make sense. Since it will probably be moved anyway, I kept it for now as-is in a single, way too overloaded file. How do you want to proceed for that?

Copy link
Member

@cafkafk cafkafk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't had time to review the code yet, but this did work with a .tar archive, very cool!

It seems to not work on windows, could you try to gate it with a #[cfg(any(target_os = "macos", target_os = "linux"))] to make it macos/linux only? That should resolve the CI issues.

@MartinFillon
Copy link
Contributor

Gonna take a look at the code, but this seems promising, I think bringing this is a great advancement.

@daviessm
Copy link
Contributor

daviessm commented Feb 7, 2024

The changes make the Windows build very broken because the refactoring requires modes::from_mode to parse the permissions of a FileLike but libc::S_IRUSR and co. don't exist on Windows. I can foresee trying to display Unix-like permissions of a tar in the Windows long output being problematic too.

Out of scope but on Windows, ZIP inspection would be much more useful than tar.

How useful is uncompressed tar inspection? I thought most tars that people would come across would be compressed in some way.

@johnsonjh
Copy link

It's a good start - compression can be figured out later, I'm sure.

I can foresee inspection of tar, cpio (and RPM as a special case of cpio), ar (.a archive), and zip as being the most important. Eventually it would be nice to see 7z, pax, ISO, CAB, rar, and lha support, and support for decompressing various streaming "outer" compression methods (lzip, bzip2, gzip, compress, etc.)

(I'd normally use libarchive for something like this, if I was working in C.)

Copy link
Contributor

@MartinFillon MartinFillon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First review on my side, I have some thoughts, that are debatable, on the options part and how its managed.

@@ -82,6 +82,7 @@ pub static OCTAL: Arg = Arg { short: Some(b'o'), long: "octal-permis
pub static SECURITY_CONTEXT: Arg = Arg { short: Some(b'Z'), long: "context", takes_value: TakesValue::Forbidden };
pub static STDIN: Arg = Arg { short: None, long: "stdin", takes_value: TakesValue::Forbidden };
pub static FILE_FLAGS: Arg = Arg { short: Some(b'O'), long: "flags", takes_value: TakesValue::Forbidden };
pub static INSPECT_ARCHIVES: Arg = Arg { short: Some(b'q'), long: "quasi", takes_value: TakesValue::Forbidden };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MAy I ask the reason of --quasi ? I think its okay as a name but I don't get the link with archives

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's derived from AIX's li program, to treat archives as "quasi-directories".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer this to be --inspect-archives

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a preference tbh so its up to you all

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't have a strong opinion, although I think having a short flag for it would be great - but in the end, this can also be solved via a shell alias.
Maybe 🎉 for --quasi and 🚀 for --inspect-archives (or other suggestions)?

Copy link

@johnsonjh johnsonjh Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/*                                                                          
 * li - list files, directories, archives                                        
 *                                                                          
   Rewritten by D. Landauer to combine the functions of seven different     
   `ls' programs we had here (May '79) ... also made it use stdio.  Flags:  
        -I<abcfghilmnoprsu>    include a field                              
        -E<abcfghilmnoprsu>    exclude a field                  **rrr**     
            accessed-time      headers       modified-time  protections     
            character count    i-number      name           size (in blocks)
            group id           link count    owner id       updated-time    
        -O<abcdfpsx> list only these types of files                         
            archives           directories           symbolic links         
            block devices      files (normal)        executable files       
            character devices  pipes (fifos)                                
                                                                            
        -S[r]<acmnsux> sort order [r= reversed]                             
            accessed-time (latest first)            name (default)          
            modified-time (latest first)            size (biggest first)    
            updated-time  (latest first)            x = no sort at all      
        -R[<n>anqp] recursive, to level n                                   
            a absolute pathnames                                            
            n don't list qdirs' contents (default)                          
            p pathnames (relative)                                          
            q do list qdirs' contents                                       
        -<n>    at most n columns                                           
        -a      all files                                                   
                     VMS:   all versions of files                           
                     other: even ones beginning w/ `.'                      
        -d      directories (not their contents)                            
        -f      force interpretation as a directory                         
        -l      local long form  (-Icglmnop)                                
        -k      remote long form (-Ibcfmnpr)            * removed *         
        -n      real file names:  do not make them printable (==> -1)       
        -s      sortable verbose (Berklix style)                            
        -v      visual/verbose                                              
        -x      extended (-Iabcfgilmoprsu)                                  
        -L      follow symbolic links (default is to not follow)            
                                                                            
   Routines are in alpha order except main & init (the first two) and       
   getpwn, getgrpn and search (the last three).   Throughout, I refer to    
   archives as "quasi-directories", often abbreviated as qdirs.             
   (Quasi-directories are a superset of directories.)                       
                                                                            
   Because of the removal of the Distribution Services, the following       
   options will be affected:                                                
        1) -k option is removed.                                            
        2) -Ibfr options are ignored, but it will display '-'s instead.     
        3) -Ebfr options are ignored.                                       
        4) -x option is affected. It will display '-' for -bfr fields.      
 */                                                                         

The IBM li program that treats archives as "quasi-directories" goes back to the late 70's at least, so maybe it is set in muscle memory of people. Although I do like -q/--quasi myself, it's not something I'm too worried about. :)

src/options/mod.rs Show resolved Hide resolved
@taminob
Copy link
Contributor Author

taminob commented Feb 7, 2024

It seems to not work on windows, could you try to gate it with a #[cfg(any(target_os = "macos", target_os = "linux"))] to make it macos/linux only? That should resolve the CI issues.

Thanks, didn't look at Windows yet at all. Will fix the CI and the other review comments this weekend.

I can foresee trying to display Unix-like permissions of a tar in the Windows long output being problematic too.

I will take a look at the Windows long output and how the permissions are displayed there, but don't really have any kind of Windows development setup to test it in a live environment. Hints on how to solve potential incompatibilities would be appreciated.

How useful is uncompressed tar inspection? I thought most tars that people would come across would be compressed in some way.

It's mainly as a proof-of-concept since the format is pretty easy to deal with.
During testing, I also noticed that neovim can actually also open tar archives which I didn't know before.

(I'd normally use libarchive for something like this, if I was working in C.)

Fair point, I actually found https://github.com/fnichol/libarchive-rust as a Rust wrapper, but last commit was 6 years ago and not sure if we want to add that as an dependency.

@taminob taminob force-pushed the feat/600/support-archive-inspection branch 3 times, most recently from b136f4f to 7ea3793 Compare February 10, 2024 10:14
@daviessm
Copy link
Contributor

I've tested it on Windows and it works but only seems to read a .tar file if it's specified explicitly, is that intentional? That is, if I specify --quasi --tree or --quasi -R for example, the .tar is shown but the contents are not listed.

@taminob taminob force-pushed the feat/600/support-archive-inspection branch 4 times, most recently from e0d960b to 9e18ec7 Compare February 10, 2024 12:17
@taminob
Copy link
Contributor Author

taminob commented Feb 10, 2024

I've tested it on Windows and it works but only seems to read a .tar file if it's specified explicitly, is that intentional? That is, if I specify --quasi --tree or --quasi -R for example, the .tar is shown but the contents are not listed.

Actually, I forgot that and only had explicitly specified tar archives in mind. Do you think that's an actual use case?
I can have a look at the code and how to integrate that, but could be that it's not entirely straight-forward (except maybe iterating another time recursively over all files, that would be pretty easy but potentially expensive from a performance perspective).

Edit: I just pushed a draft that archives are also inspected in combination with -R (but does not work with --tree yet).

@taminob taminob force-pushed the feat/600/support-archive-inspection branch from b48d5ab to 25f05ae Compare February 10, 2024 13:14
@MartinFillon
Copy link
Contributor

(but does not work with --tree yet)

That is strange cause as of what I know tree is the same as -R for most part

@taminob
Copy link
Contributor Author

taminob commented Feb 11, 2024

(but does not work with --tree yet)

That is strange cause as of what I know tree is the same as -R for most part

You can take a look at 60bdedd, I added a TODO to the place where --tree is handled. In main.rs there is explicitly an if !tree.
To allow --tree with archive inspection probably requires to_archive to be added to the Filelike trait which I tried to avoid to prevent bloating the trait. But shouldn't be too bad, I can do so later.

Or are there any other opinions if the archive inspection should also work for not explicitly specified archives?

@MartinFillon
Copy link
Contributor

Okay nice didnt catch that

@daviessm
Copy link
Contributor

We were chatting about feature gates in the Matrix channel earlier and I think that if we're going to introduce compressed archive inspection in the future it would probably make sense to put that behind a feature gate because it'll pull in compression libraries. Which, by extension, means that this feature should probably be feature gated too so we don't have to either break it when we gate it later, or explain that inspection of some archives is enabled by a feature.

I also think the feature should be enabled by default, though.

@taminob taminob force-pushed the feat/600/support-archive-inspection branch 4 times, most recently from c12447e to 03ed232 Compare February 17, 2024 17:43
@taminob
Copy link
Contributor Author

taminob commented Feb 17, 2024

@daviessm I agree regarding the feature flag - do you want to take a look at 1dffecb if you think that's what you meant? At first, I tried to actually remove every part of the code containing archive, but that lead to quite a mess with way too many conditional compilation attributes - thus, I decided to only remove the tar dependency and basically still handle archives, but without any supported format. Since the flag for the git feature also exists without the feature enabled, I did the same here and the flag is just ignored if passed.

Otherwise, I hopefully incorporated the other feedback as well - but feel free to give additional feedback.

The only thing I'm still struggling with is the nix environment that somehow fails to compile eza now due to linking issues in libgit2.so (/usr/lib/libgit2.so: undefined reference to pcre2_get_error_message_8', /usr/lib/libgit2.so: undefined reference to libssh2_session_disconnect_ex', /usr/lib/libgit2.so: undefined reference to SSL_CTX_set_verify@OPENSSL_3.0.0', ...). I don't quite understand which change causes this regression - previously I only had a linking issue with zlib which I was able to resolve (at least locally) by adding it to the dependencies in flake.nix, but wasn't able to fix the current issues - does anyone else have an idea? I guess my addition with zlib is also just a workaround masking the actual issue?

@daviessm
Copy link
Contributor

@daviessm I agree regarding the feature flag - do you want to take a look at 1dffecb if you think that's what you meant?

Yeah that makes sense, the only other thing is I think we should remove the help text too but I don't think we do that for git either and it'll be another thing to do after clap gets merged.

@taminob
Copy link
Contributor Author

taminob commented Feb 17, 2024

Yeah that makes sense, the only other thing is I think we should remove the help text too but I don't think we do that for git either and it'll be another thing to do after clap gets merged.

Fair point, looks like I forgot to add the flags to the help text altogether. I added it to the display options in 50d6221.

@taminob taminob force-pushed the feat/600/support-archive-inspection branch from 2ea37dd to a79f904 Compare March 28, 2024 13:17
@taminob taminob force-pushed the feat/600/support-archive-inspection branch from a79f904 to 9eff928 Compare March 28, 2024 13:28
@taminob taminob force-pushed the feat/600/support-archive-inspection branch from 9eff928 to 10be86d Compare March 28, 2024 13:31
@taminob
Copy link
Contributor Author

taminob commented Mar 28, 2024

@cafkafk @MartinFillon ping, not entirely sure who else to ping. I rebased the PR so that we can maybe get it ready for merge - but feel free to nitpick as much as you want.

Some help with the CI would be appreciated, but I don't think any change in this PR broke the BSD workflows.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 🆕 New
Development

Successfully merging this pull request may close these issues.

feat: Add support for inspecting tar{,?z,bz2} and zip files
5 participants