Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Windows] Fix a bunch of Unicode related failures #2095

Open
wants to merge 20 commits into
base: master
Choose a base branch
from

Conversation

dunhor
Copy link
Contributor

@dunhor dunhor commented Mar 19, 2024

This mostly fixes issues with archive creation, typically where improper accesses assume that MBS strings are valid/correct. I've added tests for every bug I've found in every format that we care about at the moment. There may be additional bugs lingering elsewhere, however I don't currently have the means for testing such scenarios out easily at the moment.

Note that this also includes the changes for #2091 because I wanted to be extra sure I wasn't breaking anything.

This also fixes a few inefficiencies where it's assumed that UTF8 <-> MBS conversions are direct/most efficient, which is not the case on Windows. On Windows, WCS strings are needed as intermediate representations, so it's wasteful to throw them away, especially if we'll need them later (or for the immediate operation).

I also updated the CMake files & some sources to allow compiling as Debug on Windows as well as to compile with Clang on Windows in "MSVC mode" (i.e. clang-cl)

archive_entry_copy_symlink(entry,
(const char *)symname);

/* Symbolic links are embedded as UTF-8 strings */
Copy link
Contributor Author

@dunhor dunhor Mar 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding good documentation on this is tricky. AFAICT the "official" documentation is from the LZMA SDK, however I didn't see any references to symbolic links there. I found different documentation here which states that link paths are UTF-8 encoded, however IDK how trustworthy that source is. That said, the 7-Zip application on Windows and WinRAR all behave correctly when this is a UTF-8 encoded string.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably UTF-8 encoded for archives created on Windows, but I'd be curious about POSIX platforms that don't use Unicode filename encodings. (In POSIX, a filename is a sequences of non-zero bytes, with no guarantee that the sequence can be interpreted as characters at all. That can make things ... interesting ... for any code that assumes Unicode semantics.) I'm not sure it's possible to get this completely correct given libarchive's current architecture, though. I'm sure this is an improvement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair. I can give these the ifdef treatment if you think that would be best.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm honestly not sure what would be best here. The state of the art right now is pretty awkward: Major platforms (Windows, macOS) have standardized on Unicode filenames, Linux is moving that way, but there's still a sizable base of POSIX systems for which filenames can not be reliably handled as text. So different archive formats are going to do different things depending on what platforms they are mostly targeting. And frankly, libarchive's current system of proactive character set conversions is far from ideal. I have some vague ideas how it might be improved, but nothing concrete yet.

So I would go ahead with what you have here, but be prepared to reconsider some aspects as we get feedback from POSIX folks.

@mmatuska mmatuska requested a review from kientzle April 23, 2024 06:41
@mmatuska
Copy link
Member

mmatuska commented Apr 23, 2024

@kinentzle maybe we could finally go towards enabling MSVC tests with this patch

@mmatuska
Copy link
Member

mmatuska commented Apr 23, 2024

@dunhor could you please rebase? "test_read_format_rar5_unicode" conflicts with a test with the same name introduced in PR #1978 (aafb078)

@dunhor
Copy link
Contributor Author

dunhor commented Apr 27, 2024

@dunhor could you please rebase? "test_read_format_rar5_unicode" conflicts with a test with the same name introduced in PR #1978 (aafb078)

Woops, didn't realize I used the same name. Resolved and deleted the old test since the new one is more comprehensive and they otherwise test the same thing.

@stoeckmann
Copy link
Contributor

stoeckmann commented May 8, 2024

I'm looking forward to have the compiler fixes (for Visual Studio Express 2022 on Windows 11 amd64 to be very specific) merged in, but there's a lot going on in this PR...

@dunhor, would you mind if I (or we) split this PR into multiple ones with specific topics covered, i.e. at least one which is solely there to make libarchive compile without errors with Visual Studio 2022?

@dunhor
Copy link
Contributor Author

dunhor commented May 10, 2024

I'm looking forward to have the compiler fixes (for Visual Studio Express 2022 on Windows 11 amd64 to be very specific) merged in, but there's a lot going on in this PR...

@dunhor, would you mind if I (or we) split this PR into multiple ones with specific topics covered, i.e. at least one which is solely there to make libarchive compile without errors with Visual Studio 2022?

I wouldn't mind at all. I was actually contemplating breaking this PR into several separate changes - that being one of them - but I'm currently on paternity leave at the moment and haven't really had the time yet.

kientzle pushed a commit that referenced this pull request May 12, 2024
Fixes all test-related compiler warnings with Visual Studio 2022 on
Windows 11.

Contains some changes from
#2095.

CC: @dunhor

---------

Co-authored-by: Duncan Horn <dunhor@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants