Windows only: Consider using __argc and __argv #15

Open
adishavit opened this issue Oct 27, 2017 · 6 comments

@adishavit
Owner

adishavit commented Oct 27, 2017

On Windows, the Microsoft C runtime provides the globals __argc and __argv as an extension, so the args could be picked up automatically without passing them in, as in:

int main()
{
   argh::parser cmdl;
   // ...
}

Things to note:

  • __argv et al. perform wildcard expansion, which may or may not be desirable.
  • Need a way to differentiate between auto parsing and default ctor.
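
One way the second point could be handled is with a tag type, so that the default ctor stays empty and auto parsing is always an explicit request. The sketch below is only an illustration: `sketch::parser`, `from_process_t` and `from_process` are hypothetical names, not part of argh's API, and the `__argc`/`__argv` branch assumes the Microsoft C runtime (on other toolchains the tagged ctor simply leaves the parser empty).

```cpp
#include <string>
#include <vector>
#if defined(_MSC_VER)
#include <stdlib.h>  // Microsoft CRT: declares __argc and __argv
#endif

namespace sketch {

// Hypothetical tag type (not part of argh): selects "read the process arguments".
struct from_process_t {};
constexpr from_process_t from_process{};

class parser {
public:
    // Default ctor: an empty parser, exactly as before.
    parser() = default;

    // Tagged ctor: collect the arguments without the caller passing them in.
    explicit parser(from_process_t) {
#if defined(_MSC_VER)
        // __argv can be null when the program was started through wmain.
        if (__argv != nullptr) {
            for (int i = 0; i < __argc; ++i) {
                args_.emplace_back(__argv[i]);
            }
        }
#endif
    }

    const std::vector<std::string>& args() const { return args_; }

private:
    std::vector<std::string> args_;
};

}  // namespace sketch
```

With this shape, `sketch::parser cmdl;` keeps its current meaning, while `sketch::parser cmdl{sketch::from_process};` opts in to auto parsing.
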
@BeErikk

BeErikk commented May 7, 2020

Windows programs have their arguments stored in the Process Environment Block (PEB) structure, more specifically in ProcessParameters->CommandLine, which is a UNICODE_STRING. The GetCommandLineW() function gives direct access to the UNICODE_STRING buffer. This is true for both Windows GUI and console programs. The buffer contains the command line arguments as a single wide string, exactly as entered, for example like this:

"path_to_program" -arg1 /f -abc file1 file2

The C library startup code parses the string, then allocates and assigns argc and argv before calling main.

So, wouldn't it be easier to pass the unaltered command line string as is to argh? Splitting the string into arguments using space as delimiter would be straightforward, I think.
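
A plain split on spaces would break quoted arguments such as a program path containing spaces, so a splitter needs at least to track double quotes. The following is a minimal sketch under simplifying assumptions: `split_command_line` is a hypothetical helper, it works on a narrow string for brevity, and it ignores the backslash-escaping rules that the C runtime and CommandLineToArgvW() apply.

```cpp
#include <string>
#include <vector>

// Splits on unquoted spaces and tabs; text inside double quotes stays together.
// Simplification: backslash escapes are NOT interpreted.
std::vector<std::string> split_command_line(const std::string& line) {
    std::vector<std::string> args;
    std::string current;
    bool in_quotes = false;
    bool has_token = false;  // lets "" count as an (empty) argument

    for (char c : line) {
        if (c == '"') {
            in_quotes = !in_quotes;  // the quote itself is dropped
            has_token = true;
        } else if ((c == ' ' || c == '\t') && !in_quotes) {
            if (has_token) {
                args.push_back(current);
                current.clear();
                has_token = false;
            }
        } else {
            current += c;
            has_token = true;
        }
    }
    if (has_token) {
        args.push_back(current);
    }
    return args;
}
```

Applied to the example line above, this yields six arguments, the first being `path_to_program` with the quotes removed.
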

GetCommandLineA() returns a copy of the string converted to the system ANSI code page. So even if you develop a UTF8 console program, the most efficient approach would be to parse the arguments as UTF16, retrieved via GetCommandLineW(), and ignore main's parameters.

@adishavit
Owner Author

Thanks for your enthusiasm for Argh!
A few comments about this.

  1. Argh! tries to be standard C++ conforming and cross platform. It may be possible to add extra Windows functionality, but it must not alter the standard API.
  2. Pulling the command line "out of the air" via the default ctor could be an issue if the user then uses other pre-parsing methods like add_param(). I guess the ctor would have to be written such that a later call to parse() would just overwrite anything done before.
  3. Unicode support is problematic, confusing and inconsistent in C++ and on Windows specifically. There are standardization efforts for both better Unicode support and more modern ways to get the command line arguments. Maybe we will have a C++23 branch for argh that will support these when they portably arrive.

I don't have a lot of experience with parsing Unicode in general, nor with the Unicode variants on Windows in particular, so it is hard for me to comment. Would you like to make a pull request?

@BeErikk

BeErikk commented May 7, 2020

Thanks for answering. Just FYI, getting direct access to the command line buffer in the PEB via GetCommandLineW() is the usual way, at least when it comes to Windows GUI programs. It is possible to manipulate the string and even overwrite it without any consequences for the program. After all, it is just a string. My intention was more to float an idea, without any concrete method for implementing it in argh. None of the different option parser solutions available has considered this possibility, I guess because the subject is very much UNIX-centred.

@adishavit
Owner Author

I’m not a Unicode expert but I know the way Windows handles Unicode is messed up. I took the liberty to consult some of the more knowledgeable Unicode/C++ experts on Twitter (where else 😆).
The lively discussion is here.

@adishavit
Owner Author

Seems like the only sane and portable thing to do is UTF8 only.
As @cor3ntin says:

I would immediately convert each argument to utf8 with WideCharToMultiByte, keep the rest of the code as it & assume utf8 ...
DO NOT let wchar_t invade your project.
Smallest possible conversion layer.

Essentially converting anything to UTF8 as shown here:

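#include <windows.h>
#include <algorithm>
#include <cwchar>    // std::wcslen
#include <iterator>  // std::back_inserter
#include <string>
#include <vector>

// Convert a null-terminated UTF-16 string to UTF-8.
std::string convert(const wchar_t* wstr) {
    const int len = static_cast<int>(std::wcslen(wstr));
    if (len == 0) return {};  // WideCharToMultiByte rejects zero-length input
    // First call: query the required buffer size in bytes.
    const int size = WideCharToMultiByte(CP_UTF8, 0, wstr, len, NULL, 0, NULL, NULL);
    std::string str(size, '\0');
    // Second call: write the UTF-8 bytes into the buffer.
    WideCharToMultiByte(CP_UTF8, 0, wstr, len, &str[0], size, NULL, NULL);
    return str;
}

int wmain(int argc, wchar_t** argv) {
    std::vector<std::string> vec;
    vec.reserve(argc);
    std::transform(argv, argv + argc, std::back_inserter(vec), convert);
    // parse(vec)
    return 0;
}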

If you need the data in some other encoding (e.g. for passing to Win32 components), do it on the other end.
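
For the "other end", a conversion back to UTF-16 could look roughly like the sketch below. On Windows one would normally call MultiByteToWideChar() with CP_UTF8; the hand-written decoder here is a portable stand-in for illustration only. `utf8_to_utf16` is a hypothetical helper, and it assumes well-formed UTF-8 input (malformed sequences are not diagnosed).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Decode UTF-8 into UTF-16 code units. Assumes the input is well-formed.
std::u16string utf8_to_utf16(const std::string& utf8) {
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        // The lead byte determines the sequence length and its payload bits.
        if (lead < 0x80)      { cp = lead;        len = 1; }
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }
        else                  { cp = lead & 0x07; len = 4; }
        // Each continuation byte contributes six more bits.
        for (std::size_t k = 1; k < len && i + k < utf8.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        }
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Keeping this conversion at the call site of a 'W' API keeps the rest of the code UTF8-only, which is the "smallest possible conversion layer" idea from the quote above.
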

@BeErikk

BeErikk commented May 10, 2020

Thank you for your effort. However, your answer doesn't address the subject. My point was about using the PEB command line buffer for option parsing. As I said, this could be either the wide UTF16 string directly or a copied and converted narrow ANSI string, with a strong preference for parsing the wide buffer directly. I also made a suggestion on how this could be achieved (see below).

I'm sure your Twitter friends are all skilled people with valid points, but how do those points apply to this subject? I'm reluctant to engage in UTF8 vs UTF16 vs UTF32 flame wars, but in practice, Unicode on Windows is UTF16 and has been for about 30 years. It's not a matter of preference or flavour; it's just how things are. The whole underlying system is coded in UTF16. Also in practice, coding with UTF8 is more or less unsupported: you deal with it when handling data, for example when manipulating a web page, but the API expects UTF16 when called. This is all due to the UTF16 'W' API variants versus the legacy 'A' (as in 'ANSI') variants. Mixing narrow UTF8 and ANSI is likely to be troublesome. As I understand it, this is about to change for Windows UWP apps, where UTF8 is encouraged to facilitate web-centric code.

Anyway, none of this should be a bother for you in this library, especially if you consider my suggestion in this thread:

https://github.com/adishavit/argh/issues/8
