Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsers that consume multiple arguments #284

Open
chris-martin opened this issue Dec 28, 2017 · 12 comments
Open

Parsers that consume multiple arguments #284

chris-martin opened this issue Dec 28, 2017 · 12 comments

Comments

@chris-martin
Copy link
Contributor

Is there a way to write a parser for an option that takes multiple command-line arguments?

For example, if I wanted the command-line args

-p one two -p three four

to be parsed into a result like

[("one", "two"), ("three", "four")]
@HuwCampbell
Copy link
Collaborator

Unfortunately not; I have some ideas for this could be done, but backwards compatibility is always an issue and I haven't sat down and had a real crack at it yet.

@HuwCampbell
Copy link
Collaborator

It's also not really recommended by the posix command line specifications.

You can always do something like what git does with key maps [-c name=value] for instance.

Maybe I can help design something else if you can share your problem?

@chris-martin
Copy link
Contributor Author

chris-martin commented Dec 29, 2017

I'm making a simple tool for string replacements, where each pair ("abc", "xyz") means "where you see string 'xyz', replace it with 'abc';" a bit like sed s/abc/xyz/g, but with multiple replacement rules happening at once, and no regex. Unfortunately the general-purpose nature of this tool makes the name=value strategy awkward, since the character = (or anything else we pick as the delimiter) could very well be part of the input, and I really don't want to force the user to escape anything beyond what their shell requires.

I suppose I could make the delimiter customizable with another option, that wouldn't be too bad. Or the first character in each replacement pair could be the delimiter; -c /name/value

@bgamari
Copy link
Contributor

bgamari commented Jan 23, 2018

I have also needed this in the past. It rather took me by surprise that option ((,) <$> str <*> str) (long "test") will parse --test testing as ("testing", "testing"). I would have expected it to instead parse --test 1 2 as ("1", "2").

@HuwCampbell
Copy link
Collaborator

I actually think that that may be a pretty good API call all things being equal.

Sorry for not getting back to you @chris-martin, I didn't have a good solution apart from just use many -e options with sed (like, why not?).

At the moment, we have

newtype ReadM a = ReadM
  { unReadM :: ReaderT String (Except ParseError) a }

which this obviously doesn't support. The last time we changed this API however (see #108 in 2014), it caused quite a lot of pain, anger, and frustration, and even lead to a few libraries jumping ship.

As optparse is quite low on the pile of haskell dependencies, I'm pretty cautious about large changes to its types like this. Although I think it's a bit daft, I have seen people using the ReaderT constructor in their ReadM definitions; more pernicious, is that people might just use str multiple times, which would still compile but be wrong if I were to make this API change.

So I'm not against this, by all means, but I'm not going to rush it through.

Before going further, I would need to figure out the ramifications.

Cheers,
Huw

@chris-martin
Copy link
Contributor Author

like, why not?

Because my program does this:

$ echo "banana" | text-replace --delimiter " " --mapping "a b b a"
abnbnb

not this:

$ echo "banana" | sed -e 's/a/b/g' -e 's/b/a/g'
aanana

@HuwCampbell
Copy link
Collaborator

HuwCampbell commented Jan 23, 2018

Fair enough, sorry for oversimplifying.

Going back to the proposed API. It may be problematic in that ReadM is a monad, so the number of arguments which could be parsed would be able to be changed depending on the input. This is powerful, but I'm not sure how good a trade off it would be.

Essentially, ReadM would need to change to a more general list parser, something like

newtype ReadM a = ReadM { runReadM :: [String] -> Either ParseError (a, [String]) }

instance Monad ReadM where
   return a = ReadM $ \inputs -> Right (a, inputs)
   ReadM r >>= f = ReadM $ \inputs -> do
     (a, rest) <- r inputs
     runReadM (f a) rest

readerAsk :: ReadM String
readerAsk = ReadM $ \case
  x:xs -> Right (x, xs)
  []   -> Left NotEnoughInput

@HuwCampbell
Copy link
Collaborator

Hi @chris-martin , @bgamari

Please see the many-readers branch (and for Chris, try it out with your program).

This has a few wider reaching implications on bash completion and the user interface, but seems to work pretty well for a first cut.

Not sure where this will go down the line.

Huw

@HuwCampbell
Copy link
Collaborator

Any luck @chris-martin ?

@chris-martin
Copy link
Contributor Author

Sorry, I don't know when I'll have time to come back to this.

@tiniuclx
Copy link

Hello,

What is the status of this issue? From what I gather, such a change is risky due to backwards compatibility, and because it affects other parts of the codebase. However, such a feature would be really nice to have!

The tool I am working on currently lets the user to search through their notes by keywords. However, multiple keywords must be surrounded by quotes: noot show --body-contains "keyword1 keyword2". This doesn't feel particularly good compared to noot show --body-contains keyword1 keyword2 .

If the new feature would behave in a manner similar to many (strArgument ..., (which I feel might happen without any extra work) the mixing of keywords and phrases would be effortless: noot show --body-contains keyword1 "multi word phrase" keyword2. This is also rather intuitive for searching since it matches the syntax of a Google query. I'm not quite sure how to neatly implement this feature otherwise.

I did notice that there is a branch for this, but I am not entirely sure how to use it, the top commit seems to change a lot of internal code I am not familiar with... I might just clone it and see if I can figure it out in a few days.

Kind Regards,
tiniuclx

@HuwCampbell
Copy link
Collaborator

HuwCampbell commented Nov 1, 2018

That's a good part of it.

Using the branch it is relatively straight forwards, you need to clone the branch (probably as a git submodule in your project) and add it to your cabal sandbox, you should then be able to just use as you would normally.

There are a few issues with this change though, which aren't small. The main one is that it's not the safest combinator to use. Take your example for instance:

noot show --body-contains keyword1 keyword2

If this was done with the naïve solution many str as the reader it would work in this case, but as soon one of your users tried this

noot show --body-contains keyword1 keyword2 --other-flag

the behaviour would be extremely unintuitive, as they would get "other-flag" as an argument in the list to body-contains, and the flag they wanted would still be off. Now I could try to cover this off by make str fail to parse arguments starting with "-", but that would:

So by making the API more flexible, I'd be making it really easy to write really bad user interfaces (things like ambiguous parsers with optional flag arguments as a different example).

So I'm definitely leaning towards not merging this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants