allow use of host:port in a FROM instruction #75

petitlapin · 2021-10-29T15:53:41Z

when having a FROM line like: FROM myregistry:port/imagename:tag, there is an error unexpected ':' expecting '@', a new line followed by the next instruction, at least one space, or the image tag.
This patch proposes to simplify how the registry is computed.

fixes: hadolint/hadolint#355

Note: I don't know haskell, I tested the change by cloning the repo, and testing on a local file using the test in integration-tests

petitlapin · 2021-11-21T15:07:39Z

@lorenzo @m-ildefons , hi can you please take a look when you have some time?

m-ildefons · 2021-11-23T13:08:34Z

Hi @petitlapin ,
I have no permissions on this repo, so I can't merge or even approve the workflows, but I checked out your code locally. First of all, thanks for taking the time to dig into this topic. The change has good intentions, but isn't quite there yet unfortunately. There are two problems that need addressing:

Existing unit tests must pass. To run the unit tests, execute the command stack test. It currently has two failures because it no longer distinguishes between images with / in their tags and images that are tagged with a registry:

[...]
Failures:

  test/Language/Docker/ParserSpec.hs:38:7: 
  1) Language.Docker.Parser, parse FROM, parse image with spaces at the end
       ASTs are not equal
       expected: [From (BaseImage {image = Image {registryName = Nothing, imageName = "dockerfile/mariadb"}, tag = Nothing, digest = Nothing, alias = Nothing, platform = Nothing})]
        but got: [From (BaseImage {image = Image {registryName = Just (Registry {unRegistry = "dockerfile"}), imageName = "mariadb"}, tag = Nothing, digest = Nothing, alias = Nothing, platform = Nothing})]

  To rerun use: --match "/Language.Docker.Parser/parse FROM/parse image with spaces at the end/"

  test/Language/Docker/ParserSpec.hs:67:7: 
  2) Language.Docker.Parser, parse FROM with registry, Not a registry if no TLD
       ASTs are not equal
       expected: [From (BaseImage {image = Image {registryName = Nothing, imageName = "myfolder/imagename"}, tag = Just (Tag {unTag = "5.12-dev"}), digest = Nothing, alias = Nothing, platform = Nothing})]
        but got: [From (BaseImage {image = Image {registryName = Just (Registry {unRegistry = "myfolder"}), imageName = "imagename"}, tag = Just (Tag {unTag = "5.12-dev"}), digest = Nothing, alias = Nothing, platform = Nothing})]

  To rerun use: --match "/Language.Docker.Parser/parse FROM with registry/Not a registry if no TLD/"

Randomized with seed 1298134040

Finished in 0.0158 seconds
195 examples, 2 failures

You really should add unit tests for your specific case. A) this demonstrates that the parser actually parses the registry:port/image:tag synatax correctly and B) this is needed to detect breaking changes in the future. The integration tests are insufficient because they just test if the parser produces an AST, not if that AST is correct.

Let me know if you need any help with that.

petitlapin · 2021-11-24T07:39:17Z

Hi,
thank you for the review and the detailed explanation.
I managed to run the unit tests and see the failures. I'll take a closer look and I'll come back if I get stuck or when it will work.

petitlapin · 2021-12-11T18:23:42Z

Hi,

not sure on how much time you want to spend helping me but if you don't have, I guess we can close this PR and it can be implemented later by someone who has the knowledge.
On adding the unit test, it's done, not the most difficult part :).
I have issues with the syntax and what I want to achieving what I want to do. My logic is: go until a '/'. If before, there is a '.' or a ':', it is a registry, else it is an image.

parseRegistry :: (?esc :: Char) => Parser Registry
parseRegistry = do
  registry <- someUnless "a domain name" (\c -> c== '/')

will give me whatever is before '/' if there is one.
But I don't manage to create a function that takes this registry variable to check if there is a dot or colon in it and returns registry if it is the case, else Nothing.
Easier would be to be able to call someUnless directly on this registry variable but I don't think it can work because we can't pass the text argument to someUnless?

m-ildefons · 2021-12-14T12:53:41Z

Hi,
sorry it took me so long to answer, but I want to get it this right so you don't have to guess as much.

Regarding your question, try writing down the type signature of such a function first, it makes things somewhat easier:

isDomainName :: Text -> Bool

The input is just a Text (for simplicity think array of characters) and we want to know if it contains a . or a :. Therefore we utilize the find function from the Data.Text library, which will search a Text with a predicate function and return a Maybe Char, indicating success with a Just c and failure with a Nothing. We'll use the predicates (== '.') and (== ':') to search in our input and just list the possible cases:

isDomainName dom
  | isJust (find (== '.') dom) = True
  | isJust (find (== ':') dom) = True
  | otherwise = False

This is Haskell syntax expressing the equivalent of a switch/case statement int C++. It will check from top to bottom each case indicated by the |. If the left side of the = evaluates to True the function will return the right side of the =. otherwise always evaluates to True and just catches the default case.

Just as a side note, someUnless needs to succeed on at least one character, so make sure you are not expecting to be able to parse empty strings with it. Also (\c -> c == '/') can be written in short form as (== '/').

You can't call someUnless directly on the registry variable because it is not a function, but a monad. Think of it as a variable that does not hold a value like '5' or 'foo', but it holds a computation as a value. Calling it is impossible, but is value is a computation that can be evaluated with the runParser function. However that is not how parsers are supposed to be used in this context. You're supposed to combine the simple parsers from the library into more and more complex parsers until you have modeled the syntax.
I don't know how much you know about the way megaparsec works, but I'll try to explain anyways just in case.

A Parser T for a type T is a 'thing' if you will that can be evaluated with an input. It will either succeed and result in a thing of type T or it will fail, in which case an appropriate error message will be generated from the last state of the parser. Since parsers have such a coarse granularity with their failure mode, they are usually written only for small portions of the input. In our case that means we probably want multiple parsers each handling a different part of an image uri. These small parsers are then combined with choice, try, optional and other 'combinators' into a larger parser that parses the whole input (the whole uri in our case). This architecture will reflect the different parts of the input by modelling each part as a parser, and as a result we have to deal with much less combinations of different optional components of an image uri.
I hope you get what I'm on about, but let give you some pseudo-Haskell to illustrate:

-- This parser will succeed only if it can find something that looks like a domain name
-- at the beginning of the input. In case it also finds something that looks like a port
-- (e.g. `:5000`) it will also consume that and append it at the end of the return value.
parseRegistry :: Parser Registry
parseRegistry = do
  domain <- parseDomain
  port <- optional parsePort  -- note the `optional` [1]
  if isJust port
    then return $ Registry ( domain <> ":" <> port )
    else return $ Registry domain

-- This parser only succeeds if it can find something that roughly looks like a domain name  
parseDomain :: Parser Text
parseDomain = do
  subdomain <- someUnless "a subdomain" (/= '.')
  void $ char '.'  -- this ensures there is at least one dot '.' in the domain (e.g. 'docker.io')
  domain <- someUnless "a domain" isIllegalChar
  return $ subdomain <> "." <> domain

  where
    -- returns `False` as long as the given char may occur in a domain name and `True` if it is
    -- illegal in the context of a domain name (e.g. ':', '/', '&')
    isIllegalChar :: Char -> Bool
    isIllegalChar c = (not isAlphaNum c) && (c /= '.')

parsePort :: Parser Text
parsePort = do
  void $ char ':'  -- needs to start with colon, but  we will throw it away
  some digitChar  -- one or more digits, fails if there are none

I haven't tried any of these parser, but I think they should convey how the program could be structured and how combinators can be leveraged to deal with optional parts of a URI. Note however that this is still a very naive implementation and will fail horribly in edge cases like raw IPv6 addresses.

[1] The optional combinator will try to run the given parser on the input. If the parser (returning type T) succeeds, it will result in a Just T, if it fails, it will result in a Nothing.
I'm not sure if it is the right thing to use here, because I'm not sure about its behaviour in case parsePort fails after consuming input, but an alternative would be (Just <$> try parsePort) <|> return Nothing. This alternative will not consume input in case parsePort fails, which is important, because we don't want to lose part of an image name.

I hope you are not discouraged from Haskell by this. You are working on a hard problem and these parsers are not easy to grok. If you have more questions, feel free to ask, I'm happy to help you.

petitlapin · 2021-12-19T15:52:16Z

Hi,

don't worry for the delay, 3-4 days is more than acceptable :). I have a few things to do before coming back to this PR but I'll check it as soon as I can. Thank you again for the complete explanation (and yes, there is no need to do a solution in a hurry if it is not good enough or come back to it later)

fixes: hadolint/hadolint#355

petitlapin · 2022-01-15T16:21:28Z

Hi,
Sorry I haven't found time to continue this issue before. I was trying to fix this issue because we faced it at work, but finally we don't have these domains without tld anymore. But I would still like to fix the issue even if it is it no more work related, you already took a lot of time to help, I wouldn"t waste it.

I still have issues trying to understand how to make it work.

A registry can either be (I added the corresponding unit test in the PR):
domain.tld (docker.io)
domain:port (docker:4000)
domain.tld:port (docker.io:4000)

followed by a slash. However, if there are neither a tld or a port, it is not a domain but part of image name.
So I would like to write something like:

parseRegistry :: Parser Registry
parseRegistry = do
  domain <- (Just <$> try parseUntilDotOrSemiColonAndNoSlash) <|> return Nothing -- if we find a / before a : or a ., there is no registry
  port <- (Just <$> try parseUntilSlash) <|> return Nothing
  return $ Registry (domain <> separator <> port) -- where 'separator' would be the first : or . found

so parseUntilDotOrSemiColonAndNoSlash would be something like:

parseUntilDotOrSemiColonAndNoSlash :: Parser Text
parseUntilDotOrSemiColonAndNoSlash = do
  subdomain <- someUnless "a subdomain" (\c -> c == '.' || c == '/' || c == ':')
  if c == '/' -- Issue: how can we access c here or fail above if the first character is a '/'?
  then return Nothing
  else return $ subdomain

and parseUntilSlash:

parseUntilSlash :: Parser Text
parseUntilSlash = do
  portOrTld <- someUnless "a subdomain" ( == '/')
  return $ portOrTld

I can commit the (non working) code if it is easier to debug.

allow use of host:port in a FROM instruction

f5e79e0

fixes: hadolint/hadolint#355

petitlapin force-pushed the patch-1 branch from 6b7bebd to f5e79e0 Compare January 15, 2022 15:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

allow use of host:port in a FROM instruction #75

allow use of host:port in a FROM instruction #75

petitlapin commented Oct 29, 2021

petitlapin commented Nov 21, 2021

m-ildefons commented Nov 23, 2021

petitlapin commented Nov 24, 2021

petitlapin commented Dec 11, 2021

m-ildefons commented Dec 14, 2021

petitlapin commented Dec 19, 2021

petitlapin commented Jan 15, 2022

allow use of host:port in a FROM instruction #75

Are you sure you want to change the base?

allow use of host:port in a FROM instruction #75

Conversation

petitlapin commented Oct 29, 2021

petitlapin commented Nov 21, 2021

m-ildefons commented Nov 23, 2021

petitlapin commented Nov 24, 2021

petitlapin commented Dec 11, 2021

m-ildefons commented Dec 14, 2021

petitlapin commented Dec 19, 2021

petitlapin commented Jan 15, 2022