Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Var newtype with GADT to increase power in update callbacks #260

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ruhatch
Copy link
Contributor

@ruhatch ruhatch commented Mar 28, 2019

This PR replaces the newtype Var with a version using GADTs. This increases power in the update callback, where one must provide a state update forall v. Previously if I had an output Var (Int, String) v, I would not be able to construct a Var String v, whereas now I can:

case output of
  VarSymbolic (Symbolic s) -> VarSymbolic (Symbolic s)
  VarConcrete (Concrete (_, s)) -> VarConcrete (Concrete s)

An extension of this would convert the Symbolic and Concrete types to nullary types, reducing them to tags. This would remove the need for two layers of constructors in Vars. I wanted to submit the PR without the extension to reduce breaking changes to the API. In this version the only breaking change is the change to the Var constructor.

@Skyfold
Copy link

Skyfold commented Jan 30, 2020

I really could use this update! I keep having to choose between saving extra useless information in my model and putting enough in the exec output to fully check everything worked.

@jacobstanley
Copy link
Member

I'll take another look at it

@HuwCampbell
Copy link
Member

I thought this was a bit odd when I first saw it.

Essentially, the difference this provides is that one will be able to pattern match and inspect the values during Update callbacks; but I don't think this is a good idea, as it means that the state built during generation may not line up with the one created during execution.

This could lead to some very odd bugs, where you're running commands and expectations on states which shouldn't support them.

Do you have a motivating case or short code snippet which demonstrates your problem?

@Skyfold
Copy link

Skyfold commented Feb 3, 2020

For example, I've got this exec function for a Command that creates a session where I generate the UUID locally. If I just return the UUID then I won't know if the request worked or not, but my model only needs the UUID, not the Status. So I end up with Var (HTTP.Status, UUID) v instead of Var UUID v in my model. Its not the end of the world and I can wrap the model in a lens that makes the Status unavailable. When I saw the above PR I just thought it could be a way to clean up my code.

    exec :: StartSession Concrete -> m (HTTP.Status, UUID)
    exec (StartSession var sec) = do
      uuid <- liftIO UUID.nextRandom
      response <- 
        HTTP.httpNoBody $ HTTP.mkRequest env (req uuid sec $ concrete var)
      return $ (HTTP.getResponseStatus response, uuid)

@HuwCampbell
Copy link
Member

I'm having a think about this.

It's a bit surprising to me that server's response when starting a session wouldn't change an expectation you might have down the line.

If the HTTP request fails, isn't that important to how your application works, the information you expect the server to know (there's a new active new session), and the properties of the system you expect to hold (the session is active till I close it)?

If it's not important, then either your model seems inadequate in that assumes that sessions don't actually need to exist on the server at all; or they actually don't need to exist on the server, and you might as well just not make the HTTP request.

This doesn't sound realistic.

To help model this, I would:

a) consider the UUID to be an input, and have it as part of the generated command (this way it's available for the model update anyway); or
b) contain the UUID in the response.

@Skyfold
Copy link

Skyfold commented Feb 3, 2020

It's a bit surprising to me that server's response when starting a session wouldn't change an expectation you might have down the line.

That request returns no data. Its only purpose is to tell the server the unique identifier that will be used for subsequent analytics calls and to let the server annotate that identifier with the request's meta data. That is why the UUID is needed for the model for subsequent requests and the Status is only needed for the Ensure. The Command should fail if the Status isn't 201, but if the Command doesn't fail I already know the Status so why put it in the model?

What I'm trying to show is that there are elements you return from your exec function that are just for Ensure and there are possibility separate elements that are for your model. However, because none of those values exist at compile time they are all encapsulated in the Var and there isn't a way to split out what you don't want in your model. Think about it this way: There are various elements you want to check from your response: status, headers and body. However, you don't want the status and headers in your model because they don't affect the execution of other commands. They are not part of the state of your server either, just meta data you want to ensure is correct. Its things like this that are fetched or created by the exec command to be checked with Ensure, but don't belong in the model.

Notes:

  • MonadGen doesn't let you do IO so I need to generate the UUID in the exec not in the command (UUID.nextRandom :: IO UUID).
  • This is the model for an older version of the API that I need to make sure I don't break. Having the API give me the session id is normally how sessions should work.

@HuwCampbell
Copy link
Member

I'll make more systematic comments later, but for now, I still think using the UUID as part of the input is a better way to go.

You can generate one with UUID.fromWords and standard generators like Gen.integral_.

@Skyfold
Copy link

Skyfold commented Feb 5, 2020

At first using UUID.fromWords to generate the UUID as part of the input seems like a really good idea, but it will actually cause the shrink to fail hiding actual errors. The issue is the API requires the UUID to be unique for each session and the shrinks will often re-use that UUID. So, when any sequence of commands that starts a session fails, the reported failure will be the failed start session shrink instead of the command that actually failed.

I've got a possibly better example of the Var needing to contain extra data. I didn't pick this one before because it is more complicated and has an unrelated issue. Here is the exec call:

    exec :: AnalyticsSearch Concrete -> m (HTTP.Status, SearchUUID)
    exec (AnalyticsSearch jwt qOpt uuid sec rand) = do
      let searchUUID :: SearchUUID
          searchUUID = SearchUUID
            { _search_session_id = uuid
            , _search_answer_options = 
              (concrete qOpt & traversed . answer_options %~ getSubset rand)
            , _search_section = sec
            }
      response <- HTTP.httpNoBody $ 
        HTTP.mkRequest env (req (concrete jwt) searchUUID)
      return $ (HTTP.getResponseStatus response, searchUUID)

The getSubset :: Integral a => a -> [b] -> [b] uses Shrink.list internally (from Hedgehog) to pick a subset of the answer_options list using a random Word16 from the command input. I'm doing this to simulate a user picking a random set of search options and then sending that choice to the analytics. However, I also get the search options from the API, so there isn't a way for me to create a random SearchUUID for the command input. I can only create a SearchUUID when the command input is Concrete, so it gets bundled with the HTTP.Status and both end up in the model.

The unrelated issue is if the way I'm getting a random subset of data I don't have at compile time is going to break shrinking or if there is a better way to randomly choose parts of data generated/fetched by commands.

@HuwCampbell
Copy link
Member

I would again posit that this should be modeled as input.

However, I also get the search options from the API

I don't think this should be the case, no? This would appear to mean that your model for the system doesn't accurately capture some of its important concepts.


I do understand now what you're asking for (I don't think it's this PR). I believe that what you want is for a Functor constraint to also be revealed to the update function. Like this:

Update (forall v. (Ord1 v, Functor v) => state v -> input v -> Var output v -> state v)

which would allow you to map over both symbolic and concrete outputs, and place either one or more projections of them into the state.

I could get this to type check, the issue though, is that it wouldn't actually work.

Symbolic a is essentially Const Int a. Think about the functor instance for this type; it does change the int. What a MyState Symbolic holds is essentially a bunch of names referring to the expected outputs of an execution.

If we had two differently typed values in there, but they were both derived from the same output, they would still hold the same name. We couldn't differentiate them when attempting to reify, nor turn them into correctly typed values.


Sorry to be repetitive, but I think you probably are reaching for this because you have a mismatch between what should be an input to your program and what should be an output from an execution.

@Skyfold
Copy link

Skyfold commented Feb 6, 2020

I agree that there isn't a meaningful way of splitting out what you want in your model and what you only wanted to test. What I am curious about is this mismatched between what should be an input and what should be an output from an execution.

If you will excuse this off topic discussion, here is the crux of the issue. Each API call, beyond user setup, is dependent on dynamic data from previous API calls. Now, I could copy down the current state of some of these endpoints and use them as inputs or write generators for all posible values, like you describe for the search options, but that would mean my Commands would only testing the functionality of each endpoint not the interplay between that functionality and the current data on the API.

As an example: If I made the search options an input that I generated locally that would test creating a valid search from any set of search options, but not the search options that are actually on the API. Ideally I should have both, but the latter is more important for making sure what is there now will work as expected.

Making the UUID in the input instead of the output is a different story. It makes sense for it to be an input, but I'm don't know how to adjust the shrinking to never re-use the same UUID.

So, my question to you is why do you think a model for the system doesn't accurately capture some of its important concepts if it has to get them dynamically (as a Var) instead of at compile time (as an input)?

Edit: I hope I have interpreted your words correctly, let me know if I've misunderstood what you are advocating. I'm also asking this because I'm doubting the way I am writing my Commands.

@jacobstanley jacobstanley force-pushed the master branch 2 times, most recently from 4139585 to c228279 Compare May 22, 2022 14:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants