Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design discussion: serverless.js config function exports. #4828

Open
simonbuchan opened this issue Mar 14, 2018 · 17 comments
Open

Design discussion: serverless.js config function exports. #4828

simonbuchan opened this issue Mar 14, 2018 · 17 comments

Comments

@simonbuchan
Copy link
Contributor

This is a Feature Proposal

Description

Breaking this out from #4787 as requested by @erikerikson

In that issue I stated that I would like to see several features implemented before I would consider using the undocumented serverless.js feature useful over YAML, and thus worth documenting. It boils down to "do what webpack does".

I created a PR (#4827) for returning promises, which allows dynamically looking up values asynchronously in order to generate a configuration. As you can see from the example serverless.js there, however, there are a lot of hard-coded values that should be parameterized, e.g. what region/stage are we deploying to.

As mentioned, this is basically a webpack feature (though I doubt they were the first), and they implement it with a flexible --env CLI parameter which allows passing arbitrary data to the configuration function - so simply lifting that as is is an option.

Alternatively, passing the full Serverless object, and letting it poke around in there how it wants is another extreme.

More likely the right answer is somewhere inbetween!

Since this has quite a bit of user-facing impact and could create a future maintenance burden, this issue is to give the for the serverless team the opportunity to give feedback about how they would like this to work.

Additional Data

  • Serverless Framework Version you're using:
  • Operating System:
  • Stack Trace:
  • Provider Error messages:
@HyperBrain
Copy link
Member

HyperBrain commented Mar 14, 2018

Continuing the discussion in #4787, I will explain a bit more, why the js approach does not have any advantage in regards of dynamically configuring Serverless over the yaml file and is and must be restricted in the same way as the yaml based configration is.
Hopefully, I mentioned all of you below, so we can continue the discussion here 😄

@thomasmichaelwallace

I see a lot of the things that serverless does to extend yaml (the variables, pulling in files &c) as a sign that maybe yaml isn't really suited for more complicated configurations

The configuration in Serverless is declarative, i.e. the configuration defines exactly how Serverless is configured and the main Serverless object is created. This is a fundamental design decision on which the whole architecture is based. As a consequence, the configuration itself, whether it is yaml or js is limited according to this paradigm. That means in practice, you cannot access anything else in the config than incoming command line parameters.
Supporting promise based js configurations does not change anything in the nature of the configuration itself nor does it change the fundamental design (more clearly: it must not!).

setup

documented, with extended loader for getting the required information from serverless- I'm thinking about the stage at the very least here

This is IMO not possible, because the configuration must lead to a deterministic creation of the Serverless object, so it cannot be available in the configuration itself. Additionally the variable resolution must operate on a fully consistent configuration object to be able to build a variable tree to resolve variable dependencies (asynchronous ones too!) in a deterministic way. That forces the variable resolution done after the configuration has been completely provided. The current variable resolution (available in master) handles all possible resolution paths. To see what I'm talking about, please have a look at the unit tests of the Variables class.

Any approach mixing the configuration setup and the variable resolution will certainly introduce indeterministic paths and races that cannot be solved by a separated simple-to-test algorithm (at least not without drastically complicating the implementation and adding huge technical debt).
As pointed out by @erikerikson , variable resolution needs a fully configured configuration, because the cf: and ssm: type variable resolutions need a fully resolved region variable to be able to execute the correct AWS API requests (the same might be true for other providers like Google or Azure in the future). And the region variable might depend on other variables that are resolved by walking the variable tree.

BTW: Non-AWS provider support is also a requirement for the configuration and this discussion and any approach or idea should take them into account.

@simonbuchan

Alternatively, passing the full Serverless object, and letting it poke around in there how it wants is another extreme.
What about passing the Serverless instance, and the config can do whatever it wants? Does that break things?

This is imo not possible as it creates the above mentioned issues and breaks the architecture and variable resolution. It also violates the separation of concerns pattern. The Serverless object initializes itself from the configuration and makes sure that everything is setup properly afterwards. In fact this initialization is much more than just setting some variables/properties - there is also the plugin system that is initialized by the Serverless object.

So from an architectural point of view, the Serverless object can only exist after the configuration object is completely resolved. The factory for the Serverless object fetches the configuration, resolves its contained variables and creates the Serverless object from that. During the creation of the Serverless object, the plugins are initialized, so that we have a fully initialized object as outcome.

@simlu

In short it feels like config loading has some design problems.

First the current design must be clear, how it is structured. The current architecture (see above) also explains, why the config loading (YAML, JSON and JS) is separated from the variable resolution (and why the variable resolution needs a completely loaded config).

@erikerikson

To be clear, my suggestion that it might be good to create a more specific and targeted issue or PR. I didn't want to totally spell this out because politics and because I have questions where I might have a more complete understanding of risks.

Agree. I also think that everyone of us must understand the current architecture, how the system is designed (and why this is so) and, important, what the pain points are, what risks different solutions bear and which problems they exactly solve. (Is "politics" me :-D ?)

Having a possibility of a promised approach in the JS configuration is not the problem here and only an implementation detail. That is ok and an improvement of the current JS config, but does not really matter for the discussion.

IMO, the very first thing to do is to discuss the general problems I mentioned above and think about what problems should be exactly solved, and how they can be solved (including an analysis of the risks imposed by different approaches). I don't think that this is a one-PR-simple-thing but has to be discussed and analyzed deeply first.

@erikerikson
Copy link
Member

Although you are politics a little @HyperBrain but maintaining good conversations well can be non-trivial. Thank you very much for this background and information. To be honest, I've learned as little as possible (I should maybe but the bullet) and this advances my understanding.

@simlu
Copy link

simlu commented Mar 14, 2018

@HyperBrain Thank you very much for that (rather long) post. There some great points in there (and some that I didn't fully understand). So let's dig into it:

The configuration in Serverless is declarative, i.e. the configuration defines exactly how Serverless is configured and the main Serverless object is created. [...] That means in practice, you cannot access anything else in the config than incoming command line parameters. - Yes, but we are still relying on external information e.g. variables that come from the server (e.g. cf: variables) and current stack state. The serverless.js makes it a lot easier to obtain additional information for power users. Is that a contradiction to your point?

I love your flow chart, but can you add file resolution in there? I'm currently very confused how that works / is supposed to work. Also looking at your chart, I'm really interested how this bug is even possible.

Continuing reading I think I understand better. Basically the configuration is supposedly completely resolved before variables are loaded. If I understand that correctly this prevents our use case and we need to rely on serverless.js and break the paradigm. Or in other words: How do we support loading of referenced files based on current state of the server / external variables? If I understand correctly you're suggesting to resolve the config completely before substituting variables. But we can't resolve before we have the variables.

Hope I made my point clear. If you could fill file resolution into the discussion I'm sure it would clarify a lot of things. Good stuff, thanks again!


Totally separate suggestion: Is there any reason serverless.js shouldn't take precedence over serverless.yml? The js file can load the yml file, but the other way around is less likely.

@HyperBrain
Copy link
Member

@simlu Thanks for the response. I'll add a proper reply and updated diagram to show some missing parts that I apparently skipped in there tomorrow (UTC+1 here ;-) )

@simonbuchan
Copy link
Contributor Author

So from an architectural point of view, the Serverless object can only exist after the configuration object is completely resolved. The factory for the Serverless object fetches the configuration, resolves its contained variables and creates the Serverless object from that. During the creation of the Serverless object, the plugins are initialized, so that we have a fully initialized object as outcome.

This makes sense, especially since webpack does the same - allowing poking into the runtime via plugins as well, applying them after configuration is complete. It should be straightforward to allow serverless.js to return actual plugin classes to do whatever wiggy magic you want:

module.exports = {
  service: 'example',
  plugins: [
    'serverless-webpack', // as before
    class MyPlugin {
      constructor(serverless, options) {
        // some magic
        this.commands = ...;
        this.hooks = ...;
      }
    },
  ],
};

The only thing is that serverless plugins being classes means they are a little more clumsy to factor out, with having to return a new class - webpack plugins are instances with an apply(compiler) method, so you just new MyPlugin({ some, options }). Not a big deal. but allowing plugins to be a function (serverless, options) => { commands, hooks } would be a little lighter-weight. I'd be happy to PR both changes, if you have a preference?

With that, I think just passing CLI options in some form to the config function would be fine: do you have a preference?

  • webpack only passes --env, but it's very flexible and ensures new flags won't conflict with user options.
  • Or simply pass the Serverless.processedInput, e.g. options and commands in some form, and let the user figure it out.

I have to say I am rather confused about how options currently combine with serverless.yml: I think it currently works that the config is parsed, then variables expand, then options override? So if provider.stage is alpha and --stage beta is passed, ${self:provider.stage} is still alpha, but the used stage is beta? That could be a problem if that still applies to serverless.js and all options are passed.

module.exports = ({ stage = 'alpha', client = 'internal' }) => {
  return {
    service: 'example',
    provider: {
      name: 'aws',
      stage: stage + '-' + client,
    },
  };
};

Here the intent is:

  • --client bob deploys to alpha-bob
  • --stage beta deploys to beta-internal
  • --stage prod --client alice deploys to prod-alice

but if my guess is right, then only the first would work as expected, the latter two would use simply beta and prod. If so, then a fix could be function configs only don't get the stomping behavior (e.g. the config value has priority), but that's confusing too. More likely that would mean something like --env so there's no chance for confusion.

@thomasmichaelwallace
Copy link
Contributor

Indeed- while I was thinking through how I would use a severless.js to replace our current hack of cat serverless.${AWS_PROVIDER}.yml serverless.common.yml serverless.project.yml > serverless.yml I was thinking that the one thing I'd miss was being able to access the stage.

Thinking it through I was considering putting together a noddy utility function that inspected the arguments (every javascript function has access to process.argv) and extracted the --stage. However that would be fragile at best, so I am particularly interested in something a little more robust. Although, if the serverless module did expose a utility class to build the opt: variables (it might already- I haven't looked), then that could be a reasonable compromise without making the config loader a function.

@simlu
Copy link

simlu commented Mar 18, 2018

@thomasmichaelwallace I knew we were not the only ones trying to resolve this problem. Similar to how you describe it we tried to nicely separate our configuration, but serverless has some deficiencies when it comes to resolving separate files into the configuration as discussed here.

We have separated this now in yaml-boost. Be very happy to discuss how / if this should be part of the serverless yaml parser logic or if using a config loading wrapper like this is the appropriate thing to do. Since you are already doing something similar I'd love to get your feedback on the yaml-boost package!

@thomasmichaelwallace
Copy link
Contributor

thomasmichaelwallace commented Mar 18, 2018

@simlu That looks interesting- although I still think that sticking to straight yaml and json with the option to use javascript for more complex scenarios makes the most sense:

  1. adding to an existing language (the way serverless already does for yaml and json) makes things surprising (n.b. I consider surprise to be a negative user experience). Not only do people who have never seen yaml have to learn it, but even if they do know it, they then have to learn and reason about a whole host of serverless specific extensions. Just consider the number of people accepting my cat files approach because they already understand cat and haven't got their head around the ways that serverless allows you to import in files. This puts a lot of pressure on the documentation (as tutorials about yaml/json won't discuss these).

  2. The json/yaml as 'static' configuration and javascript (/python, I'm unsure about golang) for dynamic/complex configuration is a well established pattern. I feel a bit like a broken record here, but webpack, mocha, babel, eslint, all do this, so it's much less surprising. And it seems a reasonable ask that people using serverless to deploy node/python functions be capable of using the language to express complex configuration objects if they need to. (I suggest that the number of node developers who are coming to serverless and know how to merge configuration from multiple sources into a single json object using javascript is greater than those who know how to do it using yaml and even greater than those who know how to do it using the serverless specific extensions.)

I guess there's inevitably a degree of subjectivity here. But upon learning that serverless accepts a javascript module I immediately regretted all the yaml/cat/script hacks I'd used to standardise and avoid repeating myself across the 50 or so serverless repos we use; as I could have just used the same techniques I'd used for webpack, babel and eslint and ended up with something our other devs who haven't spent as much time as I have wrangling with the framework could easily reason about.

@simlu
Copy link

simlu commented Mar 18, 2018

@thomasmichaelwallace Very good discussion here!

I'm just going to take a step back and list some observations:

  • yml is the recommended standard by serverless that most people will use
  • js is not documented as configuration at this point
  • serverless yml loader has some (necessary) extensions, however the complexity is not that big. I've recreated the functionality in a few lines. Mainly we are talking about (1) file resolution and (2) variable resolution that serverless added. Anything I'm missing?
  • as projects grow, more complex functionality for the yaml files is required (mainly splitting gigantic files, but probably others), which is probably why the yml functionality was custom extended

With these observations in mind:

  1. There is going to be a natural progression that projects will go through as they mature, requiring them to use additional features that are not part of the standard yaml syntax. So at some point projects will make use of the custom yaml additions that serverless made or they will switch to using serverless.js
  2. Making use of the custom syntax is easy if the project is already using yaml. Switching to serverless.js would require more work. This is different if you already have a lot of knowledge about the project complexity and are using serverless.js from the start - but lambda is hot and a lot of new people are coming in. All these people will go through the natural progression: yaml -> extended functionality -> maybe using serverless js

The previous two points now leave us with options: (a) Extend serverless custom yml logic further to capture most of the functionality that projects eventually require and if they really need to, recommend serverless.js eventually or (b) force people into using serverless.js as early as possible and recommending it as the standard for projects that will become somewhat complex. Maybe even making it the overall standard.

I've started (a) by writing yaml-boost and for now this is my preference, since that's what serverless started doing. However our projects are probably not as mature as yours. If (b) is an option this would be a pretty fundamental change.

Do you feel this is an adequate summary? Do you agree with the above? Am I overstating the complexity of changing from yaml to js? Based on your commend "I could have just used the same techniques I'd used for webpack, babel and eslint" there must be some complexity in switching over. Would you be able to just drop in yaml-boost (have a look at the Serverless Example readme section)?

@thomasmichaelwallace
Copy link
Contributor

Personally I think you're overstating the complexity of changing from a yml to javascript. I mean, at it's simplest you're just exporting an object with the same keys and the yaml.

Stick module.export = infront of the output from this https://www.json2yaml.com/convert-yaml-to-json and you'd be right.

I guess there's arguments either way for yaml vs javascript as configuration. But I've never been much sold on the additional overhead of potentially asking someone to learn a new language to configure something.

That said, I'm under absolutely no delusions that I'm going to convince everyone that the module.export() approach should be the standard. But I think it's a fair statement that, at the point you're inventing a new language to express your configurations (yaml + serverless extensions + new operator overloads from the yaml-boost library + whatever else you need) you're just adding overhead; so why not use an existing language that can do all of these things (javascript will happily merge objects, concat arrays and load files) and, incidentally, you are already using to write the functionality.

(n.b. this point comes with a with a repercussion of inclusivity. I expect, given that serverless is not just used by javascript developers, such a solution would want the equivalents for python and golang and whatever else. although, unfortunately, I can believe it's hard to find a serverless developer who hasn't already had to learn javascript 101.)

@simlu
Copy link

simlu commented Mar 18, 2018

@thomasmichaelwallace If you are just using plain yaml without file inclusion this is easy. However if you use the extended serverless yml functionality it is actually more complex to switch to js (you can use sls print, but then you've lost the variable and possible file resolution). I'm thinking for what we currently do with yaml it's far from trivial to switch to js export (which is why I wrote the custom loader abstraction instead).

You make a good point wrt to languages. Having the configuration language independent means it's much easier to exchange code between projects that use different lambda function languages. Having to replicate the functionality for serverless.js, serverless.py, etc doesn't sound great (I'm talking online configuration examples, discussion etc). I much rather would use yml one configuration language.

So it seems we might end up with (a). There will then be a pain point - pretty much exactly what you are experiencing - where yml plus the extended serverless loader functionality isn't enough and you then need to switch to serverless.js. My preference here would definitely be to extend yml custom logic to a point where it should not be necessary (but possible) to switch to js for most cases.

Could you please detail what you are referring to here "[...] all the yaml/cat/script hacks I'd used to standardise and avoid repeating myself". We are currently building up exactly that - many projects with shared configuration. What were the problem points here and how would you have better solved them using js? Your answer will help validating this feature request and highlight what custom yml loading logic might still be necessary to solve for your use case using yml configuration. If you can link related / relevant issues / discussions that would be great too!

@erikerikson
Copy link
Member

I'm not sure the progression to serverless.js is all that inevitable. It presumes a sort of meta programming approach to specifying your services which is sometimes valuable but not always necessary. There is perhaps room for a specification of the coordination or treatment of a set of services. However, largely speaking you can do a huge amount with the yml declarative style. I believe efforts are underway to facilitate an improved story around uncovered use cases.

That said, the use of serverless.js is likely to remain desirable for an important subset of the user base. I think supplying available elements such as processed command line options (environment variables are already exposed) is reasonable but that immediately faces all sides with the correct resolution problem. By that I mean the order with which one considers a variety of sources. See the getStage and getRegion methods of the AWS provider as one example of an attempt to retain consolidation of this logic/policy that had once been distributed around the code base and inconsistencies around which were coming in PRs prior to the consolidation. It seems important to provide a reasonably generalized mechanism in that space before or as part of formalizing the passing of any config to serverless.js

@simonbuchan
Copy link
Contributor Author

@simlu & @thomasmichaelwallace - your discussion seems to be more about yaml vs js at this point, rather not exporting a function? It's valuable for serverless to have a position on js, but i think that would be more relevant to the doc issue #4829

@simonbuchan
Copy link
Contributor Author

@erikerikson yeah, that's the stuff I'm interested in, I'll take a look at those.

Also, FYI all, you can put variables including imports in the js export. It's just less flexible than being able to drive logic off them.

@thomasmichaelwallace
Copy link
Contributor

@simonbuchan fair point, and I also agree with @erikerikson - there's going to be people, like me, who feel more comfortable in javascript and want to use it, and probably most relevant to this issue is the shape of the information passed to the function to allow people to use serverless.js effectively (stage, normalised provider data like region, and many other things I expect I don't know about!)

(@simlu for completeness- our hacks are a) have a versioned common file to import in environment variables &c, although this means we end up injecting far more than we need to, b) have 'mixins' for common long items like adding a base path mappings, authenticators &c to prevent repeating, c) pull in a merge various IAM requirements from shared internal packages (where used; e.g. dynamodb workers need access to dynamodb) with replacements as needed. d) supporting different deployment options when using different profiles (e.g. having lambdas in sandbox'd accounts).)

@dougmoscrop
Copy link
Contributor

dougmoscrop commented Apr 5, 2018

I've often thought that plugins ought to have configuration passed to their function, not their constructor. Even as it stands, what plugins get during ctor isn't entirely reliable/full, at least last time I checked. You usually bookmark the 'service' object, but accessing it during your ctor is a Very Bad Idea and you have to wait until a command is running to actually read anything off of service.

I also think the current state of affairs is a bit naive, there's some weird circular dependency stuff going on and for example plugins are part of config, but depend on variable resolution.

Furthermore, plugins alter config all the time - they inject things in to the configuration (Ok, maybe they don't? Maybe they alter something that is post-config but I don't think it is represented in the above diagram)

Architecturally, I think maybe "config" and "variable resolution" could be phases that occur within sub-sections of the service rather than as a global pass over it - so load the provider, load the plugins, (each of those steps can resolve variables -- but e.g. your provider section can't reference variables from your functions section). There's a hierarchy.

The current variable system, using files, is already dependent on external things (I mount Wasabi over s3fs as a file system sometimes 😱) -- it's just 'hacked' because of nodes *sync methods and require(). Consider YAML parser removing the sync option because it does some delegating in to userland C++ and how you might support that. Variable resolution really ought to understand async resolution (probably involves rewriting the variable system)

@jlarmstrongiv
Copy link
Contributor

jlarmstrongiv commented Jun 24, 2022

Related #8113 and #4827

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants