Is it safe to modify byte code? #418

Closed
ghost opened this issue Aug 30, 2022 · 5 comments

ghost commented Aug 30, 2022

Hi!

It's more of a question than anything else. Related to #410 and #304.

Starlark is a great scripting language that provides a great deal of isolation by default, but there is (understandably) no meaningful way to limit the amount of memory a program consumes. Building that into the current implementation would make it slower for everyone, even those who are not interested in such a feature.

However, by analyzing and modifying a program's byte code, one can track allocations, at least approximately, by ignoring or estimating allocations in built-in functions.

So the questions are:

  1. Is it actually safe to modify byte code? It doesn't look like any unwanted side effects would appear, but maybe I'm missing something.
  2. How many changes to the byte-code format are planned?
  3. Is there any documentation on the byte code, other than the code of the interpreter itself?

Thanks!

ghost commented Aug 30, 2022

There is only one thing that makes modifications trickier: jumps. After a modification, we need to update all jump addresses below the altered instruction.
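
To make the jump fix-up concrete, here is a minimal sketch in Go. It assumes a made-up, fixed-width instruction format: `Instr`, `insertBefore`, and the opcode names are all hypothetical and are not the real starlark-go encoding. In this toy model, any jump whose target lies past the insertion point needs patching, wherever the jump itself sits in the program.

```go
package main

import "fmt"

// Instr is a hypothetical fixed-width instruction.
// It is NOT the starlark-go byte-code format.
type Instr struct {
	Op  string // e.g. "PUSH", "JMP", "CJMP", "COUNT", "RET"
	Arg int    // for jumps: absolute index of the target instruction
}

// isJump reports whether op carries a jump target in Arg.
func isJump(op string) bool { return op == "JMP" || op == "CJMP" }

// insertBefore returns a copy of code with ins placed at index pos.
// Jumps whose target is greater than pos are shifted by one.
// A jump that targets pos itself is left alone, so it now lands on
// the inserted instruction and then falls through to the original one.
func insertBefore(code []Instr, pos int, ins Instr) []Instr {
	out := make([]Instr, 0, len(code)+1)
	out = append(out, code[:pos]...)
	out = append(out, ins)
	out = append(out, code[pos:]...)
	for i := range out {
		if i == pos {
			continue // the new instruction is assumed to use post-insertion indices
		}
		if isJump(out[i].Op) && out[i].Arg > pos {
			out[i].Arg++
		}
	}
	return out
}

func main() {
	code := []Instr{
		{"PUSH", 1},
		{"CJMP", 4}, // forward jump over the insertion point
		{"PUSH", 2},
		{"JMP", 0}, // backward jump, target before the insertion point
		{"RET", 0},
	}
	// Insert a hypothetical allocation-counting instruction before index 2.
	for i, ins := range insertBefore(code, 2, Instr{"COUNT", 0}) {
		fmt.Println(i, ins.Op, ins.Arg)
	}
}
```

A real byte-code format with variable-length operands would be harder to patch than this, since changing a jump target can change the size of the jump instruction itself.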

adonovan (Collaborator) commented

> Is it actually safe to modify byte-code? It doesn't look like some unwanted side-effect may appear, but maybe I'm missing something.

The reason the byte code is represented as a []byte and not a string is to allow the interpreter's debugger (a work in very slow progress) to inject breakpoints. But otherwise there is no reason to modify the byte code, and client applications shouldn't even be looking at it: the details of the protocol could change at any moment. Be sure to produce and consume byte code using logic from the same version.

> How much changes to the byte-code format are planned?

I plan that it will continue to evolve indefinitely and without warning. :)

> Is there any documentation on BC, other than code of interpreter itself?

No.

ghost commented Aug 31, 2022

@adonovan OK, got it. An alternative is to add something like Thread.steps, but for allocations. This is a relatively big change and would increase the library's maintenance burden, since developers would need to remember to account for memory usage semi-manually. Although it would be imprecise, just as Thread.steps is, it would be very useful to have and shouldn't add much performance overhead. Existing custom libraries would still work fine, although clients using them would get even less precise memory usage data. But again, at least something is better than nothing. I can make a PR, but does this sound like a valid feature to you?
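
A rough sketch of what such semi-manual accounting could look like, in Go. Everything here is hypothetical (`AllocBudget`, `Charge`, `concat`, and the 1 MiB limit are invented for illustration and are not part of starlark-go). The counter only ever grows, so it measures cumulative allocation, not live memory.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrBudgetExceeded is returned when a charge would go past the limit.
var ErrBudgetExceeded = errors.New("allocation budget exceeded")

// AllocBudget is a hypothetical per-thread allocation counter,
// analogous in spirit to a step counter. It is not a starlark-go type.
type AllocBudget struct {
	used, max uint64
}

// Charge records n bytes against the budget, or fails without
// recording anything if the budget would be exceeded.
func (b *AllocBudget) Charge(n uint64) error {
	if n > b.max-b.used { // written this way to avoid overflow in used+n
		return ErrBudgetExceeded
	}
	b.used += n
	return nil
}

// concat stands in for a built-in that allocates. The author of each
// such function has to remember to call Charge before allocating.
func concat(b *AllocBudget, x, y string) (string, error) {
	if err := b.Charge(uint64(len(x)) + uint64(len(y))); err != nil {
		return "", err
	}
	return x + y, nil
}

func main() {
	b := &AllocBudget{max: 1 << 20} // 1 MiB, an arbitrary illustrative limit
	x := "x"
	for i := 0; i < 1000; i++ {
		y, err := concat(b, x, x)
		if err != nil {
			fmt.Printf("aborted at iteration %d with len(x) = %d: %v\n", i, len(x), err)
			return
		}
		x = y
	}
	fmt.Println("finished, len(x) =", len(x))
}
```

With this budget the loop is aborted at iteration 19, when `x` is 512 KiB long. Any built-in that forgets to call `Charge` is invisible to the counter, which is the imprecision mentioned above.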

Or maybe there is another way? I'm trying to find greedy scripts and abort them when possible. So far the options are:

  1. Multi-process setup. Due to how Go's GC works, it's kind of weak.
  2. Containerization. Cross-platform issues: in practice, it means that the program may only run on Linux.
  3. Custom interpreter.
  4. Byte-code instrumentation. Too risky.
  5. Static analysis. Too expensive, unreliable, and fundamentally unable to find all the cases, since this is a kind of "halting problem".

All of this is just way too expensive either in implementation or at runtime.

adonovan (Collaborator) commented

> add something like Thread.steps, but for allocations. [...] does this sound like a valid feature to you?

As I noted in #410 (comment), I don't think that approach is viable. In practice most allocations are done by application-defined functions called from Starlark, not by the interpreter itself, so the set of places to instrument is unbounded. Also, lifetime matters: it doesn't make sense to treat threads that allocate a ton of short-lived garbage the same as threads that allocate long-lived data structures. Also, runaway programs can allocate memory very rapidly, so if you want to defend against them, you would need to see all allocations as they happen, not some time after the fact. Go simply doesn't expose a way to do that.
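
A rough illustration of the lifetime point, using only the standard `runtime.ReadMemStats` call: `TotalAlloc` counts every heap byte ever allocated, while `HeapAlloc` reflects what is still on the heap when the sample is taken. The helper names (`allocate`, `snapshot`) and the buffer sizes are invented for this sketch. The snapshot is taken after the fact, so this only observes the difference; it is not a way to stop a runaway script.

```go
package main

import (
	"fmt"
	"runtime"
)

var (
	scratch []byte   // holds only the most recent short-lived buffer
	sink    [][]byte // holds every long-lived buffer
)

// allocate makes n buffers of size bytes each. With keep=false each
// buffer becomes garbage as soon as the next one replaces it; with
// keep=true all of them stay reachable.
func allocate(n, size int, keep bool) {
	for i := 0; i < n; i++ {
		b := make([]byte, size)
		if keep {
			sink = append(sink, b)
		} else {
			scratch = b
		}
	}
}

// snapshot forces a collection and then samples the heap statistics.
func snapshot() (live, total uint64) {
	var m runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&m)
	return m.HeapAlloc, m.TotalAlloc
}

func main() {
	const mib = 1 << 20
	live0, total0 := snapshot()

	allocate(64, mib, false) // 64 MiB of short-lived garbage
	live1, total1 := snapshot()
	fmt.Printf("garbage:  cumulative +%d MiB, live %+d MiB\n",
		(total1-total0)/mib, (int64(live1)-int64(live0))/mib)

	allocate(64, mib, true) // 64 MiB of long-lived data
	live2, total2 := snapshot()
	fmt.Printf("retained: cumulative +%d MiB, live %+d MiB\n",
		(total2-total1)/mib, (int64(live2)-int64(live1))/mib)
}
```

Both phases allocate the same amount, so a cumulative counter treats them identically, yet only the second one raises the live heap by a comparable amount.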

> But again, at least something is better than nothing.

I'm not convinced that's true. If "something" is an inherently misleading measure that adds complexity to the code and overhead to the runtime, then it could be worse than nothing.

Looking at your options, a "custom interpreter" (a fork) would let you instrument every place that Starlark allocates memory, but you would need to instrument every allocation in every one of your application's built-in functions too. It seems like a lot of work, and fragile too. By contrast, "multi-process setup" and "containerization" are essentially the same: enlist the help of the operating system. This means you would need to have the Starlark interpreter component of your application live in a child process, separate from the rest of it, and have the two communicate by sending messages, not by sharing memory. It's an invasive design change, but it's conceptually straightforward.

ghost commented Aug 31, 2022

Well, there are only a few obvious cases that can lead to runaway memory consumption, such as:

def main():
  x = "x"
  for i in range(1000):
    x = x + x # or x = x * 2

main()

I'm more eager to catch those. Anything less obvious that depends heavily on what happens in custom built-ins is the API developer's problem.

But I get your point. Thank you for the response.
