Thunks tutorial #824

infinisil · 2023-12-04T21:26:40Z

Almost copied verbatim from https://nixos.wiki/wiki/Nix_Evaluation_Performance which I wrote some time ago.

This needs a bit of work, but I think it's not bad as is either. Feedback appreciated!

github-actions · 2023-12-04T21:28:25Z

🚀 Deployed on https://656e4477bc84131fd93e664d--nix-dev.netlify.app

fricklerhandwerk · 2023-12-04T23:42:57Z

Looks like concepts rather than tutorial, but cool stuff! Seems it only needs a bit of polish.

roberth

It would be great to have this as more official documentation, even if not in the Nix manual.

Regarding the text, I have a problem in general with the phrasing of "allocating a thunk".
You allocate memory, and in that memory you put a state called a thunk.
If you were to put an actual value there, you would still allocate the same memory.
Thunks and values are necessarily of the same size, so allocating either one is really the same operation as the other.
Furthermore, you may allocate on the stack, and we actually do this for some values/thunks. This is super cheap (when permissible, i.e. without exacerbating the risk of stack overflows, and since 2.20 we'll stop doing allocations of arbitrary size). So what we really care about is heap allocations.

A more appropriate classification would be:

- heap objects vs stack objects
- values vs thunks, both of them being states of objects (unfortunately not the exact terminology used in the implementation though, where any nThunk can be in a Value struct)

I think we should care mostly about heap objects, many of which happen to be thunks at first. Conversely the thunks we care about tend to be on the heap, because those allocations are more expensive, and they may stick around for a long time.

roberth · 2023-12-04T22:41:41Z

source/tutorials/thunks.md

+It is only evaluated once needed.
+It consists of two parts:
+- The expression that the value should be evaluated from
+- The variables the expression has access to


Nix also has tApp thunks, which are function applications that haven't really been performed yet.
They're used in cases where Nix knows which functions and arguments need to be paired up, but the results haven't been demanded yet. Example: in map f [e1 e2], tApp is used for the f e1 and f e2 thunks, because for the primop there's no expression that represents these applications.
Other thunks may exist, but either way, and partly because such details may not be permanent, I believe the representation of thunks is not actually that relevant.
They're just delayed computations.
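
A minimal sketch of how the delayed applications in `map f [e1 e2]` look from the language side. The function `f` and the trace message are made up for illustration, and whether an element is represented internally as tApp is not observable here; the example only shows that the applications are delayed.

```nix
let
  # Hypothetical function for illustration; the trace shows when it runs.
  f = x: builtins.trace "applying f to ${toString x}" (x * 2);
  ys = map f [ 1 2 ];
in
{
  # Counting the elements does not force any of them.
  count = builtins.length ys;
  # Selecting one element forces only that application.
  second = builtins.elemAt ys 1;
}
```

Evaluated strictly (for example with `nix-instantiate --eval --strict`), only the application to `2` should be traced, since `f 1` is never demanded.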

What may be relevant is that call by need is implemented by mutating an object in memory, changing it from a representation of a delayed computation, to a representation of a value that is in a weak head normal form.
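
As a rough illustration of both points (sharing through in-place update, and weak head normal form), here is a small sketch. The binding name and messages are invented for the example.

```nix
let
  # The trace fires when `x` is first forced. The attribute `b` is a
  # separate thunk and stays unevaluated.
  x = builtins.trace "forcing x" {
    a = 1;
    b = throw "b is still a thunk";
  };
in
# `seq` forces `x` to weak head normal form only: the attribute set is
# built, but its attribute values are not evaluated.
builtins.seq x (x.a + x.a)
```

This evaluates to `2`, and "forcing x" is traced once even though `x` is used three times, because the forced result replaces the delayed computation. Using `builtins.deepSeq` instead of `builtins.seq` would also force `b` and hit the `throw`.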

roberth · 2023-12-04T22:51:47Z

source/tutorials/thunks.md

+
+It is very easy to introduce a lot of thunks in Nix code, which can have negative consequences:
+
+- Every new thunk requires heap memory allocations.


Not sure if this is 100% true, but Values on the stack are cheap anyway and cannot be allowed to be referenced by any expression whose lifetime extends beyond that of the stack frame, making their use somewhat limited (although we have plenty of on-stack values at various points, I must say).

Also allocations for let should be cheaper than other allocations.

roberth · 2023-12-04T22:54:28Z

source/tutorials/thunks.md

+
+- Every new thunk requires heap memory allocations.
+- A thunk prevents the evaluation garbage collector from collecting any variables it needs,
+  causing not only the memory of the thunk itself to be kept alive, but also all its references.


Such references are often referred to as a closure, and the term applies at least at a conceptual level, where we can say that the free variables are closed over.

Nix is perhaps simplistic in how it represents closures: a reference to a singly linked list is retained, which has all scopes all the way to the top of the file, even if no references are made to many of the variables in those scope layers. (An Env is what I would call a layer here.)

Naive thunk closures cause space leaks nix#8285
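
To make the pattern concrete, a sketch under the assumption that environments are chained as described above. The helper `describe` and its bindings are hypothetical, and the retention itself is not observable from within the language; the comments only point at where it would occur.

```nix
let
  # Hypothetical helper, only for illustration.
  describe = big:
    let
      name = "example";
    in
    {
      # This thunk only mentions `name`, but its environment chain also
      # includes the function scope that holds `big`, so an unforced
      # `label` would keep `big` reachable.
      label = "item-" + name;
      size = builtins.length big;
    };
in
(describe (builtins.genList (i: i) 1000)).label
```

The expression evaluates to `"item-example"`. Once `label` is forced, it is a plain string and no longer refers to any environment.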

roberth · 2023-12-04T22:58:12Z

source/tutorials/thunks.md

+- Every new thunk requires heap memory allocations.
+- A thunk prevents the evaluation garbage collector from collecting any variables it needs,
+  causing not only the memory of the thunk itself to be kept alive, but also all its references.
+- Too deeply nested thunks can lead to stack overflows when evaluated.


Fun fact: if Nix changes its forceValue from an if (isThunk) to a while (isThunk) it could do a bunch of tail recursion, but traces might be worse or more expensive.

May not solve the problem for the mutual nesting of thunks and e.g. attrsets, so this stands. Probably for other patterns as well.
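
For reference, a small sketch of how nested thunks build up and how forcing early avoids the nesting. Both functions are hypothetical, and the count is kept small so that neither variant gets near any depth limit.

```nix
let
  # Each step wraps the previous accumulator in a new `acc + 1` thunk.
  # Nothing is added until the final `acc` is forced, which then has to
  # descend through the whole chain.
  lazyCount = n: acc:
    if n == 0 then acc else lazyCount (n - 1) (acc + 1);

  # Forcing the accumulator at every step keeps it a plain integer, so no
  # chain of thunks is built.
  strictCount = n: acc:
    if n == 0 then acc
    else
      let next = acc + 1;
      in builtins.seq next (strictCount (n - 1) next);
in
{
  lazy = lazyCount 1000 0;
  strict = strictCount 1000 0;
}
```

Both attributes evaluate to `1000`; the difference is only in how deep the forcing at the end has to recurse. For lists, `builtins.foldl'` follows the strict pattern.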

roberth · 2023-12-04T23:00:59Z

source/tutorials/thunks.md

+  causing not only the memory of the thunk itself to be kept alive, but also all its references.
+- Too deeply nested thunks can lead to stack overflows when evaluated.
+
+Of course, thunks are essential to Nix, so it's not possible to avoid them.


Suggested change

Of course, thunks are essential to Nix, so it's not possible to avoid them.

Thunks are essential to the implementation of Nix, or any lazy functional language, so it's not possible to avoid them.

Less subjective and more informative respectively.

roberth · 2023-12-04T23:20:07Z

source/tutorials/thunks.md

+
+   - `let ... in` expressions attempt to create a thunk for each variable
+   - `{ ... }` (attribute set) expressions attempt to create a thunk for each attribute
+   - `[ ... ]` (list) expressions attempt to create a thunk for each element


Suggested change

- `[ ... ]` (list) expressions attempt to create a thunk for each element

- `[ ... ]` (list) literals behave similarly to attribute values
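
A quick sketch of what the delayed bindings, attributes and elements mean in practice. All names and messages are invented for the example; it shows laziness only, not where allocations happen.

```nix
let
  unused = throw "let binding never forced";
  attrs = {
    ok = 1;
    bad = throw "attribute never forced";
  };
  list = [ (throw "element never forced") 2 ];
in
{
  # Attribute names are known without evaluating any attribute value.
  names = builtins.attrNames attrs;
  # The length is known without evaluating any element.
  len = builtins.length list;
  second = builtins.elemAt list 1;
  ok = attrs.ok;
}
```

Evaluated strictly, this gives `names = [ "bad" "ok" ]`, `len = 2`, `second = 2` and `ok = 1`, and none of the `throw` expressions is reached.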

roberth · 2023-12-04T23:22:28Z

source/tutorials/thunks.md

+   - `let ... in` expressions attempt to create a thunk for each variable
+   - `{ ... }` (attribute set) expressions attempt to create a thunk for each attribute
+   - `[ ... ]` (list) expressions attempt to create a thunk for each element
+   - `f a` (function application) expressions attempt to create a thunk for the argument


I think it depends. The arguments are pointers on the stack (since recently up to 4, falling back to the heap, but that's actually a lot for currying), and those pointers can be acquired through maybeThunk, i.e. an ExprVar, for example, doesn't need to allocate. The return value may be written to the stack.
That's if the call needs to be made directly. Otherwise you might be looking at a tApp from a higher order function primop or something.

The allocation of an Env seems more certain in this situation. All it takes for that is that the function is not a primop.
Partially applied primops do allocate thunks though. Isn't this fun.
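
From the language side, the two cases look like this. It is a sketch with made-up names, and it says nothing about whether a given argument ends up on the stack or the heap.

```nix
let
  # A lambda that ignores its second argument.
  const = x: y: x;
  # A partially applied primop.
  inc = builtins.add 1;
in
{
  # The second argument is passed unevaluated and never forced.
  a = const 1 (throw "argument never forced");
  b = inc 41;
}
```

This evaluates to `{ a = 1; b = 42; }`.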

roberth · 2023-12-04T23:24:22Z

source/tutorials/thunks.md

+   - `f a` (function application) expressions attempt to create a thunk for the argument
+   - `{ attr ? def }: ...`:
+     For every function evaluation where the function takes an attribute set where an attribute has a default value which doesn't exist in the passed argument,
+     a thunk for the default value is attempted to be created.


Suggested change

a thunk for the default value is attempted to be created.

a thunk for the default value is to be created.

What's this "attempting" about?
If it fails, it crashes and it seems to be an unlikely cause, or not really worth considering for the purpose of optimization.
I would think it's for the final value. More something representing an ExprSelect with the default. (Probably even a fake ExprSelect, but that's unnecessary detail.)
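
A small sketch of the observable behaviour of default values, using hypothetical functions `f` and `g`.

```nix
let
  # The default for `b` is never used by the body, so it is never forced.
  f = { a, b ? throw "default never forced" }: a;
  # A default may refer to other arguments; it is evaluated on demand.
  g = { a, b ? a + 1 }: b;
in
{
  unusedDefault = f { a = 1; };
  usedDefault = g { a = 1; };
  overridden = g { a = 1; b = 10; };
}
```

This evaluates to `unusedDefault = 1`, `usedDefault = 2` and `overridden = 10`. When `b` is passed explicitly, the default expression plays no role.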

roberth · 2023-12-04T23:31:00Z

source/tutorials/thunks.md

+# let in expressions can allocate thunks
+let
+
+  # 0 (+0) No thunk allocated because strings are atomic value expressions


Suggested change

# 0 (+0) No thunk allocated because strings are atomic value expressions

# 0 (+0) No thunk allocated because simple string literals in the parsed expression are accompanied by a reusable value which does not even start as a thunk.

roberth · 2023-12-04T23:32:24Z

source/tutorials/thunks.md

+
+  # 1 (+1) Thunk is allocated, because the + operator is neither an atomic
+  # value nor a direct variable
+  greeting = "Hello, " + name;


Suggested change

greeting = "Hello, " + name;

greeting = "Hello, " + "world";

We don't have general constant expression elimination, and I don't think strings are an exception. (but strings are special, so maybe check)

Create thunks tutorial

0d18436

infinisil mentioned this pull request Dec 4, 2023

lib: Add contribution guidelines NixOS/nixpkgs#272083

Merged

roberth suggested changes Dec 5, 2023

fricklerhandwerk mentioned this pull request Feb 7, 2024

Nix language deep dive (with nix repl) #579

Open

3 tasks
