
WIP: BatchIt #637

Open · wants to merge 41 commits into main from 202309-msgpass
Conversation

nwf-msr (Contributor) commented Sep 21, 2023

As part of the assessment of #634, but also perhaps more generally useful. Opinions welcome.

@nwf-msr nwf-msr requested a review from mjp41 September 21, 2023 03:14
@nwf-msr nwf-msr force-pushed the 202309-msgpass branch 4 times, most recently from 8debce9 to a26a036 Compare September 22, 2023 21:58
mjp41 (Member) commented Sep 24, 2023

@nwf-msr nwf-msr force-pushed the 202309-msgpass branch 2 times, most recently from 81147f0 to 998c2f4 Compare September 25, 2023 20:36
nwf-msr (Contributor, author) commented Sep 25, 2023

> Looks like it is leaking in some case https://github.com/microsoft/snmalloc/actions/runs/6279469686/job/17055222812?pr=637#step:7:149

Whoops; I had the loop termination conditions wrong. They're fixed now, I think. Let's see if CI agrees.

@nwf-msr nwf-msr changed the title WIP: msgpass test WIP: msgpass test and related changes Nov 16, 2023
nwf-msr (Contributor, author) commented Nov 16, 2023

After discussions with @mjp41 yesterday, I've introduced a notion of "tweakable obfuscation" and have made all the intra-slab free lists' backwards signatures use the address of the slab metadata as the "tweak". The next step would be to remove the per-thread keys and have everyone use a common global key (probably not RemoteAllocator::key_global!) and apply the same tweaking. This opens the door to sending threads being able to build up segments of slab free lists that can be spliced in by the recipient in O(1) rather than O(n).

nwf-msr (Contributor, author) commented Dec 14, 2023

I've (at long last) got things flying end to end with a very simple "cache" on the sending side -- a single open ring -- but I think some review and investigation is a good idea. Here's what mimalloc-bench makes of the current state of things in terms of time:

[image: mimalloc-bench timing results]

and memory:

[image: mimalloc-bench memory results]

The sizeclass test was already testing most of this, so just add the missing bits.
Forgo some tests whose failure would have implied earlier failures.

This moves the last dynamic call of size_to_sizeclass_const into tests
(and so, too, to_exp_mant_const).  sizeclasstable.h still contains a static
call to compute NUM_SMALL_SIZECLASSES from MAX_SMALL_SIZECLASS_SIZE.

Only its _const sibling is used, and little at that, now that almost everything
to do with sizes and size classes is table-driven.

This just means I don't need to remember to set a breakpoint on exit.
nwf-msr added 28 commits May 24, 2024 04:32
- Trace "Handling remote" once per batch, rather than per element

- Remote queue events also log the associated metaslab; we'll use this
  to assess the efficacy of microsoft#634
Approximate a message-passing application as a set of producers, a set of
consumers, and a set of proxies that do both.  We'll use this for some initial
insight into microsoft#634, but it seems worth having in general.
We'll use these to pack values in message queues.

- Maximum distance between two objects in a single slab
- Maximum number of objects in a slab
We'll use the _slower form when we're just stepping a slab through
multiple rounds of state transition (to come), which can't involve
the actual memory object in question.
The pattern of `if (!fast()) { slow() }` occurs in a few places, including in
contexts where we already know the entry and so don't need to look it up.
Plumb its use around remoteallocator and remotecache.
This prepares the recipient to process a batched message.
Exercise recipient machinery by having the senders collect adjacent frees to
the same slab into a batch.
This might involve multiple (I think at most two, at the moment) transitions in
the slab lifecycle state machine.  Towards that end, return indicators to the
caller that the slow path must be taken and how many objects of the original
set have not yet been counted as returned.
We can, as Matt so kindly reminds me, go get them from the pagemap.  Since we
need this value only when closing a ring, the read from over there is probably
not very onerous.  (We could also get the slab pointer from an object in the
ring, but we need that whenever inserting into the cache, so it's probably more
sensible to store that locally?)