# Rayon FAQ

This file is for general questions that don't fit into the README or crate docs.

## How many threads will Rayon spawn?

By default, Rayon uses the same number of threads as the number of CPUs
available. Note that on systems with hyperthreading enabled this equals the
number of logical cores and not the physical ones.

If you want to alter the number of threads spawned, you can set the
environment variable `RAYON_NUM_THREADS` to the desired number of threads or
use the
[`ThreadPoolBuilder::build_global`](https://docs.rs/rayon/*/rayon/struct.ThreadPoolBuilder.html#method.build_global)
method.
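
For example, here's a minimal sketch of the builder approach (the count of 8 is
just an arbitrary example, not a recommendation):

```rust
use rayon::prelude::*;

fn main() {
    // Configure the global pool before any Rayon code runs; this can
    // only be done once and fails if the pool already exists.
    rayon::ThreadPoolBuilder::new()
        .num_threads(8)
        .build_global()
        .expect("the global thread pool was already initialized");

    // Parallel iterators now run on those 8 worker threads.
    let sum: i64 = (0..1_000_000i64).into_par_iter().sum();
    println!("{}", sum);
}
```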

## How does Rayon balance work between threads?

Behind the scenes, Rayon uses a technique called **work stealing** to try and
dynamically ascertain how much parallelism is available and exploit it. The idea
is very simple: we always have a pool of worker threads available, waiting for
some work to do. When you call `join` the first time, we shift over into that
pool of threads. But if you call `join(a, b)` from a worker thread W, then W
will place `b` into its work queue, advertising that this is work that other
worker threads might help out with. W will then start executing `a`.

While W is busy with `a`, other threads might come along and take `b` from its
queue. That is called *stealing* `b`. Once `a` is done, W checks whether `b` was
stolen by another thread and, if not, executes `b` itself. If W runs out of jobs
in its own queue, it will look through the other threads' queues and try to
steal work from them.

This technique is not new. It was first introduced by the [Cilk project][cilk],
done at MIT in the late nineties. The name Rayon is an homage to that work.

[cilk]: http://supertech.csail.mit.edu/cilk/
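
To make the `join(a, b)` description concrete, here's a small sketch of a
divide-and-conquer sum; the sequential cutoff of 1024 is arbitrary:

```rust
// A divide-and-conquer sum built on `join`. The second closure goes
// into this worker's queue, where an idle thread may steal it while
// this thread recurses into the left half.
fn parallel_sum(slice: &[i32]) -> i32 {
    if slice.len() <= 1024 {
        return slice.iter().sum(); // small inputs: sequential is cheaper
    }
    let (left, right) = slice.split_at(slice.len() / 2);
    let (a, b) = rayon::join(|| parallel_sum(left), || parallel_sum(right));
    a + b
}
```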

## What should I do if I use `Rc`, `Cell`, `RefCell` or other non-Send-and-Sync types?

There are a number of non-threadsafe types in the Rust standard library, and if
your code is using them, you will not be able to combine it with Rayon.
Similarly, even if you don't have such types, but you try to have multiple
closures mutating the same state, you will get compilation errors; for example,
this function won't work, because both closures access `slice`:

```rust
/// Increment all values in slice.
fn increment_all(slice: &mut [i32]) {
    // ERROR: both closures capture `slice` mutably at the same time,
    // so this does not compile.
    rayon::join(|| slice.iter_mut().for_each(|v| *v += 1),
                || slice.iter_mut().for_each(|v| *v += 1));
}
```
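
For this specific function, one fix is to give each element a single exclusive
borrower, for example via a parallel iterator; a minimal sketch:

```rust
use rayon::prelude::*;

/// Increment all values in slice, in parallel.
fn increment_all(slice: &mut [i32]) {
    // Each element has exactly one mutable borrower at a time.
    slice.par_iter_mut().for_each(|v| *v += 1);
}
```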

The correct way to resolve such errors will depend on the case. Some cases are
easy: for example, uses of [`Rc`] can typically be replaced with [`Arc`], which
is basically equivalent, but thread-safe.
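
For instance, a sketch of what that swap buys you:

```rust
use std::sync::Arc;

// `Arc` has the same clone-and-share API as `Rc`, but its reference
// count is atomic, so it can be shared across Rayon's worker threads.
let shared = Arc::new(vec![1, 2, 3]);
let (sum, len) = rayon::join(|| shared.iter().sum::<i32>(), || shared.len());
assert_eq!((sum, len), (6, 3));
```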

Code that uses `Cell` or `RefCell`, however, can be somewhat more complicated.
If you can refactor your code to avoid those types, that is often the best way
forward, but otherwise, you can map them to their parallel equivalents:

- `Cell` -- replacement: `AtomicUsize`, `AtomicBool`, etc
- `RefCell` -- replacement: `RwLock`, or perhaps `Mutex`

However, you have to be wary! The parallel versions of these types have
different atomicity guarantees. For example, with a `Cell`, you can increment a
counter like so:

```rust
let value = counter.get();
counter.set(value + 1);
```

But when you use the equivalent `AtomicUsize` methods, you are actually
introducing a potential race condition (not a data race, technically, but it can
be an awfully fine distinction):

```rust
let value = tscounter.load(Ordering::SeqCst);
tscounter.store(value + 1, Ordering::SeqCst);
```

You can already see that the `AtomicUsize` API is a bit more complex, as it
requires you to specify an
[ordering](https://doc.rust-lang.org/std/sync/atomic/enum.Ordering.html). (I
won't go into the details on ordering here, but suffice to say that if you don't
know what an ordering is, and probably even if you do, you should use
`Ordering::SeqCst`.) The danger in this parallel version of the counter is that
other threads might be running at the same time and they could cause our counter
to get out of sync. For example, if we have two threads, then they might both
execute the "load" before either has a chance to execute the "store":

```
Thread 1                                      Thread 2
let value = tscounter.load(Ordering::SeqCst);
// value = X                                  let value = tscounter.load(Ordering::SeqCst);
                                              // value = X
tscounter.store(value+1);                     tscounter.store(value+1);
// tscounter = X+1                            // tscounter = X+1
```

Now even though we've had two increments, we'll only increase the counter by
one! Even though we've got no data race, this is still probably not the result
we wanted. The problem here is that the `Cell` API doesn't make clear the scope
of a "transaction" -- that is, the set of reads/writes that should occur
atomically. In this case, we probably wanted the get/set to occur together.

In fact, when using the `Atomic` types, you very rarely want a plain `load` or
plain `store`. You probably want the more complex operations. A counter, for
example, would use `fetch_add` to atomically load and increment the value in one
step. Compare-and-swap is another popular building block.
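
For the counter above, that looks like:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

let counter = AtomicUsize::new(0);
// Load-and-increment as one atomic step: no other thread can slip in
// between the read and the write.
let previous = counter.fetch_add(1, Ordering::SeqCst);
assert_eq!(previous, 0);
```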

A similar problem can arise when converting `RefCell` to `RwLock`, but it is
somewhat less likely, because the `RefCell` API does in fact have a notion of a
transaction: the scope of the handle returned by `borrow` or `borrow_mut`. So if
you convert each call to `borrow` to `read` (and `borrow_mut` to `write`),
things will mostly work fine in a parallel setting, but there can still be
changes in behavior. Consider using a `handle: RefCell<Vec<i32>>` like:

```rust
let len = handle.borrow().len();
for i in 0 .. len {
    // A fresh borrow on every iteration.
    let data = handle.borrow()[i];
    println!("{}", data);
}
```

In sequential code, we know that this loop is safe. But if we convert this to
parallel code with an `RwLock`, we do not: this is because another thread could
come along and do `handle.write().unwrap().pop()`, and thus change the length of
the vector. In fact, even in *sequential* code, using very small borrow sections
like this is an anti-pattern: you ought to be enclosing the entire transaction
together, like so:

```rust
let vec = handle.borrow();
// One borrow held for the whole loop.
for data in vec.iter() {
    println!("{}", data);
}
```
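
The same one-guard discipline carries over to the parallel version; here's a
sketch with `RwLock`, where this `handle` is a stand-in mirroring the one
above:

```rust
use std::sync::RwLock;

let handle: RwLock<Vec<i32>> = RwLock::new(vec![1, 2, 3]);
// One read guard held for the whole transaction; writers on other
// threads block until it is dropped, so the length cannot change
// mid-loop.
let vec = handle.read().unwrap();
for data in vec.iter() {
    println!("{}", data);
}
```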

There are several reasons to prefer one borrow over many. The most obvious is
that it is more efficient, since each borrow has to perform some safety checks.
But it's also more reliable: suppose we modified the loop above to not just
print things out, but also call into a helper function:

```rust
let vec = handle.borrow();
// Still one borrow for the whole loop.
for data in vec.iter() {
    helper(...);
}
```

And now suppose, independently, this helper fn evolved and had to pop something
off of the vector:

```rust
fn helper(...) {
    handle.borrow_mut().pop();
}
```

Under the old model, where we did lots of small borrows, this would yield
precisely the same error that we saw in parallel land using an `RwLock`: the
length would be out of sync and our indexing would fail (note that in neither
case would there be an actual *data race* and hence there would never be
undefined behavior). But now that we use a single borrow, we'll see a borrow
error instead, which is much easier to diagnose, since it occurs at the point of
the `borrow_mut`, rather than downstream. Similarly, if we move to an `RwLock`,
we'll find that the code either deadlocks (if the write is on the same thread as
the read) or, if the write is on another thread, works just fine. Both of these
are preferable to random failures in my experience.
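
The deadlock case is easy to sketch: `std`'s `RwLock` documents that taking the
write lock while the same thread still holds a read guard may deadlock or
panic:

```rust
use std::sync::RwLock;

let handle = RwLock::new(vec![1, 2, 3]);
let guard = handle.read().unwrap();
// Taking the write lock here, while `guard` is still alive on this
// thread, may deadlock or panic:
// let mut w = handle.write().unwrap();
drop(guard);
```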

## But wait, isn't Rust supposed to free me from this kind of thinking?

You might think that Rust is supposed to mean that you don't have to think about
atomicity at all. In fact, if you avoid interior mutability (`Cell` and
`RefCell` in a sequential setting, or `AtomicUsize`, `RwLock`, `Mutex`, et al.
in parallel code), then this is true: the type system will basically guarantee
that you don't have to think about atomicity at all. But often there are times
when you WANT threads to interleave in the ways I showed above.

Consider for example when you are conducting a search in parallel, say to find
the shortest route. To avoid fruitless search, you might want to keep a cell
with the shortest route you've found thus far. This way, when you are searching
down some path that's already longer than this shortest route, you can just stop
and avoid wasted effort. In sequential land, you might model this "best result"
as a shared value like `Rc<Cell<usize>>` (here the `usize` represents the
length of the best path found so far); in parallel land, you'd use an
`Arc<AtomicUsize>`.

```rust
fn search(path: &Path, cost_so_far: usize, best_cost: &AtomicUsize) {
    // If this path is already at least as long as the best route found
    // by ANY thread so far, give up on it early.
    if cost_so_far >= best_cost.load(Ordering::SeqCst) {
        return;
    }
    ...
}
```

Now in this case, we really WANT to see results from other threads interjected
into our execution!
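
A final note on updating `best_cost`: a separate `load` and `store` would
reintroduce the counter race from earlier and could lose a shorter route found
by another thread. `fetch_min` does the comparison and store in one atomic
step; a sketch:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Record a newly found route cost. `fetch_min` compares and stores in
// one atomic step, so two threads finishing at once cannot overwrite
// each other's shorter route.
fn record_cost(best_cost: &AtomicUsize, cost: usize) {
    best_cost.fetch_min(cost, Ordering::SeqCst);
}
```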
