Nested Futures Use More Memory Than They Should #709

Open
jestover opened this issue Jan 4, 2024 · 1 comment

jestover commented Jan 4, 2024

I've been running code with nested loops that keeps hitting memory limits, and I have tried to reduce it to a small example that shows the problem. The example takes a random square matrix and builds a list of its columns. You obviously wouldn't use a double loop for this in R, but it is a simple, clear case: with purrr the double loop does not increase memory usage, while with furrr and future.apply memory usage explodes.

library(bench)
library(furrr)
library(future.apply)
library(purrr)

# purrr
single_loop <- function(x, n) {
  map(1:n, ~ x[, .x])
}

# future.apply
single_loop_a <- function(x, n) {
  future_lapply(1:n, FUN = function(i) x[, i])
}

# furrr
single_loop_f <- function(x, n) {
  future_map(1:n, ~ x[, .x])
}

# purrr
inner_loop <- function(i, n, x) {
  map_dbl(1:n, ~ x[.x, i])
}

outer_loop <- function(x, n) {
  map(1:n, ~ inner_loop(.x, n, x = x))
}

# future.apply
inner_loop_a <- function(i, n, x) {
  future_sapply(1:n, FUN = function(j) x[j, i])
}

outer_loop_a <- function(x, n) {
  future_lapply(1:n, FUN = function(i) inner_loop_a(i, n, x))
}

# furrr
inner_loop_f <- function(i, n, x) {
  future_map_dbl(1:n, ~ x[.x, i])
}

outer_loop_f <- function(x, n) {
  future_map(1:n, ~ inner_loop_f(.x, n, x = x))
}

n <- 100
x <- matrix(rnorm(n * n), nrow = n)

identical(single_loop(x, n), single_loop_f(x, n))
identical(single_loop(x, n), single_loop_a(x, n))
identical(single_loop(x, n), outer_loop(x, n))
identical(single_loop(x, n), outer_loop_a(x, n))
identical(single_loop(x, n), outer_loop_f(x, n))
# All return TRUE

plan(sequential)

# With a single loop memory usage is similar
bench::mark(single_loop(x, n))$mem_alloc
# 127KB
bench::mark(single_loop_a(x, n))$mem_alloc
# 243KB
bench::mark(single_loop_f(x, n))$mem_alloc
# 340KB

# With a double loop memory usage remains similar for purrr, but explodes 
# on the other two
bench::mark(outer_loop(x, n))$mem_alloc
# 83.6KB
bench::mark(outer_loop_a(x, n))$mem_alloc
# 11.8MB
bench::mark(outer_loop_f(x, n))$mem_alloc
# 21.1MB

# Try again with a larger matrix
n <- 5000
x <- matrix(rnorm(n * n), nrow = n)

bench::mark(single_loop(x, n))$mem_alloc
# 287MB
bench::mark(single_loop_a(x, n))$mem_alloc
# 287MB
bench::mark(single_loop_f(x, n))$mem_alloc
# 287MB

bench::mark(outer_loop(x, n))$mem_alloc
# 191MB
bench::mark(outer_loop_a(x, n))$mem_alloc
# 2.88GB
bench::mark(outer_loop_f(x, n))$mem_alloc
# 1.57GB

As you can see, the double loop slightly decreases memory usage for purrr, but causes memory usage to explode for furrr and future.apply. I ran this example on a 2023 MacBook, but the code I am actually trying to fix runs on a Linux cluster. I ran this example using furrr and future.apply because yesterday I logged a bug report about nested loops using future.callr and @HenrikBengtsson pointed out that it was only an issue with furrr. Please let me know if there is any additional information I can provide or help I can give in solving this issue, and thanks for the wonderful collection of packages!
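
For a sense of scale, the result itself has a known size: a list of n numeric vectors of length n, at 8 bytes per double. The sketch below uses only base R to estimate that floor. It is an illustration added for context, not part of the benchmark, and `payload_bytes` is a hypothetical helper name; the per-vector overhead that `object.size` reports depends on the R build.

```r
# Hypothetical helper: raw payload of the result,
# n columns of n doubles at 8 bytes each.
payload_bytes <- function(n) 8 * n^2

payload_bytes(100) / 1024     # in KB; compare with 83.6KB for outer_loop
payload_bytes(5000) / 1024^2  # in MB; compare with 191MB for outer_loop

# Measured size of an equivalent list for n = 100, including R's
# per-vector header overhead (platform dependent).
cols <- replicate(100, numeric(100), simplify = FALSE)
as.numeric(object.size(cols))
```

If these figures line up with the purrr numbers, the purrr double loop allocates little beyond the result itself, which would make the extra megabytes in the furrr and future.apply runs overhead rather than output.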

@jestover jestover added the bug label Jan 4, 2024

jestover commented Jan 5, 2024

A little more information. I don't know much about memory profiling, so apologies if this is not the best way to present it, but in the hope that it is helpful:

library(profmem)
library(tidyverse)

n <- 100
x <- matrix(rnorm(n * n), nrow = n)

single_loop(x, n) |>
  profmem() |>
  filter(what == "alloc") |>
  summarise(
    allocs = n(),
    total_bytes = sum(bytes)
  )
#   allocs total_bytes
# 1    201      130448

single_loop_a(x, n) |>
  profmem() |>
  filter(what == "alloc") |>
  summarise(
    allocs = n(),
    total_bytes = sum(bytes)
  )
#   allocs total_bytes
# 1    364      251360

single_loop_f(x, n) |>
  profmem() |>
  filter(what == "alloc") |>
  summarise(
    allocs = n(),
    total_bytes = sum(bytes)
  )
#   allocs total_bytes
# 1    475      353128

outer_loop(x, n) |>
  profmem() |>
  filter(what == "alloc") |>
  summarise(
    allocs = n(),
    total_bytes = sum(bytes)
  )
#   allocs total_bytes
# 1    101       85648

outer_loop_a(x, n) |>
  profmem() |>
  filter(what == "alloc") |>
  summarise(
    allocs = n(),
    total_bytes = sum(bytes)
  )
#   allocs total_bytes
# 1  17101    12623144

outer_loop_f(x, n) |>
  profmem() |>
  filter(what == "alloc") |>
  summarise(
    allocs = n(),
    total_bytes = sum(bytes)
  )
#   allocs total_bytes
# 1  27812    22595240
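
The six pipelines above differ only in the expression being profiled, so the summary step could be factored into one function. The following is a minimal base-R sketch, assuming the profiler output behaves like a data frame with `what` and `bytes` columns (as the `filter()` and `summarise()` calls above suggest). `summarise_allocs` is a hypothetical name, and the toy data frame holds made-up values standing in for real `profmem()` output.

```r
# Hypothetical helper: count allocation events and sum their bytes
# in a profmem-style data frame with columns `what` and `bytes`.
summarise_allocs <- function(prof) {
  allocs <- prof[prof$what == "alloc", , drop = FALSE]
  data.frame(
    allocs = nrow(allocs),
    total_bytes = sum(allocs$bytes, na.rm = TRUE)
  )
}

# Toy input with made-up values; "other" stands for any non-alloc event.
toy <- data.frame(
  what = c("alloc", "alloc", "other"),
  bytes = c(848, 1648, NA)
)
summarise_allocs(toy)
```

With profmem loaded, the intended use would be along the lines of `summarise_allocs(profmem(outer_loop_f(x, n)))`, repeated for each of the six functions.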
