Support for R #44

Open
bobjansen opened this issue Sep 16, 2019 · 3 comments

Comments

@bobjansen

I've tried to use Mesh in combination with R and unfortunately didn't see any gains. As I have no experience, I might be overlooking something. The script in this gist creates and destroys some objects and should run using only base R. Am I missing something? Can I do something to make improvements somewhere?

@emeryberger
Member

Here's what I found out about R memory management. TL;DR: it only calls malloc for objects > 128 bytes (http://adv-r.had.co.nz/memory.html).

...you need to know a little bit about how R requests memory from the operating system. Requesting memory (with malloc()) is a relatively expensive operation. Having to request memory every time a small vector is created would slow R down considerably. Instead, R asks for a big block of memory and then manages that block itself. This block is called the small vector pool and is used for vectors less than 128 bytes long. For efficiency and simplicity, it only allocates vectors that are 8, 16, 32, 48, 64, or 128 bytes long.

I wonder if it's possible to interpose on the small vector pool operations.
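
To make the excerpt concrete, here is a small Python model of the size-class rule it describes. It is only an illustration of the quoted text, not R's actual allocator code; the function name is hypothetical, and treating exactly 128 bytes as still belonging to the pool is an assumption, since the excerpt says both "less than 128 bytes" and lists a 128-byte class.

```python
from typing import Optional

# Size classes as quoted from Advanced R. Treating 128 as inclusive is an assumption.
SMALL_VECTOR_CLASSES = (8, 16, 32, 48, 64, 128)


def small_vector_class(nbytes: int) -> Optional[int]:
    """Return the pool size class for a vector of nbytes bytes,
    or None if the request would fall through to malloc()."""
    if nbytes < 0:
        raise ValueError("size must be non-negative")
    for size_class in SMALL_VECTOR_CLASSES:
        if nbytes <= size_class:
            return size_class
    return None  # larger than every class: handled by malloc()


for n in (4, 40, 100, 128, 129, 8000):
    size_class = small_vector_class(n)
    where = f"pool class {size_class}" if size_class is not None else "malloc()"
    print(f"{n:>5} bytes -> {where}")
```

Under this model, only requests that return None would ever reach an interposed malloc, which is where Mesh could see them.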

@bobjansen
Author

bobjansen commented Sep 17, 2019

If I understand that excerpt and the code I wrote correctly, the code should create data.frames with 1000 rows and 3 columns. I believe the two character columns are stored as pointers, so that would come to 2 x 1000 x 8 + 1 x 1000 x 8 bytes = 24,000 bytes, with each column handled by its own malloc(). Would it be helpful if I 1) write some malloc'ing code using Rcpp and 2) get the addresses of variables from R?
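
The estimate above can be checked with a few lines of arithmetic. This sketch assumes 8 bytes per element for every column (8-byte pointers for the character columns, 8-byte numerics for the third) and ignores any per-vector header; the column names are hypothetical.

```python
ROWS = 1000
BYTES_PER_ELEMENT = 8      # assumption: 8-byte pointers and 8-byte numerics
SMALL_VECTOR_LIMIT = 128   # pool limit quoted from Advanced R

# Hypothetical column names: two character columns and one numeric column.
columns = ["chr_a", "chr_b", "num"]

per_column = ROWS * BYTES_PER_ELEMENT   # bytes per column vector
total = per_column * len(columns)       # bytes per data.frame, headers ignored
uses_malloc = per_column > SMALL_VECTOR_LIMIT

print(f"per column: {per_column} bytes")
print(f"total:      {total} bytes")
print(f"each column goes through malloc(): {uses_malloc}")
```

Each column is far above the 128-byte pool limit, so under these assumptions every column vector would be a separate malloc() request.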

@bpowers
Member

bpowers commented Sep 17, 2019

I confess to not being able to read R code well; however, when I run it under Mesh, what I see is:

Cosmic 01:37:34 {master} [bobby@cosmic-vm mesh]$ MALLOCSTATS=1 LD_PRELOAD=libmesh.so Rscript test.r
Meshed pages HWM:   0
Meshed MB HWM:      0.0
MH Alloc Count:     185
MH Free  Count:     0
MH High Water Mark: 185
Meshed pages HWM:   0
Meshed MB HWM:      0.0
MH Alloc Count:     145
MH Free  Count:     0
MH High Water Mark: 145
Meshed pages HWM:   0
Meshed MB HWM:      0.0
MH Alloc Count:     137
MH Free  Count:     0
MH High Water Mark: 137
Meshed pages HWM:   0
Meshed MB HWM:      0.0
MH Alloc Count:     185
MH Free  Count:     0
MH High Water Mark: 185
Running with pid 7562
6 tables removed
001 Max VmRss:	70600 kb
6 tables removed
002 Max VmRss:	73812 kb
4 tables removed
003 Max VmRss:	77544 kb
Meshed pages HWM:   199
Meshed MB HWM:      0.8
MH Alloc Count:     1882
MH Free  Count:     287
MH High Water Mark: 1785

You can ignore most of the spew at the top (that is Mesh reporting how much memory it used each time you spawn cat to read /proc/self/status).
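
For reference, the memory fields in a /proc status file can be pulled out with a short parser. This is a sketch assuming the standard Linux layout of `Vm*` lines ending in `kB`; the function name is hypothetical and the sample text is made up for illustration, not taken from the run above.

```python
import re
from typing import Dict

# Matches lines such as "VmRSS:\t   12345 kB".
_VM_LINE = re.compile(r"^(Vm\w+):\s+(\d+)\s+kB$")


def parse_vm_fields(status_text: str) -> Dict[str, int]:
    """Return every Vm* field (in kB) found in the text of a /proc/<pid>/status file."""
    fields: Dict[str, int] = {}
    for line in status_text.splitlines():
        match = _VM_LINE.match(line.strip())
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


# Made-up sample in the usual /proc/<pid>/status layout.
sample = "Name:\texample\nVmHWM:\t   23456 kB\nVmRSS:\t   12345 kB\nThreads:\t1\n"
print(parse_vm_fields(sample))
```

On Linux the same function can be applied to the contents of `/proc/self/status`, where `VmHWM` is the peak resident set size and `VmRSS` the current one.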

The high-water mark (HWM) of meshed memory is only about 800 KB. When I turn up the rate at which we mesh by setting MESH_PERIOD_MS=10 (the default is 100 ms), this increases to about 1%.

We should print out better diagnostics here, but my rough guess is that the program either isn't experiencing fragmentation, or at least isn't experiencing it in a way mesh can recover from (e.g. if all pages are 60% full we have rather significant fragmentation but we won't be able to mesh).
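
The 60% example can be illustrated with a toy model. In this sketch a page is a bitmap of occupied slots, and two pages can be meshed only if no slot offset is occupied in both. The slot count, page counts, and function names are assumptions chosen for illustration; this is not Mesh's actual implementation.

```python
import random
from itertools import combinations
from typing import List


def can_mesh(page_a: int, page_b: int) -> bool:
    """Two pages (bitmaps of occupied slots) can be meshed
    only if no slot offset is occupied in both."""
    return page_a & page_b == 0


def random_page(slots: int, live: int, rng: random.Random) -> int:
    """Build a page bitmap with `live` occupied slots at random offsets."""
    bitmap = 0
    for offset in rng.sample(range(slots), live):
        bitmap |= 1 << offset
    return bitmap


def meshable_pairs(pages: List[int]) -> int:
    """Count the pairs of pages whose occupied offsets do not collide."""
    return sum(can_mesh(a, b) for a, b in combinations(pages, 2))


rng = random.Random(0)
SLOTS = 10  # hypothetical number of object slots per page

sparse = [random_page(SLOTS, 2, rng) for _ in range(50)]  # 20% full
dense = [random_page(SLOTS, 6, rng) for _ in range(50)]   # 60% full

print("meshable pairs at 20% occupancy:", meshable_pairs(sparse))
# At 60% occupancy two pages hold 6 + 6 = 12 objects in 10 offsets,
# so at least one offset must collide and no pair can be meshed.
print("meshable pairs at 60% occupancy:", meshable_pairs(dense))
```

In this model, any heap whose pages are all more than half full has no meshable pairs, even though up to 40% of each page is wasted.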
