Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R] Fix memory safety issues #1852

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

david-cortes
Copy link
Contributor

@david-cortes david-cortes commented Sep 17, 2021

This PR fixes a couple of potential memory leaks and use-after-free errors in the R version. In short, lots of these errors come from 3 patterns:

  • Throwing R errors in a catch block, which trigger C long jumps and end up leaking the exception object.
  • Not protecting R objects of dynamic storage, which might get deallocated at the moment they are to be used.
  • Mixing R allocations and C++ allocations in different order, which can cause long jumps if an R allocation fail and leaves C++ objects undestructed if they were allocated before.

Some information about the new functions used here (particularly unwind protection) can be found in the R extensions manual: https://cran.r-project.org/doc/manuals/R-exts.html

IMPORTANT: Getting this PR to compile requires updating the R headers to something more recent, such as version 4.0.0 (see #1851). Current headers under contrib are at 3.5.0 and lack some of the functions used in this PR.

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

@kizill
Copy link
Member

kizill commented Sep 21, 2021

Is it safe to upgrade headers and minimal requirements to 4.0.0 ? Do you know if there are any stats about version distribution in the wild?

@david-cortes
Copy link
Contributor Author

If you mean to ask "is it safe to compile with R headers 4.0.0 and load in a different R version", then no, there's no guarantee that it will work correctly, same as it isn't safe to compile with R headers 3.5.0 and then load in R 4.0.0 like it's currently being done (#1851). AFAIK it's only safe to reuse headers between "patch" versions, like 4.0.0 -> 4.0.1. The R headers are only meant to be stable in the sense that, if including them from an R installation and using the functions they contain, the functions and signatures from the headers will keep working in newer R installations, but the underlying definitions that they use internally might change - for example, they use type SEXP and the functions that return SEXP have remained the same for many versions, but the structure behind a SEXP type AFAIK has changed. Some of the headers are also setup-specific - for example, the visibility header will look different in windows, linux, and mac, and if building R with a non-default compiler it might also look different. The only other libraries that I know of which provide independent binary R builds typically build them for a specific R version and platform (e.g. https://anaconda.org/conda-forge/r-arrow/files).

If you mean to ask "is it common for users to have R version 4.0.0 or greater", then I'd guess it more or less is, since it was released 17 months ago and the current version is 4.1.1. R 4.0 is currently available in debian stable, but not in ubuntu LTS (unless using R's own repository instead of ubuntu's). In CRAN servers, they are not currently testing packages against anything older than R 4.0.4.

I don't know of any reliable source for counting users of R versions, but a check for number of R downloads in the past week from cranlogs shows that roughly 94% of downloads were for R>=4.0.0:

library(cranlogs)
library(data.table)
df = cran_downloads("R")
setDT(df)
df[
    grepl("^\\d+\\.\\d+\\.\\d+.*$", version)
][,
  .(count, tot = sum(count), version = gsub("^(\\d+\\.\\d+)\\.\\d+.*$", "\\1", version))
][,
   .(pct = sum(count) / max(tot))
  , by=.(version)
][
    order(-pct)
    , .(version, pct = sprintf("%.2f%%", 100*pct))
][1:5]
   version    pct
1:     4.1 87.93%
2:     4.0  6.48%
3:     3.6  3.55%
4:     3.5  0.52%
5:     3.3  0.36%

Although I think this PR might compile with headers from R 3.6.0 too, just not with 3.5.0.

@kizill
Copy link
Member

kizill commented Sep 28, 2021

@arcadia-devtools Ship it!

@kizill
Copy link
Member

kizill commented Sep 28, 2021

I'll try to do something with R headers in our repo, thanks. I guess i'll try 3.6 first.

@arcadia-devtools
Copy link
Collaborator

CLA already signed

@arcadia-devtools
Copy link
Collaborator

@kizill, internal review request created: 2049780

stevewinters pushed a commit to stevewinters/catboost that referenced this pull request Aug 18, 2023
stevewinters pushed a commit to stevewinters/catboost that referenced this pull request Aug 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants