AutoGluonLite package for Web JupyterLite/PyScript #2384
Conversation
Job PR-2384-99efe43 is done.
Job PR-2384-15fc1f2 is done.
def _update_num_jobs_in_parallel_with_mem():
    import psutil
    model_estimate_memory_usage = initialized_model.estimate_memory_usage(**kwargs)
    total_memory_available = psutil.virtual_memory().available
Just out of curiosity, why not use available_virtual_mem() directly, instead of applying the disable_if_lite_mode decorator to the whole function?
nice catch.
@yinweisu is it a problem to have wrappers like this for distributed training? I remember some decorators around serializable objects that had to be passed into a job were causing problems when I started the distributed-folds work.
I just tried the above code with Ray and it works fine:

import ray
import psutil

def disable_if_lite_mode(ret=None):
    def inner(func):
        def do_nothing(*args, **kwargs):
            if callable(ret):
                return ret(*args, **kwargs)
            return ret
        return do_nothing
    return inner

@disable_if_lite_mode(ret=1)
def dummy():
    return psutil.cpu_count()

@ray.remote
def test():
    return dummy()

ray.init()
futures = [test.remote() for _ in range(4)]
print(ray.get(futures))

Output:

(cloud) ubuntu@ip-172-31-11-12:~/yinweisu/test_scripts$ python3 temp.py
2022-11-21 20:27:51,964 INFO worker.py:1518 -- Started a local Ray instance.
[1, 1, 1, 1]
@disable_if_lite_mode(ret=1073741824)
def available_virtual_mem():
    import psutil
    return psutil.virtual_memory().available
nit: missing empty line
AUTOGLUON_ROOT_PATH = os.path.abspath(
    os.path.join(os.path.dirname(os.path.abspath(__file__)), '..', '..', '..', '..')
)
PYTHON_REQUIRES = '>=3.7, <3.10'
Why do we expand the Python version range for lite?
I think it's because JupyterLite works with Pyodide, and the latest Pyodide only supports cpython==3.10.2.
@@ -25,6 +27,12 @@
    'tqdm': '>=4.38.0',
    'Pillow': '>=9.0.1,<9.1.0',
    'timm': '>=0.5.4,<0.7.0',
} if not LITE_MODE else {
    'numpy': '>=1.21,<1.23',
Why is this more restrictive? We'll have to update this to 1.22.2+:

-> Vulnerability found in numpy version 1.21.0
Vulnerability ID: 44715
Affected spec: <1.22.2
ADVISORY: Numpy 1.22.2 includes a fix for CVE-2021-41495: a Null Pointer Dereference vulnerability exists in numpy.sort in the PyArray_DescrNew function due to missing return-value validation, which allows attackers to conduct DoS attacks by repetitively creating sort arrays. NOTE: While correct that validation is missing, an error can only occur due to an exhaustion of memory. If the user can...
CVE-2021-41495
For more information, please visit https://pyup.io/v/44715/f17
thanks. I merged from the upstream and didn't notice it was upgraded.
@@ -10,6 +9,7 @@
def list_bucket_s3(bucket):
    import boto3
why is this import required inline?
load_s3 is part of loaders/__init__.py, so some code imports boto3 even though it doesn't use it.
@yzhliu Please rebase with mainline; we updated the usage of psutil to be contained in a ResourceManager object. We could maybe use this to more easily enable the Lite functionality with fewer code modifications.
please take a look again @liangfu @gradientsky @Innixma @yinweisu , thanks!
Job PR-2384-63d689f is done.
Looks good overall. Thanks for the contribution.
Added initial review, looks quite good!
def disable_if_lite_mode(ret=None):
    def inner(func):
        def do_nothing(*args, **kwargs):
            if callable(ret):
                return ret(*args, **kwargs)
            return ret
        metadata = get_autogluon_metadata()
        if metadata['lite']:
            return do_nothing
        return func
    return inner
Have we checked whether this has any meaningful overhead or slowdown to inference throughput when lite mode is disabled, due to having to repeatedly call get_autogluon_metadata()? I'd expect not, but @liangfu might be able to run a quick sanity check on mainline compared to this PR.
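Worth noting: as written, the decorator evaluates get_autogluon_metadata() once at decoration (import) time, not on every call, so steady-state throughput should be unaffected. A small sketch to confirm, with a stubbed metadata helper standing in for the real one:

```python
CALLS = {'n': 0}  # counts how often the (stubbed) metadata helper runs

def get_autogluon_metadata():
    # Stub standing in for the real helper in the PR.
    CALLS['n'] += 1
    return {'lite': False}

def disable_if_lite_mode(ret=None):
    def inner(func):
        def do_nothing(*args, **kwargs):
            return ret(*args, **kwargs) if callable(ret) else ret
        metadata = get_autogluon_metadata()  # runs once, at decoration time
        if metadata['lite']:
            return do_nothing
        return func
    return inner

@disable_if_lite_mode(ret=1)
def work():
    return 2

for _ in range(1000):
    work()

# The metadata helper ran exactly once, when @disable_if_lite_mode was applied.
print(CALLS['n'])  # prints 1
```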
Perhaps a better approach is to just have a dedicated is_lite_mode_enabled() function, so it avoids having to generate all the extra information present in metadata.
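A sketch of that idea (hypothetical function; reading AUTOGLUON_PACKAGE_NAME at runtime is an assumption here, since the PR only shows it as a build-time switch):

```python
import functools
import os

@functools.lru_cache(maxsize=1)
def is_lite_mode_enabled():
    # Hypothetical dedicated predicate: computes the flag once and caches it,
    # avoiding the cost of building the full metadata dict on every check.
    return os.environ.get('AUTOGLUON_PACKAGE_NAME', 'autogluon') == 'autogluon-lite'
```

With lru_cache, repeated checks after the first are a dictionary lookup, so it is safe to call from hot paths.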
            return ret(*args, **kwargs)
        return ret
        metadata = get_autogluon_metadata()
        if metadata['lite']:
For better backward compatibility and general robustness, we should first check if 'lite' in metadata.
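Equivalently, dict.get collapses the membership test and lookup into one call; a minimal sketch (helper name is illustrative only):

```python
def lite_mode_from_metadata(metadata):
    # dict.get keeps older metadata (without a 'lite' key) working:
    # a missing key means "not lite" instead of raising KeyError.
    return metadata.get('lite', False)
```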
] if not ag.LITE_MODE else [
    # version ranges added in ag.get_dependency_version_ranges()
    'numpy',
    'scipy',
    'scikit-learn',
    'pandas',
    'tqdm',
    'matplotlib',

f'{ag.PACKAGE_NAME}.common=={version}',
I don't see us relative importing dask or anything like that in this PR, do you know where dask and distributed are imported in our repo?
I don't see any either.
@@ -161,6 +161,7 @@ def load_multi(path_list, delimiter=',', encoding='utf-8', columns_to_keep_list=
def load_multipart_s3(bucket, prefix, columns_to_keep=None, dtype=None, sample_count=None, filters=None,
                      worker_count=None, multiprocessing_method='forkserver'):
    from .load_s3 import list_bucket_prefix_suffix_s3
Can we remove this line?
good catch
f'{ag.PACKAGE_NAME}.core=={version}',
f'{ag.PACKAGE_NAME}.features=={version}',
Should we do ag.PACKAGE_NAME for the non-lite requirements as well? Seems like a bit of code dupe going on currently.
Good catch. Seems to be a merge issue.
Resolved review threads (outdated):
tabular/src/autogluon/tabular/models/catboost/catboost_model.py (two threads)
tabular/src/autogluon/tabular/models/fasttext/fasttext_model.py
Overall looks great! I added some comments, mostly minor cleanup around consistent usage of ResourceManager.
Job PR-2384-1d82286 is done.
Job PR-2384-30f640e is done.
Job PR-2384-cb14e91 is done.
LGTM. Just blocking to avoid an accidental merge, since we are about to release v0.6.2.
LGTM! Awesome work!
Job PR-2384-69e0498 is done.
export AUTOGLUON_PACKAGE_NAME="autogluon-lite"

to enable the lite-mode build for browsers. Currently 9/12 TabularPrediction tutorials run through, though some of them are not fully functional due to the lack of Python package support (e.g., PyTorch).

Background of code change

JupyterLite and PyScript use WebAssembly/Pyodide to execute Python code in a web browser. However, WebAssembly/Pyodide is not yet able to support all the packages required by AutoGluon, due to the following issues:
Design and implementation
Per the discussion above, there are three types of changes needed:
In addition, for the first POC we might only have the AutoGluon.tabular package and its dependencies (core/common/features) released, as the rest (multimodal/text/vision) will need neural network support, which takes time (see PyTorch support #1625).
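A rough sketch of how the build-time switch described above could look in the setup code (variable names and dependency lists here are illustrative only; the real ones live in this PR's setup files):

```python
import os

# The env var shown in the description selects the package flavor;
# everything derived from it below is a hypothetical sketch.
PACKAGE_NAME = os.environ.get('AUTOGLUON_PACKAGE_NAME', 'autogluon')
LITE_MODE = PACKAGE_NAME == 'autogluon-lite'

# Lite builds restrict themselves to packages Pyodide can load;
# full builds may add native/IO-heavy dependencies such as psutil and boto3.
install_requires = (
    ['numpy', 'scipy', 'scikit-learn', 'pandas', 'tqdm', 'matplotlib']
    if LITE_MODE
    else ['numpy', 'scipy', 'scikit-learn', 'pandas', 'tqdm', 'psutil', 'boto3']
)
```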