Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AutoGluonLite package for Web JupyterLite/PyScript #2384

Merged
merged 23 commits into from
Jan 17, 2023

Conversation

yzhliu
Copy link
Contributor

@yzhliu yzhliu commented Nov 14, 2022

export AUTOGLUON_PACKAGE_NAME="autogluon-lite" to enable the lite-mode build for browsers. Currently 9/12 TabularPrediction tutorials run through, though some of them are not fully functional due to lack of py package support (e.g., pytorch).

Background of code change

JupyterLite and Pyscript are using WebAssembly/Pyodide to execute Python code in a web browser. However, WebAssembly/Pyodide is yet able to support all the packages required by AutoGluon, due to following issues,

  • [Issue 1] Pyodide distribution releases come together with a set of built-in Python packages, e.g., scikit-learn, xgboost, etc. These packages might not use the same version as what AutoGluon requires. However, if we limit the use case for Web, version mismatch usually isn’t a problem.
  • [Issue 2] Though packages written in pure Python can be installed directly via Pyodide, most ML/DL Python toolkits contains C/C++ implementations for best operation performance. These packages have to be manually built ahead-of-time. See https://pyodide.org/en/stable/development/new-packages.html for details.
  • [Issue 3] Some packages are difficult to adopt due to browsers’ poor support of of I/O, network, system-call, etc. For example, psutils, boto3, etc. are not in Pyodide/JupyterLite therefore has to be disabled in AutoGluonLite.

Design and implementation

Per discussion above, there are three types of changes needed,

  • [Change 1] We need to define slightly different package dependencies for AutoGluonLite (in setup.py, etc.), remove unsupported packages and versions. (Issue 1)
  • [Change 2] Have different default settings for model training. For example, by default we should disable CatBoost and Pytorch because they are yet supported in Pyodide. (Issue 2)
  • [Change 3] Disable some functionalities in source code, for example the use of psutil to get memory/computation resources. (Issue 3)

In addition, for first POC we might only have AutoGluon.tabular package and its dependencies (core/common/feature) released as the rest (multimodal/text/vision) will need neural network support which takes time (see Pytorch support#1625)

@github-actions
Copy link

Job PR-2384-99efe43 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2384/99efe43/index.html

@github-actions
Copy link

Job PR-2384-15fc1f2 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2384/15fc1f2/index.html

def _update_num_jobs_in_parallel_with_mem():
import psutil
model_estimate_memory_usage = initialized_model.estimate_memory_usage(**kwargs)
total_memory_available = psutil.virtual_memory().available
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just out of curiosity, why not just use available_virtual_mem() directly, instead of using disable_if_lite_mode decorator for the whole function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yinweisu is it a problem to have wrappers like this for distributed training? I remember some decorators around serializable objects which has to be passed in a job was causing problems when I started do distributed folds work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried the above code with ray and it works fine:

import ray
import psutil

def disable_if_lite_mode(ret=None):
    def inner(func):
        def do_nothing(*args, **kwargs):
            if callable(ret):
                return ret(*args, **kwargs)
            return ret
        return do_nothing
    return inner

@disable_if_lite_mode(ret=1)
def dummy():
    return psutil.cpu_count()

@ray.remote
def test():
    return dummy()
    
ray.init()
futures = [test.remote() for _ in range(4)]
print(ray.get(futures))

OUTPUTS:

(cloud) ubuntu@ip-172-31-11-12:~/yinweisu/test_scripts$ python3 temp.py 
2022-11-21 20:27:51,964 INFO worker.py:1518 -- Started a local Ray instance.
[1, 1, 1, 1]

@disable_if_lite_mode(ret=1073741824)
def available_virtual_mem():
import psutil
return psutil.virtual_memory().available
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: missing empty line


AUTOGLUON_ROOT_PATH = os.path.abspath(
os.path.join(os.path.dirname(os.path.abspath(__file__)), '..', '..', '..', '..')
)

PYTHON_REQUIRES = '>=3.7, <3.10'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we expand python version range for lite?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's because jupyterlite works with pyodide, and latest pyodide only supports cpython==3.10.2 .

@@ -25,6 +27,12 @@
'tqdm': '>=4.38.0',
'Pillow': '>=9.0.1,<9.1.0',
'timm': '>=0.5.4,<0.7.0',
} if not LITE_MODE else {
'numpy': '>=1.21,<1.23',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this is more restrictive? We'll have to update this to 1.22.2+

-> Vulnerability found in numpy version 1.21.0
   Vulnerability ID: 44715
   Affected spec: <1.22.2
   ADVISORY: Numpy 1.22.2  includes a fix for CVE-2021-41495: Null Pointer Dereference vulnerability exists in numpy.sort in NumPy in the PyArray_DescrNew function due to missing return-value validation,
   which allows attackers to conduct DoS attacks by repetitively creating sort arrays. NOTE: While correct that validation is missing, an error can only occur due to an exhaustion of memory. If the user can...
   CVE-2021-41495
   For more information, please visit https://pyup.io/v/44715/f17

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. I merged from the upstream and didn't notice it was upgraded.

def _update_num_jobs_in_parallel_with_mem():
import psutil
model_estimate_memory_usage = initialized_model.estimate_memory_usage(**kwargs)
total_memory_available = psutil.virtual_memory().available
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yinweisu is it a problem to have wrappers like this for distributed training? I remember some decorators around serializable objects which has to be passed in a job was causing problems when I started do distributed folds work.

@@ -10,6 +9,7 @@


def list_bucket_s3(bucket):
import boto3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this import required inline?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

load_s3 is part of loaders/__init__.py, some code thereby import boto3 though they don't use it.

@Innixma
Copy link
Contributor

Innixma commented Nov 23, 2022

@yzhliu Please rebase with mainline, we updated the usage of psutil to be contained in a ResourceManager object, we could maybe use this to more easily enable the Lite functionality with fewer code modifications

Copy link
Contributor Author

@yzhliu yzhliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please take a look again @liangfu @gradientsky @Innixma @yinweisu , thanks!

@@ -10,6 +9,7 @@


def list_bucket_s3(bucket):
import boto3
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

load_s3 is part of loaders/__init__.py, some code thereby import boto3 though they don't use it.

@@ -25,6 +27,12 @@
'tqdm': '>=4.38.0',
'Pillow': '>=9.0.1,<9.1.0',
'timm': '>=0.5.4,<0.7.0',
} if not LITE_MODE else {
'numpy': '>=1.21,<1.23',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. I merged from the upstream and didn't notice it was upgraded.

@github-actions
Copy link

github-actions bot commented Dec 3, 2022

Job PR-2384-63d689f is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2384/63d689f/index.html

Copy link
Collaborator

@liangfu liangfu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. Thanks for the contribution.

Copy link
Contributor

@Innixma Innixma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added initial review, looks quite good!

core/src/autogluon/core/_setup_utils.py Outdated Show resolved Hide resolved
core/src/autogluon/core/_setup_utils.py Outdated Show resolved Hide resolved
core/src/autogluon/core/_setup_utils.py Outdated Show resolved Hide resolved
core/src/autogluon/core/hpo/executors.py Outdated Show resolved Hide resolved
core/src/autogluon/core/models/abstract/abstract_model.py Outdated Show resolved Hide resolved
Comment on lines 4 to 14
def disable_if_lite_mode(ret=None):
def inner(func):
def do_nothing(*args, **kwargs):
if callable(ret):
return ret(*args, **kwargs)
return ret
metadata = get_autogluon_metadata()
if metadata['lite']:
return do_nothing
return func
return inner
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we checked if this has any meaningful overhead / slowdown to inference throughput when lite_mode is disabled due to having to repeatedly call get_autogluon_metadata()? I'd expect not, but @liangfu might be able to run a quick sanity check on mainline compared to this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a better approach is to just have a dedicated is_lite_mode_enabled() function so it avoids having to generate all the extra information present in metadata.

return ret(*args, **kwargs)
return ret
metadata = get_autogluon_metadata()
if metadata['lite']:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for better chances at backward compatibility and general robustness, we should first check if 'lite' in metadata

Comment on lines +37 to +46
] if not ag.LITE_MODE else [
# version ranges added in ag.get_dependency_version_ranges()
'numpy',
'scipy',
'scikit-learn',
'pandas',
'tqdm',
'matplotlib',

f'{ag.PACKAGE_NAME}.common=={version}',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see us relative importing dask or anything like that in this PR, do you know where dask and distributed are imported in our repo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any either.

@@ -161,6 +161,7 @@ def load_multi(path_list, delimiter=',', encoding='utf-8', columns_to_keep_list=

def load_multipart_s3(bucket, prefix, columns_to_keep=None, dtype=None, sample_count=None, filters=None,
worker_count=None, multiprocessing_method='forkserver'):
from .load_s3 import list_bucket_prefix_suffix_s3
Copy link
Contributor

@Innixma Innixma Jan 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can remove this line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

Comment on lines +37 to +38
f'{ag.PACKAGE_NAME}.core=={version}',
f'{ag.PACKAGE_NAME}.features=={version}',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do ag.PACKAGE_NAME for the non-lite requirements as well? Seems like a bit of code dupe going on currently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch. seems to be merging issue.

@Innixma
Copy link
Contributor

Innixma commented Jan 5, 2023

Overall looks great! I added some comments, mostly minor cleanup around consistent usage of ResourceManager.

@Innixma Innixma added this to the 0.7 Release milestone Jan 5, 2023
@github-actions
Copy link

github-actions bot commented Jan 6, 2023

Job PR-2384-1d82286 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2384/1d82286/index.html

@github-actions
Copy link

Job PR-2384-30f640e is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2384/30f640e/index.html

@github-actions
Copy link

Job PR-2384-cb14e91 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2384/cb14e91/index.html

Copy link
Collaborator

@liangfu liangfu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just to block accidental merge, since we are about to release v0.6.2

Copy link
Contributor

@Innixma Innixma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Awesome work!

@github-actions
Copy link

Job PR-2384-69e0498 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2384/69e0498/index.html

@Innixma Innixma merged commit b18eaff into autogluon:master Jan 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants