AutoGluonLite package for Web JupyterLite/PyScript #2384

yzhliu · 2022-11-14T18:57:40Z

Set export AUTOGLUON_PACKAGE_NAME="autogluon-lite" to enable the lite-mode build for browsers. Currently 9 of 12 TabularPrediction tutorials run through, though some of them are not fully functional because of missing Python package support (e.g., PyTorch).

Background of code change

JupyterLite and PyScript use WebAssembly/Pyodide to execute Python code in a web browser. However, WebAssembly/Pyodide is not yet able to support all the packages required by AutoGluon, due to the following issues:

[Issue 1] Pyodide distribution releases ship with a set of built-in Python packages, e.g., scikit-learn, xgboost, etc. These packages might not use the same versions that AutoGluon requires. However, if we limit the use case to the Web, version mismatch usually isn’t a problem.
[Issue 2] Though packages written in pure Python can be installed directly via Pyodide, most ML/DL Python toolkits contain C/C++ implementations for best performance. These packages have to be manually built ahead of time. See https://pyodide.org/en/stable/development/new-packages.html for details.
[Issue 3] Some packages are difficult to adopt due to browsers’ poor support for I/O, networking, system calls, etc. For example, psutil, boto3, etc. are not available in Pyodide/JupyterLite and therefore have to be disabled in AutoGluonLite.
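
As a rough illustration of how code could tell the two environments apart, Pyodide reports its platform string as "emscripten". The helper name running_in_browser below is hypothetical and not part of this PR; this is only a sketch of one possible check.

```python
import sys


def running_in_browser() -> bool:
    """Return True when the interpreter is a WebAssembly/Emscripten build.

    Hypothetical helper, not part of AutoGluon. Pyodide sets sys.platform
    to "emscripten"; regular CPython builds report values such as
    "linux", "darwin", or "win32".
    """
    return sys.platform == "emscripten"


if running_in_browser():
    print("Pyodide detected: packages such as psutil and boto3 are unavailable")
else:
    print("Native CPython: the full dependency set can be used")
```

A check like this runs at import time and costs one string comparison, so it could gate optional imports without affecting native installs.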

Design and implementation

Per the discussion above, three types of changes are needed:

[Change 1] We need to define slightly different package dependencies for AutoGluonLite (in setup.py, etc.) and remove unsupported packages and versions. (Issue 1)
[Change 2] Use different default settings for model training. For example, by default we should disable CatBoost and PyTorch because they are not yet supported in Pyodide. (Issue 2)
[Change 3] Disable some functionality in the source code, for example the use of psutil to get memory/computation resources. (Issue 3)

In addition, for the first POC we might release only the autogluon.tabular package and its dependencies (core/common/features), as the rest (multimodal/text/vision) need neural network support, which takes time (see Pytorch support #1625).
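
Change 1 can be sketched roughly as follows. The dependency names and the way LITE_MODE is derived from the environment variable are assumptions made for illustration; the actual setup code in this PR may differ.

```python
import os

# Assumption: lite mode is inferred from the package-name variable mentioned
# in the PR description; the real build may derive the flag differently.
PACKAGE_NAME = os.getenv("AUTOGLUON_PACKAGE_NAME", "autogluon")
LITE_MODE = PACKAGE_NAME == "autogluon-lite"

# Illustrative data only; not the actual AutoGluon dependency list.
ALL_DEPENDENCIES = ["numpy", "pandas", "scikit-learn", "psutil", "boto3"]
UNSUPPORTED_IN_BROWSER = {"psutil", "boto3"}


def select_dependencies(lite_mode: bool) -> list:
    """Drop packages that Pyodide cannot provide when building the lite package."""
    if not lite_mode:
        return list(ALL_DEPENDENCIES)
    return [dep for dep in ALL_DEPENDENCIES if dep not in UNSUPPORTED_IN_BROWSER]


print(select_dependencies(LITE_MODE))
```

Keeping the unsupported set in one place means a package only needs to be removed from it once Pyodide gains support.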

github-actions · 2022-11-15T00:07:31Z

Job PR-2384-99efe43 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2384/99efe43/index.html

github-actions · 2022-11-15T00:09:26Z

Job PR-2384-15fc1f2 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2384/15fc1f2/index.html

liangfu · 2022-11-17T17:26:17Z

core/src/autogluon/core/hpo/executors.py

+                def _update_num_jobs_in_parallel_with_mem():
+                    import psutil
+                    model_estimate_memory_usage = initialized_model.estimate_memory_usage(**kwargs)
+                    total_memory_available = psutil.virtual_memory().available


Just out of curiosity, why not just use available_virtual_mem() directly, instead of using disable_if_lite_mode decorator for the whole function?

nice catch.

@yinweisu is it a problem to have wrappers like this for distributed training? I remember that some decorators around serializable objects that had to be passed to a job were causing problems when I started the distributed folds work.

I just tried the above code with ray and it works fine:

import ray
import psutil


def disable_if_lite_mode(ret=None):
    def inner(func):
        def do_nothing(*args, **kwargs):
            if callable(ret):
                return ret(*args, **kwargs)
            return ret
        return do_nothing
    return inner


@disable_if_lite_mode(ret=1)
def dummy():
    return psutil.cpu_count()


@ray.remote
def test():
    return dummy()


ray.init()
futures = [test.remote() for _ in range(4)]
print(ray.get(futures))

OUTPUTS:

(cloud) ubuntu@ip-172-31-11-12:~/yinweisu/test_scripts$ python3 temp.py
2022-11-21 20:27:51,964 INFO worker.py:1518 -- Started a local Ray instance.
[1, 1, 1, 1]

gradientsky · 2022-11-21T18:51:00Z

common/src/autogluon/common/utils/ps_utils.py

+@disable_if_lite_mode(ret=1073741824)
+def available_virtual_mem():
+    import psutil
+    return psutil.virtual_memory().available


nit: missing empty line

gradientsky · 2022-11-21T18:57:05Z

core/src/autogluon/core/_setup_utils.py


 AUTOGLUON_ROOT_PATH = os.path.abspath(
    os.path.join(os.path.dirname(os.path.abspath(__file__)), '..', '..', '..', '..')
 )

-PYTHON_REQUIRES = '>=3.7, <3.10'


why do we expand python version range for lite?

I think it's because JupyterLite works with Pyodide, and the latest Pyodide only supports cpython==3.10.2.

gradientsky · 2022-11-21T19:18:51Z

core/src/autogluon/core/_setup_utils.py

@@ -25,6 +27,12 @@
    'tqdm': '>=4.38.0',
    'Pillow': '>=9.0.1,<9.1.0',
    'timm': '>=0.5.4,<0.7.0',
+} if not LITE_MODE else {
+    'numpy': '>=1.21,<1.23',


Why is this more restrictive? We'll have to update this to 1.22.2+

-> Vulnerability found in numpy version 1.21.0
   Vulnerability ID: 44715
   Affected spec: <1.22.2
   ADVISORY: Numpy 1.22.2 includes a fix for CVE-2021-41495: Null Pointer Dereference vulnerability exists in numpy.sort in NumPy in the PyArray_DescrNew function due to missing return-value validation, which allows attackers to conduct DoS attacks by repetitively creating sort arrays. NOTE: While correct that validation is missing, an error can only occur due to an exhaustion of memory. If the user can...
   CVE-2021-41495
   For more information, please visit https://pyup.io/v/44715/f17

thanks. I merged from the upstream and didn't notice it was upgraded.
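
To make the concern concrete: the lower bound >=1.21 still admits releases inside the advisory's affected range (<1.22.2). The sketch below uses a simple tuple comparison and assumes purely numeric version strings; the function names are hypothetical, and a real check would more likely rely on a dedicated version-parsing library.

```python
def parse_version(version: str) -> tuple:
    """Turn '1.21.0' into (1, 21, 0). Assumes purely numeric components."""
    return tuple(int(part) for part in version.split("."))


FIXED_IN = parse_version("1.22.2")  # first release with the fix, per the advisory


def is_affected(version: str) -> bool:
    """True when the version falls in the advisory's '<1.22.2' range."""
    return parse_version(version) < FIXED_IN


for candidate in ("1.21.0", "1.22.1", "1.22.2"):
    print(candidate, is_affected(candidate))
```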

gradientsky · 2022-11-21T19:43:36Z

core/src/autogluon/core/hpo/executors.py

+                def _update_num_jobs_in_parallel_with_mem():
+                    import psutil
+                    model_estimate_memory_usage = initialized_model.estimate_memory_usage(**kwargs)
+                    total_memory_available = psutil.virtual_memory().available


@yinweisu is it a problem to have wrappers like this for distributed training? I remember that some decorators around serializable objects that had to be passed to a job were causing problems when I started the distributed folds work.

gradientsky · 2022-11-21T19:48:20Z

common/src/autogluon/common/loaders/load_s3.py

@@ -10,6 +9,7 @@


 def list_bucket_s3(bucket):
+    import boto3


why is this import required inline?

load_s3 is imported in loaders/__init__.py, so some code ends up importing boto3 even though it doesn't use it.
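
The effect of moving the import inside the function can be sketched like this. The helper name list_object_keys is hypothetical and is not the function changed in this PR; the point is only that defining the function never touches boto3.

```python
def list_object_keys(bucket: str) -> list:
    """Hypothetical helper: list the keys stored in an S3 bucket."""
    # Deferred import: boto3 is only needed when this function is called,
    # so importing the enclosing module works where boto3 is not installed.
    import boto3

    s3 = boto3.resource("s3")
    return [obj.key for obj in s3.Bucket(bucket).objects.all()]


# Defining the function does not trigger the import.
print(callable(list_object_keys))
```

The trade-off is that a missing boto3 surfaces as an ImportError at call time rather than at module import, which is the desired behaviour in a browser where S3 access is not expected to work.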

Innixma · 2022-11-23T19:48:13Z

@yzhliu Please rebase on mainline. We updated the usage of psutil to be contained in a ResourceManager object, which we could maybe use to enable the Lite functionality more easily and with fewer code modifications.
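
A minimal sketch of that idea, assuming a single class owns all resource probing. ResourceProbe, its lite_mode flag, and the single-core fallback are hypothetical and do not reflect the actual ResourceManager API; the memory fallback reuses the 1 GiB value (1073741824 bytes) that appears as ret=1073741824 earlier in this thread.

```python
import os


class ResourceProbe:
    """Hypothetical stand-in for a ResourceManager-style class.

    Not AutoGluon's actual API; names and fallback values are assumptions.
    """

    lite_mode = False  # would be set from build metadata in practice

    @classmethod
    def get_cpu_count(cls) -> int:
        if cls.lite_mode:
            return 1  # assumed fallback: treat the browser as single-core
        return os.cpu_count() or 1

    @classmethod
    def get_available_virtual_mem(cls) -> int:
        if cls.lite_mode:
            return 1024 ** 3  # 1 GiB, i.e. 1073741824 bytes
        import psutil  # only imported on the native path

        return psutil.virtual_memory().available


ResourceProbe.lite_mode = True
print(ResourceProbe.get_cpu_count(), ResourceProbe.get_available_virtual_mem())
```

With the probing in one class, callers never import psutil themselves, so lite mode needs a guard in one place instead of a decorator at every call site.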

yzhliu

Please take a look again, @liangfu @gradientsky @Innixma @yinweisu. Thanks!

yzhliu · 2022-11-28T19:05:19Z

common/src/autogluon/common/loaders/load_s3.py

@@ -10,6 +9,7 @@


 def list_bucket_s3(bucket):
+    import boto3


load_s3 is imported in loaders/__init__.py, so some code ends up importing boto3 even though it doesn't use it.

yzhliu · 2022-11-28T19:08:18Z

core/src/autogluon/core/_setup_utils.py

@@ -25,6 +27,12 @@
    'tqdm': '>=4.38.0',
    'Pillow': '>=9.0.1,<9.1.0',
    'timm': '>=0.5.4,<0.7.0',
+} if not LITE_MODE else {
+    'numpy': '>=1.21,<1.23',


thanks. I merged from the upstream and didn't notice it was upgraded.

github-actions · 2022-12-03T02:10:12Z

Job PR-2384-63d689f is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2384/63d689f/index.html

liangfu

Looks good overall. Thanks for the contribution.

Innixma

Added initial review, looks quite good!

core/src/autogluon/core/_setup_utils.py

core/src/autogluon/core/hpo/executors.py

core/src/autogluon/core/models/abstract/abstract_model.py

tabular/src/autogluon/tabular/models/xgboost/xgboost_model.py

Innixma · 2022-12-14T20:05:00Z

common/src/autogluon/common/utils/lite.py

+def disable_if_lite_mode(ret=None):
+    def inner(func):
+        def do_nothing(*args, **kwargs):
+            if callable(ret):
+                return ret(*args, **kwargs)
+            return ret
+        metadata = get_autogluon_metadata()
+        if metadata['lite']:
+            return do_nothing
+        return func
+    return inner


Have we checked if this has any meaningful overhead / slowdown to inference throughput when lite_mode is disabled due to having to repeatedly call get_autogluon_metadata()? I'd expect not, but @liangfu might be able to run a quick sanity check on mainline compared to this PR.

Perhaps a better approach is to just have a dedicated is_lite_mode_enabled() function so it avoids having to generate all the extra information present in metadata.
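
For what it's worth, in the snippet as posted the metadata lookup sits inside inner, so it appears to run once per decorated function when the decorator is applied, not on every call; with lite mode off, the original function is returned unwrapped. The sketch below checks this with a call counter. get_metadata is a hypothetical stand-in for get_autogluon_metadata().

```python
call_log = []  # one entry per metadata lookup


def get_metadata() -> dict:
    """Hypothetical stand-in for get_autogluon_metadata(); records its calls."""
    call_log.append(1)
    return {"lite": False}


def disable_if_lite_mode(ret=None):
    def inner(func):
        def do_nothing(*args, **kwargs):
            if callable(ret):
                return ret(*args, **kwargs)
            return ret

        # Runs once, when the decorator is applied to func.
        if get_metadata()["lite"]:
            return do_nothing
        return func

    return inner


@disable_if_lite_mode(ret=1)
def cpu_count() -> int:
    return 8  # placeholder for a real probe


results = [cpu_count() for _ in range(1000)]
print(len(call_log))  # 1: the lookup happened at decoration time only
```

The remaining cost is one metadata lookup per decorated function at import time, which is where a lighter is_lite_mode_enabled() helper would help.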

Innixma · 2022-12-14T20:06:03Z

common/src/autogluon/common/utils/lite.py

+                return ret(*args, **kwargs)
+            return ret
+        metadata = get_autogluon_metadata()
+        if metadata['lite']:


for better chances at backward compatibility and general robustness, we should first check if 'lite' in metadata
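
One way to express that check is dict.get with a default, so metadata written before the flag existed is treated as non-lite. The function name is_lite and the sample dictionaries are hypothetical.

```python
def is_lite(metadata: dict) -> bool:
    """Treat a missing 'lite' key as lite mode being off."""
    return bool(metadata.get("lite", False))


old_metadata = {"version": "x.y.z"}  # hypothetical: written before the flag existed
new_metadata = {"version": "x.y.z", "lite": True}  # hypothetical

print(is_lite(old_metadata), is_lite(new_metadata))
```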

Innixma · 2022-12-14T20:10:14Z

core/setup.py

+] if not ag.LITE_MODE else [
+    # version ranges added in ag.get_dependency_version_ranges()
+    'numpy',
+    'scipy',
+    'scikit-learn',
+    'pandas',
+    'tqdm',
+    'matplotlib',
+
+    f'{ag.PACKAGE_NAME}.common=={version}',


I don't see us relative importing dask or anything like that in this PR, do you know where dask and distributed are imported in our repo?

I don't see any either.

Innixma · 2023-01-05T23:46:58Z

common/src/autogluon/common/loaders/load_pd.py

@@ -161,6 +161,7 @@ def load_multi(path_list, delimiter=',', encoding='utf-8', columns_to_keep_list=

 def load_multipart_s3(bucket, prefix, columns_to_keep=None, dtype=None, sample_count=None, filters=None,
                      worker_count=None, multiprocessing_method='forkserver'):
+    from .load_s3 import list_bucket_prefix_suffix_s3


Can we remove this line?

Innixma · 2023-01-05T23:48:59Z

tabular/setup.py

+    f'{ag.PACKAGE_NAME}.core=={version}',
+    f'{ag.PACKAGE_NAME}.features=={version}',


Should we do ag.PACKAGE_NAME for the non-lite requirements as well? Seems like a bit of code dupe going on currently.

Good catch. Seems to be a merge issue.
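
A sketch of how one helper could serve both the lite and non-lite requirement lists and avoid the duplication. The function name, version string, and submodule list are placeholders, not the code in tabular/setup.py.

```python
def submodule_requirements(package_name: str, version: str, submodules: list) -> list:
    """Build pinned requirement strings such as '<package>.core==<version>'."""
    return [f"{package_name}.{name}=={version}" for name in submodules]


# Placeholder version and submodule list, for illustration only.
for package_name in ("autogluon", "autogluon-lite"):
    print(submodule_requirements(package_name, "0.0.0", ["core", "features"]))
```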

tabular/src/autogluon/tabular/models/catboost/catboost_model.py

tabular/src/autogluon/tabular/models/fasttext/fasttext_model.py

tabular/src/autogluon/tabular/models/lgb/callbacks.py

tabular/src/autogluon/tabular/models/xgboost/callbacks.py

Innixma · 2023-01-05T23:57:11Z

Overall looks great! I added some comments, mostly minor cleanup around consistent usage of ResourceManager.

github-actions · 2023-01-06T01:05:57Z

Job PR-2384-1d82286 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2384/1d82286/index.html

github-actions · 2023-01-10T20:11:46Z

Job PR-2384-30f640e is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2384/30f640e/index.html

github-actions · 2023-01-10T22:16:12Z

Job PR-2384-cb14e91 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2384/cb14e91/index.html

liangfu

LGTM. Just to block accidental merge, since we are about to release v0.6.2

Innixma

LGTM! Awesome work!

github-actions · 2023-01-17T18:53:01Z

Job PR-2384-69e0498 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2384/69e0498/index.html

yzhliu added 8 commits October 14, 2022 10:50

aglite-test installs in JupyterLite

51d0a2c

9/12 notebook in tabular_prediction works

c9c255f

Merge remote-tracking branch 'origin/master' into lite

a7c178b

lite.py in common/utils

6603131

move xgboost to extras_require

6016c3f

Merge remote-tracking branch 'origin/master' into lite

4601888

remove wheel, disable psutil in hpo

99efe43

Merge remote-tracking branch 'origin/master' into lite

15fc1f2

liangfu reviewed Nov 17, 2022

Innixma requested review from Innixma and gradientsky November 17, 2022 22:48

gradientsky reviewed Nov 21, 2022

yzhliu added 2 commits November 28, 2022 09:54

Merge remote-tracking branch 'origin/master' into lite

b66fdf2

remove ps_utils and use ResourceManager

14800e4

yzhliu commented Nov 28, 2022

yzhliu added 5 commits November 29, 2022 11:19

remove disable_if_lite_mode for _get_gpu_count_cuda

b1ac540

Merge remote-tracking branch 'origin/master' into lite

f7a59f1

fix __lite__ mark

1fc4efb

Merge remote-tracking branch 'origin/master' into lite

9d2f603

Merge remote-tracking branch 'origin/master' into lite

63d689f

liangfu approved these changes Dec 5, 2022

Innixma reviewed Dec 14, 2022

yzhliu added 3 commits December 16, 2022 16:52

use ResourceManager and avoid lazy import psutil

3921559

Merge remote-tracking branch 'origin/master' into lite

a0d807d

Merge remote-tracking branch 'origin/master' into lite

6532614

ResourceManager refactor for lite mode

1d82286