
Size limitation for the PutObject operation #1469

Open
sergii-mamedov opened this issue Feb 1, 2024 · 0 comments
Assignees
Labels
bug Something isn't working

Comments

@sergii-mamedov
Contributor

After migrating to AWS, an exception is raised for large datasets when saving the files related to the imzML browser. The cause is an AWS S3 upload limit that differs from the one on IBM Cloud Object Storage.

From AWS documentation:

Depending on the size of the data that you're uploading, Amazon S3 offers the following options:

Upload an object in a single operation by using the AWS SDKs, REST API, or AWS CLI – With a single PUT operation, you can upload a single object up to 5 GB in size.

Upload an object in parts by using the AWS SDKs, REST API, or AWS CLI – Using the multipart upload API operation, you can upload a single large object, up to 5 TB in size.

This part needs to be rewritten to use multipart upload; we already have a similar implementation in the python-client.
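For illustration only, a multipart-capable upload with boto3's managed transfer could look roughly like the sketch below. The real fix has to go through the storage layer used by the Lithops executor in the traceback, and the python-client implementation may differ; the bucket/key arguments, the helper name and the size thresholds here are assumptions, not existing code.

```python
# Sketch: replace a plain PutObject call with boto3's managed transfer,
# which switches to the multipart upload API for large payloads.
import io

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Above multipart_threshold the transfer manager splits the payload into
# multipart_chunksize parts, so the 5 GB single-PUT limit no longer applies
# (multipart uploads allow objects up to 5 TB / 10,000 parts).
config = TransferConfig(
    multipart_threshold=64 * 1024 ** 2,  # switch to multipart above 64 MiB
    multipart_chunksize=64 * 1024 ** 2,  # 64 MiB per part
    max_concurrency=8,
    use_threads=True,
)


def put_object_multipart(bucket: str, key: str, data: bytes) -> None:
    """Upload `data` to s3://bucket/key without hitting the PutObject size limit."""
    s3.upload_fileobj(io.BytesIO(data), bucket, key, Config=config)
```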

Traceback (most recent call last):
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/daemons/lithops.py", line 91, in _callback
    self._manager.annotate_lithops(
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/daemons/dataset_manager.py", line 116, in annotate_lithops
    ServerAnnotationJob(executor, ds, perf, perform_enrichment=perform_enrichment).run()
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/annotation_lithops/annotation_job.py", line 347, in run
    self.results_dfs, self.png_cobjs, self.enrichment_data = self.pipe.run_pipeline(
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/annotation_lithops/pipeline.py", line 104, in run_pipeline
    self.load_ds(use_cache=use_cache)
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/annotation_lithops/cache.py", line 81, in wrapper
    return f(self, *args, **kwargs)
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/annotation_lithops/pipeline.py", line 139, in load_ds
    ) = load_ds(
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/annotation_lithops/load_ds.py", line 216, in load_ds
    (imzml_reader, ds_segments_bounds, ds_segms_cobjs, ds_segm_lens,) = executor.call(
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/annotation_lithops/executor.py", line 390, in call
    return self.map(
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/annotation_lithops/executor.py", line 295, in map
    raise exc
  File "/opt/dev/metaspace/metaspace/engine/sm/engine/annotation_lithops/executor.py", line 331, in run
    return_vals = executor.get_result(futures)
botocore.exceptions.ClientError: An error occurred (EntityTooLarge) when calling the PutObject operation: Your proposed upload exceeds the maximum allowed size

Related issues:

  1. AWS S3 PutObject limitation lithops-cloud/lithops#1239
@sergii-mamedov sergii-mamedov added the bug Something isn't working label Feb 1, 2024
@sergii-mamedov sergii-mamedov self-assigned this Feb 1, 2024
sergii-mamedov added a commit that referenced this issue Feb 1, 2024
sergii-mamedov added a commit that referenced this issue Feb 8, 2024
* Previous implementation

* Creating docker image for AWS Lambda

* Temporary code

* Update boto3 and swap Process to Thread PoolExecutor

* Added EC2 support

* Add AWS EC2

* Change code related to imzml browser files

* Change executor class

* Fix test

* Fix some issues

* Executor changes

* Removed all the excess from load_ds

* Remove legacy code

* Fixed the test

* Fixed a bug

* Lithops 3.1.0

* Added the ability to run different EC2 instances depending on the required amount of RAM

* Remove everything related to md5 hash and file size from load_ds file

* Move calc_hash and file size to the pipeline

* Update config file template

* Change vars template file

* Fixed an error in the variable name

* Added the missing quote.

* Removed extra comma

* Added the missing comma

* Added the missing comma

* Removed extra comma

* Fix bug in f-string

* Change parallelism to avoid S3's upload limit

* Temporary fix to avoid #1469

* Some changes in calc_save_ds_size_hash function

* Double the amount of RAM for the colocalization step (temporary)

* Move save_size_hash() as last call in annotate_lithops function