S3 support #326

Open: 36 commits to merge into base branch master

Conversation

gmaze (Member) commented Jan 19, 2024

The Argo ADMT is experimenting with Amazon S3 in order to move the GDAC infrastructure to the cloud.
To prepare argopy for this, and to be able to access and test the AWS prototype server, we need to add support for S3.
This requires:

  • A new file store supporting S3 through fsspec (based on s3fs)
  • An update of the Index store to support S3

A new data fetcher will be developed in another PR.
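
As a rough illustration of the first item, the sketch below opens an S3 filesystem through fsspec (which delegates to s3fs) and reads the first lines of a text file. It is not the argopy file store: the helper names `open_s3_filesystem` and `read_first_lines` are hypothetical, and anonymous (unsigned) access to the bucket is an assumption.

```python
from itertools import islice


def open_s3_filesystem(anon=True):
    """Return an fsspec filesystem object for S3.

    Hypothetical helper. Requires the optional s3fs package; fsspec is
    imported lazily so this module still loads without it.
    anon=True asks s3fs for unsigned requests (assumed to be allowed).
    """
    import fsspec

    return fsspec.filesystem("s3", anon=anon)


def read_first_lines(fs, path, n=5):
    """Read the first n lines of a text file from any fsspec-like filesystem."""
    with fs.open(path, "rt") as f:
        # islice stops after n lines, so large files are not read in full
        return [line.rstrip("\n") for line in islice(f, n)]
```

With s3fs installed, `read_first_lines(open_s3_filesystem(), "<bucket>/<path to a text file>")` would return the header lines of that file. Because `read_first_lines` only relies on `fs.open`, the same function works with a local or in-memory fsspec filesystem in tests.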

@gmaze gmaze self-assigned this Apr 15, 2024
@gmaze gmaze added the enhancement, backends and performance labels Apr 15, 2024
gmaze (Member, Author) commented Apr 22, 2024

@tcarval is there any reason for not having the gz index files on S3?
https://argo-gdac-sandbox.s3.eu-west-3.amazonaws.com/pub/index.html#pub/idx/

tcarval commented May 6, 2024

> @tcarval is there any reason for not having the gz index files on S3?
> https://argo-gdac-sandbox.s3.eu-west-3.amazonaws.com/pub/index.html#pub/idx/

I am adding the gz indexes (the GDAC to AWS synchronization is underway).
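
For readers wondering why the gz files matter to a client, here is a minimal sketch of how code could prefer the compressed index when the server offers it and fall back to the plain text file otherwise. The function names are hypothetical, and the default file name `ar_index_global_prof.txt` is an assumption about the bucket layout; this is not the argopy implementation.

```python
import gzip


def pick_index_file(available, base="ar_index_global_prof.txt"):
    """Return the gz variant of the index if listed, else the plain file.

    `available` is any collection of file names found on the server.
    The default `base` name is an assumption made for illustration.
    """
    gz_name = base + ".gz"
    if gz_name in available:
        return gz_name
    if base in available:
        return base
    raise FileNotFoundError(f"neither {gz_name} nor {base} is available")


def decode_index(name, payload):
    """Turn downloaded bytes into text, decompressing gz payloads first."""
    data = gzip.decompress(payload) if name.endswith(".gz") else payload
    return data.decode("utf-8")
```

The compressed file is smaller to transfer, so preferring it when present reduces download time without changing the parsed content.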

gmaze (Member, Author) commented May 17, 2024

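The new IndexStore is ready to work with the core index file on AWS S3:

from argopy import ArgoIndex

# Load the core index from the S3 sandbox bucket
idx = ArgoIndex(host='s3://argo-gdac-sandbox/pub/idx').load()
# Search for float WMO 6903091, cycle number 1
idx.search_wmo_cyc(6903091, 1)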
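
The `host` value above is an S3 URI, while the sandbox listing linked earlier in this thread uses an HTTPS address. The sketch below shows how the two forms relate, using only the standard library: it splits an S3 URI into bucket and key prefix and builds the virtual-hosted-style HTTPS URL. The function names are hypothetical, and the region `eu-west-3` is taken from the sandbox URL quoted above.

```python
from urllib.parse import urlsplit


def split_s3_uri(uri):
    """Split 's3://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    parts = urlsplit(uri)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parts.netloc, parts.path.strip("/")


def s3_to_https(uri, region):
    """Build the virtual-hosted-style HTTPS URL for an S3 URI."""
    bucket, key = split_s3_uri(uri)
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"


print(s3_to_https("s3://argo-gdac-sandbox/pub/idx", "eu-west-3"))
```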

poke @tcarval

@gmaze gmaze requested review from quai20 May 17, 2024 09:56
gmaze added 15 commits May 17, 2024 15:02
… on s3

- refactor argo_index_pa and argo_index_pd to use argostore super init
- index_path now a dynamic property
- index_path set to use gz file when found on server at instantiation
- index property not set when used with s3 store
- new search_s3 decorator for some search methods (to be used with s3)
- more Path usage
- minor reformatting
- fix bug in s3index to return appropriate empty pyarrow table if no SQL response is found
- add keywords/shortcuts for hosts
- add a decorator to fix errors raised when pyarrow is not available
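
The last item mentions a decorator guarding against a missing pyarrow. The commit does not show how it is written, so the following is only one possible shape, under the assumption that the goal is to raise a clear error when the optional dependency is absent. The name `requires_module` is hypothetical.

```python
import functools
import importlib.util


def requires_module(name):
    """Decorator factory: fail with a clear message if `name` is not installed."""

    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            # find_spec returns None when a top-level module cannot be found
            if importlib.util.find_spec(name) is None:
                raise ImportError(
                    f"'{func.__name__}' needs the optional dependency '{name}'"
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@requires_module("pyarrow")
def search_with_pyarrow():
    """Placeholder for a method that would use pyarrow internally."""
    return "pyarrow is available"
```

Checking at call time rather than at import time means the rest of the module stays usable when the optional dependency is not installed.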
fix bug for unknown AWS credentials with boto3 client
2 participants