[Umbrella] Object Storage Support (Help Wanted) #1030

jerqi · 2023-07-23T07:48:27Z

Code of Conduct

I agree to follow this project's Code of Conduct

Search before asking

I have searched in the issues and found no similar issues.

Describe the proposal

Currently, remote shuffle storage only supports Hadoop-compatible filesystems. Object storage is also important and widely used in big data systems, but object storage implementations differ. Some systems don't support the list operation, or implement it with poor performance. Some systems need file names chosen carefully so that objects are spread across more buckets. Some object stores don't support append. Currently, we store index and data separately, which will produce many small index files if append is not supported, so we should consider merging index files and data files. To get better performance, our object storage support should be able to adapt to different object storage systems.
https://docs.google.com/document/d/1E88wZA9Yhr-pGeUEfxo6uSgsIXxg_ivPYBNcTOeaaZA/edit
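
To make the "merge index files and data files" idea concrete, here is a minimal sketch of one possible layout: data blocks first, then the index entries, then a small footer that says where the index starts. The layout, field widths, and the names `write_merged`, `read_index`, and `read_block` are assumptions for illustration only, not the format proposed in #892.

```python
import io
import struct

# Hypothetical layout (an assumption, not Uniffle's actual format):
#   [block bytes ...][index entries ...][footer]
# index entry: block_id (int64), offset (int64), length (int32)
# footer:      index_offset (int64), entry_count (int32)
ENTRY = struct.Struct(">qqi")
FOOTER = struct.Struct(">qi")


def write_merged(blocks):
    """Serialize {block_id: payload} into one object with a trailing index."""
    out = io.BytesIO()
    entries = []
    for block_id, payload in blocks.items():
        entries.append((block_id, out.tell(), len(payload)))
        out.write(payload)
    index_offset = out.tell()
    for entry in entries:
        out.write(ENTRY.pack(*entry))
    out.write(FOOTER.pack(index_offset, len(entries)))
    return out.getvalue()


def read_index(obj):
    """Parse the footer, then the index entries it points to."""
    index_offset, count = FOOTER.unpack_from(obj, len(obj) - FOOTER.size)
    return [
        ENTRY.unpack_from(obj, index_offset + i * ENTRY.size)
        for i in range(count)
    ]


def read_block(obj, block_id):
    """Return the payload of one block using the embedded index."""
    for bid, offset, length in read_index(obj):
        if bid == block_id:
            return obj[offset:offset + length]
    raise KeyError(block_id)
```

With this kind of layout the whole object is written in a single put, so no append is needed, and a reader can fetch the footer and index with ranged reads before fetching the blocks it wants.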

Task list

[Improvement] Merge data file and index file #892
[FEATURE] Expect to support the filesystem not implementing the append-mode. #391
documents for object storage (need to create an issue)
[FEATURE] Add an interface for object storage #1133

Are you willing to submit PR?

Yes I am willing to submit a PR!

jerqi · 2023-07-23T07:57:32Z

I'm not familiar with object storage. Could you give me more inputs? @hiboyang @pspoerri @melodyyangaws @zhaohc10 @LantaoJin @yuyang733

jerqi · 2023-07-23T07:58:08Z

cc @xianjingfeng @zuston Could we finish this issue together?

pspoerri · 2023-07-23T11:08:56Z

Regarding upload to S3: As long as you use the Apache HDFS S3A adapter you can stream data to an object store. However you can only append as long as you keep the stream open and you can only do so from a single client. The S3A filesystem implementation uses buffered multi-part uploads to stream a file to an object store. Streaming from multiple clients should be possible in principle, but the coordination overhead and the way Java streams are implemented make things tricky.
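
As a rough illustration of the buffered multi-part behaviour described above, the following is a simplified in-memory model: bytes are buffered, full parts are emitted as the stream grows, and the object only becomes complete when the single writer closes the stream. `BufferedPartWriter` and `part_size` are hypothetical names, and this is not the S3A implementation.

```python
class BufferedPartWriter:
    """Simplified single-writer model of a buffered multi-part upload."""

    def __init__(self, part_size):
        self.part_size = part_size
        self._buffer = bytearray()
        self.parts = []      # parts "uploaded" so far, in order
        self.closed = False

    def write(self, data):
        if self.closed:
            # Once the upload is completed the object cannot grow any more.
            raise ValueError("stream is closed")
        self._buffer.extend(data)
        # Emit every full part that the buffer now contains.
        while len(self._buffer) >= self.part_size:
            self.parts.append(bytes(self._buffer[:self.part_size]))
            del self._buffer[:self.part_size]

    def close(self):
        """Flush the remaining bytes as the last part and return the object."""
        if self._buffer:
            self.parts.append(bytes(self._buffer))
            self._buffer.clear()
        self.closed = True
        return b"".join(self.parts)
```

The model shows why "append" only works while the stream stays open: the part list lives with one writer, and after `close()` there is nothing left to append to.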

Regarding .index files: For better performance you can always cache the index files, or serve them from a different location (Redis, Uniffle Coordinator, etc...).

Regarding list support: You can always store the list of objects somewhere else if you want to avoid any expensive file-listing operations. spark-s3-shuffle only uses listings when it needs to delete objects.
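
A minimal sketch of the "store the list of objects somewhere else" idea, assuming a manifest kept outside the bucket. `InMemoryManifest` stands in for whatever external store is chosen, the bucket is modelled as a plain dict, and all names here are hypothetical.

```python
from collections import defaultdict


class InMemoryManifest:
    """Hypothetical stand-in for an external store (Redis, coordinator, ...)."""

    def __init__(self):
        self._keys = defaultdict(list)

    def record(self, app_id, key):
        self._keys[app_id].append(key)

    def keys_for(self, app_id):
        return list(self._keys.get(app_id, []))

    def forget(self, app_id):
        self._keys.pop(app_id, None)


def put_object(store, manifest, app_id, key, data):
    """Write an object and remember its key in the manifest."""
    store[key] = data  # `store` is a dict standing in for a bucket
    manifest.record(app_id, key)


def delete_app(store, manifest, app_id):
    """Delete all objects of an application without listing the bucket."""
    for key in manifest.keys_for(app_id):
        store.pop(key, None)
    manifest.forget(app_id)
```

Cleanup then depends only on the manifest being complete, which is the trade-off: a key that was written but never recorded would be leaked.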

jerqi · 2023-07-24T02:23:11Z

Regarding upload to S3: As long as you use the Apache HDFS S3A adapter you can stream data to an object store. However you can only append as long as you keep the stream open and you can only do so from a single client. The S3A filesystem implementation uses buffered multi-part uploads to stream a file to an object store. Streaming from multiple clients should be possible in principle, but the coordination overhead and the way Java streams are implemented make things tricky.

Regarding .index files: For better performance you can always cache the index files, or serve them from a different location (Redis, Uniffle Coordinator, etc...).

Regarding list support: You can always store the list of objects somewhere else if you want to avoid any expensive file-listing operations. spark-s3-shuffle only uses listings when it needs to delete objects.

Thanks for your input.

zuston · 2023-07-24T02:30:59Z

cc @xianjingfeng @zuston Could we finish this issue together?

Yes, I will.

Regarding upload to S3: As long as you use the Apache HDFS S3A adapter you can stream data to an object store. However you can only append as long as you keep the stream open and you can only do so from a single client. The S3A filesystem implementation uses buffered multi-part uploads to stream a file to an object store. Streaming from multiple clients should be possible in principle, but the coordination overhead and the way Java streams are implemented make things tricky.

I hope append can be avoided in this design. I think it's OK to store the same partition's data in different files in the object store, like this:

s3a://xxxxxxxxxx/{app_id}/{shuffle_id}/{partition_id}/0.index
s3a://xxxxxxxxxx/{app_id}/{shuffle_id}/{partition_id}/0.data
s3a://xxxxxxxxxx/{app_id}/{shuffle_id}/{partition_id}/1.index
s3a://xxxxxxxxxx/{app_id}/{shuffle_id}/{partition_id}/1.data
....
s3a://xxxxxxxxxx/{app_id}/{shuffle_id}/{partition_id}/990.index
s3a://xxxxxxxxxx/{app_id}/{shuffle_id}/{partition_id}/990.data

Each flush of a partition by the shuffle-server could be written into one file. But this relies on the following rules.

The partition must be managed by a single shuffle-server, because the id used as the file name prefix is only known to that shuffle-server.

The reader could get the endId (s3a://xxxxxxxxxx/{app_id}/{shuffle_id}/{partition_id}/endId.data) from the shuffle-server. That means we don't need the list operation.
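
The numbering scheme above can be sketched as follows, assuming file ids start at 0 and grow by one per flush. `PartitionWriter`, `partition_keys`, and the `s3a://bucket` base used in the test are hypothetical and only illustrate how a reader could derive every key from the endId without listing.

```python
class PartitionWriter:
    """Hypothetical per-partition counter held by the owning shuffle-server."""

    def __init__(self):
        self._next_id = 0

    def next_file_id(self):
        """Reserve the file id for the next flush of this partition."""
        file_id = self._next_id
        self._next_id += 1
        return file_id

    @property
    def end_id(self):
        # -1 means nothing has been flushed yet.
        return self._next_id - 1


def partition_keys(base, app_id, shuffle_id, partition_id, end_id):
    """Yield (index_key, data_key) for file ids 0..end_id inclusive."""
    prefix = f"{base}/{app_id}/{shuffle_id}/{partition_id}"
    for file_id in range(end_id + 1):
        yield f"{prefix}/{file_id}.index", f"{prefix}/{file_id}.data"
```

Because the counter lives in one place, the scheme only holds while a single shuffle-server owns the partition, which is the first rule stated above.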

If I'm wrong, feel free to point it out.

xianjingfeng · 2023-07-24T03:11:59Z

cc @xianjingfeng @zuston Could we finish this issue together?

It is ok for me.

xianjingfeng · 2023-07-24T03:23:29Z

s3a://xxxxxxxxxx/{app_id}/{shuffle_id}/{partition_id}/990.data

s3a://xxxxxxxxxx/{app_id}/{shuffle_id}/{partition_id}/{shuffle_server_id}/990.data may be better?

Regarding .index files: For better performance you can always cache the index files, or serve them from a different location (Redis, Uniffle Coordinator, etc...)

Agreed. I think we can let users choose whether to store index files in memory or in another external system.
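
One way to let users choose is a small pluggable interface selected by configuration. The sketch below is only an assumption about how that could look: `IndexStore`, `MemoryIndexStore`, `ExternalIndexStore`, `make_index_store`, and the `kind` values are all hypothetical, and the external client is any object exposing `set(key, value)` and `get(key)`.

```python
from abc import ABC, abstractmethod


class IndexStore(ABC):
    """Hypothetical abstraction over where index bytes are kept."""

    @abstractmethod
    def put(self, key, index_bytes):
        ...

    @abstractmethod
    def get(self, key):
        ...


class MemoryIndexStore(IndexStore):
    """Keeps index bytes in the process memory."""

    def __init__(self):
        self._data = {}

    def put(self, key, index_bytes):
        self._data[key] = bytes(index_bytes)

    def get(self, key):
        return self._data[key]


class ExternalIndexStore(IndexStore):
    """Delegates to a client object that offers set(key, value) and get(key)."""

    def __init__(self, client):
        self._client = client

    def put(self, key, index_bytes):
        self._client.set(key, bytes(index_bytes))

    def get(self, key):
        return self._client.get(key)


def make_index_store(kind, client=None):
    """Pick an implementation from a user-supplied setting."""
    if kind == "memory":
        return MemoryIndexStore()
    if kind == "external":
        if client is None:
            raise ValueError("an external index store needs a client")
        return ExternalIndexStore(client)
    raise ValueError(f"unknown index store kind: {kind}")
```

The data path only sees `IndexStore`, so switching between memory and an external system would not touch the code that writes or reads shuffle data.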

hiboyang · 2023-08-05T22:31:41Z

Yeah, a lot of small index files will not work well in object storage like S3. Maybe it's a good idea to store them elsewhere. Or is it possible to serve index files from the Spark driver?

pegasas · 2023-08-06T18:08:06Z

With my limited experience, I think s3 is a good choice in this issue.

If we choose to support s3, then it will easily extend to other filesystems (NFS, CIFS, EFS, GCS fuse, Azure File System) by using a solution like MinIO.

Yeah, a lot of small index files will not work well in object storage like S3. Maybe a good idea to store it in other places. Or is it possible to serve index files from Spark driver?

I think we may have other solution for merging small index files like application-and-practice-of-spark-small-file-merging-function-on-aws-s3?

Feel free to correct me if I am wrong.

jerqi · 2023-08-07T03:09:40Z

Yeah, a lot of small index files will not work well in object storage like S3. Maybe good idea to store in other places. Or is it possible to serve index file from Spark driver?

Thanks for your input.

jerqi · 2023-08-07T03:12:40Z

With my limited experience, I think s3 is a good choice in this issue.

If we choose to support s3, then it will easily extend to other filesystems (NFS, CIFS, EFS, GCS fuse, Azure File System) by using a solution like MinIO.

Yeah, a lot of small index files will not work well in object storage like S3. Maybe a good idea to store it in other places. Or is it possible to serve index files from Spark driver?

I think we may have other solution for merging small index files like application-and-practice-of-spark-small-file-merging-function-on-aws-s3?

Feel free to correct me if I am wrong.

I will propose a document this weekend. First, we can define just the interfaces. Then we will implement specific object storage systems according to the needs of xianjingfeng and zuston.

@xianjingfeng What object system do you want to implement?
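
To give a feel for what "define some interfaces first" might mean, here is a hedged sketch of a small storage interface with capability flags for the differences raised in the proposal (append and list support). `ObjectStorage`, its method names, and `InMemoryObjectStorage` are assumptions for illustration and are not the interface tracked in #1133.

```python
from typing import Iterable, Protocol


class ObjectStorage(Protocol):
    """Hypothetical minimal interface a backend would implement."""

    supports_append: bool
    supports_list: bool

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str, offset: int = 0, length: int = -1) -> bytes: ...

    def delete(self, keys: Iterable[str]) -> None: ...


class InMemoryObjectStorage:
    """Test double that behaves like a store without append or list."""

    supports_append = False
    supports_list = False

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        # A put always replaces the whole object.
        self._objects[key] = bytes(data)

    def get(self, key, offset=0, length=-1):
        data = self._objects[key]
        end = len(data) if length < 0 else offset + length
        return data[offset:end]

    def delete(self, keys):
        for key in keys:
            self._objects.pop(key, None)
```

Callers could check `supports_append` and `supports_list` to decide, for example, whether index and data must be merged into one object or whether keys must be tracked outside the store.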

jerqi · 2023-08-07T03:14:22Z

@jiafuzha Do you have extra input?

xianjingfeng · 2023-08-07T14:34:20Z

@xianjingfeng What object system do you want to implement?

s3

jiafuzha · 2023-08-14T02:40:36Z

@jiafuzha Do you have extra input?

I was on vacation last week.
Do we have any interface defined for RemoteStorageManager? I am looking forward to it since our DAOS is pure remote storage.

zuston · 2023-11-10T03:21:25Z

Is this on the roadmap? @jerqi @xianjingfeng

zuston · 2024-02-07T02:01:01Z

I'm interested in this proposal, and will implement it on the Rust side. @jerqi

jerqi pinned this issue Jul 23, 2023

jerqi changed the title ~~[Umbrella] Object Storage Support~~ [Umbrella] Object Storage Support (help wanted) Jul 23, 2023

jerqi changed the title ~~[Umbrella] Object Storage Support (help wanted)~~ [Umbrella] Object Storage Support (Help wanted) Jul 24, 2023

jerqi added the help wanted Extra attention is needed label Jul 24, 2023

jerqi changed the title ~~[Umbrella] Object Storage Support (Help wanted)~~ [Umbrella] Object Storage Support (Help Wanted) Jul 31, 2023
