HADOOP-14837 : Support Read Restored Glacier Objects #6407

Status: Open. Wants to merge 7 commits into base: trunk.
Conversation

@bpahuja commented Jan 4, 2024

Description of PR

Currently S3A does not distinguish Glacier and Glacier Deep Archive objects from other files, as it neither examines the storage class nor checks whether an object is in the process of being restored from Glacier. Attempting to access a Glacier file whose restore has not completed via S3A results in an AmazonS3Exception indicating that the operation is not valid for the object's storage class.

As part of this change, users will be able to successfully read restored Glacier objects from an S3 location using S3A, while any Glacier-archived files still in the process of being restored asynchronously are ignored. There is no change to the existing default behavior; additional configuration is needed to enable the flow described above.

The config which controls the behavior of the S3AFileSystem with respect to Glacier storage classes is fs.s3a.glacier.read-restored-objects.

The config can have 3 values (an example of setting it follows the list):

  • READ_ALL: Conforms to the current default behavior of not taking the storage classes returned by S3 into account, keeping the existing experience unchanged for users (a read of an unrestored Glacier object still fails).

  • SKIP_ALL_GLACIER: Ignores all S3 objects tagged with a Glacier storage class and retrieves the rest.

  • READ_RESTORED_GLACIER_OBJECTS: Checks the restore status of each Glacier object; restored objects are read like normal S3 objects, while unrestored ones are ignored, since their data has not yet been retrieved from S3 Glacier. (The check uses the newly introduced RestoreStatus attribute on the S3Object.) This wasn't previously possible, as ListObjects did not return any information about an object's restore status, only its storage class.
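
For instance, the option can be set like any other S3A configuration key; a minimal sketch using standard Hadoop APIs, with the bucket and chosen value as placeholders:

Configuration conf = new Configuration();
// One of READ_ALL, SKIP_ALL_GLACIER or READ_RESTORED_GLACIER_OBJECTS.
conf.set("fs.s3a.glacier.read-restored-objects", "READ_RESTORED_GLACIER_OBJECTS");
FileSystem fs = FileSystem.get(URI.create("s3a://<bucket>/"), conf);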

A new FileStatusAcceptor is created which uses the RestoreStatus attribute of the S3Object to filter out or include Glacier objects from the listing, as defined by the config. FileStatusAcceptor is an interface with 3 overloaded predicates, which filter files based on the conditions defined in those predicates. The new RestoreStatus attribute from the ListObjects response indicates whether an object is unrestored, restoring, or restored, and also when the restore expires.
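
A rough sketch of the two predicates this enables, using the AWS SDK v2 model types named above (method names here are illustrative, not necessarily those in the patch):

import software.amazon.awssdk.services.s3.model.ObjectStorageClass;
import software.amazon.awssdk.services.s3.model.S3Object;

// True when the listed object is not archived in Glacier or Deep Archive.
static boolean isNotGlacierObject(S3Object object) {
  return object.storageClass() != ObjectStorageClass.GLACIER
      && object.storageClass() != ObjectStorageClass.DEEP_ARCHIVE;
}

// True when a Glacier object's restore has completed, i.e. its data is readable again.
static boolean isCompletedRestoredObject(S3Object object) {
  return object.restoreStatus() != null
      && !object.restoreStatus().isRestoreInProgress();
}

SKIP_ALL_GLACIER then reduces to the first predicate, while READ_RESTORED_GLACIER_OBJECTS accepts an object that passes either of them.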

How was this patch tested?

Integration Tests (hadoop-aws)

All the integration tests are passing; the tests were run in accordance with https://hadoop.apache.org/docs/current2/hadoop-aws/tools/hadoop-aws/testing.html, in the region us-east-1.

There were 2 failures observed, which seem intermittent and unrelated to the change introduced in this PR, as the default behavior of S3AFileSystem was not changed.

Failures observed :

ITestS3ACommitterFactory.testEverything
ITestS3AConfiguration.testRequestTimeout

Manual Testing

Manual testing of the change was done with Spark v3.5.

A Parquet table was created using the following statements in Spark SQL.

CREATE DATABASE IF NOT EXISTS glacier_test  location "s3a://<bucket>/data/glacier_test";

USE glacier_test;

CREATE TABLE IF NOT EXISTS parquet_glacier_test (id int, data string) using parquet location "s3a://<bucket>/data/glacier_test/parquet_glacier_test";

INSERT INTO parquet_glacier_test  VALUES (1, 'a'), (2, 'b'), (3, 'c');

INSERT INTO parquet_glacier_test  VALUES (4, 'a'), (5, 'b'), (6, 'c');

INSERT INTO parquet_glacier_test  VALUES (7, 'a'), (8, 'b'), (9, 'c');

The data was successfully retrieved using the following query.

SELECT * FROM parquet_glacier_test;

+---+----+
| id|data|
+---+----+
|  7|   a|
|  8|   b|
|  9|   c|
|  4|   a|
|  5|   b|
|  6|   c|
|  1|   a|
|  2|   b|
|  3|   c|
+---+----+

The storage class of the file s3://<bucket>/data/glacier_test/parquet_glacier_test/part-00000-f9cb400e-35b2-41f7-9c39-8e34cd830fed-c000.snappy.parquet was changed to Glacier Flexible Retrieval (formerly Glacier) from Standard.
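
For reference, such an in-place storage class change amounts to copying the object over itself with a new storage class. The PR does not state which tool was used; an AWS SDK v2 sketch with placeholder bucket/key would be:

S3Client s3 = S3Client.create();
s3.copyObject(CopyObjectRequest.builder()
    .sourceBucket(bucket).sourceKey(key)            // copy the object onto itself...
    .destinationBucket(bucket).destinationKey(key)
    .storageClass(StorageClass.GLACIER)             // ...with the new storage class
    .build());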

When trying to retrieve the data again from the same table, the following exception was observed.

software.amazon.awssdk.services.s3.model.InvalidObjectStateException: The operation is not valid for the object's storage class (Service: S3, Status Code: 403, Request ID: X05JDR633AAK4TBQ, Extended Request ID: uOxWdN4giUAuB9a4YWvnyrXPYCi2U35P5BrHhFO3aLSLLe4GtWhXGXCEJ/Ld5EyGr5b6VezTzeI=):InvalidObjectState
	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:243)
	at org.apache.hadoop.fs.s3a.Invoker.onceTrackingDuration(Invoker.java:149)
	at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:278)
	at org.apache.hadoop.fs.s3a.S3AInputStream.lambda$lazySeek$1(S3AInputStream.java:425)
	at org.apache.hadoop.fs.s3a.Invoker.lambda$maybeRetry$3(Invoker.java:284)
	at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:122)
	at org.apache.hadoop.fs.s3a.Invoker.lambda$maybeRetry$5(Invoker.java:408)
	at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:468)
	at org.apache.hadoop.fs.s3a.Invoker.maybeRetry(Invoker.java:404)
	at org.apache.hadoop.fs.s3a.Invoker.maybeRetry(Invoker.java:282)
	at org.apache.hadoop.fs.s3a.Invoker.maybeRetry(Invoker.java:326)
	at org.apache.hadoop.fs.s3a.S3AInputStream.lazySeek(S3AInputStream.java:417)
	at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:536)
	at java.io.DataInputStream.readFully(DataInputStream.java:195)

The spark-sql session was restarted with the following config.

spark-sql --conf spark.hadoop.fs.s3a.glacier.read-restored-objects=SKIP_ALL_GLACIER

Trying to access the table now resulted in the following.

SELECT * FROM parquet_glacier_test;

+---+----+
| id|data|
+---+----+
|  7|   a|
|  8|   b|
|  9|   c|
|  4|   a|
|  5|   b|
|  6|   c|
+---+----+

The spark-sql session was restarted with the following config

spark-sql --conf spark.hadoop.fs.s3a.glacier.read-restored-objects=READ_RESTORED_GLACIER_OBJECTS

Trying to access the table now produced the same result as the previous step, as the unrestored Glacier file was ignored when the table was read.

SELECT * FROM parquet_glacier_test;

+---+----+
| id|data|
+---+----+
|  7|   a|
|  8|   b|
|  9|   c|
|  4|   a|
|  5|   b|
|  6|   c|
+---+----+

The restore for the file s3://<bucket>/data/glacier_test/parquet_glacier_test/part-00000-f9cb400e-35b2-41f7-9c39-8e34cd830fed-c000.snappy.parquet was initiated using the S3 Console.
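
The console restore corresponds to an SDK v2 RestoreObjectRequest; a sketch with placeholder bucket/key, using the expedited tier mentioned below:

s3.restoreObject(RestoreObjectRequest.builder()
    .bucket(bucket).key(key)
    .restoreRequest(RestoreRequest.builder()
        .days(1)                                     // keep the restored copy for one day
        .glacierJobParameters(GlacierJobParameters.builder()
            .tier(Tier.EXPEDITED).build())           // expedited restores finish in minutes
        .build())
    .build());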

Trying to access the table again produced the same result as the previous step, as the Glacier file was still being restored and was not yet available.

SELECT * FROM parquet_glacier_test;

+---+----+
| id|data|
+---+----+
|  7|   a|
|  8|   b|
|  9|   c|
|  4|   a|
|  5|   b|
|  6|   c|
+---+----+

On retrying after 5-7 minutes (as it was an expedited retrieval), the following result was returned, as expected:

SELECT * FROM parquet_glacier_test;

+---+----+
| id|data|
+---+----+
|  7|   a|
|  8|   b|
|  9|   c|
|  4|   a|
|  5|   b|
|  6|   c|
|  1|   a|
|  2|   b|
|  3|   c|
+---+----+

For code changes:

  • [Yes] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • [Yes] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • [NA] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • [NA] If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@ahmarsuhail (Contributor) left a comment

Thanks for contributing, looks good mostly, sharing some initial feedback.

Let's get rid of the acceptor, update the documentation, and we should add an ITest.
Test that for a glacier file, if it's READ_ALL, an error is thrown (I think that is existing behaviour?), and if it's SKIP_ALL_GLACIER then it's skipped. And test restore behaviour too, if possible; see the sketch below.
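
A sketch of the READ_ALL case using LambdaTestUtils.intercept; the translated exception type and the glacierObjectPath variable are assumptions here:

// With READ_ALL (the current default), reading an archived, unrestored
// object should fail; intercept() asserts that the lambda throws.
intercept(AccessDeniedException.class, () -> {
  try (FSDataInputStream in = getFileSystem().open(glacierObjectPath)) {
    return in.read();   // the GET, not the open(), is what fails for archived objects
  }
});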

/**
* S3ObjectStorageClassFilter will filter the S3 files based on the fs.s3a.glacier.read.restored.objects configuration set in S3AFileSystem
* The config can have 3 values:
* READ_ALL: This would conform to the current default behavior of not taking into account the storage classes retrieved from S3. This will be done to keep the current behavior for the customers and not changing the experience for them.
Contributor:

Instead of saying current behaviour, can you specify what that current behaviour is. It errors right?

Author:

Sure will update

@steveloughran (Contributor) left a comment

most of my comments are on the basic stuff, especially those test assertions and the need to have a single factored out assertion() method.

Now, can we have the storage class an attribute in S3AFileStatus? populated in listings and from HEAD requests? included in .toString()? that could be useful in future

@@ -18,6 +18,7 @@

package org.apache.hadoop.fs.s3a;

import org.apache.hadoop.util.Lists;
Contributor:

should go with the rest of the hadoop imports. it's in the wrong place in a lot of files as the move off guava lists was a search and replace without reordering

* Accept all entries except Glacier objects if the config fs.s3a.glacier.read.restored.objects,
* is set to SKIP_ALL_GLACIER
*/
public static class GlacierStatusAcceptor implements FileStatusAcceptor {
Contributor:

does this storage class come in on a v2 LIST request? guess it must...
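
(For the record: each ListObjectsV2 entry carries its storage class by default, but the restore status is only populated when the request asks for it via optional object attributes. A sketch, assuming the AWS SDK v2 API:

ListObjectsV2Request request = ListObjectsV2Request.builder()
    .bucket(bucket)
    // Without this attribute, S3Object.restoreStatus() comes back null.
    .optionalObjectAttributes(OptionalObjectAttributes.RESTORE_STATUS)
    .build();)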

@@ -52,6 +52,8 @@
import java.util.concurrent.atomic.AtomicBoolean;
import javax.annotation.Nullable;

import org.apache.hadoop.fs.s3a.Listing.AcceptAllButSelfAndS3nDirs;
Contributor:

nit, import ordering

@@ -5213,7 +5223,7 @@ private RemoteIterator<S3ALocatedFileStatus> innerListFiles(
RemoteIterator<S3ALocatedFileStatus> listFilesAssumingDir =
listing.getListFilesAssumingDir(path,
recursive,
acceptor,
acceptors,
Contributor:

nit: restore consistent indentation



/**
* S3ObjectStorageClassFilter will filter the S3 files based on the fs.s3a.glacier.read.restored.objects configuration set in S3AFileSystem
Contributor:

be good to use <pre> on lines and {@code } around formatted code so that javadocs and IDEs render better

RestoreStatus.builder().isRestoreInProgress(false).build(),
ObjectStorageClass.GLACIER));

Assert.assertFalse(result);
Contributor:

i require a meaningful error message on all failures. ideally using assertj, whose describedAs() call can do string formatting.

o = getS3ObjectWithStorageClassAndRestoreStatus(...)
Assertions.assertThat(acceptor.accept(...))
 .describedAs("accept %s", o)
 .isFalse()

just imagine you've seen a test failure: what information would you want in the assertion message to begin debugging it?

RestoreStatus.builder().isRestoreInProgress(false).build(),
ObjectStorageClass.DEEP_ARCHIVE));

Assert.assertFalse(result);
Contributor:

same. you could actually have an assertAcceptance(object, outcome) method instead of all this duplication
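
A sketch of that factored-out helper, assuming an acceptor field and the accept() call used by the test:

// Single assertion point; the failure message carries the object under test.
private void assertAcceptance(S3Object object, boolean expected) {
  Assertions.assertThat(acceptor.accept(object))
      .describedAs("acceptor %s evaluating %s", acceptor, object)
      .isEqualTo(expected);
}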

@@ -576,6 +580,10 @@ public void initialize(URI name, Configuration originalConf)

s3aInternals = createS3AInternals();

s3ObjectStorageClassFilter = Optional.ofNullable(conf.get(READ_RESTORED_GLACIER_OBJECTS))
Contributor:

use getTrimmed(key, ""), toUpper() and then do the matching. fail meaningfully if the value isn't recognised.
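
In code, that suggestion amounts to roughly the following sketch (constant and enum names are those introduced by this PR):

String value = conf.getTrimmed(READ_RESTORED_GLACIER_OBJECTS, "");
S3ObjectStorageClassFilter filter;
if (value.isEmpty()) {
  filter = S3ObjectStorageClassFilter.READ_ALL;   // unset: keep the default behavior
} else {
  try {
    filter = S3ObjectStorageClassFilter.valueOf(value.toUpperCase(Locale.ROOT));
  } catch (IllegalArgumentException e) {
    // fail meaningfully when the value isn't recognised
    throw new IllegalArgumentException("Invalid value of "
        + READ_RESTORED_GLACIER_OBJECTS + ": " + value, e);
  }
}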

@bpahuja (Author) commented Jan 19, 2024

most of my comments are on the basic stuff, especially those test assertions and the need to have a single factored out assertion() method.

Am converting the current test into an ITest, will update assertions according to the recommendations

Now, can we have the storage class an attribute in S3AFileStatus? populated in listings and from HEAD requests? included in .toString()? that could be useful in future

This is something that can be done; do we want this as part of this PR, or a separate one adding the storage class to S3AFileStatus?

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 24s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 13m 44s Maven dependency ordering for branch
+1 💚 mvninstall 22m 14s trunk passed
+1 💚 compile 9m 50s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 compile 8m 59s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 checkstyle 2m 29s trunk passed
+1 💚 mvnsite 1m 20s trunk passed
+1 💚 javadoc 0m 59s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 46s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 2m 0s trunk passed
+1 💚 shadedclient 21m 15s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 21s Maven dependency ordering for patch
-1 ❌ mvninstall 0m 15s /patch-mvninstall-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
-1 ❌ compile 8m 45s /patch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt root in the patch failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.
-1 ❌ javac 8m 45s /patch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt root in the patch failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.
-1 ❌ compile 8m 9s /patch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt root in the patch failed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08.
-1 ❌ javac 8m 9s /patch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt root in the patch failed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08.
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 2m 9s /results-checkstyle-root.txt root: The patch generated 22 new + 6 unchanged - 0 fixed = 28 total (was 6)
-1 ❌ mvnsite 0m 24s /patch-mvnsite-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
+1 💚 javadoc 0m 50s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 47s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
-1 ❌ spotbugs 0m 25s /patch-spotbugs-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
+1 💚 shadedclient 21m 14s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 16m 27s hadoop-common in the patch passed.
-1 ❌ unit 0m 28s /patch-unit-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
+1 💚 asflicense 0m 35s The patch does not generate ASF License warnings.
150m 53s
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/3/artifact/out/Dockerfile
GITHUB PR #6407
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle markdownlint
uname Linux e8c67875814a 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 4a9f038
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/3/testReport/
Max. process+thread count 3149 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 22s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 13m 42s Maven dependency ordering for branch
+1 💚 mvninstall 22m 18s trunk passed
+1 💚 compile 9m 43s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 compile 9m 4s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 checkstyle 2m 27s trunk passed
+1 💚 mvnsite 1m 18s trunk passed
+1 💚 javadoc 0m 57s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 48s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 2m 3s trunk passed
+1 💚 shadedclient 21m 11s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 21s Maven dependency ordering for patch
-1 ❌ mvninstall 0m 16s /patch-mvninstall-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
-1 ❌ compile 8m 39s /patch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt root in the patch failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.
-1 ❌ javac 8m 39s /patch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt root in the patch failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.
-1 ❌ compile 8m 7s /patch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt root in the patch failed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08.
-1 ❌ javac 8m 7s /patch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt root in the patch failed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08.
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 2m 12s /results-checkstyle-root.txt root: The patch generated 22 new + 6 unchanged - 0 fixed = 28 total (was 6)
-1 ❌ mvnsite 0m 23s /patch-mvnsite-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
+1 💚 javadoc 0m 50s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 50s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
-1 ❌ spotbugs 0m 24s /patch-spotbugs-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
+1 💚 shadedclient 21m 29s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 16m 28s hadoop-common in the patch passed.
-1 ❌ unit 0m 28s /patch-unit-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
+1 💚 asflicense 0m 31s The patch does not generate ASF License warnings.
150m 41s
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/4/artifact/out/Dockerfile
GITHUB PR #6407
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle markdownlint
uname Linux 5fa6c0e6973e 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 4a9f038
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/4/testReport/
Max. process+thread count 1281 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/4/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 21s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 31m 57s trunk passed
+1 💚 compile 0m 26s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 21s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 checkstyle 0m 21s trunk passed
+1 💚 mvnsite 0m 28s trunk passed
+1 💚 javadoc 0m 19s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 23s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 0m 45s trunk passed
+1 💚 shadedclient 19m 28s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 19s the patch passed
+1 💚 compile 0m 20s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 20s the patch passed
+1 💚 compile 0m 16s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 javac 0m 16s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 12s /results-checkstyle-hadoop-tools_hadoop-aws.txt hadoop-tools/hadoop-aws: The patch generated 18 new + 6 unchanged - 0 fixed = 24 total (was 6)
+1 💚 mvnsite 0m 18s the patch passed
+1 💚 javadoc 0m 10s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 16s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 0m 43s the patch passed
+1 💚 shadedclient 19m 22s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 26s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 23s The patch does not generate ASF License warnings.
81m 44s
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/5/artifact/out/Dockerfile
GITHUB PR #6407
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux 4dfd5fc0d4f0 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / f0144a7
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/5/testReport/
Max. process+thread count 552 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/5/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 23s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 32m 31s trunk passed
+1 💚 compile 0m 25s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 17s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 checkstyle 0m 20s trunk passed
+1 💚 mvnsite 0m 26s trunk passed
+1 💚 javadoc 0m 18s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 23s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 0m 44s trunk passed
+1 💚 shadedclient 19m 20s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 20s the patch passed
+1 💚 compile 0m 21s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 21s the patch passed
+1 💚 compile 0m 15s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 javac 0m 15s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 10s /results-checkstyle-hadoop-tools_hadoop-aws.txt hadoop-tools/hadoop-aws: The patch generated 13 new + 6 unchanged - 0 fixed = 19 total (was 6)
+1 💚 mvnsite 0m 17s the patch passed
+1 💚 javadoc 0m 9s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 17s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 0m 45s the patch passed
+1 💚 shadedclient 19m 17s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 28s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 22s The patch does not generate ASF License warnings.
82m 58s
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/6/artifact/out/Dockerfile
GITHUB PR #6407
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux 732160df4277 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 4e9b673
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/6/testReport/
Max. process+thread count 551 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/6/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@steveloughran self-assigned this Jan 25, 2024
@ahmarsuhail (Contributor) left a comment

looks good mostly, but the ITest needs work. will sync with you on that

@@ -581,6 +583,12 @@ public void initialize(URI name, Configuration originalConf)

s3aInternals = createS3AInternals();

s3ObjectStorageClassFilter = Optional.ofNullable(conf.get(READ_RESTORED_GLACIER_OBJECTS))
Contributor:

would prefer conf.get(READ_RESTORED_GLACIER_OBJECTS, READ_ALL) , meaning READ_ALL is the default. and then you can get rid of the orElse()

Contributor:

@ahmarsuhail but doing it the way it is does handle case differences.

I'd go for getTrimmed(READ_RESTORED_GLACIER_OBJECTS, ""); if empty string map to empty optional, otherwise .toupper and valueof. one thing to consider: meaningful failure if the value doesn't map.

I'd change Configuration to do that case mapping if it wasn't such a critical class

Contributor:

or we just go for "upper case is required" and use what you've proposed. more brittle but simpler?

* <pre>
* {@link S3ObjectStorageClassFilter} will filter the S3 files based on the {@code fs.s3a.glacier.read.restored.objects} configuration set in {@link S3AFileSystem}
* The config can have 3 values:
* {@code READ_ALL}: This would conform to the current default behavior of not taking into account the storage classes retrieved from S3. This will be done to keep the current behavior for the customers and not changing the experience for them.
Contributor:

remove "This will be done to keep the current behavior for the customers and not changing the experience for them." and add something like "Retrieval of Glacier files will fail with xxx", whatever the error is currently

@@ -411,4 +416,8 @@ public RequestFactory getRequestFactory() {
public boolean isCSEEnabled() {
return isCSEEnabled;
}

public S3ObjectStorageClassFilter s3ObjectsStorageClassFilter() {
Contributor:

nit: rename to getS3ObjectStorageClassFilter(), and add java docs for the method

<description>
The config can have 3 values:

* READ_ALL: This would conform to the current default behavior of not taking into account the storage classes retrieved from S3. This will be done to keep the current behavior (i.e failing for an unrestored glacier class file) for the customers and not changing the experience for them.
Contributor:

similar to above, remove "This will be done to keep the current behavior for the customers and not changing the experience for them." and add something like "Retrieval of Glacier files will fail with xxx", whatever the error is currently

}

@Override
protected Configuration createConfiguration() {
Contributor:

move method to the top

newConf.set(STORAGE_CLASS, STORAGE_CLASS_GLACIER); // Create Glacier objects
skipIfStorageClassTestsDisabled(newConf);
disableFilesystemCaching(newConf);
removeBaseAndBucketOverrides(newConf, STORAGE_CLASS, FAST_UPLOAD_BUFFER);
Contributor:

remove lines 103 and 104, don't think they're needed...


@Parameterized.Parameters(name = "fast-upload-buffer-{0}")
public static Collection<Object[]> params() {
return Arrays.asList(new Object[][]{
Contributor:

You don't need to parameterize here as we already test this in ITestS3AStorageClass; here we just want to focus on this Glacier-specific behaviour.


FileSystem fs = contract.getTestFileSystem();
Path dir = methodPath();
fs.mkdirs(dir);
Contributor:

you don't need to test for this here

@Override
protected Configuration createConfiguration() {
Configuration newConf = super.createConfiguration();
newConf.set(STORAGE_CLASS, STORAGE_CLASS_GLACIER); // Create Glacier objects
Contributor:

we should test for Glacier and Deep Archive though, as that's in your StorageClassFilterMap

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 0s Docker mode activated.
-1 ❌ patch 0m 16s #6407 does not apply to trunk. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help.
Subsystem Report/Notes
GITHUB PR #6407
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/7/console
versions git=2.34.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 21s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 31m 31s trunk passed
+1 💚 compile 0m 23s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 19s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 checkstyle 0m 19s trunk passed
+1 💚 mvnsite 0m 23s trunk passed
+1 💚 javadoc 0m 17s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 22s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 0m 41s trunk passed
+1 💚 shadedclient 19m 21s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 16s the patch passed
+1 💚 compile 0m 20s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 20s the patch passed
+1 💚 compile 0m 16s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 javac 0m 16s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 13s /results-checkstyle-hadoop-tools_hadoop-aws.txt hadoop-tools/hadoop-aws: The patch generated 21 new + 6 unchanged - 0 fixed = 27 total (was 6)
+1 💚 mvnsite 0m 18s the patch passed
+1 💚 javadoc 0m 10s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 17s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 0m 41s the patch passed
+1 💚 shadedclient 19m 5s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 27s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 24s The patch does not generate ASF License warnings.
81m 4s
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/8/artifact/out/Dockerfile
GITHUB PR #6407
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux 6f1c70e86dea 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / d035d15
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/8/testReport/
Max. process+thread count 663 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/8/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.


this.filter = filter;
}

private static boolean isNotGlacierObject(S3Object object) {
Contributor:

add javadocs all the way down here, thanks

Author:

Sure will add for methods in the class

@@ -25,6 +25,7 @@
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

import org.apache.hadoop.fs.s3a.S3ObjectStorageClassFilter;
Contributor:

can you move down to the rest of the org.apache.
these guava things are in the wrong block due to the big search and replace which created them

Author:

Sure


S3Client s3Client = getFileSystem().getS3AInternals().getAmazonS3Client("test");

// Create a restore object request
RestoreObjectRequest requestRestore = RestoreObjectRequest.builder()
Contributor:

prefer this was in the RequestFactory interface and builder, as it'll let us do things like add audit context and anything else in future

getFilePrefixForListObjects(), "/");
S3Object s3GlacierObject = getS3GlacierObject(s3Client, s3ListRequest);

while ((s3GlacierObject != null && s3GlacierObject.restoreStatus().isRestoreInProgress()) && retryCount < MAX_RETRIES) {
Contributor:

LambdaTestUtils.await() is designed to handle this.
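
A sketch of the replacement, assuming the await(timeoutMillis, intervalMillis, probe) overload; the timeout constants are illustrative:

// Poll until the restore completes, instead of a hand-rolled retry loop.
LambdaTestUtils.await(RESTORE_TIMEOUT_MS, PROBE_INTERVAL_MS, () -> {
  S3Object o = getS3GlacierObject(s3Client, s3ListRequest);
  return o == null || !o.restoreStatus().isRestoreInProgress();
});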

skipIfStorageClassTestsDisabled(newConf);
disableFilesystemCaching(newConf);
removeBaseAndBucketOverrides(newConf, STORAGE_CLASS);
newConf.set(REJECT_OUT_OF_SPAN_OPERATIONS, "false");
Contributor:

or better: create an audit span


enum Type { GLACIER_AND_DEEP_ARCHIVE, GLACIER }

@Parameterized.Parameters
Contributor:

  • look at other uses of this to see how we generate useful strings for logs
  • be aware the pattern is used in the method path, so it mustn't create invalid paths. plain text is so much better than [0]

Author:

Sure will update

@bpahuja (Author) commented Feb 6, 2024

I'd go for getTrimmed(READ_RESTORED_GLACIER_OBJECTS, ""); if empty string map to empty optional, otherwise .toupper and valueof. one thing to consider: meaningful failure if the value doesn't map.

or we just go for "upper case is required" and use what you've proposed. more brittle but simpler?

Regarding the above, I will update the config read to getTrimmed and set the config's default value to READ_ALL instead of "".

meaningful failure if the value doesn't map

The valueOf method will throw an IllegalArgumentException if an invalid value is set in the config.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 23s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
-1 ❌ mvninstall 1m 40s /branch-mvninstall-root.txt root in trunk failed.
-1 ❌ compile 0m 22s /branch-compile-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt hadoop-aws in trunk failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.
-1 ❌ compile 3m 21s /branch-compile-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt hadoop-aws in trunk failed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08.
-0 ⚠️ checkstyle 0m 19s /buildtool-branch-checkstyle-hadoop-tools_hadoop-aws.txt The patch fails to run checkstyle in hadoop-aws
-1 ❌ mvnsite 0m 21s /branch-mvnsite-hadoop-tools_hadoop-aws.txt hadoop-aws in trunk failed.
-1 ❌ javadoc 0m 22s /branch-javadoc-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt hadoop-aws in trunk failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.
-1 ❌ javadoc 0m 5s /branch-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt hadoop-aws in trunk failed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08.
+1 💚 spotbugs 1m 45s trunk passed
-1 ❌ shadedclient 4m 27s branch has errors when building and testing our client artifacts.
_ Patch Compile Tests _
-1 ❌ mvninstall 0m 20s /patch-mvninstall-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
-1 ❌ compile 0m 20s /patch-compile-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt hadoop-aws in the patch failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.
-1 ❌ javac 0m 20s /patch-compile-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt hadoop-aws in the patch failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.
+1 💚 compile 0m 30s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 javac 0m 30s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 20s /results-checkstyle-hadoop-tools_hadoop-aws.txt hadoop-tools/hadoop-aws: The patch generated 27 new + 0 unchanged - 0 fixed = 27 total (was 0)
+1 💚 mvnsite 0m 19s the patch passed
+1 💚 javadoc 0m 11s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 14s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 0m 39s the patch passed
-1 ❌ shadedclient 1m 11s patch has errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 0m 22s /patch-unit-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
+0 🆗 asflicense 0m 20s ASF License check generated no output?
19m 52s
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/9/artifact/out/Dockerfile
GITHUB PR #6407
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux b4c0370956e2 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / e110e2a
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/9/testReport/
Max. process+thread count 88 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/9/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@bpahuja (Author) commented Feb 7, 2024

The build failed due to an OOM error; the build passes locally:

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:33 min
[INFO] Finished at: 2024-02-07T08:13:12Z
[INFO] ------------------------------------------------------------------------
[ERROR] unable to create new native thread -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/OutOfMemoryError

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 20s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 31m 45s trunk passed
+1 💚 compile 0m 25s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 20s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 checkstyle 0m 21s trunk passed
+1 💚 mvnsite 0m 25s trunk passed
+1 💚 javadoc 0m 18s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 21s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 0m 43s trunk passed
+1 💚 shadedclient 19m 36s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 17s the patch passed
+1 💚 compile 0m 19s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 19s the patch passed
+1 💚 compile 0m 16s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 javac 0m 16s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 12s /results-checkstyle-hadoop-tools_hadoop-aws.txt hadoop-tools/hadoop-aws: The patch generated 21 new + 6 unchanged - 0 fixed = 27 total (was 6)
+1 💚 mvnsite 0m 18s the patch passed
+1 💚 javadoc 0m 9s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 17s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 0m 42s the patch passed
+1 💚 shadedclient 19m 1s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 28s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 23s The patch does not generate ASF License warnings.
81m 32s
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/10/artifact/out/Dockerfile
GITHUB PR #6407
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux 8e4ff0d8df99 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / c5dcd5c
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/10/testReport/
Max. process+thread count 559 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/10/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@steveloughran (Contributor) left a comment

really close now: just import placement (always a PITA) and some checkstyle line length complaints. Other than that: ready to merge!

@@ -18,6 +18,7 @@

package org.apache.hadoop.fs.s3a;

import org.apache.hadoop.fs.s3a.api.S3ObjectStorageClassFilter;
Contributor:

new org.apache imports MUST go into that block for new classes. set your IDE up for this and life gets simpler for all

@@ -52,6 +52,7 @@
import java.util.concurrent.atomic.AtomicBoolean;
import javax.annotation.Nullable;

import org.apache.hadoop.fs.s3a.api.S3ObjectStorageClassFilter;
Contributor:

same

@@ -37,7 +37,7 @@
import org.apache.hadoop.fs.s3a.S3AFileStatus;
import org.apache.hadoop.fs.s3a.S3AInputPolicy;
import org.apache.hadoop.fs.s3a.S3AStorageStatistics;
import org.apache.hadoop.fs.s3a.S3ObjectStorageClassFilter;
import org.apache.hadoop.fs.s3a.api.S3ObjectStorageClassFilter;
Contributor:

this has to move down to L42 now, as it's in a sub package.


import org.apache.hadoop.fs.s3a.S3AFileSystem;
Contributor:

now, this is a brand new class, in which case the L22 import can go into the org.apache block and the S3AFileSystem one with it. the only reason it is in the wrong place in existing code is that the move to repackaged classes was a big search and replace only: no re-ordering

@@ -89,7 +92,7 @@ private FileSystem createFiles(String s3ObjectStorageClassFilter) throws Throwab
FileSystem fs = contract.getTestFileSystem();
Path dir = methodPath();
fs.mkdirs(dir);
Contributor:

skip this for a marginal increase in performance. create() assumes there's a dir and just creates the object in the destination path without checks.

@steveloughran (Contributor):

@bpahuja have you tested this with any of: s3 express, third party stores, gcs? Just want to make sure things work and the tests are skipped?

I'll inevitably do the test runs anyway, but it'd save me time and effort, and as I don't run those regularly enough, multiple commits may be the source of regressions

@bpahuja (Author) commented Feb 16, 2024

@bpahuja have you tested this with any of: s3 express, third party stores, gcs? Just want to make sure things work and the tests are skipped?

Hello @steveloughran, wanted to confirm the requirement here: is it that you want me to ensure that this test is skipped when run in other environments?

If that's the case then IMO it should work as the other integration tests in this package do, as it inherits AbstractS3ATestBase.

Do let me know if something other than this is needed from me :)

Thanks

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 23s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 32m 13s trunk passed
+1 💚 compile 0m 23s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 19s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 checkstyle 0m 20s trunk passed
+1 💚 mvnsite 0m 23s trunk passed
+1 💚 javadoc 0m 17s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 21s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 0m 44s trunk passed
+1 💚 shadedclient 19m 11s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 19m 23s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 19s the patch passed
+1 💚 compile 0m 19s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 19s the patch passed
+1 💚 compile 0m 14s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 javac 0m 14s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 11s /results-checkstyle-hadoop-tools_hadoop-aws.txt hadoop-tools/hadoop-aws: The patch generated 2 new + 6 unchanged - 0 fixed = 8 total (was 6)
+1 💚 mvnsite 0m 19s the patch passed
+1 💚 javadoc 0m 9s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 16s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 0m 40s the patch passed
+1 💚 shadedclient 18m 42s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 13s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 23s The patch does not generate ASF License warnings.
80m 59s
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/14/artifact/out/Dockerfile
GITHUB PR #6407
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux 7c86bd58d065 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / df48a67
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/14/testReport/
Max. process+thread count 551 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/14/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@bpahuja
Author

bpahuja commented Feb 21, 2024

Hello @steveloughran, just a gentle reminder to review the PR. Thanks 😄

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 26s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 36m 54s trunk passed
+1 💚 compile 0m 28s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 23s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 checkstyle 0m 22s trunk passed
+1 💚 mvnsite 0m 30s trunk passed
+1 💚 javadoc 0m 21s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 23s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 0m 48s trunk passed
+1 💚 shadedclient 22m 54s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 23m 7s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 20s the patch passed
+1 💚 compile 0m 20s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 20s the patch passed
+1 💚 compile 0m 15s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 javac 0m 15s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 14s /results-checkstyle-hadoop-tools_hadoop-aws.txt hadoop-tools/hadoop-aws: The patch generated 2 new + 6 unchanged - 0 fixed = 8 total (was 6)
+1 💚 mvnsite 0m 19s the patch passed
+1 💚 javadoc 0m 11s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 19s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 0m 47s the patch passed
+1 💚 shadedclient 22m 6s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 15s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 26s The patch does not generate ASF License warnings.
93m 44s
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/15/artifact/out/Dockerfile
GITHUB PR #6407
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux fe484a3a2b53 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / cbb8580
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/15/testReport/
Max. process+thread count 556 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/15/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@bpahuja
Author

bpahuja commented Feb 26, 2024

Hello @steveloughran, just a gentle reminder to review the PR. Thanks 😄

@steveloughran
Contributor

I'm doing nothing this week but helping get Hadoop 3.4.0 out the door. No code of my own except what's related to packaging; no reviews of other people's work except what's related to the release. Sorry.

Well, you are waiting for a review of your code from me or someone else, so why not get involved? There is no release engineering team doing this; it is up to all of us developers in the community to get our hands dirty. We all have different deployment environments and different things we want to test. And, given this is the first public release with the AWS v2 SDK, it matters a lot that this thing ships: it will be the way we actually find real-world bugs.

Look on the hadoop common-dev list for the announcement of the next release candidate: once the announcement is made we have five days to test and vote on the RC. That is why we are under so much time pressure here.

Note I've created a project, https://github.com/apache/hadoop-release-support, to assist in qualifying the release. One thing which would be good would be some extra scripts we could run to help validate storage operations: operations which we can then execute against cloud storage from either the local host or a remote one. I am qualifying the Arm64 build on a Raspberry Pi 5 under my television and would like to have that testing fully automated.

Anything you can do here will be very much appreciated. And like I said: I'm unlikely to be looking at any other code right now.

@bpahuja
Author

bpahuja commented Mar 18, 2024

Hello @steveloughran, just a gentle reminder to review the PR :) Do take a look whenever you get some time.

Thanks

@steveloughran
Contributor

I am catching up now; as warned, I'm giving priority to people who helped validate the RC. The core project is a community project, and everyone gets to participate.

import software.amazon.awssdk.services.s3.model.S3Object;

import org.apache.hadoop.fs.s3a.S3AFileSystem;
import org.apache.hadoop.thirdparty.com.google.common.collect.Sets;
Contributor

Could you use org.apache.hadoop.util.Sets here? It's part of our attempt to isolate ourselves better from Guava changes and the pain that causes downstream.
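For instance, the swap is just the import, with usage unchanged (a sketch; the set contents are placeholders, and it assumes the wrapper's newHashSet factory mirrors the Guava one):

    import org.apache.hadoop.util.Sets;

    // same factory-method usage as before, minus the shaded Guava dependency
    Set<String> glacierClasses = Sets.newHashSet("GLACIER", "DEEP_ARCHIVE");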


[Amazon S3 Glacier (S3 Glacier)](https://docs.aws.amazon.com/amazonglacier/latest/dev/introduction.html) is a secure and durable service for low-cost data archiving and
long-term backup.
With S3 Glacier, you can store your data cost effectively for months, years, or even decades.
Contributor

replace "you" with something like "it is possible to store data more cost-effectively"; this is the docs for the connector, not marketing...

}

private FileSystem createFiles(String s3ObjectStorageClassFilter) throws Throwable {
Configuration conf = this.createConfiguration();
Contributor

Sorry, I meant "remove the this. prefix"; just use createConfiguration() directly.

try (FileSystem fs = createFiles(S3ObjectStorageClassFilter.SKIP_ALL_GLACIER.name())) {
Assertions.assertThat(
fs.listStatus(methodPath()))
.describedAs("FileStatus List of %s", methodPath()).isEmpty();
Contributor

Can you put the .isEmpty() on the line below? We like to split these up as they're very complex to read.
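For instance, the assertion above reflows to (same calls, just split across lines):

    Assertions.assertThat(fs.listStatus(methodPath()))
        .describedAs("FileStatus List of %s", methodPath())
        .isEmpty();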

});
}

private final int maxRetries = 100;
Contributor

These are static/final, so make them constants and upper-case the names.
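e.g. something like (a sketch; the name is illustrative):

    private static final int MAX_RETRIES = 100;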

conf.set(READ_RESTORED_GLACIER_OBJECTS, s3ObjectStorageClassFilter);
// Create Glacier objects:Storage Class:DEEP_ARCHIVE/GLACIER
conf.set(STORAGE_CLASS, glacierClass);
S3AContract contract = (S3AContract) createContract(conf);
Contributor

You don't need to do this; just do:

fs = new S3AFileSystem()
fs.init(conf)
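Spelled out (a sketch: the bucket URI is a placeholder, java.net.URI is assumed imported, and initialize() is the actual FileSystem method behind the init shorthand):

    // instantiate and initialize the filesystem directly; no S3AContract needed
    S3AFileSystem fs = new S3AFileSystem();
    fs.initialize(new URI("s3a://example-bucket/"), conf);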

@steveloughran steveloughran left a comment

Happy with the production code apart from a couple of minor comments; I've been testing it locally now.

I've just tested it: it took 2+ minutes to complete, even while skipping 6 tests.

Even as part of a parallel run, that is going to hurt.

Would it be possible to abandon the elegance and structure of parameterised test suites and try to combine them?
That is:

Create one glaciated object and then verify the different filesystem configurations do/don't find it. This will save a lot of setup and teardown overhead.

I know it goes against "test one thing per method" and "parameterize", but it helps scale.

If you look at the AbstractSTestS3AHugeFiles tests you can see how, by running the test methods in name order, we can actually retain some of this isolation. However, it complicates life in other ways (we need to keep the test files out of the normal test method path, and a test can't be run on its own any more), so don't bother with this; just know that sometimes testing gets complicated.
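That name-ordering pattern looks roughly like this in JUnit 4 (a sketch; the class and method names are illustrative, assuming org.junit.FixMethodOrder and org.junit.runners.MethodSorters are imported):

    @FixMethodOrder(MethodSorters.NAME_ASCENDING)
    public class ITestExampleScale extends AbstractS3ATestBase {
      @Test
      public void test_010_createFile() throws Throwable {
        // expensive setup, shared by the later methods
      }
      @Test
      public void test_020_readFile() throws Throwable {
        // relies on the state left behind by test_010_createFile()
      }
    }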

side issue: is this test going to run up significant charges?

@steveloughran
Contributor

@bpahuja can you address the review comments? Otherwise I'm going to forget about it.

Get it done now and we can target 3.4.1

@bpahuja
Author

bpahuja commented Apr 16, 2024

Sure @steveloughran, will prioritize this

@bpahuja
Copy link
Author

bpahuja commented May 16, 2024

@steveloughran, I have pushed the changes for ITestS3AReadRestoredGlacierObjects.java; it now has only one test containing all the test cases. In total, the setup happens twice: once for the Glacier storage class and once for the Glacier Deep Archive storage class. The number of setups has been reduced; earlier, a file was created for each test case.
Please review whenever you get some time.
Let me know if any more changes are needed here.

Thanks :)

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 23s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 36m 30s trunk passed
+1 💚 compile 0m 28s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 compile 0m 24s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 checkstyle 0m 22s trunk passed
+1 💚 mvnsite 0m 25s trunk passed
-1 ❌ javadoc 0m 24s /branch-javadoc-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt hadoop-aws in trunk failed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.
-1 ❌ javadoc 0m 27s /branch-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt hadoop-aws in trunk failed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.
-1 ❌ spotbugs 0m 26s /branch-spotbugs-hadoop-tools_hadoop-aws.txt hadoop-aws in trunk failed.
+1 💚 shadedclient 3m 0s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 3m 24s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
-1 ❌ mvninstall 0m 21s /patch-mvninstall-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
-1 ❌ compile 0m 22s /patch-compile-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt hadoop-aws in the patch failed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.
-1 ❌ javac 0m 22s /patch-compile-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt hadoop-aws in the patch failed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.
-1 ❌ compile 0m 22s /patch-compile-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt hadoop-aws in the patch failed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.
-1 ❌ javac 0m 22s /patch-compile-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt hadoop-aws in the patch failed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 20s /buildtool-patch-checkstyle-hadoop-tools_hadoop-aws.txt The patch fails to run checkstyle in hadoop-aws
-1 ❌ mvnsite 0m 22s /patch-mvnsite-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
-1 ❌ javadoc 0m 5s /patch-javadoc-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt hadoop-aws in the patch failed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.
-1 ❌ javadoc 0m 20s /patch-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt hadoop-aws in the patch failed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.
-1 ❌ spotbugs 0m 22s /patch-spotbugs-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
+1 💚 shadedclient 3m 17s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 0m 22s /patch-unit-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
+0 🆗 asflicense 0m 22s ASF License check generated no output?
50m 21s
Subsystem Report/Notes
Docker ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/16/artifact/out/Dockerfile
GITHUB PR #6407
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint
uname Linux 2f4a590eb200 5.15.0-106-generic #116-Ubuntu SMP Wed Apr 17 09:17:56 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / c1c72d9
Default Java Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/16/testReport/
Max. process+thread count 77 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6407/16/console
versions git=2.25.1 maven=3.6.3
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@apache apache deleted a comment from hadoop-yetus May 16, 2024
@apache apache deleted a comment from hadoop-yetus May 16, 2024
@steveloughran steveloughran left a comment

Sorry, I've been busy and neglected this.

I've got some minor comments, and then once Yetus is happy we can merge.
Note that in #6789 I'm adding a case-insensitive way to resolve a list of enum values in a config; for now, the work you've done is the only way to be case-insensitive... pity.

Now, I wonder: what is going to break on stores without this feature?

What happens on S3 Express stores?

I've proposed how to bind to the existing skipIfStorageClassTestsDisabled(conf) call, which will skip the test during setup.

LOG.warn("Invalid value for the config {} is set. Valid values are:" +
"READ_ALL, SKIP_ALL_GLACIER, READ_RESTORED_GLACIER_OBJECTS. Defaulting to READ_ALL",
READ_RESTORED_GLACIER_OBJECTS);
s3ObjectStorageClassFilter = S3ObjectStorageClassFilter.READ_ALL;
Contributor

Let's fall back to the default. Maybe pull the conf.getTrimmed() out of the try {} so its value can be printed too.

FWIW, in #6789 I'm doing a getEnumSet() which is case-independent too.
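A sketch of that restructure (names taken from the snippets above; catching IllegalArgumentException from the enum's valueOf() is an assumption about the surrounding code, and java.util.Locale is assumed imported):

    // read the raw value first so it can be logged on failure
    String filter = conf.getTrimmed(READ_RESTORED_GLACIER_OBJECTS,
        DEFAULT_READ_RESTORED_GLACIER_OBJECTS);
    try {
      s3ObjectStorageClassFilter =
          S3ObjectStorageClassFilter.valueOf(filter.toUpperCase(Locale.ROOT));
    } catch (IllegalArgumentException e) {
      LOG.warn("Invalid value \"{}\" for the config {}. Valid values are: "
          + "READ_ALL, SKIP_ALL_GLACIER, READ_RESTORED_GLACIER_OBJECTS. "
          + "Defaulting to READ_ALL", filter, READ_RESTORED_GLACIER_OBJECTS);
      s3ObjectStorageClassFilter = S3ObjectStorageClassFilter.READ_ALL;
    }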

try {
s3ObjectStorageClassFilter = Optional.of(conf.getTrimmed(READ_RESTORED_GLACIER_OBJECTS,
DEFAULT_READ_RESTORED_GLACIER_OBJECTS))
.map(String::toUpperCase)
Contributor

(Locale.ROOT)
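i.e., presumably replacing the method reference with a lambda so the locale can be passed (a one-line sketch):

    .map(s -> s.toUpperCase(Locale.ROOT))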

}

/**
* Returns the filter function set as part of the enum definition
Contributor

needs a trailing .

@@ -117,6 +117,8 @@ public class StoreContext implements ActiveThreadSpanSource<AuditSpan> {
/** Is client side encryption enabled? */
private final boolean isCSEEnabled;

private final S3ObjectStorageClassFilter s3ObjectStorageClassFilter;
Contributor

nit: javadocs
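i.e., a field-level javadoc along these lines (the wording is illustrative):

    /** Filter to apply to S3 objects, based on their storage class. */
    private final S3ObjectStorageClassFilter s3ObjectStorageClassFilter;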

@@ -927,8 +927,34 @@ The switch to turn S3A auditing on or off.
Should auditing of S3A requests be enabled?
</description>
</property>
```
## <a name="glacier"></a> Glacier Object Support
Contributor

Can you add a newline? Thanks.

import java.util.Arrays;
import java.util.Collection;

import org.apache.hadoop.fs.s3a.S3AFileSystem;
Contributor

move to the apache group
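Hadoop's convention groups imports as java.*, then third-party, then org.apache.*, each group separated by a blank line, so this import belongs in the last group (a sketch):

    import java.util.Arrays;
    import java.util.Collection;

    import org.apache.hadoop.fs.s3a.S3AFileSystem;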


private FileSystem createFileSystem(String s3ObjectStorageClassFilter) throws Throwable {
Configuration conf = createConfiguration();
conf.set(READ_RESTORED_GLACIER_OBJECTS, s3ObjectStorageClassFilter);
Contributor

insert line

    skipIfStorageClassTestsDisabled(conf);

(and the matching import:

import static org.apache.hadoop.fs.s3a.S3ATestUtils.skipIfStorageClassTestsDisabled;

See ITestS3AStorageClass for an example.
