New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#1303] refactor(client): using the configOption in RssTezConfig #1413
base: master
Are you sure you want to change the base?
Conversation
d1b1f26
to
529c34c
Compare
I like this PR that refactor the client config. Could you help fix the failed test cases |
Could you help review this? @zhengchenyu |
Ok, I will fix the failed test cases recently. |
1869dd1
to
7498984
Compare
ping @zhengchenyu |
appId.toString(), sessionToken, conf, appId.toString(), client, null); | ||
appId.toString(), | ||
sessionToken, | ||
new TezClientConf(conf), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we construct TezClientConf directly? Then we will ignore conversion process!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there needs a way to init and read configuration, like ShuffleServerConf 、DashboardConf、CoordinatorConf read config from fileName:
public ShuffleServerConf(String fileName) {
super();
boolean ret = loadConfFromFile(fileName);
if (!ret) {
throw new IllegalStateException("Fail to load config file " + fileName);
}
}
public DashboardConf(String fileName) {
super();
boolean ret = loadConfFromFile(fileName);
if (!ret) {
throw new IllegalStateException("Fail to load config file " + fileName);
}
}
public CoordinatorConf(String fileName) {
super();
boolean ret = loadConfFromFile(fileName);
if (!ret) {
throw new IllegalStateException("Fail to load config file " + fileName);
}
}
And the only difference is TezClientConf init from configuration。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I means that new TezClientConf() without any parameter, then set TEZ_RSS_TEST_MODE_ENABLE.
Then TezClientConf without any parameter will be zero cost.
Here we must load configuration, I think it is not neccessary. Because the configuration in TezClientConf(new Configuration) doesn't pass any useful config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think set TEZ_RSS_TEST_MODE_ENABLE should be obvious, and not like hidden in TezClientConf().
The normal case should read config from client side by Configuration, and convert to TezClientConfig, so there need TezClientConfig(new Configuration()).
And no need new TezClientConfig(), because if so, there will be default config and meanless.
So, I changed the test case to be:TezClientConf(new Configuration()), new Configuration() just like conf from client side.
client-tez/src/main/java/org/apache/tez/common/TezClientConf.java
Outdated
Show resolved
Hide resolved
client-tez/src/main/java/org/apache/tez/common/TezClientConf.java
Outdated
Show resolved
Hide resolved
return loadConfFromHadoopConfig(config, ConfigUtils.getAllConfigOptions(TezClientConf.class)); | ||
} | ||
|
||
public RssConf toRssConf() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TezClientConf is extended from RssBaseConf, RssBaseConf is extended from RssConf. So is this method still necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is necessary.
public RssConf toRssConf() {
RssConf rssConf = new RssConf();
for (Map.Entry<String, Object> entry : getAll()) {
String key = entry.getKey();
if (!key.startsWith(TEZ_RSS_CONFIG_PREFIX)) {
continue;
}
key = key.substring(TEZ_RSS_CONFIG_PREFIX.length());
rssConf.setString(key, String.valueOf(entry.getValue()));
}
return rssConf;
}
Between the conversion, to RssConf, removed prefex. like:
key = key.substring(TEZ_RSS_CONFIG_PREFIX.length());
rssConf.setString(key, String.valueOf(entry.getValue()));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you!
But I don't know the design of these issue. How about RssClientConfig?
Maybe RssClientConfig should be extended from RssConf or RssBaseConf. RssTezConfig should be extended from RssClientConfig.
The the config key in RssXXXConfig should not be start with "tez", and the config key in hadoop Configuration should be start with "tez". That's my understanding, maybe it is wrong.
What do you think? @lifeSo
@zuston Can you give us some advice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In now design, the config in Tez client, start with "tez.",
similar, the config in Spark client, start with "spark.".
And there is also leading substring in RssSparkConfig:
public static RssConf toRssConf(SparkConf sparkConf) {
RssConf rssConf = new RssConf();
for (Tuple2<String, String> tuple : sparkConf.getAll()) {
String key = tuple._1;
if (!key.startsWith(SPARK_RSS_CONFIG_PREFIX)) {
continue;
}
key = key.substring(SPARK_RSS_CONFIG_PREFIX.length());
rssConf.setString(key, tuple._2);
}
return rssConf;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If some tez config not start with "tez.", it will be confusing because some default tez config is start with 'tez.'.
eg:
@@ -195,15 +190,15 @@ protected FetchResult callInternal() throws Exception { | |||
.hadoopConf(hadoopConf) | |||
.idHelper(new TezIdHelper()) | |||
.expectedTaskIdsBitmapFilterEnable(expectedTaskIdsBitmapFilterEnable) | |||
.rssConf(RssTezConfig.toRssConf(this.conf))); | |||
.rssConf(this.conf.toRssConf())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same with toRssConf. Can we pass this.conf directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think not, the RSS_STORAGE_TYPE in TezClientConf is 'tez.rss.storage.type',
public static final ConfigOption<String> RSS_STORAGE_TYPE =
ConfigOptions.key("tez.rss.storage.type")
.stringType()
.defaultValue("MEMORY_LOCALFILE")
.withDescription("");
But in ShuffleReadClientImpl.java, when get storageType, it call:
final String storageType = builder.getRssConf().get(RssClientConf.RSS_STORAGE_TYPE);
The problem is RssClientConf.RSS_STORAGE_TYPE
is:
public static final ConfigOption<String> RSS_STORAGE_TYPE =
ConfigOptions.key("rss.storage.type")
.stringType()
.defaultValue("")
.withDescription("Supports MEMORY_LOCALFILE, MEMORY_HDFS, MEMORY_LOCALFILE_HDFS");
There is no prefix : 'tez.' .
So, there need the conversion to delete prefix: 'tez'.
It is a great PR! The code about config will be cleaner! |
LGTM. It looks the memory will not consume much more. If I'm wrong, feel free to correct. |
7fc7d6d
to
79a08b2
Compare
I summarized the usage of TezClientConf, and why need the transitions:
|
79a08b2
to
c1bb5bc
Compare
c1bb5bc
to
368e632
Compare
cc @zhengchenyu I'm not familiar with Tez code. |
This PR does not involve the core logic of Tez. This PR is only about how the MR/Tez/Spark client uses configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Emm... I take a look about current impl, I don't think we should take the tez prefix into the config key, you can refer the spark's RssClientConf .
In spark's partial configOption, we just extract the spark's conf like spark.rss.client.type into RssConf, that means we could share some general configOption defined in the RssClientConf. So you just only create the TezClientConfig.java, which could define some tez specified configOptions, shareable configs could use defined in RssClientConf. cc @zhengchenyu
@zuston I agree with you! |
What changes were proposed in this pull request?
Improvement: using the configOption in RssTezConfig #1303
Why are the changes needed?
Fix: #1303
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Unit test