Skip to content
This repository has been archived by the owner on Sep 26, 2023. It is now read-only.

Track number of failure in batching when Batcher#close is called #800

Merged
merged 11 commits into from Nov 4, 2019

Conversation

rahulKQL
Copy link
Contributor

@rahulKQL rahulKQL commented Oct 9, 2019

This change will enable Batcher to throw BatchingException when the user will close the batching connection. BatcherImpl maintains total numOfFailure, count of ExceptionTypes and count of StatusCode in case ApiException has occurred.

/ cc: @igorbernstein2 , @kolea2

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Oct 9, 2019
@codecov
Copy link

codecov bot commented Oct 9, 2019

Codecov Report

Merging #800 into master will increase coverage by 0.35%.
The diff coverage is 97.75%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #800      +/-   ##
============================================
+ Coverage     78.41%   78.76%   +0.35%     
- Complexity     1105     1126      +21     
============================================
  Files           198      200       +2     
  Lines          4897     4983      +86     
  Branches        385      398      +13     
============================================
+ Hits           3840     3925      +85     
- Misses          887      888       +1     
  Partials        170      170
Impacted Files Coverage Δ Complexity Δ
...com/google/api/gax/batching/BatchingException.java 100% <100%> (ø) 1 <1> (?)
.../java/com/google/api/gax/batching/BatcherImpl.java 98% <100%> (+0.79%) 16 <0> (+2) ⬆️
...java/com/google/api/gax/batching/BatcherStats.java 97.4% <97.4%> (ø) 18 <18> (?)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2e46e4f...60705f7. Read the comment docs.

Now BatchStats will keep the counter for each type of exception happened at RPC as well as ElementT/entry object level.

Also refactored exception message to be more detailed.
@rahulKQL
Copy link
Contributor Author

rahulKQL commented Oct 21, 2019

@igorbernstein2 Thanks again for explaining this to me.

I believe It would be quite hard to distinguish if the received entry failure for recording is just partially failed or complete batch has been failed. According to current change, If a single batch fails then it will increment partial failure error count for all the elements present in that batch.

With the latest commit, Now we are logging partial failures only if the RPC is successful(If not then we are providing only batch failure cause & count).

Please have a look and let me know your thoughts.

@igorbernstein2
Copy link
Contributor

I think we can simplify this a bit more:

  • lets remove the generic ApiFunctions from BatchStats
  • add 2 methods:
    • recordBatchElementsCompletion(List batchElementResultFutures)
    • recordBatchFailure(Throwable)
  • Batch#onBatchFailure can invoke recordBatchFailure() and record it as a full batch failure
  • Batch#onSuccess can invoke recordBatchElementsCompletion, which can iterate over the futures. If any failed the partial batch failures I incremented once per list and the entry failure is incremented once per failed entry

Addressed feedback comments to simplify BatcherStats
@@ -176,6 +177,7 @@ public void onSuccess(ResponseT response) {
@Override
public void onFailure(Throwable throwable) {
try {
batcherStats.recordBatchFailure(throwable);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we push this down into Batch to make it symmetrical with onBatchSuccess?

Copy link
Contributor Author

@rahulKQL rahulKQL Oct 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually tried with the same approach, but If some exception occurs while post-processing in splitResponse() then we would end up with 1 batch and n partial failures even though some elements in the batch could be successful.

void onBatchSuccess(ResponseT response) {
try {
descriptor.splitResponse(response, results);
} catch (Exception ex) {
onBatchFailure(ex);
}
}
void onBatchFailure(Throwable throwable) {
try {
descriptor.splitException(throwable, results);
} catch (Exception ex) {
for (SettableApiFuture<ElementResultT> result : results) {
result.setException(ex);
}
}
}

I thought that would be a misleading figure, that why I kept the BatcherStats#recordBatchFailure here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to keep the code symmetrical and then have a follow up PR that handles failures from splitResponse

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion, Updated this change with symmetrical code and added a TODO as well for splitResponse().

final AtomicBoolean markBatchFailure = new AtomicBoolean();

for (ApiFuture<T> elementResult : batchElementResultFutures) {
ApiFutures.addCallback(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think try/catch would be easier to read:

for (ApiFuture<T> elementFuture : batchElementResultFutures) {
  try {
    elementFuture.get();
  } catch (Throwable t) {
    ....
  }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also remove the atomic above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood correctly, you meant to block and check if any entry is being failed after splitResponse() is executed. I believe going with this would create a lot of blocking time until each entry in the batch finishes applying the mutation. If possible I would like to avoid that.

I hope after the last commit, this method looks better, Can you please take a look and let me know your thoughts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the element results should be resolved at this point, so nothing should block. If an entry is not resolved at this point then it never will be. This would be due to a bug in the splitResponse. We can address this issue in a follow up PR that would iterate over all of the element results and sets an IllegalStateException for unresolved entries

@@ -258,6 +268,7 @@ void onBatchSuccess(ResponseT response) {

void onBatchFailure(Throwable throwable) {
try {
batcherStats.recordBatchFailure(throwable);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should come after splitException

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shifted batcherStats.recordBatchFailure(throwable); outside try/catch as we would be record failure in either of the case. I think Batcher#recordBatchFailure is just keeping a track of element so it is no needed to the inside try block.

- Refactored record methods in BatcherImpl.Batch.
- Removed callback and implemented try/catch for entry failures.
- Made all three default method synchronized and removed lock as all the content of method needed to be inside lock.
- Fixed test case and added a todo for followUp PR.
Copy link
Contributor

@igorbernstein2 igorbernstein2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, except for a minor nit

@vam-google or @kolea2 can you take a pass?

@igorbernstein2 igorbernstein2 added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Oct 29, 2019
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Oct 29, 2019
Code code = ((ApiException) actualCause).getStatusCode().getCode();
exceptionClass = ApiException.class;

int oldExceptionCount = MoreObjects.firstNonNull(entryStatusCounts.get(code), 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be oldStatusCount?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this, I have updated it now.

@rahulKQL rahulKQL requested a review from kolea2 October 29, 2019 15:31
@rahulKQL rahulKQL added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Oct 29, 2019
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Oct 29, 2019

if (ApiException.class.equals(request.getKey())) {
keyValue.append("(");
Iterator<Map.Entry<Code, Integer>> statusItrator = statusCounts.entrySet().iterator();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: statusIterator

*/
private String printKeyValue(
Map<Class, Integer> exceptionCounts, Map<Code, Integer> statusCounts) {
StringBuilder keyValue = new StringBuilder();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming is hard, but keyValue doesn't really describe this variable. Maybe messageBuilder?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out, Have updated the same.

@igorbernstein2 igorbernstein2 merged commit 782f36f into googleapis:master Nov 4, 2019
@kolea2 kolea2 mentioned this pull request Nov 6, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants