Does nghttpx ingress intercept errors? #82

Open
ingridgoh opened this issue Mar 7, 2018 · 4 comments

@ingridgoh

Hello,

I currently have TensorFlow Serving deployed in a container, and I've noticed that when there is any prediction error, the actual error stack is not returned to the client when the nghttpx ingress is used. Here are my observations, followed by a client-side sketch of the difference (everything in the environment is kept constant except for the intermediate ingress):

1. Client Request --> Load Balancer --> Ingress --> Container (TensorFlow Serving)
Observation: The error is obscured from the client; only a generic error message is received.
Error received:
grpc.framework.interfaces.face.face.AbortionError: AbortionError(code=StatusCode.INTERNAL, details="Received RST_STREAM with error code 2")

2. Client Request --> Load Balancer --> Container (TensorFlow Serving)
Observation: The detailed error stack is returned to the client.
Error received:
grpc.framework.interfaces.face.face.AbortionError: AbortionError(code=StatusCode.INVALID_ARGUMENT, details="Matrix size-incompatible: In[0]: [3592,10], In[1]: [3592,10]
[[Node: MatMul = MatMul[T=DT_FLOAT, _output_shapes=[[?,10]], transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_x_0_0, Variable/read)]]")
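
To illustrate the difference from the client side, here is a minimal sketch using the modern grpc API (my actual client uses the older grpc.beta layer; stub and request are assumed to be set up as in the TF-Serving example clients):

import grpc

try:
    response = stub.Predict(request, timeout=120)
except grpc.RpcError as e:
    # Behind the ingress: e.code() is StatusCode.INTERNAL and e.details()
    # only says "Received RST_STREAM with error code 2".
    # Direct to the container: e.code() is StatusCode.INVALID_ARGUMENT and
    # e.details() carries the full TensorFlow error text.
    print(e.code(), e.details())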

Thank you!

@tatsuhiro-t
Contributor

Could you provide a way to reproduce this, for example using https://github.com/tensorflow/serving/tree/master/tensorflow_serving/example ?

@ingridgoh
Author

The error mentioned was a failed inference query against a DNN model. However, you do not need to replicate the exact error I received, since any error thrown by the TF-Serving server results in the "Received RST_STREAM with error code 2" error when the ingress is used. You could take https://github.com/tensorflow/serving/blob/master/tensorflow_serving/example/inception_client.py as an example and tweak the script so that a random matrix is sent in the request instead of an image (please note that I'm a complete novice at this):

e.g.:

import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2

# A deliberately mis-shaped input to trigger a server-side error.
rand_array = np.random.rand(10, 3592)
request = predict_pb2.PredictRequest()
request.model_spec.name = MODEL_NAME  # the name the model was served under
request.model_spec.signature_name = 'predict_images'
request.inputs['inputs'].CopyFrom(
    tf.contrib.util.make_tensor_proto(rand_array, dtype=tf.float32)
)
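
To actually send the request, something like the following should work (a sketch using the current tensorflow_serving gRPC stubs rather than the grpc.beta API from the example; the address is a placeholder for your load balancer or ingress endpoint):

import grpc
from tensorflow_serving.apis import prediction_service_pb2_grpc

channel = grpc.insecure_channel('ingress.example.com:80')  # placeholder address
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
print(stub.Predict(request, 120))  # raises grpc.RpcError on server-side errors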

Here's a simple architectural diagram for my setup:
[image: architecture diagram of the setup]

@tatsuhiro-t
Contributor

I tried to reproduce the issue with the following patch to tensorflow_serving/example/mnist_client.py:

diff --git a/tensorflow_serving/example/mnist_client.py b/tensorflow_serving/example/mnist_client.py
index 947f7c4..93f1e91 100644
--- a/tensorflow_serving/example/mnist_client.py
+++ b/tensorflow_serving/example/mnist_client.py
@@ -146,8 +146,9 @@ def do_inference(hostport, work_dir, concurrency, num_tests):
     request.model_spec.name = 'mnist'
     request.model_spec.signature_name = 'predict_images'
     image, label = test_data_set.next_batch(1)
+    rand_array = numpy.random.rand(10, 3592)
     request.inputs['images'].CopyFrom(
-        tf.contrib.util.make_tensor_proto(image[0], shape=[1, image[0].size]))
+        tf.contrib.util.make_tensor_proto(rand_array, dtype=tf.float32))
     result_counter.throttle()
     result_future = stub.Predict.future(request, 5.0)  # 5 seconds
     result_future.add_done_callback(

But I got the same error message with or without the proxy:

AbortionError(code=StatusCode.INVALID_ARGUMENT, details="Matrix size-incompatible: In[0]: [10,3592], In[1]: [784,10]
[[Node: MatMul = MatMul[T=DT_FLOAT, _output_shapes=[[?,10]], transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_x_0_0, Variable/read)]]")

Which version of the Ingress controller are you using? It is worth trying the latest version.

@ingridgoh
Author

I was using v0.28.0. I've updated the controller to v0.31.0, but the same behaviour still occurs. The following are the exact errors I received:

With ingress:

$ python 1_non_mlpkit_our_data.py
Traceback (most recent call last):
  File "1_non_mlpkit_our_data.py", line 94, in <module>
    print stub.Predict(request, 120)
  File "/Users/setup/virtualenv/lib/python2.7/site-packages/grpc/beta/_client_adaptations.py", line 309, in __call__
    self._request_serializer, self._response_deserializer)
  File "/Users/setup/virtualenv/lib/python2.7/site-packages/grpc/beta/_client_adaptations.py", line 195, in _blocking_unary_unary
    raise _abortion_error(rpc_error_call)
grpc.framework.interfaces.face.face.AbortionError: AbortionError(code=StatusCode.INTERNAL, details="Received RST_STREAM with error code 2")

Without ingress:

$ python 1_non_mlpkit_our_data.py
Traceback (most recent call last):
  File "1_non_mlpkit_our_data.py", line 94, in <module>
    print stub.Predict(request, 120)
  File "/Users/setup/virtualenv/lib/python2.7/site-packages/grpc/beta/_client_adaptations.py", line 309, in __call__
    self._request_serializer, self._response_deserializer)
  File "/Users/setup/virtualenv/lib/python2.7/site-packages/grpc/beta/_client_adaptations.py", line 195, in _blocking_unary_unary
    raise _abortion_error(rpc_error_call)
grpc.framework.interfaces.face.face.AbortionError: AbortionError(code=StatusCode.INVALID_ARGUMENT, details="Matrix size-incompatible: In[0]: [3592,10], In[1]: [3592,10]
	 [[Node: MatMul = MatMul[T=DT_FLOAT, _output_shapes=[[?,10]], transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_x_0_0, Variable/read)]]")
