Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enforce the correct order of responses. Handle non-idempotent requests. #660

Merged
merged 67 commits into from
Mar 3, 2017

Conversation

keshonok
Copy link
Contributor

@keshonok keshonok commented Dec 5, 2016

No description provided.

Copy link
Contributor

@vankoven vankoven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have couple of questions needed to be clarified.

My IDE showed a bunch of errors in the patch, like as comparing signed to unsigned, shortening 64 bit types to 32 and dropping const qualifier. I marked that lines, but there still can be more of them.

@@ -81,7 +81,8 @@ static struct {

TfwHttpReq *req;
TfwHttpResp *resp;
TfwConnection connection;
TfwConnection conn_req;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alignment with spaces instead of tabs

if (unlikely(tfw_connection_restricted(ch->conn))
|| unlikely(tfw_server_queue_full(ch->conn))
|| unlikely(!tfw_connection_live(ch->conn)))
continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tfw_sched_hash_get_srv_conn() can schedule message to connections with non-idempotent requests. Is tfw_connection_hasnip() check missed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The presence of non-idempotent requests in a forwarding queue does not prohibit sending though the connection. It's just it may be a little while before this particular message is sent out, as Tempesta has to wait for responses to non-idempotent requests in the queue before forwarding more requests.

The goal for a hash scheduler is to always send over the same hash-bound connection. So how do you suggest we skip connections with non-idempotent requests in them? Do you want the algorithm to "wait out" looping in hopes that the resulting connection gets available?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, i got it, you right. Speeding up by skipping connections with non-idempotent requests in them is not a priority for hash scheduler. Didn't think in that way.

continue;
if (skipnip && tfw_connection_hasnip(conn)) {
if (likely(tfw_connection_live(conn)))
nipconn++;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just save first connection with non-idempotent requests and return it if better candidate is not found? That won't require to rerun the whole cycle.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The good question and the answer should be commented for the function. RR scheduler must be the fastest one. Usually the optimistic approach gives us the fastest solution: we're optimistic in assumption that there are not too many non-idempotent request or there are enough server connections. So we can save two CMPXCHG instructions on getting and putting the saved connection in quick path.

int c, s, i;
TfwConnection *conn;
unsigned long idx;
int c, s, skipnip = 1, nipconn = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a bug here due to limitation of possible values, but c and s should be type of size_t, since they are compared to conn_n and srv_n.

BUG_ON(!loc);
BUG_ON(!arg);

for (i = 0; i < loc->nipdef_sz; ++i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-Wsing-compare: warning: comparison of integers of different signs: 'int' and 'unsigned int'

};

/* Mappings for HTTP request methods. */
static const TfwCfgEnum const __read_mostly tfw_method_enum[] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated const qualifiers.

@@ -734,7 +971,14 @@ static TfwCfgSpec tfw_location_specs[] = {
.allow_repeat = true,
.cleanup = tfw_cleanup_locache
},
{}
{
"nonidempotent", NULL,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New configuration option is added, but readme and configuration example do not mention it

@@ -734,7 +971,14 @@ static TfwCfgSpec tfw_location_specs[] = {
.allow_repeat = true,
.cleanup = tfw_cleanup_locache
},
{}
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alignment with spaces, not tabs


/* The method: one of GET, PUT, POST, etc. in form of a bitmask. */
in_method = (char *)ce->vals[0];
ret = tfw_cfg_map_enum(tfw_method_enum, in_method, &method);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the RFC 7231 4.2:

Of the request methods defined by this specification, the GET, HEAD, OPTIONS, and 
TRACE methods are defined to be safe.
...
Of the request methods defined by this specification, PUT, DELETE, and safe request
methods are idempotent.

Maybe we should add new enum with methods that could be non-idempotent?

* Forwarding to a server is considered to be on hold after
* a non-idempotent request is forwarded to the server. The hold
* is removed when the holding non-idempotent request is followed
* by another request from the same client, which enables pipelining.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that suits RFC 7230 6.3.2?

A user agent SHOULD NOT pipeline requests after a non-idempotent method, 
until the final response status code for that method has been received, unless 
the user agent has a means to detect and recover from partial failure conditions 
involving the pipelined sequence.

As i got it, we cannot remove hold untill request was answered

Copy link
Contributor Author

@keshonok keshonok Dec 23, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. As it's said in the cited excerpt from the RFC, a user agent should not pipeline after a non-idempotent method. However, if a user agent has a means to detect and recover from from partial failure conditions involving the pipelined sequence, then it may still pipeline requests after a non-idempotent method. So if another message is received from a client before a response comes to a non-idempotent request, that means that the client knows what it's doing and can recover from failure if necessary. Effectively, that removes the requirement of waiting for a response to a non-idempotent request and makes it possible to continue the pipelining.

.allow_repeat = false,
.cleanup = tfw_clean_srv_groups,
},
{ 0 }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add description of the options to README or better create a server configuration page at the Wiki and put description of the fields there.

.allow_none = true,
.allow_repeat = true,
.cleanup = tfw_cleanup_locache
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please add description of the configuration option to README/Wiki

* FIXME: The limit on the number of reconnect attempts is used
* to re-schedule requests that would never be forwarded otherwise.
* Still, attempts to reconnect may be continued in hopes that the
* connection will be established sooner or later. Otherwise thei
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"thei" -> "the"?

*/
static const unsigned long timeouts[] = { 1, 10, 100, 250, 500, 1000 };

if (unlikely(srv_conn->max_attempts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal, but '0' treated as 'unlimited' is useful for a user, but it's better for computer to initialize max_attempts to ~0 such that you don't need extra check for zero.

struct timer_list timer;
TfwMsg *msg;
TfwMsg *msg_sent; /*srv*/
TfwMsg *msg_resent; /*srv*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as you have to comment the members as srv, the better place for them in TfwSrvConnection. Otherwise you just break classes hierarchy: TfwConnection is generic class handling members common for server and client connections. Also keep in mind that we can handle millions of client connections, so it's very undesired to extend TfwConnection, while we hare relatively small number of server connections.

*/
static TfwConnection *
tfw_sched_rr_get_srv_conn(TfwMsg *msg, TfwSrvGroup *sg)
{
int c, s, i;
TfwConnection *conn;
unsigned long idx;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't rename simple and widespread name to something more complex and long. i is widespread identifier for index and is frequently used as a loop iterator.

continue;
if (skipnip && tfw_connection_hasnip(conn)) {
if (likely(tfw_connection_live(conn)))
nipconn++;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The good question and the answer should be commented for the function. RR scheduler must be the fastest one. Usually the optimistic approach gives us the fastest solution: we're optimistic in assumption that there are not too many non-idempotent request or there are enough server connections. So we can save two CMPXCHG instructions on getting and putting the saved connection in quick path.

* Initially, connections with non-idempotent requests are also skipped
* in attempt to increase throughput. However, if all live connections
* contain non-idempotent requests, then re-run the algorithm and get
* the first live connection as it is usually done.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is non trivial thing. Please add this to README/Wiki as unique property of RR scheduler.

@@ -433,7 +433,7 @@ ss_droplink(struct sock *sk)
int
__ss_close(struct sock *sk, int flags)
{
if (unlikely(!sk))
if (unlikely(!(sk && ss_sock_live(sk))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we close for example half-open or half-closed sockets? I believe we should be able to close such sockets.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal here is to avoid calling close() on opposite courses. One is a close() that comes from Tempesta, and the other is a close() that comes from the other side of the connection. Both would call connection_drop(), and that should not be happening. So if the socket that we want to close is not in established state, then it must be that it's being closed already (apparently on a close request from the other side).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TCP socket can normally operate in half-closed state for relatively long time. Meantime, the issue of concurrent closing from both sides should be resolved by proper synchronization and reference counting. I'm working on it in context of #116 and #254 (https://github.com/tempesta-tech/tempesta/tree/ak-shutdown).

TFW_ERR("conn_init() hook returned error\n");
return r;
}

/* Let schedulers use the connection hereafter. */
tfw_connection_revive(conn);

/* Repair the connection is necessary. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"is" -> "if"

Copy link
Contributor

@krizhanovsky krizhanovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the QoS logic as in the comment.

* scheduling work flow. When the first request in a session is
* scheduled by the generic logic, TfwSession->srv_conn must be
* initialized to point at the appropriate TfwConnection, so that
* all subsequent session hits are scheduled much faster.
*/
srv_conn = tfw_sched_get_srv_conn((TfwMsg *)req);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider this article about QoS. It has sense to add new configuration option for server group which specifies the queue size. If all the server connections are busy (tfw_server_queue_full() returns true) we should put a request to the new queue of configured size. If srv_conn->qsize decrements, then we can schedule next request from head of the new queue. If the new queue is full, then just do as currently - send error response to a client (however, it should be 50x response instead of 40x response.

Also please add sizes of introduced queues (and maybe some existing queues) to /proc/tempesta/perfstat. The queue sizes are very important performance characteristics.

* @seq_queue - queue of client's messages in the order they came;
* @fwd_qlock - lock for accessing @fwd_queue and @nip_queue;
* @seq_qlock - lock for accessing @seq_queue;
* @ret_qlock - lock for accessing @ret_queue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems outdated comment: there is no ret_queue.

tfw_http_send_500(req);
TFW_INC_STAT_BH(clnt.msgs_otherr);
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say that code complexity grows significantly with the patch. Maybe we win something from the optimization for the one message case, but lets rather keep the code simpler. There are a lot of lists and locks and I think we should pay more attention to optimization of normal work flow: now it seems very complex and probably we find a way to optimize it. Hopefully it'd be easier with smaller and simpler code.

* from the client in the @seq_queue will be handled at
* the time the client connection is released.
*/
if (!tfw_connection_live(cli_conn)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to do this at later state https://github.com/tempesta-tech/tempesta/blob/ak-shutdown/tempesta_fw/sock.c#L282 , leaving more probability that the socket is alive before we enqueue it in ss work queue.

@@ -433,7 +433,7 @@ ss_droplink(struct sock *sk)
int
__ss_close(struct sock *sk, int flags)
{
if (unlikely(!sk))
if (unlikely(!(sk && ss_sock_live(sk))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TCP socket can normally operate in half-closed state for relatively long time. Meantime, the issue of concurrent closing from both sides should be resolved by proper synchronization and reference counting. I'm working on it in context of #116 and #254 (https://github.com/tempesta-tech/tempesta/tree/ak-shutdown).

Limit the number of reconnect attempts, and use the initial timeout
between the attempts. The timeout still grows exponentially as the
number of attempts increases.
The "on hold" state is now derived on the fly as it can be dynamic.
If another request comes from a client after an non-idempotent
request, then the client knows what it is doing, and these requests
can be pipelined. In that case remove the flag of non-idempotency
from the preceding request.
Non-idempotent requests make up an internal @nip_queue within the
server's fwd_queue. @nipcnt keeps the count of those requests in
@nip_queue that can be used by schedulers when making decision.
Special care is taken for the case where a non-idempotent request
is followed by another requests, which re-enables pipelining of
those messages.
This commit implements the logic of re-sending requests that were
in the server connection's queue when the connection failed. When
the connection is restored, it's not scheduled until all requests
in the forwarding queue are re-sent or sent to the server.
Copy link
Contributor

@krizhanovsky krizhanovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please DO EXHAUSTIVE STRESS TESTING AT THE TESTBED, in particlular #383 must be tested with different configurations and load generators.

@@ -528,14 +528,14 @@ static inline void
__tfw_apm_state_next(TfwPcntRanges *rng, TfwApmRBEState *st)
{
int i = st->i, r, b;
unsigned short rtime;
unsigned short rtt;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"rtt" means "round trip time"? :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, indeed it's "round trip time".

@@ -935,24 +935,24 @@ tfw_cache_purge_method(TfwHttpReq *req)

/* Deny PURGE requests by default. */
if (!(cache_cfg.cache && vhost->cache_purge && vhost->cache_purge_acl))
return tfw_http_send_403(req, "unconfigured purge request");
return tfw_http_send_403(req);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like an inaccurate merge. We need the reason for requests blocking, see #658

@@ -88,6 +88,7 @@
#include <linux/kernel.h>
#include <linux/moduleparam.h>
#include <linux/vmalloc.h>
#include <net/net_namespace.h> /* for sysctl */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like inaccurate merge - now sysctls are in main.c. I also noticed that cfg.c has several comment with sysctl - they are also good to delete, but I think it's better to do in #51.

#define tfw_conn_hook_call(proto, c, f, ...) \
conn_hooks[proto]->f ? conn_hooks[proto]->f(c, ## __VA_ARGS__) : 0
#define TFW_CONN_HOOK_CALL(c, f...) \
tfw_conn_hook_call(TFW_CONN_TYPE2IDX(TFW_CONN_TYPE(c)), c, f)

/*
* Tell if a server connection connection is restricted. A restricted
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double "connection"

tfw_cli_conn_get(TfwCliConn *cli_conn)
{
tfw_connection_get((TfwConn *)cli_conn);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed empty line

if (atomic_read(&conn->refcnt) == TFW_CONN_DEATHCNT)
tfw_connection_release(conn);
else
ret = ss_close_sync(conn->sk, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, looks good for me.

@@ -4,25 +4,33 @@

# TAG: sched.
#
# Specifies the scheduler used to distribute load among servers
# Specifies the scheduler used to distribute the load among servers within
# a group.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space at the end of the line.

* @refcnt - number of users of the connection structure instance;
* @timer - The keep-alive/retry timer for the connection;
* @msg - message that is currently being processed;
* @peer - TfwClient or TfwServer handler;
* @sk - an appropriate sock handler;
* @destructor - called when a connection is destroyed;
* @forward - called when a request is forwarded to server;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no @forward member in the definition.

}

/*
* Put @req on the list of non-idempotent requests in @srv_conn.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space at the end of the line

*/
static inline bool
tfw_srv_conn_need_resched(TfwSrvConn *srv_conn)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many spaces at the end of line

"an available back end server";
static const char *s_reason_req_common = "request dropped: processing error";
static const char *s_reason_resp_common = "response dropped: processing error";
static const char *s_reason_resp_filter = "response dropped: filtered out";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We define constants in capital case. Secondly, do we really need the constants? Many of them are used only once and basically there is no strict requirements to print exactly the same error messages. Using the constants without visible need just makes the code harder to read since now we have to jump from tfw_http_send_*() to definitions of the strings.

@@ -523,7 +540,7 @@ tfw_http_req_move2equeue(TfwSrvConn *srv_conn, TfwHttpReq *req,
* request and then sent to the client in proper seq order.
*/
static void
tfw_http_req_zap_error(struct list_head *equeue)
tfw_http_req_zap_error(struct list_head *equeue, const char *source)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function is always called with s_source_proxy as source. We've already discussed that at this iteration we do not propagate the error strings to upper layers. Typically it's better to show bit vague error message, rather than mangle whole the code with additional function arguments. The alternate approach is to keep source in some global state variable. I.e. if we're entering cache workflow, then we assign "proxy cache" to it, but it we forward a message, then we use "proxy forward". Thus please back to original version of the code, when tfw_http_send_*() calls only are aware about error strings.

tfw_http_req_init_ss_flags(TfwSrvConn *srv_conn, TfwHttpReq *req)
{
TfwSrvGroup *sg = ((TfwServer *)(srv_conn->peer))->sg;
if (tfw_cache_msg_cacheable(req) || (req->retries < sg->max_refwd))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is no need to optimize this rare case and we can just always copy skb. It's bad, but we'll address this in #391 and #488.

- Eliminate the "source" part of an error message. Not needed now.
- Eliminate static global constant strings. Use strings in place.
- Rename tfw_http_req_move2equeue() to a shorter tfw_http_req_error().
- tfw_http_req_init_ss_flags() now set to copy all SKBs. Make a TODO.
- Better comment to tfw_http_conn_evict_timeout().
@krizhanovsky
Copy link
Contributor

Good to merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants