All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] libceph: send osd requests in tid order
@ 2013-03-26  2:25 Alex Elder
  2013-03-26  2:27 ` [PATCH 1/6] libceph: slightly defer registering osd request Alex Elder
                   ` (6 more replies)
  0 siblings, 7 replies; 10+ messages in thread
From: Alex Elder @ 2013-03-26  2:25 UTC (permalink / raw)
  To: ceph-devel

This series rearranges the way osd requests are placed onto an
osd client's unsent request list so that they are kept in
order based on their transaction ids.  The osd expects its
requests from a client to have monotonically increasing tids.
Since requests are sent to the osd in the order they are on
the unsent list if we keep them sorted there this property
is preserved.

					-Alex

[PATCH 1/6] libceph: slightly defer registering osd request
[PATCH 2/6] libceph: no more kick_requests() race
[PATCH 3/6] libceph: prepend requests in order when kicking
[PATCH 4/6] libceph: keep request lists in tid order
[PATCH 5/6] libceph: send queued requests when starting new one
[PATCH 6/6] libceph: verify requests queued in order

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/6] libceph: slightly defer registering osd request
  2013-03-26  2:25 [PATCH 0/6] libceph: send osd requests in tid order Alex Elder
@ 2013-03-26  2:27 ` Alex Elder
  2013-03-26  2:27 ` [PATCH 2/6] libceph: no more kick_requests() race Alex Elder
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Alex Elder @ 2013-03-26  2:27 UTC (permalink / raw)
  To: ceph-devel

One of the first things ceph_osdc_start_request() does is register
the request.  It then acquires the osd client's map semaphore and
request mutex and proceeds to map and send the request.

There is no reason the request has to be registered before acquiring
the map semaphore.  So hold off doing so until after the map
semaphore is held.

Since register_request() is nothing more than a wrapper around
__register_request(), call the latter function instead, after
acquiring the request mutex.

That leaves register_request() unused, so get rid of it.

This partially resolves:
    http://tracker.ceph.com/issues/4392

Signed-off-by: Alex Elder <elder@inktank.com>
---
 net/ceph/osd_client.c |   11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 7041906..f9276cb 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -831,14 +831,6 @@ static void __register_request(struct
ceph_osd_client *osdc,
 	}
 }

-static void register_request(struct ceph_osd_client *osdc,
-			     struct ceph_osd_request *req)
-{
-	mutex_lock(&osdc->request_mutex);
-	__register_request(osdc, req);
-	mutex_unlock(&osdc->request_mutex);
-}
-
 /*
  * called under osdc->request_mutex
  */
@@ -1785,8 +1777,6 @@ int ceph_osdc_start_request(struct ceph_osd_client
*osdc,
 	ceph_osdc_msg_data_set(req->r_reply, &req->r_data_in);
 	ceph_osdc_msg_data_set(req->r_request, &req->r_data_out);

-	register_request(osdc, req);
-
 	down_read(&osdc->map_sem);
 	mutex_lock(&osdc->request_mutex);
 	/*
@@ -1794,6 +1784,7 @@ int ceph_osdc_start_request(struct ceph_osd_client
*osdc,
 	 * while we dropped request_mutex above, so only send now if
 	 * the request still han't been touched yet.
 	 */
+	__register_request(osdc, req);
 	if (req->r_sent == 0) {
 		rc = __map_request(osdc, req, 0);
 		if (rc < 0) {
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/6] libceph: no more kick_requests() race
  2013-03-26  2:25 [PATCH 0/6] libceph: send osd requests in tid order Alex Elder
  2013-03-26  2:27 ` [PATCH 1/6] libceph: slightly defer registering osd request Alex Elder
@ 2013-03-26  2:27 ` Alex Elder
  2013-03-26  2:27 ` [PATCH 3/6] libceph: prepend requests in order when kicking Alex Elder
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Alex Elder @ 2013-03-26  2:27 UTC (permalink / raw)
  To: ceph-devel

Since we no longer drop the request mutex between registering and
mapping an osd request in ceph_osdc_start_request(), there is no
chance of a race with kick_requests().

We can now therefore map and send the new request unconditionally
(but we'll issue a warning should it ever occur).

Signed-off-by: Alex Elder <elder@inktank.com>
---
 net/ceph/osd_client.c |   37 +++++++++++++++----------------------
 1 file changed, 15 insertions(+), 22 deletions(-)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index f9276cb..3723a7f 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -1779,31 +1779,24 @@ int ceph_osdc_start_request(struct
ceph_osd_client *osdc,

 	down_read(&osdc->map_sem);
 	mutex_lock(&osdc->request_mutex);
-	/*
-	 * a racing kick_requests() may have sent the message for us
-	 * while we dropped request_mutex above, so only send now if
-	 * the request still han't been touched yet.
-	 */
 	__register_request(osdc, req);
-	if (req->r_sent == 0) {
-		rc = __map_request(osdc, req, 0);
-		if (rc < 0) {
-			if (nofail) {
-				dout("osdc_start_request failed map, "
-				     " will retry %lld\n", req->r_tid);
-				rc = 0;
-			}
-			goto out_unlock;
-		}
-		if (req->r_osd == NULL) {
-			dout("send_request %p no up osds in pg\n", req);
-			ceph_monc_request_next_osdmap(&osdc->client->monc);
-		} else {
-			__send_request(osdc, req);
+	WARN_ON(req->r_sent);
+	rc = __map_request(osdc, req, 0);
+	if (rc < 0) {
+		if (nofail) {
+			dout("osdc_start_request failed map, "
+				" will retry %lld\n", req->r_tid);
+			rc = 0;
 		}
-		rc = 0;
+		goto out_unlock;
 	}
-
+	if (req->r_osd == NULL) {
+		dout("send_request %p no up osds in pg\n", req);
+		ceph_monc_request_next_osdmap(&osdc->client->monc);
+	} else {
+		__send_request(osdc, req);
+	}
+	rc = 0;
 out_unlock:
 	mutex_unlock(&osdc->request_mutex);
 	up_read(&osdc->map_sem);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/6] libceph: prepend requests in order when kicking
  2013-03-26  2:25 [PATCH 0/6] libceph: send osd requests in tid order Alex Elder
  2013-03-26  2:27 ` [PATCH 1/6] libceph: slightly defer registering osd request Alex Elder
  2013-03-26  2:27 ` [PATCH 2/6] libceph: no more kick_requests() race Alex Elder
@ 2013-03-26  2:27 ` Alex Elder
  2013-03-26 14:50   ` [PATCH 3/6, v2] libceph: requeue only sent requests " Alex Elder
  2013-03-26  2:27 ` [PATCH 4/6] libceph: keep request lists in tid order Alex Elder
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 10+ messages in thread
From: Alex Elder @ 2013-03-26  2:27 UTC (permalink / raw)
  To: ceph-devel

The osd expects incoming requests from a given client to arrive in
order, with the tid for each request being greater than the tid for
requests that have already arrived.  This patch fixes one place
the osd client might not maintain that ordering.

For the osd client, the connection fault method is osd_reset().
That function calls __reset_osd() to close and re-open the
connection, then calls __kick_osd_requests() to cause all
outstanding requests for the affected osd to be re-sent after
the connection has been re-established.

When an osd is reset, both in-flight and unsent messages will need
to be re-sent.  An osd client maintains distinct lists for unsent
and in-flight messages.  Meanwhile, an osd maintains a single list
of call its requests (both sent and un-sent).  (Each message is
linked into two lists--one for the osd client and one list for the
osd.)

To process an osd "kick" operation, the osd's request list is
traversed, and each request is moved off whichever osd *client* list
it was on (unsent or sent) and placed onto the osd client's unsent
list.  (It remains where it is on the osd's request list.)

When that is done, osd_reset() calls __send_queued() to cause each
of the osd client's unsent messages to be sent.

OK, with that background...

As the osd request list is traversed each request is prepended to
the osd client's unsent list in the order they're seen.  The effect
of this is to reverse the order of these requests as they are put
(back) onto the unsent list.

Instead, traverse the osd request list in reverse, so their order is
preserved when prepending them to the unsent list.  We still want
to prepend these requests, because they will have lower tids than
any previously-sent request.

Just below that, traverse the linger list in forward order as
before, but add them to the *tail* of the list rather than the head.
These requests get re-registered, and in the process are give a new
(higher) tid, so the should go at the end.

This partially resolves:
    http://tracker.ceph.com/issues/4392

Signed-off-by: Alex Elder <elder@inktank.com>
---
 net/ceph/osd_client.c |   16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 3723a7f..707d632 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -577,7 +577,14 @@ static void __kick_osd_requests(struct
ceph_osd_client *osdc,
 	if (err)
 		return;

-	list_for_each_entry(req, &osd->o_requests, r_osd_item) {
+	/*
+	 * Traverse the osd's list of requests in reverse, moving
+	 * each entry from whatever osd client list it's on (unsent
+	 * or in-flight/lru) to the front of the osd client's unsent
+	 * list.  When we're done all the osd's requests will all be
+	 * in the osd client unsent list in increasing order of tid.
+	 */
+	list_for_each_entry_reverse(req, &osd->o_requests, r_osd_item) {
 		list_move(&req->r_req_lru_item, &osdc->req_unsent);
 		dout("requeued %p tid %llu osd%d\n", req, req->r_tid,
 		     osd->o_osd);
@@ -585,6 +592,11 @@ static void __kick_osd_requests(struct
ceph_osd_client *osdc,
 			req->r_flags |= CEPH_OSD_FLAG_RETRY;
 	}

+	/*
+	 * Linger requests are re-registered before sending, which
+	 * sets up a new tid for each.  We add them to the unsent
+	 * list at the end to keep things in tid order.
+	 */
 	list_for_each_entry_safe(req, nreq, &osd->o_linger_requests,
 				 r_linger_osd) {
 		/*
@@ -593,7 +605,7 @@ static void __kick_osd_requests(struct
ceph_osd_client *osdc,
 		 */
 		BUG_ON(!list_empty(&req->r_req_lru_item));
 		__register_request(osdc, req);
-		list_add(&req->r_req_lru_item, &osdc->req_unsent);
+		list_add_tail(&req->r_req_lru_item, &osdc->req_unsent);
 		list_add(&req->r_osd_item, &req->r_osd->o_requests);
 		__unregister_linger_request(osdc, req);
 		dout("requeued lingering %p tid %llu osd%d\n", req, req->r_tid,
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/6] libceph: keep request lists in tid order
  2013-03-26  2:25 [PATCH 0/6] libceph: send osd requests in tid order Alex Elder
                   ` (2 preceding siblings ...)
  2013-03-26  2:27 ` [PATCH 3/6] libceph: prepend requests in order when kicking Alex Elder
@ 2013-03-26  2:27 ` Alex Elder
  2013-03-26  2:28 ` [PATCH 5/6] libceph: send queued requests when starting new one Alex Elder
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Alex Elder @ 2013-03-26  2:27 UTC (permalink / raw)
  To: ceph-devel

In __map_request(), when adding a request to an osd client's unsent
list, add it to the tail rather than the head.  That way the newest
entries (with the highest tid value) will be last.

Maintain an osd's request list in order of increasing tid also.

Finally--to be consistent--maintain an osd client's "notarget" list
in that order as well.

This partially resolves:
    http://tracker.ceph.com/issues/4392

Signed-off-by: Alex Elder <elder@inktank.com>
---
 net/ceph/osd_client.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 707d632..202b9cd 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -606,7 +606,7 @@ static void __kick_osd_requests(struct
ceph_osd_client *osdc,
 		BUG_ON(!list_empty(&req->r_req_lru_item));
 		__register_request(osdc, req);
 		list_add_tail(&req->r_req_lru_item, &osdc->req_unsent);
-		list_add(&req->r_osd_item, &req->r_osd->o_requests);
+		list_add_tail(&req->r_osd_item, &req->r_osd->o_requests);
 		__unregister_linger_request(osdc, req);
 		dout("requeued lingering %p tid %llu osd%d\n", req, req->r_tid,
 		     osd->o_osd);
@@ -1022,10 +1022,10 @@ static int __map_request(struct ceph_osd_client
*osdc,

 	if (req->r_osd) {
 		__remove_osd_from_lru(req->r_osd);
-		list_add(&req->r_osd_item, &req->r_osd->o_requests);
-		list_move(&req->r_req_lru_item, &osdc->req_unsent);
+		list_add_tail(&req->r_osd_item, &req->r_osd->o_requests);
+		list_move_tail(&req->r_req_lru_item, &osdc->req_unsent);
 	} else {
-		list_move(&req->r_req_lru_item, &osdc->req_notarget);
+		list_move_tail(&req->r_req_lru_item, &osdc->req_notarget);
 	}
 	err = 1;   /* osd or pg changed */

-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 5/6] libceph: send queued requests when starting new one
  2013-03-26  2:25 [PATCH 0/6] libceph: send osd requests in tid order Alex Elder
                   ` (3 preceding siblings ...)
  2013-03-26  2:27 ` [PATCH 4/6] libceph: keep request lists in tid order Alex Elder
@ 2013-03-26  2:28 ` Alex Elder
  2013-03-26  2:28 ` [PATCH 6/6] libceph: verify requests queued in order Alex Elder
  2013-03-26 14:49 ` [PATCH 0/6] libceph: send osd requests in tid order Alex Elder
  6 siblings, 0 replies; 10+ messages in thread
From: Alex Elder @ 2013-03-26  2:28 UTC (permalink / raw)
  To: ceph-devel

An osd expects the transaction ids of arriving request messages from
a given client to increase monotonically.  So the osd client needs
to send its requests in ascending tid order.

The transaction id for a request is set at the time it is
registered, in __register_request().  This is also where the request
gets placed at the end of the osd client's unsent messages list.

At the end of ceph_osdc_start_request(), the request message for a
newly-mapped osd request is supplied to the messenger to be sent
(via __send_request()).  If any other messages were present in the
osd client's unsent list at that point they would be sent *after*
this new request message.

Because those unsent messages have already been registered, their
tids would be lower than the newly-mapped request message, and
sending that message first violates the tid ordering rule.

Rather than sending the new request only, send all queued requests
(including the new one) at that point in ceph_osdc_start_request().
This ensures the tid ordering property is preserved.

With this in place, all messages should now be sent in tid order
regardless of whether they're being sent for the first time or
re-sent as a result of a call to osd_reset().

This resolves:
    http://tracker.ceph.com/issues/4392

Signed-off-by: Alex Elder <elder@inktank.com>
---
 net/ceph/osd_client.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 202b9cd..4d98424 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -1806,7 +1806,7 @@ int ceph_osdc_start_request(struct ceph_osd_client
*osdc,
 		dout("send_request %p no up osds in pg\n", req);
 		ceph_monc_request_next_osdmap(&osdc->client->monc);
 	} else {
-		__send_request(osdc, req);
+		__send_queued(osdc);
 	}
 	rc = 0;
 out_unlock:
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 6/6] libceph: verify requests queued in order
  2013-03-26  2:25 [PATCH 0/6] libceph: send osd requests in tid order Alex Elder
                   ` (4 preceding siblings ...)
  2013-03-26  2:28 ` [PATCH 5/6] libceph: send queued requests when starting new one Alex Elder
@ 2013-03-26  2:28 ` Alex Elder
  2013-03-26 14:50   ` [PATCH 6/6, v2] " Alex Elder
  2013-03-26 14:49 ` [PATCH 0/6] libceph: send osd requests in tid order Alex Elder
  6 siblings, 1 reply; 10+ messages in thread
From: Alex Elder @ 2013-03-26  2:28 UTC (permalink / raw)
  To: ceph-devel

Add checking to verify all osd requests for an osd are added to the
queue in order of increasing tid.

Signed-off-by: Alex Elder <elder@inktank.com>
---
 net/ceph/osd_client.c |   38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 4d98424..b059cf6 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -563,6 +563,41 @@ __lookup_request_ge(struct ceph_osd_client *osdc,
 	return NULL;
 }

+#define list_last_entry(ptr, type, member) \
+	list_entry((ptr)->prev, type, member)
+
+/*
+ * Returns true if it's OK to move the given request to the
+ * the osd client's unsent list.  Called before moving the
+ * request to the beginning of the list (prepend == true) or
+ * to the end  (prepend == * false).
+ */
+static bool osdc_unsent_check(struct ceph_osd_client *osdc,
+			struct ceph_osd_request *req,
+			bool prepend)
+{
+	struct ceph_osd_request *list_req;
+	int bad;
+
+	if (list_empty(&osdc->req_unsent))
+		return true;
+
+	if (prepend) {
+		list_req = list_first_entry(&osdc->req_unsent,
+				    struct ceph_osd_request,
+				    r_req_lru_item);
+		bad = WARN_ON(req->r_tid > list_req->r_tid);
+	} else {
+		list_req = list_last_entry(&osdc->req_unsent,
+				    struct ceph_osd_request,
+				    r_req_lru_item);
+		bad = WARN_ON(req->r_tid < list_req->r_tid);
+	}
+	bad = bad || WARN_ON(req->r_tid == list_req->r_tid);
+
+	return !bad;
+}
+
 /*
  * Resubmit requests pending on the given osd.
  */
@@ -585,6 +620,7 @@ static void __kick_osd_requests(struct
ceph_osd_client *osdc,
 	 * in the osd client unsent list in increasing order of tid.
 	 */
 	list_for_each_entry_reverse(req, &osd->o_requests, r_osd_item) {
+		osdc_unsent_check(osdc, req, true);
 		list_move(&req->r_req_lru_item, &osdc->req_unsent);
 		dout("requeued %p tid %llu osd%d\n", req, req->r_tid,
 		     osd->o_osd);
@@ -605,6 +641,7 @@ static void __kick_osd_requests(struct
ceph_osd_client *osdc,
 		 */
 		BUG_ON(!list_empty(&req->r_req_lru_item));
 		__register_request(osdc, req);
+		osdc_unsent_check(osdc, req, false);
 		list_add_tail(&req->r_req_lru_item, &osdc->req_unsent);
 		list_add_tail(&req->r_osd_item, &req->r_osd->o_requests);
 		__unregister_linger_request(osdc, req);
@@ -1023,6 +1060,7 @@ static int __map_request(struct ceph_osd_client *osdc,
 	if (req->r_osd) {
 		__remove_osd_from_lru(req->r_osd);
 		list_add_tail(&req->r_osd_item, &req->r_osd->o_requests);
+		osdc_unsent_check(osdc, req, false);
 		list_move_tail(&req->r_req_lru_item, &osdc->req_unsent);
 	} else {
 		list_move_tail(&req->r_req_lru_item, &osdc->req_notarget);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/6] libceph: send osd requests in tid order
  2013-03-26  2:25 [PATCH 0/6] libceph: send osd requests in tid order Alex Elder
                   ` (5 preceding siblings ...)
  2013-03-26  2:28 ` [PATCH 6/6] libceph: verify requests queued in order Alex Elder
@ 2013-03-26 14:49 ` Alex Elder
  6 siblings, 0 replies; 10+ messages in thread
From: Alex Elder @ 2013-03-26 14:49 UTC (permalink / raw)
  To: ceph-devel

On 03/25/2013 09:25 PM, Alex Elder wrote:
> This series rearranges the way osd requests are placed onto an
> osd client's unsent request list so that they are kept in
> order based on their transaction ids.  The osd expects its
> requests from a client to have monotonically increasing tids.
> Since requests are sent to the osd in the order they are on
> the unsent list if we keep them sorted there this property
> is preserved.
> 
> 					-Alex
> 
> [PATCH 1/6] libceph: slightly defer registering osd request
> [PATCH 2/6] libceph: no more kick_requests() race

I got some early feedback which clarified something for me
and which led me to make a few changes, mostly to the third
patch in this series.

What the osd requires is that the tids for requests to a
given *object* from an osd client be increasing.  I had
previously understood it was all requests from an osd
client (regardless of target object) that needed increasing
tids.  It turns out both properties hold (at least once
this series of patches is applied) so this difference
affects the explanations more than the code.

I also realized that re-queueing unsent requests along
with sent ones in __kick_osd_requests() means that some
unsent requests could be moved ahead of others in the
unsent list--others which have a higher tid.  So now
we only move already-sent requests to the front and
leave any that have not yet been sent where they were.

I'm about to re-post updated versions of these patches
reflecting these changes.  Only patches 3 and 6 were
changed, and they're the only ones I"m going to post.
Patches 4 and 5 were simply rebased for the changes
in patch 3.

These updates patches (along with the others) are available
in branch "review/wip-4392-2" of the ceph-client git repository.

					-Alex

[PATCH 3/6, v2] libceph: requeue only sent requests when kicking
[PATCH 6/6, v2] libceph: verify requests queued in order

> [PATCH 3/6] libceph: prepend requests in order when kicking
> [PATCH 4/6] libceph: keep request lists in tid order
> [PATCH 5/6] libceph: send queued requests when starting new one
> [PATCH 6/6] libceph: verify requests queued in order
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 3/6, v2] libceph: requeue only sent requests when kicking
  2013-03-26  2:27 ` [PATCH 3/6] libceph: prepend requests in order when kicking Alex Elder
@ 2013-03-26 14:50   ` Alex Elder
  0 siblings, 0 replies; 10+ messages in thread
From: Alex Elder @ 2013-03-26 14:50 UTC (permalink / raw)
  To: ceph-devel

The osd expects incoming requests for a given object from a given
client to arrive in order, with the tid for each request being
greater than the tid for requests that have already arrived.  This
patch fixes two places the osd client might not maintain that
ordering.

For the osd client, the connection fault method is osd_reset().
That function calls __reset_osd() to close and re-open the
connection, then calls __kick_osd_requests() to cause all
outstanding requests for the affected osd to be re-sent after
the connection has been re-established.

When an osd is reset, any in-flight messages will need to be
re-sent.  An osd client maintains distinct lists for unsent and
in-flight messages.  Meanwhile, an osd maintains a single list of
all its requests (both sent and un-sent).  (Each message is linked
into two lists--one for the osd client and one list for the osd.)

To process an osd "kick" operation, the request list for the *osd*
is traversed, and each request is moved off whichever osd *client*
list it was on (unsent or sent) and placed onto the osd client's
unsent list.  (It remains where it is on the osd's request list.)

When that is done, osd_reset() calls __send_queued() to cause each
of the osd client's unsent messages to be sent.

OK, with that background...

As the osd request list is traversed each request is prepended to
the osd client's unsent list in the order they're seen.  The effect
of this is to reverse the order of these requests as they are put
(back) onto the unsent list.

Instead, build up a list of only the requests for an osd that have
already been sent (by checking their r_sent flag values).  Once an
unsent request is found, stop examining requests and prepend the
requests that need re-sending to the osd client's unsent list.

Preserve the original order of requests in the process (previously
re-queued requests were reversed in this process).  Because they
have already been sent, they will have lower tids than any request
already present on the unsent list.

Just below that, traverse the linger list in forward order as
before, but add them to the *tail* of the list rather than the head.
These requests get re-registered, and in the process are give a new
(higher) tid, so the should go at the end.

This partially resolves:
    http://tracker.ceph.com/issues/4392

Signed-off-by: Alex Elder <elder@inktank.com>
---
v2: Leave unsent requests in place; only requeue those already sent.

 net/ceph/osd_client.c |   33 +++++++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 3723a7f..8b84fb4 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -570,21 +570,46 @@ static void __kick_osd_requests(struct
ceph_osd_client *osdc,
 				struct ceph_osd *osd)
 {
 	struct ceph_osd_request *req, *nreq;
+	LIST_HEAD(resend);
 	int err;

 	dout("__kick_osd_requests osd%d\n", osd->o_osd);
 	err = __reset_osd(osdc, osd);
 	if (err)
 		return;
-
+	/*
+	 * Build up a list of requests to resend by traversing the
+	 * osd's list of requests.  Requests for a given object are
+	 * sent in tid order, and that is also the order they're
+	 * kept on this list.  Therefore all requests that are in
+	 * flight will be found first, followed by all requests that
+	 * have not yet been sent.  And to resend requests while
+	 * preserving this order we will want to put any sent
+	 * requests back on the front of the osd client's unsent
+	 * list.
+	 *
+	 * So we build a separate ordered list of already-sent
+	 * requests for the affected osd and splice it onto the
+	 * front of the osd client's unsent list.  Once we've seen a
+	 * request that has not yet been sent we're done.  Those
+	 * requests are already sitting right where they belong.
+	 */
 	list_for_each_entry(req, &osd->o_requests, r_osd_item) {
-		list_move(&req->r_req_lru_item, &osdc->req_unsent);
-		dout("requeued %p tid %llu osd%d\n", req, req->r_tid,
+		if (!req->r_sent)
+			break;
+		list_move_tail(&req->r_req_lru_item, &resend);
+		dout("requeueing %p tid %llu osd%d\n", req, req->r_tid,
 		     osd->o_osd);
 		if (!req->r_linger)
 			req->r_flags |= CEPH_OSD_FLAG_RETRY;
 	}
+	list_splice(&resend, &osdc->req_unsent);

+	/*
+	 * Linger requests are re-registered before sending, which
+	 * sets up a new tid for each.  We add them to the unsent
+	 * list at the end to keep things in tid order.
+	 */
 	list_for_each_entry_safe(req, nreq, &osd->o_linger_requests,
 				 r_linger_osd) {
 		/*
@@ -593,7 +618,7 @@ static void __kick_osd_requests(struct
ceph_osd_client *osdc,
 		 */
 		BUG_ON(!list_empty(&req->r_req_lru_item));
 		__register_request(osdc, req);
-		list_add(&req->r_req_lru_item, &osdc->req_unsent);
+		list_add_tail(&req->r_req_lru_item, &osdc->req_unsent);
 		list_add(&req->r_osd_item, &req->r_osd->o_requests);
 		__unregister_linger_request(osdc, req);
 		dout("requeued lingering %p tid %llu osd%d\n", req, req->r_tid,
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 6/6, v2] libceph: verify requests queued in order
  2013-03-26  2:28 ` [PATCH 6/6] libceph: verify requests queued in order Alex Elder
@ 2013-03-26 14:50   ` Alex Elder
  0 siblings, 0 replies; 10+ messages in thread
From: Alex Elder @ 2013-03-26 14:50 UTC (permalink / raw)
  To: ceph-devel

Add checking to verify all osd requests for an osd are added to the
queue in order of increasing tid.

Signed-off-by: Alex Elder <elder@inktank.com>
---
v2: Only check the last of the requests queued to resend.

 net/ceph/osd_client.c |   44 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 3b6657f..943400c 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -563,6 +563,41 @@ __lookup_request_ge(struct ceph_osd_client *osdc,
 	return NULL;
 }

+#define list_last_entry(ptr, type, member) \
+	list_entry((ptr)->prev, type, member)
+
+/*
+ * Returns true if it's OK to move the given request to the
+ * the osd client's unsent list.  Called before moving the
+ * request to the beginning of the list (prepend == true) or
+ * to the end  (prepend == * false).
+ */
+static bool osdc_unsent_check(struct ceph_osd_client *osdc,
+			struct ceph_osd_request *req,
+			bool prepend)
+{
+	struct ceph_osd_request *list_req;
+	int bad;
+
+	if (list_empty(&osdc->req_unsent))
+		return true;
+
+	if (prepend) {
+		list_req = list_first_entry(&osdc->req_unsent,
+				    struct ceph_osd_request,
+				    r_req_lru_item);
+		bad = WARN_ON(req->r_tid > list_req->r_tid);
+	} else {
+		list_req = list_last_entry(&osdc->req_unsent,
+				    struct ceph_osd_request,
+				    r_req_lru_item);
+		bad = WARN_ON(req->r_tid < list_req->r_tid);
+	}
+	bad = bad || WARN_ON(req->r_tid == list_req->r_tid);
+
+	return !bad;
+}
+
 /*
  * Resubmit requests pending on the given osd.
  */
@@ -603,7 +638,12 @@ static void __kick_osd_requests(struct
ceph_osd_client *osdc,
 		if (!req->r_linger)
 			req->r_flags |= CEPH_OSD_FLAG_RETRY;
 	}
-	list_splice(&resend, &osdc->req_unsent);
+	if (!list_empty(&resend)) {
+		req = list_last_entry(&resend, struct ceph_osd_request,
+					r_req_lru_item);
+		osdc_unsent_check(osdc, req, true);
+		list_splice(&resend, &osdc->req_unsent);
+	}

 	/*
 	 * Linger requests are re-registered before sending, which
@@ -618,6 +658,7 @@ static void __kick_osd_requests(struct
ceph_osd_client *osdc,
 		 */
 		BUG_ON(!list_empty(&req->r_req_lru_item));
 		__register_request(osdc, req);
+		osdc_unsent_check(osdc, req, false);
 		list_add_tail(&req->r_req_lru_item, &osdc->req_unsent);
 		list_add_tail(&req->r_osd_item, &req->r_osd->o_requests);
 		__unregister_linger_request(osdc, req);
@@ -1036,6 +1077,7 @@ static int __map_request(struct ceph_osd_client *osdc,
 	if (req->r_osd) {
 		__remove_osd_from_lru(req->r_osd);
 		list_add_tail(&req->r_osd_item, &req->r_osd->o_requests);
+		osdc_unsent_check(osdc, req, false);
 		list_move_tail(&req->r_req_lru_item, &osdc->req_unsent);
 	} else {
 		list_move_tail(&req->r_req_lru_item, &osdc->req_notarget);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2013-03-26 14:51 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-26  2:25 [PATCH 0/6] libceph: send osd requests in tid order Alex Elder
2013-03-26  2:27 ` [PATCH 1/6] libceph: slightly defer registering osd request Alex Elder
2013-03-26  2:27 ` [PATCH 2/6] libceph: no more kick_requests() race Alex Elder
2013-03-26  2:27 ` [PATCH 3/6] libceph: prepend requests in order when kicking Alex Elder
2013-03-26 14:50   ` [PATCH 3/6, v2] libceph: requeue only sent requests " Alex Elder
2013-03-26  2:27 ` [PATCH 4/6] libceph: keep request lists in tid order Alex Elder
2013-03-26  2:28 ` [PATCH 5/6] libceph: send queued requests when starting new one Alex Elder
2013-03-26  2:28 ` [PATCH 6/6] libceph: verify requests queued in order Alex Elder
2013-03-26 14:50   ` [PATCH 6/6, v2] " Alex Elder
2013-03-26 14:49 ` [PATCH 0/6] libceph: send osd requests in tid order Alex Elder

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.