All of lore.kernel.org
 help / color / mirror / Atom feed
* Backport of 2761713d35e3 for < 3.18
@ 2015-08-24  8:31 Ilya Dryomov
  2015-08-24  9:08 ` Jiri Slaby
  2015-08-25 10:06 ` Luis Henriques
  0 siblings, 2 replies; 3+ messages in thread
From: Ilya Dryomov @ 2015-08-24  8:31 UTC (permalink / raw)
  To: stable; +Cc: Greg KH, Jiri Slaby

Hi,

The following is the backport of commit 2761713d35e3 ("rbd: fix copyup
completion race"), which was merged into 4.2-rc6, for >= 3.10 && < 3.18
kernels.  Please apply.

Thanks,

                Ilya


-- >8 --
From: Ilya Dryomov <idryomov@gmail.com>
Date: Thu, 16 Jul 2015 17:36:11 +0300
Subject: [PATCH] rbd: fix copyup completion race

For write/discard obj_requests that involved a copyup method call, the
opcode of the first op is CEPH_OSD_OP_CALL and the ->callback is
rbd_img_obj_copyup_callback().  The latter frees copyup pages, sets
->xferred and delegates to rbd_img_obj_callback(), the "normal" image
object callback, for reporting to block layer and putting refs.

rbd_osd_req_callback() however treats CEPH_OSD_OP_CALL as a trivial op,
which means obj_request is marked done in rbd_osd_trivial_callback(),
*before* ->callback is invoked and rbd_img_obj_copyup_callback() has
a chance to run.  Marking obj_request done essentially means giving
rbd_img_obj_callback() a license to end it at any moment, so if another
obj_request from the same img_request is being completed concurrently,
rbd_img_obj_end_request() may very well be called on such prematurally
marked done request:

<obj_request-1/2 reply>
handle_reply()
  rbd_osd_req_callback()
    rbd_osd_trivial_callback()
    rbd_obj_request_complete()
    rbd_img_obj_copyup_callback()
    rbd_img_obj_callback()
                                    <obj_request-2/2 reply>
                                    handle_reply()
                                      rbd_osd_req_callback()
                                        rbd_osd_trivial_callback()
      for_each_obj_request(obj_request->img_request) {
        rbd_img_obj_end_request(obj_request-1/2)
        rbd_img_obj_end_request(obj_request-2/2) <--
      }

Calling rbd_img_obj_end_request() on such a request leads to trouble,
in particular because its ->xfferred is 0.  We report 0 to the block
layer with blk_update_request(), get back 1 for "this request has more
data in flight" and then trip on

    rbd_assert(more ^ (which == img_request->obj_request_count));

with rhs (which == ...) being 1 because rbd_img_obj_end_request() has
been called for both requests and lhs (more) being 1 because we haven't
got a chance to set ->xfferred in rbd_img_obj_copyup_callback() yet.

To fix this, leverage that rbd wants to call class methods in only two
cases: one is a generic method call wrapper (obj_request is standalone)
and the other is a copyup (obj_request is part of an img_request).  So
make a dedicated handler for CEPH_OSD_OP_CALL and directly invoke
rbd_img_obj_copyup_callback() from it if obj_request is part of an
img_request, similar to how CEPH_OSD_OP_READ handler invokes
rbd_img_obj_request_read_callback().

Since rbd_img_obj_copyup_callback() is now being called from the OSD
request callback (only), it is renamed to rbd_osd_copyup_callback().

Cc: Alex Elder <elder@linaro.org>
Cc: stable@vger.kernel.org # 3.10+, needs backporting for < 3.18
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
[idryomov@gmail.com: backport to < 3.18: context]
---
 drivers/block/rbd.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 63688d3a6ea0..86c0f676154f 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -508,6 +508,7 @@ void rbd_warn(struct rbd_device *rbd_dev, const char *fmt, ...)
 #  define rbd_assert(expr)	((void) 0)
 #endif /* !RBD_DEBUG */
 
+static void rbd_osd_copyup_callback(struct rbd_obj_request *obj_request);
 static int rbd_img_obj_request_submit(struct rbd_obj_request *obj_request);
 static void rbd_img_parent_read(struct rbd_obj_request *obj_request);
 static void rbd_dev_remove_parent(struct rbd_device *rbd_dev);
@@ -1651,6 +1652,16 @@ static void rbd_osd_stat_callback(struct rbd_obj_request *obj_request)
 	obj_request_done_set(obj_request);
 }
 
+static void rbd_osd_call_callback(struct rbd_obj_request *obj_request)
+{
+	dout("%s: obj %p\n", __func__, obj_request);
+
+	if (obj_request_img_data_test(obj_request))
+		rbd_osd_copyup_callback(obj_request);
+	else
+		obj_request_done_set(obj_request);
+}
+
 static void rbd_osd_req_callback(struct ceph_osd_request *osd_req,
 				struct ceph_msg *msg)
 {
@@ -1689,6 +1700,8 @@ static void rbd_osd_req_callback(struct ceph_osd_request *osd_req,
 		rbd_osd_stat_callback(obj_request);
 		break;
 	case CEPH_OSD_OP_CALL:
+		rbd_osd_call_callback(obj_request);
+		break;
 	case CEPH_OSD_OP_NOTIFY_ACK:
 	case CEPH_OSD_OP_WATCH:
 		rbd_osd_trivial_callback(obj_request);
@@ -2275,13 +2288,15 @@ out_unwind:
 }
 
 static void
-rbd_img_obj_copyup_callback(struct rbd_obj_request *obj_request)
+rbd_osd_copyup_callback(struct rbd_obj_request *obj_request)
 {
 	struct rbd_img_request *img_request;
 	struct rbd_device *rbd_dev;
 	struct page **pages;
 	u32 page_count;
 
+	dout("%s: obj %p\n", __func__, obj_request);
+
 	rbd_assert(obj_request->type == OBJ_REQUEST_BIO);
 	rbd_assert(obj_request_img_data_test(obj_request));
 	img_request = obj_request->img_request;
@@ -2307,9 +2322,7 @@ rbd_img_obj_copyup_callback(struct rbd_obj_request *obj_request)
 	if (!obj_request->result)
 		obj_request->xferred = obj_request->length;
 
-	/* Finish up with the normal image object callback */
-
-	rbd_img_obj_callback(obj_request);
+	obj_request_done_set(obj_request);
 }
 
 static void
@@ -2406,7 +2419,6 @@ rbd_img_obj_parent_read_full_callback(struct rbd_img_request *img_request)
 
 	/* All set, send it off. */
 
-	orig_request->callback = rbd_img_obj_copyup_callback;
 	osdc = &rbd_dev->rbd_client->client->osdc;
 	img_result = rbd_obj_request_submit(osdc, orig_request);
 	if (!img_result)
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: Backport of 2761713d35e3 for < 3.18
  2015-08-24  8:31 Backport of 2761713d35e3 for < 3.18 Ilya Dryomov
@ 2015-08-24  9:08 ` Jiri Slaby
  2015-08-25 10:06 ` Luis Henriques
  1 sibling, 0 replies; 3+ messages in thread
From: Jiri Slaby @ 2015-08-24  9:08 UTC (permalink / raw)
  To: Ilya Dryomov, stable; +Cc: Greg KH

On 08/24/2015, 10:31 AM, Ilya Dryomov wrote:
> Hi,
> 
> The following is the backport of commit 2761713d35e3 ("rbd: fix copyup
> completion race"), which was merged into 4.2-rc6, for >= 3.10 && < 3.18
> kernels.  Please apply.

Applied, thanks.


-- 
js
suse labs

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Backport of 2761713d35e3 for < 3.18
  2015-08-24  8:31 Backport of 2761713d35e3 for < 3.18 Ilya Dryomov
  2015-08-24  9:08 ` Jiri Slaby
@ 2015-08-25 10:06 ` Luis Henriques
  1 sibling, 0 replies; 3+ messages in thread
From: Luis Henriques @ 2015-08-25 10:06 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: stable, Greg KH, Jiri Slaby

On Mon, Aug 24, 2015 at 11:31:49AM +0300, Ilya Dryomov wrote:
> Hi,
> 
> The following is the backport of commit 2761713d35e3 ("rbd: fix copyup
> completion race"), which was merged into 4.2-rc6, for >= 3.10 && < 3.18
> kernels.  Please apply.
>

Thanks, queuing this backport for the 3.16 kernel as well.

Cheers,
--
Lu�s


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2015-08-25 10:06 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-24  8:31 Backport of 2761713d35e3 for < 3.18 Ilya Dryomov
2015-08-24  9:08 ` Jiri Slaby
2015-08-25 10:06 ` Luis Henriques

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.