All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] rbd: fix some issues around flushing notifies
@ 2020-03-17 12:04 Ilya Dryomov
  2020-03-17 12:04 ` [PATCH 1/3] rbd: avoid a deadlock on header_rwsem when " Ilya Dryomov
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Ilya Dryomov @ 2020-03-17 12:04 UTC (permalink / raw)
  To: ceph-devel

Hello,

A recent snapshot-based mirroring experiment exposed a deadlock on
header_rwsem in the error path of rbd_dev_image_probe() (i.e. "rbd
map").

Thanks,

                Ilya


Ilya Dryomov (3):
  rbd: avoid a deadlock on header_rwsem when flushing notifies
  rbd: call rbd_dev_unprobe() after unwatching and flushing notifies
  rbd: don't test rbd_dev->opts in rbd_dev_image_release()

 drivers/block/rbd.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

-- 
2.19.2

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/3] rbd: avoid a deadlock on header_rwsem when flushing notifies
  2020-03-17 12:04 [PATCH 0/3] rbd: fix some issues around flushing notifies Ilya Dryomov
@ 2020-03-17 12:04 ` Ilya Dryomov
  2020-03-17 12:04 ` [PATCH 2/3] rbd: call rbd_dev_unprobe() after unwatching and " Ilya Dryomov
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Ilya Dryomov @ 2020-03-17 12:04 UTC (permalink / raw)
  To: ceph-devel

rbd_unregister_watch() flushes notifies and therefore cannot be called
under header_rwsem because a header update notify takes header_rwsem to
synchronize with "rbd map".  If mapping an image fails after the watch
is established and a header update notify sneaks in, we deadlock when
erroring out from rbd_dev_image_probe().

Move watch registration and unregistration out of the critical section.
The only reason they were put there was to make header_rwsem management
slightly more obvious.

Fixes: 811c66887746 ("rbd: fix rbd map vs notify races")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 drivers/block/rbd.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index a4e7b494344c..f0ce30a6fc69 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -4527,6 +4527,10 @@ static void cancel_tasks_sync(struct rbd_device *rbd_dev)
 	cancel_work_sync(&rbd_dev->unlock_work);
 }
 
+/*
+ * header_rwsem must not be held to avoid a deadlock with
+ * rbd_dev_refresh() when flushing notifies.
+ */
 static void rbd_unregister_watch(struct rbd_device *rbd_dev)
 {
 	cancel_tasks_sync(rbd_dev);
@@ -6907,6 +6911,8 @@ static void rbd_dev_image_release(struct rbd_device *rbd_dev)
  * device.  If this image is the one being mapped (i.e., not a
  * parent), initiate a watch on its header object before using that
  * object to get detailed information about the rbd image.
+ *
+ * On success, returns with header_rwsem held for write.
  */
 static int rbd_dev_image_probe(struct rbd_device *rbd_dev, int depth)
 {
@@ -6936,6 +6942,7 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, int depth)
 		}
 	}
 
+	down_write(&rbd_dev->header_rwsem);
 	ret = rbd_dev_header_info(rbd_dev);
 	if (ret) {
 		if (ret == -ENOENT && !need_watch)
@@ -6987,6 +6994,7 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, int depth)
 err_out_probe:
 	rbd_dev_unprobe(rbd_dev);
 err_out_watch:
+	up_write(&rbd_dev->header_rwsem);
 	if (need_watch)
 		rbd_unregister_watch(rbd_dev);
 err_out_format:
@@ -7050,12 +7058,9 @@ static ssize_t do_rbd_add(struct bus_type *bus,
 		goto err_out_rbd_dev;
 	}
 
-	down_write(&rbd_dev->header_rwsem);
 	rc = rbd_dev_image_probe(rbd_dev, 0);
-	if (rc < 0) {
-		up_write(&rbd_dev->header_rwsem);
+	if (rc < 0)
 		goto err_out_rbd_dev;
-	}
 
 	if (rbd_dev->opts->alloc_size > rbd_dev->layout.object_size) {
 		rbd_warn(rbd_dev, "alloc_size adjusted to %u",
-- 
2.19.2

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/3] rbd: call rbd_dev_unprobe() after unwatching and flushing notifies
  2020-03-17 12:04 [PATCH 0/3] rbd: fix some issues around flushing notifies Ilya Dryomov
  2020-03-17 12:04 ` [PATCH 1/3] rbd: avoid a deadlock on header_rwsem when " Ilya Dryomov
@ 2020-03-17 12:04 ` Ilya Dryomov
  2020-03-17 12:04 ` [PATCH 3/3] rbd: don't test rbd_dev->opts in rbd_dev_image_release() Ilya Dryomov
  2020-03-17 16:41 ` [PATCH 0/3] rbd: fix some issues around flushing notifies Jason Dillaman
  3 siblings, 0 replies; 6+ messages in thread
From: Ilya Dryomov @ 2020-03-17 12:04 UTC (permalink / raw)
  To: ceph-devel

rbd_dev_unprobe() is supposed to undo most of rbd_dev_image_probe(),
including rbd_dev_header_info(), which means that rbd_dev_header_info()
isn't supposed to be called after rbd_dev_unprobe().

However, rbd_dev_image_release() calls rbd_dev_unprobe() before
rbd_unregister_watch().  This is racy because a header update notify
can sneak in:

  "rbd unmap" thread                   ceph-watch-notify worker

  rbd_dev_image_release()
    rbd_dev_unprobe()
      free and zero out header
                                       rbd_watch_cb()
                                         rbd_dev_refresh()
                                           rbd_dev_header_info()
                                             read in header

The same goes for "rbd map" because rbd_dev_image_probe() calls
rbd_dev_unprobe() on errors.  In both cases this results in a memory
leak.

Fixes: fd22aef8b47c ("rbd: move rbd_unregister_watch() call into rbd_dev_image_release()")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 drivers/block/rbd.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index f0ce30a6fc69..e590dc484c18 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -6898,9 +6898,10 @@ static void rbd_print_dne(struct rbd_device *rbd_dev, bool is_snap)
 
 static void rbd_dev_image_release(struct rbd_device *rbd_dev)
 {
-	rbd_dev_unprobe(rbd_dev);
 	if (rbd_dev->opts)
 		rbd_unregister_watch(rbd_dev);
+
+	rbd_dev_unprobe(rbd_dev);
 	rbd_dev->image_format = 0;
 	kfree(rbd_dev->spec->image_id);
 	rbd_dev->spec->image_id = NULL;
@@ -6947,7 +6948,7 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, int depth)
 	if (ret) {
 		if (ret == -ENOENT && !need_watch)
 			rbd_print_dne(rbd_dev, false);
-		goto err_out_watch;
+		goto err_out_probe;
 	}
 
 	/*
@@ -6992,11 +6993,10 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, int depth)
 	return 0;
 
 err_out_probe:
-	rbd_dev_unprobe(rbd_dev);
-err_out_watch:
 	up_write(&rbd_dev->header_rwsem);
 	if (need_watch)
 		rbd_unregister_watch(rbd_dev);
+	rbd_dev_unprobe(rbd_dev);
 err_out_format:
 	rbd_dev->image_format = 0;
 	kfree(rbd_dev->spec->image_id);
-- 
2.19.2

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 3/3] rbd: don't test rbd_dev->opts in rbd_dev_image_release()
  2020-03-17 12:04 [PATCH 0/3] rbd: fix some issues around flushing notifies Ilya Dryomov
  2020-03-17 12:04 ` [PATCH 1/3] rbd: avoid a deadlock on header_rwsem when " Ilya Dryomov
  2020-03-17 12:04 ` [PATCH 2/3] rbd: call rbd_dev_unprobe() after unwatching and " Ilya Dryomov
@ 2020-03-17 12:04 ` Ilya Dryomov
  2020-03-17 16:41 ` [PATCH 0/3] rbd: fix some issues around flushing notifies Jason Dillaman
  3 siblings, 0 replies; 6+ messages in thread
From: Ilya Dryomov @ 2020-03-17 12:04 UTC (permalink / raw)
  To: ceph-devel

rbd_dev->opts is used to distinguish between the image that is being
mapped and a parent.  However, because we no longer establish watch for
read-only mappings, this test is imprecise and results in unnecessary
rbd_unregister_watch() calls.

Make it consistent with need_watch in rbd_dev_image_probe().

Fixes: b9ef2b8858a0 ("rbd: don't establish watch for read-only mappings")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 drivers/block/rbd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index e590dc484c18..f44ce9ccadd6 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -6898,7 +6898,7 @@ static void rbd_print_dne(struct rbd_device *rbd_dev, bool is_snap)
 
 static void rbd_dev_image_release(struct rbd_device *rbd_dev)
 {
-	if (rbd_dev->opts)
+	if (!rbd_is_ro(rbd_dev))
 		rbd_unregister_watch(rbd_dev);
 
 	rbd_dev_unprobe(rbd_dev);
-- 
2.19.2

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/3] rbd: fix some issues around flushing notifies
  2020-03-17 12:04 [PATCH 0/3] rbd: fix some issues around flushing notifies Ilya Dryomov
                   ` (2 preceding siblings ...)
  2020-03-17 12:04 ` [PATCH 3/3] rbd: don't test rbd_dev->opts in rbd_dev_image_release() Ilya Dryomov
@ 2020-03-17 16:41 ` Jason Dillaman
  2020-03-17 17:24   ` Ilya Dryomov
  3 siblings, 1 reply; 6+ messages in thread
From: Jason Dillaman @ 2020-03-17 16:41 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: ceph-devel

On Tue, Mar 17, 2020 at 8:06 AM Ilya Dryomov <idryomov@gmail.com> wrote:
>
> Hello,
>
> A recent snapshot-based mirroring experiment exposed a deadlock on
> header_rwsem in the error path of rbd_dev_image_probe() (i.e. "rbd
> map").
>
> Thanks,
>
>                 Ilya
>
>
> Ilya Dryomov (3):
>   rbd: avoid a deadlock on header_rwsem when flushing notifies
>   rbd: call rbd_dev_unprobe() after unwatching and flushing notifies
>   rbd: don't test rbd_dev->opts in rbd_dev_image_release()
>
>  drivers/block/rbd.c | 23 ++++++++++++++---------
>  1 file changed, 14 insertions(+), 9 deletions(-)
>
> --
> 2.19.2
>

The "get_snapcontext" call still going to hang (albeit in an
interruptible state) if the image has > 510 snapshots, correct?

Reviewed-by: Jason Dillaman <dillaman@redhat.com>

-- 
Jason

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/3] rbd: fix some issues around flushing notifies
  2020-03-17 16:41 ` [PATCH 0/3] rbd: fix some issues around flushing notifies Jason Dillaman
@ 2020-03-17 17:24   ` Ilya Dryomov
  0 siblings, 0 replies; 6+ messages in thread
From: Ilya Dryomov @ 2020-03-17 17:24 UTC (permalink / raw)
  To: Jason Dillaman; +Cc: ceph-devel

On Tue, Mar 17, 2020 at 5:41 PM Jason Dillaman <jdillama@redhat.com> wrote:
>
> On Tue, Mar 17, 2020 at 8:06 AM Ilya Dryomov <idryomov@gmail.com> wrote:
> >
> > Hello,
> >
> > A recent snapshot-based mirroring experiment exposed a deadlock on
> > header_rwsem in the error path of rbd_dev_image_probe() (i.e. "rbd
> > map").
> >
> > Thanks,
> >
> >                 Ilya
> >
> >
> > Ilya Dryomov (3):
> >   rbd: avoid a deadlock on header_rwsem when flushing notifies
> >   rbd: call rbd_dev_unprobe() after unwatching and flushing notifies
> >   rbd: don't test rbd_dev->opts in rbd_dev_image_release()
> >
> >  drivers/block/rbd.c | 23 ++++++++++++++---------
> >  1 file changed, 14 insertions(+), 9 deletions(-)
> >
> > --
> > 2.19.2
> >
>
> The "get_snapcontext" call still going to hang (albeit in an
> interruptible state) if the image has > 510 snapshots, correct?

Yes, this has been a limitation of the messenger and its interface
since day one.  This is a krbd ticket specifically about snapshots,
but this limitation affects a lot more than that:

  https://tracker.ceph.com/issues/12874

It is engraved pretty deeply, but I'm planning to address it while
the messenger is opened up for surgery.

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-03-17 17:24 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-17 12:04 [PATCH 0/3] rbd: fix some issues around flushing notifies Ilya Dryomov
2020-03-17 12:04 ` [PATCH 1/3] rbd: avoid a deadlock on header_rwsem when " Ilya Dryomov
2020-03-17 12:04 ` [PATCH 2/3] rbd: call rbd_dev_unprobe() after unwatching and " Ilya Dryomov
2020-03-17 12:04 ` [PATCH 3/3] rbd: don't test rbd_dev->opts in rbd_dev_image_release() Ilya Dryomov
2020-03-17 16:41 ` [PATCH 0/3] rbd: fix some issues around flushing notifies Jason Dillaman
2020-03-17 17:24   ` Ilya Dryomov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.