* [PATCH 0/3] rbd: fix some issues around flushing notifies
@ 2020-03-17 12:04 Ilya Dryomov
2020-03-17 12:04 ` [PATCH 1/3] rbd: avoid a deadlock on header_rwsem when " Ilya Dryomov
` (3 more replies)
0 siblings, 4 replies; 6+ messages in thread
From: Ilya Dryomov @ 2020-03-17 12:04 UTC (permalink / raw)
To: ceph-devel
Hello,
A recent snapshot-based mirroring experiment exposed a deadlock on
header_rwsem in the error path of rbd_dev_image_probe() (i.e. "rbd
map").
Thanks,
Ilya
Ilya Dryomov (3):
rbd: avoid a deadlock on header_rwsem when flushing notifies
rbd: call rbd_dev_unprobe() after unwatching and flushing notifies
rbd: don't test rbd_dev->opts in rbd_dev_image_release()
drivers/block/rbd.c | 23 ++++++++++++++---------
1 file changed, 14 insertions(+), 9 deletions(-)
--
2.19.2
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/3] rbd: avoid a deadlock on header_rwsem when flushing notifies
2020-03-17 12:04 [PATCH 0/3] rbd: fix some issues around flushing notifies Ilya Dryomov
@ 2020-03-17 12:04 ` Ilya Dryomov
2020-03-17 12:04 ` [PATCH 2/3] rbd: call rbd_dev_unprobe() after unwatching and " Ilya Dryomov
` (2 subsequent siblings)
3 siblings, 0 replies; 6+ messages in thread
From: Ilya Dryomov @ 2020-03-17 12:04 UTC (permalink / raw)
To: ceph-devel
rbd_unregister_watch() flushes notifies and therefore cannot be called
under header_rwsem because a header update notify takes header_rwsem to
synchronize with "rbd map". If mapping an image fails after the watch
is established and a header update notify sneaks in, we deadlock when
erroring out from rbd_dev_image_probe().
Move watch registration and unregistration out of the critical section.
The only reason they were put there was to make header_rwsem management
slightly more obvious.
Fixes: 811c66887746 ("rbd: fix rbd map vs notify races")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
drivers/block/rbd.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index a4e7b494344c..f0ce30a6fc69 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -4527,6 +4527,10 @@ static void cancel_tasks_sync(struct rbd_device *rbd_dev)
cancel_work_sync(&rbd_dev->unlock_work);
}
+/*
+ * header_rwsem must not be held to avoid a deadlock with
+ * rbd_dev_refresh() when flushing notifies.
+ */
static void rbd_unregister_watch(struct rbd_device *rbd_dev)
{
cancel_tasks_sync(rbd_dev);
@@ -6907,6 +6911,8 @@ static void rbd_dev_image_release(struct rbd_device *rbd_dev)
* device. If this image is the one being mapped (i.e., not a
* parent), initiate a watch on its header object before using that
* object to get detailed information about the rbd image.
+ *
+ * On success, returns with header_rwsem held for write.
*/
static int rbd_dev_image_probe(struct rbd_device *rbd_dev, int depth)
{
@@ -6936,6 +6942,7 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, int depth)
}
}
+ down_write(&rbd_dev->header_rwsem);
ret = rbd_dev_header_info(rbd_dev);
if (ret) {
if (ret == -ENOENT && !need_watch)
@@ -6987,6 +6994,7 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, int depth)
err_out_probe:
rbd_dev_unprobe(rbd_dev);
err_out_watch:
+ up_write(&rbd_dev->header_rwsem);
if (need_watch)
rbd_unregister_watch(rbd_dev);
err_out_format:
@@ -7050,12 +7058,9 @@ static ssize_t do_rbd_add(struct bus_type *bus,
goto err_out_rbd_dev;
}
- down_write(&rbd_dev->header_rwsem);
rc = rbd_dev_image_probe(rbd_dev, 0);
- if (rc < 0) {
- up_write(&rbd_dev->header_rwsem);
+ if (rc < 0)
goto err_out_rbd_dev;
- }
if (rbd_dev->opts->alloc_size > rbd_dev->layout.object_size) {
rbd_warn(rbd_dev, "alloc_size adjusted to %u",
--
2.19.2
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/3] rbd: call rbd_dev_unprobe() after unwatching and flushing notifies
2020-03-17 12:04 [PATCH 0/3] rbd: fix some issues around flushing notifies Ilya Dryomov
2020-03-17 12:04 ` [PATCH 1/3] rbd: avoid a deadlock on header_rwsem when " Ilya Dryomov
@ 2020-03-17 12:04 ` Ilya Dryomov
2020-03-17 12:04 ` [PATCH 3/3] rbd: don't test rbd_dev->opts in rbd_dev_image_release() Ilya Dryomov
2020-03-17 16:41 ` [PATCH 0/3] rbd: fix some issues around flushing notifies Jason Dillaman
3 siblings, 0 replies; 6+ messages in thread
From: Ilya Dryomov @ 2020-03-17 12:04 UTC (permalink / raw)
To: ceph-devel
rbd_dev_unprobe() is supposed to undo most of rbd_dev_image_probe(),
including rbd_dev_header_info(), which means that rbd_dev_header_info()
isn't supposed to be called after rbd_dev_unprobe().
However, rbd_dev_image_release() calls rbd_dev_unprobe() before
rbd_unregister_watch(). This is racy because a header update notify
can sneak in:
"rbd unmap" thread ceph-watch-notify worker
rbd_dev_image_release()
rbd_dev_unprobe()
free and zero out header
rbd_watch_cb()
rbd_dev_refresh()
rbd_dev_header_info()
read in header
The same goes for "rbd map" because rbd_dev_image_probe() calls
rbd_dev_unprobe() on errors. In both cases this results in a memory
leak.
Fixes: fd22aef8b47c ("rbd: move rbd_unregister_watch() call into rbd_dev_image_release()")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
drivers/block/rbd.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index f0ce30a6fc69..e590dc484c18 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -6898,9 +6898,10 @@ static void rbd_print_dne(struct rbd_device *rbd_dev, bool is_snap)
static void rbd_dev_image_release(struct rbd_device *rbd_dev)
{
- rbd_dev_unprobe(rbd_dev);
if (rbd_dev->opts)
rbd_unregister_watch(rbd_dev);
+
+ rbd_dev_unprobe(rbd_dev);
rbd_dev->image_format = 0;
kfree(rbd_dev->spec->image_id);
rbd_dev->spec->image_id = NULL;
@@ -6947,7 +6948,7 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, int depth)
if (ret) {
if (ret == -ENOENT && !need_watch)
rbd_print_dne(rbd_dev, false);
- goto err_out_watch;
+ goto err_out_probe;
}
/*
@@ -6992,11 +6993,10 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, int depth)
return 0;
err_out_probe:
- rbd_dev_unprobe(rbd_dev);
-err_out_watch:
up_write(&rbd_dev->header_rwsem);
if (need_watch)
rbd_unregister_watch(rbd_dev);
+ rbd_dev_unprobe(rbd_dev);
err_out_format:
rbd_dev->image_format = 0;
kfree(rbd_dev->spec->image_id);
--
2.19.2
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 3/3] rbd: don't test rbd_dev->opts in rbd_dev_image_release()
2020-03-17 12:04 [PATCH 0/3] rbd: fix some issues around flushing notifies Ilya Dryomov
2020-03-17 12:04 ` [PATCH 1/3] rbd: avoid a deadlock on header_rwsem when " Ilya Dryomov
2020-03-17 12:04 ` [PATCH 2/3] rbd: call rbd_dev_unprobe() after unwatching and " Ilya Dryomov
@ 2020-03-17 12:04 ` Ilya Dryomov
2020-03-17 16:41 ` [PATCH 0/3] rbd: fix some issues around flushing notifies Jason Dillaman
3 siblings, 0 replies; 6+ messages in thread
From: Ilya Dryomov @ 2020-03-17 12:04 UTC (permalink / raw)
To: ceph-devel
rbd_dev->opts is used to distinguish between the image that is being
mapped and a parent. However, because we no longer establish watch for
read-only mappings, this test is imprecise and results in unnecessary
rbd_unregister_watch() calls.
Make it consistent with need_watch in rbd_dev_image_probe().
Fixes: b9ef2b8858a0 ("rbd: don't establish watch for read-only mappings")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
drivers/block/rbd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index e590dc484c18..f44ce9ccadd6 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -6898,7 +6898,7 @@ static void rbd_print_dne(struct rbd_device *rbd_dev, bool is_snap)
static void rbd_dev_image_release(struct rbd_device *rbd_dev)
{
- if (rbd_dev->opts)
+ if (!rbd_is_ro(rbd_dev))
rbd_unregister_watch(rbd_dev);
rbd_dev_unprobe(rbd_dev);
--
2.19.2
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 0/3] rbd: fix some issues around flushing notifies
2020-03-17 12:04 [PATCH 0/3] rbd: fix some issues around flushing notifies Ilya Dryomov
` (2 preceding siblings ...)
2020-03-17 12:04 ` [PATCH 3/3] rbd: don't test rbd_dev->opts in rbd_dev_image_release() Ilya Dryomov
@ 2020-03-17 16:41 ` Jason Dillaman
2020-03-17 17:24 ` Ilya Dryomov
3 siblings, 1 reply; 6+ messages in thread
From: Jason Dillaman @ 2020-03-17 16:41 UTC (permalink / raw)
To: Ilya Dryomov; +Cc: ceph-devel
On Tue, Mar 17, 2020 at 8:06 AM Ilya Dryomov <idryomov@gmail.com> wrote:
>
> Hello,
>
> A recent snapshot-based mirroring experiment exposed a deadlock on
> header_rwsem in the error path of rbd_dev_image_probe() (i.e. "rbd
> map").
>
> Thanks,
>
> Ilya
>
>
> Ilya Dryomov (3):
> rbd: avoid a deadlock on header_rwsem when flushing notifies
> rbd: call rbd_dev_unprobe() after unwatching and flushing notifies
> rbd: don't test rbd_dev->opts in rbd_dev_image_release()
>
> drivers/block/rbd.c | 23 ++++++++++++++---------
> 1 file changed, 14 insertions(+), 9 deletions(-)
>
> --
> 2.19.2
>
The "get_snapcontext" call still going to hang (albeit in an
interruptible state) if the image has > 510 snapshots, correct?
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
--
Jason
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 0/3] rbd: fix some issues around flushing notifies
2020-03-17 16:41 ` [PATCH 0/3] rbd: fix some issues around flushing notifies Jason Dillaman
@ 2020-03-17 17:24 ` Ilya Dryomov
0 siblings, 0 replies; 6+ messages in thread
From: Ilya Dryomov @ 2020-03-17 17:24 UTC (permalink / raw)
To: Jason Dillaman; +Cc: ceph-devel
On Tue, Mar 17, 2020 at 5:41 PM Jason Dillaman <jdillama@redhat.com> wrote:
>
> On Tue, Mar 17, 2020 at 8:06 AM Ilya Dryomov <idryomov@gmail.com> wrote:
> >
> > Hello,
> >
> > A recent snapshot-based mirroring experiment exposed a deadlock on
> > header_rwsem in the error path of rbd_dev_image_probe() (i.e. "rbd
> > map").
> >
> > Thanks,
> >
> > Ilya
> >
> >
> > Ilya Dryomov (3):
> > rbd: avoid a deadlock on header_rwsem when flushing notifies
> > rbd: call rbd_dev_unprobe() after unwatching and flushing notifies
> > rbd: don't test rbd_dev->opts in rbd_dev_image_release()
> >
> > drivers/block/rbd.c | 23 ++++++++++++++---------
> > 1 file changed, 14 insertions(+), 9 deletions(-)
> >
> > --
> > 2.19.2
> >
>
> The "get_snapcontext" call still going to hang (albeit in an
> interruptible state) if the image has > 510 snapshots, correct?
Yes, this has been a limitation of the messenger and its interface
since day one. This is a krbd ticket specifically about snapshots,
but this limitation affects a lot more than that:
https://tracker.ceph.com/issues/12874
It is engraved pretty deeply, but I'm planning to address it while
the messenger is opened up for surgery.
Thanks,
Ilya
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2020-03-17 17:24 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-17 12:04 [PATCH 0/3] rbd: fix some issues around flushing notifies Ilya Dryomov
2020-03-17 12:04 ` [PATCH 1/3] rbd: avoid a deadlock on header_rwsem when " Ilya Dryomov
2020-03-17 12:04 ` [PATCH 2/3] rbd: call rbd_dev_unprobe() after unwatching and " Ilya Dryomov
2020-03-17 12:04 ` [PATCH 3/3] rbd: don't test rbd_dev->opts in rbd_dev_image_release() Ilya Dryomov
2020-03-17 16:41 ` [PATCH 0/3] rbd: fix some issues around flushing notifies Jason Dillaman
2020-03-17 17:24 ` Ilya Dryomov
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.