* [PATCH] md: fix bug due to nested suspend
From: Mikulas Patocka @ 2015-11-25 17:20 UTC
  To: Dan Williams, NeilBrown, Marian Csontos, Heinz Mauelshagen
  Cc: Jens Axboe, dm-devel

Commit c7bfced9a6716ff66c9d61f934bb60af08d4688c, merged in 4.4-rc, causes a
crash in the LVM test shell/lvchange-raid.sh. The kernel hits the BUG shown
below because we attempt to suspend a device that is already suspended.
See also
https://bugzilla.redhat.com/show_bug.cgi?id=1283491

This patch fixes the bug by introducing functions mddev_nested_suspend and
mddev_nested_resume that can be called when the device is already
suspended. The number of calls to mddev_nested_suspend is kept in the
variable mddev->suspended.

kernel BUG at drivers/md/md.c:317!
CPU: 3 PID: 32754 Comm: lvm Not tainted 4.4.0-rc2 #1
task: 0000000047076040 ti: 0000000047014000 task.ti: 0000000047014000

     YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000000001000000000000001111 Not tainted
r00-03  000000000804000f 00000000102c5280 0000000010c7522c 000000007e3d1810
r04-07  0000000010c6f000 000000004ef37f20 000000007e3d1dd0 000000007e3d1810
r08-11  000000007c9f1600 0000000000000000 0000000000000001 ffffffffffffffff
r12-15  0000000010c1d000 0000000000000041 00000000f98d63c8 00000000f98e49e4
r16-19  00000000f98e49e4 00000000c138fd06 00000000f98d63c8 0000000000000001
r20-23  0000000000000002 000000004ef37f00 00000000000000b0 00000000000001d1
r24-27  00000000424783a0 000000007e3d1dd0 000000007e3d1810 00000000102b2000
r28-31  0000000000000001 0000000047014840 0000000047014930 0000000000000001
sr00-03  0000000007040800 0000000000000000 0000000000000000 0000000007040800
sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000

IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000102c538c 00000000102c5390
 IIR: 03ffe01f    ISR: 0000000000000000  IOR: 00000000102b2748
 CPU:        3   CR30: 0000000047014000 CR31: 0000000000000000
 ORIG_R28: 00000000000000b0
 IAOQ[0]: mddev_suspend+0x10c/0x160 [md_mod]
 IAOQ[1]: mddev_suspend+0x110/0x160 [md_mod]
 RP(r2): raid1_add_disk+0xd4/0x2c0 [raid1]
Backtrace:
 [<0000000010c7522c>] raid1_add_disk+0xd4/0x2c0 [raid1]
 [<0000000010c20078>] raid_resume+0x390/0x418 [dm_raid]
 [<00000000105833e8>] dm_table_resume_targets+0xc0/0x188 [dm_mod]
 [<000000001057f784>] dm_resume+0x144/0x1e0 [dm_mod]
 [<0000000010587dd4>] dev_suspend+0x1e4/0x568 [dm_mod]
 [<0000000010589278>] ctl_ioctl+0x1e8/0x428 [dm_mod]
 [<0000000010589518>] dm_compat_ctl_ioctl+0x18/0x68 [dm_mod]
 [<0000000040377b88>] compat_SyS_ioctl+0xd0/0x1558

Fixes: c7bfced9a671 ("md: suspend i/o during runtime blk_integrity_unregister")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/md/md.c        |   19 +++++++++++++++++++
 drivers/md/md.h        |    2 ++
 drivers/md/multipath.c |    4 ++--
 drivers/md/raid1.c     |    4 ++--
 drivers/md/raid10.c    |    4 ++--
 5 files changed, 27 insertions(+), 6 deletions(-)

Index: linux-4.4-rc2/drivers/md/md.c
===================================================================
--- linux-4.4-rc2.orig/drivers/md/md.c	2015-11-25 18:05:11.000000000 +0100
+++ linux-4.4-rc2/drivers/md/md.c	2015-11-25 18:05:31.000000000 +0100
@@ -336,6 +336,25 @@ void mddev_resume(struct mddev *mddev)
 }
 EXPORT_SYMBOL_GPL(mddev_resume);
 
+void mddev_nested_suspend(struct mddev *mddev)
+{
+	if (mddev->suspended)
+		mddev->suspended++;
+	else
+		mddev_suspend(mddev);
+}
+EXPORT_SYMBOL_GPL(mddev_nested_suspend);
+
+void mddev_nested_resume(struct mddev *mddev)
+{
+	BUG_ON(!mddev->suspended);
+	if (mddev->suspended > 1)
+		mddev->suspended--;
+	else
+		mddev_resume(mddev);
+}
+EXPORT_SYMBOL_GPL(mddev_nested_resume);
+
 int mddev_congested(struct mddev *mddev, int bits)
 {
 	struct md_personality *pers = mddev->pers;
Index: linux-4.4-rc2/drivers/md/md.h
===================================================================
--- linux-4.4-rc2.orig/drivers/md/md.h	2015-11-25 18:04:08.000000000 +0100
+++ linux-4.4-rc2/drivers/md/md.h	2015-11-25 18:05:21.000000000 +0100
@@ -665,6 +665,8 @@ extern void md_rdev_clear(struct md_rdev
 
 extern void mddev_suspend(struct mddev *mddev);
 extern void mddev_resume(struct mddev *mddev);
+extern void mddev_nested_suspend(struct mddev *mddev);
+extern void mddev_nested_resume(struct mddev *mddev);
 extern struct bio *bio_clone_mddev(struct bio *bio, gfp_t gfp_mask,
 				   struct mddev *mddev);
 extern struct bio *bio_alloc_mddev(gfp_t gfp_mask, int nr_iovecs,
Index: linux-4.4-rc2/drivers/md/multipath.c
===================================================================
--- linux-4.4-rc2.orig/drivers/md/multipath.c	2015-11-25 18:05:11.000000000 +0100
+++ linux-4.4-rc2/drivers/md/multipath.c	2015-11-25 18:05:21.000000000 +0100
@@ -264,9 +264,9 @@ static int multipath_add_disk(struct mdd
 			spin_unlock_irq(&conf->device_lock);
 			rcu_assign_pointer(p->rdev, rdev);
 			err = 0;
-			mddev_suspend(mddev);
+			mddev_nested_suspend(mddev);
 			md_integrity_add_rdev(rdev, mddev);
-			mddev_resume(mddev);
+			mddev_nested_resume(mddev);
 			break;
 		}
 
Index: linux-4.4-rc2/drivers/md/raid1.c
===================================================================
--- linux-4.4-rc2.orig/drivers/md/raid1.c	2015-11-25 18:05:11.000000000 +0100
+++ linux-4.4-rc2/drivers/md/raid1.c	2015-11-25 18:05:21.000000000 +0100
@@ -1632,9 +1632,9 @@ static int raid1_add_disk(struct mddev *
 			break;
 		}
 	}
-	mddev_suspend(mddev);
+	mddev_nested_suspend(mddev);
 	md_integrity_add_rdev(rdev, mddev);
-	mddev_resume(mddev);
+	mddev_nested_resume(mddev);
 	if (mddev->queue && blk_queue_discard(bdev_get_queue(rdev->bdev)))
 		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
 	print_conf(conf);
Index: linux-4.4-rc2/drivers/md/raid10.c
===================================================================
--- linux-4.4-rc2.orig/drivers/md/raid10.c	2015-11-25 18:05:11.000000000 +0100
+++ linux-4.4-rc2/drivers/md/raid10.c	2015-11-25 18:05:21.000000000 +0100
@@ -1739,9 +1739,9 @@ static int raid10_add_disk(struct mddev 
 		rcu_assign_pointer(p->rdev, rdev);
 		break;
 	}
-	mddev_suspend(mddev);
+	mddev_nested_suspend(mddev);
 	md_integrity_add_rdev(rdev, mddev);
-	mddev_resume(mddev);
+	mddev_nested_resume(mddev);
 	if (mddev->queue && blk_queue_discard(bdev_get_queue(rdev->bdev)))
 		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
 


* Re: [PATCH] md: fix bug due to nested suspend
From: NeilBrown @ 2015-12-16  3:57 UTC
  To: Mikulas Patocka, Dan Williams, Marian Csontos, Heinz Mauelshagen
  Cc: Jens Axboe, dm-devel


On Thu, Nov 26 2015, Mikulas Patocka wrote:

> The patch c7bfced9a6716ff66c9d61f934bb60af08d4688c committed to 4.4-rc 
> causes crash in LVM test shell/lvchange-raid.sh. The kernel crashes with 
> this BUG, the reason is that we attempt to suspend a device that is 
> already suspended. See also 
> https://bugzilla.redhat.com/show_bug.cgi?id=1283491
>
> This patch fixes the bug by introducing functions mddev_nested_suspend and
> mddev_nested_resume that can be called when the device is already
> suspended. The number of calls to mddev_nested_suspend is kept in the
> variable mddev->suspended.

Hi,
 thanks for the report and patch.

I think I would rather just make mddev_suspend() always nest.
It is always called under ->reconfig_mutex or some similar guarantee of
being single-threaded in dm-raid, so we don't need an atomic_t.

Just
  mddev_suspend()
	if (mddev->suspended++)
		return;
	... do the suspend.

and
  mddev_resume()
	if (--mddev->suspended)
		return;
	... do the resume.
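
To make the nesting behaviour concrete, here is a minimal user-space sketch
of the counting scheme outlined above. It is not the md code: struct
mddev_model, model_suspend() and model_resume() are hypothetical names
invented for illustration, and the two counters merely stand in for the
real quiesce and wake-up work. Like the outline, it assumes the caller
serializes all calls (e.g. under ->reconfig_mutex), so a plain int is used.

```c
#include <assert.h>

/* Hypothetical stand-in for struct mddev; only the nesting state is modelled. */
struct mddev_model {
	int suspended;      /* nesting depth; 0 means the device is running */
	int quiesce_calls;  /* how many times the real suspend work ran */
	int wake_calls;     /* how many times the real resume work ran */
};

/* Nested suspend: only the outermost call does the actual work. */
static void model_suspend(struct mddev_model *m)
{
	if (m->suspended++)
		return;             /* already suspended, just count the nesting */
	m->quiesce_calls++;         /* placeholder for the real suspend steps */
}

/* Nested resume: only the call that drops the depth to 0 does the work. */
static void model_resume(struct mddev_model *m)
{
	assert(m->suspended > 0);   /* unbalanced resume is a caller bug */
	if (--m->suspended)
		return;             /* an outer suspend is still in effect */
	m->wake_calls++;            /* placeholder for the real resume steps */
}

/*
 * Models the crashing path: an outer suspend is held while an inner
 * suspend/resume pair runs, as around md_integrity_add_rdev().
 * Returns the nesting depth left over, which should be 0.
 */
static int model_nested_sequence(struct mddev_model *m)
{
	model_suspend(m);   /* outer suspend, depth 0 -> 1 */
	model_suspend(m);   /* inner suspend, depth 1 -> 2, no extra work */
	model_resume(m);    /* inner resume, depth 2 -> 1, device stays suspended */
	model_resume(m);    /* outer resume, depth 1 -> 0, device resumes */
	return m->suspended;
}
```

With a zero-initialized struct mddev_model, model_nested_sequence() runs
the suspend work once and the resume work once, and the inner pair never
resumes the device early.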

Does that seem reasonable to you?

Thanks,
NeilBrown


* Re: [PATCH] md: fix bug due to nested suspend
From: Mikulas Patocka @ 2015-12-16 14:00 UTC
  To: NeilBrown
  Cc: Jens Axboe, Heinz Mauelshagen, Dan Williams, Marian Csontos, dm-devel



On Wed, 16 Dec 2015, NeilBrown wrote:

> On Thu, Nov 26 2015, Mikulas Patocka wrote:
> 
> > The patch c7bfced9a6716ff66c9d61f934bb60af08d4688c committed to 4.4-rc 
> > causes crash in LVM test shell/lvchange-raid.sh. The kernel crashes with 
> > this BUG, the reason is that we attempt to suspend a device that is 
> > already suspended. See also 
> > https://bugzilla.redhat.com/show_bug.cgi?id=1283491
> >
> > This patch fixes the bug by introducing functions mddev_nested_suspend and
> > mddev_nested_resume that can be called when the device is already
> > suspended. The number of calls to mddev_nested_suspend is kept in the
> > variable mddev->suspended.
> 
> Hi,
>  thanks for the report and patch.
> 
> I think I would rather just make mddev_suspend() always nest.
> It is always called under ->reconfig_mutex or some similar guarantee of
> being single-threaded in dm-raid, so we don't need an atomic_t.
> 
> Just
>   mddev_suspend()
> 	if (mddev->suspended++)
>         	return;
>         ...do the suspend.
> 
> and
>   mddev_resume()
>        if (--mddev->suspended)
>        		return;
>        ... do the resume
> 
> Does that seem reasonable to you?

Yes, it seems OK.

Mikulas


* Re: [PATCH] md: fix bug due to nested suspend
From: NeilBrown @ 2015-12-17  4:09 UTC
  To: Mikulas Patocka
  Cc: Jens Axboe, Heinz Mauelshagen, Dan Williams, Marian Csontos, dm-devel


On Thu, Dec 17 2015, Mikulas Patocka wrote:

> On Wed, 16 Dec 2015, NeilBrown wrote:
>
>> On Thu, Nov 26 2015, Mikulas Patocka wrote:
>> 
>> > The patch c7bfced9a6716ff66c9d61f934bb60af08d4688c committed to 4.4-rc 
>> > causes crash in LVM test shell/lvchange-raid.sh. The kernel crashes with 
>> > this BUG, the reason is that we attempt to suspend a device that is 
>> > already suspended. See also 
>> > https://bugzilla.redhat.com/show_bug.cgi?id=1283491
>> >
>> > This patch fixes the bug by introducing functions mddev_nested_suspend and
>> > mddev_nested_resume that can be called when the device is already
>> > suspended. The number of calls to mddev_nested_suspend is kept in the
>> > variable mddev->suspended.
>> 
>> Hi,
>>  thanks for the report and patch.
>> 
>> I think I would rather just make mddev_suspend() always nest.
>> It is always called under ->reconfig_mutex or some similar guarantee of
>> being single-threaded in dm-raid, so we don't need an atomic_t.
>> 
>> Just
>>   mddev_suspend()
>> 	if (mddev->suspended++)
>>         	return;
>>         ...do the suspend.
>> 
>> and
>>   mddev_resume()
>>        if (--mddev->suspended)
>>        		return;
>>        ... do the resume
>> 
>> Does that seem reasonable to you?
>
> Yes, it seems OK.
>
> Mikulas

Thanks.  I've revised your original patch to behave as above and will
send to Linus today or tomorrow.

Thanks,
NeilBrown
