All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/1] aoe: ensure we initialise the request_queue correctly
@ 2009-08-21 16:41 Andy Whitcroft
  2009-08-21 16:41 ` [PATCH 1/1] " Andy Whitcroft
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Andy Whitcroft @ 2009-08-21 16:41 UTC (permalink / raw)
  To: Ed L. Cashin, Andrew Morton, Andy Whitcroft, linux-kernel; +Cc: Andy Whitcroft

We have been seeing oopses in very recent kernels when using the AOE driver.
When attempting to mount remote devices we get a warning from the kobject
layer:

  [ 2645.959090] kobject '<NULL>' (ffff880059ca22c0): tried to add
	an uninitialized object, something is seriously wrong.

Looking at the driver it seems to have always had an embedded request_queue
and it is this that is throwing the error.  It appears the intent is
tha these would be allocated and released using helpers, and the lack of
these leaves the object uninitialised and throws the error.  It is unclear
how this could ever have worked.  Anyhow the following email contains a
patch to allocate and release this request_queue via the standard helpers.
This has bene shown to fix the issue in testing.

Comments.

-apw

Andy Whitcroft (1):
  aoe: ensure we initialise the request_queue correctly

 drivers/block/aoe/aoe.h    |    2 +-
 drivers/block/aoe/aoeblk.c |    6 +++---
 drivers/block/aoe/aoedev.c |   11 ++++++++++-
 3 files changed, 14 insertions(+), 5 deletions(-)


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/1] aoe: ensure we initialise the request_queue correctly
  2009-08-21 16:41 [PATCH 0/1] aoe: ensure we initialise the request_queue correctly Andy Whitcroft
@ 2009-08-21 16:41 ` Andy Whitcroft
  2009-08-21 18:58 ` [PATCH 0/1] " Ed Cashin
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Andy Whitcroft @ 2009-08-21 16:41 UTC (permalink / raw)
  To: Ed L. Cashin, Andrew Morton, Andy Whitcroft, linux-kernel; +Cc: Andy Whitcroft

BugLink: http://bugs.launchpad.net/bugs/410198

When using the aoe driver we see an oops triggered by use of an
incorrectly initialised request_queue object:

  [ 2645.959090] kobject '<NULL>' (ffff880059ca22c0): tried to add
		an uninitialized object, something is seriously wrong.
  [ 2645.959104] Pid: 6, comm: events/0 Not tainted 2.6.31-5-generic #24-Ubuntu
  [ 2645.959107] Call Trace:
  [ 2645.959139] [<ffffffff8126ca2f>] kobject_add+0x5f/0x70
  [ 2645.959151] [<ffffffff8125b4ab>] blk_register_queue+0x8b/0xf0
  [ 2645.959155] [<ffffffff8126043f>] add_disk+0x8f/0x160
  [ 2645.959161] [<ffffffffa01673c4>] aoeblk_gdalloc+0x164/0x1c0 [aoe]

It seems this driver mearly embeds a request_queue object in its main
control structure and does not attempt to initialise it.  Pull this out
and use blk_init_queue/blk_cleanup_queue to handle initialisation and
teardown of this object.

Signed-off-by: Andy Whitcroft <apw@canonical.com>
---
 drivers/block/aoe/aoe.h    |    2 +-
 drivers/block/aoe/aoeblk.c |    6 +++---
 drivers/block/aoe/aoedev.c |   11 ++++++++++-
 3 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h
index 5e41e6d..db195ab 100644
--- a/drivers/block/aoe/aoe.h
+++ b/drivers/block/aoe/aoe.h
@@ -155,7 +155,7 @@ struct aoedev {
 	u16 fw_ver;		/* version of blade's firmware */
 	struct work_struct work;/* disk create work struct */
 	struct gendisk *gd;
-	struct request_queue blkq;
+	struct request_queue *blkq;
 	struct hd_geometry geo; 
 	sector_t ssize;
 	struct timer_list timer;
diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index 2307a27..83e0dfd 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -264,8 +264,8 @@ aoeblk_gdalloc(void *vp)
 		goto err_disk;
 	}
 
-	blk_queue_make_request(&d->blkq, aoeblk_make_request);
-	if (bdi_init(&d->blkq.backing_dev_info))
+	blk_queue_make_request(d->blkq, aoeblk_make_request);
+	if (bdi_init(&d->blkq->backing_dev_info))
 		goto err_mempool;
 	spin_lock_irqsave(&d->lock, flags);
 	gd->major = AOE_MAJOR;
@@ -276,7 +276,7 @@ aoeblk_gdalloc(void *vp)
 	snprintf(gd->disk_name, sizeof gd->disk_name, "etherd/e%ld.%d",
 		d->aoemajor, d->aoeminor);
 
-	gd->queue = &d->blkq;
+	gd->queue = d->blkq;
 	d->gd = gd;
 	d->flags &= ~DEVFL_GDALLOC;
 	d->flags |= DEVFL_UP;
diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
index eeea477..0add442 100644
--- a/drivers/block/aoe/aoedev.c
+++ b/drivers/block/aoe/aoedev.c
@@ -113,6 +113,7 @@ aoedev_freedev(struct aoedev *d)
 	if (d->bufpool)
 		mempool_destroy(d->bufpool);
 	skbpoolfree(d);
+	blk_cleanup_queue(d->blkq);
 	kfree(d);
 }
 
@@ -202,6 +203,7 @@ aoedev_by_sysminor_m(ulong sysminor)
 {
 	struct aoedev *d;
 	ulong flags;
+	struct request_queue *rq;
 
 	spin_lock_irqsave(&devlist_lock, flags);
 
@@ -210,9 +212,13 @@ aoedev_by_sysminor_m(ulong sysminor)
 			break;
 	if (d)
 		goto out;
+	rq = blk_init_queue(NULL, NULL);
+	if (!rq)
+		goto out;
 	d = kcalloc(1, sizeof *d, GFP_ATOMIC);
 	if (!d)
-		goto out;
+		goto out_rq;
+	d->blkq = rq;
 	INIT_WORK(&d->work, aoecmd_sleepwork);
 	spin_lock_init(&d->lock);
 	skb_queue_head_init(&d->sendq);
@@ -234,6 +240,9 @@ aoedev_by_sysminor_m(ulong sysminor)
  out:
 	spin_unlock_irqrestore(&devlist_lock, flags);
 	return d;
+ out_rq:
+	blk_cleanup_queue(rq);
+	goto out;
 }
 
 static void
-- 
1.6.3.rc3.199.g24398


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/1] aoe: ensure we initialise the request_queue correctly
  2009-08-21 16:41 [PATCH 0/1] aoe: ensure we initialise the request_queue correctly Andy Whitcroft
  2009-08-21 16:41 ` [PATCH 1/1] " Andy Whitcroft
@ 2009-08-21 18:58 ` Ed Cashin
  2009-08-22  9:21 ` Bruno Prémont
  2009-08-24 14:27 ` Ed Cashin
  3 siblings, 0 replies; 14+ messages in thread
From: Ed Cashin @ 2009-08-21 18:58 UTC (permalink / raw)
  To: apw, ecashin, akpm, linux-kernel

It would be very interesting to find out what the results of a git
bisect would show.  Is that something you could do?

Thanks for the report and the patch.  I'm looking at the changes now.

-- 
  Ed

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/1] aoe: ensure we initialise the request_queue correctly
  2009-08-21 16:41 [PATCH 0/1] aoe: ensure we initialise the request_queue correctly Andy Whitcroft
  2009-08-21 16:41 ` [PATCH 1/1] " Andy Whitcroft
  2009-08-21 18:58 ` [PATCH 0/1] " Ed Cashin
@ 2009-08-22  9:21 ` Bruno Prémont
  2009-08-22 21:52   ` Rafael J. Wysocki
  2009-08-24 14:27 ` Ed Cashin
  3 siblings, 1 reply; 14+ messages in thread
From: Bruno Prémont @ 2009-08-22  9:21 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Andy Whitcroft, Ed L. Cashin, Andrew Morton, Andy Whitcroft,
	linux-kernel

Rafael, this matches regression bug #13942 "Troubles with AoE and
uninitialized object".

Will test and report how it works out here.

Bruno


On Fri, 21 August 2009 Andy Whitcroft <apw@canonical.com> wrote:
> We have been seeing oopses in very recent kernels when using the AOE
> driver. When attempting to mount remote devices we get a warning from
> the kobject layer:
> 
>   [ 2645.959090] kobject '<NULL>' (ffff880059ca22c0): tried to add
> 	an uninitialized object, something is seriously wrong.
> 
> Looking at the driver it seems to have always had an embedded
> request_queue and it is this that is throwing the error.  It appears
> the intent is tha these would be allocated and released using
> helpers, and the lack of these leaves the object uninitialised and
> throws the error.  It is unclear how this could ever have worked.
> Anyhow the following email contains a patch to allocate and release
> this request_queue via the standard helpers. This has bene shown to
> fix the issue in testing.
> 
> Comments.
> 
> -apw
> 
> Andy Whitcroft (1):
>   aoe: ensure we initialise the request_queue correctly
> 
>  drivers/block/aoe/aoe.h    |    2 +-
>  drivers/block/aoe/aoeblk.c |    6 +++---
>  drivers/block/aoe/aoedev.c |   11 ++++++++++-
>  3 files changed, 14 insertions(+), 5 deletions(-)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/1] aoe: ensure we initialise the request_queue correctly
  2009-08-22  9:21 ` Bruno Prémont
@ 2009-08-22 21:52   ` Rafael J. Wysocki
  2009-08-23  9:00     ` Bruno Prémont
  0 siblings, 1 reply; 14+ messages in thread
From: Rafael J. Wysocki @ 2009-08-22 21:52 UTC (permalink / raw)
  To: Bruno Prémont
  Cc: Andy Whitcroft, Ed L. Cashin, Andrew Morton, linux-kernel

On Saturday 22 August 2009, Bruno Prémont wrote:
> Rafael, this matches regression bug #13942 "Troubles with AoE and
> uninitialized object".
> 
> Will test and report how it works out here.

Great, thanks for the information.

Rafael

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/1] aoe: ensure we initialise the request_queue correctly
  2009-08-22 21:52   ` Rafael J. Wysocki
@ 2009-08-23  9:00     ` Bruno Prémont
  2009-08-23 20:16       ` Bruno Prémont
  0 siblings, 1 reply; 14+ messages in thread
From: Bruno Prémont @ 2009-08-23  9:00 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Andy Whitcroft, Ed L. Cashin, Andrew Morton, linux-kernel, Jens Axboe

On Sat, 22 August 2009 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> On Saturday 22 August 2009, Bruno Prémont wrote:
> > Rafael, this matches regression bug #13942 "Troubles with AoE and
> > uninitialized object".
> > 
> > Will test and report how it works out here.
> 
> Great, thanks for the information.
> 
> Rafael
> 
For me this makes the kobject <NULL> message go away when new AoE
device is discovered.
At least this uninitialized object warning already existed with
2.6.31-rc1. More bisect-like details later today.


It does not fix the BUG(), WARN() triggered when unmounting XFS from
an AoE device as visible below. (linux-2.6.31-rc7 + patch of this thread)

Jens, is bio->bi_io_vec allowed to be NULL? AoE BUG()s in that case.



[  104.703175] aoe: 00089bb442e6 e0.0 v400c has 6291456 sectors
[  104.703321]  etherd/e0.0: unknown partition table
[  223.687021] XFS mounting filesystem etherd/e0.0
[  224.858261] Ending clean XFS mount for filesystem: etherd/e0.0
[  373.566294] aoe: bi_io_vec is NULL
[  373.566334] ------------[ cut here ]------------
[  373.568919] kernel BUG at /usr/src/linux-2.6/drivers/block/aoe/aoeblk.c:177!
[  373.571672] invalid opcode: 0000 [#1] 
[  373.574484] last sysfs file: /sys/devices/virtual/hwmon/hwmon0/temp1_input
[  373.576194] Modules linked in: squashfs zlib_inflate nfs lockd nfs_acl sunrpc 8021q snd_pcm_oss snd_mixer_oss xfs exportfs usb_storage loop nsc_ircc snd_intel8x0 irda ehci_hcd snd_ac97_codec ac97_bus uhci_hcd snd_pcm snd_timer usbcore snd snd_page_alloc i2c_i801 pcspkr crc_ccitt
[  373.576194] 
[  373.576194] Pid: 2205, comm: umount Not tainted (2.6.31-rc7 #2) TravelMate 660
[  373.576194] EIP: 0060:[<c11eff07>] EFLAGS: 00010296 CPU: 0
[  373.576194] EIP is at aoeblk_make_request+0x1b7/0x1d0
[  373.576194] EAX: 0000002c EBX: 00000000 ECX: ffffffff EDX: c13c6584
[  373.576194] ESI: da531000 EDI: dd394580 EBP: db7fcdac ESP: db7fcd84
[  373.576194]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[  373.576194] Process umount (pid: 2205, ti=db7fc000 task=dd91b850 task.ti=db7fc000)
[  373.576194] Stack:
[  373.576194]  c1396425 dd80a258 db7fcd94 dd394580 00000246 00000400 dd394580 000000ff
[  373.576194] <0> dd402540 dd394580 db7fce54 c10f2dcd dd115480 00000000 dd394580 00000056
[  373.576194] <0> fa4273d3 4aff5460 00000056 fa483f4b 4aff5460 00000000 00000488 00000000
[  373.576194] Call Trace:
[  373.576194]  [<c10f2dcd>] ? generic_make_request+0x28d/0x360
[  373.576194]  [<c101f93e>] ? set_next_entity+0x2e/0x70
[  373.576194]  [<c12b7463>] ? schedule+0x203/0x340
[  373.576194]  [<c1050d8e>] ? mempool_alloc_slab+0xe/0x10
[  373.576194]  [<c1050d8e>] ? mempool_alloc_slab+0xe/0x10
[  373.576194]  [<c10f2ee2>] ? submit_bio+0x42/0xb0
[  373.576194]  [<c1096d2b>] ? bio_alloc_bioset+0x2b/0xe0
[  373.576194]  [<c10f5004>] ? blkdev_issue_flush+0x74/0xb0
[  373.576194]  [<e0c2a8cd>] ? xfs_blkdev_issue_flush+0xd/0x10 [xfs]
[  373.576194]  [<e0c23d8d>] ? xfs_free_buftarg+0x2d/0x60 [xfs]
[  373.576194]  [<e0c2a570>] ? xfs_close_devices+0x50/0x60 [xfs]
[  373.576194]  [<e0c2a601>] ? xfs_fs_put_super+0x81/0xc0 [xfs]
[  373.576194]  [<c107687b>] ? generic_shutdown_super+0x4b/0xd0
[  373.576194]  [<c1076925>] ? kill_block_super+0x25/0x40
[  373.576194]  [<c1076c37>] ? deactivate_super+0x37/0x50
[  373.576194]  [<c10897d0>] ? mntput_no_expire+0x50/0x60
[  373.576194]  [<c1089a4f>] ? sys_umount+0x4f/0x2d0
[  373.576194]  [<c1089ce7>] ? sys_oldumount+0x17/0x20
[  373.576194]  [<c1002e08>] ? sysenter_do_call+0x12/0x26
[  373.576194] Code: ff c7 04 24 fc 0e 38 c1 e8 41 72 0c 00 8b 45 e4 ba f4 ff ff ff e8 da 67 ea ff e9 77 ff ff ff c7 04 24 25 64 39 c1 e8 23 72 0c 00 <0f> 0b eb fe 90 8d 74 26 00 c7 04 24 d4 0e 38 c1 e8 0e 72 0c 00 
[  373.576194] EIP: [<c11eff07>] aoeblk_make_request+0x1b7/0x1d0 SS:ESP 0068:db7fcd84
[  373.776694] ---[ end trace 1449bb87303f798f ]---
[  373.783838] ------------[ cut here ]------------
[  373.791050] WARNING: at /usr/src/linux-2.6/kernel/exit.c:895 do_exit+0x5a7/0x630()
[  373.798475] Hardware name: TravelMate 660
[  373.805987] Modules linked in: squashfs zlib_inflate nfs lockd nfs_acl sunrpc 8021q snd_pcm_oss snd_mixer_oss xfs exportfs usb_storage loop nsc_ircc snd_intel8x0 irda ehci_hcd snd_ac97_codec ac97_bus uhci_hcd snd_pcm snd_timer usbcore snd snd_page_alloc i2c_i801 pcspkr crc_ccitt
[  373.823149] Pid: 2205, comm: umount Tainted: G      D    2.6.31-rc7 #2
[  373.831726] Call Trace:
[  373.840265]  [<c12b7142>] ? printk+0x18/0x1e
[  373.848886]  [<c1026847>] ? do_exit+0x5a7/0x630
[  373.857585]  [<c102371c>] warn_slowpath_common+0x6c/0xc0
[  373.866349]  [<c1026847>] ? do_exit+0x5a7/0x630
[  373.875122]  [<c1023785>] warn_slowpath_null+0x15/0x20
[  373.883949]  [<c1026847>] do_exit+0x5a7/0x630
[  373.892826]  [<c10034ca>] ? apic_timer_interrupt+0x2a/0x30
[  373.901834]  [<c12b7142>] ? printk+0x18/0x1e
[  373.910890]  [<c102364f>] ? oops_exit+0x2f/0x40
[  373.919964]  [<c10060d5>] oops_end+0x85/0x90
[  373.929114]  [<c1006250>] die+0x50/0x70
[  373.938284]  [<c10039a1>] do_trap+0x91/0xd0
[  373.947465]  [<c1003da0>] ? do_invalid_op+0x0/0xa0
[  373.956718]  [<c1003e27>] do_invalid_op+0x87/0xa0
[  373.965984]  [<c11eff07>] ? aoeblk_make_request+0x1b7/0x1d0
[  373.975343]  [<c12b8bee>] error_code+0x5e/0x64
[  373.984644]  [<c1003da0>] ? do_invalid_op+0x0/0xa0
[  373.993877]  [<c11eff07>] ? aoeblk_make_request+0x1b7/0x1d0
[  374.003202]  [<c10f2dcd>] generic_make_request+0x28d/0x360
[  374.012341]  [<c101f93e>] ? set_next_entity+0x2e/0x70
[  374.021191]  [<c12b7463>] ? schedule+0x203/0x340
[  374.029675]  [<c1050d8e>] ? mempool_alloc_slab+0xe/0x10
[  374.038047]  [<c1050d8e>] ? mempool_alloc_slab+0xe/0x10
[  374.046294]  [<c10f2ee2>] submit_bio+0x42/0xb0
[  374.054450]  [<c1096d2b>] ? bio_alloc_bioset+0x2b/0xe0
[  374.062617]  [<c10f5004>] blkdev_issue_flush+0x74/0xb0
[  374.070794]  [<e0c2a8cd>] xfs_blkdev_issue_flush+0xd/0x10 [xfs]
[  374.078994]  [<e0c23d8d>] xfs_free_buftarg+0x2d/0x60 [xfs]
[  374.087187]  [<e0c2a570>] xfs_close_devices+0x50/0x60 [xfs]
[  374.095304]  [<e0c2a601>] xfs_fs_put_super+0x81/0xc0 [xfs]
[  374.103296]  [<c107687b>] generic_shutdown_super+0x4b/0xd0
[  374.111242]  [<c1076925>] kill_block_super+0x25/0x40
[  374.119071]  [<c1076c37>] deactivate_super+0x37/0x50
[  374.126875]  [<c10897d0>] mntput_no_expire+0x50/0x60
[  374.134570]  [<c1089a4f>] sys_umount+0x4f/0x2d0
[  374.142222]  [<c1089ce7>] sys_oldumount+0x17/0x20
[  374.149826]  [<c1002e08>] sysenter_do_call+0x12/0x26
[  374.157463] ---[ end trace 1449bb87303f7990 ]---

Note: after these a clean shutdown is no more possible as some tasks
of the shutdown process en up in D state (some lock around umount?)...

Bruno

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/1] aoe: ensure we initialise the request_queue correctly
  2009-08-23  9:00     ` Bruno Prémont
@ 2009-08-23 20:16       ` Bruno Prémont
  0 siblings, 0 replies; 14+ messages in thread
From: Bruno Prémont @ 2009-08-23 20:16 UTC (permalink / raw)
  Cc: Rafael J. Wysocki, Andy Whitcroft, Ed L. Cashin, Andrew Morton,
	linux-kernel, Jens Axboe

On Sun, 23 August 2009 Bruno Prémont <bonbons@linux-vserver.org> wrote:
> At least this uninitialized object warning already existed with
> 2.6.31-rc1. More bisect-like details later today.

Hm, the NULL bio->bi_io_vec on unmount seems picky as to when it wants
to be reproduced, half a day bisecting on a spare system without
result...

I got it on one system with -rc1, on spare system a bisect
v2.6.30..v2.6.31-rc1 was always good. Will have to retry with complete,
not simplified config on one of the two systems where I got that trace.

FYI: someone else did report it on 2.6.30-gentoo-r4 which is
similar to 2.6.30.x, see http://bugs.gentoo.org/show_bug.cgi?id=281929

Bruno

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/1] aoe: ensure we initialise the request_queue correctly
  2009-08-21 16:41 [PATCH 0/1] aoe: ensure we initialise the request_queue correctly Andy Whitcroft
                   ` (2 preceding siblings ...)
  2009-08-22  9:21 ` Bruno Prémont
@ 2009-08-24 14:27 ` Ed Cashin
  2009-08-29 13:43   ` Bruno Prémont
  3 siblings, 1 reply; 14+ messages in thread
From: Ed Cashin @ 2009-08-24 14:27 UTC (permalink / raw)
  To: apw, ecashin, akpm, linux-kernel

On Fri Aug 21 12:41:59 EDT 2009, apw@canonical.com wrote:
> We have been seeing oopses in very recent kernels when using the AOE driver.
> When attempting to mount remote devices we get a warning from the kobject
> layer:
> 
>   [ 2645.959090] kobject '<NULL>' (ffff880059ca22c0): tried to add
> 	an uninitialized object, something is seriously wrong.
> 
> Looking at the driver it seems to have always had an embedded request_queue
> and it is this that is throwing the error.  It appears the intent is
> tha these would be allocated and released using helpers, and the lack of
> these leaves the object uninitialised and throws the error.  It is unclear
> how this could ever have worked.

This aoe driver does not handle I/O requests but provides its own
make_request function to blk_queue_make_request and handles bios
instead.

The reason I was interested in a git bisect is that I suspect that
before nobody was interested in the request_queue in the aoe driver
until recent changes to kobject code.  A couple people have indicated
that they'd be doing such a git bisect, so if anybody has done that,
please Cc me.

-- 
  Ed

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/1] aoe: ensure we initialise the request_queue correctly
  2009-08-24 14:27 ` Ed Cashin
@ 2009-08-29 13:43   ` Bruno Prémont
  2009-09-01 19:15     ` Ed Cashin
  0 siblings, 1 reply; 14+ messages in thread
From: Bruno Prémont @ 2009-08-29 13:43 UTC (permalink / raw)
  To: Ed Cashin; +Cc: apw, akpm, linux-kernel

On Mon, 24 August 2009 Ed Cashin <ecashin@coraid.com> wrote:
> On Fri Aug 21 12:41:59 EDT 2009, apw@canonical.com wrote:
> > We have been seeing oopses in very recent kernels when using the
> > AOE driver. When attempting to mount remote devices we get a
> > warning from the kobject layer:
> > 
> >   [ 2645.959090] kobject '<NULL>' (ffff880059ca22c0): tried to add
> > 	an uninitialized object, something is seriously wrong.
> > 
> > Looking at the driver it seems to have always had an embedded
> > request_queue and it is this that is throwing the error.  It
> > appears the intent is tha these would be allocated and released
> > using helpers, and the lack of these leaves the object
> > uninitialised and throws the error.  It is unclear how this could
> > ever have worked.
> 
> This aoe driver does not handle I/O requests but provides its own
> make_request function to blk_queue_make_request and handles bios
> instead.
> 
> The reason I was interested in a git bisect is that I suspect that
> before nobody was interested in the request_queue in the aoe driver
> until recent changes to kobject code.  A couple people have indicated
> that they'd be doing such a git bisect, so if anybody has done that,
> please Cc me.
> 
I finished bisecting the NULL object and ended up at this commit:
  cd43e26f071524647e660706b784ebcbefbd2e44

  block: Expose stacked device queues in sysfs

  Currently stacking devices do not have a queue directory in sysfs.
  However, many of the I/O characteristics like sector size, maximum
  request size, etc. are queue properties.

  This patch enables the queue directory for MD/DM devices.  The elevator
  code has been modified to deal with queues that do not have an I/O
  scheduler.

  Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  Signed-off-by: Jens Axboe <jens.axboe@oracle.com>


This seems to generate /sys/block/$device/queue and its contents for
everyone who is using queues, not just for those queues that have a
non-NULL queue->request_fn.

Bruno

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/1] aoe: ensure we initialise the request_queue correctly
  2009-08-29 13:43   ` Bruno Prémont
@ 2009-09-01 19:15     ` Ed Cashin
  2009-09-01 20:31       ` Bruno Prémont
  2009-09-02 19:55       ` Andrew Morton
  0 siblings, 2 replies; 14+ messages in thread
From: Ed Cashin @ 2009-09-01 19:15 UTC (permalink / raw)
  To: bonbons, ecashin, apw, akpm, linux-kernel

On Sat Aug 29 09:43:54 EDT 2009, bonbons@linux-vserver.org wrote:
...
> I finished bisecting the NULL object and ended up at this commit:
>   cd43e26f071524647e660706b784ebcbefbd2e44
> 
>   block: Expose stacked device queues in sysfs
> 
>   Currently stacking devices do not have a queue directory in sysfs.
>   However, many of the I/O characteristics like sector size, maximum
>   request size, etc. are queue properties.
> 
>   This patch enables the queue directory for MD/DM devices.  The elevator
>   code has been modified to deal with queues that do not have an I/O
>   scheduler.
> 
>   Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
>   Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
> 
> 
> This seems to generate /sys/block/$device/queue and its contents for
> everyone who is using queues, not just for those queues that have a
> non-NULL queue->request_fn.

Thanks much for doing that.  It makes sense that this change would
have caused it to suddenly matter whether the unused queue is
initialized.

The patch looks fine to me.

I don't think it should go in my aoe tree for linux-next, since the
patch addresses a regression.

Based on the series file in mmotm, I don't think this patch is in mm
at the moment.  Andrew Morton, do you think this patch should go
through your mm tree?

-- 
  Ed

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/1] aoe: ensure we initialise the request_queue correctly
  2009-09-01 19:15     ` Ed Cashin
@ 2009-09-01 20:31       ` Bruno Prémont
  2009-09-02 13:31         ` Ed Cashin
  2009-09-02 19:55       ` Andrew Morton
  1 sibling, 1 reply; 14+ messages in thread
From: Bruno Prémont @ 2009-09-01 20:31 UTC (permalink / raw)
  To: Ed Cashin; +Cc: apw, akpm, linux-kernel

On Tue, 01 September 2009 Ed Cashin <ecashin@coraid.com> wrote:

> On Sat Aug 29 09:43:54 EDT 2009, bonbons@linux-vserver.org wrote:
> ...
> > I finished bisecting the NULL object and ended up at this commit:
> >   cd43e26f071524647e660706b784ebcbefbd2e44
> > 
> >   block: Expose stacked device queues in sysfs
> > 
> >   Currently stacking devices do not have a queue directory in sysfs.
> >   However, many of the I/O characteristics like sector size, maximum
> >   request size, etc. are queue properties.
> > 
> >   This patch enables the queue directory for MD/DM devices.  The
> > elevator code has been modified to deal with queues that do not
> > have an I/O scheduler.
> > 
> >   Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> >   Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
> > 
> > 
> > This seems to generate /sys/block/$device/queue and its contents for
> > everyone who is using queues, not just for those queues that have a
> > non-NULL queue->request_fn.
> 
> Thanks much for doing that.  It makes sense that this change would
> have caused it to suddenly matter whether the unused queue is
> initialized.
> 
> The patch looks fine to me.

Would it make sense to fine-tune the values reported by sys-fs in the
queue details so they match with the AoE device (as a separate
enhancement patch)?

Sure the queue itself won't use them, but some user-space tools might
be interested in this data.

I've not checked what information is provided by AoE protocol or can
be deducted from interface MTU.

Bruno

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/1] aoe: ensure we initialise the request_queue correctly
  2009-09-01 20:31       ` Bruno Prémont
@ 2009-09-02 13:31         ` Ed Cashin
  0 siblings, 0 replies; 14+ messages in thread
From: Ed Cashin @ 2009-09-02 13:31 UTC (permalink / raw)
  To: bonbons, ecashin, apw, akpm, linux-kernel

On Tue Sep  1 16:31:58 EDT 2009, bonbons@linux-vserver.org wrote:
> On Tue, 01 September 2009 Ed Cashin <ecashin@coraid.com> wrote:
...
> > Thanks much for doing that.  It makes sense that this change would
> > have caused it to suddenly matter whether the unused queue is
> > initialized.
> > 
> > The patch looks fine to me.
> 
> Would it make sense to fine-tune the values reported by sys-fs in the
> queue details so they match with the AoE device (as a separate
> enhancement patch)?
> 
> Sure the queue itself won't use them, but some user-space tools might
> be interested in this data.
> 
> I've not checked what information is provided by AoE protocol or can
> be deducted from interface MTU.

The AoE protocol does allow a storage target to say, "I can handle
this certain number of sectors in a single read or write command." But
the user space tools cannot really benefit from that information,
since they can send larger read/write operations with no loss of
efficiency.

So I suspect that it would not make much sense, but if you decide to
look into it further, see whether you can identify an extant user
space tool that benefits from the information.

-- 
  Ed Cashin
  http://noserose.net/e/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/1] aoe: ensure we initialise the request_queue correctly
  2009-09-01 19:15     ` Ed Cashin
  2009-09-01 20:31       ` Bruno Prémont
@ 2009-09-02 19:55       ` Andrew Morton
  2009-09-02 20:16         ` Ed Cashin
  1 sibling, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2009-09-02 19:55 UTC (permalink / raw)
  To: Ed Cashin; +Cc: bonbons, ecashin, apw, linux-kernel

On Tue, 1 Sep 2009 15:15:20 -0400
Ed Cashin <ecashin@coraid.com> wrote:

> The patch looks fine to me.

I translated that into Acked-by:.

> I don't think it should go in my aoe tree for linux-next, since the
> patch addresses a regression.
> 
> Based on the series file in mmotm, I don't think this patch is in mm
> at the moment.  Andrew Morton, do you think this patch should go
> through your mm tree?

I have it now.  Please check that the below is the right patch and that
my cobbled-together changelog makes sense (if not, please send
replacement changelog and/or patch).


From: Andy Whitcroft <apw@canonical.com>

When using the aoe driver we see an oops triggered by use of an
incorrectly initialised request_queue object:

  [ 2645.959090] kobject '<NULL>' (ffff880059ca22c0): tried to add
		an uninitialized object, something is seriously wrong.
  [ 2645.959104] Pid: 6, comm: events/0 Not tainted 2.6.31-5-generic #24-Ubuntu
  [ 2645.959107] Call Trace:
  [ 2645.959139] [<ffffffff8126ca2f>] kobject_add+0x5f/0x70
  [ 2645.959151] [<ffffffff8125b4ab>] blk_register_queue+0x8b/0xf0
  [ 2645.959155] [<ffffffff8126043f>] add_disk+0x8f/0x160
  [ 2645.959161] [<ffffffffa01673c4>] aoeblk_gdalloc+0x164/0x1c0 [aoe]

It seems this driver mearly embeds a request_queue object in its main
control structure and does not attempt to initialise it.  Pull this out
and use blk_init_queue/blk_cleanup_queue to handle initialisation and
teardown of this object.

Bruno bisected this regression down to

  cd43e26f071524647e660706b784ebcbefbd2e44

  block: Expose stacked device queues in sysfs

"This seems to generate /sys/block/$device/queue and its contents for
 everyone who is using queues, not just for those queues that have a
 non-NULL queue->request_fn."

Addresses http://bugs.launchpad.net/bugs/410198
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13942

Signed-off-by: Andy Whitcroft <apw@canonical.com>
Acked-by: Ed Cashin <ecashin@coraid.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Bruno Premont <bonbons@linux-vserver.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/block/aoe/aoe.h    |    2 +-
 drivers/block/aoe/aoeblk.c |    6 +++---
 drivers/block/aoe/aoedev.c |   11 ++++++++++-
 3 files changed, 14 insertions(+), 5 deletions(-)

diff -puN drivers/block/aoe/aoe.h~aoe-ensure-we-initialise-the-request_queue-correctly drivers/block/aoe/aoe.h
--- a/drivers/block/aoe/aoe.h~aoe-ensure-we-initialise-the-request_queue-correctly
+++ a/drivers/block/aoe/aoe.h
@@ -155,7 +155,7 @@ struct aoedev {
 	u16 fw_ver;		/* version of blade's firmware */
 	struct work_struct work;/* disk create work struct */
 	struct gendisk *gd;
-	struct request_queue blkq;
+	struct request_queue *blkq;
 	struct hd_geometry geo; 
 	sector_t ssize;
 	struct timer_list timer;
diff -puN drivers/block/aoe/aoeblk.c~aoe-ensure-we-initialise-the-request_queue-correctly drivers/block/aoe/aoeblk.c
--- a/drivers/block/aoe/aoeblk.c~aoe-ensure-we-initialise-the-request_queue-correctly
+++ a/drivers/block/aoe/aoeblk.c
@@ -264,8 +264,8 @@ aoeblk_gdalloc(void *vp)
 		goto err_disk;
 	}
 
-	blk_queue_make_request(&d->blkq, aoeblk_make_request);
-	if (bdi_init(&d->blkq.backing_dev_info))
+	blk_queue_make_request(d->blkq, aoeblk_make_request);
+	if (bdi_init(&d->blkq->backing_dev_info))
 		goto err_mempool;
 	spin_lock_irqsave(&d->lock, flags);
 	gd->major = AOE_MAJOR;
@@ -276,7 +276,7 @@ aoeblk_gdalloc(void *vp)
 	snprintf(gd->disk_name, sizeof gd->disk_name, "etherd/e%ld.%d",
 		d->aoemajor, d->aoeminor);
 
-	gd->queue = &d->blkq;
+	gd->queue = d->blkq;
 	d->gd = gd;
 	d->flags &= ~DEVFL_GDALLOC;
 	d->flags |= DEVFL_UP;
diff -puN drivers/block/aoe/aoedev.c~aoe-ensure-we-initialise-the-request_queue-correctly drivers/block/aoe/aoedev.c
--- a/drivers/block/aoe/aoedev.c~aoe-ensure-we-initialise-the-request_queue-correctly
+++ a/drivers/block/aoe/aoedev.c
@@ -113,6 +113,7 @@ aoedev_freedev(struct aoedev *d)
 	if (d->bufpool)
 		mempool_destroy(d->bufpool);
 	skbpoolfree(d);
+	blk_cleanup_queue(d->blkq);
 	kfree(d);
 }
 
@@ -202,6 +203,7 @@ aoedev_by_sysminor_m(ulong sysminor)
 {
 	struct aoedev *d;
 	ulong flags;
+	struct request_queue *rq;
 
 	spin_lock_irqsave(&devlist_lock, flags);
 
@@ -210,9 +212,13 @@ aoedev_by_sysminor_m(ulong sysminor)
 			break;
 	if (d)
 		goto out;
+	rq = blk_init_queue(NULL, NULL);
+	if (!rq)
+		goto out;
 	d = kcalloc(1, sizeof *d, GFP_ATOMIC);
 	if (!d)
-		goto out;
+		goto out_rq;
+	d->blkq = rq;
 	INIT_WORK(&d->work, aoecmd_sleepwork);
 	spin_lock_init(&d->lock);
 	skb_queue_head_init(&d->sendq);
@@ -234,6 +240,9 @@ aoedev_by_sysminor_m(ulong sysminor)
  out:
 	spin_unlock_irqrestore(&devlist_lock, flags);
 	return d;
+ out_rq:
+	blk_cleanup_queue(rq);
+	goto out;
 }
 
 static void
_


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/1] aoe: ensure we initialise the request_queue correctly
  2009-09-02 19:55       ` Andrew Morton
@ 2009-09-02 20:16         ` Ed Cashin
  0 siblings, 0 replies; 14+ messages in thread
From: Ed Cashin @ 2009-09-02 20:16 UTC (permalink / raw)
  To: akpm, ecashin, bonbons, apw, linux-kernel

On Wed Sep  2 15:55:47 EDT 2009, akpm@linux-foundation.org wrote:
> On Tue, 1 Sep 2009 15:15:20 -0400
> Ed Cashin <ecashin@coraid.com> wrote:
> 
> > The patch looks fine to me.
> 
> I translated that into Acked-by:.
> 
> > I don't think it should go in my aoe tree for linux-next, since the
> > patch addresses a regression.
> > 
> > Based on the series file in mmotm, I don't think this patch is in mm
> > at the moment.  Andrew Morton, do you think this patch should go
> > through your mm tree?
> 
> I have it now.  Please check that the below is the right patch and that
> my cobbled-together changelog makes sense (if not, please send
> replacement changelog and/or patch).

Thanks.  It looks good to me.

-- 
  Ed

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2009-09-02 20:17 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-08-21 16:41 [PATCH 0/1] aoe: ensure we initialise the request_queue correctly Andy Whitcroft
2009-08-21 16:41 ` [PATCH 1/1] " Andy Whitcroft
2009-08-21 18:58 ` [PATCH 0/1] " Ed Cashin
2009-08-22  9:21 ` Bruno Prémont
2009-08-22 21:52   ` Rafael J. Wysocki
2009-08-23  9:00     ` Bruno Prémont
2009-08-23 20:16       ` Bruno Prémont
2009-08-24 14:27 ` Ed Cashin
2009-08-29 13:43   ` Bruno Prémont
2009-09-01 19:15     ` Ed Cashin
2009-09-01 20:31       ` Bruno Prémont
2009-09-02 13:31         ` Ed Cashin
2009-09-02 19:55       ` Andrew Morton
2009-09-02 20:16         ` Ed Cashin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.