linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* 2.5.70 add_disk(disk) re-registering disk->queue->elevator.kobj (bug?!)
@ 2003-06-03 18:33 Lou Langholtz
  2003-06-03 19:07 ` Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Lou Langholtz @ 2003-06-03 18:33 UTC (permalink / raw)
  To: linux-kernel

What am I missing (or is something in the block handling code actually 
seriously wrong)?

In linux-2.5.70/drivers/block/genhd.c the add_disk(disk) logic seems 
wrong because if disk->queue is shared by a few disk's, then that shared 
queue's elevator is re-registered via the call to 
elv_register_queue(disk) which calls down to 
kobject_register(disk->queue->elevator.kobj). Or perhaps the block 
handling logic was changed such that disks don't share the same 
request_queue anymore. If so, then a few drivers (like nbd) need to be 
updated to use a seperate request_queue per disk. If the block handling 
code wasn't however changed this way though and struct gendisk objects 
can still share the same request_queue then it seems a lot of other 
logic todo with elv_register_queue() is also scrogged. Like shouldn't 
disk->kobj.parent be set to kobject_get(&disk->queue->elevator.kobj) 
instead of the reverse that the 2.5.70 code does? If so, then all the 
deallocation code (called from del_gendisk(disk)) needs to be reversed 
and changed around too. Seems like so much wrong logic that it's my 
reading of the code that's at fault instead. Help!?

CC me on any email as I don't read the lkml regularly. ldl@aros.net

Thanks!!

Louis D. Langholtz


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.5.70 add_disk(disk) re-registering disk->queue->elevator.kobj (bug?!)
  2003-06-03 18:33 2.5.70 add_disk(disk) re-registering disk->queue->elevator.kobj (bug?!) Lou Langholtz
@ 2003-06-03 19:07 ` Andrew Morton
  2003-06-04  0:29   ` Lou Langholtz
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2003-06-03 19:07 UTC (permalink / raw)
  To: Lou Langholtz; +Cc: linux-kernel

Lou Langholtz <ldl@aros.net> wrote:
>
> Or perhaps the block 
> handling logic was changed such that disks don't share the same 
> request_queue anymore. If so, then a few drivers (like nbd) need to be 
> updated to use a seperate request_queue per disk.

The ramdisk driver was recently changed to do exactly this.  From what
you say it appears that nbd needs the same treatment.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.5.70 add_disk(disk) re-registering disk->queue->elevator.kobj (bug?!)
  2003-06-03 19:07 ` Andrew Morton
@ 2003-06-04  0:29   ` Lou Langholtz
  2003-06-04  0:56     ` viro
  2003-06-04  1:00     ` Andrew Morton
  0 siblings, 2 replies; 8+ messages in thread
From: Lou Langholtz @ 2003-06-04  0:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Andrew Morton wrote:

>Lou Langholtz <ldl@aros.net> wrote:
>  
>
>>Or perhaps the block 
>>handling logic was changed such that disks don't share the same 
>>request_queue anymore. If so, then a few drivers (like nbd) need to be 
>>updated to use a seperate request_queue per disk.
>>    
>>
>
>The ramdisk driver was recently changed to do exactly this.  From what
>you say it appears that nbd needs the same treatment.
>  
>
I noticed that too but thought surely that couldn't be why the rd driver 
was changes. Cause... then it would seem via 'grep blk_init_queue 
drivers/block/*.c' that most of the block drivers need to be changed. 
And having a request_queue structure for every disk that's often (in 
these drivers) every minor device, seems like a lot of unneeded memory 
usage too. I'm afraid to ask this, but are you sure that each disk 
really is supposed to have its own request queue now? That seems less 
sensible than inverting the kobject parenting logic so that the 
request_queue.elevator kobject is the parent of the disk kobject. After 
all, makes more sense for multiple gen_disk objects to belong to the 
same elevator than for multiple elevators to belong to the same gen_disk 
no???

Anyways.... thanks for setting me straight ;-)

Lou


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.5.70 add_disk(disk) re-registering disk->queue->elevator.kobj (bug?!)
  2003-06-04  0:29   ` Lou Langholtz
@ 2003-06-04  0:56     ` viro
  2003-06-04 16:08       ` Patrick Mochel
  2003-06-04  1:00     ` Andrew Morton
  1 sibling, 1 reply; 8+ messages in thread
From: viro @ 2003-06-04  0:56 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel

On Tue, Jun 03, 2003 at 06:29:19PM -0600, Lou Langholtz wrote:
> Andrew Morton wrote:
> >The ramdisk driver was recently changed to do exactly this.  From what
> >you say it appears that nbd needs the same treatment.

This is utterly ridiculous.  I realize that sysfs is fashionable, but
it should reflect the existing logics, not the other way round.

Guys, there are very valid reasons to have a queue shared by several
disks.  E.g. it can very well act as a serialization mechanism - and
as the matter of fact does in quite a few drivers.

What the fuck is going on?  It's what, the fifth case when we have
somebody export something in sysfs, ignore the lifetime rules for
objects and go "reality doesn't match my theory, too bad for reality"?

Linus, could we *please* put a moratorium on use of sysfs unless the
persons using it had proven that they understand how the objects they
export are used in the tree?  Enough is enough - we already have netdev
drivers to deal with and have to do that *now* thanks to blind sysfs
export.  Now we get random bdev stuff on top of that?  Absofuckinglutely
marvelous.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.5.70 add_disk(disk) re-registering disk->queue->elevator.kobj (bug?!)
  2003-06-04  0:29   ` Lou Langholtz
  2003-06-04  0:56     ` viro
@ 2003-06-04  1:00     ` Andrew Morton
  2003-06-04  1:06       ` viro
  1 sibling, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2003-06-04  1:00 UTC (permalink / raw)
  To: Lou Langholtz; +Cc: linux-kernel, Jens Axboe, Greg KH, Patrick Mochel

Lou Langholtz <ldl@aros.net> wrote:
>
> >The ramdisk driver was recently changed to do exactly this.  From what
>  >you say it appears that nbd needs the same treatment.
>  >  
>  >
>  I noticed that too but thought surely that couldn't be why the rd driver 
>  was changes. Cause... then it would seem via 'grep blk_init_queue 
>  drivers/block/*.c' that most of the block drivers need to be changed. 
>  And having a request_queue structure for every disk that's often (in 
>  these drivers) every minor device, seems like a lot of unneeded memory 
>  usage too. I'm afraid to ask this, but are you sure that each disk 
>  really is supposed to have its own request queue now? That seems less 
>  sensible than inverting the kobject parenting logic so that the 
>  request_queue.elevator kobject is the parent of the disk kobject. After 
>  all, makes more sense for multiple gen_disk objects to belong to the 
>  same elevator than for multiple elevators to belong to the same gen_disk 
>  no???

According to Al, we have a significant number of drivers in the tree in
which multiple gendisks shared the same queue.  Sometimes because that's a
logical mapping onto how the hardware behaves.

As far as I know, the new queue-per-gendisk requirement is purely because
of this sysfs hierarchy problem.

So yes, perhaps you are right and we need to rethink things from the sysfs
angle rather than reworking the drivers.  Along the lines which you
mention.

This isn't my area.  Perhaps Jens, Greg or Pat could comment?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.5.70 add_disk(disk) re-registering disk->queue->elevator.kobj (bug?!)
  2003-06-04  1:00     ` Andrew Morton
@ 2003-06-04  1:06       ` viro
  2003-06-04 16:07         ` Lou Langholtz
  0 siblings, 1 reply; 8+ messages in thread
From: viro @ 2003-06-04  1:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Lou Langholtz, linux-kernel, Jens Axboe, Greg KH, Patrick Mochel

On Tue, Jun 03, 2003 at 06:00:02PM -0700, Andrew Morton wrote:
 
> According to Al, we have a significant number of drivers in the tree in
> which multiple gendisks shared the same queue.  Sometimes because that's a
> logical mapping onto how the hardware behaves.

... on top of that, we have queues with no gendisk ever registered.
SCSI tapes, for one things.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.5.70 add_disk(disk) re-registering disk->queue->elevator.kobj (bug?!)
  2003-06-04  1:06       ` viro
@ 2003-06-04 16:07         ` Lou Langholtz
  0 siblings, 0 replies; 8+ messages in thread
From: Lou Langholtz @ 2003-06-04 16:07 UTC (permalink / raw)
  To: viro; +Cc: Andrew Morton, linux-kernel, Jens Axboe, Greg KH, Patrick Mochel

allviro@parcelfarce.linux.theplanet.co.uk wrote:

>. . .
>... on top of that, we have queues with no gendisk ever registered.
>SCSI tapes, for one things.
>  
>
This may be unrelated but I think not... kobject refcounting for the 
elevators seems wrong in the current implementation if we're saying that 
in fact the request_queue should be sharable by zero or more devices. 
Here's extra debugging output from the nbd module I've been hacking on 
that's the result of doing an rmmod operation on the module. Note the 
BUG notice and the values of "q.e.ref#" that precede the BUG. "q.e.ref" 
is just the value of atomic_read(&disk->queue->elevator.kobj.refcount)). 
Running through zero as we go through calling del_gendisk(disk) for each 
disk belonging to each nbd_devs[] just seems wrong but this is what 
happens when the request_queue is shared by all the nbd_devs:

Jun  2 23:09:36 cyprus kernel: nbd: nbd_exit called.
Jun  2 23:09:36 cyprus kernel: nb0: calling del_gendisk(d56f1dc0): 43 
0-0 ref#=4 q=e3268560 q.e.ref#=2
Jun  2 23:09:36 cyprus kernel: nb1: calling del_gendisk(d56f1cc0): 43 
1-1 ref#=4 q=e3268560 q.e.ref#=0
Jun  2 23:09:36 cyprus kernel: nb2: calling del_gendisk(d56f3c80): 43 
2-2 ref#=4 q=e3268560 q.e.ref#=-2
Jun  2 23:09:36 cyprus kernel: nb3: calling del_gendisk(d56f1ac0): 43 
3-3 ref#=4 q=e3268560 q.e.ref#=-4
Jun  2 23:09:36 cyprus kernel: nb4: calling del_gendisk(d56f15c0): 43 
4-4 ref#=4 q=e3268560 q.e.ref#=-6
Jun  2 23:09:36 cyprus kernel: nb5: calling del_gendisk(d56f16c0): 43 
5-5 ref#=4 q=e3268560 q.e.ref#=-8
Jun  2 23:09:36 cyprus kernel: nb6: calling del_gendisk(d56f3980): 43 
6-6 ref#=4 q=e3268560 q.e.ref#=-10
Jun  2 23:09:36 cyprus kernel: ------------[ cut here ]------------
Jun  2 23:09:36 cyprus kernel: kernel BUG at include/linux/dcache.h:271!
Jun  2 23:09:36 cyprus kernel: invalid operand: 0000 [#1]
Jun  2 23:09:36 cyprus kernel: CPU:    0
Jun  2 23:09:36 cyprus kernel: EIP:    0060:[<c0176b1d>]    Tainted: GF
Jun  2 23:09:36 cyprus kernel: EFLAGS: 00210246
Jun  2 23:09:36 cyprus kernel: EIP is at sysfs_remove_dir+0x1d/0x13f
Jun  2 23:09:36 cyprus kernel: eax: 00000000   ebx: e32685a4   ecx: 
c027a0a0   edx: 00000000
Jun  2 23:09:36 cyprus kernel: esi: 00000006   edi: d5cad5c0   ebp: 
d8eadec0   esp: d8eadea8
Jun  2 23:09:36 cyprus kernel: ds: 007b   es: 007b   ss: 0068
Jun  2 23:09:36 cyprus kernel: Process rmmod (pid: 1732, 
threadinfo=d8eac000 task=d6224d80)
Jun  2 23:09:36 cyprus kernel: Stack: d5cb9340 dfd6fbc0 dfd6fbe8 
e32685a4 00000006 d56f3a48 d8eaded8 c0204e53
Jun  2 23:09:36 cyprus kernel:        e32685a4 c047e740 e32685a4 
e32685a4 d8eadee8 c0204e94 e32685a4 d56f3980
Jun  2 23:09:36 cyprus kernel:        d8eadef8 c0275eae e32685a4 
d56f3980 d8eadf0c c0279bd4 d56f3980 d56f3980
Jun  2 23:09:36 cyprus kernel: Call Trace:
Jun  2 23:09:36 cyprus kernel:  [<e32685a4>] nbd_queue+0x44/0x14c [nbd]
Jun  2 23:09:36 cyprus kernel:  [<c0204e53>] kobject_del+0x43/0x70
Jun  2 23:09:36 cyprus kernel:  [<e32685a4>] nbd_queue+0x44/0x14c [nbd]
Jun  2 23:09:36 cyprus last message repeated 2 times
Jun  2 23:09:36 cyprus kernel:  [<c0204e94>] kobject_unregister+0x14/0x20
Jun  2 23:09:36 cyprus kernel:  [<e32685a4>] nbd_queue+0x44/0x14c [nbd]
Jun  2 23:09:36 cyprus kernel:  [<c0275eae>] elv_unregister_queue+0x1e/0x40
Jun  2 23:09:36 cyprus kernel:  [<e32685a4>] nbd_queue+0x44/0x14c [nbd]
Jun  2 23:09:36 cyprus kernel:  [<c0279bd4>] unlink_gendisk+0x14/0x40
Jun  2 23:09:36 cyprus kernel:  [<c0175951>] del_gendisk+0x61/0x110
Jun  2 23:09:36 cyprus kernel:  [<e32617a4>] +0x644/0x7400 [nbd]
Jun  2 23:09:36 cyprus kernel:  [<e325efec>] nbd_exit+0x7c/0x157 [nbd]
Jun  2 23:09:36 cyprus kernel:  [<e3268560>] nbd_queue+0x0/0x14c [nbd]
Jun  2 23:09:36 cyprus kernel:  [<e3268800>] +0x0/0x140 [nbd]
Jun  2 23:09:36 cyprus kernel:  [<c012c30e>] sys_delete_module+0x12e/0x170
Jun  2 23:09:36 cyprus kernel:  [<e3268800>] +0x0/0x140 [nbd]
Jun  2 23:09:36 cyprus kernel:  [<c010c73c>] do_IRQ+0xcc/0xe0
Jun  2 23:09:36 cyprus kernel:  [<c010ac4d>] sysenter_past_esp+0x52/0x71
Jun  2 23:09:36 cyprus kernel:
Jun  2 23:09:36 cyprus kernel: Code: 0f 0b 0f 01 db 9a 3c c0 ff 07 85 ff 
0f 84 08 01 00 00 8b 47

Now if the whole block system has gotten redesigned to the point where 
each request_queue could be run through in multi-threaded fassion such 
that waiting within the queue handling q->request_fn won't block other 
running q->request_fn's then it makes sense in some block device driver 
cases to use a request_queue & lock per device but I'm still concerned 
by the direction of parenting of associated kobjects between gen_disk's 
and request_queue.elevator's. Should gen_disk's be the parent's of 
elevator's or the other way around? Is it ever the case that a driver 
needs multiple elevators for the same gen_disk? Seems like allviro is 
establishing that some drivers need multiple gen_disk's to use the same 
elevator.

Looking forward to more direction on this. Thanks!!!


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.5.70 add_disk(disk) re-registering disk->queue->elevator.kobj (bug?!)
  2003-06-04  0:56     ` viro
@ 2003-06-04 16:08       ` Patrick Mochel
  0 siblings, 0 replies; 8+ messages in thread
From: Patrick Mochel @ 2003-06-04 16:08 UTC (permalink / raw)
  To: viro; +Cc: Linus Torvalds, Andrew Morton, linux-kernel


> This is utterly ridiculous.  I realize that sysfs is fashionable, but
> it should reflect the existing logics, not the other way round.

I completely agree. When converting a subsystem to use the kobject model,
there are essentially three paths to choose:

- Convert the subsystem to use the kobject semantics 
- Convert the kobject model to tolerate the semantics of the subsystem 
- Something else

The first is very attractive because it requires little effort to get an
appealing layout in the filesystem and export some attributes for a class
of objects. It's worked on some cases so far, but it's also proven to not
work for all classes of objects, _especially_ when they start interacting
with each other.

Converting the kobject model to tolerate other semantics is fine, and I'm
happy to see it happen. However, it must not make special cases for random
subsystems. Which is where the 'something else' comes in.

The object lifetime and interaction rules must be understood for each
object that is converted to use kobjects. Once they are, as well as their
violations of the current kobject semantics, we can work to improve both.


The fact that kobjects make it easy to shoot ourselves in the head is bad.  
But, the fact that they make long-existing bugs easier to trigger and
force people to deal with them and fix them is a good thing, in the long
run. There's a lot of code out there, much of which not a lot of people
understand. If we're forced to audit it, understand it, and fix it, then
we all win in the end.

HOWEVER, the current development model does not jive very well with the
fact that we're quickly approaching 2.6. The temporary instability
introduced by a kobject transition does not help the immediate cause. If
conversions are going to happen in the near future, they need to be
completely understood, well-thought out, and stable.

I realize that it helps to do it gradually, and not everyone is as
talented as Al in doing atomic large-scope gradual sets of patches. Also,
that many people have the time/energy to continue working on conversions
and kobject infrastructure. So, as a harbor for this continued work, and a
pre-cursor to 2.7, I've set up a tree at
bk://ldm.bkbits.net/linux-2.5-sysfs People are welcome to send me patches,
which I'll keep there and push forward once they become stable and/or
necessary.



	-pat


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2003-06-04 15:54 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-06-03 18:33 2.5.70 add_disk(disk) re-registering disk->queue->elevator.kobj (bug?!) Lou Langholtz
2003-06-03 19:07 ` Andrew Morton
2003-06-04  0:29   ` Lou Langholtz
2003-06-04  0:56     ` viro
2003-06-04 16:08       ` Patrick Mochel
2003-06-04  1:00     ` Andrew Morton
2003-06-04  1:06       ` viro
2003-06-04 16:07         ` Lou Langholtz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).