All of lore.kernel.org
 help / color / mirror / Atom feed
* linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
@ 2011-01-12 17:34 Valdis.Kletnieks
  2011-01-13  0:23 ` Milan Broz
  0 siblings, 1 reply; 44+ messages in thread
From: Valdis.Kletnieks @ 2011-01-12 17:34 UTC (permalink / raw)
  To: Alexander Viro, Neil Brown; +Cc: linux-kernel, linux-fsdevel, linux-raid

[-- Attachment #1: Type: text/plain, Size: 1998 bytes --]

Seen in a boot of yesterday's linux-next.   The 'W' flag was due to the
already-reported 'WARNING: at kernel/workqueue.c:1202 worker_enter_idle'.
Not sure if it's a block_dev or dm/LVM issue, so I'm cc:'ing both groups.  I wonder
if the fact I still have 'CONFIG_DEVTMPFS=n' is involved (it's apparently ticking off
dracut before and after the warning).

[   16.840333] dracut: Found volume group "vg_blackice" using metadata type lvm2
[   16.892282] dracut: The link /dev/vg_blackice/opt should had been created by udev but it was not found.  Falling back to direct link creation.
[   16.912627] ------------[ cut here ]------------
[   16.912635] WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
[   16.912638] Hardware name: Latitude E6500
[   16.912640] Modules linked in:
[   16.912644] Pid: 1920, comm: lvm Tainted: G        W   2.6.37-next-20110111 #1
[   16.912647] Call Trace:
[   16.912653]  [<ffffffff8103a989>] ? warn_slowpath_common+0x80/0x98
[   16.912657]  [<ffffffff8103a9b6>] ? warn_slowpath_null+0x15/0x17
[   16.912660]  [<ffffffff8111fb03>] ? bd_link_disk_holder+0x92/0x1ac
[   16.912665]  [<ffffffff8139b875>] ? open_dev+0x76/0x9e
[   16.912668]  [<ffffffff8139bc9d>] ? dm_get_device+0x147/0x1dc
[   16.912672]  [<ffffffff8139d088>] ? linear_ctr+0x98/0xd4
[   16.912676]  [<ffffffff8139c289>] ? dm_table_add_target+0x149/0x1d4
[   16.912679]  [<ffffffff8139e57c>] ? table_load+0xf0/0x27b
[   16.912683]  [<ffffffff8139e48c>] ? table_load+0x0/0x27b
[   16.912686]  [<ffffffff8139f326>] ? ctl_ioctl+0x1c6/0x21e
[   16.912690]  [<ffffffff8139f38c>] ? dm_ctl_ioctl+0xe/0x12
[   16.912695]  [<ffffffff8110258e>] ? do_vfs_ioctl+0x4c4/0x505
[   16.912699]  [<ffffffff81102626>] ? sys_ioctl+0x57/0x95
[   16.912703]  [<ffffffff8100253b>] ? system_call_fastpath+0x16/0x1b
[   16.912706] ---[ end trace 4eaa2a86a8e2da24 ]---
[   16.925289] dracut: The link /dev/vg_blackice/src should had been created by udev but it was not found.  Falling back to direct link creation.


[-- Attachment #2: Type: application/pgp-signature, Size: 227 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-12 17:34 linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() Valdis.Kletnieks
@ 2011-01-13  0:23 ` Milan Broz
  2011-01-13  2:19   ` Jun'ichi Nomura
  0 siblings, 1 reply; 44+ messages in thread
From: Milan Broz @ 2011-01-13  0:23 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Alexander Viro, Neil Brown, linux-kernel, linux-fsdevel,
	linux-raid, device-mapper development, Tejun Heo

On 01/12/2011 06:34 PM, Valdis.Kletnieks@vt.edu wrote:
> Seen in a boot of yesterday's linux-next.   The 'W' flag was due to the
> already-reported 'WARNING: at kernel/workqueue.c:1202 worker_enter_idle'.
> Not sure if it's a block_dev or dm/LVM issue, so I'm cc:'ing both groups.  I wonder
> if the fact I still have 'CONFIG_DEVTMPFS=n' is involved (it's apparently ticking off
> dracut before and after the warning).
> 
> [   16.840333] dracut: Found volume group "vg_blackice" using metadata type lvm2
> [   16.892282] dracut: The link /dev/vg_blackice/opt should had been created by udev but it was not found.  Falling back to direct link creation.
> [   16.912627] ------------[ cut here ]------------
> [   16.912635] WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()

That seems to be
WARN_ON_ONCE(!bdev->bd_holder || bdev->bd_holder_disk);
added in patch
http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commitdiff;h=e09b457bdb7e8d23fc54dcef0930ac697d8de895
"block: simplify holder symlink handling"

dm linear just claims device in table constructor, I don't think it is bug in DM code.

> [   16.912638] Hardware name: Latitude E6500
> [   16.912640] Modules linked in:
> [   16.912644] Pid: 1920, comm: lvm Tainted: G        W   2.6.37-next-20110111 #1
> [   16.912647] Call Trace:
> [   16.912653]  [<ffffffff8103a989>] ? warn_slowpath_common+0x80/0x98
> [   16.912657]  [<ffffffff8103a9b6>] ? warn_slowpath_null+0x15/0x17
> [   16.912660]  [<ffffffff8111fb03>] ? bd_link_disk_holder+0x92/0x1ac
> [   16.912665]  [<ffffffff8139b875>] ? open_dev+0x76/0x9e
> [   16.912668]  [<ffffffff8139bc9d>] ? dm_get_device+0x147/0x1dc
> [   16.912672]  [<ffffffff8139d088>] ? linear_ctr+0x98/0xd4
> [   16.912676]  [<ffffffff8139c289>] ? dm_table_add_target+0x149/0x1d4
> [   16.912679]  [<ffffffff8139e57c>] ? table_load+0xf0/0x27b
> [   16.912683]  [<ffffffff8139e48c>] ? table_load+0x0/0x27b
> [   16.912686]  [<ffffffff8139f326>] ? ctl_ioctl+0x1c6/0x21e
> [   16.912690]  [<ffffffff8139f38c>] ? dm_ctl_ioctl+0xe/0x12
> [   16.912695]  [<ffffffff8110258e>] ? do_vfs_ioctl+0x4c4/0x505
> [   16.912699]  [<ffffffff81102626>] ? sys_ioctl+0x57/0x95
> [   16.912703]  [<ffffffff8100253b>] ? system_call_fastpath+0x16/0x1b
> [   16.912706] ---[ end trace 4eaa2a86a8e2da24 ]---
> [   16.925289] dracut: The link /dev/vg_blackice/src should had been created by udev but it was not found.  Falling back to direct link creation.
> 

Milan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-13  0:23 ` Milan Broz
@ 2011-01-13  2:19   ` Jun'ichi Nomura
  2011-01-13 11:06     ` Tejun Heo
  2011-01-13 17:21     ` [PATCH] block: restore multiple bd_link_disk_holder() support Tejun Heo
  0 siblings, 2 replies; 44+ messages in thread
From: Jun'ichi Nomura @ 2011-01-13  2:19 UTC (permalink / raw)
  To: Milan Broz, Tejun Heo
  Cc: Valdis.Kletnieks, Alexander Viro, Neil Brown, linux-kernel,
	linux-fsdevel, linux-raid, device-mapper development

On 01/13/11 09:23, Milan Broz wrote:
> On 01/12/2011 06:34 PM, Valdis.Kletnieks@vt.edu wrote:
>> Seen in a boot of yesterday's linux-next.   The 'W' flag was due to the
>> already-reported 'WARNING: at kernel/workqueue.c:1202 worker_enter_idle'.
>> Not sure if it's a block_dev or dm/LVM issue, so I'm cc:'ing both groups.  I wonder
>> if the fact I still have 'CONFIG_DEVTMPFS=n' is involved (it's apparently ticking off
>> dracut before and after the warning).
>>
>> [   16.840333] dracut: Found volume group "vg_blackice" using metadata type lvm2
>> [   16.892282] dracut: The link /dev/vg_blackice/opt should had been created by udev but it was not found.  Falling back to direct link creation.
>> [   16.912627] ------------[ cut here ]------------
>> [   16.912635] WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
> 
> That seems to be
> WARN_ON_ONCE(!bdev->bd_holder || bdev->bd_holder_disk);
> added in patch
> http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commitdiff;h=e09b457bdb7e8d23fc54dcef0930ac697d8de895
> "block: simplify holder symlink handling"
> 
> dm linear just claims device in table constructor, I don't think it is bug in DM code.

The patch assumes only one holder disk for a claimed dev, which is not true.
E.g. if there are multiple LVs on a PV.

In addition to that, since claiming is done in table constructor,
there can be 2 claim instances for a slave/holder pair at a time
when you load a table while there's already an active one.
E.g. if you do lvresize.
We need consideration for this, too.

Thanks,
-- 
Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-13  2:19   ` Jun'ichi Nomura
@ 2011-01-13 11:06     ` Tejun Heo
  2011-01-13 11:26       ` [dm-devel] " Milan Broz
  2011-01-13 17:21     ` [PATCH] block: restore multiple bd_link_disk_holder() support Tejun Heo
  1 sibling, 1 reply; 44+ messages in thread
From: Tejun Heo @ 2011-01-13 11:06 UTC (permalink / raw)
  To: Jun'ichi Nomura
  Cc: Milan Broz, Valdis.Kletnieks, Alexander Viro, Neil Brown,
	linux-kernel, linux-fsdevel, linux-raid,
	device-mapper development

Hello,

On Thu, Jan 13, 2011 at 11:19:21AM +0900, Jun'ichi Nomura wrote:
> > "block: simplify holder symlink handling"
> > 
> > dm linear just claims device in table constructor, I don't think it is bug in DM code.
> 
> The patch assumes only one holder disk for a claimed dev, which is not true.
> E.g. if there are multiple LVs on a PV.
> 
> In addition to that, since claiming is done in table constructor,
> there can be 2 claim instances for a slave/holder pair at a time
> when you load a table while there's already an active one.
> E.g. if you do lvresize.
> We need consideration for this, too.

Urgh... gees, so there actually was a user using all that cruft.
Sorry about the breakage, I'll see how multiple symlinks can be
restored.  I'm curious why this was added at all tho.  What was the
rationalization?  It's not like two subsystems can share the same
block device so marking the currently owning subsystem should have
been enough at the block layer.  There is no reason for block devices
to present information which is of no use to itself.  All that's
necessary is "this is taken by dm or md for more information, query
those".  dm and md need their own conf/representation layer anyway.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-13 11:06     ` Tejun Heo
@ 2011-01-13 11:26       ` Milan Broz
  2011-01-13 12:27         ` Karel Zak
  0 siblings, 1 reply; 44+ messages in thread
From: Milan Broz @ 2011-01-13 11:26 UTC (permalink / raw)
  To: device-mapper development
  Cc: Tejun Heo, Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel,
	linux-raid, Alexander Viro, linux-fsdevel, Karel Zak

On 01/13/2011 12:06 PM, Tejun Heo wrote:
> Urgh... gees, so there actually was a user using all that cruft.
> Sorry about the breakage, I'll see how multiple symlinks can be
> restored.  I'm curious why this was added at all tho.  What was the
> rationalization?  It's not like two subsystems can share the same
> block device so marking the currently owning subsystem should have
> been enough at the block layer.  There is no reason for block devices
> to present information which is of no use to itself.  All that's
> necessary is "this is taken by dm or md for more information, query
> those".  dm and md need their own conf/representation layer anyway.

I am not sure if I understand it correctly, but multiple holders/slaves
links are very useful in userspace to present device dependences.

We just implemented lsblk command in util-linux which simple prints
tree according to these links, so dependendes of DM/MD/whatever devices
can be printed without using any system specific callbacks, just using
sysfs. See http://karelzak.blogspot.com/2010/12/lsblk8.html

Whatever changes are needed, please keep this functionality, it can be useful.

Thanks,
Milan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-13 11:26       ` [dm-devel] " Milan Broz
@ 2011-01-13 12:27         ` Karel Zak
  2011-01-13 13:12           ` Tejun Heo
  0 siblings, 1 reply; 44+ messages in thread
From: Karel Zak @ 2011-01-13 12:27 UTC (permalink / raw)
  To: Milan Broz
  Cc: device-mapper development, Tejun Heo, Jun'ichi Nomura,
	Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro,
	linux-fsdevel

On Thu, Jan 13, 2011 at 12:26:14PM +0100, Milan Broz wrote:
> On 01/13/2011 12:06 PM, Tejun Heo wrote:
> > Urgh... gees, so there actually was a user using all that cruft.
> > Sorry about the breakage, I'll see how multiple symlinks can be
> > restored.  I'm curious why this was added at all tho.  What was the
> > rationalization?  It's not like two subsystems can share the same
> > block device so marking the currently owning subsystem should have
> > been enough at the block layer.  There is no reason for block devices
> > to present information which is of no use to itself.  All that's
> > necessary is "this is taken by dm or md for more information, query
> > those".  dm and md need their own conf/representation layer anyway.
> 
> I am not sure if I understand it correctly, but multiple holders/slaves
> links are very useful in userspace to present device dependences.
> 
> We just implemented lsblk command in util-linux which simple prints
> tree according to these links, so dependendes of DM/MD/whatever devices
> can be printed without using any system specific callbacks, just using
> sysfs. See http://karelzak.blogspot.com/2010/12/lsblk8.html

 We also use holders/slaves links in libblkid to evaluate dependencies
 between devices (since 2008).

> Whatever changes are needed, please keep this functionality, it can be useful.

 Definitely.

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-13 12:27         ` Karel Zak
@ 2011-01-13 13:12           ` Tejun Heo
  2011-01-13 13:26             ` Karel Zak
  0 siblings, 1 reply; 44+ messages in thread
From: Tejun Heo @ 2011-01-13 13:12 UTC (permalink / raw)
  To: Karel Zak
  Cc: Milan Broz, device-mapper development, Jun'ichi Nomura,
	Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro,
	linux-fsdevel

Hello,

On Thu, Jan 13, 2011 at 01:27:01PM +0100, Karel Zak wrote:
>  We also use holders/slaves links in libblkid to evaluate dependencies
>  between devices (since 2008).
> 
> > Whatever changes are needed, please keep this functionality, it
> > can be useful.
> 
>  Definitely.

Yeah, sure, it's not like I can afford to avoid fixing it at this
point anyway, but I still want to point out it's at the wrong layer
and shouldn't have been added like this, really.  If you want blkid to
identify it, the proper thing to do would be querying blk device for
the claimer and then use claimer-specific method to query them.  It's
not like the current method would make sense for btrfs or whatnot.

So, yeap, I definitely am fixing it but let's not do things like this
in the future.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-13 13:12           ` Tejun Heo
@ 2011-01-13 13:26             ` Karel Zak
  2011-01-13 13:37               ` Tejun Heo
  0 siblings, 1 reply; 44+ messages in thread
From: Karel Zak @ 2011-01-13 13:26 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Milan Broz, device-mapper development, Jun'ichi Nomura,
	Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro,
	linux-fsdevel, Kay Sievers

On Thu, Jan 13, 2011 at 02:12:16PM +0100, Tejun Heo wrote:
> Hello,
> 
> On Thu, Jan 13, 2011 at 01:27:01PM +0100, Karel Zak wrote:
> >  We also use holders/slaves links in libblkid to evaluate dependencies
> >  between devices (since 2008).
> > 
> > > Whatever changes are needed, please keep this functionality, it
> > > can be useful.
> > 
> >  Definitely.
> 
> Yeah, sure, it's not like I can afford to avoid fixing it at this
> point anyway, but I still want to point out it's at the wrong layer
> and shouldn't have been added like this, really.  If you want blkid to
> identify it, the proper thing to do would be querying blk device for
> the claimer and then use claimer-specific method to query them.  It's

 It seems that dependencies (holders/slaves) between devices is pretty
 generic thing. Why do you think that we need claimer-specific method?
 The /sys filesystem is better that ictls in many ways.

> not like the current method would make sense for btrfs or whatnot.

 Could you be more verbose, please?

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-13 13:26             ` Karel Zak
@ 2011-01-13 13:37               ` Tejun Heo
  2011-01-13 13:52                 ` Tejun Heo
  2011-01-13 13:58                 ` Milan Broz
  0 siblings, 2 replies; 44+ messages in thread
From: Tejun Heo @ 2011-01-13 13:37 UTC (permalink / raw)
  To: Karel Zak
  Cc: Milan Broz, device-mapper development, Jun'ichi Nomura,
	Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro,
	linux-fsdevel, Kay Sievers

Hello, Karel.

On Thu, Jan 13, 2011 at 02:26:37PM +0100, Karel Zak wrote:
> > Yeah, sure, it's not like I can afford to avoid fixing it at this
> > point anyway, but I still want to point out it's at the wrong layer
> > and shouldn't have been added like this, really.  If you want blkid to
> > identify it, the proper thing to do would be querying blk device for
> > the claimer and then use claimer-specific method to query them.  It's
> 
>  It seems that dependencies (holders/slaves) between devices is pretty
>  generic thing. Why do you think that we need claimer-specific method?
>  The /sys filesystem is better that ictls in many ways.

Because sysfs is already complex enough without representing
dependency information without strictly defined strcture in it.  It's
for exporting the device tree to users.  We have developed interesting
ways to exploit it but it generally has proven to be a bad idea to add
symlinks without clearly defined structure beneath it.  People end up
using them differently and often they don't understand how the kobject
sysfs things are wired and it ends up with a lot of cruft exporting
something which is designed by small number of people without really
considering what's going on in the rest of the hierarchy.

> > not like the current method would make sense for btrfs or whatnot.
> 
>  Could you be more verbose, please?

If the symlinkery was something which could properly replace
configuration and query, sure, let's standardize it and share it among
all possible claimers of block devices, but it's not.  md, dm, btrfs
need to export and process way more information than can be represnted
in these symlinks and it's there more as a pretty thing than anything
which is actually necessary and useful.  And the original
implementation directly played with kobjects (in an unnecessarily
complicated way too) and allowed any kobject to be linked.  Things
like that just never end up pretty.

So, just don't do it.  Sysfs is for device hierarchy.  Don't try to
shove pretty looking things there (unless it's something widely agreed
on and necessary, of course).

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-13 13:37               ` Tejun Heo
@ 2011-01-13 13:52                 ` Tejun Heo
  2011-01-13 13:58                 ` Milan Broz
  1 sibling, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2011-01-13 13:52 UTC (permalink / raw)
  To: Karel Zak
  Cc: Milan Broz, device-mapper development, Jun'ichi Nomura,
	Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro,
	linux-fsdevel, Kay Sievers

On Thu, Jan 13, 2011 at 02:37:22PM +0100, Tejun Heo wrote:
> >  It seems that dependencies (holders/slaves) between devices is pretty
> >  generic thing. Why do you think that we need claimer-specific method?
> >  The /sys filesystem is better that ictls in many ways.

Also, one more thing.  In btrfs, there's no block device for the
master device.  You can add the slaves directory somewhere and your
choice would be as good as any other's but it just will not be generic
while being superflous exactly the same way it's superflous for md and
dm devices.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-13 13:37               ` Tejun Heo
  2011-01-13 13:52                 ` Tejun Heo
@ 2011-01-13 13:58                 ` Milan Broz
  2011-01-13 14:11                   ` Tejun Heo
  1 sibling, 1 reply; 44+ messages in thread
From: Milan Broz @ 2011-01-13 13:58 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Karel Zak, device-mapper development, Jun'ichi Nomura,
	Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro,
	linux-fsdevel, Kay Sievers

On 01/13/2011 02:37 PM, Tejun Heo wrote:

> So, just don't do it.  Sysfs is for device hierarchy.  Don't try to
> shove pretty looking things there (unless it's something widely agreed
> on and necessary, of course).

Hi,

I think that it is exactly what holders/slaves do - displaying device
hierarchy. So application can check which underlying device are related
and ask them for more info if needed (=> with system specific call,
it can be simple sysfs attribute, ioctl, whatever).

So the only request here is to keep these symlinks correct, nothing more.
Or am I missing anything?

Milan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-13 13:58                 ` Milan Broz
@ 2011-01-13 14:11                   ` Tejun Heo
  2011-01-13 14:25                     ` Milan Broz
  0 siblings, 1 reply; 44+ messages in thread
From: Tejun Heo @ 2011-01-13 14:11 UTC (permalink / raw)
  To: Milan Broz
  Cc: Karel Zak, device-mapper development, Jun'ichi Nomura,
	Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro,
	linux-fsdevel, Kay Sievers

Hello,

On Thu, Jan 13, 2011 at 02:58:23PM +0100, Milan Broz wrote:
> On 01/13/2011 02:37 PM, Tejun Heo wrote:
> 
> > So, just don't do it.  Sysfs is for device hierarchy.  Don't try to
> > shove pretty looking things there (unless it's something widely agreed
> > on and necessary, of course).
> 
> I think that it is exactly what holders/slaves do - displaying device
> hierarchy. So application can check which underlying device are related
> and ask them for more info if needed (=> with system specific call,
> it can be simple sysfs attribute, ioctl, whatever).

Yeah, sure but in a completely unrestrained and non-standard way.
First of all, it wasn't even necessary to begin with and I don't
really see anyone else other than md/dm using it.  I mean, where are
you gonna you put that slaves directory?  Sure you can put it
somewhere but really it would be just that - somewhere.  All this
doesn't even matter.  It wasn't even necessary to begin with.

> So the only request here is to keep these symlinks correct, nothing more.
> Or am I missing anything?

Yeah, I'm fixing that.  Don't worry.  I just wanna say it wasn't such
a brilliant idea to add it in the first place and hope that people
would restrain from doing similar things in the future.

So, as a general rule, when in doubt, just create an attribute.  Let's
refrain from custom symlinkery in sysfs, please.  In this case too, a
holder attribute containing strings like ext[3|4], md, dm or whatnot
would have been _much_ simpler and actually more useful.

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-13 14:11                   ` Tejun Heo
@ 2011-01-13 14:25                     ` Milan Broz
  2011-01-13 14:30                       ` Tejun Heo
  0 siblings, 1 reply; 44+ messages in thread
From: Milan Broz @ 2011-01-13 14:25 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Karel Zak, device-mapper development, Jun'ichi Nomura,
	Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro,
	linux-fsdevel, Kay Sievers

On 01/13/2011 03:11 PM, Tejun Heo wrote:

> So, as a general rule, when in doubt, just create an attribute.  Let's
> refrain from custom symlinkery in sysfs, please.  In this case too, a
> holder attribute containing strings like ext[3|4], md, dm or whatnot
> would have been _much_ simpler and actually more useful.

Maybe, but this was not invented in DM/MD camp:-)
Probably Kay or Greg can answer why it was done this way?

For DM it just added links to be proper user of it, see
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f165921df46a977e3561f1bd9f13a348441486d1

Anyway, it is /sys/block - so it represents block devices.

If btrfs internally creates some virtual _block_ device for its pool, it should
present it here too with slaves/holders. If not, why it should create any links there?

Milan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-13 14:25                     ` Milan Broz
@ 2011-01-13 14:30                       ` Tejun Heo
  2011-01-13 14:43                         ` Kay Sievers
                                           ` (2 more replies)
  0 siblings, 3 replies; 44+ messages in thread
From: Tejun Heo @ 2011-01-13 14:30 UTC (permalink / raw)
  To: Milan Broz
  Cc: Karel Zak, device-mapper development, Jun'ichi Nomura,
	Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro,
	linux-fsdevel, Kay Sievers

Hello,

On Thu, Jan 13, 2011 at 3:25 PM, Milan Broz <mbroz@redhat.com> wrote:
> Maybe, but this was not invented in DM/MD camp:-)
> Probably Kay or Greg can answer why it was done this way?

Let's not play the dig the history and blame game if possible.  We
(including me, of course) all did a lot of horrible things in the
past.  :-)

> For DM it just added links to be proper user of it, see
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f165921df46a977e3561f1bd9f13a348441486d1
>
> Anyway, it is /sys/block - so it represents block devices.
>
> If btrfs internally creates some virtual _block_ device for its pool, it should
> present it here too with slaves/holders. If not, why it should create any links there?

Yeah, that's the most bothering part for me.  The biggest customers of
bd_claim are filesystems and all these custom symlinkeries don't do
nothing for them.  It just doesn't make a whole lot of sense to me.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-13 14:30                       ` Tejun Heo
@ 2011-01-13 14:43                         ` Kay Sievers
  2011-01-13 15:03                           ` Milan Broz
  2011-01-13 15:59                             ` [dm-devel] " Karel Zak
  2011-01-13 14:45                         ` Theodore Tso
  2011-01-13 14:49                         ` Milan Broz
  2 siblings, 2 replies; 44+ messages in thread
From: Kay Sievers @ 2011-01-13 14:43 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Milan Broz, Karel Zak, device-mapper development,
	Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid,
	Alexander Viro, linux-fsdevel

On Thu, Jan 13, 2011 at 15:30, Tejun Heo <tj@kernel.org> wrote:
> On Thu, Jan 13, 2011 at 3:25 PM, Milan Broz <mbroz@redhat.com> wrote:
>> Maybe, but this was not invented in DM/MD camp:-)
>> Probably Kay or Greg can answer why it was done this way?

It's not from Greg or Kay. It just appeared some day in the context of dm. :)

And yes, symlinks *look* nice and simple for the outside, but they are
not, and have all sorts of problems like non-atomic updates, make it
impossible to ever rename a device (as long as they copy the device
name), and and and .... we should not add more of this.

>> If btrfs internally creates some virtual _block_ device for its pool, it should
>> present it here too with slaves/holders. If not, why it should create any links there?
>
> Yeah, that's the most bothering part for me.  The biggest customers of
> bd_claim are filesystems and all these custom symlinkeries don't do
> nothing for them.  It just doesn't make a whole lot of sense to me.

Btrfs does not use any blockdev as the master for good reason, and it
can never map its slaves inside of /sys/block. Simple meta-blockdevs
like md/dm just don't fit into modern requirements of a filesystem
(directory snapshots, directory subvolumes, complex raids, hassle-free
resizing, ...)  -- hence btrfs is much more like a network-filesystem
mount than a stream of blocks like a disk, and does not fit at all
into this model.

Kay

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-13 14:30                       ` Tejun Heo
  2011-01-13 14:43                         ` Kay Sievers
@ 2011-01-13 14:45                         ` Theodore Tso
  2011-01-13 20:18                           ` NeilBrown
  2011-01-13 14:49                         ` Milan Broz
  2 siblings, 1 reply; 44+ messages in thread
From: Theodore Tso @ 2011-01-13 14:45 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Milan Broz, Karel Zak, device-mapper development,
	Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid,
	Alexander Viro, linux-fsdevel, Kay Sievers


On Jan 13, 2011, at 9:30 AM, Tejun Heo wrote:
>> 
> 
> Yeah, that's the most bothering part for me.  The biggest customers of
> bd_claim are filesystems and all these custom symlinkeries don't do
> nothing for them.  It just doesn't make a whole lot of sense to me.
> 

Well, if there's better way to do things, we can send patches to libblkid to switch away from using sysfs, assuming it's using a new enough kernel.

The primary problem that we're trying to solve is to know whether a particular device contains a file system that should potentially mounted (or fsck'ed, or used as a external journal device, etc.)

If the file system is located on a raid 0 device created using devicemapper, the first physical block device could look like a file system.  So what we want is a very easy way of determining, "is this device being used by the device mapper layer"?   If it is, then it's probably not the droids we are looking for.   We'll keep looking at the rest of the block devices in the system, and when we find the dm-concatenated devices, we can properly identify it.

Can you suggest a better way doing what it is we need to do?

-- Ted


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-13 14:30                       ` Tejun Heo
  2011-01-13 14:43                         ` Kay Sievers
  2011-01-13 14:45                         ` Theodore Tso
@ 2011-01-13 14:49                         ` Milan Broz
  2011-01-14 16:35                           ` Tejun Heo
  2 siblings, 1 reply; 44+ messages in thread
From: Milan Broz @ 2011-01-13 14:49 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Karel Zak, device-mapper development, Jun'ichi Nomura,
	Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro,
	linux-fsdevel, Kay Sievers

On 01/13/2011 03:30 PM, Tejun Heo wrote:

Hi,

>> Anyway, it is /sys/block - so it represents block devices.
>>
>> If btrfs internally creates some virtual _block_ device for its pool, it should
>> present it here too with slaves/holders. If not, why it should create any links there?
> 
> Yeah, that's the most bothering part for me.  The biggest customers of
> bd_claim are filesystems and all these custom symlinkeries don't do
> nothing for them.  It just doesn't make a whole lot of sense to me.

Well, then it is not optimal for fs and should it be separated?
There is also /sys/fs now...

I think represent dependences of block devices in some generic (and exactly defined)
way is useful. So if there is something missing, lets define it properly.

The whole idea behind lsblk was practical: we responded many questions why
"device is busy" and what keeps it open in stacked subsystem.
Just try fdisk, mdadm, dmsetup table, losetup -a, kpartx, cryptsetup, etc ...

So listing block device tree using ONE interface (sysfs) to check and display
device tree for ALL subsystems seems to be good idea for me.
In this exactly defined case.

I am not saying that symlinks are perfect, just that generic interface here
is not superfluous.

Thanks,
Milan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-13 14:43                         ` Kay Sievers
@ 2011-01-13 15:03                           ` Milan Broz
  2011-01-14  7:38                             ` Jun'ichi Nomura
  2011-01-13 15:59                             ` [dm-devel] " Karel Zak
  1 sibling, 1 reply; 44+ messages in thread
From: Milan Broz @ 2011-01-13 15:03 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Tejun Heo, Karel Zak, device-mapper development,
	Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid,
	Alexander Viro, linux-fsdevel

On 01/13/2011 03:43 PM, Kay Sievers wrote:

>> On Thu, Jan 13, 2011 at 3:25 PM, Milan Broz <mbroz@redhat.com> wrote:
>>> Maybe, but this was not invented in DM/MD camp:-)
>>> Probably Kay or Greg can answer why it was done this way?
> 
> It's not from Greg or Kay. It just appeared some day in the context of dm. :)

ah, then sorry, I am just confused:-)
But DM does not need it for operation at all so it must had some other reason.
(We have dmsetup ls --tree using dm-ioctls for years.)

> And yes, symlinks *look* nice and simple for the outside, but they are
> not, and have all sorts of problems like non-atomic updates, make it
> impossible to ever rename a device (as long as they copy the device
> name), and and and .... we should not add more of this.

Yes, I agree, if there is better way, let's switch to that.

Milan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-13 14:43                         ` Kay Sievers
@ 2011-01-13 15:59                             ` Karel Zak
  2011-01-13 15:59                             ` [dm-devel] " Karel Zak
  1 sibling, 0 replies; 44+ messages in thread
From: Karel Zak @ 2011-01-13 15:59 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Valdis.Kletnieks, linux-kernel, linux-raid, Jun'ichi Nomura,
	device-mapper development, Alexander Viro, Tejun Heo,
	linux-fsdevel, Milan Broz

On Thu, Jan 13, 2011 at 03:43:38PM +0100, Kay Sievers wrote:
> On Thu, Jan 13, 2011 at 15:30, Tejun Heo <tj@kernel.org> wrote:
> > On Thu, Jan 13, 2011 at 3:25 PM, Milan Broz <mbroz@redhat.com> wrote:
> >> Maybe, but this was not invented in DM/MD camp:-)
> >> Probably Kay or Greg can answer why it was done this way?
> 
> It's not from Greg or Kay. It just appeared some day in the context of dm. :)
> 
> And yes, symlinks *look* nice and simple for the outside, but they are
> not, and have all sorts of problems like non-atomic updates, make it

 Sounds like sysfs implementation problem, right?

 If there is noway to fix sysfs then we can add a generic ioctl or
 /sys/block/<device>/{slave,holder}_list files with list of
 holders/slaves.
 
 But please, don't force userspace to use *claimer-specific*
 methods to answer *generic questions* like slave/holder dependencies
 between devices.
 
> impossible to ever rename a device (as long as they copy the device
> name), and and and .... we should not add more of this.
> 
> >> If btrfs internally creates some virtual _block_ device for its pool, it should
> >> present it here too with slaves/holders. If not, why it should create any links there?
> >
> > Yeah, that's the most bothering part for me.  The biggest customers of
> > bd_claim are filesystems and all these custom symlinkeries don't do
> > nothing for them.  It just doesn't make a whole lot of sense to me.
> 
> Btrfs does not use any blockdev as the master for good reason, and it
> can never map its slaves inside of /sys/block. 

 Yep, expected and correct response :-)

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
@ 2011-01-13 15:59                             ` Karel Zak
  0 siblings, 0 replies; 44+ messages in thread
From: Karel Zak @ 2011-01-13 15:59 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Tejun Heo, Milan Broz, device-mapper development,
	Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid,
	Alexander Viro, linux-fsdevel

On Thu, Jan 13, 2011 at 03:43:38PM +0100, Kay Sievers wrote:
> On Thu, Jan 13, 2011 at 15:30, Tejun Heo <tj@kernel.org> wrote:
> > On Thu, Jan 13, 2011 at 3:25 PM, Milan Broz <mbroz@redhat.com> wrote:
> >> Maybe, but this was not invented in DM/MD camp:-)
> >> Probably Kay or Greg can answer why it was done this way?
> 
> It's not from Greg or Kay. It just appeared some day in the context of dm. :)
> 
> And yes, symlinks *look* nice and simple for the outside, but they are
> not, and have all sorts of problems like non-atomic updates, make it

 Sounds like sysfs implementation problem, right?

 If there is noway to fix sysfs then we can add a generic ioctl or
 /sys/block/<device>/{slave,holder}_list files with list of
 holders/slaves.
 
 But please, don't force userspace to use *claimer-specific*
 methods to answer *generic questions* like slave/holder dependencies
 between devices.
 
> impossible to ever rename a device (as long as they copy the device
> name), and and and .... we should not add more of this.
> 
> >> If btrfs internally creates some virtual _block_ device for its pool, it should
> >> present it here too with slaves/holders. If not, why it should create any links there?
> >
> > Yeah, that's the most bothering part for me.  The biggest customers of
> > bd_claim are filesystems and all these custom symlinkeries don't do
> > nothing for them.  It just doesn't make a whole lot of sense to me.
> 
> Btrfs does not use any blockdev as the master for good reason, and it
> can never map its slaves inside of /sys/block. 

 Yep, expected and correct response :-)

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-13 15:59                             ` [dm-devel] " Karel Zak
  (?)
@ 2011-01-13 16:10                             ` Kay Sievers
  2011-01-14 15:07                                 ` Karel Zak
  -1 siblings, 1 reply; 44+ messages in thread
From: Kay Sievers @ 2011-01-13 16:10 UTC (permalink / raw)
  To: Karel Zak
  Cc: Tejun Heo, Milan Broz, device-mapper development,
	Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid,
	Alexander Viro, linux-fsdevel

On Thu, Jan 13, 2011 at 16:59, Karel Zak <kzak@redhat.com> wrote:
> On Thu, Jan 13, 2011 at 03:43:38PM +0100, Kay Sievers wrote:
>> On Thu, Jan 13, 2011 at 15:30, Tejun Heo <tj@kernel.org> wrote:
>> > On Thu, Jan 13, 2011 at 3:25 PM, Milan Broz <mbroz@redhat.com> wrote:
>> >> Maybe, but this was not invented in DM/MD camp:-)
>> >> Probably Kay or Greg can answer why it was done this way?
>>
>> It's not from Greg or Kay. It just appeared some day in the context of dm. :)
>>
>> And yes, symlinks *look* nice and simple for the outside, but they are
>> not, and have all sorts of problems like non-atomic updates, make it
>
>  Sounds like sysfs implementation problem, right?

It's a normal multi-file problem. It can by-definition not be atomic
without doing really weird locking things.

>  If there is noway to fix sysfs then we can add a generic ioctl or
>  /sys/block/<device>/{slave,holder}_list files with list of
>  holders/slaves.

Yeah, we've been there with the btrfs problem. For btrfs it woud
probably need to be something statfs()-like.

>  But please, don't force userspace to use *claimer-specific*
>  methods to answer *generic questions* like slave/holder dependencies
>  between devices.

The links exist only for dm and md so far, I think. It's the classical
multiple-parents-in-a-tree problem. We have that for bonded network
devices and some IO buses too. There is no nice representation for
these reversed-trees-in-the-tree so far.

Kay

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH] block: restore multiple bd_link_disk_holder() support
  2011-01-13  2:19   ` Jun'ichi Nomura
  2011-01-13 11:06     ` Tejun Heo
@ 2011-01-13 17:21     ` Tejun Heo
  2011-01-13 18:42       ` Milan Broz
  1 sibling, 1 reply; 44+ messages in thread
From: Tejun Heo @ 2011-01-13 17:21 UTC (permalink / raw)
  To: Jens Axboe, Jun'ichi Nomura
  Cc: Milan Broz, Valdis.Kletnieks, Alexander Viro, Neil Brown,
	linux-kernel, linux-fsdevel, linux-raid,
	device-mapper development, Kay Sievers

Commit e09b457b (block: simplify holder symlink handling) incorrectly
assumed that there is only one holder at maximum.  dm may use multiple
holders.  Remove the single holder assumption and automatic removal of
the link.  Let the callers explicitly remove them.  This change makes
it even more alien from the rest of the block layer.

While at it, note that this facility should not be used by anyone else
than the current ones.  Sysfs symlinks shouldn't be abused like this
and the whole thing doesn't belong in the block layer at all.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Milan Broz <mbroz@redhat.com>
Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Neil Brown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Cc: Kay Sievers <kay.sievers@vrfy.org>
---
Milan, Jun, can you guys please verify this works correctly for the
multi holder dm case?  Thank you.

 drivers/md/dm-table.c |    1 +
 drivers/md/md.c       |    1 +
 fs/block_dev.c        |   28 +++++++++++++++-------------
 include/linux/fs.h    |    9 ++++++---
 4 files changed, 23 insertions(+), 16 deletions(-)

Index: work/include/linux/fs.h
===================================================================
--- work.orig/include/linux/fs.h
+++ work/include/linux/fs.h
@@ -663,9 +663,6 @@ struct block_device {
 	void *			bd_holder;
 	int			bd_holders;
 	bool			bd_write_holder;
-#ifdef CONFIG_SYSFS
-	struct gendisk *	bd_holder_disk;	/* for sysfs slave linkng */
-#endif
 	struct block_device *	bd_contains;
 	unsigned		bd_block_size;
 	struct hd_struct *	bd_part;
@@ -2045,12 +2042,18 @@ extern struct block_device *blkdev_get_b
 extern int blkdev_put(struct block_device *bdev, fmode_t mode);
 #ifdef CONFIG_SYSFS
 extern int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk);
+extern void bd_unlink_disk_holder(struct block_device *bdev,
+				  struct gendisk *disk);
 #else
 static inline int bd_link_disk_holder(struct block_device *bdev,
 				      struct gendisk *disk)
 {
 	return 0;
 }
+static inline void bd_unlink_disk_holder(struct block_device *bdev,
+					 struct gendisk *disk)
+{
+}
 #endif
 #endif
 
Index: work/drivers/md/dm-table.c
===================================================================
--- work.orig/drivers/md/dm-table.c
+++ work/drivers/md/dm-table.c
@@ -347,6 +347,7 @@ static void close_dev(struct dm_dev_inte
 	if (!d->dm_dev.bdev)
 		return;
 
+	bd_unlink_disk_holder(d->dm_dev.bdev, dm_disk(md));
 	blkdev_put(d->dm_dev.bdev, d->dm_dev.mode | FMODE_EXCL);
 	d->dm_dev.bdev = NULL;
 }
Index: work/drivers/md/md.c
===================================================================
--- work.orig/drivers/md/md.c
+++ work/drivers/md/md.c
@@ -1907,6 +1907,7 @@ static void unbind_rdev_from_array(mdk_r
 		MD_BUG();
 		return;
 	}
+	bd_unlink_disk_holder(rdev->bdev, rdev->mddev->gendisk);
 	list_del_rcu(&rdev->same_set);
 	printk(KERN_INFO "md: unbind<%s>\n", bdevname(rdev->bdev,b));
 	rdev->mddev = NULL;
Index: work/fs/block_dev.c
===================================================================
--- work.orig/fs/block_dev.c
+++ work/fs/block_dev.c
@@ -788,6 +788,8 @@ static void del_symlink(struct kobject *
  * @bdev: the claimed slave bdev
  * @disk: the holding disk
  *
+ * DON'T USE THIS UNLESS YOU'RE ALREADY USING IT.
+ *
  * This functions creates the following sysfs symlinks.
  *
  * - from "slaves" directory of the holder @disk to the claimed @bdev
@@ -815,7 +817,7 @@ int bd_link_disk_holder(struct block_dev
 
 	mutex_lock(&bdev->bd_mutex);
 
-	WARN_ON_ONCE(!bdev->bd_holder || bdev->bd_holder_disk);
+	WARN_ON_ONCE(!bdev->bd_holder);
 
 	/* FIXME: remove the following once add_disk() handles errors */
 	if (WARN_ON(!disk->slave_dir || !bdev->bd_part->holder_dir))
@@ -831,27 +833,28 @@ int bd_link_disk_holder(struct block_dev
 		goto out_unlock;
 	}
 
-	bdev->bd_holder_disk = disk;
 out_unlock:
 	mutex_unlock(&bdev->bd_mutex);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(bd_link_disk_holder);
 
-static void bd_unlink_disk_holder(struct block_device *bdev)
+/**
+ * bd_unlink_disk_holder - destroy symlinks created by bd_link_disk_holder()
+ * @bdev: the calimed slave bdev
+ * @disk: the holding disk
+ *
+ * DON'T USE THIS UNLESS YOU'RE ALREADY USING IT.
+ *
+ * CONTEXT:
+ * Might sleep.
+ */
+void bd_unlink_disk_holder(struct block_device *bdev, struct gendisk *disk)
 {
-	struct gendisk *disk = bdev->bd_holder_disk;
-
-	bdev->bd_holder_disk = NULL;
-	if (!disk)
-		return;
-
 	del_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj);
 	del_symlink(bdev->bd_part->holder_dir, &disk_to_dev(disk)->kobj);
 }
-#else
-static inline void bd_unlink_disk_holder(struct block_device *bdev)
-{ }
+EXPORT_SYMBOL_GPL(bd_unlink_disk_holder);
 #endif
 
 /**
@@ -1374,7 +1377,6 @@ int blkdev_put(struct block_device *bdev
 		 * unblock evpoll if it was a write holder.
 		 */
 		if (bdev_free) {
-			bd_unlink_disk_holder(bdev);
 			if (bdev->bd_write_holder) {
 				disk_unblock_events(bdev->bd_disk);
 				bdev->bd_write_holder = false;

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] block: restore multiple bd_link_disk_holder() support
  2011-01-13 17:21     ` [PATCH] block: restore multiple bd_link_disk_holder() support Tejun Heo
@ 2011-01-13 18:42       ` Milan Broz
  2011-01-14  7:31         ` Jun'ichi Nomura
  0 siblings, 1 reply; 44+ messages in thread
From: Milan Broz @ 2011-01-13 18:42 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jens Axboe, Jun'ichi Nomura, Valdis.Kletnieks,
	Alexander Viro, Neil Brown, linux-kernel, linux-fsdevel,
	linux-raid, device-mapper development, Kay Sievers,
	Alasdair G Kergon

On 01/13/2011 06:21 PM, Tejun Heo wrote:
> Commit e09b457b (block: simplify holder symlink handling) incorrectly
> assumed that there is only one holder at maximum.  dm may use multiple
> holders.  Remove the single holder assumption and automatic removal of
> the link.  Let the callers explicitly remove them.  This change makes
> it even more alien from the rest of the block layer.
> 
> While at it, note that this facility should not be used by anyone else
> than the current ones.  Sysfs symlinks shouldn't be abused like this
> and the whole thing doesn't belong in the block layer at all.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Milan Broz <mbroz@redhat.com>
> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
> Cc: Neil Brown <neilb@suse.de>
> Cc: linux-raid@vger.kernel.org
> Cc: Kay Sievers <kay.sievers@vrfy.org>
> ---
> Milan, Jun, can you guys please verify this works correctly for the
> multi holder dm case?  Thank you.

Hi,

unfortunately not. And the problem is much worse,
it breaks lvm resize operation completely.

I cherry-picked you patch to my tree, it fails, reverting helps.

Following test is over linux-next with your last patch applied:

Test case (lvresize of volume over 3 disks sdb,c,d):

# pvcreate /dev/sd[bcd]
  Physical volume "/dev/sdb" successfully created
  Physical volume "/dev/sdc" successfully created
  Physical volume "/dev/sdd" successfully created

# vgcreate vg_test /dev/sd[bcd]
  Volume group "vg_test" successfully created

# lvcreate -l100%FREE -n lv vg_test
  Logical volume "lv" created

# dmsetup table
vg_test-lv: 0 409600 linear 8:16 2048
vg_test-lv: 409600 409600 linear 8:32 2048
vg_test-lv: 819200 409600 linear 8:48 2048

so we have active LV mapped to 3 devices now.

Now online resize it. I means that it will create inactive table,
then switch active (so the magic with claiming disks applies):

# lvresize -L-10M vg_test/lv
  Rounding up size to full physical extent 8.00 MiB
  WARNING: Reducing active logical volume to 592.00 MiB
  THIS MAY DESTROY YOUR DATA (filesystem etc.)
Do you really want to reduce lv? [y/n]: y
  Reducing logical volume lv to 592.00 MiB
  device-mapper: reload ioctl failed: Invalid argument
  Failed to suspend lv

and in syslog:
[  291.221081] ------------[ cut here ]------------
[  291.221111] WARNING: at fs/sysfs/dir.c:455 sysfs_add_one+0x6b/0x80()
[  291.221130] Hardware name: VMware Virtual Platform
[  291.221140] sysfs: cannot create duplicate filename '/devices/virtual/block/dm-0/slaves/sdb'
[  291.221162] Modules linked in: usbcore dm_mod
[  291.221265] Pid: 4074, comm: lvresize Tainted: G        W   2.6.37-next-20110113+ #2
[  291.221269] Call Trace:
[  291.221286]  [<c102e58f>] ? warn_slowpath_common+0x65/0x7a
[  291.221290]  [<c10f8911>] ? sysfs_add_one+0x6b/0x80
[  291.221295]  [<c102e608>] ? warn_slowpath_fmt+0x26/0x2a
[  291.221299]  [<c10f8911>] ? sysfs_add_one+0x6b/0x80
[  291.221304]  [<c10f9482>] ? sysfs_do_create_link+0xda/0x164
[  291.221308]  [<c10f9522>] ? sysfs_create_link+0xa/0xc
[  291.221314]  [<c10db85c>] ? bd_link_disk_holder+0x66/0xad
[  291.221484]  [<d08289ea>] ? open_dev+0x47/0x67 [dm_mod]
[  291.221492]  [<d0828aff>] ? dm_get_device+0xf5/0x1d5 [dm_mod]
[  291.221502]  [<c11bf168>] ? vsscanf+0x364/0x3ee
[  291.221510]  [<c10b250a>] ? cache_alloc_debugcheck_after+0xf/0x180
[  291.221523]  [<d0829812>] ? linear_ctr+0x86/0xc0 [dm_mod]
[  291.221531]  [<d0829287>] ? dm_table_add_target+0x153/0x1d3 [dm_mod]
[  291.221538]  [<d082adea>] ? table_load+0x1fd/0x21e [dm_mod]
[  291.221545]  [<d082b9f8>] ? dm_ctl_ioctl+0x188/0x1c8 [dm_mod]
[  291.221551]  [<d082abed>] ? table_load+0x0/0x21e [dm_mod]
[  291.221558]  [<d082b870>] ? dm_ctl_ioctl+0x0/0x1c8 [dm_mod]
[  291.221565]  [<c10c38ab>] ? do_vfs_ioctl+0x493/0x4d8
[  291.221573]  [<c1313511>] ? do_page_fault+0x3ee/0x418
[  291.221578]  [<c10c391e>] ? sys_ioctl+0x2e/0x48
[  291.221583]  [<c1002853>] ? sysenter_do_call+0x12/0x32
[  291.221609] ---[ end trace c21a5bad605a4a87 ]---
[  291.221760] device-mapper: table: 254:0: linear: dm-linear: Device lookup failed
[  291.221782] device-mapper: ioctl: error adding target to table

Milan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-13 14:45                         ` Theodore Tso
@ 2011-01-13 20:18                           ` NeilBrown
  2011-01-13 20:41                             ` Ted Ts'o
  0 siblings, 1 reply; 44+ messages in thread
From: NeilBrown @ 2011-01-13 20:18 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Tejun Heo, Milan Broz, Karel Zak, device-mapper development,
	Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid,
	Alexander Viro, linux-fsdevel, Kay Sievers

On Thu, 13 Jan 2011 09:45:54 -0500 Theodore Tso <tytso@MIT.EDU> wrote:

> 
> On Jan 13, 2011, at 9:30 AM, Tejun Heo wrote:
> >> 
> > 
> > Yeah, that's the most bothering part for me.  The biggest customers of
> > bd_claim are filesystems and all these custom symlinkeries don't do
> > nothing for them.  It just doesn't make a whole lot of sense to me.
> > 
> 
> Well, if there's better way to do things, we can send patches to libblkid to switch away from using sysfs, assuming it's using a new enough kernel.
> 
> The primary problem that we're trying to solve is to know whether a particular device contains a file system that should potentially mounted (or fsck'ed, or used as a external journal device, etc.)
> 
> If the file system is located on a raid 0 device created using devicemapper, the first physical block device could look like a file system.  So what we want is a very easy way of determining, "is this device being used by the device mapper layer"?   If it is, then it's probably not the droids we are looking for.   We'll keep looking at the rest of the block devices in the system, and when we find the dm-concatenated devices, we can properly identify it.
> 
> Can you suggest a better way doing what it is we need to do?

open(O_EXCL) will fail on a block device if it is being used by anything else
- a filesystem or a dm target or an md array or ....

So if the *only* thing you want is "is this currently an active part of
something else", then O_EXCL works since 2.6.0 (I think).

NeilBrown


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-13 20:18                           ` NeilBrown
@ 2011-01-13 20:41                             ` Ted Ts'o
  2011-01-14 16:20                                 ` Tejun Heo
  2011-01-14 16:20                               ` Tejun Heo
  0 siblings, 2 replies; 44+ messages in thread
From: Ted Ts'o @ 2011-01-13 20:41 UTC (permalink / raw)
  To: NeilBrown
  Cc: Tejun Heo, Milan Broz, Karel Zak, device-mapper development,
	Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid,
	Alexander Viro, linux-fsdevel, Kay Sievers

On Fri, Jan 14, 2011 at 07:18:58AM +1100, NeilBrown wrote:
> 
> open(O_EXCL) will fail on a block device if it is being used by anything else
> - a filesystem or a dm target or an md array or ....
> 
> So if the *only* thing you want is "is this currently an active part of
> something else", then O_EXCL works since 2.6.0 (I think).

Unfortunately, that won't distinguish between a currently active file
system, and a device which is being used by a dm target, which is what
we want to do.

						- Ted

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] block: restore multiple bd_link_disk_holder() support
  2011-01-13 18:42       ` Milan Broz
@ 2011-01-14  7:31         ` Jun'ichi Nomura
  2011-01-14 16:10             ` Tejun Heo
  0 siblings, 1 reply; 44+ messages in thread
From: Jun'ichi Nomura @ 2011-01-14  7:31 UTC (permalink / raw)
  To: Milan Broz, Tejun Heo
  Cc: Jens Axboe, Valdis.Kletnieks, Alexander Viro, Neil Brown,
	linux-kernel, linux-fsdevel, linux-raid,
	device-mapper development, Kay Sievers, Alasdair G Kergon

Hi,

On 01/14/11 03:42, Milan Broz wrote:
>> Milan, Jun, can you guys please verify this works correctly for the
>> multi holder dm case?  Thank you.
> 
> Hi,
> 
> unfortunately not. And the problem is much worse,
> it breaks lvm resize operation completely.

For quick testing, try this:
("/dev/sdf" can be other block device.
 "testA" and "testB" can be other dm names which aren't used in
 your machine.)

# echo "0 10 linear /dev/sdf 0" | dmsetup create testA
# echo "0 10 linear /dev/sdf 10" | dmsetup create testB
(at this point, /dev/sdf has multiple holders)
# echo "0 20 linear /dev/sdf 10" | dmsetup load testB
(at this point, /dev/mapper/testB has 2 claims on /dev/sdf)
# dmsetup resume testB
(/dev/mapper/testB has 1 claim on /dev/sdf)
# dmsetup remove testA
# dmsetup remove testB


Thanks,
-- 
Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-13 15:03                           ` Milan Broz
@ 2011-01-14  7:38                             ` Jun'ichi Nomura
  0 siblings, 0 replies; 44+ messages in thread
From: Jun'ichi Nomura @ 2011-01-14  7:38 UTC (permalink / raw)
  To: Milan Broz, Kay Sievers, Tejun Heo
  Cc: Karel Zak, device-mapper development, Valdis.Kletnieks,
	linux-kernel, linux-raid, Alexander Viro, linux-fsdevel

On 01/14/11 00:03, Milan Broz wrote:
> On 01/13/2011 03:43 PM, Kay Sievers wrote:
> 
>>> On Thu, Jan 13, 2011 at 3:25 PM, Milan Broz <mbroz@redhat.com> wrote:
>>>> Maybe, but this was not invented in DM/MD camp:-)
>>>> Probably Kay or Greg can answer why it was done this way?
>>
>> It's not from Greg or Kay. It just appeared some day in the context of dm. :)
> 
> ah, then sorry, I am just confused:-)
> But DM does not need it for operation at all so it must had some other reason.
> (We have dmsetup ls --tree using dm-ioctls for years.)

Sorry. It's from me 5 years ago. :)

See this for backgrounds:
  [PATCH 0/3] sysfs representation of stacked devices (dm/md)
  http://lwn.net/Articles/172689/

And I don't adhere to my implementation if there's better one.

Thanks,
-- 
Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-13 16:10                             ` Kay Sievers
@ 2011-01-14 15:07                                 ` Karel Zak
  0 siblings, 0 replies; 44+ messages in thread
From: Karel Zak @ 2011-01-14 15:07 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Tejun Heo, Milan Broz, device-mapper development,
	Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid,
	Alexander Viro, linux-fsdevel

On Thu, Jan 13, 2011 at 05:10:02PM +0100, Kay Sievers wrote:
> On Thu, Jan 13, 2011 at 16:59, Karel Zak <kzak@redhat.com> wrote:
> > On Thu, Jan 13, 2011 at 03:43:38PM +0100, Kay Sievers wrote:
> >> On Thu, Jan 13, 2011 at 15:30, Tejun Heo <tj@kernel.org> wrote:
> >> > On Thu, Jan 13, 2011 at 3:25 PM, Milan Broz <mbroz@redhat.com> wrote:
> >> >> Maybe, but this was not invented in DM/MD camp:-)
> >> >> Probably Kay or Greg can answer why it was done this way?
> >>
> >> It's not from Greg or Kay. It just appeared some day in the context of dm. :)
> >>
> >> And yes, symlinks *look* nice and simple for the outside, but they are
> >> not, and have all sorts of problems like non-atomic updates, make it
> >
> >  Sounds like sysfs implementation problem, right?
> 
> It's a normal multi-file problem. It can by-definition not be atomic
> without doing really weird locking things.

 BTW, lsblk(8) and libblkid don't depend on the fact that slaves/holders 
 files are symlinks.
 
 The important thing is the filename (/sys/block/.../slaves/<name>) 
 only. We don't follow the symlinks and we don't use readlink() there.
 
 It means that you can replace the symlinks with regular files where
 in the file contents is for example maj:min, etc.

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
@ 2011-01-14 15:07                                 ` Karel Zak
  0 siblings, 0 replies; 44+ messages in thread
From: Karel Zak @ 2011-01-14 15:07 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Tejun Heo, Milan Broz, device-mapper development,
	Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid,
	Alexander Viro, linux-fsdevel

On Thu, Jan 13, 2011 at 05:10:02PM +0100, Kay Sievers wrote:
> On Thu, Jan 13, 2011 at 16:59, Karel Zak <kzak@redhat.com> wrote:
> > On Thu, Jan 13, 2011 at 03:43:38PM +0100, Kay Sievers wrote:
> >> On Thu, Jan 13, 2011 at 15:30, Tejun Heo <tj@kernel.org> wrote:
> >> > On Thu, Jan 13, 2011 at 3:25 PM, Milan Broz <mbroz@redhat.com> wrote:
> >> >> Maybe, but this was not invented in DM/MD camp:-)
> >> >> Probably Kay or Greg can answer why it was done this way?
> >>
> >> It's not from Greg or Kay. It just appeared some day in the context of dm. :)
> >>
> >> And yes, symlinks *look* nice and simple for the outside, but they are
> >> not, and have all sorts of problems like non-atomic updates, make it
> >
> >  Sounds like sysfs implementation problem, right?
> 
> It's a normal multi-file problem. It can by-definition not be atomic
> without doing really weird locking things.

 BTW, lsblk(8) and libblkid don't depend on the fact that slaves/holders 
 files are symlinks.
 
 The important thing is the filename (/sys/block/.../slaves/<name>) 
 only. We don't follow the symlinks and we don't use readlink() there.
 
 It means that you can replace the symlinks with regular files where
 in the file contents is for example maj:min, etc.

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-14 15:07                                 ` Karel Zak
@ 2011-01-14 15:23                                   ` Kay Sievers
  -1 siblings, 0 replies; 44+ messages in thread
From: Kay Sievers @ 2011-01-14 15:23 UTC (permalink / raw)
  To: Karel Zak
  Cc: Tejun Heo, Milan Broz, device-mapper development,
	Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid,
	Alexander Viro, linux-fsdevel

On Fri, Jan 14, 2011 at 16:07, Karel Zak <kzak@redhat.com> wrote:
> On Thu, Jan 13, 2011 at 05:10:02PM +0100, Kay Sievers wrote:
>> On Thu, Jan 13, 2011 at 16:59, Karel Zak <kzak@redhat.com> wrote:
>> > On Thu, Jan 13, 2011 at 03:43:38PM +0100, Kay Sievers wrote:
>> >> On Thu, Jan 13, 2011 at 15:30, Tejun Heo <tj@kernel.org> wrote:
>> >> > On Thu, Jan 13, 2011 at 3:25 PM, Milan Broz <mbroz@redhat.com> wrote:
>> >> >> Maybe, but this was not invented in DM/MD camp:-)
>> >> >> Probably Kay or Greg can answer why it was done this way?
>> >>
>> >> It's not from Greg or Kay. It just appeared some day in the context of dm. :)
>> >>
>> >> And yes, symlinks *look* nice and simple for the outside, but they are
>> >> not, and have all sorts of problems like non-atomic updates, make it
>> >
>> >  Sounds like sysfs implementation problem, right?
>>
>> It's a normal multi-file problem. It can by-definition not be atomic
>> without doing really weird locking things.
>
>  BTW, lsblk(8) and libblkid don't depend on the fact that slaves/holders
>  files are symlinks.
>
>  The important thing is the filename (/sys/block/.../slaves/<name>)
>  only. We don't follow the symlinks and we don't use readlink() there.
>
>  It means that you can replace the symlinks with regular files where
>  in the file contents is for example maj:min, etc.

I don't think we really can change anything here, there are more users of it.

If we would go changing things here, the file names should never be
the device name but the format "b8:32" for blockdevs, "n22" for the
network ifindex, ... which is the unchangeable unique kernel id -- not
depending on any possible device rename, or fast destroy/re-create or
anything like that.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
@ 2011-01-14 15:23                                   ` Kay Sievers
  0 siblings, 0 replies; 44+ messages in thread
From: Kay Sievers @ 2011-01-14 15:23 UTC (permalink / raw)
  To: Karel Zak
  Cc: Tejun Heo, Milan Broz, device-mapper development,
	Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid,
	Alexander Viro, linux-fsdevel

On Fri, Jan 14, 2011 at 16:07, Karel Zak <kzak@redhat.com> wrote:
> On Thu, Jan 13, 2011 at 05:10:02PM +0100, Kay Sievers wrote:
>> On Thu, Jan 13, 2011 at 16:59, Karel Zak <kzak@redhat.com> wrote:
>> > On Thu, Jan 13, 2011 at 03:43:38PM +0100, Kay Sievers wrote:
>> >> On Thu, Jan 13, 2011 at 15:30, Tejun Heo <tj@kernel.org> wrote:
>> >> > On Thu, Jan 13, 2011 at 3:25 PM, Milan Broz <mbroz@redhat.com> wrote:
>> >> >> Maybe, but this was not invented in DM/MD camp:-)
>> >> >> Probably Kay or Greg can answer why it was done this way?
>> >>
>> >> It's not from Greg or Kay. It just appeared some day in the context of dm. :)
>> >>
>> >> And yes, symlinks *look* nice and simple for the outside, but they are
>> >> not, and have all sorts of problems like non-atomic updates, make it
>> >
>> >  Sounds like sysfs implementation problem, right?
>>
>> It's a normal multi-file problem. It can by-definition not be atomic
>> without doing really weird locking things.
>
>  BTW, lsblk(8) and libblkid don't depend on the fact that slaves/holders
>  files are symlinks.
>
>  The important thing is the filename (/sys/block/.../slaves/<name>)
>  only. We don't follow the symlinks and we don't use readlink() there.
>
>  It means that you can replace the symlinks with regular files where
>  in the file contents is for example maj:min, etc.

I don't think we really can change anything here, there are more users of it.

If we would go changing things here, the file names should never be
the device name but the format "b8:32" for blockdevs, "n22" for the
network ifindex, ... which is the unchangeable unique kernel id -- not
depending on any possible device rename, or fast destroy/re-create or
anything like that.

Kay

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH UPDATED] block: restore multiple bd_link_disk_holder() support
  2011-01-14  7:31         ` Jun'ichi Nomura
@ 2011-01-14 16:10             ` Tejun Heo
  0 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2011-01-14 16:10 UTC (permalink / raw)
  To: Jun'ichi Nomura
  Cc: Jens Axboe, Valdis.Kletnieks, linux-kernel, linux-raid,
	device-mapper development, Alexander Viro, linux-fsdevel,
	Alasdair G Kergon, Milan Broz

Commit e09b457b (block: simplify holder symlink handling) incorrectly
assumed that there is only one link at maximum.  dm may use multiple
links and expects block layer to track reference count for each link,
which is different from and unrelated to the exclusive device holder
identified by @holder when the device is opened.

Remove the single holder assumption and automatic removal of the link
and revive the per-link reference count tracking.  The code
essentially behaves the same as before commit e09b457b sans the
unnecessary kobject reference count dancing.

While at it, note that this facility should not be used by anyone else
than the current ones.  Sysfs symlinks shouldn't be abused like this
and the whole thing doesn't belong in the block layer at all.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Milan Broz <mbroz@redhat.com>
Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Neil Brown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Cc: Kay Sievers <kay.sievers@vrfy.org>
---
Thanks for the test commands.  They were very helpful.  Can you please
test this one?

Thanks.

 drivers/md/dm-table.c |    1 
 drivers/md/md.c       |    1 
 fs/block_dev.c        |   93 ++++++++++++++++++++++++++++++++++++++++----------
 include/linux/fs.h    |    8 +++-
 4 files changed, 84 insertions(+), 19 deletions(-)

Index: work/include/linux/fs.h
===================================================================
--- work.orig/include/linux/fs.h
+++ work/include/linux/fs.h
@@ -664,7 +664,7 @@ struct block_device {
 	int			bd_holders;
 	bool			bd_write_holder;
 #ifdef CONFIG_SYSFS
-	struct gendisk *	bd_holder_disk;	/* for sysfs slave linkng */
+	struct list_head	bd_holder_disks;
 #endif
 	struct block_device *	bd_contains;
 	unsigned		bd_block_size;
@@ -2045,12 +2045,18 @@ extern struct block_device *blkdev_get_b
 extern int blkdev_put(struct block_device *bdev, fmode_t mode);
 #ifdef CONFIG_SYSFS
 extern int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk);
+extern void bd_unlink_disk_holder(struct block_device *bdev,
+				  struct gendisk *disk);
 #else
 static inline int bd_link_disk_holder(struct block_device *bdev,
 				      struct gendisk *disk)
 {
 	return 0;
 }
+static inline void bd_unlink_disk_holder(struct block_device *bdev,
+					 struct gendisk *disk)
+{
+}
 #endif
 #endif
 
Index: work/drivers/md/dm-table.c
===================================================================
--- work.orig/drivers/md/dm-table.c
+++ work/drivers/md/dm-table.c
@@ -347,6 +347,7 @@ static void close_dev(struct dm_dev_inte
 	if (!d->dm_dev.bdev)
 		return;
 
+	bd_unlink_disk_holder(d->dm_dev.bdev, dm_disk(md));
 	blkdev_put(d->dm_dev.bdev, d->dm_dev.mode | FMODE_EXCL);
 	d->dm_dev.bdev = NULL;
 }
Index: work/drivers/md/md.c
===================================================================
--- work.orig/drivers/md/md.c
+++ work/drivers/md/md.c
@@ -1907,6 +1907,7 @@ static void unbind_rdev_from_array(mdk_r
 		MD_BUG();
 		return;
 	}
+	bd_unlink_disk_holder(rdev->bdev, rdev->mddev->gendisk);
 	list_del_rcu(&rdev->same_set);
 	printk(KERN_INFO "md: unbind<%s>\n", bdevname(rdev->bdev,b));
 	rdev->mddev = NULL;
Index: work/fs/block_dev.c
===================================================================
--- work.orig/fs/block_dev.c
+++ work/fs/block_dev.c
@@ -426,6 +426,9 @@ static void init_once(void *foo)
 	mutex_init(&bdev->bd_mutex);
 	INIT_LIST_HEAD(&bdev->bd_inodes);
 	INIT_LIST_HEAD(&bdev->bd_list);
+#ifdef CONFIG_SYSFS
+	INIT_LIST_HEAD(&bdev->bd_holder_disks);
+#endif
 	inode_init_once(&ei->vfs_inode);
 	/* Initialize mutex for freeze. */
 	mutex_init(&bdev->bd_fsfreeze_mutex);
@@ -773,6 +776,23 @@ static struct block_device *bd_start_cla
 }
 
 #ifdef CONFIG_SYSFS
+struct bd_holder_disk {
+	struct list_head	list;
+	struct gendisk		*disk;
+	int			refcnt;
+};
+
+static struct bd_holder_disk *bd_find_holder_disk(struct block_device *bdev,
+						  struct gendisk *disk)
+{
+	struct bd_holder_disk *holder;
+
+	list_for_each_entry(holder, &bdev->bd_holder_disks, list)
+		if (holder->disk == disk)
+			return holder;
+	return NULL;
+}
+
 static int add_symlink(struct kobject *from, struct kobject *to)
 {
 	return sysfs_create_link(from, to, kobject_name(to));
@@ -788,6 +808,8 @@ static void del_symlink(struct kobject *
  * @bdev: the claimed slave bdev
  * @disk: the holding disk
  *
+ * DON'T USE THIS UNLESS YOU'RE ALREADY USING IT.
+ *
  * This functions creates the following sysfs symlinks.
  *
  * - from "slaves" directory of the holder @disk to the claimed @bdev
@@ -811,47 +833,83 @@ static void del_symlink(struct kobject *
  */
 int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk)
 {
+	struct bd_holder_disk *holder;
 	int ret = 0;
 
 	mutex_lock(&bdev->bd_mutex);
 
-	WARN_ON_ONCE(!bdev->bd_holder || bdev->bd_holder_disk);
+	WARN_ON_ONCE(!bdev->bd_holder);
 
 	/* FIXME: remove the following once add_disk() handles errors */
 	if (WARN_ON(!disk->slave_dir || !bdev->bd_part->holder_dir))
 		goto out_unlock;
 
-	ret = add_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj);
-	if (ret)
+	holder = bd_find_holder_disk(bdev, disk);
+	if (holder) {
+		holder->refcnt++;
 		goto out_unlock;
+	}
 
-	ret = add_symlink(bdev->bd_part->holder_dir, &disk_to_dev(disk)->kobj);
-	if (ret) {
-		del_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj);
+	holder = kzalloc(sizeof(*holder), GFP_KERNEL);
+	if (!holder) {
+		ret = -ENOMEM;
 		goto out_unlock;
 	}
 
-	bdev->bd_holder_disk = disk;
+	INIT_LIST_HEAD(&holder->list);
+	holder->disk = disk;
+	holder->refcnt = 1;
+
+	ret = add_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj);
+	if (ret)
+		goto out_free;
+
+	ret = add_symlink(bdev->bd_part->holder_dir, &disk_to_dev(disk)->kobj);
+	if (ret)
+		goto out_del;
+
+	list_add(&holder->list, &bdev->bd_holder_disks);
+	goto out_unlock;
+
+out_del:
+	del_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj);
+out_free:
+	kfree(holder);
 out_unlock:
 	mutex_unlock(&bdev->bd_mutex);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(bd_link_disk_holder);
 
-static void bd_unlink_disk_holder(struct block_device *bdev)
+/**
+ * bd_unlink_disk_holder - destroy symlinks created by bd_link_disk_holder()
+ * @bdev: the calimed slave bdev
+ * @disk: the holding disk
+ *
+ * DON'T USE THIS UNLESS YOU'RE ALREADY USING IT.
+ *
+ * CONTEXT:
+ * Might sleep.
+ */
+void bd_unlink_disk_holder(struct block_device *bdev, struct gendisk *disk)
 {
-	struct gendisk *disk = bdev->bd_holder_disk;
+	struct bd_holder_disk *holder;
 
-	bdev->bd_holder_disk = NULL;
-	if (!disk)
-		return;
+	mutex_lock(&bdev->bd_mutex);
 
-	del_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj);
-	del_symlink(bdev->bd_part->holder_dir, &disk_to_dev(disk)->kobj);
+	holder = bd_find_holder_disk(bdev, disk);
+
+	if (!WARN_ON_ONCE(holder == NULL) && !--holder->refcnt) {
+		del_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj);
+		del_symlink(bdev->bd_part->holder_dir,
+			    &disk_to_dev(disk)->kobj);
+		list_del_init(&holder->list);
+		kfree(holder);
+	}
+
+	mutex_unlock(&bdev->bd_mutex);
 }
-#else
-static inline void bd_unlink_disk_holder(struct block_device *bdev)
-{ }
+EXPORT_SYMBOL_GPL(bd_unlink_disk_holder);
 #endif
 
 /**
@@ -1374,7 +1432,6 @@ int blkdev_put(struct block_device *bdev
 		 * unblock evpoll if it was a write holder.
 		 */
 		if (bdev_free) {
-			bd_unlink_disk_holder(bdev);
 			if (bdev->bd_write_holder) {
 				disk_unblock_events(bdev->bd_disk);
 				bdev->bd_write_holder = false;

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH UPDATED] block: restore multiple bd_link_disk_holder() support
@ 2011-01-14 16:10             ` Tejun Heo
  0 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2011-01-14 16:10 UTC (permalink / raw)
  To: Jun'ichi Nomura
  Cc: Milan Broz, Jens Axboe, Valdis.Kletnieks, Alexander Viro,
	Neil Brown, linux-kernel, linux-fsdevel, linux-raid,
	device-mapper development, Kay Sievers, Alasdair G Kergon

Commit e09b457b (block: simplify holder symlink handling) incorrectly
assumed that there is only one link at maximum.  dm may use multiple
links and expects block layer to track reference count for each link,
which is different from and unrelated to the exclusive device holder
identified by @holder when the device is opened.

Remove the single holder assumption and automatic removal of the link
and revive the per-link reference count tracking.  The code
essentially behaves the same as before commit e09b457b sans the
unnecessary kobject reference count dancing.

While at it, note that this facility should not be used by anyone else
than the current ones.  Sysfs symlinks shouldn't be abused like this
and the whole thing doesn't belong in the block layer at all.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Milan Broz <mbroz@redhat.com>
Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Neil Brown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Cc: Kay Sievers <kay.sievers@vrfy.org>
---
Thanks for the test commands.  They were very helpful.  Can you please
test this one?

Thanks.

 drivers/md/dm-table.c |    1 
 drivers/md/md.c       |    1 
 fs/block_dev.c        |   93 ++++++++++++++++++++++++++++++++++++++++----------
 include/linux/fs.h    |    8 +++-
 4 files changed, 84 insertions(+), 19 deletions(-)

Index: work/include/linux/fs.h
===================================================================
--- work.orig/include/linux/fs.h
+++ work/include/linux/fs.h
@@ -664,7 +664,7 @@ struct block_device {
 	int			bd_holders;
 	bool			bd_write_holder;
 #ifdef CONFIG_SYSFS
-	struct gendisk *	bd_holder_disk;	/* for sysfs slave linkng */
+	struct list_head	bd_holder_disks;
 #endif
 	struct block_device *	bd_contains;
 	unsigned		bd_block_size;
@@ -2045,12 +2045,18 @@ extern struct block_device *blkdev_get_b
 extern int blkdev_put(struct block_device *bdev, fmode_t mode);
 #ifdef CONFIG_SYSFS
 extern int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk);
+extern void bd_unlink_disk_holder(struct block_device *bdev,
+				  struct gendisk *disk);
 #else
 static inline int bd_link_disk_holder(struct block_device *bdev,
 				      struct gendisk *disk)
 {
 	return 0;
 }
+static inline void bd_unlink_disk_holder(struct block_device *bdev,
+					 struct gendisk *disk)
+{
+}
 #endif
 #endif
 
Index: work/drivers/md/dm-table.c
===================================================================
--- work.orig/drivers/md/dm-table.c
+++ work/drivers/md/dm-table.c
@@ -347,6 +347,7 @@ static void close_dev(struct dm_dev_inte
 	if (!d->dm_dev.bdev)
 		return;
 
+	bd_unlink_disk_holder(d->dm_dev.bdev, dm_disk(md));
 	blkdev_put(d->dm_dev.bdev, d->dm_dev.mode | FMODE_EXCL);
 	d->dm_dev.bdev = NULL;
 }
Index: work/drivers/md/md.c
===================================================================
--- work.orig/drivers/md/md.c
+++ work/drivers/md/md.c
@@ -1907,6 +1907,7 @@ static void unbind_rdev_from_array(mdk_r
 		MD_BUG();
 		return;
 	}
+	bd_unlink_disk_holder(rdev->bdev, rdev->mddev->gendisk);
 	list_del_rcu(&rdev->same_set);
 	printk(KERN_INFO "md: unbind<%s>\n", bdevname(rdev->bdev,b));
 	rdev->mddev = NULL;
Index: work/fs/block_dev.c
===================================================================
--- work.orig/fs/block_dev.c
+++ work/fs/block_dev.c
@@ -426,6 +426,9 @@ static void init_once(void *foo)
 	mutex_init(&bdev->bd_mutex);
 	INIT_LIST_HEAD(&bdev->bd_inodes);
 	INIT_LIST_HEAD(&bdev->bd_list);
+#ifdef CONFIG_SYSFS
+	INIT_LIST_HEAD(&bdev->bd_holder_disks);
+#endif
 	inode_init_once(&ei->vfs_inode);
 	/* Initialize mutex for freeze. */
 	mutex_init(&bdev->bd_fsfreeze_mutex);
@@ -773,6 +776,23 @@ static struct block_device *bd_start_cla
 }
 
 #ifdef CONFIG_SYSFS
+struct bd_holder_disk {
+	struct list_head	list;
+	struct gendisk		*disk;
+	int			refcnt;
+};
+
+static struct bd_holder_disk *bd_find_holder_disk(struct block_device *bdev,
+						  struct gendisk *disk)
+{
+	struct bd_holder_disk *holder;
+
+	list_for_each_entry(holder, &bdev->bd_holder_disks, list)
+		if (holder->disk == disk)
+			return holder;
+	return NULL;
+}
+
 static int add_symlink(struct kobject *from, struct kobject *to)
 {
 	return sysfs_create_link(from, to, kobject_name(to));
@@ -788,6 +808,8 @@ static void del_symlink(struct kobject *
  * @bdev: the claimed slave bdev
  * @disk: the holding disk
  *
+ * DON'T USE THIS UNLESS YOU'RE ALREADY USING IT.
+ *
  * This functions creates the following sysfs symlinks.
  *
  * - from "slaves" directory of the holder @disk to the claimed @bdev
@@ -811,47 +833,83 @@ static void del_symlink(struct kobject *
  */
 int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk)
 {
+	struct bd_holder_disk *holder;
 	int ret = 0;
 
 	mutex_lock(&bdev->bd_mutex);
 
-	WARN_ON_ONCE(!bdev->bd_holder || bdev->bd_holder_disk);
+	WARN_ON_ONCE(!bdev->bd_holder);
 
 	/* FIXME: remove the following once add_disk() handles errors */
 	if (WARN_ON(!disk->slave_dir || !bdev->bd_part->holder_dir))
 		goto out_unlock;
 
-	ret = add_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj);
-	if (ret)
+	holder = bd_find_holder_disk(bdev, disk);
+	if (holder) {
+		holder->refcnt++;
 		goto out_unlock;
+	}
 
-	ret = add_symlink(bdev->bd_part->holder_dir, &disk_to_dev(disk)->kobj);
-	if (ret) {
-		del_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj);
+	holder = kzalloc(sizeof(*holder), GFP_KERNEL);
+	if (!holder) {
+		ret = -ENOMEM;
 		goto out_unlock;
 	}
 
-	bdev->bd_holder_disk = disk;
+	INIT_LIST_HEAD(&holder->list);
+	holder->disk = disk;
+	holder->refcnt = 1;
+
+	ret = add_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj);
+	if (ret)
+		goto out_free;
+
+	ret = add_symlink(bdev->bd_part->holder_dir, &disk_to_dev(disk)->kobj);
+	if (ret)
+		goto out_del;
+
+	list_add(&holder->list, &bdev->bd_holder_disks);
+	goto out_unlock;
+
+out_del:
+	del_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj);
+out_free:
+	kfree(holder);
 out_unlock:
 	mutex_unlock(&bdev->bd_mutex);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(bd_link_disk_holder);
 
-static void bd_unlink_disk_holder(struct block_device *bdev)
+/**
+ * bd_unlink_disk_holder - destroy symlinks created by bd_link_disk_holder()
+ * @bdev: the calimed slave bdev
+ * @disk: the holding disk
+ *
+ * DON'T USE THIS UNLESS YOU'RE ALREADY USING IT.
+ *
+ * CONTEXT:
+ * Might sleep.
+ */
+void bd_unlink_disk_holder(struct block_device *bdev, struct gendisk *disk)
 {
-	struct gendisk *disk = bdev->bd_holder_disk;
+	struct bd_holder_disk *holder;
 
-	bdev->bd_holder_disk = NULL;
-	if (!disk)
-		return;
+	mutex_lock(&bdev->bd_mutex);
 
-	del_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj);
-	del_symlink(bdev->bd_part->holder_dir, &disk_to_dev(disk)->kobj);
+	holder = bd_find_holder_disk(bdev, disk);
+
+	if (!WARN_ON_ONCE(holder == NULL) && !--holder->refcnt) {
+		del_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj);
+		del_symlink(bdev->bd_part->holder_dir,
+			    &disk_to_dev(disk)->kobj);
+		list_del_init(&holder->list);
+		kfree(holder);
+	}
+
+	mutex_unlock(&bdev->bd_mutex);
 }
-#else
-static inline void bd_unlink_disk_holder(struct block_device *bdev)
-{ }
+EXPORT_SYMBOL_GPL(bd_unlink_disk_holder);
 #endif
 
 /**
@@ -1374,7 +1432,6 @@ int blkdev_put(struct block_device *bdev
 		 * unblock evpoll if it was a write holder.
 		 */
 		if (bdev_free) {
-			bd_unlink_disk_holder(bdev);
 			if (bdev->bd_write_holder) {
 				disk_unblock_events(bdev->bd_disk);
 				bdev->bd_write_holder = false;

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-13 20:41                             ` Ted Ts'o
@ 2011-01-14 16:20                                 ` Tejun Heo
  2011-01-14 16:20                               ` Tejun Heo
  1 sibling, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2011-01-14 16:20 UTC (permalink / raw)
  To: Ted Ts'o, NeilBrown, Milan Broz, Karel Zak,
	device-mapper development, Jun'ichi Nomura

Hello, Ted, Neil.

On Thu, Jan 13, 2011 at 03:41:45PM -0500, Ted Ts'o wrote:
> On Fri, Jan 14, 2011 at 07:18:58AM +1100, NeilBrown wrote:
> > 
> > open(O_EXCL) will fail on a block device if it is being used by anything else
> > - a filesystem or a dm target or an md array or ....
> > 
> > So if the *only* thing you want is "is this currently an active part of
> > something else", then O_EXCL works since 2.6.0 (I think).
> 
> Unfortunately, that won't distinguish between a currently active file
> system, and a device which is being used by a dm target, which is what
> we want to do.

Hmmm... that's already possible with the existing holders symlinks,
right?  As Kay said in another message, I don't think we can do
anything about the symlinks at this point.  It already has userland
users, so we'll have to maintain them and there's no reason to create
something else for the same functionality.

It's silly that there's no way to tell whether the device is mounted
from a given block device but then again we've been working around it
by reverse mapping it till now so unless there's a new requirement
maybe it's okay as it is now.

Ideally, it would have been nice and more fitting with the whole bd
claim API if we just exported single attribute which identifies the
current holder supplied at the time of claiming (just the kernel
identifier in string), so that we can forward-map the exclusive opener
in general instead of having to reverse-map it for everything other
than md/dm.

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
@ 2011-01-14 16:20                                 ` Tejun Heo
  0 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2011-01-14 16:20 UTC (permalink / raw)
  To: Ted Ts'o, NeilBrown, Milan Broz, Karel Zak,
	device-mapper development, Jun'ichi Nomura, Valdis.Kletnieks,
	linux-kernel, linux-raid, Alexander Viro, linux-fsdevel,
	Kay Sievers

Hello, Ted, Neil.

On Thu, Jan 13, 2011 at 03:41:45PM -0500, Ted Ts'o wrote:
> On Fri, Jan 14, 2011 at 07:18:58AM +1100, NeilBrown wrote:
> > 
> > open(O_EXCL) will fail on a block device if it is being used by anything else
> > - a filesystem or a dm target or an md array or ....
> > 
> > So if the *only* thing you want is "is this currently an active part of
> > something else", then O_EXCL works since 2.6.0 (I think).
> 
> Unfortunately, that won't distinguish between a currently active file
> system, and a device which is being used by a dm target, which is what
> we want to do.

Hmmm... that's already possible with the existing holders symlinks,
right?  As Kay said in another message, I don't think we can do
anything about the symlinks at this point.  It already has userland
users, so we'll have to maintain them and there's no reason to create
something else for the same functionality.

It's silly that there's no way to tell whether the device is mounted
from a given block device but then again we've been working around it
by reverse mapping it till now so unless there's a new requirement
maybe it's okay as it is now.

Ideally, it would have been nice and more fitting with the whole bd
claim API if we just exported single attribute which identifies the
current holder supplied at the time of claiming (just the kernel
identifier in string), so that we can forward-map the exclusive opener
in general instead of having to reverse-map it for everything other
than md/dm.

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-13 20:41                             ` Ted Ts'o
  2011-01-14 16:20                                 ` Tejun Heo
@ 2011-01-14 16:20                               ` Tejun Heo
  1 sibling, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2011-01-14 16:20 UTC (permalink / raw)
  To: Ted Ts'o, NeilBrown, Milan Broz, Karel Zak,
	device-mapper development, Jun'ichi Nomura

Hello, Ted, Neil.

On Thu, Jan 13, 2011 at 03:41:45PM -0500, Ted Ts'o wrote:
> On Fri, Jan 14, 2011 at 07:18:58AM +1100, NeilBrown wrote:
> > 
> > open(O_EXCL) will fail on a block device if it is being used by anything else
> > - a filesystem or a dm target or an md array or ....
> > 
> > So if the *only* thing you want is "is this currently an active part of
> > something else", then O_EXCL works since 2.6.0 (I think).
> 
> Unfortunately, that won't distinguish between a currently active file
> system, and a device which is being used by a dm target, which is what
> we want to do.

Hmmm... that's already possible with the existing holders symlinks,
right?  As Kay said in another message, I don't think we can do
anything about the symlinks at this point.  It already has userland
users, so we'll have to maintain them and there's no reason to create
something else for the same functionality.

It's silly that there's no way to tell whether the device is mounted
from a given block device but then again we've been working around it
by reverse mapping it till now so unless there's a new requirement
maybe it's okay as it is now.

Ideally, it would have been nice and more fitting with the whole bd
claim API if we just exported single attribute which identifies the
current holder supplied at the time of claiming (just the kernel
identifier in string), so that we can forward-map the exclusive opener
in general instead of having to reverse-map it for everything other
than md/dm.

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-13 14:49                         ` Milan Broz
@ 2011-01-14 16:35                           ` Tejun Heo
  0 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2011-01-14 16:35 UTC (permalink / raw)
  To: Milan Broz
  Cc: Karel Zak, device-mapper development, Jun'ichi Nomura,
	Valdis.Kletnieks, linux-kernel, linux-raid, Alexander Viro,
	linux-fsdevel, Kay Sievers

Hello,

On Thu, Jan 13, 2011 at 03:49:50PM +0100, Milan Broz wrote:
> So listing block device tree using ONE interface (sysfs) to check
> and display device tree for ALL subsystems seems to be good idea for
> me.  In this exactly defined case.
> 
> I am not saying that symlinks are perfect, just that generic
> interface here is not superfluous.

I'm not saying finding out the current owner is superflous.  That
actually is useful but what's implemented now doesn't satisfy that in
any generic manner.  It only works for md/dm and is designed in such
way that it can't be used out of that scope.

So, it's only useful for md/dm and for those two cases it tries to
show information which doesn't really belong there using an overly
complex mechanism.

Block device open allows _nesting_ by the claiming device.  It is
never meant to track different nested claimers and it shows in the
implementation and interface.  The holder tracking doesn't fit
anywhere.  It's a completely separate piece of logic which doesn't mix
with anything else in the block layer.

If you consider its only user - dm-table, the whole picture is
somewhat uglier.  dm-table tracks how it uses block devices but only
within a single target, so there are two separate layers of reference
counting going on.  This again is caused by the above stated design
inconsistency.

As block device only considers single claimer which doesn't have to be
another device, the owner naturally should have been something which
only points to single in-kernel entity in a way which doesn't require
link to another sysfs entry.

Karel pointed out that having different methods between md and dm
would be silly; however, that is a problem which should be solved
between dm and md and actually is a quite artificial problem (or
solution).  It only exists because dm and md do very similar things in
many cases.  A block device can be used by many more things which are
way out of the scope which can be represented in the sysfs hierarchy.
It simply is impossible to represent that relationship with symlinks
under block device sysfs hierarchy.

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-14 16:20                                 ` Tejun Heo
  (?)
@ 2011-01-14 17:59                                 ` Ted Ts'o
  2011-01-14 18:23                                   ` Tejun Heo
                                                     ` (3 more replies)
  -1 siblings, 4 replies; 44+ messages in thread
From: Ted Ts'o @ 2011-01-14 17:59 UTC (permalink / raw)
  To: Tejun Heo
  Cc: NeilBrown, Milan Broz, Karel Zak, device-mapper development,
	Jun'ichi Nomura, Valdis.Kletnieks, linux-kernel, linux-raid,
	Alexander Viro, linux-fsdevel, Kay Sievers

On Fri, Jan 14, 2011 at 05:20:22PM +0100, Tejun Heo wrote:
> 
> Hmmm... that's already possible with the existing holders symlinks,
> right?  As Kay said in another message, I don't think we can do
> anything about the symlinks at this point.  It already has userland
> users, so we'll have to maintain them and there's no reason to create
> something else for the same functionality.

Yes, agreed.  My point was that if we have a better system, we could
migrate blkid and other users to new interface, and maybe in a few
years, we can deprecate that interface.  So we do have to maintain
them for at least the medium term, but that shouldn't stop us from
divising a better interface, and then gradually migrating systems to
use that newer and better interface.

						- Ted

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-14 17:59                                 ` Ted Ts'o
  2011-01-14 18:23                                   ` Tejun Heo
@ 2011-01-14 18:23                                   ` Tejun Heo
  2011-01-14 18:23                                   ` Tejun Heo
  2011-01-14 18:23                                   ` [dm-devel] " Tejun Heo
  3 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2011-01-14 18:23 UTC (permalink / raw)
  To: Ted Ts'o, Tejun Heo, NeilBrown, Milan Broz, Karel Zak,
	device-mapper development

Hello, Ted.

On Fri, Jan 14, 2011 at 6:59 PM, Ted Ts'o <tytso@mit.edu> wrote:
> Yes, agreed.  My point was that if we have a better system, we could
> migrate blkid and other users to new interface, and maybe in a few
> years, we can deprecate that interface.  So we do have to maintain
> them for at least the medium term, but that shouldn't stop us from
> divising a better interface, and then gradually migrating systems to
> use that newer and better interface.

Oh yeah, fully agreed there and the implementation at the block layer
isn't even gonna be difficult.  Just a single attribute which contains
a string which can be specified at the time of claiming.  We probably
can introduce a new interface where the @holder is always a string the
content of which should uniquely identify the holder in human/machine
readable form and add a few helpers to format those strings so that
the same type of usages use consistently formatted strings.  For
nested devices, kdev should do.  For filesystems, the mount point
maybe (it should be something which identifies it absolutely)?  For
specific programs (cd burner, whatnot), pid and so on.

The only question is, would that be actually necessary?  By now, we
already have mostly working reverse mapping for most things which
matter through blkid, fs information, fd listing under proc and so on.
 They sure are ugly and probably unreliable at times but is it
pressing enough to introduce new interface and migrate things over or
would we be just creating churns?  I don't really know how these
things look from userland so am genuinely unsure what the answer is.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-14 17:59                                 ` Ted Ts'o
@ 2011-01-14 18:23                                   ` Tejun Heo
  2011-01-14 18:23                                   ` Tejun Heo
                                                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2011-01-14 18:23 UTC (permalink / raw)
  To: Ted Ts'o, Tejun Heo, NeilBrown, Milan Broz, Karel Zak,
	device-mapper development, Jun'ichi Nomura, Valdis.Kletnieks,
	linux-kernel, linux-raid, Alexander Viro, linux-fsdevel,
	Kay Sievers

Hello, Ted.

On Fri, Jan 14, 2011 at 6:59 PM, Ted Ts'o <tytso@mit.edu> wrote:
> Yes, agreed.  My point was that if we have a better system, we could
> migrate blkid and other users to new interface, and maybe in a few
> years, we can deprecate that interface.  So we do have to maintain
> them for at least the medium term, but that shouldn't stop us from
> divising a better interface, and then gradually migrating systems to
> use that newer and better interface.

Oh yeah, fully agreed there and the implementation at the block layer
isn't even gonna be difficult.  Just a single attribute which contains
a string which can be specified at the time of claiming.  We probably
can introduce a new interface where the @holder is always a string the
content of which should uniquely identify the holder in human/machine
readable form and add a few helpers to format those strings so that
the same type of usages use consistently formatted strings.  For
nested devices, kdev should do.  For filesystems, the mount point
maybe (it should be something which identifies it absolutely)?  For
specific programs (cd burner, whatnot), pid and so on.

The only question is, would that be actually necessary?  By now, we
already have mostly working reverse mapping for most things which
matter through blkid, fs information, fd listing under proc and so on.
 They sure are ugly and probably unreliable at times but is it
pressing enough to introduce new interface and migrate things over or
would we be just creating churns?  I don't really know how these
things look from userland so am genuinely unsure what the answer is.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-14 17:59                                 ` Ted Ts'o
                                                     ` (2 preceding siblings ...)
  2011-01-14 18:23                                   ` Tejun Heo
@ 2011-01-14 18:23                                   ` Tejun Heo
  3 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2011-01-14 18:23 UTC (permalink / raw)
  To: Ted Ts'o, Tejun Heo, NeilBrown, Milan Broz, Karel Zak,
	device-mapper development

Hello, Ted.

On Fri, Jan 14, 2011 at 6:59 PM, Ted Ts'o <tytso@mit.edu> wrote:
> Yes, agreed.  My point was that if we have a better system, we could
> migrate blkid and other users to new interface, and maybe in a few
> years, we can deprecate that interface.  So we do have to maintain
> them for at least the medium term, but that shouldn't stop us from
> divising a better interface, and then gradually migrating systems to
> use that newer and better interface.

Oh yeah, fully agreed there and the implementation at the block layer
isn't even gonna be difficult.  Just a single attribute which contains
a string which can be specified at the time of claiming.  We probably
can introduce a new interface where the @holder is always a string the
content of which should uniquely identify the holder in human/machine
readable form and add a few helpers to format those strings so that
the same type of usages use consistently formatted strings.  For
nested devices, kdev should do.  For filesystems, the mount point
maybe (it should be something which identifies it absolutely)?  For
specific programs (cd burner, whatnot), pid and so on.

The only question is, would that be actually necessary?  By now, we
already have mostly working reverse mapping for most things which
matter through blkid, fs information, fd listing under proc and so on.
 They sure are ugly and probably unreliable at times but is it
pressing enough to introduce new interface and migrate things over or
would we be just creating churns?  I don't really know how these
things look from userland so am genuinely unsure what the answer is.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
  2011-01-14 17:59                                 ` Ted Ts'o
  2011-01-14 18:23                                   ` Tejun Heo
  2011-01-14 18:23                                   ` Tejun Heo
@ 2011-01-14 18:23                                   ` Tejun Heo
  2011-01-14 18:23                                   ` [dm-devel] " Tejun Heo
  3 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2011-01-14 18:23 UTC (permalink / raw)
  To: Ted Ts'o, Tejun Heo, NeilBrown, Milan Broz, Karel Zak, device-mapper

Hello, Ted.

On Fri, Jan 14, 2011 at 6:59 PM, Ted Ts'o <tytso@mit.edu> wrote:
> Yes, agreed.  My point was that if we have a better system, we could
> migrate blkid and other users to new interface, and maybe in a few
> years, we can deprecate that interface.  So we do have to maintain
> them for at least the medium term, but that shouldn't stop us from
> divising a better interface, and then gradually migrating systems to
> use that newer and better interface.

Oh yeah, fully agreed there and the implementation at the block layer
isn't even gonna be difficult.  Just a single attribute which contains
a string which can be specified at the time of claiming.  We probably
can introduce a new interface where the @holder is always a string the
content of which should uniquely identify the holder in human/machine
readable form and add a few helpers to format those strings so that
the same type of usages use consistently formatted strings.  For
nested devices, kdev should do.  For filesystems, the mount point
maybe (it should be something which identifies it absolutely)?  For
specific programs (cd burner, whatnot), pid and so on.

The only question is, would that be actually necessary?  By now, we
already have mostly working reverse mapping for most things which
matter through blkid, fs information, fd listing under proc and so on.
 They sure are ugly and probably unreliable at times but is it
pressing enough to introduce new interface and migrate things over or
would we be just creating churns?  I don't really know how these
things look from userland so am genuinely unsure what the answer is.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH UPDATED] block: restore multiple bd_link_disk_holder() support
  2011-01-14 16:10             ` Tejun Heo
  (?)
@ 2011-01-14 21:09             ` Milan Broz
  2011-01-17  0:18               ` Jun'ichi Nomura
  -1 siblings, 1 reply; 44+ messages in thread
From: Milan Broz @ 2011-01-14 21:09 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jun'ichi Nomura, Jens Axboe, Valdis.Kletnieks,
	Alexander Viro, Neil Brown, linux-kernel, linux-fsdevel,
	linux-raid, device-mapper development, Kay Sievers,
	Alasdair G Kergon

On 01/14/2011 05:10 PM, Tejun Heo wrote:
> Commit e09b457b (block: simplify holder symlink handling) incorrectly
> assumed that there is only one link at maximum.  dm may use multiple
> links and expects block layer to track reference count for each link,
> which is different from and unrelated to the exclusive device holder
> identified by @holder when the device is opened.
> 
> Remove the single holder assumption and automatic removal of the link
> and revive the per-link reference count tracking.  The code
> essentially behaves the same as before commit e09b457b sans the
> unnecessary kobject reference count dancing.
> 
> While at it, note that this facility should not be used by anyone else
> than the current ones.  Sysfs symlinks shouldn't be abused like this
> and the whole thing doesn't belong in the block layer at all.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Milan Broz <mbroz@redhat.com>
> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
> Cc: Neil Brown <neilb@suse.de>
> Cc: linux-raid@vger.kernel.org
> Cc: Kay Sievers <kay.sievers@vrfy.org>
> ---
> Thanks for the test commands.  They were very helpful.  Can you please
> test this one?

Hi,

yes, this one works for me. I run full lvm2 testsuite and no warnings.
Thanks!

Tested-by: Milan Broz <mbroz@redhat.com>

Milan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH UPDATED] block: restore multiple bd_link_disk_holder() support
  2011-01-14 21:09             ` Milan Broz
@ 2011-01-17  0:18               ` Jun'ichi Nomura
  0 siblings, 0 replies; 44+ messages in thread
From: Jun'ichi Nomura @ 2011-01-17  0:18 UTC (permalink / raw)
  To: Milan Broz, Tejun Heo
  Cc: Jens Axboe, Valdis.Kletnieks, Alexander Viro, Neil Brown,
	linux-kernel, linux-fsdevel, linux-raid,
	device-mapper development, Kay Sievers, Alasdair G Kergon

On 01/15/11 06:09, Milan Broz wrote:
> On 01/14/2011 05:10 PM, Tejun Heo wrote:
>> Commit e09b457b (block: simplify holder symlink handling) incorrectly
>> assumed that there is only one link at maximum.  dm may use multiple
>> links and expects block layer to track reference count for each link,
>> which is different from and unrelated to the exclusive device holder
>> identified by @holder when the device is opened.
>>
>> Remove the single holder assumption and automatic removal of the link
>> and revive the per-link reference count tracking.  The code
>> essentially behaves the same as before commit e09b457b sans the
>> unnecessary kobject reference count dancing.
>>
>> While at it, note that this facility should not be used by anyone else
>> than the current ones.  Sysfs symlinks shouldn't be abused like this
>> and the whole thing doesn't belong in the block layer at all.
>>
>> Signed-off-by: Tejun Heo <tj@kernel.org>
>> Reported-by: Milan Broz <mbroz@redhat.com>
>> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
>> Cc: Neil Brown <neilb@suse.de>
>> Cc: linux-raid@vger.kernel.org
>> Cc: Kay Sievers <kay.sievers@vrfy.org>
>> ---
>> Thanks for the test commands.  They were very helpful.  Can you please
>> test this one?
> 
> Hi,
> 
> yes, this one works for me. I run full lvm2 testsuite and no warnings.
> Thanks!
> 
> Tested-by: Milan Broz <mbroz@redhat.com>

Thanks Tejun, Milan!
If it passed my quick test and the lvm2 testsuite,
I have nothing to add here.
And the code looks ok, too.

-- 
Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2011-01-17  0:18 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-12 17:34 linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac() Valdis.Kletnieks
2011-01-13  0:23 ` Milan Broz
2011-01-13  2:19   ` Jun'ichi Nomura
2011-01-13 11:06     ` Tejun Heo
2011-01-13 11:26       ` [dm-devel] " Milan Broz
2011-01-13 12:27         ` Karel Zak
2011-01-13 13:12           ` Tejun Heo
2011-01-13 13:26             ` Karel Zak
2011-01-13 13:37               ` Tejun Heo
2011-01-13 13:52                 ` Tejun Heo
2011-01-13 13:58                 ` Milan Broz
2011-01-13 14:11                   ` Tejun Heo
2011-01-13 14:25                     ` Milan Broz
2011-01-13 14:30                       ` Tejun Heo
2011-01-13 14:43                         ` Kay Sievers
2011-01-13 15:03                           ` Milan Broz
2011-01-14  7:38                             ` Jun'ichi Nomura
2011-01-13 15:59                           ` Karel Zak
2011-01-13 15:59                             ` [dm-devel] " Karel Zak
2011-01-13 16:10                             ` Kay Sievers
2011-01-14 15:07                               ` Karel Zak
2011-01-14 15:07                                 ` Karel Zak
2011-01-14 15:23                                 ` Kay Sievers
2011-01-14 15:23                                   ` Kay Sievers
2011-01-13 14:45                         ` Theodore Tso
2011-01-13 20:18                           ` NeilBrown
2011-01-13 20:41                             ` Ted Ts'o
2011-01-14 16:20                               ` Tejun Heo
2011-01-14 16:20                                 ` Tejun Heo
2011-01-14 17:59                                 ` Ted Ts'o
2011-01-14 18:23                                   ` Tejun Heo
2011-01-14 18:23                                   ` Tejun Heo
2011-01-14 18:23                                   ` Tejun Heo
2011-01-14 18:23                                   ` [dm-devel] " Tejun Heo
2011-01-14 16:20                               ` Tejun Heo
2011-01-13 14:49                         ` Milan Broz
2011-01-14 16:35                           ` Tejun Heo
2011-01-13 17:21     ` [PATCH] block: restore multiple bd_link_disk_holder() support Tejun Heo
2011-01-13 18:42       ` Milan Broz
2011-01-14  7:31         ` Jun'ichi Nomura
2011-01-14 16:10           ` [PATCH UPDATED] " Tejun Heo
2011-01-14 16:10             ` Tejun Heo
2011-01-14 21:09             ` Milan Broz
2011-01-17  0:18               ` Jun'ichi Nomura

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.