linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] sysfs representation of stacked devices (dm/md) (rev.2)
@ 2006-02-22 16:06 Jun'ichi Nomura
  2006-02-22 16:13 ` [PATCH 1/3] sysfs representation of stacked devices (common) (rev.2) Jun'ichi Nomura
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Jun'ichi Nomura @ 2006-02-22 16:06 UTC (permalink / raw)
  To: Neil Brown, Alasdair Kergon, Lars Marowsky-Bree, Greg KH
  Cc: linux-kernel, device-mapper development

Hello,

This is a revised set of pathces which provides common
representation of dependencies between stacked devices (dm and md)
in sysfs.

Variants of bd_claim/bd_release are added to accept a kobject
and create symlinks between the claimed bdev and the holder.

dm/md will give a child of its gendisk kobject to bd_claim.
For example, if dm-0 maps to sda, we have the following symlinks;
   /sys/block/dm-0/slaves/sda --> /sys/block/sda
   /sys/block/sda/holders/dm-0 --> /sys/block/dm-0

Comments are welcome.

A few points I would appreciate comments/reviews from maintainers:
  About sysfs
    - I confirmed sysfs_remove_symlink() and kobject_del() don't
      allocate memory in 2.6.15 and it seems true on the git head.
      I would like to make sure it's true in future versions of kernel
      because they are called during device-mapper's table swapping
      where I/O to free memory could deadlock on the dm device.
      What is the recommended way to do that?
      Or can I just expect these functions will not allocate memory
      in future versions of kernel?
  About dm
    - To get a reference to mapped_device, table_load() do
      dm_get() before populating table. It will dm_put() when
      the table is being discarded or the table is being activated.
  About md
    - Rather than carrying mddev pointer around, bd_claim is now
      made twice. First is not changed at lock_rdev().
      The second is at bind_rdev_to_array() where kobject is passed
      and symlinks are created.

-- 
Jun'ichi Nomura, NEC Solutions (America), Inc.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/3] sysfs representation of stacked devices (common) (rev.2)
  2006-02-22 16:06 [PATCH 0/3] sysfs representation of stacked devices (dm/md) (rev.2) Jun'ichi Nomura
@ 2006-02-22 16:13 ` Jun'ichi Nomura
  2006-02-22 18:48   ` Greg KH
  2006-02-22 16:13 ` [PATCH 2/3] sysfs representation of stacked devices (dm) (rev.2) Jun'ichi Nomura
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Jun'ichi Nomura @ 2006-02-22 16:13 UTC (permalink / raw)
  To: Neil Brown, Alasdair Kergon, Lars Marowsky-Bree, Greg KH
  Cc: linux-kernel, device-mapper development

[-- Attachment #1: Type: text/plain, Size: 186 bytes --]

This patch adds bd_claim_by_kobject and bd_release_from_kobject
which create/remove symlinks between the claimed bdev and
the holder.

-- 
Jun'ichi Nomura, NEC Solutions (America), Inc.

[-- Attachment #2: common.patch --]
[-- Type: text/x-patch, Size: 8189 bytes --]

Adding bd_claim_by_kobject/bd_release_from_kobject for use by
stacked device drivers (dm and md)

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>

--- linux-2.6.15/include/linux/fs.h	2006-01-02 22:21:10.000000000 -0500
+++ linux-2.6.15/include/linux/fs.h	2006-02-21 19:04:46.000000000 -0500
@@ -373,6 +373,8 @@ struct block_device {
 	struct list_head	bd_inodes;
 	void *			bd_holder;
 	int			bd_holders;
+	struct kobject		bd_holder_dir;
+	struct list_head	bd_holder_list;
 	struct block_device *	bd_contains;
 	unsigned		bd_block_size;
 	struct hd_struct *	bd_part;
@@ -1351,6 +1353,8 @@ extern int blkdev_get(struct block_devic
 extern int blkdev_put(struct block_device *);
 extern int bd_claim(struct block_device *, void *);
 extern void bd_release(struct block_device *);
+extern int bd_claim_by_kobject(struct block_device *, void *, struct kobject *);
+extern void bd_release_from_kobject(struct block_device *, struct kobject *);
 
 /* fs/char_dev.c */
 extern int alloc_chrdev_region(dev_t *, unsigned, unsigned, const char *);
--- linux-2.6.15/include/linux/genhd.h	2006-01-02 22:21:10.000000000 -0500
+++ linux-2.6.15/include/linux/genhd.h	2006-02-21 11:38:01.000000000 -0500
@@ -114,6 +114,7 @@ struct gendisk {
 	int number;			/* more of the same */
 	struct device *driverfs_dev;
 	struct kobject kobj;
+	struct kobject slave_dir;
 
 	struct timer_rand_state *random;
 	int policy;
--- linux-2.6.15/fs/block_dev.c	2006-01-02 22:21:10.000000000 -0500
+++ linux-2.6.15/fs/block_dev.c	2006-02-22 09:48:37.000000000 -0500
@@ -443,7 +443,159 @@ void bd_forget(struct inode *inode)
 	spin_unlock(&bdev_lock);
 }
 
-int bd_claim(struct block_device *bdev, void *holder)
+/*
+ * Functions for bd_claim_by_kobject / bd_release_from_kobject
+ *
+ *     If a kobject is passed to bd_claim_by_kobject() 
+ *     and the kobject has a parent directory,
+ *     following symlinks are created:
+ *        o from the kobject to the claimed bdev
+ *        o from "holders" directory of the bdev to the parent of the kobject
+ *     bd_release_from_kobject() removes these symlinks.
+ *
+ *     Example:
+ *        If /dev/dm-0 maps to /dev/sda, kobject corresponding to
+ *        /sys/block/dm-0/slaves is passed to bd_claim_by_kobject(), then:
+ *           /sys/block/dm-0/slaves/sda --> /sys/block/sda
+ *           /sys/block/sda/holders/dm-0 --> /sys/block/dm-0
+ */
+
+static inline struct kobject * bdev_get_kobj(struct block_device *bdev)
+{
+	if (!bdev)
+		return NULL;
+	else if (bdev->bd_contains != bdev)
+		return kobject_get(&bdev->bd_part->kobj);
+	else
+		return kobject_get(&bdev->bd_disk->kobj);
+}
+
+static inline void add_symlink(struct kobject *from, struct kobject *to)
+{
+	if (!from || !to)
+		return;
+	sysfs_create_link(from, to, kobject_name(to));
+}
+
+static inline void del_symlink(struct kobject *from, struct kobject *to)
+{
+	if (!from || !to)
+		return;
+	sysfs_remove_link(from, kobject_name(to));
+}
+
+static void link_bd_holder(struct block_device *bdev, struct kobject *holder)
+{
+	struct kobject *kobj;
+
+	if (!holder->parent)
+		return;
+
+	kobj = bdev_get_kobj(bdev);
+	add_symlink(holder, kobj);
+	add_symlink(&bdev->bd_holder_dir, holder->parent);
+	kobject_put(kobj);
+}
+
+static void unlink_bd_holder(struct block_device *bdev, struct kobject *holder)
+{
+	struct kobject *kobj;
+
+	if (!holder->parent)
+		return;
+
+	kobj = bdev_get_kobj(bdev);
+	del_symlink(holder, kobj);
+	del_symlink(&bdev->bd_holder_dir, holder->parent);
+	kobject_put(kobj);
+}
+
+/* This is a mere directory in sysfs. No methods are needed. */
+static struct kobj_type bd_holder_ktype = {
+	.release	= NULL,
+	.sysfs_ops	= NULL,
+	.default_attrs	= NULL,
+};
+
+static inline void add_holder_dir(struct block_device *bdev)
+{
+	struct kobject *kobj = &bdev->bd_holder_dir;
+
+	kobj->ktype = &bd_holder_ktype;
+	kobject_set_name(kobj, "holders");
+	kobj->parent = bdev_get_kobj(bdev);
+	kobject_init(kobj);
+	kobject_add(kobj);
+	kobject_put(kobj->parent);
+}
+
+static inline void del_holder_dir(struct block_device *bdev)
+{
+	/*
+	 * Don't kobject_unregister to avoid memory allocation
+	 * in kobject_hotplug.
+	 */
+	kobject_del(&bdev->bd_holder_dir);
+	kobject_put(&bdev->bd_holder_dir);
+}
+
+/* bd_holder_list is protected by bdev_lock */
+struct bd_holder {
+	struct list_head list;	/* chain of holders of the bdev */
+	int count;		/* references from the holder */
+	struct kobject *kobj;	/* holder kobject */
+};
+
+static int add_bd_holder(struct block_device *bdev, struct kobject *kobj)
+{
+        struct bd_holder *bo;
+
+	list_for_each_entry(bo, &bdev->bd_holder_list, list) {
+		if (bo->kobj == kobj) {
+			bo->count++;
+			return 0;
+		}
+	}
+
+	if (list_empty(&bdev->bd_holder_list))
+		add_holder_dir(bdev);
+
+	bo = kmalloc(sizeof(*bo), GFP_KERNEL);
+	if (!bo)
+		return -ENOMEM;
+
+	bo->count = 1;
+	bo->kobj = kobj;
+	list_add_tail(&bo->list, &bdev->bd_holder_list);
+	link_bd_holder(bdev, kobj);
+
+	return 0;
+}
+
+static int del_bd_holder(struct block_device *bdev, struct kobject *kobj)
+{
+	struct bd_holder *bo;
+
+	list_for_each_entry(bo, &bdev->bd_holder_list, list) {
+		if (bo->kobj == kobj) {
+			bo->count--;
+			BUG_ON(bo->count < 0);
+			if (!bo->count) {
+				unlink_bd_holder(bdev, kobj);
+				list_del(&bo->list);
+				kfree(bo);
+			}
+			break;
+		}
+	}
+
+	if (list_empty(&bdev->bd_holder_list))
+		del_holder_dir(bdev);
+
+	return 0;
+}
+
+int bd_claim_by_kobject(struct block_device *bdev, void *holder, struct kobject *kobj)
 {
 	int res;
 	spin_lock(&bdev_lock);
@@ -464,6 +616,9 @@ int bd_claim(struct block_device *bdev, 
 		res = 0;	 /* is a partition of an un-held device */
 
 	/* now impose change */
+	if (res == 0 && kobj)
+		res = add_bd_holder(bdev, kobj);
+
 	if (res==0) {
 		/* note that for a whole device bd_holders
 		 * will be incremented twice, and bd_holder will
@@ -478,11 +633,20 @@ int bd_claim(struct block_device *bdev, 
 	return res;
 }
 
+EXPORT_SYMBOL(bd_claim_by_kobject);
+
+int bd_claim(struct block_device *bdev, void *holder)
+{
+	return bd_claim_by_kobject(bdev, holder, NULL);
+}
+
 EXPORT_SYMBOL(bd_claim);
 
-void bd_release(struct block_device *bdev)
+void bd_release_from_kobject(struct block_device *bdev, struct kobject *kobj)
 {
 	spin_lock(&bdev_lock);
+	if (kobj)
+		del_bd_holder(bdev, kobj);
 	if (!--bdev->bd_contains->bd_holders)
 		bdev->bd_contains->bd_holder = NULL;
 	if (!--bdev->bd_holders)
@@ -490,6 +654,13 @@ void bd_release(struct block_device *bde
 	spin_unlock(&bdev_lock);
 }
 
+EXPORT_SYMBOL(bd_release_from_kobject);
+
+void bd_release(struct block_device *bdev)
+{
+	bd_release_from_kobject(bdev, NULL);
+}
+
 EXPORT_SYMBOL(bd_release);
 
 /*
@@ -578,6 +749,7 @@ static int do_open(struct block_device *
 	if (!bdev->bd_openers) {
 		bdev->bd_disk = disk;
 		bdev->bd_contains = bdev;
+		INIT_LIST_HEAD(&bdev->bd_holder_list);
 		if (!part) {
 			struct backing_dev_info *bdi;
 			if (disk->fops->open) {
--- linux-2.6.15/fs/partitions/check.c	2006-01-02 22:21:10.000000000 -0500
+++ linux-2.6.15/fs/partitions/check.c	2006-02-21 14:30:23.000000000 -0500
@@ -293,6 +293,13 @@ struct kobj_type ktype_part = {
 	.sysfs_ops	= &part_sysfs_ops,
 };
 
+/* This is a mere directory in sysfs. No methods are needed. */
+static struct kobj_type slave_ktype = {
+	.release	= NULL,
+	.sysfs_ops	= NULL,
+	.default_attrs	= NULL,
+};
+
 void delete_partition(struct gendisk *disk, int part)
 {
 	struct hd_struct *p = disk->part[part-1];
@@ -343,6 +350,12 @@ static void disk_sysfs_symlinks(struct g
 		sysfs_create_link(&disk->kobj,&target->kobj,"device");
 		sysfs_create_link(&target->kobj,&disk->kobj,"block");
 	}
+
+	/* create subdirectory for symlinks to underlying device */
+	disk->slave_dir.ktype = &slave_ktype;
+	kobject_set_name(&disk->slave_dir, "slaves");
+	disk->slave_dir.parent = &disk->kobj;
+	kobject_register(&disk->slave_dir);
 }
 
 /* Not exported, helper to add_disk(). */
@@ -460,6 +473,7 @@ void del_gendisk(struct gendisk *disk)
 
 	devfs_remove_disk(disk);
 
+	kobject_unregister(&disk->slave_dir);
 	if (disk->driverfs_dev) {
 		sysfs_remove_link(&disk->kobj, "device");
 		sysfs_remove_link(&disk->driverfs_dev->kobj, "block");

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/3] sysfs representation of stacked devices (dm) (rev.2)
  2006-02-22 16:06 [PATCH 0/3] sysfs representation of stacked devices (dm/md) (rev.2) Jun'ichi Nomura
  2006-02-22 16:13 ` [PATCH 1/3] sysfs representation of stacked devices (common) (rev.2) Jun'ichi Nomura
@ 2006-02-22 16:13 ` Jun'ichi Nomura
  2006-02-22 16:34   ` Alasdair G Kergon
  2006-02-22 16:13 ` [PATCH 3/3] sysfs representation of stacked devices (md) (rev.2) Jun'ichi Nomura
  2006-02-22 18:47 ` [PATCH 0/3] sysfs representation of stacked devices (dm/md) (rev.2) Greg KH
  3 siblings, 1 reply; 15+ messages in thread
From: Jun'ichi Nomura @ 2006-02-22 16:13 UTC (permalink / raw)
  To: Neil Brown, Alasdair Kergon, Lars Marowsky-Bree, Greg KH
  Cc: linux-kernel, device-mapper development

[-- Attachment #1: Type: text/plain, Size: 204 bytes --]

This patch modifies dm driver to call bd_claim_by_kobject
and bd_release_from_kobject.
To do that, reference to the mapped_device is added in
dm_table.

-- 
Jun'ichi Nomura, NEC Solutions (America), Inc.

[-- Attachment #2: dm.patch --]
[-- Type: text/x-patch, Size: 5475 bytes --]

Exporting stacked device relationship to sysfs (dm)

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>

--- linux-2.6.15/drivers/md/dm.h	2006-01-02 22:21:10.000000000 -0500
+++ linux-2.6.15/drivers/md/dm.h	2006-02-21 16:49:22.000000000 -0500
@@ -102,6 +102,8 @@ int dm_table_create(struct dm_table **re
 
 void dm_table_get(struct dm_table *t);
 void dm_table_put(struct dm_table *t);
+void dm_table_set_md(struct dm_table *t, struct mapped_device *md);
+struct mapped_device *dm_table_device(struct dm_table *t);
 
 int dm_table_add_target(struct dm_table *t, const char *type,
 			sector_t start,	sector_t len, char *params);
--- linux-2.6.15/drivers/md/dm-ioctl.c	2006-01-02 22:21:10.000000000 -0500
+++ linux-2.6.15/drivers/md/dm-ioctl.c	2006-02-21 16:51:30.000000000 -0500
@@ -228,6 +228,16 @@ static int dm_hash_insert(const char *na
 	return -EBUSY;
 }
 
+/* called when the populated table is no longer needed */
+static void __release_new_map(struct dm_table *t)
+{
+	struct mapped_device *md = dm_table_device(t);
+
+	dm_table_put(t);
+	if (md)
+		dm_put(md);
+}
+
 static void __hash_remove(struct hash_cell *hc)
 {
 	struct dm_table *table;
@@ -246,7 +256,7 @@ static void __hash_remove(struct hash_ce
 
 	dm_put(hc->md);
 	if (hc->new_map)
-		dm_table_put(hc->new_map);
+		__release_new_map(hc->new_map);
 	free_cell(hc);
 }
 
@@ -719,6 +729,7 @@ static int do_resume(struct dm_ioctl *pa
 	dm_get(md);
 
 	new_map = hc->new_map;
+	dm_put(md); /* drop reference by dm_table_set_md() */
 	hc->new_map = NULL;
 	param->flags &= ~DM_INACTIVE_PRESENT_FLAG;
 
@@ -963,9 +974,19 @@ static int table_load(struct dm_ioctl *p
 	if (r)
 		return r;
 
+	down_read(&_hash_lock);
+	hc = __find_device_hash_cell(param);
+	if (!hc) {
+		DMWARN("device doesn't appear to be in the dev hash table.");
+		up_read(&_hash_lock);
+		return -ENXIO;
+	}
+	dm_table_set_md(t, hc->md);
+	up_read(&_hash_lock);
+
 	r = populate_table(t, param, param_size);
 	if (r) {
-		dm_table_put(t);
+		__release_new_map(t);
 		return r;
 	}
 
@@ -979,7 +1000,7 @@ static int table_load(struct dm_ioctl *p
 	}
 
 	if (hc->new_map)
-		dm_table_put(hc->new_map);
+		__release_new_map(hc->new_map);
 	hc->new_map = t;
 	param->flags |= DM_INACTIVE_PRESENT_FLAG;
 
@@ -1003,7 +1024,7 @@ static int table_clear(struct dm_ioctl *
 	}
 
 	if (hc->new_map) {
-		dm_table_put(hc->new_map);
+		__release_new_map(hc->new_map);
 		hc->new_map = NULL;
 	}
 
--- linux-2.6.15/drivers/md/dm-table.c	2006-01-02 22:21:10.000000000 -0500
+++ linux-2.6.15/drivers/md/dm-table.c	2006-02-21 19:11:03.000000000 -0500
@@ -53,6 +53,8 @@ struct dm_table {
 	/* events get handed up using this callback */
 	void (*event_fn)(void *);
 	void *event_context;
+
+	struct mapped_device *md;
 };
 
 /*
@@ -345,7 +347,7 @@ static struct dm_dev *find_device(struct
 /*
  * Open a device so we can use it as a map destination.
  */
-static int open_dev(struct dm_dev *d, dev_t dev)
+static int open_dev(struct dm_dev *d, dev_t dev, struct mapped_device *md)
 {
 	static char *_claim_ptr = "I belong to device-mapper";
 	struct block_device *bdev;
@@ -358,7 +360,7 @@ static int open_dev(struct dm_dev *d, de
 	bdev = open_by_devnum(dev, d->mode);
 	if (IS_ERR(bdev))
 		return PTR_ERR(bdev);
-	r = bd_claim(bdev, _claim_ptr);
+	r = bd_claim_by_kobject(bdev, _claim_ptr, &dm_disk(md)->slave_dir);
 	if (r)
 		blkdev_put(bdev);
 	else
@@ -369,12 +371,12 @@ static int open_dev(struct dm_dev *d, de
 /*
  * Close a device that we've been using.
  */
-static void close_dev(struct dm_dev *d)
+static void close_dev(struct dm_dev *d, struct mapped_device *md)
 {
 	if (!d->bdev)
 		return;
 
-	bd_release(d->bdev);
+	bd_release_from_kobject(d->bdev, &dm_disk(md)->slave_dir);
 	blkdev_put(d->bdev);
 	d->bdev = NULL;
 }
@@ -395,7 +397,7 @@ static int check_device_area(struct dm_d
  * careful to leave things as they were if we fail to reopen the
  * device.
  */
-static int upgrade_mode(struct dm_dev *dd, int new_mode)
+static int upgrade_mode(struct dm_dev *dd, int new_mode, struct mapped_device *md)
 {
 	int r;
 	struct dm_dev dd_copy;
@@ -405,9 +407,9 @@ static int upgrade_mode(struct dm_dev *d
 
 	dd->mode |= new_mode;
 	dd->bdev = NULL;
-	r = open_dev(dd, dev);
+	r = open_dev(dd, dev, md);
 	if (!r)
-		close_dev(&dd_copy);
+		close_dev(&dd_copy, md);
 	else
 		*dd = dd_copy;
 
@@ -450,7 +452,7 @@ static int __table_get_device(struct dm_
 		dd->mode = mode;
 		dd->bdev = NULL;
 
-		if ((r = open_dev(dd, dev))) {
+		if ((r = open_dev(dd, dev, t->md))) {
 			kfree(dd);
 			return r;
 		}
@@ -461,7 +463,7 @@ static int __table_get_device(struct dm_
 		list_add(&dd->list, &t->devices);
 
 	} else if (dd->mode != (mode | dd->mode)) {
-		r = upgrade_mode(dd, mode);
+		r = upgrade_mode(dd, mode, t->md);
 		if (r)
 			return r;
 	}
@@ -536,7 +538,7 @@ int dm_get_device(struct dm_target *ti, 
 void dm_put_device(struct dm_target *ti, struct dm_dev *dd)
 {
 	if (atomic_dec_and_test(&dd->count)) {
-		close_dev(dd);
+		close_dev(dd, ti->table->md);
 		list_del(&dd->list);
 		kfree(dd);
 	}
@@ -945,6 +947,22 @@ int dm_table_flush_all(struct dm_table *
 	return ret;
 }
 
+void dm_table_set_md(struct dm_table *t, struct mapped_device *md)
+{
+	if (t->md) {
+		dm_put(t->md);
+		t->md = NULL;
+	}
+	if (md) {
+		dm_get(md);
+		t->md = md;
+	}
+}
+struct mapped_device *dm_table_device(struct dm_table *t)
+{
+	return t->md;
+}
+
 EXPORT_SYMBOL(dm_vcalloc);
 EXPORT_SYMBOL(dm_get_device);
 EXPORT_SYMBOL(dm_put_device);

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 3/3] sysfs representation of stacked devices (md) (rev.2)
  2006-02-22 16:06 [PATCH 0/3] sysfs representation of stacked devices (dm/md) (rev.2) Jun'ichi Nomura
  2006-02-22 16:13 ` [PATCH 1/3] sysfs representation of stacked devices (common) (rev.2) Jun'ichi Nomura
  2006-02-22 16:13 ` [PATCH 2/3] sysfs representation of stacked devices (dm) (rev.2) Jun'ichi Nomura
@ 2006-02-22 16:13 ` Jun'ichi Nomura
  2006-02-22 18:47 ` [PATCH 0/3] sysfs representation of stacked devices (dm/md) (rev.2) Greg KH
  3 siblings, 0 replies; 15+ messages in thread
From: Jun'ichi Nomura @ 2006-02-22 16:13 UTC (permalink / raw)
  To: Neil Brown, Alasdair Kergon, Lars Marowsky-Bree, Greg KH
  Cc: linux-kernel, device-mapper development

[-- Attachment #1: Type: text/plain, Size: 139 bytes --]

This patch modifies md driver to call bd_claim_by_kobject
and bd_release_from_kobject.

-- 
Jun'ichi Nomura, NEC Solutions (America), Inc.

[-- Attachment #2: md.patch --]
[-- Type: text/x-patch, Size: 771 bytes --]

Exporting stacked device relationship to sysfs (md)

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>

--- linux-2.6.15/drivers/md/md.c	2006-01-02 22:21:10.000000000 -0500
+++ linux-2.6.15/drivers/md/md.c	2006-02-21 19:11:16.000000000 -0500
@@ -1226,6 +1226,7 @@ static int bind_rdev_to_array(mdk_rdev_t
 	else
 		ko = &rdev->bdev->bd_disk->kobj;
 	sysfs_create_link(&rdev->kobj, ko, "block");
+	bd_claim_by_kobject(rdev->bdev, rdev, &mddev->gendisk->slave_dir);
 	return 0;
 }
 
@@ -1236,6 +1237,7 @@ static void unbind_rdev_from_array(mdk_r
 		MD_BUG();
 		return;
 	}
+	bd_release_from_kobject(rdev->bdev, &rdev->mddev->gendisk->slave_dir);
 	list_del_init(&rdev->same_set);
 	printk(KERN_INFO "md: unbind<%s>\n", bdevname(rdev->bdev,b));
 	rdev->mddev = NULL;

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] sysfs representation of stacked devices (dm) (rev.2)
  2006-02-22 16:13 ` [PATCH 2/3] sysfs representation of stacked devices (dm) (rev.2) Jun'ichi Nomura
@ 2006-02-22 16:34   ` Alasdair G Kergon
  2006-02-22 17:13     ` Jun'ichi Nomura
  0 siblings, 1 reply; 15+ messages in thread
From: Alasdair G Kergon @ 2006-02-22 16:34 UTC (permalink / raw)
  To: Jun'ichi Nomura
  Cc: Neil Brown, Alasdair Kergon, Lars Marowsky-Bree, Greg KH,
	linux-kernel, device-mapper development, Mike Anderson

On Wed, Feb 22, 2006 at 11:13:06AM -0500, Jun'ichi Nomura wrote:
> This patch modifies dm driver to call bd_claim_by_kobject
> and bd_release_from_kobject.
> To do that, reference to the mapped_device is added in
> dm_table.
 
This patch needs splitting up so that independent changes can be 
considered separately.

c.f. The proposal from Mike Anderson (repeated below) which I prefer 
because it makes it clear that a table always belongs to exactly one md.

Exposing dm_table_set_md() suggests a table can have its owning md 
changed - it can't.

Alasdair
-- 
agk@redhat.com


 This patch adds a mapped_device member to the dm_table struct.

 Signed-off-by: Mike Anderson <andmike@us.ibm.com>

 drivers/md/dm-ioctl.c |   32 +++++++++++++++++++-------------
 drivers/md/dm-table.c |   12 +++++++++++-
 drivers/md/dm.h       |    4 +++-
 3 files changed, 33 insertions(+), 15 deletions(-)

Index: sas-2.6-patched/drivers/md/dm.h
===================================================================
--- sas-2.6-patched.orig/drivers/md/dm.h	2006-02-20 01:05:32.000000000 -0800
+++ sas-2.6-patched/drivers/md/dm.h	2006-02-20 01:42:29.000000000 -0800
@@ -99,7 +99,8 @@ int dm_suspended(struct mapped_device *m
  * Functions for manipulating a table.  Tables are also reference
  * counted.
  *---------------------------------------------------------------*/
-int dm_table_create(struct dm_table **result, int mode, unsigned num_targets);
+int dm_table_create(struct dm_table **result, int mode, unsigned
+		    num_targets, struct mapped_device *md);
 
 void dm_table_get(struct dm_table *t);
 void dm_table_put(struct dm_table *t);
@@ -123,6 +124,7 @@ void dm_table_resume_targets(struct dm_t
 int dm_table_any_congested(struct dm_table *t, int bdi_bits);
 void dm_table_unplug_all(struct dm_table *t);
 int dm_table_flush_all(struct dm_table *t);
+struct mapped_device *dm_table_get_md(struct dm_table *t);
 
 /*-----------------------------------------------------------------
  * A registry of target types.
Index: sas-2.6-patched/drivers/md/dm-table.c
===================================================================
--- sas-2.6-patched.orig/drivers/md/dm-table.c	2006-02-20 01:05:32.000000000 -0800
+++ sas-2.6-patched/drivers/md/dm-table.c	2006-02-20 01:42:29.000000000 -0800
@@ -33,6 +33,7 @@ struct dm_table {
 	unsigned int num_allocated;
 	sector_t *highs;
 	struct dm_target *targets;
+	struct mapped_device *md;
 
 	/*
 	 * Indicates the rw permissions for the new logical
@@ -204,7 +205,8 @@ static int alloc_targets(struct dm_table
 	return 0;
 }
 
-int dm_table_create(struct dm_table **result, int mode, unsigned num_targets)
+int dm_table_create(struct dm_table **result, int mode,
+		    unsigned num_targets, struct mapped_device *md)
 {
 	struct dm_table *t = kmalloc(sizeof(*t), GFP_KERNEL);
 
@@ -227,6 +229,7 @@ int dm_table_create(struct dm_table **re
 	}
 
 	t->mode = mode;
+	t->md = md;
 	*result = t;
 	return 0;
 }
@@ -945,6 +948,12 @@ int dm_table_flush_all(struct dm_table *
 	return ret;
 }
 
+struct mapped_device *dm_table_get_md(struct dm_table *t)
+{
+	dm_get(t->md);
+	return t->md;
+}
+
 EXPORT_SYMBOL(dm_vcalloc);
 EXPORT_SYMBOL(dm_get_device);
 EXPORT_SYMBOL(dm_put_device);
@@ -955,3 +964,4 @@ EXPORT_SYMBOL(dm_table_put);
 EXPORT_SYMBOL(dm_table_get);
 EXPORT_SYMBOL(dm_table_unplug_all);
 EXPORT_SYMBOL(dm_table_flush_all);
+EXPORT_SYMBOL(dm_table_get_md);
Index: sas-2.6-patched/drivers/md/dm-ioctl.c
===================================================================
--- sas-2.6-patched.orig/drivers/md/dm-ioctl.c	2006-02-20 01:05:32.000000000 -0800
+++ sas-2.6-patched/drivers/md/dm-ioctl.c	2006-02-20 01:42:29.000000000 -0800
@@ -972,27 +972,26 @@ static int populate_table(struct dm_tabl
 
 static int table_load(struct dm_ioctl *param, size_t param_size)
 {
-	int r;
+	int r = -ENXIO;
 	struct hash_cell *hc;
 	struct dm_table *t;
 
-	r = dm_table_create(&t, get_mode(param), param->target_count);
-	if (r)
-		return r;
-
-	r = populate_table(t, param, param_size);
-	if (r) {
-		dm_table_put(t);
-		return r;
-	}
 
 	down_write(&_hash_lock);
 	hc = __find_device_hash_cell(param);
 	if (!hc) {
 		DMWARN("device doesn't appear to be in the dev hash table.");
-		up_write(&_hash_lock);
-		dm_table_put(t);
-		return -ENXIO;
+		goto out;
+	}
+
+	r = dm_table_create(&t, get_mode(param), param->target_count,
+			    hc->md);
+	if (r)
+		goto out;
+
+	r = populate_table(t, param, param_size);
+	if (r) {
+		goto table_out;
 	}
 
 	if (hc->new_map)
@@ -1001,6 +1000,13 @@ static int table_load(struct dm_ioctl *p
 	param->flags |= DM_INACTIVE_PRESENT_FLAG;
 
 	r = __dev_status(hc->md, param);
+
+	up_write(&_hash_lock);
+	return r;
+
+table_out:
+	dm_table_put(t);
+out:
 	up_write(&_hash_lock);
 	return r;
 }

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] sysfs representation of stacked devices (dm) (rev.2)
  2006-02-22 16:34   ` Alasdair G Kergon
@ 2006-02-22 17:13     ` Jun'ichi Nomura
  2006-02-22 18:13       ` Alasdair G Kergon
  0 siblings, 1 reply; 15+ messages in thread
From: Jun'ichi Nomura @ 2006-02-22 17:13 UTC (permalink / raw)
  To: Alasdair G Kergon
  Cc: Neil Brown, Lars Marowsky-Bree, Greg KH, linux-kernel,
	device-mapper development, Mike Anderson

Hi,

Alasdair G Kergon wrote:
 > This patch needs splitting up so that independent changes can be
 > considered separately.
 >
 > c.f. The proposal from Mike Anderson (repeated below) which I prefer
 > because it makes it clear that a table always belongs to exactly one md.

I like his proposed patch.
The interface is useful for my purpose too and moving table
creation inside _hash_lock means I don't need dm_get() neither.

Is it going to be pushed to upstream?
I'll remake my patch based on it.

-- 
Jun'ichi Nomura, NEC Solutions (America), Inc.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] sysfs representation of stacked devices (dm) (rev.2)
  2006-02-22 17:13     ` Jun'ichi Nomura
@ 2006-02-22 18:13       ` Alasdair G Kergon
  2006-02-22 19:32         ` Jun'ichi Nomura
  0 siblings, 1 reply; 15+ messages in thread
From: Alasdair G Kergon @ 2006-02-22 18:13 UTC (permalink / raw)
  To: Jun'ichi Nomura
  Cc: Alasdair G Kergon, Neil Brown, Lars Marowsky-Bree, Greg KH,
	linux-kernel, device-mapper development, Mike Anderson

On Wed, Feb 22, 2006 at 12:13:56PM -0500, Jun'ichi Nomura wrote:
> Alasdair G Kergon wrote:
> > This patch needs splitting up so that independent changes can be
> > considered separately.
> > c.f. The proposal from Mike Anderson (repeated below) which I prefer
> > because it makes it clear that a table always belongs to exactly one md.
 
> I like his proposed patch.
> The interface is useful for my purpose too and moving table
> creation inside _hash_lock means I don't need dm_get() neither.

The global _hash_lock should not be held (thereby locking out most dm ioctl 
operations on any device) while the slow populate_table() runs.

I'm trying out a variant of the patch that drops and reacquires that lock.
 
Alasdair
-- 
agk@redhat.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] sysfs representation of stacked devices (dm/md) (rev.2)
  2006-02-22 16:06 [PATCH 0/3] sysfs representation of stacked devices (dm/md) (rev.2) Jun'ichi Nomura
                   ` (2 preceding siblings ...)
  2006-02-22 16:13 ` [PATCH 3/3] sysfs representation of stacked devices (md) (rev.2) Jun'ichi Nomura
@ 2006-02-22 18:47 ` Greg KH
  3 siblings, 0 replies; 15+ messages in thread
From: Greg KH @ 2006-02-22 18:47 UTC (permalink / raw)
  To: Jun'ichi Nomura
  Cc: Neil Brown, Alasdair Kergon, Lars Marowsky-Bree, linux-kernel,
	device-mapper development

On Wed, Feb 22, 2006 at 11:06:24AM -0500, Jun'ichi Nomura wrote:
> Hello,
> 
> This is a revised set of pathces which provides common
> representation of dependencies between stacked devices (dm and md)
> in sysfs.
> 
> Variants of bd_claim/bd_release are added to accept a kobject
> and create symlinks between the claimed bdev and the holder.
> 
> dm/md will give a child of its gendisk kobject to bd_claim.
> For example, if dm-0 maps to sda, we have the following symlinks;
>    /sys/block/dm-0/slaves/sda --> /sys/block/sda
>    /sys/block/sda/holders/dm-0 --> /sys/block/dm-0
> 
> Comments are welcome.
> 
> A few points I would appreciate comments/reviews from maintainers:
>   About sysfs
>     - I confirmed sysfs_remove_symlink() and kobject_del() don't
>       allocate memory in 2.6.15 and it seems true on the git head.
>       I would like to make sure it's true in future versions of kernel
>       because they are called during device-mapper's table swapping
>       where I/O to free memory could deadlock on the dm device.
>       What is the recommended way to do that?

But it can possibly sleep.

Hm, wait, the put_device stuff can possibly sleep, the "raw"
kobject_del() stuff looks safe.  Either way, they don't create new
memory, unless you do something really wierd in your release callback.

So you should be safe here.

But if you want to be absolutly safe, look at the thread on lkml about
changing the scsi code to do the final release in a non-interrupt
context.  That looks like it might be the same thing you want to do
here to guarantee that nothing bad happens.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] sysfs representation of stacked devices (common) (rev.2)
  2006-02-22 16:13 ` [PATCH 1/3] sysfs representation of stacked devices (common) (rev.2) Jun'ichi Nomura
@ 2006-02-22 18:48   ` Greg KH
  2006-02-22 22:22     ` Jun'ichi Nomura
  0 siblings, 1 reply; 15+ messages in thread
From: Greg KH @ 2006-02-22 18:48 UTC (permalink / raw)
  To: Jun'ichi Nomura
  Cc: Neil Brown, Alasdair Kergon, Lars Marowsky-Bree, linux-kernel,
	device-mapper development

On Wed, Feb 22, 2006 at 11:13:00AM -0500, Jun'ichi Nomura wrote:
> +/* This is a mere directory in sysfs. No methods are needed. */
> +static struct kobj_type bd_holder_ktype = {
> +	.release	= NULL,
> +	.sysfs_ops	= NULL,
> +	.default_attrs	= NULL,
> +};

That doesn't look right.  You always need a release function.

> +static inline void add_holder_dir(struct block_device *bdev)
> +{
> +	struct kobject *kobj = &bdev->bd_holder_dir;
> +
> +	kobj->ktype = &bd_holder_ktype;
> +	kobject_set_name(kobj, "holders");
> +	kobj->parent = bdev_get_kobj(bdev);
> +	kobject_init(kobj);
> +	kobject_add(kobj);
> +	kobject_put(kobj->parent);
> +}
> +
> +static inline void del_holder_dir(struct block_device *bdev)
> +{
> +	/*
> +	 * Don't kobject_unregister to avoid memory allocation
> +	 * in kobject_hotplug.
> +	 */
> +	kobject_del(&bdev->bd_holder_dir);
> +	kobject_put(&bdev->bd_holder_dir);
> +}

No, do it correctly please.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] sysfs representation of stacked devices (dm) (rev.2)
  2006-02-22 18:13       ` Alasdair G Kergon
@ 2006-02-22 19:32         ` Jun'ichi Nomura
  0 siblings, 0 replies; 15+ messages in thread
From: Jun'ichi Nomura @ 2006-02-22 19:32 UTC (permalink / raw)
  To: Alasdair G Kergon
  Cc: Neil Brown, Lars Marowsky-Bree, Greg KH, linux-kernel,
	device-mapper development, Mike Anderson

[-- Attachment #1: Type: text/plain, Size: 519 bytes --]

Hi,

Alasdair G Kergon wrote:
> The global _hash_lock should not be held (thereby locking out most dm ioctl 
> operations on any device) while the slow populate_table() runs.
> 
> I'm trying out a variant of the patch that drops and reacquires that lock.

OK, thanks for the confirmation.

I guess the variant itself will need dm_get() to avoid md
being stolen. Depending on that, I might change my patch.

Attached is a revised patch based on Mike Anderson's patch.

-- 
Jun'ichi Nomura, NEC Solutions (America), Inc.

[-- Attachment #2: dm.patch --]
[-- Type: text/x-patch, Size: 2446 bytes --]

--- linux-2.6.15/drivers/md/dm-table.c	2006-02-22 11:25:10.000000000 -0500
+++ linux-2.6.15/drivers/md/dm-table.c	2006-02-22 11:32:54.000000000 -0500
@@ -348,7 +348,7 @@ static struct dm_dev *find_device(struct
 /*
  * Open a device so we can use it as a map destination.
  */
-static int open_dev(struct dm_dev *d, dev_t dev)
+static int open_dev(struct dm_dev *d, dev_t dev, struct mapped_device *md)
 {
 	static char *_claim_ptr = "I belong to device-mapper";
 	struct block_device *bdev;
@@ -361,7 +361,7 @@ static int open_dev(struct dm_dev *d, de
 	bdev = open_by_devnum(dev, d->mode);
 	if (IS_ERR(bdev))
 		return PTR_ERR(bdev);
-	r = bd_claim(bdev, _claim_ptr);
+	r = bd_claim_by_kobject(bdev, _claim_ptr, &dm_disk(md)->slave_dir);
 	if (r)
 		blkdev_put(bdev);
 	else
@@ -372,12 +372,12 @@ static int open_dev(struct dm_dev *d, de
 /*
  * Close a device that we've been using.
  */
-static void close_dev(struct dm_dev *d)
+static void close_dev(struct dm_dev *d, struct mapped_device *md)
 {
 	if (!d->bdev)
 		return;
 
-	bd_release(d->bdev);
+	bd_release_from_kobject(d->bdev, &dm_disk(md)->slave_dir);
 	blkdev_put(d->bdev);
 	d->bdev = NULL;
 }
@@ -398,7 +398,7 @@ static int check_device_area(struct dm_d
  * careful to leave things as they were if we fail to reopen the
  * device.
  */
-static int upgrade_mode(struct dm_dev *dd, int new_mode)
+static int upgrade_mode(struct dm_dev *dd, int new_mode, struct mapped_device *md)
 {
 	int r;
 	struct dm_dev dd_copy;
@@ -408,9 +408,9 @@ static int upgrade_mode(struct dm_dev *d
 
 	dd->mode |= new_mode;
 	dd->bdev = NULL;
-	r = open_dev(dd, dev);
+	r = open_dev(dd, dev, md);
 	if (!r)
-		close_dev(&dd_copy);
+		close_dev(&dd_copy, md);
 	else
 		*dd = dd_copy;
 
@@ -453,7 +453,7 @@ static int __table_get_device(struct dm_
 		dd->mode = mode;
 		dd->bdev = NULL;
 
-		if ((r = open_dev(dd, dev))) {
+		if ((r = open_dev(dd, dev, t->md))) {
 			kfree(dd);
 			return r;
 		}
@@ -464,7 +464,7 @@ static int __table_get_device(struct dm_
 		list_add(&dd->list, &t->devices);
 
 	} else if (dd->mode != (mode | dd->mode)) {
-		r = upgrade_mode(dd, mode);
+		r = upgrade_mode(dd, mode, t->md);
 		if (r)
 			return r;
 	}
@@ -539,7 +539,7 @@ int dm_get_device(struct dm_target *ti, 
 void dm_put_device(struct dm_target *ti, struct dm_dev *dd)
 {
 	if (atomic_dec_and_test(&dd->count)) {
-		close_dev(dd);
+		close_dev(dd, ti->table->md);
 		list_del(&dd->list);
 		kfree(dd);
 	}

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] sysfs representation of stacked devices (common) (rev.2)
  2006-02-22 18:48   ` Greg KH
@ 2006-02-22 22:22     ` Jun'ichi Nomura
  2006-02-22 22:28       ` Greg KH
  0 siblings, 1 reply; 15+ messages in thread
From: Jun'ichi Nomura @ 2006-02-22 22:22 UTC (permalink / raw)
  To: Greg KH
  Cc: Neil Brown, Alasdair Kergon, Lars Marowsky-Bree, linux-kernel,
	device-mapper development

Hi Greg,

Thanks for comments.

Greg KH wrote:
>>+/* This is a mere directory in sysfs. No methods are needed. */
>>+static struct kobj_type bd_holder_ktype = {
>>+	.release	= NULL,
>>+	.sysfs_ops	= NULL,
>>+	.default_attrs	= NULL,
>>+};
> 
> That doesn't look right.  You always need a release function.

I'll move them out to gendisk/hd_struct creation with proper
release function.

I thought it's correct because NULL release function is
just ignored in kobject_cleanup() and it let outside function
to release the whole structure.
But it seems wrong to embed these additional kobjects in
the structures which are logically separate from them.

>>+static inline void del_holder_dir(struct block_device *bdev)
>>+{
>>+	/*
>>+	 * Don't kobject_unregister to avoid memory allocation
>>+	 * in kobject_hotplug.
>>+	 */
>>+	kobject_del(&bdev->bd_holder_dir);
>>+	kobject_put(&bdev->bd_holder_dir);
>>+}
> 
> No, do it correctly please.

OK, I'll change them to kobject_unregister() and do it
when gendisk/hd_struct is removed.
Then we can avoid possible memory allocation in dm's atomic
operation, too.

-- 
Jun'ichi Nomura, NEC Solutions (America), Inc.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] sysfs representation of stacked devices (common) (rev.2)
  2006-02-22 22:22     ` Jun'ichi Nomura
@ 2006-02-22 22:28       ` Greg KH
  2006-02-23 19:15         ` Jun'ichi Nomura
  0 siblings, 1 reply; 15+ messages in thread
From: Greg KH @ 2006-02-22 22:28 UTC (permalink / raw)
  To: Jun'ichi Nomura
  Cc: Neil Brown, Alasdair Kergon, Lars Marowsky-Bree, linux-kernel,
	device-mapper development

On Wed, Feb 22, 2006 at 05:22:02PM -0500, Jun'ichi Nomura wrote:
> Hi Greg,
> 
> Thanks for comments.
> 
> Greg KH wrote:
> >>+/* This is a mere directory in sysfs. No methods are needed. */
> >>+static struct kobj_type bd_holder_ktype = {
> >>+	.release	= NULL,
> >>+	.sysfs_ops	= NULL,
> >>+	.default_attrs	= NULL,
> >>+};
> >
> >That doesn't look right.  You always need a release function.
> 
> I'll move them out to gendisk/hd_struct creation with proper
> release function.
> 
> I thought it's correct because NULL release function is
> just ignored in kobject_cleanup() and it let outside function
> to release the whole structure.
> But it seems wrong to embed these additional kobjects in
> the structures which are logically separate from them.
> 
> >>+static inline void del_holder_dir(struct block_device *bdev)
> >>+{
> >>+	/*
> >>+	 * Don't kobject_unregister to avoid memory allocation
> >>+	 * in kobject_hotplug.
> >>+	 */
> >>+	kobject_del(&bdev->bd_holder_dir);
> >>+	kobject_put(&bdev->bd_holder_dir);
> >>+}
> >
> >No, do it correctly please.
> 
> OK, I'll change them to kobject_unregister() and do it
> when gendisk/hd_struct is removed.
> Then we can avoid possible memory allocation in dm's atomic
> operation, too.

That sounds great.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] sysfs representation of stacked devices (common) (rev.2)
  2006-02-22 22:28       ` Greg KH
@ 2006-02-23 19:15         ` Jun'ichi Nomura
  2006-02-24  3:40           ` Greg KH
  0 siblings, 1 reply; 15+ messages in thread
From: Jun'ichi Nomura @ 2006-02-23 19:15 UTC (permalink / raw)
  To: Greg KH
  Cc: Neil Brown, Alasdair Kergon, Lars Marowsky-Bree, linux-kernel,
	device-mapper development

[-- Attachment #1: Type: text/plain, Size: 924 bytes --]

Hello Greg,

>>>>+/* This is a mere directory in sysfs. No methods are needed. */
>>>>+static struct kobj_type bd_holder_ktype = {
>>>>+	.release	= NULL,
>>>>+	.sysfs_ops	= NULL,
>>>>+	.default_attrs	= NULL,
>>>>+};
>>>
>>>That doesn't look right.  You always need a release function.

I updated the patch based your comments.
Could you take a look at this version whether there's
any problematic use of sysfs/kobjects?

   - I removed embedded child-kobjects from struct block_device
     and struct gendisk which I added in my previous patch.
     Kobject registration occurs when gendisk or hd_struct is
     registered. Release function of the kobject type is added.
   - Reference counting of kobjects is done in much symmetric
     manner than before.
   - Added bd_claim_by_disk/bd_release_from_disk inline functions
     to help proper reference counting.

Thanks,
-- 
Jun'ichi Nomura, NEC Solutions (America), Inc.

[-- Attachment #2: common-2.6.16-rc4.patch --]
[-- Type: text/x-patch, Size: 11093 bytes --]

This patch provides a common functions/structures for
sysfs representation of dependencies between stacked devices (dm and md).

Variants of bd_claim/bd_release are added to accept a kobject
(or a gendisk containing it) and create symlinks between the claimed bdev
and the holder.
Other patches for dm/md will make use of these functions.

For example, if dm-0 maps to sda, we have the following symlinks;
    /sys/block/dm-0/slaves/sda --> /sys/block/sda
    /sys/block/sda/holders/dm-0 --> /sys/block/dm-0

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>

 fs/block_dev.c        |  196 ++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/partitions/check.c |   55 ++++++++++++++
 include/linux/fs.h    |    3
 include/linux/genhd.h |   16 ++++
 4 files changed, 268 insertions(+), 2 deletions(-)

--- linux-2.6.16-rc4/include/linux/fs.h	2006-02-17 17:23:45.000000000 -0500
+++ linux-2.6.16-rc4/include/linux/fs.h	2006-02-22 22:37:52.000000000 -0500
@@ -401,6 +401,7 @@ struct block_device {
 	struct list_head	bd_inodes;
 	void *			bd_holder;
 	int			bd_holders;
+	struct list_head	bd_holder_list;
 	struct block_device *	bd_contains;
 	unsigned		bd_block_size;
 	struct hd_struct *	bd_part;
@@ -1380,6 +1381,8 @@ extern int blkdev_get(struct block_devic
 extern int blkdev_put(struct block_device *);
 extern int bd_claim(struct block_device *, void *);
 extern void bd_release(struct block_device *);
+extern int bd_claim_by_kobject(struct block_device *, void *, struct kobject *);
+extern void bd_release_from_kobject(struct block_device *, struct kobject *);
 
 /* fs/char_dev.c */
 extern int alloc_chrdev_region(dev_t *, unsigned, unsigned, const char *);
--- linux-2.6.16-rc4/include/linux/genhd.h	2006-02-17 17:23:45.000000000 -0500
+++ linux-2.6.16-rc4/include/linux/genhd.h	2006-02-22 22:37:50.000000000 -0500
@@ -78,6 +78,7 @@ struct hd_struct {
 	sector_t start_sect;
 	sector_t nr_sects;
 	struct kobject kobj;
+	struct kobject *holder_dir;
 	unsigned ios[2], sectors[2];	/* READs and WRITEs */
 	int policy, partno;
 };
@@ -114,6 +115,8 @@ struct gendisk {
 	int number;			/* more of the same */
 	struct device *driverfs_dev;
 	struct kobject kobj;
+	struct kobject *holder_dir;
+	struct kobject *slave_dir;
 
 	struct timer_rand_state *random;
 	int policy;
@@ -421,6 +424,19 @@ static inline struct block_device *bdget
 	return bdget(MKDEV(disk->major, disk->first_minor) + index);
 }
 
+static inline int bd_claim_by_disk(struct block_device *bdev,
+					void *holder, struct gendisk *disk)
+{
+	return bd_claim_by_kobject(bdev, holder, kobject_get(disk->slave_dir));
+}
+
+static inline void bd_release_from_disk(struct block_device *bdev,
+					struct gendisk *disk)
+{
+	bd_release_from_kobject(bdev, disk->slave_dir);
+	kobject_put(disk->slave_dir);
+}
+
 #endif
 
 #endif
--- linux-2.6.16-rc4/fs/block_dev.c	2006-02-17 17:23:45.000000000 -0500
+++ linux-2.6.16-rc4/fs/block_dev.c	2006-02-22 23:50:20.000000000 -0500
@@ -269,6 +269,7 @@ static void init_once(void * foo, kmem_c
 		sema_init(&bdev->bd_mount_sem, 1);
 		INIT_LIST_HEAD(&bdev->bd_inodes);
 		INIT_LIST_HEAD(&bdev->bd_list);
+		INIT_LIST_HEAD(&bdev->bd_holder_list);
 		inode_init_once(&ei->vfs_inode);
 	}
 }
@@ -443,7 +444,179 @@ void bd_forget(struct inode *inode)
 	spin_unlock(&bdev_lock);
 }
 
-int bd_claim(struct block_device *bdev, void *holder)
+/*
+ * Functions for bd_claim_by_kobject / bd_release_from_kobject
+ *
+ *     If a kobject is passed to bd_claim_by_kobject() 
+ *     and the kobject has a parent directory,
+ *     following symlinks are created:
+ *        o from the kobject to the claimed bdev
+ *        o from "holders" directory of the bdev to the parent of the kobject
+ *     bd_release_from_kobject() removes these symlinks.
+ *
+ *     Example:
+ *        If /dev/dm-0 maps to /dev/sda, kobject corresponding to
+ *        /sys/block/dm-0/slaves is passed to bd_claim_by_kobject(), then:
+ *           /sys/block/dm-0/slaves/sda --> /sys/block/sda
+ *           /sys/block/sda/holders/dm-0 --> /sys/block/dm-0
+ */
+
+static inline struct kobject * bdev_get_kobj(struct block_device *bdev)
+{
+	if (!bdev)
+		return NULL;
+	else if (bdev->bd_contains != bdev)
+		return kobject_get(&bdev->bd_part->kobj);
+	else
+		return kobject_get(&bdev->bd_disk->kobj);
+}
+
+static inline struct kobject * bdev_get_holder(struct block_device *bdev)
+{
+	if (!bdev)
+		return NULL;
+	else if (bdev->bd_contains != bdev)
+		return kobject_get(bdev->bd_part->holder_dir);
+	else
+		return kobject_get(bdev->bd_disk->holder_dir);
+}
+
+static inline void add_symlink(struct kobject *from, struct kobject *to)
+{
+	if (!from || !to)
+		return;
+	kobject_get(from);
+	kobject_get(to);
+	sysfs_create_link(from, to, kobject_name(to));
+}
+
+static inline void del_symlink(struct kobject *from, struct kobject *to)
+{
+	if (!from || !to)
+		return;
+	sysfs_remove_link(from, kobject_name(to));
+	kobject_put(from);
+	kobject_put(to);
+}
+
+static inline int bd_claim_grab_dirs(struct block_device *bdev,
+			struct kobject *holder,
+			struct kobject **sdir, struct kobject **sdev,
+			struct kobject **hdir, struct kobject **hdev)
+{
+	*sdir = kobject_get(holder);
+	if (!*sdir)
+		return 0;
+
+	*hdev = kobject_get((*sdir)->parent);
+	if (!*hdev)
+		goto fail_put_sdir;
+
+	*sdev = bdev_get_kobj(bdev);
+	if (!*sdev)
+		goto fail_put_hdev;
+
+	*hdir = bdev_get_holder(bdev);
+	if (!*hdir)
+		goto fail_put_sdev;
+
+	return 1;
+
+fail_put_sdev:
+	kobject_put(*sdev);
+fail_put_hdev:
+	kobject_put(*hdev);
+fail_put_sdir:
+	kobject_put(*sdir);
+
+	return 0;
+}
+
+static inline void bd_claim_release_dirs(
+			struct kobject *sdir, struct kobject *sdev,
+			struct kobject *hdir, struct kobject *hdev)
+{
+	kobject_put(hdir);
+	kobject_put(sdev);
+	kobject_put(hdev);
+	kobject_put(sdir);
+}
+
+static void link_bd_holder(struct block_device *bdev, struct kobject *holder)
+{
+	struct kobject *sdir, *sdev, *hdir, *hdev;
+
+	if (bd_claim_grab_dirs(bdev, holder, &sdir, &sdev, &hdir, &hdev)) {
+		add_symlink(sdir, sdev);
+		add_symlink(hdir, hdev);
+		bd_claim_release_dirs(sdir, sdev, hdir, hdev);
+	}
+
+	return;
+}
+
+static void unlink_bd_holder(struct block_device *bdev, struct kobject *holder)
+{
+	struct kobject *sdir, *sdev, *hdir, *hdev;
+
+	if (bd_claim_grab_dirs(bdev, holder, &sdir, &sdev, &hdir, &hdev)) {
+		del_symlink(sdir, sdev);
+		del_symlink(hdir, hdev);
+		bd_claim_release_dirs(sdir, sdev, hdir, hdev);
+	}
+}
+
+/* bd_holder_list is protected by bdev_lock */
+struct bd_holder {
+	struct list_head list;	/* chain of holders of the bdev */
+	int count;		/* references from the holder */
+	struct kobject *kobj;	/* holder kobject */
+};
+
+static int add_bd_holder(struct block_device *bdev, struct kobject *kobj)
+{
+        struct bd_holder *bo;
+
+	list_for_each_entry(bo, &bdev->bd_holder_list, list) {
+		if (bo->kobj == kobj) {
+			bo->count++;
+			return 0;
+		}
+	}
+
+	bo = kmalloc(sizeof(*bo), GFP_KERNEL);
+	if (!bo)
+		return -ENOMEM;
+
+	bo->count = 1;
+	bo->kobj = kobj;
+	list_add_tail(&bo->list, &bdev->bd_holder_list);
+	link_bd_holder(bdev, kobj);
+
+	return 0;
+}
+
+static int del_bd_holder(struct block_device *bdev, struct kobject *kobj)
+{
+	struct bd_holder *bo;
+
+	list_for_each_entry(bo, &bdev->bd_holder_list, list) {
+		if (bo->kobj == kobj) {
+			bo->count--;
+			BUG_ON(bo->count < 0);
+			if (!bo->count) {
+				unlink_bd_holder(bdev, kobj);
+				list_del(&bo->list);
+				kfree(bo);
+			}
+			break;
+		}
+	}
+
+	return 0;
+}
+
+int bd_claim_by_kobject(struct block_device *bdev, void *holder, struct kobject *kobj)
 {
 	int res;
 	spin_lock(&bdev_lock);
@@ -464,6 +637,9 @@ int bd_claim(struct block_device *bdev, 
 		res = 0;	 /* is a partition of an un-held device */
 
 	/* now impose change */
+	if (res == 0 && kobj)
+		res = add_bd_holder(bdev, kobj);
+
 	if (res==0) {
 		/* note that for a whole device bd_holders
 		 * will be incremented twice, and bd_holder will
@@ -478,11 +654,20 @@ int bd_claim(struct block_device *bdev, 
 	return res;
 }
 
+EXPORT_SYMBOL(bd_claim_by_kobject);
+
+int bd_claim(struct block_device *bdev, void *holder)
+{
+	return bd_claim_by_kobject(bdev, holder, NULL);
+}
+
 EXPORT_SYMBOL(bd_claim);
 
-void bd_release(struct block_device *bdev)
+void bd_release_from_kobject(struct block_device *bdev, struct kobject *kobj)
 {
 	spin_lock(&bdev_lock);
+	if (kobj)
+		del_bd_holder(bdev, kobj);
 	if (!--bdev->bd_contains->bd_holders)
 		bdev->bd_contains->bd_holder = NULL;
 	if (!--bdev->bd_holders)
@@ -490,6 +675,13 @@ void bd_release(struct block_device *bde
 	spin_unlock(&bdev_lock);
 }
 
+EXPORT_SYMBOL(bd_release_from_kobject);
+
+void bd_release(struct block_device *bdev)
+{
+	bd_release_from_kobject(bdev, NULL);
+}
+
 EXPORT_SYMBOL(bd_release);
 
 /*
--- linux-2.6.16-rc4/fs/partitions/check.c	2006-02-17 17:23:45.000000000 -0500
+++ linux-2.6.16-rc4/fs/partitions/check.c	2006-02-22 23:18:06.000000000 -0500
@@ -297,6 +297,56 @@ struct kobj_type ktype_part = {
 	.sysfs_ops	= &part_sysfs_ops,
 };
 
+static void dir_release(struct kobject *kobj)
+{
+	kfree(kobj);
+}
+
+static struct kobj_type dir_ktype = {
+	.release	= dir_release,
+	.sysfs_ops	= NULL,
+	.default_attrs	= NULL,
+};
+
+static inline struct kobject *add_dir(struct kobject *parent, const char *name)
+{
+	struct kobject *k;
+
+	if (!parent)
+		return NULL;
+
+	k = kmalloc(sizeof(*k), GFP_KERNEL);
+	if (!k)
+		return NULL;
+
+	memset(k, 0, sizeof(*k));
+	k->parent = parent;
+	k->ktype = &dir_ktype;
+	kobject_set_name(k, name);
+	kobject_register(k);
+
+	return k;
+}
+
+static inline void partition_sysfs_add_subdir(struct hd_struct *p)
+{
+	struct kobject *k;
+
+	k = kobject_get(&p->kobj);
+	p->holder_dir = add_dir(k, "holders");
+	kobject_put(k);
+}
+
+static inline void disk_sysfs_add_subdirs(struct gendisk *disk)
+{
+	struct kobject *k;
+
+	k = kobject_get(&disk->kobj);
+	disk->holder_dir = add_dir(k, "holders");
+	disk->slave_dir = add_dir(k, "slaves");
+	kobject_put(k);
+}
+
 void delete_partition(struct gendisk *disk, int part)
 {
 	struct hd_struct *p = disk->part[part-1];
@@ -310,6 +360,7 @@ void delete_partition(struct gendisk *di
 	p->ios[0] = p->ios[1] = 0;
 	p->sectors[0] = p->sectors[1] = 0;
 	devfs_remove("%s/part%d", disk->devfs_name, part);
+	kobject_unregister(p->holder_dir);
 	kobject_unregister(&p->kobj);
 }
 
@@ -337,6 +388,7 @@ void add_partition(struct gendisk *disk,
 	p->kobj.parent = &disk->kobj;
 	p->kobj.ktype = &ktype_part;
 	kobject_register(&p->kobj);
+	partition_sysfs_add_subdir(p);
 	disk->part[part-1] = p;
 }
 
@@ -383,6 +435,7 @@ void register_disk(struct gendisk *disk)
 	if ((err = kobject_add(&disk->kobj)))
 		return;
 	disk_sysfs_symlinks(disk);
+ 	disk_sysfs_add_subdirs(disk);
 	kobject_uevent(&disk->kobj, KOBJ_ADD);
 
 	/* No minors to use for partitions */
@@ -483,6 +536,8 @@ void del_gendisk(struct gendisk *disk)
 
 	devfs_remove_disk(disk);
 
+	kobject_unregister(disk->holder_dir);
+	kobject_unregister(disk->slave_dir);
 	if (disk->driverfs_dev) {
 		char *disk_name = make_block_name(disk);
 		sysfs_remove_link(&disk->kobj, "device");

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] sysfs representation of stacked devices (common) (rev.2)
  2006-02-23 19:15         ` Jun'ichi Nomura
@ 2006-02-24  3:40           ` Greg KH
  2006-02-27 16:09             ` Jun'ichi Nomura
  0 siblings, 1 reply; 15+ messages in thread
From: Greg KH @ 2006-02-24  3:40 UTC (permalink / raw)
  To: Jun'ichi Nomura
  Cc: Neil Brown, Alasdair Kergon, Lars Marowsky-Bree, linux-kernel,
	device-mapper development

On Thu, Feb 23, 2006 at 02:15:45PM -0500, Jun'ichi Nomura wrote:
> Hello Greg,
> 
> >>>>+/* This is a mere directory in sysfs. No methods are needed. */
> >>>>+static struct kobj_type bd_holder_ktype = {
> >>>>+	.release	= NULL,
> >>>>+	.sysfs_ops	= NULL,
> >>>>+	.default_attrs	= NULL,
> >>>>+};
> >>>
> >>>That doesn't look right.  You always need a release function.
> 
> I updated the patch based your comments.
> Could you take a look at this version whether there's
> any problematic use of sysfs/kobjects?
> 
>   - I removed embedded child-kobjects from struct block_device
>     and struct gendisk which I added in my previous patch.
>     Kobject registration occurs when gendisk or hd_struct is
>     registered. Release function of the kobject type is added.
>   - Reference counting of kobjects is done in much symmetric
>     manner than before.
>   - Added bd_claim_by_disk/bd_release_from_disk inline functions
>     to help proper reference counting.

Looks great, only one comment:

> --- linux-2.6.16-rc4/fs/partitions/check.c	2006-02-17 17:23:45.000000000 -0500
> +++ linux-2.6.16-rc4/fs/partitions/check.c	2006-02-22 23:18:06.000000000 -0500
> @@ -297,6 +297,56 @@ struct kobj_type ktype_part = {
>  	.sysfs_ops	= &part_sysfs_ops,
>  };
>  
> +static void dir_release(struct kobject *kobj)
> +{
> +	kfree(kobj);
> +}
> +
> +static struct kobj_type dir_ktype = {
> +	.release	= dir_release,
> +	.sysfs_ops	= NULL,
> +	.default_attrs	= NULL,
> +};
> +
> +static inline struct kobject *add_dir(struct kobject *parent, const char *name)
> +{
> +	struct kobject *k;
> +
> +	if (!parent)
> +		return NULL;
> +
> +	k = kmalloc(sizeof(*k), GFP_KERNEL);
> +	if (!k)
> +		return NULL;
> +
> +	memset(k, 0, sizeof(*k));
> +	k->parent = parent;
> +	k->ktype = &dir_ktype;
> +	kobject_set_name(k, name);
> +	kobject_register(k);
> +
> +	return k;
> +}

This code looks good enough that we should add it to the core kobject
code, don't you think?  Also, you might use kzalloc instead of kmalloc
here.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] sysfs representation of stacked devices (common) (rev.2)
  2006-02-24  3:40           ` Greg KH
@ 2006-02-27 16:09             ` Jun'ichi Nomura
  0 siblings, 0 replies; 15+ messages in thread
From: Jun'ichi Nomura @ 2006-02-27 16:09 UTC (permalink / raw)
  To: Greg KH
  Cc: Neil Brown, Alasdair Kergon, Lars Marowsky-Bree, linux-kernel,
	device-mapper development

Hi Greg,

Thank you for the comments.

Greg KH wrote:
>>+static inline struct kobject *add_dir(struct kobject *parent, const char *name)
>>+{
>>+	struct kobject *k;
>>+
>>+	if (!parent)
>>+		return NULL;
>>+
>>+	k = kmalloc(sizeof(*k), GFP_KERNEL);
>>+	if (!k)
>>+		return NULL;
>>+
>>+	memset(k, 0, sizeof(*k));
>>+	k->parent = parent;
>>+	k->ktype = &dir_ktype;
>>+	kobject_set_name(k, name);
>>+	kobject_register(k);
>>+
>>+	return k;
>>+}
> 
> This code looks good enough that we should add it to the core kobject
> code, don't you think?  Also, you might use kzalloc instead of kmalloc
> here.

Yes, it would be nice if kobject core has this function.
I'll move them to lib/kobject.c.

-- 
Jun'ichi Nomura, NEC Solutions (America), Inc.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2006-02-27 16:08 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-02-22 16:06 [PATCH 0/3] sysfs representation of stacked devices (dm/md) (rev.2) Jun'ichi Nomura
2006-02-22 16:13 ` [PATCH 1/3] sysfs representation of stacked devices (common) (rev.2) Jun'ichi Nomura
2006-02-22 18:48   ` Greg KH
2006-02-22 22:22     ` Jun'ichi Nomura
2006-02-22 22:28       ` Greg KH
2006-02-23 19:15         ` Jun'ichi Nomura
2006-02-24  3:40           ` Greg KH
2006-02-27 16:09             ` Jun'ichi Nomura
2006-02-22 16:13 ` [PATCH 2/3] sysfs representation of stacked devices (dm) (rev.2) Jun'ichi Nomura
2006-02-22 16:34   ` Alasdair G Kergon
2006-02-22 17:13     ` Jun'ichi Nomura
2006-02-22 18:13       ` Alasdair G Kergon
2006-02-22 19:32         ` Jun'ichi Nomura
2006-02-22 16:13 ` [PATCH 3/3] sysfs representation of stacked devices (md) (rev.2) Jun'ichi Nomura
2006-02-22 18:47 ` [PATCH 0/3] sysfs representation of stacked devices (dm/md) (rev.2) Greg KH

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).