* [PATCH 0/4] nvme multipath: expose slaves/holders
From: Thadeu Lima de Souza Cascardo @ 2018-12-06 16:48 UTC (permalink / raw)
  To: linux-nvme; +Cc: linux-block, Christoph Hellwig, Jens Axboe

Exposing slaves/holders is necessary to find out the real PCI device and its
driver for the root filesystem when generating an initramfs with
initramfs-tools. That currently fails for nvme multipath devices, which this
patchset fixes.

However, because the slave devices are hidden, lsblk fails without some extra
patches: it can't find the device numbers for the slave devices, and exits.
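
For illustration, after this series a multipath setup should look roughly like
this in sysfs (device names hypothetical):

  /sys/block/nvme0n1/slaves/nvme0c0n1
  /sys/block/nvme0c0n1/holders/nvme0n1

which is what lets initramfs-tools walk from the root filesystem's device down
to the real PCI device and its driver.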

Christoph Hellwig (2):
  block: move holder tracking from struct block_device to hd_struct
  nvme: create slaves/holder entries for multipath devices

Thadeu Lima de Souza Cascardo (2):
  nvme: Should not warn when a disk path is opened
  block: expose devt for GENHD_FL_HIDDEN disks

 block/genhd.c                 | 13 ++++++----
 block/partition-generic.c     |  4 +++
 drivers/block/drbd/drbd_nl.c  |  4 +--
 drivers/md/bcache/super.c     |  8 +++---
 drivers/md/dm.c               |  4 +--
 drivers/md/md.c               |  4 +--
 drivers/nvme/host/core.c      |  9 ++++---
 drivers/nvme/host/multipath.c | 13 ++++++++--
 drivers/nvme/host/nvme.h      | 12 ++++++---
 fs/block_dev.c                | 48 +++++++++++++++--------------------
 include/linux/fs.h            | 11 +++-----
 include/linux/genhd.h         |  4 +++
 12 files changed, 75 insertions(+), 59 deletions(-)

-- 
2.19.1


* [PATCH 1/4] block: move holder tracking from struct block_device to hd_struct
From: Thadeu Lima de Souza Cascardo @ 2018-12-06 16:48 UTC (permalink / raw)
  To: linux-nvme; +Cc: linux-block, Christoph Hellwig, Jens Axboe

From: Christoph Hellwig <hch@lst.de>

We'd like to track the slaves and holders for nvme multipath devices
in the same standard fashion as all the other stacked block devices,
to make life easy for things like distro installers.

But struct block_device only exists while we have open instances,
which we never have for the underlying devices of an nvme-multipath
setup.  We can, however, easily move the holder list into struct
hd_struct, which exists for as long as the block device does; the only
interesting bit is that we need a new mutex for it.
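
A rough sketch of the resulting calling convention (the wrapper function
below is hypothetical; the API change itself matches the diff):

	/*
	 * Stacking drivers now pass the hd_struct, which lives for the
	 * whole lifetime of the disk, instead of the open-lifetime
	 * struct block_device.
	 */
	static int example_link_slave(struct block_device *slave_bdev,
				      struct gendisk *holder_disk)
	{
		return bd_link_disk_holder(slave_bdev->bd_part, holder_disk);
	}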

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/genhd.c                |  4 +++
 block/partition-generic.c    |  4 +++
 drivers/block/drbd/drbd_nl.c |  4 +--
 drivers/md/bcache/super.c    |  8 +++---
 drivers/md/dm.c              |  4 +--
 drivers/md/md.c              |  4 +--
 fs/block_dev.c               | 48 ++++++++++++++++--------------------
 include/linux/fs.h           | 11 +++------
 include/linux/genhd.h        |  4 +++
 9 files changed, 47 insertions(+), 44 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 0145bcb0cc76..7674fce32fca 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1468,6 +1468,10 @@ struct gendisk *__alloc_disk_node(int minors, int node_id)
 
 		disk->minors = minors;
 		rand_initialize_disk(disk);
+#ifdef CONFIG_SYSFS
+		INIT_LIST_HEAD(&disk->part0.holder_disks);
+		mutex_init(&disk->part0.holder_mutex);
+#endif
 		disk_to_dev(disk)->class = &block_class;
 		disk_to_dev(disk)->type = &disk_type;
 		device_initialize(disk_to_dev(disk));
diff --git a/block/partition-generic.c b/block/partition-generic.c
index 5f8db5c5140f..83edc23ff1e8 100644
--- a/block/partition-generic.c
+++ b/block/partition-generic.c
@@ -333,6 +333,10 @@ struct hd_struct *add_partition(struct gendisk *disk, int partno,
 	}
 
 	seqcount_init(&p->nr_sects_seq);
+#ifdef CONFIG_SYSFS
+	INIT_LIST_HEAD(&p->holder_disks);
+	mutex_init(&p->holder_mutex);
+#endif
 	pdev = part_to_dev(p);
 
 	p->start_sect = start;
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index d15703b1ffe8..42354e34b565 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1670,7 +1670,7 @@ static struct block_device *open_backing_dev(struct drbd_device *device,
 	if (!do_bd_link)
 		return bdev;
 
-	err = bd_link_disk_holder(bdev, device->vdisk);
+	err = bd_link_disk_holder(bdev->bd_part, device->vdisk);
 	if (err) {
 		blkdev_put(bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
 		drbd_err(device, "bd_link_disk_holder(\"%s\", ...) failed with %d\n",
@@ -1719,7 +1719,7 @@ static void close_backing_dev(struct drbd_device *device, struct block_device *b
 	if (!bdev)
 		return;
 	if (do_bd_unlink)
-		bd_unlink_disk_holder(bdev, device->vdisk);
+		bd_unlink_disk_holder(bdev->bd_part, device->vdisk);
 	blkdev_put(bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
 }
 
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 7bbd670a5a84..e49b177b1c60 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -677,7 +677,7 @@ static void bcache_device_unlink(struct bcache_device *d)
 		sysfs_remove_link(&d->kobj, "cache");
 
 		for_each_cache(ca, d->c, i)
-			bd_unlink_disk_holder(ca->bdev, d->disk);
+			bd_unlink_disk_holder(ca->bdev->bd_part, d->disk);
 	}
 }
 
@@ -688,7 +688,7 @@ static void bcache_device_link(struct bcache_device *d, struct cache_set *c,
 	struct cache *ca;
 
 	for_each_cache(ca, d->c, i)
-		bd_link_disk_holder(ca->bdev, d->disk);
+		bd_link_disk_holder(ca->bdev->bd_part, d->disk);
 
 	snprintf(d->name, BCACHEDEVNAME_SIZE,
 		 "%s%u", name, d->id);
@@ -936,7 +936,7 @@ void bch_cached_dev_run(struct cached_dev *dc)
 	}
 
 	add_disk(d->disk);
-	bd_link_disk_holder(dc->bdev, dc->disk.disk);
+	bd_link_disk_holder(dc->bdev->bd_part, dc->disk.disk);
 	/*
 	 * won't show up in the uevent file, use udevadm monitor -e instead
 	 * only class / kset properties are persistent
@@ -1199,7 +1199,7 @@ static void cached_dev_free(struct closure *cl)
 		kthread_stop(dc->status_update_thread);
 
 	if (atomic_read(&dc->running))
-		bd_unlink_disk_holder(dc->bdev, dc->disk.disk);
+		bd_unlink_disk_holder(dc->bdev->bd_part, dc->disk.disk);
 	bcache_device_free(&dc->disk);
 	list_del(&dc->list);
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index ab72d79775ee..823b592f066b 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -770,7 +770,7 @@ static int open_table_device(struct table_device *td, dev_t dev,
 	if (IS_ERR(bdev))
 		return PTR_ERR(bdev);
 
-	r = bd_link_disk_holder(bdev, dm_disk(md));
+	r = bd_link_disk_holder(bdev->bd_part, dm_disk(md));
 	if (r) {
 		blkdev_put(bdev, td->dm_dev.mode | FMODE_EXCL);
 		return r;
@@ -789,7 +789,7 @@ static void close_table_device(struct table_device *td, struct mapped_device *md
 	if (!td->dm_dev.bdev)
 		return;
 
-	bd_unlink_disk_holder(td->dm_dev.bdev, dm_disk(md));
+	bd_unlink_disk_holder(td->dm_dev.bdev->bd_part, dm_disk(md));
 	blkdev_put(td->dm_dev.bdev, td->dm_dev.mode | FMODE_EXCL);
 	put_dax(td->dm_dev.dax_dev);
 	td->dm_dev.bdev = NULL;
diff --git a/drivers/md/md.c b/drivers/md/md.c
index fc488cb30a94..53ee747ba44d 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2240,7 +2240,7 @@ static int bind_rdev_to_array(struct md_rdev *rdev, struct mddev *mddev)
 	rdev->sysfs_state = sysfs_get_dirent_safe(rdev->kobj.sd, "state");
 
 	list_add_rcu(&rdev->same_set, &mddev->disks);
-	bd_link_disk_holder(rdev->bdev, mddev->gendisk);
+	bd_link_disk_holder(rdev->bdev->bd_part, mddev->gendisk);
 
 	/* May as well allow recovery to be retried once */
 	mddev->recovery_disabled++;
@@ -2264,7 +2264,7 @@ static void unbind_rdev_from_array(struct md_rdev *rdev)
 {
 	char b[BDEVNAME_SIZE];
 
-	bd_unlink_disk_holder(rdev->bdev, rdev->mddev->gendisk);
+	bd_unlink_disk_holder(rdev->bdev->bd_part, rdev->mddev->gendisk);
 	list_del_rcu(&rdev->same_set);
 	pr_debug("md: unbind<%s>\n", bdevname(rdev->bdev,b));
 	rdev->mddev = NULL;
diff --git a/fs/block_dev.c b/fs/block_dev.c
index e1886cc7048f..06323d1bd4b5 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -780,9 +780,6 @@ static void init_once(void *foo)
 	memset(bdev, 0, sizeof(*bdev));
 	mutex_init(&bdev->bd_mutex);
 	INIT_LIST_HEAD(&bdev->bd_list);
-#ifdef CONFIG_SYSFS
-	INIT_LIST_HEAD(&bdev->bd_holder_disks);
-#endif
 	bdev->bd_bdi = &noop_backing_dev_info;
 	inode_init_once(&ei->vfs_inode);
 	/* Initialize mutex for freeze. */
@@ -1187,12 +1184,12 @@ struct bd_holder_disk {
 	int			refcnt;
 };
 
-static struct bd_holder_disk *bd_find_holder_disk(struct block_device *bdev,
+static struct bd_holder_disk *bd_find_holder_disk(struct hd_struct *part,
 						  struct gendisk *disk)
 {
 	struct bd_holder_disk *holder;
 
-	list_for_each_entry(holder, &bdev->bd_holder_disks, list)
+	list_for_each_entry(holder, &part->holder_disks, list)
 		if (holder->disk == disk)
 			return holder;
 	return NULL;
@@ -1210,7 +1207,7 @@ static void del_symlink(struct kobject *from, struct kobject *to)
 
 /**
  * bd_link_disk_holder - create symlinks between holding disk and slave bdev
- * @bdev: the claimed slave bdev
+ * @part: the claimed slave hd_struct
  * @disk: the holding disk
  *
  * DON'T USE THIS UNLESS YOU'RE ALREADY USING IT.
@@ -1236,20 +1233,18 @@ static void del_symlink(struct kobject *from, struct kobject *to)
  * RETURNS:
  * 0 on success, -errno on failure.
  */
-int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk)
+int bd_link_disk_holder(struct hd_struct *part, struct gendisk *disk)
 {
 	struct bd_holder_disk *holder;
 	int ret = 0;
 
-	mutex_lock(&bdev->bd_mutex);
-
-	WARN_ON_ONCE(!bdev->bd_holder);
+	mutex_lock(&part->holder_mutex);
 
 	/* FIXME: remove the following once add_disk() handles errors */
-	if (WARN_ON(!disk->slave_dir || !bdev->bd_part->holder_dir))
+	if (WARN_ON(!disk->slave_dir || !part->holder_dir))
 		goto out_unlock;
 
-	holder = bd_find_holder_disk(bdev, disk);
+	holder = bd_find_holder_disk(part, disk);
 	if (holder) {
 		holder->refcnt++;
 		goto out_unlock;
@@ -1265,28 +1260,28 @@ int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk)
 	holder->disk = disk;
 	holder->refcnt = 1;
 
-	ret = add_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj);
+	ret = add_symlink(disk->slave_dir, &part_to_dev(part)->kobj);
 	if (ret)
 		goto out_free;
 
-	ret = add_symlink(bdev->bd_part->holder_dir, &disk_to_dev(disk)->kobj);
+	ret = add_symlink(part->holder_dir, &disk_to_dev(disk)->kobj);
 	if (ret)
 		goto out_del;
 	/*
-	 * bdev could be deleted beneath us which would implicitly destroy
+	 * part could be deleted beneath us which would implicitly destroy
 	 * the holder directory.  Hold on to it.
 	 */
-	kobject_get(bdev->bd_part->holder_dir);
+	kobject_get(part->holder_dir);
 
-	list_add(&holder->list, &bdev->bd_holder_disks);
+	list_add(&holder->list, &part->holder_disks);
 	goto out_unlock;
 
 out_del:
-	del_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj);
+	del_symlink(disk->slave_dir, &part_to_dev(part)->kobj);
 out_free:
 	kfree(holder);
 out_unlock:
-	mutex_unlock(&bdev->bd_mutex);
+	mutex_unlock(&part->holder_mutex);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(bd_link_disk_holder);
@@ -1301,24 +1296,23 @@ EXPORT_SYMBOL_GPL(bd_link_disk_holder);
  * CONTEXT:
  * Might sleep.
  */
-void bd_unlink_disk_holder(struct block_device *bdev, struct gendisk *disk)
+void bd_unlink_disk_holder(struct hd_struct *part, struct gendisk *disk)
 {
 	struct bd_holder_disk *holder;
 
-	mutex_lock(&bdev->bd_mutex);
+	mutex_lock(&part->holder_mutex);
 
-	holder = bd_find_holder_disk(bdev, disk);
+	holder = bd_find_holder_disk(part, disk);
 
 	if (!WARN_ON_ONCE(holder == NULL) && !--holder->refcnt) {
-		del_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj);
-		del_symlink(bdev->bd_part->holder_dir,
-			    &disk_to_dev(disk)->kobj);
-		kobject_put(bdev->bd_part->holder_dir);
+		del_symlink(disk->slave_dir, &part_to_dev(part)->kobj);
+		del_symlink(part->holder_dir, &disk_to_dev(disk)->kobj);
+		kobject_put(part->holder_dir);
 		list_del_init(&holder->list);
 		kfree(holder);
 	}
 
-	mutex_unlock(&bdev->bd_mutex);
+	mutex_unlock(&part->holder_mutex);
 }
 EXPORT_SYMBOL_GPL(bd_unlink_disk_holder);
 #endif
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a1ab233e6469..324261460e3e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -455,9 +455,6 @@ struct block_device {
 	void *			bd_holder;
 	int			bd_holders;
 	bool			bd_write_holder;
-#ifdef CONFIG_SYSFS
-	struct list_head	bd_holder_disks;
-#endif
 	struct block_device *	bd_contains;
 	unsigned		bd_block_size;
 	u8			bd_partno;
@@ -2596,16 +2593,16 @@ extern int __blkdev_reread_part(struct block_device *bdev);
 extern int blkdev_reread_part(struct block_device *bdev);
 
 #ifdef CONFIG_SYSFS
-extern int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk);
-extern void bd_unlink_disk_holder(struct block_device *bdev,
+extern int bd_link_disk_holder(struct hd_struct *part, struct gendisk *disk);
+extern void bd_unlink_disk_holder(struct hd_struct *part,
 				  struct gendisk *disk);
 #else
-static inline int bd_link_disk_holder(struct block_device *bdev,
+static inline int bd_link_disk_holder(struct hd_struct *part,
 				      struct gendisk *disk)
 {
 	return 0;
 }
-static inline void bd_unlink_disk_holder(struct block_device *bdev,
+static inline void bd_unlink_disk_holder(struct hd_struct *part,
 					 struct gendisk *disk)
 {
 }
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 0c5ee17b4d88..e516e95bb8cf 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -130,6 +130,10 @@ struct hd_struct {
 #endif
 	struct percpu_ref ref;
 	struct rcu_work rcu_work;
+#ifdef CONFIG_SYSFS
+	struct list_head	holder_disks;
+	struct mutex		holder_mutex;
+#endif
 };
 
 #define GENHD_FL_REMOVABLE			1
-- 
2.19.1


* [PATCH 2/4] nvme: create slaves/holder entries for multipath devices
From: Thadeu Lima de Souza Cascardo @ 2018-12-06 16:48 UTC (permalink / raw)
  To: linux-nvme; +Cc: linux-block, Christoph Hellwig, Jens Axboe

From: Christoph Hellwig <hch@lst.de>

This allows tools like distro installers to easily track the relationship.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[cascardo: only add disk when there is ns->head]
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
---
 drivers/nvme/host/core.c      |  5 +++--
 drivers/nvme/host/multipath.c | 13 +++++++++++--
 drivers/nvme/host/nvme.h      | 12 ++++++++----
 3 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 3ce7fc9df378..1dc29795f1ee 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -371,7 +371,7 @@ static void nvme_free_ns_head(struct kref *ref)
 	struct nvme_ns_head *head =
 		container_of(ref, struct nvme_ns_head, ref);
 
-	nvme_mpath_remove_disk(head);
+	nvme_mpath_remove_nshead(head);
 	ida_simple_remove(&head->subsys->ns_ida, head->instance);
 	list_del_init(&head->entry);
 	cleanup_srcu_struct_quiesced(&head->srcu);
@@ -3120,7 +3120,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 
 	device_add_disk(ctrl->device, ns->disk, nvme_ns_id_attr_groups);
 
-	nvme_mpath_add_disk(ns, id);
+	nvme_mpath_add_ns(ns, id);
 	nvme_fault_inject_init(ns);
 	kfree(id);
 
@@ -3144,6 +3144,7 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 
 	nvme_fault_inject_fini(ns);
 	if (ns->disk && ns->disk->flags & GENHD_FL_UP) {
+		nvme_mpath_remove_ns(ns);
 		del_gendisk(ns->disk);
 		blk_cleanup_queue(ns->queue);
 		if (blk_get_integrity(ns->disk))
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index ec310b1b9267..174c64643592 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -500,7 +500,7 @@ static int nvme_set_ns_ana_state(struct nvme_ctrl *ctrl,
 	return 0;
 }
 
-void nvme_mpath_add_disk(struct nvme_ns *ns, struct nvme_id_ns *id)
+void nvme_mpath_add_ns(struct nvme_ns *ns, struct nvme_id_ns *id)
 {
 	if (nvme_ctrl_use_ana(ns->ctrl)) {
 		mutex_lock(&ns->ctrl->ana_lock);
@@ -513,9 +513,18 @@ void nvme_mpath_add_disk(struct nvme_ns *ns, struct nvme_id_ns *id)
 		nvme_mpath_set_live(ns);
 		mutex_unlock(&ns->head->lock);
 	}
+
+	if (ns->head->disk)
+		bd_link_disk_holder(&ns->disk->part0, ns->head->disk);
+}
+
+void nvme_mpath_remove_ns(struct nvme_ns *ns)
+{
+	if (ns->head->disk)
+		bd_unlink_disk_holder(&ns->disk->part0, ns->head->disk);
 }
 
-void nvme_mpath_remove_disk(struct nvme_ns_head *head)
+void nvme_mpath_remove_nshead(struct nvme_ns_head *head)
 {
 	if (!head->disk)
 		return;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index ae77eb16fd1f..365262d11e53 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -470,8 +470,9 @@ void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
 void nvme_failover_req(struct request *req);
 void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl);
 int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl,struct nvme_ns_head *head);
-void nvme_mpath_add_disk(struct nvme_ns *ns, struct nvme_id_ns *id);
-void nvme_mpath_remove_disk(struct nvme_ns_head *head);
+void nvme_mpath_add_ns(struct nvme_ns *ns, struct nvme_id_ns *id);
+void nvme_mpath_remove_ns(struct nvme_ns *ns);
+void nvme_mpath_remove_nshead(struct nvme_ns_head *head);
 int nvme_mpath_init(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id);
 void nvme_mpath_uninit(struct nvme_ctrl *ctrl);
 void nvme_mpath_stop(struct nvme_ctrl *ctrl);
@@ -515,11 +516,14 @@ static inline int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl,
 {
 	return 0;
 }
-static inline void nvme_mpath_add_disk(struct nvme_ns *ns,
+static inline void nvme_mpath_add_ns(struct nvme_ns *ns,
 		struct nvme_id_ns *id)
 {
 }
-static inline void nvme_mpath_remove_disk(struct nvme_ns_head *head)
+static inline void nvme_mpath_remove_ns(struct nvme_ns *ns)
+{
+}
+static inline void nvme_mpath_remove_nshead(struct nvme_ns_head *head)
 {
 }
 static inline void nvme_mpath_clear_current_path(struct nvme_ns *ns)
-- 
2.19.1


* [PATCH 3/4] nvme: Should not warn when a disk path is opened
From: Thadeu Lima de Souza Cascardo @ 2018-12-06 16:48 UTC (permalink / raw)
  To: linux-nvme; +Cc: linux-block, Christoph Hellwig, Jens Axboe

As we hide the disks used for nvme multipath, their open functions may
still be called, and should simply fail, since the devices are not meant
to be exposed.

However, not exposing the devices causes problems for lsblk, which won't
find the devices and show them; that is even more of a problem when the
holders/slaves relationships are exposed.

Not warning here allows us to expose the disks without creating spurious
warnings. A failure is still returned, as before, which still allows
lsblk to work.
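
In userspace terms, once the device nodes exist (see the last patch in this
series), opening a path device should fail cleanly instead of triggering a
WARN; a hypothetical illustration:

	/* device name hypothetical; the open fails without a kernel splat */
	int fd = open("/dev/nvme0c0n1", O_RDONLY);
	/* fd == -1 here, errno reflecting nvme_open's failure path */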

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
---
 drivers/nvme/host/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 1dc29795f1ee..22424b2adfad 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1327,8 +1327,8 @@ static int nvme_open(struct block_device *bdev, fmode_t mode)
 	struct nvme_ns *ns = bdev->bd_disk->private_data;
 
 #ifdef CONFIG_NVME_MULTIPATH
-	/* should never be called due to GENHD_FL_HIDDEN */
-	if (WARN_ON_ONCE(ns->head->disk))
+	/* Should fail as it's hidden */
+	if (ns->head->disk)
 		goto fail;
 #endif
 	if (!kref_get_unless_zero(&ns->kref))
-- 
2.19.1


* [PATCH 4/4] block: expose devt for GENHD_FL_HIDDEN disks
From: Thadeu Lima de Souza Cascardo @ 2018-12-06 16:48 UTC (permalink / raw)
  To: linux-nvme; +Cc: linux-block, Christoph Hellwig, Jens Axboe

Without this exposure, lsblk will fail as it tries to find out the
device's dev_t numbers. This causes a real problem for nvme multipath
devices, as their slaves are hidden.

Exposing them fixes the problem, even though trying to open the devices
returns an error in the case of nvme multipath. So, right now, it's the
driver's responsibility to return a failure to open hidden devices.
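
For illustration (device name and numbers hypothetical), exposing the devt
makes the dev attribute appear in sysfs, which is what lsblk needs to find
the device numbers:

	$ cat /sys/block/nvme0c0n1/dev
	259:1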

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
---
 block/genhd.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 7674fce32fca..65a7fa664188 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -687,6 +687,7 @@ static void __device_add_disk(struct device *parent, struct gendisk *disk,
 
 	disk_alloc_events(disk);
 
+	disk_to_dev(disk)->devt = devt;
 	if (disk->flags & GENHD_FL_HIDDEN) {
 		/*
 		 * Don't let hidden disks show up in /proc/partitions,
@@ -698,13 +699,12 @@ static void __device_add_disk(struct device *parent, struct gendisk *disk,
 		int ret;
 
 		/* Register BDI before referencing it from bdev */
-		disk_to_dev(disk)->devt = devt;
 		ret = bdi_register_owner(disk->queue->backing_dev_info,
 						disk_to_dev(disk));
 		WARN_ON(ret);
-		blk_register_region(disk_devt(disk), disk->minors, NULL,
-				    exact_match, exact_lock, disk);
 	}
+	blk_register_region(disk_devt(disk), disk->minors, NULL,
+			    exact_match, exact_lock, disk);
 	register_disk(parent, disk, groups);
 	if (register_queue)
 		blk_register_queue(disk);
@@ -776,8 +776,7 @@ void del_gendisk(struct gendisk *disk)
 		WARN_ON(1);
 	}
 
-	if (!(disk->flags & GENHD_FL_HIDDEN))
-		blk_unregister_region(disk_devt(disk), disk->minors);
+	blk_unregister_region(disk_devt(disk), disk->minors);
 
 	kobject_put(disk->part0.holder_dir);
 	kobject_put(disk->slave_dir);
-- 
2.19.1


* Re: [PATCH 4/4] block: expose devt for GENHD_FL_HIDDEN disks
From: Christoph Hellwig @ 2018-12-06 20:22 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo
  Cc: linux-nvme, linux-block, Christoph Hellwig, Jens Axboe

On Thu, Dec 06, 2018 at 02:48:12PM -0200, Thadeu Lima de Souza Cascardo wrote:
> Without this exposure, lsblk will fail as it tries to find out the
> device's dev_t numbers. This causes a real problem for nvme multipath
> devices, as their slaves are hidden.
> 
> Exposing them fixes the problem, even though trying to open the devices
> returns an error in the case of nvme multipath. So, right now, it's the
> driver's responsibility to return a failure to open hidden devices.

So the problem with this is that it will cause udev to actually create
the /dev/nvmeXcYnZ nodes, which due to the previous patch will always
fail to open, which is a bit confusing.  I guess we could live with this
if we add udev rules to suppress the creation or something, but in general
it is a bit ugly.
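
Something along these lines might do it (a sketch only, assuming a 'hidden'
sysfs attribute is visible to udev; whether this is enough to fully suppress
the nodes depends on the udev setup):

	# hypothetical rule: skip further processing of hidden block devices
	SUBSYSTEM=="block", ATTR{hidden}=="1", GOTO="skip_hidden"
	# ... remaining block device rules ...
	LABEL="skip_hidden"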

* Re: [PATCH 4/4] block: expose devt for GENHD_FL_HIDDEN disks
From: Christoph Hellwig @ 2018-12-12  8:32 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo
  Cc: linux-nvme, linux-block, Christoph Hellwig, Jens Axboe

On Thu, Dec 06, 2018 at 09:22:00PM +0100, Christoph Hellwig wrote:
> On Thu, Dec 06, 2018 at 02:48:12PM -0200, Thadeu Lima de Souza Cascardo wrote:
> > Without this exposure, lsblk will fail as it tries to find out the
> > device's dev_t numbers. This causes a real problem for nvme multipath
> > devices, as their slaves are hidden.
> > 
> > Exposing them fixes the problem, even though trying to open the devices
> > returns an error in the case of nvme multipath. So, right now, it's the
> > driver's responsibility to return a failure to open hidden devices.
> 
> So the problem with this is that it will cause udev to actually create
> the /dev/nvmeXcYnZ nodes, which due to the previous patch will always
> fail to open, which is a bit confusing.  I guess we could live with this
> if we add udev rules to suppress the creation or something, but in general
> it is a bit ugly.

Thadeu (or other distro folks), any comments on this issue?  If not
I'd like to just move forward with the first two patches for now,
and it would be good to have some reviews for them.

* Re: [PATCH 4/4] block: expose devt for GENHD_FL_HIDDEN disks
From: Thadeu Lima de Souza Cascardo @ 2018-12-12 12:39 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, linux-block, Jens Axboe

On Thu, Dec 06, 2018 at 09:22:00PM +0100, Christoph Hellwig wrote:
> On Thu, Dec 06, 2018 at 02:48:12PM -0200, Thadeu Lima de Souza Cascardo wrote:
> > Without this exposure, lsblk will fail as it tries to find out the
> > device's dev_t numbers. This causes a real problem for nvme multipath
> > devices, as their slaves are hidden.
> > 
> > Exposing them fixes the problem, even though trying to open the devices
> > returns an error in the case of nvme multipath. So, right now, it's the
> > driver's responsibility to return a failure to open hidden devices.
> 
> So the problem with this is that it will cause udev to actually create
> the /dev/nvmeXcYnZ nodes, which due to the previous patch will always
> fail to open, which is a bit confusing.  I guess we could live with this
> if we add udev rules to suppress the creation or something, but in general
> it is a bit ugly.

Well, udev could just look at the hidden attribute. At least to some, the fact
that lsblk would fail was reason enough to revert the patches, so I would
rather apply the entire set.

We could add a message to the log when users try to open them, but given that
some programs (udev is one) would open them without any user interaction, maybe
we shouldn't.

Cascardo.

* Re: [PATCH 1/4] block: move holder tracking from struct block_device to hd_struct
From: Hannes Reinecke @ 2018-12-13  9:14 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo, linux-nvme
  Cc: linux-block, Christoph Hellwig, Jens Axboe

On 12/6/18 5:48 PM, Thadeu Lima de Souza Cascardo wrote:
> From: Christoph Hellwig <hch@lst.de>
> 
> We'd like to track the slaves and holders for nvme multipath devices
> in the same standard fashion as all the other stacked block devices,
> to make life easy for things like distro installers.
> 
> But struct block_device only exists while we have open instances,
> which we never have for the underlying devices of an nvme-multipath
> setup.  We can, however, easily move the holder list into struct
> hd_struct, which exists for as long as the block device does; the only
> interesting bit is that we need a new mutex for it.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/genhd.c                |  4 +++
>   block/partition-generic.c    |  4 +++
>   drivers/block/drbd/drbd_nl.c |  4 +--
>   drivers/md/bcache/super.c    |  8 +++---
>   drivers/md/dm.c              |  4 +--
>   drivers/md/md.c              |  4 +--
>   fs/block_dev.c               | 48 ++++++++++++++++--------------------
>   include/linux/fs.h           | 11 +++------
>   include/linux/genhd.h        |  4 +++
>   9 files changed, 47 insertions(+), 44 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

* Re: [PATCH 2/4] nvme: create slaves/holder entries for multipath devices
From: Hannes Reinecke @ 2018-12-13  9:15 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo, linux-nvme
  Cc: linux-block, Christoph Hellwig, Jens Axboe

On 12/6/18 5:48 PM, Thadeu Lima de Souza Cascardo wrote:
> From: Christoph Hellwig <hch@lst.de>
> 
> This allows tools like distro installers to easily track the relationship.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> [cascardo: only add disk when there is ns->head]
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
> ---
>   drivers/nvme/host/core.c      |  5 +++--
>   drivers/nvme/host/multipath.c | 13 +++++++++++--
>   drivers/nvme/host/nvme.h      | 12 ++++++++----
>   3 files changed, 22 insertions(+), 8 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 3/4] nvme: Should not warn when a disk path is opened
  2018-12-06 16:48   ` Thadeu Lima de Souza Cascardo
@ 2018-12-13  9:16     ` Hannes Reinecke
  -1 siblings, 0 replies; 56+ messages in thread
From: Hannes Reinecke @ 2018-12-13  9:16 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo, linux-nvme
  Cc: linux-block, Christoph Hellwig, Jens Axboe

On 12/6/18 5:48 PM, Thadeu Lima de Souza Cascardo wrote:
> As we hide the disks used for nvme multipath, it was assumed that their
> open functions would never be called, as their devices are not exposed.
> 
> However, not exposing the devices causes problems with lsblk, which won't
> find the devices and show them, which is even more of a problem when
> holders/slaves relationships are exposed.
> 
> Not warning here allows us to expose the disks without creating spurious
> warnings. A failure should still be returned, which is the case here,
> and still allows lsblk to work.
> 
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
> ---
>   drivers/nvme/host/core.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 1dc29795f1ee..22424b2adfad 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1327,8 +1327,8 @@ static int nvme_open(struct block_device *bdev, fmode_t mode)
>   	struct nvme_ns *ns = bdev->bd_disk->private_data;
>   
>   #ifdef CONFIG_NVME_MULTIPATH
> -	/* should never be called due to GENHD_FL_HIDDEN */
> -	if (WARN_ON_ONCE(ns->head->disk))
> +	/* Should fail as it's hidden */
> +	if (ns->head->disk)
>   		goto fail;
>   #endif
>   	if (!kref_get_unless_zero(&ns->kref))
> 
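For context, the resulting open path looks roughly like the sketch below; the
surrounding lines are reconstructed from 4.20-era drivers/nvme/host/core.c and
should be treated as an approximation, not a verbatim copy:

static int nvme_open(struct block_device *bdev, fmode_t mode)
{
	struct nvme_ns *ns = bdev->bd_disk->private_data;

#ifdef CONFIG_NVME_MULTIPATH
	/* a hidden path device: opening it must fail, but quietly */
	if (ns->head->disk)
		goto fail;
#endif
	if (!kref_get_unless_zero(&ns->kref))
		goto fail;
	if (!try_module_get(ns->ctrl->ops->module))
		goto fail_put_ns;
	return 0;

fail_put_ns:
	nvme_put_ns(ns);
fail:
	return -ENXIO;
}

The effect is that userspace can attempt the open without tripping a kernel
warning; the open still fails, which is what lets lsblk proceed.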
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/4] block: expose devt for GENHD_FL_HIDDEN disks
  2018-12-06 16:48   ` Thadeu Lima de Souza Cascardo
@ 2018-12-13  9:18     ` Hannes Reinecke
  -1 siblings, 0 replies; 56+ messages in thread
From: Hannes Reinecke @ 2018-12-13  9:18 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo, linux-nvme
  Cc: linux-block, Christoph Hellwig, Jens Axboe

On 12/6/18 5:48 PM, Thadeu Lima de Souza Cascardo wrote:
> Without this exposure, lsblk will fail as it tries to find out the
> device's dev_t numbers. This causes a real problem for nvme multipath
> devices, as their slaves are hidden.
> 
> Exposing them fixes the problem, even though trying to open the devices
> returns an error in the case of nvme multipath. So, right now, it's the
> driver's responsibility to return a failure to open hidden devices.
> 
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
> ---
>   block/genhd.c | 9 ++++-----
>   1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/block/genhd.c b/block/genhd.c
> index 7674fce32fca..65a7fa664188 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -687,6 +687,7 @@ static void __device_add_disk(struct device *parent, struct gendisk *disk,
>   
>   	disk_alloc_events(disk);
>   
> +	disk_to_dev(disk)->devt = devt;
>   	if (disk->flags & GENHD_FL_HIDDEN) {
>   		/*
>   		 * Don't let hidden disks show up in /proc/partitions,
> @@ -698,13 +699,12 @@ static void __device_add_disk(struct device *parent, struct gendisk *disk,
>   		int ret;
>   
>   		/* Register BDI before referencing it from bdev */
> -		disk_to_dev(disk)->devt = devt;
>   		ret = bdi_register_owner(disk->queue->backing_dev_info,
>   						disk_to_dev(disk));
>   		WARN_ON(ret);
> -		blk_register_region(disk_devt(disk), disk->minors, NULL,
> -				    exact_match, exact_lock, disk);
>   	}
> +	blk_register_region(disk_devt(disk), disk->minors, NULL,
> +			    exact_match, exact_lock, disk);
>   	register_disk(parent, disk, groups);
>   	if (register_queue)
>   		blk_register_queue(disk);
> @@ -776,8 +776,7 @@ void del_gendisk(struct gendisk *disk)
>   		WARN_ON(1);
>   	}
>   
> -	if (!(disk->flags & GENHD_FL_HIDDEN))
> -		blk_unregister_region(disk_devt(disk), disk->minors);
> +	blk_unregister_region(disk_devt(disk), disk->minors);
>   
>   	kobject_put(disk->part0.holder_dir);
>   	kobject_put(disk->slave_dir);
> 
Well ... this is not just 'lsblk', but more importantly this will force 
udev to create _block_ device nodes for the hidden devices, essentially 
'unhide' them.

Is this what we want?
Christoph?
I thought the entire _point_ of having hidden devices is that they are 
... well ... hidden ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] nvme multipath: expose slaves/holders
  2018-12-06 16:48 ` Thadeu Lima de Souza Cascardo
@ 2018-12-13  9:33   ` Johannes Thumshirn
  -1 siblings, 0 replies; 56+ messages in thread
From: Johannes Thumshirn @ 2018-12-13  9:33 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo, linux-nvme
  Cc: linux-block, Jens Axboe, Christoph Hellwig

On 06/12/2018 17:48, Thadeu Lima de Souza Cascardo wrote:
> Exposing slaves/holders is necessary in order to find out the real PCI device
> and its driver for the root filesystem when generating an initramfs with
> initramfs-tools. That fails right now for nvme multipath devices, which this
> patchset fixes.
> 
> However, because the slave devices are hidden, lsblk fails without some extra
> patches, as it can't find the device numbers for the slave devices, and exits.

Sorry for chiming in so late, but can someone give me an explanation of
what is actually broken?

I know this is technically a user visible regression, but isn't lsblk
(and others) already fixed? I believe it's [1] in util-linux.

And to find out the PCI device, why do you need slaves/holders here?
Have a look at blktest's _get_pci_dev_from_blkdev() function [2].

[1] d51f05bfecb2 ("lsblk: try device/dev to read devno")
[2] https://github.com/osandov/blktests/blob/master/common/rc#L185
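For readers following along, the sysfs walk such helpers rely on can be
sketched in userspace C; a minimal sketch, assuming a controller whose
/sys/block/<name>/device link resolves under /sys/devices/pci*:

/* sketch: resolve what /sys/block/<dev>/device points at, which is how
 * tools locate the parent (e.g. PCI) device of a disk */
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	char link[PATH_MAX], resolved[PATH_MAX];

	if (argc != 2) {
		fprintf(stderr, "usage: %s <block-device-name>\n", argv[0]);
		return 1;
	}
	snprintf(link, sizeof(link), "/sys/block/%s/device", argv[1]);
	if (!realpath(link, resolved)) {
		perror("realpath");
		return 1;
	}
	/* for an nvme multipath head this resolves under
	 * /sys/devices/virtual/nvme-subsystem/, not under a PCI device,
	 * which is the failure mode described in the reply below */
	printf("%s\n", resolved);
	return 0;
}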

Byte,
	Johannes
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] nvme multipath: expose slaves/holders
  2018-12-13  9:33   ` Johannes Thumshirn
@ 2018-12-13 11:35     ` Thadeu Lima de Souza Cascardo
  -1 siblings, 0 replies; 56+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2018-12-13 11:35 UTC (permalink / raw)
  To: Johannes Thumshirn; +Cc: linux-nvme, linux-block, Jens Axboe, Christoph Hellwig

On Thu, Dec 13, 2018 at 10:33:13AM +0100, Johannes Thumshirn wrote:
> On 06/12/2018 17:48, Thadeu Lima de Souza Cascardo wrote:
> > Exposing slaves/holders is necessary in order to find out the real PCI device
> > and its driver for the root filesystem when generating an initramfs with
> > initramfs-tools. That fails right now for nvme multipath devices, which this
> > patchset fixes.
> > 
> > However, because the slave devices are hidden, lsblk fails without some extra
> > patches, as it can't find the device numbers for the slave devices, and exits.
> 
> Sorry for chiming in so late, but can someone give me an explanation of
> what is actually broken?
> 
> I know this is technically a user visible regression, but isn't lsblk
> (and others) already fixed? I believe it's [1] in util-linux.
> 
> And to find out the PCI device, why do you need slaves/holders here?
> Have a look at blktest's _get_pci_dev_from_blkdev() function [2].
> 
> [1] d51f05bfecb2 ("lsblk: try device/dev to read devno")
> [2] https://github.com/osandov/blktests/blob/master/common/rc#L185

Those two work fine for the block device associated with a controller. When
multipath is used, however, there is a virtual block device that balances IO
across the real paths.

# readlink -f /sys/block/nvme0n1/device
/sys/devices/virtual/nvme-subsystem/nvme-subsys0
#

And we can't just give it a parent, because it may have multiple parents: it
might have multiple slaves. That is something the slaves/holders links can
represent, and they have been in use for a long time (dm devices, for
example, expose /sys/block/dm-0/slaves/).

Now, the issue with lsblk is that the hidden block devices won't have a dev.
And now that I think of it, when you are reading device/dev instead of dev for
the nvme block device, you are, in fact, looking at the dev for the character
device, which seems wrong.

You are looking at nvme0, which is the character device that works as a
controlling interface. So, there is no reason lsblk should use that as the dev
number for the block device.
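To make the distinction concrete, a minimal sketch that prints both
attributes; the device names are taken from the example above:

/* sketch: /sys/block/nvme0n1/dev is the block device's MAJ:MIN, while
 * /sys/block/nvme0n1/device/dev is the MAJ:MIN of the nvme0 character
 * device; treating the latter as a block devno is the confusion
 * described above */
#include <stdio.h>

static void print_devno(const char *path)
{
	char buf[32];
	FILE *f = fopen(path, "r");

	if (f && fgets(buf, sizeof(buf), f))
		printf("%s -> %s", path, buf);
	if (f)
		fclose(f);
}

int main(void)
{
	print_devno("/sys/block/nvme0n1/dev");
	print_devno("/sys/block/nvme0n1/device/dev");
	return 0;
}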

Cascardo.

> 
> Byte,
> 	Johannes
> -- 
> Johannes Thumshirn                            SUSE Labs Filesystems
> jthumshirn@suse.de                                +49 911 74053 689
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/4] block: expose devt for GENHD_FL_HIDDEN disks
  2018-12-13  9:18     ` Hannes Reinecke
@ 2018-12-13 11:41       ` Thadeu Lima de Souza Cascardo
  -1 siblings, 0 replies; 56+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2018-12-13 11:41 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: linux-nvme, linux-block, Christoph Hellwig, Jens Axboe

On Thu, Dec 13, 2018 at 10:18:40AM +0100, Hannes Reinecke wrote:
> On 12/6/18 5:48 PM, Thadeu Lima de Souza Cascardo wrote:
> > Without this exposure, lsblk will fail as it tries to find out the
> > device's dev_t numbers. This causes a real problem for nvme multipath
> > devices, as their slaves are hidden.
> > 
> > Exposing them fixes the problem, even though trying to open the devices
> > returns an error in the case of nvme multipath. So, right now, it's the
> > driver's responsibility to return a failure to open hidden devices.
> > 
> > Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
> > ---
> >   block/genhd.c | 9 ++++-----
> >   1 file changed, 4 insertions(+), 5 deletions(-)
> > 
> > diff --git a/block/genhd.c b/block/genhd.c
> > index 7674fce32fca..65a7fa664188 100644
> > --- a/block/genhd.c
> > +++ b/block/genhd.c
> > @@ -687,6 +687,7 @@ static void __device_add_disk(struct device *parent, struct gendisk *disk,
> >   	disk_alloc_events(disk);
> > +	disk_to_dev(disk)->devt = devt;
> >   	if (disk->flags & GENHD_FL_HIDDEN) {
> >   		/*
> >   		 * Don't let hidden disks show up in /proc/partitions,
> > @@ -698,13 +699,12 @@ static void __device_add_disk(struct device *parent, struct gendisk *disk,
> >   		int ret;
> >   		/* Register BDI before referencing it from bdev */
> > -		disk_to_dev(disk)->devt = devt;
> >   		ret = bdi_register_owner(disk->queue->backing_dev_info,
> >   						disk_to_dev(disk));
> >   		WARN_ON(ret);
> > -		blk_register_region(disk_devt(disk), disk->minors, NULL,
> > -				    exact_match, exact_lock, disk);
> >   	}
> > +	blk_register_region(disk_devt(disk), disk->minors, NULL,
> > +			    exact_match, exact_lock, disk);
> >   	register_disk(parent, disk, groups);
> >   	if (register_queue)
> >   		blk_register_queue(disk);
> > @@ -776,8 +776,7 @@ void del_gendisk(struct gendisk *disk)
> >   		WARN_ON(1);
> >   	}
> > -	if (!(disk->flags & GENHD_FL_HIDDEN))
> > -		blk_unregister_region(disk_devt(disk), disk->minors);
> > +	blk_unregister_region(disk_devt(disk), disk->minors);
> >   	kobject_put(disk->part0.holder_dir);
> >   	kobject_put(disk->slave_dir);
> > 
> Well ... this is not just 'lsblk', but more importantly this will force
> udev to create _block_ device nodes for the hidden devices, essentially
> 'unhide' them.
> 
> Is this what we want?
> Christoph?
> I thought the entire _point_ of having hidden devices is that they are ...
> well ... hidden ...
> 

I knew this would be the most controversial patch. And I had other solutions as
well, but preferred this one. So, the other alternative would be just not use
GENHD_FL_HIDDEN for the nvme devices, which would leave that flag without a
single user in the kernel. That would still fix the two problems
(initramfs-tools and lsblk), and not create any other problem I know of. That
patch would still fail to open the underlying devices when there is a
head/multipath associated with it.

So, the only thing GENHD_FL_HIDDEN gives us is saving some resources, like bdi,
for example. And we could also use it to fail open when blkdev_get* is called.
Of course, that would still imply that its name should be changed, but as we
already have an attribute named after that, I find it hard to suggest such a
change.
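A minimal sketch of that alternative, failing opens centrally in the block
layer rather than in each driver; this is hypothetical and not part of the
patchset:

/* hypothetical: let the block layer reject opens of hidden disks, so
 * drivers such as nvme would not need their own open-time check */
static bool disk_may_be_opened(struct gendisk *disk)
{
	return !(disk->flags & GENHD_FL_HIDDEN);
}

blkdev_get() could then return -ENXIO whenever disk_may_be_opened() returns
false, keeping GENHD_FL_HIDDEN meaningful even with the devt exposed.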

Cascardo.

> Cheers,
> 
> Hannes
> -- 
> Dr. Hannes Reinecke		   Teamlead Storage & Networking
> hare@suse.de			               +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
> HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/4] block: expose devt for GENHD_FL_HIDDEN disks
  2018-12-13 11:41       ` Thadeu Lima de Souza Cascardo
@ 2018-12-13 12:19         ` Hannes Reinecke
  -1 siblings, 0 replies; 56+ messages in thread
From: Hannes Reinecke @ 2018-12-13 12:19 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo
  Cc: linux-nvme, linux-block, Christoph Hellwig, Jens Axboe

On 12/13/18 12:41 PM, Thadeu Lima de Souza Cascardo wrote:
> On Thu, Dec 13, 2018 at 10:18:40AM +0100, Hannes Reinecke wrote:
>> On 12/6/18 5:48 PM, Thadeu Lima de Souza Cascardo wrote:
>>> Without this exposure, lsblk will fail as it tries to find out the
>>> device's dev_t numbers. This causes a real problem for nvme multipath
>>> devices, as their slaves are hidden.
>>>
>>> Exposing them fixes the problem, even though trying to open the devices
>>> returns an error in the case of nvme multipath. So, right now, it's the
>>> driver's responsibility to return a failure to open hidden devices.
>>>
>>> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
>>> ---
>>>    block/genhd.c | 9 ++++-----
>>>    1 file changed, 4 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/block/genhd.c b/block/genhd.c
>>> index 7674fce32fca..65a7fa664188 100644
>>> --- a/block/genhd.c
>>> +++ b/block/genhd.c
>>> @@ -687,6 +687,7 @@ static void __device_add_disk(struct device *parent, struct gendisk *disk,
>>>    	disk_alloc_events(disk);
>>> +	disk_to_dev(disk)->devt = devt;
>>>    	if (disk->flags & GENHD_FL_HIDDEN) {
>>>    		/*
>>>    		 * Don't let hidden disks show up in /proc/partitions,
>>> @@ -698,13 +699,12 @@ static void __device_add_disk(struct device *parent, struct gendisk *disk,
>>>    		int ret;
>>>    		/* Register BDI before referencing it from bdev */
>>> -		disk_to_dev(disk)->devt = devt;
>>>    		ret = bdi_register_owner(disk->queue->backing_dev_info,
>>>    						disk_to_dev(disk));
>>>    		WARN_ON(ret);
>>> -		blk_register_region(disk_devt(disk), disk->minors, NULL,
>>> -				    exact_match, exact_lock, disk);
>>>    	}
>>> +	blk_register_region(disk_devt(disk), disk->minors, NULL,
>>> +			    exact_match, exact_lock, disk);
>>>    	register_disk(parent, disk, groups);
>>>    	if (register_queue)
>>>    		blk_register_queue(disk);
>>> @@ -776,8 +776,7 @@ void del_gendisk(struct gendisk *disk)
>>>    		WARN_ON(1);
>>>    	}
>>> -	if (!(disk->flags & GENHD_FL_HIDDEN))
>>> -		blk_unregister_region(disk_devt(disk), disk->minors);
>>> +	blk_unregister_region(disk_devt(disk), disk->minors);
>>>    	kobject_put(disk->part0.holder_dir);
>>>    	kobject_put(disk->slave_dir);
>>>
>> Well ... this is not just 'lsblk', but more importantly this will force
>> udev to create _block_ device nodes for the hidden devices, essentially
>> 'unhide' them.
>>
>> Is this what we want?
>> Christoph?
>> I thought the entire _point_ of having hidden devices is that they are ...
>> well ... hidden ...
>>
> 
> I knew this would be the most controversial patch. And I had other solutions as
> well, but preferred this one. So, the other alternative would be just not use
> GENHD_FL_HIDDEN for the nvme devices, which would leave that flag without a
> single user in the kernel. That would still fix the two problems
> (initramfs-tools and lsblk), and not create any other problem I know of. That
> patch would still fail to open the underlying devices when there is a
> head/multipath associated with it.
> 
> So, the only thing GENHD_FL_HIDDEN gives us is saving some resources, like bdi,
> for example. And we could also use it to fail open when blkdev_get* is called.
> > Of course, that would still imply that its name should be changed, but as we
> already have an attribute named after that, I find it hard to suggest such a
> change.
> 
The whole point of the native nvme multipath implementation was that the 
block devices behind the multipathed device are _NOT_ visible to the OS.

This patch reverts the whole scheme, making it behave just like the 
classical device-mapper multipathing did.

Why can't we just update lsblk to present the correct output?

nvme-cli (and multipath, for that matter) work happily with the current 
setup, _and_ provide the correct topology:

So why can't we do it with lsblk?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/4] block: expose devt for GENHD_FL_HIDDEN disks
  2018-12-13  9:18     ` Hannes Reinecke
@ 2018-12-13 14:32       ` Christoph Hellwig
  -1 siblings, 0 replies; 56+ messages in thread
From: Christoph Hellwig @ 2018-12-13 14:32 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Thadeu Lima de Souza Cascardo, linux-nvme, linux-block,
	Christoph Hellwig, Jens Axboe

On Thu, Dec 13, 2018 at 10:18:40AM +0100, Hannes Reinecke wrote:
> Well ... this is not just 'lsblk', but more importantly this will force 
> udev to create _block_ device nodes for the hidden devices, essentially 
> 'unhide' them.
>
> Is this what we want?
> Christoph?
> I thought the entire _point_ of having hidden devices is that they are ... 
> well ... hidden ...

Yes, that is why I really don't like the last two patches.

And I've checked back - lsblk actually works just fine at the moment.
But it turns out once we create the slave links it stops working,
which is a really good argument against the first two patches, which
would otherwise seem nice..

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/4] block: expose devt for GENHD_FL_HIDDEN disks
  2018-12-13 14:32       ` Christoph Hellwig
@ 2018-12-13 15:25         ` Thadeu Lima de Souza Cascardo
  -1 siblings, 0 replies; 56+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2018-12-13 15:25 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Hannes Reinecke, linux-nvme, linux-block, Jens Axboe

On Thu, Dec 13, 2018 at 03:32:18PM +0100, Christoph Hellwig wrote:
> On Thu, Dec 13, 2018 at 10:18:40AM +0100, Hannes Reinecke wrote:
> > Well ... this is not just 'lsblk', but more importantly this will force 
> > udev to create _block_ device nodes for the hidden devices, essentially 
> > 'unhide' them.
> >
> > Is this what we want?
> > Christoph?
> > I thought the entire _point_ of having hidden devices is that they are ... 
> > well ... hidden ...
> 
> Yes, that is why I really don't like the last two patches.
> 
> And I've checked back - lsblk actually works just fine at the moment.
> But it turns out once we create the slave links it stops working,
> which is a really good argument against the first two patches, which
> would otherwise seem nice..

Which is why I have sent the "paths/" patchset in the first place. Because I
did some homework and read the previous discussion about this, and how lsblk's
failure to behave with slave links led to the revert of the slaves/holders
patch by Dr. Hannes.

Cascardo.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/4] block: expose devt for GENHD_FL_HIDDEN disks
  2018-12-13 12:19         ` Hannes Reinecke
@ 2018-12-13 16:08           ` Thadeu Lima de Souza Cascardo
  -1 siblings, 0 replies; 56+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2018-12-13 16:08 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: linux-nvme, linux-block, Christoph Hellwig, Jens Axboe

On Thu, Dec 13, 2018 at 01:19:25PM +0100, Hannes Reinecke wrote:
> On 12/13/18 12:41 PM, Thadeu Lima de Souza Cascardo wrote:
> > On Thu, Dec 13, 2018 at 10:18:40AM +0100, Hannes Reinecke wrote:
> > > On 12/6/18 5:48 PM, Thadeu Lima de Souza Cascardo wrote:
> > > > Without this exposure, lsblk will fail as it tries to find out the
> > > > device's dev_t numbers. This causes a real problem for nvme multipath
> > > > devices, as their slaves are hidden.
> > > > 
> > > > Exposing them fixes the problem, even though trying to open the devices
> > > > returns an error in the case of nvme multipath. So, right now, it's the
> > > > driver's responsibility to return a failure to open hidden devices.
> > > > 
> > > > Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
> > > > ---
> > > >    block/genhd.c | 9 ++++-----
> > > >    1 file changed, 4 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/block/genhd.c b/block/genhd.c
> > > > index 7674fce32fca..65a7fa664188 100644
> > > > --- a/block/genhd.c
> > > > +++ b/block/genhd.c
> > > > @@ -687,6 +687,7 @@ static void __device_add_disk(struct device *parent, struct gendisk *disk,
> > > >    	disk_alloc_events(disk);
> > > > +	disk_to_dev(disk)->devt = devt;
> > > >    	if (disk->flags & GENHD_FL_HIDDEN) {
> > > >    		/*
> > > >    		 * Don't let hidden disks show up in /proc/partitions,
> > > > @@ -698,13 +699,12 @@ static void __device_add_disk(struct device *parent, struct gendisk *disk,
> > > >    		int ret;
> > > >    		/* Register BDI before referencing it from bdev */
> > > > -		disk_to_dev(disk)->devt = devt;
> > > >    		ret = bdi_register_owner(disk->queue->backing_dev_info,
> > > >    						disk_to_dev(disk));
> > > >    		WARN_ON(ret);
> > > > -		blk_register_region(disk_devt(disk), disk->minors, NULL,
> > > > -				    exact_match, exact_lock, disk);
> > > >    	}
> > > > +	blk_register_region(disk_devt(disk), disk->minors, NULL,
> > > > +			    exact_match, exact_lock, disk);
> > > >    	register_disk(parent, disk, groups);
> > > >    	if (register_queue)
> > > >    		blk_register_queue(disk);
> > > > @@ -776,8 +776,7 @@ void del_gendisk(struct gendisk *disk)
> > > >    		WARN_ON(1);
> > > >    	}
> > > > -	if (!(disk->flags & GENHD_FL_HIDDEN))
> > > > -		blk_unregister_region(disk_devt(disk), disk->minors);
> > > > +	blk_unregister_region(disk_devt(disk), disk->minors);
> > > >    	kobject_put(disk->part0.holder_dir);
> > > >    	kobject_put(disk->slave_dir);
> > > > 
> > > Well ... this is not just 'lsblk', but more importantly this will force
> > > udev to create _block_ device nodes for the hidden devices, essentially
> > > 'unhide' them.
> > > 
> > > Is this what we want?
> > > Christoph?
> > > I thought the entire _point_ of having hidden devices is that they are ...
> > > well ... hidden ...
> > > 
> > 
> > I knew this would be the most controversial patch. And I had other solutions as
> > well, but preferred this one. So, the other alternative would be just not use
> > GENHD_FL_HIDDEN for the nvme devices, which would leave that flag without a
> > single user in the kernel. That would still fix the two problems
> > (initramfs-tools and lsblk), and not create any other problem I know of. That
> > patch would still fail to open the underlying devices when there is a
> > head/multipath associated with it.
> > 
> > So, the only thing GENHD_FL_HIDDEN gives us is saving some resources, like bdi,
> > for example. And we could also use it to fail open when blkdev_get* is called.
> > Of course, that would still imply that its name should be changed, but as we
> > already have an attribute named after that, I find it hard to suggest such a
> > change.
> > 
> The whole point of the native nvme multipath implementation was that the
> block devices behind the multipathed device are _NOT_ visible to the OS.
> 

They are still partially visible. And the fact that their relationship to
the block device is not visible breaks initramfs-tools: it can't find the
PCI device, and the drivers it needs, to get to the root block device.

Also, when we make those relationships clear, lsblk finds those same
"hidden" devices but breaks because it can't find its devno.

> This patch reverts the whole scheme, making it behaving just like the
> classical device-mapper multipathing did.
> 

As you are the expert in multipath, would you say that the behavior is the
same? Do we really only care about showing relationships between devices
and hidden block devices as the only difference between those two
implementations?  If so, I don't think it's worth it. From the little I
know, they are two different multipath implementations, and hiding devices
is orthogonal to their differences.

> Why can't we just update lsblk to present the correct output?
> 
> nvme-cli (and multipath, for that matter) work happily with the current
> setup, _and_ provide the correct topology:
> 
> So why can't we do it with lsblk?

Because old userspace is broken by the patches that introduce slaves and
holders. That is, users upgrade their kernels, and, suddenly, lsblk is
broken, and distros need to offer upgrades.

It's not the same with initramfs-tools. It's broken already. And I am
trying to provide a kernel fix that will fix it without requiring any
updates. And if we go the "paths/" path, when a user upgrades their kernels,
they would still need an updated initramfs-tools, but an outdated one does
not regress because of the kernel upgrade. It has already been broken.

Cascardo.

> 
> Cheers,
> 
> Hannes
> -- 
> Dr. Hannes Reinecke		   Teamlead Storage & Networking
> hare@suse.de			               +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
> HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/4] block: expose devt for GENHD_FL_HIDDEN disks
  2018-12-13 15:25         ` Thadeu Lima de Souza Cascardo
@ 2018-12-13 20:20           ` Christoph Hellwig
  -1 siblings, 0 replies; 56+ messages in thread
From: Christoph Hellwig @ 2018-12-13 20:20 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo
  Cc: Christoph Hellwig, Hannes Reinecke, linux-nvme, linux-block, Jens Axboe

On Thu, Dec 13, 2018 at 01:25:33PM -0200, Thadeu Lima de Souza Cascardo wrote:
> > And I've checked back - lsblk actually works just fine at the moment.
> > But it turns out once we create the slave links it stops working,
> > which is a really good argument against the first two patches, which
> > would otherwise seem nice..
> 
> Which is why I have sent the "paths/" patchset in the first place. Because I
> did some homework and read the previous discussion about this, and how lsblk's
> failure to behave with slave links led to the revert of the slaves/holders
> patch by Dr. Hannes.

Sorry, I did not actually notice that Hannes' patch manually created
the same slaves/holders link we otherwise create using the block layer
APIs.  Had I realized those actually were the same, that would have saved
me some work.

So I guess the v2 paths/ link patch from you is the least of all evils.
Hannes, can you look over that one?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/4] block: expose devt for GENHD_FL_HIDDEN disks
  2018-12-13 20:20           ` Christoph Hellwig
@ 2018-12-13 21:00             ` Thadeu Lima de Souza Cascardo
  -1 siblings, 0 replies; 56+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2018-12-13 21:00 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Hannes Reinecke, linux-nvme, linux-block, Jens Axboe

On Thu, Dec 13, 2018 at 09:20:16PM +0100, Christoph Hellwig wrote:
> On Thu, Dec 13, 2018 at 01:25:33PM -0200, Thadeu Lima de Souza Cascardo wrote:
> > > And I've checked back - lsblk actually works just fine at the moment.
> > > But it turns out once we create the slave links it stops working,
> > > which is a really good argument against the first two patches, which
> > > would otherwise seem nice..
> > 
> > Which is why I have sent the "paths/" patchset in the first place. Because I
> > did some homework and read the previous discussion about this, and how lsblk's
> > failure to behave with slave links led to the revert of the slaves/holders
> > patch by Dr. Hannes.
> 
> Sorry, I did not actually notice that Hannes patch manually created
> the same slaves/holders link we otherwise create using the block layer
> APIs.  Had I realized those actually were the same that had saved
> me some work.
> 
> So I guess the v2 paths/ link patch from you is the least of all evils.
> Hannes, can you look over that one?

No hard feelings. Sorry we had to go through all these messages to make that
clear.

This should be Message-Id: <20181101232955.19329-1-cascardo@canonical.com> and
Hannes was on Cc as well.

Regards.
Cascardo.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/4] block: expose devt for GENHD_FL_HIDDEN disks
  2018-12-13 15:25         ` Thadeu Lima de Souza Cascardo
@ 2018-12-14  7:47           ` Hannes Reinecke
  -1 siblings, 0 replies; 56+ messages in thread
From: Hannes Reinecke @ 2018-12-14  7:47 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo, Christoph Hellwig
  Cc: linux-nvme, linux-block, Jens Axboe

On 12/13/18 4:25 PM, Thadeu Lima de Souza Cascardo wrote:
> On Thu, Dec 13, 2018 at 03:32:18PM +0100, Christoph Hellwig wrote:
>> On Thu, Dec 13, 2018 at 10:18:40AM +0100, Hannes Reinecke wrote:
>>> Well ... this is not just 'lsblk', but more importantly this will force
>>> udev to create _block_ device nodes for the hidden devices, essentially
>>> 'unhide' them.
>>>
>>> Is this what we want?
>>> Christoph?
>>> I thought the entire _point_ of having hidden devices is that they are ...
>>> well ... hidden ...
>>
>> Yes, that is why I really don't like the last two patches.
>>
>> And I've checked back - lsblk actually works just fine at the moment.
>> But it turns out once we create the slave links it stops working,
>> which is a really good argument against the first two patches, which
>> would otherwise seem nice..
> 
> Which is why I have sent the "paths/" patchset in the first place. Because I
> did some homework and read the previous discussion about this, and how lsblk's
> failure to behave with slave links led to the revert of the slaves/holders
> patch by Dr. Hannes.
> 
But you haven't answered my question:

Why can't we patch 'lsblk' to provide the required information even with 
the current sysfs layout?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/4] block: expose devt for GENHD_FL_HIDDEN disks
  2018-12-14  7:47           ` Hannes Reinecke
@ 2018-12-14  8:56             ` Thadeu Lima de Souza Cascardo
  -1 siblings, 0 replies; 56+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2018-12-14  8:56 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Christoph Hellwig, linux-nvme, linux-block, Jens Axboe

On Fri, Dec 14, 2018 at 08:47:20AM +0100, Hannes Reinecke wrote:
> On 12/13/18 4:25 PM, Thadeu Lima de Souza Cascardo wrote:
> > On Thu, Dec 13, 2018 at 03:32:18PM +0100, Christoph Hellwig wrote:
> > > On Thu, Dec 13, 2018 at 10:18:40AM +0100, Hannes Reinecke wrote:
> > > > Well ... this is not just 'lsblk', but more importantly this will force
> > > > udev to create _block_ device nodes for the hidden devices, essentially
> > > > 'unhide' them.
> > > > 
> > > > Is this what we want?
> > > > Christoph?
> > > > I thought the entire _point_ of having hidden devices is that they are ...
> > > > well ... hidden ...
> > > 
> > > Yes, that is why I really don't like the last two patches.
> > > 
> > > And I've checked back - lsblk actually works just fine at the moment.
> > > But it turns out once we create the slave links it stops working,
> > > which is a really good argument against the first two patches, which
> > > would otherwise seem nice..
> > 
> > Which is why I have sent the "paths/" patchset in the first place. Because I
> > did some homework and read the previous discussion about this, and how lsblk's
> > failure to behave with slave links led to the revert of the slaves/holders
> > patch by Dr. Hannes.
> > 
> But you haven't answered my question:
> 
> Why can't we patch 'lsblk' to provide the required information even with the
> current sysfs layout?
> 

I think we could, but with my Ubuntu hat on, after the kernel fix for
initramfs-tools, that is, slaves/holders links, the user will get an updated
kernel that breaks the current lsblk on Bionic (Ubuntu 18.04). That will
require that we backport the lsblk fix, which is not only more work, but there
may be users who only update from -security, which is where kernel updates
regularly end up, but not this lsblk fix.

And that kernel update is a regression against that old lsblk version.

Cascardo.

> Cheers,
> 
> Hannes
> -- 
> Dr. Hannes Reinecke		   Teamlead Storage & Networking
> hare@suse.de			               +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
> HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/4] block: expose devt for GENHD_FL_HIDDEN disks
  2018-12-14  8:56             ` Thadeu Lima de Souza Cascardo
@ 2018-12-14  9:06               ` Thadeu Lima de Souza Cascardo
  -1 siblings, 0 replies; 56+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2018-12-14  9:06 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Christoph Hellwig, linux-nvme, linux-block, Jens Axboe

On Fri, Dec 14, 2018 at 06:56:06AM -0200, Thadeu Lima de Souza Cascardo wrote:
> On Fri, Dec 14, 2018 at 08:47:20AM +0100, Hannes Reinecke wrote:
> > But you haven't answered my question:
> > 
> > Why can't we patch 'lsblk' to provide the required information even with the
> > current sysfs layout?
> > 
> 

Just to be clear here. If with 'current sysfs layout' you mean without any of
the patches we have been talking about, lsblk is not broken. It just works with
nvme multipath enabled. It will show the multipath paths and simply ignore the
underlying/hidden ones. If we hid them, we meant for them to be hidden, right?

What I am trying to fix here is how to find out which PCI device/driver is
needed to get to the block device holding the root filesystem, which is what
initramfs needs. And the nvme multipath device is a virtual device, pointing
to no driver at all, and with no relation to the underlying devices it needs
to work.
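
To illustrate (a minimal sketch of mine, not initramfs-tools code; the
device name and sysfs paths are examples for a typical setup): the lookup
that initramfs needs is essentially a walk up the sysfs device chain until
some ancestor is bound to a driver.  For a plain PCI NVMe disk it stops at
the PCI function; for the multipath head it dead-ends under
/sys/devices/virtual without ever finding one:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>

int main(int argc, char **argv)
{
	const char *dev = argc > 1 ? argv[1] : "nvme0n1";
	char link[2 * PATH_MAX], path[PATH_MAX], drv[PATH_MAX];
	char *slash;

	/* Resolve /sys/block/<dev>/device to the providing device. */
	snprintf(link, sizeof(link), "/sys/block/%s/device", dev);
	if (!realpath(link, path)) {
		perror(link);
		return 1;
	}
	/* Climb the hierarchy looking for a bound driver. */
	for (;;) {
		snprintf(link, sizeof(link), "%s/driver", path);
		if (realpath(link, drv)) {
			printf("%s is driven by %s\n", dev, drv);
			return 0;
		}
		slash = strrchr(path, '/');
		if (!slash || slash == path)
			break;
		*slash = '\0';	/* try the parent device */
	}
	fprintf(stderr, "%s: no driver anywhere in the device chain\n", dev);
	return 1;
}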

Cascardo.

> I think we could, but with my Ubuntu hat on, after the kernel fix for
> initramfs-tools, that is, slaves/holders links, the user will get an updated
> kernel that breaks the current lsblk on Bionic (Ubuntu 18.04). That will
> require that we backport the lsblk fix, which is not only more work, but there
> may be users who only update from -security, which is where kernel updates
> regularly end up, but not this lsblk fix.
> 
> And that kernel update is a regression against that old lsblk version.
> 
> Cascardo.
> 
> > Cheers,
> > 
> > Hannes
> > -- 
> > Dr. Hannes Reinecke		   Teamlead Storage & Networking
> > hare@suse.de			               +49 911 74053 688
> > SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> > GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
> > HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/4] block: expose devt for GENHD_FL_HIDDEN disks
  2018-12-14  8:56             ` Thadeu Lima de Souza Cascardo
@ 2018-12-14  9:44               ` Hannes Reinecke
  -1 siblings, 0 replies; 56+ messages in thread
From: Hannes Reinecke @ 2018-12-14  9:44 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo
  Cc: Christoph Hellwig, linux-nvme, linux-block, Jens Axboe

On 12/14/18 9:56 AM, Thadeu Lima de Souza Cascardo wrote:
> On Fri, Dec 14, 2018 at 08:47:20AM +0100, Hannes Reinecke wrote:
>> On 12/13/18 4:25 PM, Thadeu Lima de Souza Cascardo wrote:
>>> On Thu, Dec 13, 2018 at 03:32:18PM +0100, Christoph Hellwig wrote:
>>>> On Thu, Dec 13, 2018 at 10:18:40AM +0100, Hannes Reinecke wrote:
>>>>> Well ... this is not just 'lsblk', but more importantly this will force
>>>>> udev to create _block_ device nodes for the hidden devices, essentially
>>>>> 'unhide' them.
>>>>>
>>>>> Is this what we want?
>>>>> Christoph?
>>>>> I thought the entire _point_ of having hidden devices is that they are ...
>>>>> well ... hidden ...
>>>>
>>>> Yes, that is why I really don't like the last two patches.
>>>>
>>>> And I've checked back - lsblk actually works just fine at the moment.
>>>> But it turns out once we create the slave links it stops working,
>>>> which is a really good argument against the first two patches, which
>>>> would otherwise seem nice..
>>>
>>> Which is why I have sent the "paths/" patchset in the first place. Because I
>>> did some homework and read the previous discussion about this, and how lsblk's
>>> failure to behave with slave links led to the revert of the slaves/holders
>>> patch by Dr. Hannes.
>>>
>> But you haven't answered my question:
>>
>> Why can't we patch 'lsblk' to provide the required information even with the
>> current sysfs layout?
>>
> 
> I think we could, but with my Ubuntu hat on, after the kernel fix for
> initramfs-tools, that is, slaves/holders links, the user will get an updated
> kernel that breaks the current lsblk on Bionic (Ubuntu 18.04). That will
> require that we backport the lsblk fix, which is not only more work, but there
> may be users who only update from -security, which is where kernel updates
> regularly end up, but not this lsblk fix.
> 
> And that kernel update is a regression against that old lsblk version.
> 
Now you've got me confused.
Which kernel update are you referring to?
We do _not_ provide any 'slaves' link with current upstream, so I
somewhat fail to see which breakage you are referring to ...

Confused,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/4] block: expose devt for GENHD_FL_HIDDEN disks
  2018-12-14  9:06               ` Thadeu Lima de Souza Cascardo
@ 2018-12-14  9:54                 ` Hannes Reinecke
  -1 siblings, 0 replies; 56+ messages in thread
From: Hannes Reinecke @ 2018-12-14  9:54 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo
  Cc: Christoph Hellwig, linux-nvme, linux-block, Jens Axboe

On 12/14/18 10:06 AM, Thadeu Lima de Souza Cascardo wrote:
> On Fri, Dec 14, 2018 at 06:56:06AM -0200, Thadeu Lima de Souza Cascardo wrote:
>> On Fri, Dec 14, 2018 at 08:47:20AM +0100, Hannes Reinecke wrote:
>>> But you haven't answered my question:
>>>
>>> Why can't we patch 'lsblk' to provide the required information even with the
>>> current sysfs layout?
>>>
>>
> 
> Just to be clear here. If with 'current sysfs layout' you mean without any of
> the patches we have been talking about, lsblk is not broken. It just works with
> nvme multipath enabled. It will show the multipath paths and simply ignore the
> underlying/hidden ones. If we hid them, we meant for them to be hidden, right?
> 
> What I am trying to fix here is how to find out which PCI device/driver is
> needed to get to the block device holding the root filesystem, which is what
> initramfs needs. And the nvme multipath device is a virtual device, pointing
> to no driver at all, and with no relation to the underlying devices it needs
> to work.
> 

Well ...
But this is an entirely different proposition.
The 'slaves'/'holders' trick just allows mapping the relationship between
_block_ devices, which arguably is a bit pointless here, seeing that we
don't actually have block devices for the underlying devices.
But even if we _were_ implementing that, you would still fail to get to
the PCI device providing the block devices, as there is no link pointing
from one to the other.

With the current layout we have this hierarchy:

NVMe namespace (/dev/nvmeXn1Y) -> NVMe-subsys -> NVMe controller

and the NVMe controller is missing a link pointing to the device 
presenting the controller:

# ls -l /sys/devices/virtual/nvme-fabrics/ctl/nvme2
total 0
-r--r--r-- 1 root root 4096 Dec 13 13:18 address
-r--r--r-- 1 root root 4096 Dec 13 13:18 cntlid
--w------- 1 root root 4096 Dec 13 13:18 delete_controller
-r--r--r-- 1 root root 4096 Dec 13 13:18 dev
lrwxrwxrwx 1 root root    0 Dec 13 13:18 device -> ../../ctl
-r--r--r-- 1 root root 4096 Dec 13 13:18 firmware_rev
-r--r--r-- 1 root root 4096 Dec 13 13:18 model
drwxr-xr-x 9 root root    0 Dec  3 13:55 nvme2c64n1
drwxr-xr-x 2 root root    0 Dec 13 13:18 power
--w------- 1 root root 4096 Dec 13 13:18 rescan_controller
--w------- 1 root root 4096 Dec 13 13:18 reset_controller
-r--r--r-- 1 root root 4096 Dec 13 13:18 serial
-r--r--r-- 1 root root 4096 Dec 13 13:18 state
-r--r--r-- 1 root root 4096 Dec 13 13:18 subsysnqn
lrwxrwxrwx 1 root root    0 Dec  3 13:44 subsystem -> ../../../../../class/nvme
-r--r--r-- 1 root root 4096 Dec 13 13:18 transport
-rw-r--r-- 1 root root 4096 Dec 13 13:18 uevent

So what we need to do is update the 'device' link to point to the PCI 
device providing the controller.
(Actually, we would need the 'device' link to point to the 
entity providing the transport address, but I guess we don't have that 
for now.)

And _that's_ what we need to fix; the slaves/holders stuff doesn't solve 
the underlying problem, and really shouldn't be merged at all.
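
For what it's worth, the difference is easy to probe from userspace.  A
small sketch of mine (the controller name and the path test are
assumptions, not something from this thread) that resolves the 'device'
link and reports whether it lands on a real bus device or back in the
virtual hierarchy:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>

int main(int argc, char **argv)
{
	const char *ctrl = argc > 1 ? argv[1] : "nvme2";
	char link[PATH_MAX], target[PATH_MAX];

	/* Resolve the controller's 'device' link. */
	snprintf(link, sizeof(link), "/sys/class/nvme/%s/device", ctrl);
	if (!realpath(link, target)) {
		perror(link);
		return 1;
	}
	if (strstr(target, "/devices/virtual/"))
		/* e.g. the fabrics 'ctl' node shown above */
		printf("%s: 'device' is virtual (%s), nothing to follow\n",
		       ctrl, target);
	else
		/* e.g. .../0000:45:00.0 for PCI NVMe */
		printf("%s: 'device' points at %s\n", ctrl, target);
	return 0;
}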

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/4] block: expose devt for GENHD_FL_HIDDEN disks
  2018-12-14  9:54                 ` Hannes Reinecke
@ 2018-12-14 10:18                   ` Hannes Reinecke
  -1 siblings, 0 replies; 56+ messages in thread
From: Hannes Reinecke @ 2018-12-14 10:18 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo
  Cc: Christoph Hellwig, linux-nvme, linux-block, Jens Axboe

On 12/14/18 10:54 AM, Hannes Reinecke wrote:
> On 12/14/18 10:06 AM, Thadeu Lima de Souza Cascardo wrote:
>> On Fri, Dec 14, 2018 at 06:56:06AM -0200, Thadeu Lima de Souza 
>> Cascardo wrote:
>>> On Fri, Dec 14, 2018 at 08:47:20AM +0100, Hannes Reinecke wrote:
>>>> But you haven't answered my question:
>>>>
>>>> Why can't we patch 'lsblk' to provide the required information even
>>>> with the current sysfs layout?
>>>>
>>>
>>
>> Just to be clear here. If with 'current sysfs layout' you mean without
>> any of the patches we have been talking about, lsblk is not broken. It
>> just works with nvme multipath enabled. It will show the multipath paths
>> and simply ignore the underlying/hidden ones. If we hid them, we meant
>> for them to be hidden, right?
>>
>> What I am trying to fix here is how to find out which PCI device/driver
>> is needed to get to the block device holding the root filesystem, which
>> is what initramfs needs. And the nvme multipath device is a virtual
>> device, pointing to no driver at all, and with no relation to the
>> underlying devices it needs to work.
>>
> 
> Well ...
> But this is an entirely different proposition.
> The 'slaves'/'holders' trick just allows mapping the relationship between
> _block_ devices, which arguably is a bit pointless here, seeing that we
> don't actually have block devices for the underlying devices.
> But even if we _were_ implementing that, you would still fail to get to
> the PCI device providing the block devices, as there is no link pointing
> from one to the other.
> 
> With the current layout we have this hierarchy:
> 
> NVMe namespace (/dev/nvmeXn1Y) -> NVMe-subsys -> NVMe controller
> 
> and the NVMe controller is missing a link pointing to the device 
> presenting the controller:
> 
> # ls -l /sys/devices/virtual/nvme-fabrics/ctl/nvme2
> total 0
> -r--r--r-- 1 root root 4096 Dec 13 13:18 address
> -r--r--r-- 1 root root 4096 Dec 13 13:18 cntlid
> --w------- 1 root root 4096 Dec 13 13:18 delete_controller
> -r--r--r-- 1 root root 4096 Dec 13 13:18 dev
> lrwxrwxrwx 1 root root    0 Dec 13 13:18 device -> ../../ctl
> -r--r--r-- 1 root root 4096 Dec 13 13:18 firmware_rev
> -r--r--r-- 1 root root 4096 Dec 13 13:18 model
> drwxr-xr-x 9 root root    0 Dec  3 13:55 nvme2c64n1
> drwxr-xr-x 2 root root    0 Dec 13 13:18 power
> --w------- 1 root root 4096 Dec 13 13:18 rescan_controller
> --w------- 1 root root 4096 Dec 13 13:18 reset_controller
> -r--r--r-- 1 root root 4096 Dec 13 13:18 serial
> -r--r--r-- 1 root root 4096 Dec 13 13:18 state
> -r--r--r-- 1 root root 4096 Dec 13 13:18 subsysnqn
> lrwxrwxrwx 1 root root    0 Dec  3 13:44 subsystem -> ../../../../../class/nvme
> -r--r--r-- 1 root root 4096 Dec 13 13:18 transport
> -rw-r--r-- 1 root root 4096 Dec 13 13:18 uevent
> 
> So what we need to do is update the 'device' link to point to the PCI 
> device providing the controller.
> (Actually, we would need the 'device' link to point to the 
> entity providing the transport address, but I guess we don't have that 
> for now.)
> 
> And _that's_ what we need to fix; the slaves/holders stuff doesn't solve 
> the underlying problem, and really shouldn't be merged at all.
> 
Mind you, it _does_ work for PCI-NVMe:

# ls -l /sys/class/nvme/nvme0
total 0
-r--r--r--  1 root root 4096 Dec 14 11:14 cntlid
-r--r--r--  1 root root 4096 Dec 14 11:14 dev
lrwxrwxrwx  1 root root    0 Dec 14 11:14 device -> ../../../0000:45:00.0
-r--r--r--  1 root root 4096 Dec 14 11:14 firmware_rev
-r--r--r--  1 root root 4096 Dec 14 11:14 model
drwxr-xr-x 12 root root    0 Dec  3 13:43 nvme1n1
drwxr-xr-x  2 root root    0 Dec 14 11:14 power
--w-------  1 root root 4096 Dec 14 11:14 rescan_controller
--w-------  1 root root 4096 Dec 14 11:14 reset_controller
-r--r--r--  1 root root 4096 Dec 14 11:14 serial
-r--r--r--  1 root root 4096 Dec 14 11:14 state
-r--r--r--  1 root root 4096 Dec 14 11:14 subsysnqn
lrwxrwxrwx  1 root root    0 Dec  3 13:43 subsystem -> ../../../../../../class/nvme
-r--r--r--  1 root root 4096 Dec 14 11:14 transport
-rw-r--r--  1 root root 4096 Dec 14 11:14 uevent

So it might be as simple as this patch:

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index feb86b59170e..1ecdec6b8b4a 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -3117,7 +3117,7 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
          * Defer this to the connect path.
          */

-       ret = nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_fc_ctrl_ops, 0);
+       ret = nvme_init_ctrl(&ctrl->ctrl, ctrl->dev, &nvme_fc_ctrl_ops, 0);
         if (ret)
                 goto out_cleanup_admin_q;


As for RDMA / TCP, we're running on a network address which really isn't 
tied to a specific device, so we wouldn't have any device to hook onto 
without some trickery.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/4] block: expose devt for GENHD_FL_HIDDEN disks
  2018-12-14  9:54                 ` Hannes Reinecke
@ 2018-12-14 11:09                   ` Thadeu Lima de Souza Cascardo
  -1 siblings, 0 replies; 56+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2018-12-14 11:09 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Christoph Hellwig, linux-nvme, linux-block, Jens Axboe

On Fri, Dec 14, 2018 at 10:54:04AM +0100, Hannes Reinecke wrote:
> On 12/14/18 10:06 AM, Thadeu Lima de Souza Cascardo wrote:
> > On Fri, Dec 14, 2018 at 06:56:06AM -0200, Thadeu Lima de Souza Cascardo wrote:
> > > On Fri, Dec 14, 2018 at 08:47:20AM +0100, Hannes Reinecke wrote:
> > > > But you haven't answered my question:
> > > > 
> > > > Why can't we patch 'lsblk' to provide the required information even with the
> > > > current sysfs layout?
> > > > 
> > > 
> > 
> > Just to be clear here. If with 'current sysfs layout' you mean without any of
> > the patches we have been talking about, lsblk is not broken. It just works with
> > nvme multipath enabled. It will show the multipath paths and simply ignore the
> > underlying/hidden ones. If we hid them, we meant for them to be hidden, right?
> > 
> > What I am trying to fix here is how to find out which PCI device/driver is
> > needed to get to the block device holding the root filesystem, which is what
> > initramfs needs. And the nvme multipath device is a virtual device, pointing
> > to no driver at all, and with no relation to the underlying devices it needs
> > to work.
> > 
> 
> Well ...
> But this is an entirely different proposition.
> The 'slaves'/'holders' trick just allows mapping the relationship between
> _block_ devices, which arguably is a bit pointless here, seeing that we don't
> actually have block devices for the underlying devices.
> But even if we _were_ implementing that, you would still fail to get to the
> PCI device providing the block devices, as there is no link pointing from one
> to the other.

Well, initramfs-tools does traverse the slaves because your root filesystem
could sit on top of a device-mapper block device. I will try your other patch
in any case and see whether it fixes the problem I have at hand.
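
(For reference, that traversal amounts to roughly the following sketch of
mine; it is not the actual initramfs-tools hook, and partition entries,
which would need an extra hop to their parent disk, are elided:)

#include <stdio.h>
#include <dirent.h>

/* Recurse through /sys/block/<dev>/slaves; devices with no slaves/
 * entries of their own are the bottom (physical) devices. */
static void walk(const char *dev, int depth)
{
	char path[512];
	struct dirent *de;
	DIR *dir;

	printf("%*s%s\n", depth * 2, "", dev);
	snprintf(path, sizeof(path), "/sys/block/%s/slaves", dev);
	dir = opendir(path);
	if (!dir)
		return;		/* no slaves/ directory: a leaf device */
	while ((de = readdir(dir)) != NULL)
		if (de->d_name[0] != '.')
			walk(de->d_name, depth + 1);
	closedir(dir);
}

int main(int argc, char **argv)
{
	walk(argc > 1 ? argv[1] : "dm-0", 0);
	return 0;
}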

Thanks.
Cascardo.

> 
> With the current layout we have this hierarchy:
> 
> NVMe namespace (/dev/nvmeXn1Y) -> NVMe-subsys -> NVMe controller
> 
> and the NVMe controller is missing a link pointing to the device presenting
> the controller:
> 
> # ls -l /sys/devices/virtual/nvme-fabrics/ctl/nvme2
> total 0
> -r--r--r-- 1 root root 4096 Dec 13 13:18 address
> -r--r--r-- 1 root root 4096 Dec 13 13:18 cntlid
> --w------- 1 root root 4096 Dec 13 13:18 delete_controller
> -r--r--r-- 1 root root 4096 Dec 13 13:18 dev
> lrwxrwxrwx 1 root root    0 Dec 13 13:18 device -> ../../ctl
> -r--r--r-- 1 root root 4096 Dec 13 13:18 firmware_rev
> -r--r--r-- 1 root root 4096 Dec 13 13:18 model
> drwxr-xr-x 9 root root    0 Dec  3 13:55 nvme2c64n1
> drwxr-xr-x 2 root root    0 Dec 13 13:18 power
> --w------- 1 root root 4096 Dec 13 13:18 rescan_controller
> --w------- 1 root root 4096 Dec 13 13:18 reset_controller
> -r--r--r-- 1 root root 4096 Dec 13 13:18 serial
> -r--r--r-- 1 root root 4096 Dec 13 13:18 state
> -r--r--r-- 1 root root 4096 Dec 13 13:18 subsysnqn
> lrwxrwxrwx 1 root root    0 Dec  3 13:44 subsystem -> ../../../../../class/nvme
> -r--r--r-- 1 root root 4096 Dec 13 13:18 transport
> -rw-r--r-- 1 root root 4096 Dec 13 13:18 uevent
> 
> So what we need to do is update the 'device' link to point to the PCI
> device providing the controller.
> (Actually, we would need the 'device' link to point to the entity
> providing the transport address, but I guess we don't have that for now.)
> 
> And _that's_ what we need to fix; the slaves/holders stuff doesn't solve the
> underlying problem, and really shouldn't be merged at all.
> 
> Cheers,
> 
> Hannes
> -- 
> Dr. Hannes Reinecke		   Teamlead Storage & Networking
> hare@suse.de			               +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
> HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2018-12-14 11:09 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-06 16:48 [PATCH 0/4] nvme multipath: expose slaves/holders Thadeu Lima de Souza Cascardo
2018-12-06 16:48 ` [PATCH 1/4] block: move holder tracking from struct block_device to hd_struct Thadeu Lima de Souza Cascardo
2018-12-13  9:14   ` Hannes Reinecke
2018-12-06 16:48 ` [PATCH 2/4] nvme: create slaves/holder entries for multipath devices Thadeu Lima de Souza Cascardo
2018-12-13  9:15   ` Hannes Reinecke
2018-12-06 16:48 ` [PATCH 3/4] nvme: Should not warn when a disk path is opened Thadeu Lima de Souza Cascardo
2018-12-13  9:16   ` Hannes Reinecke
2018-12-06 16:48 ` [PATCH 4/4] block: expose devt for GENHD_FL_HIDDEN disks Thadeu Lima de Souza Cascardo
2018-12-06 20:22   ` Christoph Hellwig
2018-12-12  8:32     ` Christoph Hellwig
2018-12-12 12:39     ` Thadeu Lima de Souza Cascardo
2018-12-13  9:18   ` Hannes Reinecke
2018-12-13 11:41     ` Thadeu Lima de Souza Cascardo
2018-12-13 12:19       ` Hannes Reinecke
2018-12-13 16:08         ` Thadeu Lima de Souza Cascardo
2018-12-13 14:32     ` Christoph Hellwig
2018-12-13 15:25       ` Thadeu Lima de Souza Cascardo
2018-12-13 20:20         ` Christoph Hellwig
2018-12-13 21:00           ` Thadeu Lima de Souza Cascardo
2018-12-14  7:47         ` Hannes Reinecke
2018-12-14  8:56           ` Thadeu Lima de Souza Cascardo
2018-12-14  9:06             ` Thadeu Lima de Souza Cascardo
2018-12-14  9:54               ` Hannes Reinecke
2018-12-14 10:18                 ` Hannes Reinecke
2018-12-14 11:09                 ` Thadeu Lima de Souza Cascardo
2018-12-14  9:44             ` Hannes Reinecke
2018-12-13  9:33 ` [PATCH 0/4] nvme multipath: expose slaves/holders Johannes Thumshirn
2018-12-13 11:35   ` Thadeu Lima de Souza Cascardo
