linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/6] block-layer interposer
@ 2021-02-03 15:53 Sergei Shtepa
  2021-02-03 15:53 ` [PATCH v4 1/6] docs: device-mapper: add remap_and_filter Sergei Shtepa
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Sergei Shtepa @ 2021-02-03 15:53 UTC (permalink / raw)
  To: Damien.LeMoal, snitzer, hare, ming.lei, agk, corbet, axboe, jack,
	johannes.thumshirn, gregkh, koct9i, steve, dm-devel, linux-block,
	linux-doc, linux-kernel
  Cc: sergei.shtepa, pavgel.tide

Hi all.

I'm ready to suggest the block-layer interposer (blk_interposer).
blk_interposer allows to intercept bio requests, remap bio to another
devices or add new bios.

This patch series adds support blk_interposer for dm-linear.

In the first patch, I suggest the remap_and_filter.rst file.
Yes, Mike, it's probably too early for documentation, but maybe it will be
interesting for someone. In the documentation I tried to explain
the purpose of blk_interposer and what prospects it opens up.

The second patch is dedicated to blk_interposer itself, which provides
the ability to intercept bio.

The third - adds the function blk_mq_is_queue_frozen(). It allows to
assert a queue state.

The fourth one adds support for blk_interposer from the device mapper.
Added ioctl DM_DEV_REMAP_CMD.

In the fifth - added the 'noexcl' option for dm-linear, which allows
to open the underlying block-device without the FMODE_EXCL mode.
This allows to create a dm device to which can redirect bio requests
using DM_DEV_REMAP_CMD.

Here I would like to discuss the implementation.
I don't like to play with flags. Adding the non_exclusive flags creates
an excessive complexity. An alternative solution is to explicitly specify
the FMODE_EXCL mode for each dm-target. The problem is that in this case,
we will have to change each dm-target. It doesn't look cool either.

The latest patch changes linear.rst with the description of the 'noexcl'
option that is added for dm-linear.

A little history of changes:

v4 - current patch set
Mostly changes were made, due to Damien's comments:
 * on the design of the code;
 * by the patch set organization;
 * bug with passing a wrong parameter to dm_get_device();
 * description of the 'noexcl' parameter in the linear.rst.
Also added remap_and_filter.rst.

v3 - https://patchwork.kernel.org/project/linux-block/cover/1611853955-32167-1-git-send-email-sergei.shtepa@veeam.com/
In this version, I already suggested blk_interposer to apply to dm-linear.
Problems were solved:
 * Interception of bio requests from a specific device on the disk, not
   from the entire disk. To do this, we added the dm_interposed_dev
   structure and an interval tree to store these structures.
 * Implemented ioctl DM_DEV_REMAP_CMD. A patch with changes in the lvm2
   project was sent to the team lvm-devel@redhat.com.
 * Added the 'noexcl' option for dm-linear, which allows you to open
   the underlying block-device without FMODE_EXCL mode.

v2 - https://patchwork.kernel.org/project/linux-block/cover/1607518911-30692-1-git-send-email-sergei.shtepa@veeam.com/
I tried to suggest blk_interposer without using it in device mapper,
but with the addition of a sample of its use. It was then that I learned
about the maintainers' attitudes towards the samples directory :).

v1 - https://lwn.net/ml/linux-block/20201119164924.74401-1-hare@suse.de/
This Hannes's patch can be considered as a starting point, since this is
where the interception mechanism and the term blk_interposer itself
appeared. It became clear that blk_interposer can be useful for
device mapper.

before v1 - https://patchwork.kernel.org/project/linux-block/cover/1603271049-20681-1-git-send-email-sergei.shtepa@veeam.com/
I tried to offer a rather cumbersome blk-filter and a monster-like
blk-snap module for creating snapshots.

Thank you to everyone who was able to take the time to review
the previous versions.
I hope that this time I achieved the required quality.

Thanks,
Sergei.

Sergei Shtepa (6):
  docs: device-mapper: add remap_and_filter
  block: add blk_interposer
  block: add blk_mq_is_queue_frozen()
  dm: new ioctl DM_DEV_REMAP_CMD
  dm: add 'noexcl' option for dm-linear
  docs: device-mapper: 'noexcl' option for dm-linear

 .../admin-guide/device-mapper/index.rst       |   1 +
 .../admin-guide/device-mapper/linear.rst      |  26 +-
 .../device-mapper/remap_and_filter.rst        | 132 ++++++
 block/bio.c                                   |   2 +
 block/blk-core.c                              |  33 ++
 block/blk-mq.c                                |  13 +
 block/genhd.c                                 |  82 ++++
 drivers/md/dm-core.h                          |  20 +
 drivers/md/dm-ioctl.c                         |  39 ++
 drivers/md/dm-linear.c                        |  14 +-
 drivers/md/dm-table.c                         |  12 +-
 drivers/md/dm.c                               | 400 +++++++++++++++++-
 drivers/md/dm.h                               |   2 +-
 include/linux/blk-mq.h                        |   1 +
 include/linux/blk_types.h                     |   6 +-
 include/linux/device-mapper.h                 |   7 +
 include/linux/genhd.h                         |  18 +
 include/uapi/linux/dm-ioctl.h                 |  15 +-
 18 files changed, 792 insertions(+), 31 deletions(-)
 create mode 100644 Documentation/admin-guide/device-mapper/remap_and_filter.rst

--
2.20.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v4 1/6] docs: device-mapper: add remap_and_filter
  2021-02-03 15:53 [PATCH v4 0/6] block-layer interposer Sergei Shtepa
@ 2021-02-03 15:53 ` Sergei Shtepa
  2021-02-04  4:15   ` Randy Dunlap
  2021-02-03 15:53 ` [PATCH v4 2/6] block: add blk_interposer Sergei Shtepa
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Sergei Shtepa @ 2021-02-03 15:53 UTC (permalink / raw)
  To: Damien.LeMoal, snitzer, hare, ming.lei, agk, corbet, axboe, jack,
	johannes.thumshirn, gregkh, koct9i, steve, dm-devel, linux-block,
	linux-doc, linux-kernel
  Cc: sergei.shtepa, pavgel.tide

remap_and_filter - describes the new features that
blk_interposer provides for device mapper.

Signed-off-by: Sergei Shtepa <sergei.shtepa@veeam.com>
---
 .../admin-guide/device-mapper/index.rst       |   1 +
 .../device-mapper/remap_and_filter.rst        | 132 ++++++++++++++++++
 2 files changed, 133 insertions(+)
 create mode 100644 Documentation/admin-guide/device-mapper/remap_and_filter.rst

diff --git a/Documentation/admin-guide/device-mapper/index.rst b/Documentation/admin-guide/device-mapper/index.rst
index 6cf8adc86fa8..e868d5bbec7e 100644
--- a/Documentation/admin-guide/device-mapper/index.rst
+++ b/Documentation/admin-guide/device-mapper/index.rst
@@ -27,6 +27,7 @@ Device Mapper
     linear
     log-writes
     persistent-data
+    remap_and_filter
     snapshot
     statistics
     striped
diff --git a/Documentation/admin-guide/device-mapper/remap_and_filter.rst b/Documentation/admin-guide/device-mapper/remap_and_filter.rst
new file mode 100644
index 000000000000..616b67998305
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/remap_and_filter.rst
@@ -0,0 +1,132 @@
+=================
+DM remap & filter
+=================
+
+Introduction
+============
+
+Usually LVM should be used for new devices.
+The administrator have to create logical volumes for the system partition
+when installing the operating system. For a running system with
+partitioned disk space and mounted file systems, it is quite difficult to
+reconfigure to logical volumes. As a result, all the features that Device
+Mapper provides are not available for non-LVM systems.
+This problem is partially solved by the dm remap functionality, which
+uses the kernel's blk_interposer.
+
+blk_interposer
+==============
+
+Blk_interposer extends the capabilities of the DM, as it allows to
+intercept and redirect bio requests for block devices that are not
+dm device. At the same time, blk_interposer allows to attach and detach
+from devices "on the fly", without stopping the execution of user
+programs.
+
+Blk_interposer allows to do two tasks: remap and filter.
+Remap allows to redirect all requests from one block device to another.
+Filter allows to do additional processing of the request, but without
+redirection. An intercepted request can get to the block device to which
+it was addressed, without changes.
+
+Remap
+=====
+
+Consider the functionality of the remap. This will allow to connect
+any block device with a dm device "on the fly".
+Suppose we have a file system mounted on the block device /dev/sda1::
+
+  +-------------+
+  | file system |
+  +-------------+
+        ||
+        \/
+  +-------------+
+  | /dev/sda1   |
+  +-------------+
+
+Creating a new DM device that will be mapped on the same /dev/sda1::
+
+  +-------------+  +-----------+
+  | file system |  | dm-linear |
+  +-------------+  +-----------+
+           ||         ||
+           \/         \/
+         +---------------+
+         |   /dev/sda1   |
+         +---------------+
+
+Redirecting all bio requests for the /dev/sda1 device to the new dm
+device::
+
+  +-------------+
+  | file system |
+  +-------------+
+        ||
+        \/
+   +----------+    +-----------+
+   |  remap   | => | dm-linear |
+   +----------+    +-----------+
+                         ||
+                         \/
+                   +-----------+
+                   | /dev/sda1 |
+                   +-----------+
+
+To achieve this, you need to:
+
+Create new dm device with option 'noexcl'. It's allow to open
+underlying block-device without the FMODE_EXCL flag::
+
+  echo "0 `blockdev --getsz $1` linear $DEV 0 noexcl" | dmsetup create dm-noexcl
+
+Call remap command::
+
+  dmsetup remap start dm-noexcl $1
+
+Remap can be used to extend the functionality of dm-snap. This will allow
+to take snapshots from any block devices, not just logical volumes.
+
+Filter
+======
+
+Filter does not redirect the bio to another device. It does not create
+a clone of the bio request. After receiving the request, the filter can
+only add some processing, complete the bio request, or return the bio
+for further processing.
+
+Suppose we have a file system mounted on the block device /dev/sda1::
+
+  +-------------+
+  | file system |
+  +-------------+
+        ||
+        \/
+  +-------------+
+  | /dev/sda1   |
+  +-------------+
+
+Creating a new dm device that will implement filter::
+
+  +-------------+
+  | file system |
+  +-------------+
+        ||
+        \/
+    +--------+    +----------+
+    | filter | => | dm-delay |
+    +--------+    +----------+
+        ||
+        \/
+  +-------------+
+  | /dev/sda1   |
+  +-------------+
+
+Using filter we can change the behavior of debugging tools:
+ * dm-dust,
+ * dm-delay,
+ * dm-flakey,
+ * dm-verity.
+
+In the new version, they are will be able to change the behavior of any
+existing block device, without creating a new one.
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 2/6] block: add blk_interposer
  2021-02-03 15:53 [PATCH v4 0/6] block-layer interposer Sergei Shtepa
  2021-02-03 15:53 ` [PATCH v4 1/6] docs: device-mapper: add remap_and_filter Sergei Shtepa
@ 2021-02-03 15:53 ` Sergei Shtepa
  2021-02-03 16:18   ` Mike Snitzer
  2021-02-03 15:53 ` [PATCH v4 3/6] block: add blk_mq_is_queue_frozen() Sergei Shtepa
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Sergei Shtepa @ 2021-02-03 15:53 UTC (permalink / raw)
  To: Damien.LeMoal, snitzer, hare, ming.lei, agk, corbet, axboe, jack,
	johannes.thumshirn, gregkh, koct9i, steve, dm-devel, linux-block,
	linux-doc, linux-kernel
  Cc: sergei.shtepa, pavgel.tide

blk_interposer allows to intercept bio requests, remap bio to another devices or add new bios.

Signed-off-by: Sergei Shtepa <sergei.shtepa@veeam.com>
---
 block/bio.c               |  2 +
 block/blk-core.c          | 33 ++++++++++++++++
 block/genhd.c             | 82 +++++++++++++++++++++++++++++++++++++++
 include/linux/blk_types.h |  6 ++-
 include/linux/genhd.h     | 18 +++++++++
 5 files changed, 139 insertions(+), 2 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 1f2cc1fbe283..f6f135eb84b5 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -684,6 +684,8 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
 	bio_set_flag(bio, BIO_CLONED);
 	if (bio_flagged(bio_src, BIO_THROTTLED))
 		bio_set_flag(bio, BIO_THROTTLED);
+	if (bio_flagged(bio_src, BIO_INTERPOSED))
+		bio_set_flag(bio, BIO_INTERPOSED);
 	bio->bi_opf = bio_src->bi_opf;
 	bio->bi_ioprio = bio_src->bi_ioprio;
 	bio->bi_write_hint = bio_src->bi_write_hint;
diff --git a/block/blk-core.c b/block/blk-core.c
index 7663a9b94b80..c84bc42ba88b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1032,6 +1032,32 @@ static blk_qc_t __submit_bio_noacct_mq(struct bio *bio)
 	return ret;
 }
 
+static blk_qc_t __submit_bio_interposed(struct bio *bio)
+{
+	struct bio_list bio_list[2] = { };
+	blk_qc_t ret = BLK_QC_T_NONE;
+
+	current->bio_list = bio_list;
+	if (likely(bio_queue_enter(bio) == 0)) {
+		struct gendisk *disk = bio->bi_disk;
+
+		if (likely(blk_has_interposer(disk))) {
+			bio_set_flag(bio, BIO_INTERPOSED);
+			disk->interposer->ip_submit_bio(bio);
+		} else /* interposer was removed */
+			bio_list_add(&current->bio_list[0], bio);
+
+		blk_queue_exit(disk->queue);
+	}
+	current->bio_list = NULL;
+
+	/* Resubmit remaining bios */
+	while ((bio = bio_list_pop(&bio_list[0])))
+		ret = submit_bio_noacct(bio);
+
+	return ret;
+}
+
 /**
  * submit_bio_noacct - re-submit a bio to the block device layer for I/O
  * @bio:  The bio describing the location in memory and on the device.
@@ -1057,6 +1083,13 @@ blk_qc_t submit_bio_noacct(struct bio *bio)
 		return BLK_QC_T_NONE;
 	}
 
+	/*
+	 * Checking the BIO_INTERPOSED flag is necessary so that the bio
+	 * created by the blk_interposer do not get to it for processing.
+	 */
+	if (blk_has_interposer(bio->bi_disk) &&
+	    !bio_flagged(bio, BIO_INTERPOSED))
+		return __submit_bio_interposed(bio);
 	if (!bio->bi_disk->fops->submit_bio)
 		return __submit_bio_noacct_mq(bio);
 	return __submit_bio_noacct(bio);
diff --git a/block/genhd.c b/block/genhd.c
index 419548e92d82..39785a3ef703 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -30,6 +30,7 @@
 static struct kobject *block_depr;
 
 DECLARE_RWSEM(bdev_lookup_sem);
+DEFINE_MUTEX(bdev_interposer_mutex);
 
 /* for extended dynamic devt allocation, currently only one major is used */
 #define NR_EXT_DEVT		(1 << MINORBITS)
@@ -2148,3 +2149,84 @@ static void disk_release_events(struct gendisk *disk)
 	WARN_ON_ONCE(disk->ev && disk->ev->block != 1);
 	kfree(disk->ev);
 }
+
+/**
+ * blk_interposer_attach - Attach interposer to disk
+ * @disk: target disk
+ * @interposer: block device interposer
+ * @ip_submit_bio: hook for submit_bio()
+ *
+ * Returns:
+ *     -EINVAL if @interposer is NULL.
+ *     -EPERM if queue is not frozen.
+ *     -EBUSY if the block device already has @interposer.
+ *     -EALREADY if the block device already has @interposer with same callback.
+ *
+ * Disk must be frozen by blk_mq_freeze_queue().
+ */
+int blk_interposer_attach(struct gendisk *disk, struct blk_interposer *interposer,
+			  const ip_submit_bio_t ip_submit_bio)
+{
+	int ret = 0;
+
+	if (WARN_ON(!interposer))
+		return -EINVAL;
+
+	if (!blk_mq_is_queue_frozen(disk->queue))
+		return -EPERM;
+
+	mutex_lock(&bdev_interposer_mutex);
+	if (blk_has_interposer(disk)) {
+		if (disk->interposer->ip_submit_bio == ip_submit_bio)
+			ret = -EALREADY;
+		else
+			ret = -EBUSY;
+		goto out;
+	}
+
+	interposer->ip_submit_bio = ip_submit_bio;
+	interposer->disk = disk;
+
+	disk->interposer = interposer;
+out:
+	mutex_unlock(&bdev_interposer_mutex);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(blk_interposer_attach);
+
+/**
+ * blk_interposer_detach - Detach interposer from disk
+ * @interposer: block device interposer
+ * @ip_submit_bio: hook for submit_bio()
+ *
+ * Disk must be frozen by blk_mq_freeze_queue().
+ */
+void blk_interposer_detach(struct blk_interposer *interposer,
+			  const ip_submit_bio_t ip_submit_bio)
+{
+	struct gendisk *disk;
+
+	if (WARN_ON(!interposer))
+		return;
+
+	mutex_lock(&bdev_interposer_mutex);
+
+	/* Check if the interposer is still active. */
+	disk = interposer->disk;
+	if (WARN_ON(!disk))
+		goto out;
+
+	if (WARN_ON(!blk_mq_is_queue_frozen(disk->queue)))
+		goto out;
+
+	/* Check if it is really our interposer. */
+	if (WARN_ON(disk->interposer->ip_submit_bio != ip_submit_bio))
+		goto out;
+
+	disk->interposer = NULL;
+	interposer->disk = NULL;
+out:
+	mutex_unlock(&bdev_interposer_mutex);
+}
+EXPORT_SYMBOL_GPL(blk_interposer_detach);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 866f74261b3b..6c1351d7b73f 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -227,7 +227,7 @@ struct bio {
 						 * top bits REQ_OP. Use
 						 * accessors.
 						 */
-	unsigned short		bi_flags;	/* status, etc and bvec pool number */
+	unsigned int		bi_flags;	/* status, etc and bvec pool number */
 	unsigned short		bi_ioprio;
 	unsigned short		bi_write_hint;
 	blk_status_t		bi_status;
@@ -304,6 +304,8 @@ enum {
 				 * of this bio. */
 	BIO_CGROUP_ACCT,	/* has been accounted to a cgroup */
 	BIO_TRACKED,		/* set if bio goes through the rq_qos path */
+	BIO_INTERPOSED,		/* bio has been interposed and can be moved to
+				 * a different disk */
 	BIO_FLAG_LAST
 };
 
@@ -322,7 +324,7 @@ enum {
  * freed.
  */
 #define BVEC_POOL_BITS		(3)
-#define BVEC_POOL_OFFSET	(16 - BVEC_POOL_BITS)
+#define BVEC_POOL_OFFSET	(32 - BVEC_POOL_BITS)
 #define BVEC_POOL_IDX(bio)	((bio)->bi_flags >> BVEC_POOL_OFFSET)
 #if (1<< BVEC_POOL_BITS) < (BVEC_POOL_NR+1)
 # error "BVEC_POOL_BITS is too small"
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 809aaa32d53c..f68c8e83b4f1 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -134,6 +134,13 @@ struct blk_integrity {
 	unsigned char				tag_size;
 };
 
+typedef void (*ip_submit_bio_t) (struct bio *bio);
+
+struct blk_interposer {
+	ip_submit_bio_t ip_submit_bio;
+	struct gendisk *disk;
+};
+
 struct gendisk {
 	/* major, first_minor and minors are input parameters only,
 	 * don't use directly.  Use disk_devt() and disk_max_parts().
@@ -158,6 +165,7 @@ struct gendisk {
 
 	const struct block_device_operations *fops;
 	struct request_queue *queue;
+	struct blk_interposer *interposer;
 	void *private_data;
 
 	int flags;
@@ -346,4 +354,14 @@ static inline void printk_all_partitions(void)
 }
 #endif /* CONFIG_BLOCK */
 
+/*
+ * block layer interposer
+ */
+#define blk_has_interposer(d) ((d)->interposer != NULL)
+
+int blk_interposer_attach(struct gendisk *disk, struct blk_interposer *interposer,
+			  const ip_submit_bio_t ip_submit_bio);
+void blk_interposer_detach(struct blk_interposer *interposer,
+			   const ip_submit_bio_t ip_submit_bio);
+
 #endif /* _LINUX_GENHD_H */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 3/6] block: add blk_mq_is_queue_frozen()
  2021-02-03 15:53 [PATCH v4 0/6] block-layer interposer Sergei Shtepa
  2021-02-03 15:53 ` [PATCH v4 1/6] docs: device-mapper: add remap_and_filter Sergei Shtepa
  2021-02-03 15:53 ` [PATCH v4 2/6] block: add blk_interposer Sergei Shtepa
@ 2021-02-03 15:53 ` Sergei Shtepa
  2021-02-03 16:09   ` Mike Snitzer
  2021-02-03 15:53 ` [PATCH v4 4/6] dm: new ioctl DM_DEV_REMAP_CMD Sergei Shtepa
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Sergei Shtepa @ 2021-02-03 15:53 UTC (permalink / raw)
  To: Damien.LeMoal, snitzer, hare, ming.lei, agk, corbet, axboe, jack,
	johannes.thumshirn, gregkh, koct9i, steve, dm-devel, linux-block,
	linux-doc, linux-kernel
  Cc: sergei.shtepa, pavgel.tide

blk_mq_is_queue_frozen() allow to assert that the queue is frozen.

Signed-off-by: Sergei Shtepa <sergei.shtepa@veeam.com>
---
 block/blk-mq.c         | 13 +++++++++++++
 include/linux/blk-mq.h |  1 +
 2 files changed, 14 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f285a9123a8b..924ec26fae5f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -161,6 +161,19 @@ int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait_timeout);
 
+
+bool blk_mq_is_queue_frozen(struct request_queue *q)
+{
+	bool ret;
+
+	mutex_lock(&q->mq_freeze_lock);
+	ret = percpu_ref_is_dying(&q->q_usage_counter) && percpu_ref_is_zero(&q->q_usage_counter);
+	mutex_unlock(&q->mq_freeze_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(blk_mq_is_queue_frozen);
+
 /*
  * Guarantee no request is in use, so we can change any data structure of
  * the queue afterward.
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index d705b174d346..9d1e8c4e922e 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -525,6 +525,7 @@ void blk_freeze_queue_start(struct request_queue *q);
 void blk_mq_freeze_queue_wait(struct request_queue *q);
 int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
 				     unsigned long timeout);
+bool blk_mq_is_queue_frozen(struct request_queue *q);
 
 int blk_mq_map_queues(struct blk_mq_queue_map *qmap);
 void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 4/6] dm: new ioctl DM_DEV_REMAP_CMD
  2021-02-03 15:53 [PATCH v4 0/6] block-layer interposer Sergei Shtepa
                   ` (2 preceding siblings ...)
  2021-02-03 15:53 ` [PATCH v4 3/6] block: add blk_mq_is_queue_frozen() Sergei Shtepa
@ 2021-02-03 15:53 ` Sergei Shtepa
  2021-02-03 16:00   ` Greg KH
  2021-02-03 15:53 ` [PATCH v4 5/6] dm: add 'noexcl' option for dm-linear Sergei Shtepa
  2021-02-03 15:53 ` [PATCH v4 6/6] docs: device-mapper: " Sergei Shtepa
  5 siblings, 1 reply; 13+ messages in thread
From: Sergei Shtepa @ 2021-02-03 15:53 UTC (permalink / raw)
  To: Damien.LeMoal, snitzer, hare, ming.lei, agk, corbet, axboe, jack,
	johannes.thumshirn, gregkh, koct9i, steve, dm-devel, linux-block,
	linux-doc, linux-kernel
  Cc: sergei.shtepa, pavgel.tide

New ioctl DM_DEV_REMAP_CMD allow to remap bio requests
from regular block device to dm device.

Signed-off-by: Sergei Shtepa <sergei.shtepa@veeam.com>
---
 drivers/md/dm-core.h          |  20 ++
 drivers/md/dm-ioctl.c         |  39 ++++
 drivers/md/dm.c               | 376 +++++++++++++++++++++++++++++++++-
 include/uapi/linux/dm-ioctl.h |  15 +-
 4 files changed, 438 insertions(+), 12 deletions(-)

diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index 086d293c2b03..7ef4c44609dc 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -13,6 +13,7 @@
 #include <linux/ktime.h>
 #include <linux/genhd.h>
 #include <linux/blk-mq.h>
+#include <linux/rbtree.h>
 
 #include <trace/events/block.h>
 
@@ -109,6 +110,9 @@ struct mapped_device {
 	bool init_tio_pdu:1;
 
 	struct srcu_struct io_barrier;
+
+	/* interposer device for remap */
+	struct dm_interposed_dev *ip_dev;
 };
 
 void disable_discard(struct mapped_device *md);
@@ -164,6 +168,22 @@ struct dm_table {
 	struct dm_md_mempools *mempools;
 };
 
+struct dm_rb_range {
+	struct rb_node node;
+	sector_t start;		/* start sector of rb node */
+	sector_t last;		/* end sector of rb node */
+	sector_t _subtree_last; /* highest sector in subtree of rb node */
+};
+
+void dm_rb_insert(struct dm_rb_range *node, struct rb_root_cached *root);
+void dm_rb_remove(struct dm_rb_range *node, struct rb_root_cached *root);
+
+struct dm_rb_range *dm_rb_iter_first(struct rb_root_cached *root, sector_t start, sector_t last);
+struct dm_rb_range *dm_rb_iter_next(struct dm_rb_range *node, sector_t start, sector_t last);
+
+int dm_remap_install(struct mapped_device *md, const char *donor_device_name);
+int dm_remap_uninstall(struct mapped_device *md);
+
 static inline struct completion *dm_get_completion_from_kobject(struct kobject *kobj)
 {
 	return &container_of(kobj, struct dm_kobject_holder, kobj)->completion;
diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index 5e306bba4375..2944d442c256 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1649,6 +1649,44 @@ static int target_message(struct file *filp, struct dm_ioctl *param, size_t para
 	return r;
 }
 
+static inline int dev_remap_start(struct mapped_device *md, uint8_t *params)
+{
+	char *donor_device_name = (char *)params;
+
+	return dm_remap_install(md, donor_device_name);
+}
+static int dev_remap_finish(struct mapped_device *md)
+{
+	return dm_remap_uninstall(md);
+}
+
+static int dev_remap(struct file *filp, struct dm_ioctl *param, size_t param_size)
+{
+	int ret = 0;
+	struct mapped_device *md;
+	void *bin_data;
+	struct dm_remap_param *remap_param;
+
+	md = find_device(param);
+	if (!md)
+		return -ENXIO;
+
+	bin_data = (void *)(param) + param->data_start;
+	remap_param = bin_data;
+
+	if (remap_param->cmd == REMAP_START_CMD)
+		ret = dev_remap_start(md, remap_param->params);
+	else if (remap_param->cmd == REMAP_FINISH_CMD)
+		ret = dev_remap_finish(md);
+	else {
+		DMWARN("Invalid remap command, %d", remap_param->cmd);
+		ret = -EINVAL;
+	}
+
+	dm_put(md);
+	return ret;
+}
+
 /*
  * The ioctl parameter block consists of two parts, a dm_ioctl struct
  * followed by a data buffer.  This flag is set if the second part,
@@ -1691,6 +1729,7 @@ static ioctl_fn lookup_ioctl(unsigned int cmd, int *ioctl_flags)
 		{DM_DEV_SET_GEOMETRY_CMD, 0, dev_set_geometry},
 		{DM_DEV_ARM_POLL, IOCTL_FLAGS_NO_PARAMS, dev_arm_poll},
 		{DM_GET_TARGET_VERSION, 0, get_target_version},
+		{DM_DEV_REMAP_CMD, 0, dev_remap},
 	};
 
 	if (unlikely(cmd >= ARRAY_SIZE(_ioctls)))
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 7bac564f3faa..0b184dfec233 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -28,6 +28,7 @@
 #include <linux/refcount.h>
 #include <linux/part_stat.h>
 #include <linux/blk-crypto.h>
+#include <linux/interval_tree_generic.h>
 
 #define DM_MSG_PREFIX "core"
 
@@ -56,6 +57,8 @@ static struct workqueue_struct *deferred_remove_workqueue;
 atomic_t dm_global_event_nr = ATOMIC_INIT(0);
 DECLARE_WAIT_QUEUE_HEAD(dm_global_eventq);
 
+static DEFINE_MUTEX(dm_interposer_attach_lock);
+
 void dm_issue_global_event(void)
 {
 	atomic_inc(&dm_global_event_nr);
@@ -162,6 +165,37 @@ struct table_device {
 	struct dm_dev dm_dev;
 };
 
+/*
+ * Device mapper's interposer.
+ */
+struct dm_interposer {
+	struct blk_interposer blk_ip;
+	struct mapped_device *md;
+
+	struct kref kref;
+	struct rw_semaphore ip_devs_lock;
+	struct rb_root_cached ip_devs_root; /* dm_interposed_dev tree */
+};
+
+typedef void (*dm_interpose_bio_t) (void *context, struct dm_rb_range *node,  struct bio *bio);
+
+struct dm_interposed_dev {
+	struct gendisk *disk;
+	struct dm_rb_range node;
+	void *context;
+	dm_interpose_bio_t dm_interpose_bio;
+
+	atomic64_t ip_cnt; /*for debug purpose*/
+};
+
+/*
+ * Interval tree for device mapper
+ */
+#define START(node) ((node)->start)
+#define LAST(node) ((node)->last)
+INTERVAL_TREE_DEFINE(struct dm_rb_range, node, sector_t, _subtree_last,
+		     START, LAST,, dm_rb);
+
 /*
  * Bio-based DM's mempools' reserved IOs set by the user.
  */
@@ -733,28 +767,342 @@ static void dm_put_live_table_fast(struct mapped_device *md) __releases(RCU)
 	rcu_read_unlock();
 }
 
+static void dm_submit_bio_interposer_fn(struct bio *bio)
+{
+	struct dm_interposer *ip;
+	unsigned int noio_flag = 0;
+	sector_t start;
+	sector_t last;
+	struct dm_rb_range *node;
+
+	ip = container_of(bio->bi_disk->interposer, struct dm_interposer, blk_ip);
+	start = bio->bi_iter.bi_sector;
+	last = start + dm_sector_div_up(bio->bi_iter.bi_size, SECTOR_SIZE);
+
+	noio_flag = memalloc_noio_save();
+	down_read(&ip->ip_devs_lock);
+	node = dm_rb_iter_first(&ip->ip_devs_root, start, last);
+	while (node) {
+		struct dm_interposed_dev *ip_dev =
+			container_of(node, struct dm_interposed_dev, node);
+
+		atomic64_inc(&ip_dev->ip_cnt);
+		ip_dev->dm_interpose_bio(ip_dev->context, node, bio);
+
+		node = dm_rb_iter_next(node, start, last);
+	}
+	up_read(&ip->ip_devs_lock);
+	memalloc_noio_restore(noio_flag);
+}
+
+static void dm_interposer_free(struct kref *kref)
+{
+	struct dm_interposer *ip = container_of(kref, struct dm_interposer, kref);
+
+	blk_interposer_detach(&ip->blk_ip, dm_submit_bio_interposer_fn);
+
+	kfree(ip);
+}
+
+static struct dm_interposer *dm_interposer_new(struct gendisk *disk)
+{
+	int ret = 0;
+	struct dm_interposer *ip;
+
+	ip = kzalloc(sizeof(struct dm_interposer), GFP_NOIO);
+	if (!ip)
+		return ERR_PTR(-ENOMEM);
+
+	kref_init(&ip->kref);
+	init_rwsem(&ip->ip_devs_lock);
+	ip->ip_devs_root = RB_ROOT_CACHED;
+
+	ret = blk_interposer_attach(disk, &ip->blk_ip, dm_submit_bio_interposer_fn);
+	if (ret) {
+		DMERR("Failed to attack blk_interposer");
+		kref_put(&ip->kref, dm_interposer_free);
+		return ERR_PTR(ret);
+	}
+
+	return ip;
+}
+
+static struct dm_interposer *dm_interposer_get(struct gendisk *disk)
+{
+	struct dm_interposer *ip;
+
+	if (!blk_has_interposer(disk))
+		return NULL;
+
+	if (disk->interposer->ip_submit_bio != dm_submit_bio_interposer_fn) {
+		DMERR("Disks interposer slot already occupied.");
+		return ERR_PTR(-EBUSY);
+	}
+
+	ip = container_of(disk->interposer, struct dm_interposer, blk_ip);
+
+	kref_get(&ip->kref);
+	return ip;
+}
+
+static struct dm_interposed_dev *dm_interposer_new_dev(struct gendisk *disk,
+						       sector_t ofs, sector_t len,
+						       void *context,
+						       dm_interpose_bio_t dm_interpose_bio)
+{
+	sector_t start = ofs;
+	sector_t last =  ofs + len - 1;
+	struct dm_interposed_dev *ip_dev = NULL;
+
+	/* Allocate new ip_dev */
+	ip_dev = kzalloc(sizeof(struct dm_interposed_dev), GFP_KERNEL);
+	if (!ip_dev)
+		return NULL;
+
+	ip_dev->disk = disk;
+	ip_dev->node.start = start;
+	ip_dev->node.last = last;
+
+	ip_dev->context = context;
+	ip_dev->dm_interpose_bio = dm_interpose_bio;
+
+	atomic64_set(&ip_dev->ip_cnt, 0);
+
+	return ip_dev;
+}
+
+static inline void dm_interposer_free_dev(struct dm_interposed_dev *ip_dev)
+{
+	kfree(ip_dev);
+}
+
+static inline void dm_disk_freeze(struct gendisk *disk)
+{
+	blk_mq_freeze_queue(disk->queue);
+	blk_mq_quiesce_queue(disk->queue);
+}
+
+static inline void dm_disk_unfreeze(struct gendisk *disk)
+{
+	blk_mq_unquiesce_queue(disk->queue);
+	blk_mq_unfreeze_queue(disk->queue);
+}
+
+static int dm_interposer_attach_dev(struct dm_interposed_dev *ip_dev)
+{
+	int ret = 0;
+	struct dm_interposer *ip = NULL;
+	unsigned int noio_flag = 0;
+
+	if (!ip_dev)
+		return -EINVAL;
+
+	dm_disk_freeze(ip_dev->disk);
+	mutex_lock(&dm_interposer_attach_lock);
+	noio_flag = memalloc_noio_save();
+
+	ip = dm_interposer_get(ip_dev->disk);
+	if (ip == NULL)
+		ip = dm_interposer_new(ip_dev->disk);
+	if (IS_ERR(ip)) {
+		ret = PTR_ERR(ip);
+		goto out;
+	}
+
+	/* Attach dm_interposed_dev to dm_interposer */
+	down_write(&ip->ip_devs_lock);
+	do {
+		struct dm_rb_range *node;
+
+		/* checking that ip_dev already exists for this region */
+		node = dm_rb_iter_first(&ip->ip_devs_root, ip_dev->node.start, ip_dev->node.last);
+		if (node) {
+			DMERR("Disk part form [%llu] to [%llu] already have interposer",
+			      node->start, node->last);
+
+			ret = -EBUSY;
+			break;
+		}
+
+		/* insert ip_dev to ip tree */
+		dm_rb_insert(&ip_dev->node, &ip->ip_devs_root);
+		/* increment ip reference counter */
+		kref_get(&ip->kref);
+	} while (false);
+	up_write(&ip->ip_devs_lock);
+
+	kref_put(&ip->kref, dm_interposer_free);
+
+out:
+	memalloc_noio_restore(noio_flag);
+	mutex_unlock(&dm_interposer_attach_lock);
+	dm_disk_unfreeze(ip_dev->disk);
+
+	return ret;
+}
+
+static int dm_interposer_detach_dev(struct dm_interposed_dev *ip_dev)
+{
+	int ret = 0;
+	struct dm_interposer *ip = NULL;
+	unsigned int noio_flag = 0;
+
+	if (!ip_dev)
+		return -EINVAL;
+
+	dm_disk_freeze(ip_dev->disk);
+	mutex_lock(&dm_interposer_attach_lock);
+	noio_flag = memalloc_noio_save();
+
+	ip = dm_interposer_get(ip_dev->disk);
+	if (IS_ERR(ip)) {
+		ret = PTR_ERR(ip);
+		DMERR("Interposer not found");
+		goto out;
+	}
+	if (unlikely(ip == NULL)) {
+		ret = -ENXIO;
+		DMERR("Interposer not found");
+		goto out;
+	}
+
+	down_write(&ip->ip_devs_lock);
+	do {
+		dm_rb_remove(&ip_dev->node, &ip->ip_devs_root);
+		/* the reference counter here cannot be zero */
+		kref_put(&ip->kref, dm_interposer_free);
+
+	} while (false);
+	up_write(&ip->ip_devs_lock);
+
+	/* detach and free interposer if it's not needed */
+	kref_put(&ip->kref, dm_interposer_free);
+out:
+	memalloc_noio_restore(noio_flag);
+	mutex_unlock(&dm_interposer_attach_lock);
+	dm_disk_unfreeze(ip_dev->disk);
+
+	return ret;
+}
+
+static void dm_remap_fn(void *context, struct dm_rb_range *node, struct bio *bio)
+{
+	struct mapped_device *md = context;
+
+	/* Set acceptor device. */
+	bio->bi_disk = md->disk;
+
+	/* Remap disks offset */
+	bio->bi_iter.bi_sector -= node->start;
+
+	/*
+	 * bio should be resubmitted.
+	 * We can just add bio to bio_list of the current process.
+	 * current->bio_list must be initialized when this function is called.
+	 * If call submit_bio_noacct(), the bio will be checked twice.
+	 */
+	BUG_ON(!current->bio_list);
+	bio_list_add(&current->bio_list[0], bio);
+}
+
+int dm_remap_install(struct mapped_device *md, const char *donor_device_name)
+{
+	int ret = 0;
+	struct block_device *donor_bdev;
+	fmode_t mode = FMODE_READ | FMODE_WRITE;
+
+	DMDEBUG("Dm remap install for mapped device %s and donor device %s",
+		md->name, donor_device_name);
+
+	donor_bdev = blkdev_get_by_path(donor_device_name, mode, "device-mapper remap");
+	if (IS_ERR(donor_bdev)) {
+		DMERR("Cannot open device [%s]", donor_device_name);
+		return PTR_ERR(donor_bdev);
+	}
+
+	do {
+		sector_t ofs = get_start_sect(donor_bdev);
+		sector_t len = bdev_nr_sectors(donor_bdev);
+
+		md->ip_dev = dm_interposer_new_dev(donor_bdev->bd_disk, ofs, len, md, dm_remap_fn);
+		if (!md->ip_dev) {
+			ret = -ENOMEM;
+			break;
+		}
+
+		DMDEBUG("New interposed device 0x%p", md->ip_dev);
+		ret = dm_interposer_attach_dev(md->ip_dev);
+		if (ret) {
+			dm_interposer_free_dev(md->ip_dev);
+
+			md->ip_dev = NULL;
+			DMERR("Failed to attach dm interposer");
+			break;
+		}
+
+		DMDEBUG("Attached successfully.");
+	} while (false);
+
+	blkdev_put(donor_bdev, mode);
+
+	return ret;
+}
+
+int dm_remap_uninstall(struct mapped_device *md)
+{
+	int ret = 0;
+
+	DMDEBUG("Dm remap uninstall for mapped device %s ip_dev=0x%p", md->name, md->ip_dev);
+
+	if (!md->ip_dev) {
+		DMERR("Cannot detach dm interposer");
+		return -EINVAL;
+	}
+
+	ret = dm_interposer_detach_dev(md->ip_dev);
+	if (ret) {
+		DMERR("Failed to detach dm interposer");
+		return ret;
+	}
+
+	DMDEBUG("Detached successfully. %llu bios was interposed",
+		atomic64_read(&md->ip_dev->ip_cnt));
+	dm_interposer_free_dev(md->ip_dev);
+	md->ip_dev = NULL;
+
+	return 0;
+}
+
 static char *_dm_claim_ptr = "I belong to device-mapper";
 
 /*
  * Open a table device so we can use it as a map destination.
  */
 static int open_table_device(struct table_device *td, dev_t dev,
-			     struct mapped_device *md)
+			     struct mapped_device *md, bool non_exclusive)
 {
 	struct block_device *bdev;
-
-	int r;
+	int ret;
 
 	BUG_ON(td->dm_dev.bdev);
 
-	bdev = blkdev_get_by_dev(dev, td->dm_dev.mode | FMODE_EXCL, _dm_claim_ptr);
-	if (IS_ERR(bdev))
-		return PTR_ERR(bdev);
+	if (non_exclusive)
+		bdev = blkdev_get_by_dev(dev, td->dm_dev.mode, NULL);
+	else
+		bdev = blkdev_get_by_dev(dev, td->dm_dev.mode | FMODE_EXCL, _dm_claim_ptr);
 
-	r = bd_link_disk_holder(bdev, dm_disk(md));
-	if (r) {
-		blkdev_put(bdev, td->dm_dev.mode | FMODE_EXCL);
-		return r;
+	if (IS_ERR(bdev)) {
+		ret = PTR_ERR(bdev);
+		if (ret != -EBUSY)
+			return ret;
+	}
+
+	if (!non_exclusive) {
+		ret = bd_link_disk_holder(bdev, dm_disk(md));
+		if (ret) {
+			blkdev_put(bdev, td->dm_dev.mode);
+			return ret;
+		}
 	}
 
 	td->dm_dev.bdev = bdev;
@@ -2182,6 +2530,14 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
 
 	might_sleep();
 
+	if (md->ip_dev) {
+		if (dm_interposer_detach_dev(md->ip_dev))
+			DMERR("Failed to detach dm interposer");
+
+		dm_interposer_free_dev(md->ip_dev);
+		md->ip_dev = NULL;
+	}
+
 	spin_lock(&_minor_lock);
 	idr_replace(&_minor_idr, MINOR_ALLOCED, MINOR(disk_devt(dm_disk(md))));
 	set_bit(DMF_FREEING, &md->flags);
diff --git a/include/uapi/linux/dm-ioctl.h b/include/uapi/linux/dm-ioctl.h
index 4933b6b67b85..08d7dbff80f4 100644
--- a/include/uapi/linux/dm-ioctl.h
+++ b/include/uapi/linux/dm-ioctl.h
@@ -214,6 +214,15 @@ struct dm_target_msg {
 	char message[0];
 };
 
+enum {
+	REMAP_START_CMD = 1,
+	REMAP_FINISH_CMD,
+};
+
+struct dm_remap_param {
+	uint8_t cmd;
+	uint8_t params[0];
+};
 /*
  * If you change this make sure you make the corresponding change
  * to dm-ioctl.c:lookup_ioctl()
@@ -244,6 +253,7 @@ enum {
 	DM_DEV_SET_GEOMETRY_CMD,
 	DM_DEV_ARM_POLL_CMD,
 	DM_GET_TARGET_VERSION_CMD,
+	DM_DEV_REMAP_CMD
 };
 
 #define DM_IOCTL 0xfd
@@ -259,6 +269,7 @@ enum {
 #define DM_DEV_STATUS    _IOWR(DM_IOCTL, DM_DEV_STATUS_CMD, struct dm_ioctl)
 #define DM_DEV_WAIT      _IOWR(DM_IOCTL, DM_DEV_WAIT_CMD, struct dm_ioctl)
 #define DM_DEV_ARM_POLL  _IOWR(DM_IOCTL, DM_DEV_ARM_POLL_CMD, struct dm_ioctl)
+#define DM_DEV_REMAP     _IOWR(DM_IOCTL, DM_DEV_REMAP_CMD, struct dm_ioctl)
 
 #define DM_TABLE_LOAD    _IOWR(DM_IOCTL, DM_TABLE_LOAD_CMD, struct dm_ioctl)
 #define DM_TABLE_CLEAR   _IOWR(DM_IOCTL, DM_TABLE_CLEAR_CMD, struct dm_ioctl)
@@ -272,9 +283,9 @@ enum {
 #define DM_DEV_SET_GEOMETRY	_IOWR(DM_IOCTL, DM_DEV_SET_GEOMETRY_CMD, struct dm_ioctl)
 
 #define DM_VERSION_MAJOR	4
-#define DM_VERSION_MINOR	43
+#define DM_VERSION_MINOR	44
 #define DM_VERSION_PATCHLEVEL	0
-#define DM_VERSION_EXTRA	"-ioctl (2020-10-01)"
+#define DM_VERSION_EXTRA	"-ioctl (2020-12-25)"
 
 /* Status bits */
 #define DM_READONLY_FLAG	(1 << 0) /* In/Out */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 5/6] dm: add 'noexcl' option for dm-linear
  2021-02-03 15:53 [PATCH v4 0/6] block-layer interposer Sergei Shtepa
                   ` (3 preceding siblings ...)
  2021-02-03 15:53 ` [PATCH v4 4/6] dm: new ioctl DM_DEV_REMAP_CMD Sergei Shtepa
@ 2021-02-03 15:53 ` Sergei Shtepa
  2021-02-03 15:53 ` [PATCH v4 6/6] docs: device-mapper: " Sergei Shtepa
  5 siblings, 0 replies; 13+ messages in thread
From: Sergei Shtepa @ 2021-02-03 15:53 UTC (permalink / raw)
  To: Damien.LeMoal, snitzer, hare, ming.lei, agk, corbet, axboe, jack,
	johannes.thumshirn, gregkh, koct9i, steve, dm-devel, linux-block,
	linux-doc, linux-kernel
  Cc: sergei.shtepa, pavgel.tide

The 'noexcl' option allow to open underlying block-device
without FMODE_EXCL.

Signed-off-by: Sergei Shtepa <sergei.shtepa@veeam.com>
---
 drivers/md/dm-linear.c        | 14 +++++++++++++-
 drivers/md/dm-table.c         | 12 +++++++-----
 drivers/md/dm.c               | 24 +++++++++++++++++-------
 drivers/md/dm.h               |  2 +-
 include/linux/device-mapper.h |  7 +++++++
 5 files changed, 45 insertions(+), 14 deletions(-)

diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 00774b5d7668..b16d89802b9d 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -33,7 +33,7 @@ static int linear_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	char dummy;
 	int ret;
 
-	if (argc != 2) {
+	if ((argc < 2) || (argc > 3)) {
 		ti->error = "Invalid argument count";
 		return -EINVAL;
 	}
@@ -51,6 +51,18 @@ static int linear_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	}
 	lc->start = tmp;
 
+	ti->non_exclusive = false;
+	if (argc > 2) {
+		if (strcmp("noexcl", argv[2]) == 0)
+			ti->non_exclusive = true;
+		else if (strcmp("excl", argv[2]) == 0)
+			ti->non_exclusive = false;
+		else {
+			ti->error = "Invalid exclusive option";
+			return -EINVAL;
+		}
+	}
+
 	ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table), &lc->dev);
 	if (ret) {
 		ti->error = "Device lookup failed";
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 4acf2342f7ad..f15bc2171f25 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -322,7 +322,7 @@ static int device_area_is_invalid(struct dm_target *ti, struct dm_dev *dev,
  * device and not to touch the existing bdev field in case
  * it is accessed concurrently.
  */
-static int upgrade_mode(struct dm_dev_internal *dd, fmode_t new_mode,
+static int upgrade_mode(struct dm_dev_internal *dd, fmode_t new_mode, bool non_exclusive,
 			struct mapped_device *md)
 {
 	int r;
@@ -330,7 +330,7 @@ static int upgrade_mode(struct dm_dev_internal *dd, fmode_t new_mode,
 
 	old_dev = dd->dm_dev;
 
-	r = dm_get_table_device(md, dd->dm_dev->bdev->bd_dev,
+	r = dm_get_table_device(md, dd->dm_dev->bdev->bd_dev, non_exclusive,
 				dd->dm_dev->mode | new_mode, &new_dev);
 	if (r)
 		return r;
@@ -387,7 +387,8 @@ int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
 		if (!dd)
 			return -ENOMEM;
 
-		if ((r = dm_get_table_device(t->md, dev, mode, &dd->dm_dev))) {
+		r = dm_get_table_device(t->md, dev, mode, ti->non_exclusive, &dd->dm_dev);
+		if (r) {
 			kfree(dd);
 			return r;
 		}
@@ -396,8 +397,9 @@ int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
 		list_add(&dd->list, &t->devices);
 		goto out;
 
-	} else if (dd->dm_dev->mode != (mode | dd->dm_dev->mode)) {
-		r = upgrade_mode(dd, mode, t->md);
+	} else if ((dd->dm_dev->mode != (mode | dd->dm_dev->mode)) &&
+		   (dd->dm_dev->non_exclusive != ti->non_exclusive)) {
+		r = upgrade_mode(dd, mode, ti->non_exclusive, t->md);
 		if (r)
 			return r;
 	}
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 0b184dfec233..c34c75467399 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1107,6 +1107,7 @@ static int open_table_device(struct table_device *td, dev_t dev,
 
 	td->dm_dev.bdev = bdev;
 	td->dm_dev.dax_dev = dax_get_by_host(bdev->bd_disk->disk_name);
+	td->dm_dev.non_exclusive = non_exclusive;
 	return 0;
 }
 
@@ -1118,33 +1119,41 @@ static void close_table_device(struct table_device *td, struct mapped_device *md
 	if (!td->dm_dev.bdev)
 		return;
 
-	bd_unlink_disk_holder(td->dm_dev.bdev, dm_disk(md));
-	blkdev_put(td->dm_dev.bdev, td->dm_dev.mode | FMODE_EXCL);
+	if (td->dm_dev.non_exclusive)
+		blkdev_put(td->dm_dev.bdev, td->dm_dev.mode);
+	else {
+		bd_unlink_disk_holder(td->dm_dev.bdev, dm_disk(md));
+		blkdev_put(td->dm_dev.bdev, td->dm_dev.mode | FMODE_EXCL);
+	}
+
 	put_dax(td->dm_dev.dax_dev);
 	td->dm_dev.bdev = NULL;
 	td->dm_dev.dax_dev = NULL;
+	td->dm_dev.non_exclusive = false;
 }
 
 static struct table_device *find_table_device(struct list_head *l, dev_t dev,
-					      fmode_t mode)
+					      fmode_t mode, bool non_exclusive)
 {
 	struct table_device *td;
 
 	list_for_each_entry(td, l, list)
-		if (td->dm_dev.bdev->bd_dev == dev && td->dm_dev.mode == mode)
+		if (td->dm_dev.bdev->bd_dev == dev &&
+		    td->dm_dev.mode == mode &&
+		    td->dm_dev.non_exclusive == non_exclusive)
 			return td;
 
 	return NULL;
 }
 
-int dm_get_table_device(struct mapped_device *md, dev_t dev, fmode_t mode,
+int dm_get_table_device(struct mapped_device *md, dev_t dev, fmode_t mode, bool non_exclusive,
 			struct dm_dev **result)
 {
 	int r;
 	struct table_device *td;
 
 	mutex_lock(&md->table_devices_lock);
-	td = find_table_device(&md->table_devices, dev, mode);
+	td = find_table_device(&md->table_devices, dev, mode, non_exclusive);
 	if (!td) {
 		td = kmalloc_node(sizeof(*td), GFP_KERNEL, md->numa_node_id);
 		if (!td) {
@@ -1155,7 +1164,8 @@ int dm_get_table_device(struct mapped_device *md, dev_t dev, fmode_t mode,
 		td->dm_dev.mode = mode;
 		td->dm_dev.bdev = NULL;
 
-		if ((r = open_table_device(td, dev, md))) {
+		r = open_table_device(td, dev, md, non_exclusive);
+		if (r) {
 			mutex_unlock(&md->table_devices_lock);
 			kfree(td);
 			return r;
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index fffe1e289c53..7bf20fb2de74 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -179,7 +179,7 @@ int dm_open_count(struct mapped_device *md);
 int dm_lock_for_deletion(struct mapped_device *md, bool mark_deferred, bool only_deferred);
 int dm_cancel_deferred_remove(struct mapped_device *md);
 int dm_request_based(struct mapped_device *md);
-int dm_get_table_device(struct mapped_device *md, dev_t dev, fmode_t mode,
+int dm_get_table_device(struct mapped_device *md, dev_t dev, fmode_t mode, bool non_exclusive,
 			struct dm_dev **result);
 void dm_put_table_device(struct mapped_device *md, struct dm_dev *d);
 
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 61a66fb8ebb3..70002363bfc0 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -150,6 +150,7 @@ struct dm_dev {
 	struct block_device *bdev;
 	struct dax_device *dax_dev;
 	fmode_t mode;
+	bool non_exclusive;
 	char name[16];
 };
 
@@ -325,6 +326,12 @@ struct dm_target {
 	 * whether or not its underlying devices have support.
 	 */
 	bool discards_supported:1;
+
+	/*
+	 * Set if this target needs to open device without FMODE_EXCL
+	 * mode.
+	 */
+	bool non_exclusive:1;
 };
 
 void *dm_per_bio_data(struct bio *bio, size_t data_size);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 6/6] docs: device-mapper: 'noexcl' option for dm-linear
  2021-02-03 15:53 [PATCH v4 0/6] block-layer interposer Sergei Shtepa
                   ` (4 preceding siblings ...)
  2021-02-03 15:53 ` [PATCH v4 5/6] dm: add 'noexcl' option for dm-linear Sergei Shtepa
@ 2021-02-03 15:53 ` Sergei Shtepa
  5 siblings, 0 replies; 13+ messages in thread
From: Sergei Shtepa @ 2021-02-03 15:53 UTC (permalink / raw)
  To: Damien.LeMoal, snitzer, hare, ming.lei, agk, corbet, axboe, jack,
	johannes.thumshirn, gregkh, koct9i, steve, dm-devel, linux-block,
	linux-doc, linux-kernel
  Cc: sergei.shtepa, pavgel.tide

New 'noexcl' option allow to open underlying block-device
without FMODE_EXCL flag.

Signed-off-by: Sergei Shtepa <sergei.shtepa@veeam.com>
---
 .../admin-guide/device-mapper/linear.rst      | 26 ++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/device-mapper/linear.rst b/Documentation/admin-guide/device-mapper/linear.rst
index 9d17fc6e64a9..f035cd7ad78c 100644
--- a/Documentation/admin-guide/device-mapper/linear.rst
+++ b/Documentation/admin-guide/device-mapper/linear.rst
@@ -6,12 +6,22 @@ Device-Mapper's "linear" target maps a linear range of the Device-Mapper
 device onto a linear range of another device.  This is the basic building
 block of logical volume managers.
 
-Parameters: <dev path> <offset>
+Parameters: <dev path> <offset> [<options>]
     <dev path>:
-	Full pathname to the underlying block-device, or a
+        Full pathname to the underlying block-device, or a
         "major:minor" device-number.
     <offset>:
-	Starting sector within the device.
+        Starting sector within the device.
+    <options>:
+        Options allow to set the exclusivity mode. The exclusivity mode
+        can be 'excl' and 'noexcl'. By default, then options is not set,
+        the 'excl' mode is used. 'noexcl' mode allow to open device
+        without FMODE_EXCL flag. This allow to create liner device with
+        underlying block-device that are already used by the system. For
+        example, the file system on this device is already mounted.
+        The 'noexcl' option should be used when creating dm devices that
+        will be used as acceptor when connecting the device mapper to an
+        existing block device with the 'dmsetup remap' command.
 
 
 Example scripts
@@ -61,3 +71,13 @@ Example scripts
   }
 
   `echo \"$table\" | dmsetup create $name`;
+
+::
+
+  #!/bin/sh
+  # Create linear device and remap all requests from the original device
+  # to new linear.
+  DEV=$1
+
+  echo "0 `blockdev --getsz $DEV` linear $DEV 0 noexcl" | dmsetup create dm-noexcl
+  dmsetup remap start dm-noexcl $DEV
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 4/6] dm: new ioctl DM_DEV_REMAP_CMD
  2021-02-03 15:53 ` [PATCH v4 4/6] dm: new ioctl DM_DEV_REMAP_CMD Sergei Shtepa
@ 2021-02-03 16:00   ` Greg KH
  0 siblings, 0 replies; 13+ messages in thread
From: Greg KH @ 2021-02-03 16:00 UTC (permalink / raw)
  To: Sergei Shtepa
  Cc: Damien.LeMoal, snitzer, hare, ming.lei, agk, corbet, axboe, jack,
	johannes.thumshirn, koct9i, steve, dm-devel, linux-block,
	linux-doc, linux-kernel, pavgel.tide

On Wed, Feb 03, 2021 at 06:53:56PM +0300, Sergei Shtepa wrote:
> --- a/include/uapi/linux/dm-ioctl.h
> +++ b/include/uapi/linux/dm-ioctl.h
> @@ -214,6 +214,15 @@ struct dm_target_msg {
>  	char message[0];
>  };
>  
> +enum {
> +	REMAP_START_CMD = 1,
> +	REMAP_FINISH_CMD,

Don't you need to say what REMAP_FINISH_CMD is explicitly?

> +};
> +
> +struct dm_remap_param {
> +	uint8_t cmd;
> +	uint8_t params[0];

These need to be __u8, "uint8_t" is not a valid type that crosses the
user/kernel boundry.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 3/6] block: add blk_mq_is_queue_frozen()
  2021-02-03 15:53 ` [PATCH v4 3/6] block: add blk_mq_is_queue_frozen() Sergei Shtepa
@ 2021-02-03 16:09   ` Mike Snitzer
  0 siblings, 0 replies; 13+ messages in thread
From: Mike Snitzer @ 2021-02-03 16:09 UTC (permalink / raw)
  To: Sergei Shtepa
  Cc: Damien.LeMoal, hare, ming.lei, agk, corbet, axboe, jack,
	johannes.thumshirn, gregkh, koct9i, steve, dm-devel, linux-block,
	linux-doc, linux-kernel, pavgel.tide

On Wed, Feb 03 2021 at 10:53am -0500,
Sergei Shtepa <sergei.shtepa@veeam.com> wrote:

> blk_mq_is_queue_frozen() allow to assert that the queue is frozen.
> 
> Signed-off-by: Sergei Shtepa <sergei.shtepa@veeam.com>
> ---
>  block/blk-mq.c         | 13 +++++++++++++
>  include/linux/blk-mq.h |  1 +
>  2 files changed, 14 insertions(+)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index f285a9123a8b..924ec26fae5f 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -161,6 +161,19 @@ int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
>  }
>  EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait_timeout);
>  
> +
> +bool blk_mq_is_queue_frozen(struct request_queue *q)
> +{
> +	bool ret;
> +
> +	mutex_lock(&q->mq_freeze_lock);
> +	ret = percpu_ref_is_dying(&q->q_usage_counter) && percpu_ref_is_zero(&q->q_usage_counter);
> +	mutex_unlock(&q->mq_freeze_lock);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(blk_mq_is_queue_frozen);
> +
>  /*
>   * Guarantee no request is in use, so we can change any data structure of
>   * the queue afterward.
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index d705b174d346..9d1e8c4e922e 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -525,6 +525,7 @@ void blk_freeze_queue_start(struct request_queue *q);
>  void blk_mq_freeze_queue_wait(struct request_queue *q);
>  int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
>  				     unsigned long timeout);
> +bool blk_mq_is_queue_frozen(struct request_queue *q);
>  
>  int blk_mq_map_queues(struct blk_mq_queue_map *qmap);
>  void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues);
> -- 
> 2.20.1
> 

This needs to come before patch 2 (since patch 2 uses it).

Mike


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 2/6] block: add blk_interposer
  2021-02-03 15:53 ` [PATCH v4 2/6] block: add blk_interposer Sergei Shtepa
@ 2021-02-03 16:18   ` Mike Snitzer
  2021-02-04 10:06     ` Sergei Shtepa
  0 siblings, 1 reply; 13+ messages in thread
From: Mike Snitzer @ 2021-02-03 16:18 UTC (permalink / raw)
  To: Sergei Shtepa
  Cc: Damien.LeMoal, hare, ming.lei, agk, corbet, axboe, jack,
	johannes.thumshirn, gregkh, koct9i, steve, dm-devel, linux-block,
	linux-doc, linux-kernel, pavgel.tide

On Wed, Feb 03 2021 at 10:53am -0500,
Sergei Shtepa <sergei.shtepa@veeam.com> wrote:

> blk_interposer allows to intercept bio requests, remap bio to another devices or add new bios.
> 
> Signed-off-by: Sergei Shtepa <sergei.shtepa@veeam.com>
> ---
>  block/bio.c               |  2 +
>  block/blk-core.c          | 33 ++++++++++++++++
>  block/genhd.c             | 82 +++++++++++++++++++++++++++++++++++++++
>  include/linux/blk_types.h |  6 ++-
>  include/linux/genhd.h     | 18 +++++++++
>  5 files changed, 139 insertions(+), 2 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 1f2cc1fbe283..f6f135eb84b5 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -684,6 +684,8 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
>  	bio_set_flag(bio, BIO_CLONED);
>  	if (bio_flagged(bio_src, BIO_THROTTLED))
>  		bio_set_flag(bio, BIO_THROTTLED);
> +	if (bio_flagged(bio_src, BIO_INTERPOSED))
> +		bio_set_flag(bio, BIO_INTERPOSED);
>  	bio->bi_opf = bio_src->bi_opf;
>  	bio->bi_ioprio = bio_src->bi_ioprio;
>  	bio->bi_write_hint = bio_src->bi_write_hint;
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 7663a9b94b80..c84bc42ba88b 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1032,6 +1032,32 @@ static blk_qc_t __submit_bio_noacct_mq(struct bio *bio)
>  	return ret;
>  }
>  
> +static blk_qc_t __submit_bio_interposed(struct bio *bio)
> +{
> +	struct bio_list bio_list[2] = { };
> +	blk_qc_t ret = BLK_QC_T_NONE;
> +
> +	current->bio_list = bio_list;
> +	if (likely(bio_queue_enter(bio) == 0)) {
> +		struct gendisk *disk = bio->bi_disk;
> +
> +		if (likely(blk_has_interposer(disk))) {
> +			bio_set_flag(bio, BIO_INTERPOSED);
> +			disk->interposer->ip_submit_bio(bio);
> +		} else /* interposer was removed */
> +			bio_list_add(&current->bio_list[0], bio);

style nit:

} else {
	/* interposer was removed */
	bio_list_add(&current->bio_list[0], bio);
}

> +
> +		blk_queue_exit(disk->queue);
> +	}
> +	current->bio_list = NULL;
> +
> +	/* Resubmit remaining bios */
> +	while ((bio = bio_list_pop(&bio_list[0])))
> +		ret = submit_bio_noacct(bio);
> +
> +	return ret;
> +}
> +
>  /**
>   * submit_bio_noacct - re-submit a bio to the block device layer for I/O
>   * @bio:  The bio describing the location in memory and on the device.
> @@ -1057,6 +1083,13 @@ blk_qc_t submit_bio_noacct(struct bio *bio)
>  		return BLK_QC_T_NONE;
>  	}
>  
> +	/*
> +	 * Checking the BIO_INTERPOSED flag is necessary so that the bio
> +	 * created by the blk_interposer do not get to it for processing.
> +	 */
> +	if (blk_has_interposer(bio->bi_disk) &&
> +	    !bio_flagged(bio, BIO_INTERPOSED))
> +		return __submit_bio_interposed(bio);
>  	if (!bio->bi_disk->fops->submit_bio)
>  		return __submit_bio_noacct_mq(bio);
>  	return __submit_bio_noacct(bio);
> diff --git a/block/genhd.c b/block/genhd.c
> index 419548e92d82..39785a3ef703 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -30,6 +30,7 @@
>  static struct kobject *block_depr;
>  
>  DECLARE_RWSEM(bdev_lookup_sem);
> +DEFINE_MUTEX(bdev_interposer_mutex);

Seems you're using this mutex to protect access to disk->interposer in
attach/detach.  This is to prevent attach/detach races to same device?

Thankfully attach/detach isn't in the bio submission fast path but it'd
be helpful to document what this mutex is protecting).

A storm of attach or detach will all hit this global mutex though...

Mike


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 1/6] docs: device-mapper: add remap_and_filter
  2021-02-03 15:53 ` [PATCH v4 1/6] docs: device-mapper: add remap_and_filter Sergei Shtepa
@ 2021-02-04  4:15   ` Randy Dunlap
  2021-02-04  9:44     ` Sergei Shtepa
  0 siblings, 1 reply; 13+ messages in thread
From: Randy Dunlap @ 2021-02-04  4:15 UTC (permalink / raw)
  To: Sergei Shtepa, Damien.LeMoal, snitzer, hare, ming.lei, agk,
	corbet, axboe, jack, johannes.thumshirn, gregkh, koct9i, steve,
	dm-devel, linux-block, linux-doc, linux-kernel
  Cc: pavgel.tide

On 2/3/21 7:53 AM, Sergei Shtepa wrote:
> remap_and_filter - describes the new features that
> blk_interposer provides for device mapper.
> 
> Signed-off-by: Sergei Shtepa <sergei.shtepa@veeam.com>

Hi--
Please see comments below.


> ---
>  .../admin-guide/device-mapper/index.rst       |   1 +
>  .../device-mapper/remap_and_filter.rst        | 132 ++++++++++++++++++
>  2 files changed, 133 insertions(+)
>  create mode 100644 Documentation/admin-guide/device-mapper/remap_and_filter.rst

> diff --git a/Documentation/admin-guide/device-mapper/remap_and_filter.rst b/Documentation/admin-guide/device-mapper/remap_and_filter.rst
> new file mode 100644
> index 000000000000..616b67998305
> --- /dev/null
> +++ b/Documentation/admin-guide/device-mapper/remap_and_filter.rst
> @@ -0,0 +1,132 @@
> +=================
> +DM remap & filter
> +=================
> +
> +Introduction
> +============
> +
> +Usually LVM should be used for new devices.
> +The administrator have to create logical volumes for the system partition

                     has

> +when installing the operating system. For a running system with
> +partitioned disk space and mounted file systems, it is quite difficult to
> +reconfigure to logical volumes. As a result, all the features that Device
> +Mapper provides are not available for non-LVM systems.
> +This problem is partially solved by the dm remap functionality, which
> +uses the kernel's blk_interposer.
> +
> +blk_interposer
> +==============
> +
> +Blk_interposer extends the capabilities of the DM, as it allows to
> +intercept and redirect bio requests for block devices that are not
> +dm device. At the same time, blk_interposer allows to attach and detach

      devices.

> +from devices "on the fly", without stopping the execution of user
> +programs.
> +
> +Blk_interposer allows to do two tasks: remap and filter.
> +Remap allows to redirect all requests from one block device to another.
> +Filter allows to do additional processing of the request, but without
> +redirection. An intercepted request can get to the block device to which
> +it was addressed, without changes.
> +
> +Remap
> +=====
> +
> +Consider the functionality of the remap. This will allow to connect
> +any block device with a dm device "on the fly".
> +Suppose we have a file system mounted on the block device /dev/sda1::
> +
> +  +-------------+
> +  | file system |
> +  +-------------+
> +        ||
> +        \/
> +  +-------------+
> +  | /dev/sda1   |
> +  +-------------+
> +
> +Creating a new DM device that will be mapped on the same /dev/sda1::

Sometimes it's "DM", other times it's "dm" device. Please be consistent.

> +
> +  +-------------+  +-----------+
> +  | file system |  | dm-linear |
> +  +-------------+  +-----------+
> +           ||         ||
> +           \/         \/
> +         +---------------+
> +         |   /dev/sda1   |
> +         +---------------+
> +
> +Redirecting all bio requests for the /dev/sda1 device to the new dm
> +device::
> +
> +  +-------------+
> +  | file system |
> +  +-------------+
> +        ||
> +        \/
> +   +----------+    +-----------+
> +   |  remap   | => | dm-linear |
> +   +----------+    +-----------+
> +                         ||
> +                         \/
> +                   +-----------+
> +                   | /dev/sda1 |
> +                   +-----------+
> +
> +To achieve this, you need to:
> +
> +Create new dm device with option 'noexcl'. It's allow to open

                                                   allowed to open the

> +underlying block-device without the FMODE_EXCL flag::
> +
> +  echo "0 `blockdev --getsz $1` linear $DEV 0 noexcl" | dmsetup create dm-noexcl
> +
> +Call remap command::
> +
> +  dmsetup remap start dm-noexcl $1
> +
> +Remap can be used to extend the functionality of dm-snap. This will allow
> +to take snapshots from any block devices, not just logical volumes.
> +
> +Filter
> +======
> +
> +Filter does not redirect the bio to another device. It does not create
> +a clone of the bio request. After receiving the request, the filter can
> +only add some processing, complete the bio request, or return the bio
> +for further processing.
> +
> +Suppose we have a file system mounted on the block device /dev/sda1::
> +
> +  +-------------+
> +  | file system |
> +  +-------------+
> +        ||
> +        \/
> +  +-------------+
> +  | /dev/sda1   |
> +  +-------------+
> +
> +Creating a new dm device that will implement filter::
> +
> +  +-------------+
> +  | file system |
> +  +-------------+
> +        ||
> +        \/
> +    +--------+    +----------+
> +    | filter | => | dm-delay |
> +    +--------+    +----------+
> +        ||
> +        \/
> +  +-------------+
> +  | /dev/sda1   |
> +  +-------------+
> +
> +Using filter we can change the behavior of debugging tools:
> + * dm-dust,
> + * dm-delay,
> + * dm-flakey,
> + * dm-verity.
> +
> +In the new version, they are will be able to change the behavior of any

          Either       they are able to change the behavior of any
            or         they will be able to change the behavior of any

I prefer the first choice.

> +existing block device, without creating a new one.


According to Documentation/doc-guide/sphinx.rst, here is how
chapters, sections, etc., should be indicated:


* Please stick to this order of heading adornments:

  1. ``=`` with overline for document title::

       ==============
       Document title
       ==============

  2. ``=`` for chapters::

       Chapters
       ========

  3. ``-`` for sections::

       Section
       -------

  4. ``~`` for subsections::

       Subsection
       ~~~~~~~~~~

  Although RST doesn't mandate a specific order ("Rather than imposing a fixed
  number and order of section title adornment styles, the order enforced will be
  the order as encountered."), having the higher levels the same overall makes
  it easier to follow the documents.



thanks.
-- 
~Randy


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 1/6] docs: device-mapper: add remap_and_filter
  2021-02-04  4:15   ` Randy Dunlap
@ 2021-02-04  9:44     ` Sergei Shtepa
  0 siblings, 0 replies; 13+ messages in thread
From: Sergei Shtepa @ 2021-02-04  9:44 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Damien.LeMoal, snitzer, hare, ming.lei, agk, corbet, axboe, jack,
	johannes.thumshirn, gregkh, koct9i, steve, dm-devel, linux-block,
	linux-doc, linux-kernel, pavgel.tide

Thank you, I'll fix it all.

-- 
Sergei Shtepa
Veeam Software developer.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 2/6] block: add blk_interposer
  2021-02-03 16:18   ` Mike Snitzer
@ 2021-02-04 10:06     ` Sergei Shtepa
  0 siblings, 0 replies; 13+ messages in thread
From: Sergei Shtepa @ 2021-02-04 10:06 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Damien.LeMoal, hare, ming.lei, agk, corbet, axboe, jack,
	johannes.thumshirn, gregkh, koct9i, steve, dm-devel, linux-block,
	linux-doc, linux-kernel, pavgel.tide

The 02/03/2021 11:18, Mike Snitzer wrote:
> On Wed, Feb 03 2021 at 10:53am -0500,
> Sergei Shtepa <sergei.shtepa@veeam.com> wrote:
> 
> > blk_interposer allows to intercept bio requests, remap bio to another devices or add new bios.
> > 
> > Signed-off-by: Sergei Shtepa <sergei.shtepa@veeam.com>
> > ---
> >  block/bio.c               |  2 +
> >  block/blk-core.c          | 33 ++++++++++++++++
> >  block/genhd.c             | 82 +++++++++++++++++++++++++++++++++++++++
> >  include/linux/blk_types.h |  6 ++-
> >  include/linux/genhd.h     | 18 +++++++++
> >  5 files changed, 139 insertions(+), 2 deletions(-)
> > 
> > diff --git a/block/bio.c b/block/bio.c
> > index 1f2cc1fbe283..f6f135eb84b5 100644
> > --- a/block/bio.c
> > +++ b/block/bio.c
> > @@ -684,6 +684,8 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
> >  	bio_set_flag(bio, BIO_CLONED);
> >  	if (bio_flagged(bio_src, BIO_THROTTLED))
> >  		bio_set_flag(bio, BIO_THROTTLED);
> > +	if (bio_flagged(bio_src, BIO_INTERPOSED))
> > +		bio_set_flag(bio, BIO_INTERPOSED);
> >  	bio->bi_opf = bio_src->bi_opf;
> >  	bio->bi_ioprio = bio_src->bi_ioprio;
> >  	bio->bi_write_hint = bio_src->bi_write_hint;
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index 7663a9b94b80..c84bc42ba88b 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -1032,6 +1032,32 @@ static blk_qc_t __submit_bio_noacct_mq(struct bio *bio)
> >  	return ret;
> >  }
> >  
> > +static blk_qc_t __submit_bio_interposed(struct bio *bio)
> > +{
> > +	struct bio_list bio_list[2] = { };
> > +	blk_qc_t ret = BLK_QC_T_NONE;
> > +
> > +	current->bio_list = bio_list;
> > +	if (likely(bio_queue_enter(bio) == 0)) {
> > +		struct gendisk *disk = bio->bi_disk;
> > +
> > +		if (likely(blk_has_interposer(disk))) {
> > +			bio_set_flag(bio, BIO_INTERPOSED);
> > +			disk->interposer->ip_submit_bio(bio);
> > +		} else /* interposer was removed */
> > +			bio_list_add(&current->bio_list[0], bio);
> 
> style nit:
> 
> } else {
> 	/* interposer was removed */
> 	bio_list_add(&current->bio_list[0], bio);
> }
> 
> > +
> > +		blk_queue_exit(disk->queue);
> > +	}
> > +	current->bio_list = NULL;
> > +
> > +	/* Resubmit remaining bios */
> > +	while ((bio = bio_list_pop(&bio_list[0])))
> > +		ret = submit_bio_noacct(bio);
> > +
> > +	return ret;
> > +}
> > +
> >  /**
> >   * submit_bio_noacct - re-submit a bio to the block device layer for I/O
> >   * @bio:  The bio describing the location in memory and on the device.
> > @@ -1057,6 +1083,13 @@ blk_qc_t submit_bio_noacct(struct bio *bio)
> >  		return BLK_QC_T_NONE;
> >  	}
> >  
> > +	/*
> > +	 * Checking the BIO_INTERPOSED flag is necessary so that the bio
> > +	 * created by the blk_interposer do not get to it for processing.
> > +	 */
> > +	if (blk_has_interposer(bio->bi_disk) &&
> > +	    !bio_flagged(bio, BIO_INTERPOSED))
> > +		return __submit_bio_interposed(bio);
> >  	if (!bio->bi_disk->fops->submit_bio)
> >  		return __submit_bio_noacct_mq(bio);
> >  	return __submit_bio_noacct(bio);
> > diff --git a/block/genhd.c b/block/genhd.c
> > index 419548e92d82..39785a3ef703 100644
> > --- a/block/genhd.c
> > +++ b/block/genhd.c
> > @@ -30,6 +30,7 @@
> >  static struct kobject *block_depr;
> >  
> >  DECLARE_RWSEM(bdev_lookup_sem);
> > +DEFINE_MUTEX(bdev_interposer_mutex);
> 
> Seems you're using this mutex to protect access to disk->interposer in
> attach/detach.  This is to prevent attach/detach races to same device?

Yes. There is a probability of 0.00...01% that two different modules will
try to attach/detach to the same disk at the same time.
Since the attach/detach operation is infrequent, using mutex is quite appropriate.
> 
> Thankfully attach/detach isn't in the bio submission fast path but it'd
> be helpful to document what this mutex is protecting).

I'll think about the name of this mutex and add a comment.

> 
> A storm of attach or detach will all hit this global mutex though...
> 
> Mike
> 

Thank you for the review.
I am very interested in your opinion about [PATCH v4 4/6] and [PATCH v4 5/6].
However, the kernel test robot has already found something there on sparc.

-- 
Sergei Shtepa
Veeam Software developer.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-02-04 10:14 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-03 15:53 [PATCH v4 0/6] block-layer interposer Sergei Shtepa
2021-02-03 15:53 ` [PATCH v4 1/6] docs: device-mapper: add remap_and_filter Sergei Shtepa
2021-02-04  4:15   ` Randy Dunlap
2021-02-04  9:44     ` Sergei Shtepa
2021-02-03 15:53 ` [PATCH v4 2/6] block: add blk_interposer Sergei Shtepa
2021-02-03 16:18   ` Mike Snitzer
2021-02-04 10:06     ` Sergei Shtepa
2021-02-03 15:53 ` [PATCH v4 3/6] block: add blk_mq_is_queue_frozen() Sergei Shtepa
2021-02-03 16:09   ` Mike Snitzer
2021-02-03 15:53 ` [PATCH v4 4/6] dm: new ioctl DM_DEV_REMAP_CMD Sergei Shtepa
2021-02-03 16:00   ` Greg KH
2021-02-03 15:53 ` [PATCH v4 5/6] dm: add 'noexcl' option for dm-linear Sergei Shtepa
2021-02-03 15:53 ` [PATCH v4 6/6] docs: device-mapper: " Sergei Shtepa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).