All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Snitzer <snitzer@kernel.org>
To: Sergei Shtepa <sergei.shtepa@veeam.com>
Cc: axboe@kernel.dk, corbet@lwn.net, linux-block@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	dm-devel@redhat.com
Subject: Re: [PATCH v2 02/21] block, blkfilter: Block Device Filtering Mechanism
Date: Tue, 31 Jan 2023 18:58:28 -0500	[thread overview]
Message-ID: <Y9mrJJDFnMNWR7Vn@redhat.com> (raw)
In-Reply-To: <20221209142331.26395-3-sergei.shtepa@veeam.com>

On Fri, Dec 09 2022 at  9:23P -0500,
Sergei Shtepa <sergei.shtepa@veeam.com> wrote:

> Allows to attach block device filters to the block devices. Kernel
> modules can use this functionality to extend the capabilities of the
> block layer.
> 
> Signed-off-by: Sergei Shtepa <sergei.shtepa@veeam.com>
> ---
>  block/bdev.c              | 70 ++++++++++++++++++++++++++++++++++++++
>  block/blk-core.c          | 19 +++++++++--
>  include/linux/blk_types.h |  2 ++
>  include/linux/blkdev.h    | 71 +++++++++++++++++++++++++++++++++++++++
>  4 files changed, 160 insertions(+), 2 deletions(-)
> 
> diff --git a/block/bdev.c b/block/bdev.c
> index d699ecdb3260..b820178824b2 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -427,6 +427,7 @@ static void init_once(void *data)
>  
>  static void bdev_evict_inode(struct inode *inode)
>  {
> +	bdev_filter_detach(I_BDEV(inode));
>  	truncate_inode_pages_final(&inode->i_data);
>  	invalidate_inode_buffers(inode); /* is it needed here? */
>  	clear_inode(inode);
> @@ -502,6 +503,7 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
>  		return NULL;
>  	}
>  	bdev->bd_disk = disk;
> +	bdev->bd_filter = NULL;
>  	return bdev;
>  }
>  
> @@ -1092,3 +1094,71 @@ void bdev_statx_dioalign(struct inode *inode, struct kstat *stat)
>  
>  	blkdev_put_no_open(bdev);
>  }
> +
> +/**
> + * bdev_filter_attach - Attach the filter to the original block device.
> + * @bdev:
> + *	Block device.
> + * @flt:
> + *	Filter that needs to be attached to the block device.
> + *
> + * Before adding a filter, it is necessary to initialize &struct bdev_filter
> + * using a bdev_filter_init() function.
> + *
> + * The bdev_filter_detach() function allows to detach the filter from the block
> + * device.
> + *
> + * Return: 0 if succeeded, or -EALREADY if the filter already exists.
> + */
> +int bdev_filter_attach(struct block_device *bdev,
> +				     struct bdev_filter *flt)
> +{
> +	int ret = 0;
> +
> +	blk_mq_freeze_queue(bdev->bd_queue);
> +	blk_mq_quiesce_queue(bdev->bd_queue);
> +
> +	if (bdev->bd_filter)
> +		ret = -EALREADY;
> +	else
> +		bdev->bd_filter = flt;
> +
> +	blk_mq_unquiesce_queue(bdev->bd_queue);
> +	blk_mq_unfreeze_queue(bdev->bd_queue);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(bdev_filter_attach);
> +
> +/**
> + * bdev_filter_detach - Detach the filter from the block device.
> + * @bdev:
> + *	Block device.
> + *
> + * The filter should be added using the bdev_filter_attach() function.
> + *
> + * Return: 0 if succeeded, or -ENOENT if the filter was not found.
> + */
> +int bdev_filter_detach(struct block_device *bdev)
> +{
> +	int ret = 0;
> +	struct bdev_filter *flt = NULL;
> +
> +	blk_mq_freeze_queue(bdev->bd_queue);
> +	blk_mq_quiesce_queue(bdev->bd_queue);
> +
> +	flt = bdev->bd_filter;
> +	if (flt)
> +		bdev->bd_filter = NULL;
> +	else
> +		ret = -ENOENT;
> +
> +	blk_mq_unquiesce_queue(bdev->bd_queue);
> +	blk_mq_unfreeze_queue(bdev->bd_queue);
> +
> +	if (flt)
> +		bdev_filter_put(flt);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(bdev_filter_detach);

What about bio-based devices? (DM, MD, etc)

DM uses freeze_bdev() and thaw_bdev(), seems like you're missing some
work here.

> diff --git a/block/blk-core.c b/block/blk-core.c
> index 5487912befe8..284b295a7b23 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -678,9 +678,24 @@ void submit_bio_noacct_nocheck(struct bio *bio)
>  	 * to collect a list of requests submited by a ->submit_bio method while
>  	 * it is active, and then process them after it returned.
>  	 */
> -	if (current->bio_list)
> +	if (current->bio_list) {
>  		bio_list_add(&current->bio_list[0], bio);
> -	else if (!bio->bi_bdev->bd_disk->fops->submit_bio)
> +		return;
> +	}
> +
> +	if (bio->bi_bdev->bd_filter && !bio_flagged(bio, BIO_FILTERED)) {

Shouldn't this be: if (unlikely(...))?

But that obviously assumes a fair amount about the only consumer
(temporary filter that lasts as long as it takes to do a backup).

> +		bool pass;
> +
> +		pass = bio->bi_bdev->bd_filter->fops->submit_bio_cb(bio);
> +		bio_set_flag(bio, BIO_FILTERED);
> +		if (!pass) {
> +			bio->bi_status = BLK_STS_OK;
> +			bio_endio(bio);
> +			return;
> +		}
> +	}
> +
> +	if (!bio->bi_bdev->bd_disk->fops->submit_bio)
>  		__submit_bio_noacct_mq(bio);
>  	else
>  		__submit_bio_noacct(bio);

And you currently don't allow for blkfilter to be involved if a bio
recurses (which is how bio splitting works now).  Not sure it
matters, just mentioning it...

But taking a step back, in the hopes of stepping out of your way:

Myself and others on the DM team (past and present) have always hoped
all block devices could have the flexibility of DM. It was that hope
that caused my frustration when I first saw your blkfilter approach.

But I was too idealistic that a byproduct of your efforts
(blk-interposer before and blkfilter now) would usher in _all_ block
devices being able to comprehensively change their identity (and IO
processing) like DM enjoys.

DM showcases all the extra code needed to achieve its extreme IO
remapping and stacking flexibilty -- I don't yet see a way to distill
the essence of what DM achieves without imposing too much on all block
core.

So I do think blkfilter is a pragmatic way to achieve your goals.

Mike

WARNING: multiple messages have this Message-ID (diff)
From: Mike Snitzer <snitzer@kernel.org>
To: Sergei Shtepa <sergei.shtepa@veeam.com>
Cc: axboe@kernel.dk, linux-doc@vger.kernel.org, corbet@lwn.net,
	linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	dm-devel@redhat.com
Subject: Re: [dm-devel] [PATCH v2 02/21] block, blkfilter: Block Device Filtering Mechanism
Date: Tue, 31 Jan 2023 18:58:28 -0500	[thread overview]
Message-ID: <Y9mrJJDFnMNWR7Vn@redhat.com> (raw)
In-Reply-To: <20221209142331.26395-3-sergei.shtepa@veeam.com>

On Fri, Dec 09 2022 at  9:23P -0500,
Sergei Shtepa <sergei.shtepa@veeam.com> wrote:

> Allows to attach block device filters to the block devices. Kernel
> modules can use this functionality to extend the capabilities of the
> block layer.
> 
> Signed-off-by: Sergei Shtepa <sergei.shtepa@veeam.com>
> ---
>  block/bdev.c              | 70 ++++++++++++++++++++++++++++++++++++++
>  block/blk-core.c          | 19 +++++++++--
>  include/linux/blk_types.h |  2 ++
>  include/linux/blkdev.h    | 71 +++++++++++++++++++++++++++++++++++++++
>  4 files changed, 160 insertions(+), 2 deletions(-)
> 
> diff --git a/block/bdev.c b/block/bdev.c
> index d699ecdb3260..b820178824b2 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -427,6 +427,7 @@ static void init_once(void *data)
>  
>  static void bdev_evict_inode(struct inode *inode)
>  {
> +	bdev_filter_detach(I_BDEV(inode));
>  	truncate_inode_pages_final(&inode->i_data);
>  	invalidate_inode_buffers(inode); /* is it needed here? */
>  	clear_inode(inode);
> @@ -502,6 +503,7 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
>  		return NULL;
>  	}
>  	bdev->bd_disk = disk;
> +	bdev->bd_filter = NULL;
>  	return bdev;
>  }
>  
> @@ -1092,3 +1094,71 @@ void bdev_statx_dioalign(struct inode *inode, struct kstat *stat)
>  
>  	blkdev_put_no_open(bdev);
>  }
> +
> +/**
> + * bdev_filter_attach - Attach the filter to the original block device.
> + * @bdev:
> + *	Block device.
> + * @flt:
> + *	Filter that needs to be attached to the block device.
> + *
> + * Before adding a filter, it is necessary to initialize &struct bdev_filter
> + * using a bdev_filter_init() function.
> + *
> + * The bdev_filter_detach() function allows to detach the filter from the block
> + * device.
> + *
> + * Return: 0 if succeeded, or -EALREADY if the filter already exists.
> + */
> +int bdev_filter_attach(struct block_device *bdev,
> +				     struct bdev_filter *flt)
> +{
> +	int ret = 0;
> +
> +	blk_mq_freeze_queue(bdev->bd_queue);
> +	blk_mq_quiesce_queue(bdev->bd_queue);
> +
> +	if (bdev->bd_filter)
> +		ret = -EALREADY;
> +	else
> +		bdev->bd_filter = flt;
> +
> +	blk_mq_unquiesce_queue(bdev->bd_queue);
> +	blk_mq_unfreeze_queue(bdev->bd_queue);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(bdev_filter_attach);
> +
> +/**
> + * bdev_filter_detach - Detach the filter from the block device.
> + * @bdev:
> + *	Block device.
> + *
> + * The filter should be added using the bdev_filter_attach() function.
> + *
> + * Return: 0 if succeeded, or -ENOENT if the filter was not found.
> + */
> +int bdev_filter_detach(struct block_device *bdev)
> +{
> +	int ret = 0;
> +	struct bdev_filter *flt = NULL;
> +
> +	blk_mq_freeze_queue(bdev->bd_queue);
> +	blk_mq_quiesce_queue(bdev->bd_queue);
> +
> +	flt = bdev->bd_filter;
> +	if (flt)
> +		bdev->bd_filter = NULL;
> +	else
> +		ret = -ENOENT;
> +
> +	blk_mq_unquiesce_queue(bdev->bd_queue);
> +	blk_mq_unfreeze_queue(bdev->bd_queue);
> +
> +	if (flt)
> +		bdev_filter_put(flt);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(bdev_filter_detach);

What about bio-based devices? (DM, MD, etc)

DM uses freeze_bdev() and thaw_bdev(), seems like you're missing some
work here.

> diff --git a/block/blk-core.c b/block/blk-core.c
> index 5487912befe8..284b295a7b23 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -678,9 +678,24 @@ void submit_bio_noacct_nocheck(struct bio *bio)
>  	 * to collect a list of requests submited by a ->submit_bio method while
>  	 * it is active, and then process them after it returned.
>  	 */
> -	if (current->bio_list)
> +	if (current->bio_list) {
>  		bio_list_add(&current->bio_list[0], bio);
> -	else if (!bio->bi_bdev->bd_disk->fops->submit_bio)
> +		return;
> +	}
> +
> +	if (bio->bi_bdev->bd_filter && !bio_flagged(bio, BIO_FILTERED)) {

Shouldn't this be: if (unlikely(...))?

But that obviously assumes a fair amount about the only consumer
(temporary filter that lasts as long as it takes to do a backup).

> +		bool pass;
> +
> +		pass = bio->bi_bdev->bd_filter->fops->submit_bio_cb(bio);
> +		bio_set_flag(bio, BIO_FILTERED);
> +		if (!pass) {
> +			bio->bi_status = BLK_STS_OK;
> +			bio_endio(bio);
> +			return;
> +		}
> +	}
> +
> +	if (!bio->bi_bdev->bd_disk->fops->submit_bio)
>  		__submit_bio_noacct_mq(bio);
>  	else
>  		__submit_bio_noacct(bio);

And you currently don't allow for blkfilter to be involved if a bio
recurses (which is how bio splitting works now).  Not sure it
matters, just mentioning it...

But taking a step back, in the hopes of stepping out of your way:

Myself and others on the DM team (past and present) have always hoped
all block devices could have the flexibility of DM. It was that hope
that caused my frustration when I first saw your blkfilter approach.

But I was too idealistic that a byproduct of your efforts
(blk-interposer before and blkfilter now) would usher in _all_ block
devices being able to comprehensively change their identity (and IO
processing) like DM enjoys.

DM showcases all the extra code needed to achieve its extreme IO
remapping and stacking flexibilty -- I don't yet see a way to distill
the essence of what DM achieves without imposing too much on all block
core.

So I do think blkfilter is a pragmatic way to achieve your goals.

Mike

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


  parent reply	other threads:[~2023-01-31 23:59 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-09 14:23 [PATCH v2 00/21] blksnap - block devices snapshots module Sergei Shtepa
2022-12-09 14:23 ` [PATCH v2 01/21] documentation, blkfilter: Block Device Filtering Mechanism Sergei Shtepa
2022-12-10  4:15   ` Bagas Sanjaya
2022-12-09 14:23 ` [PATCH v2 02/21] block, " Sergei Shtepa
2022-12-15  9:26   ` Christoph Hellwig
2022-12-15 10:46     ` Sergei Shtepa
2022-12-16  7:04       ` Christoph Hellwig
2023-01-31 23:58   ` Mike Snitzer [this message]
2023-01-31 23:58     ` [dm-devel] " Mike Snitzer
2023-02-01 11:09     ` Fabio Fantoni
2023-02-01 11:09       ` [dm-devel] " Fabio Fantoni
2023-02-01 13:16     ` Sergei Shtepa
2023-02-01 13:16       ` [dm-devel] " Sergei Shtepa
2022-12-09 14:23 ` [PATCH v2 03/21] documentation, capability: fix Generic Block Device Capability Sergei Shtepa
2022-12-13 12:13   ` Fabio Fantoni
2022-12-30 15:35     ` Fabio Fantoni
2022-12-09 14:23 ` [PATCH v2 04/21] documentation, blksnap: Block Devices Snapshots Module Sergei Shtepa
2022-12-10  3:50   ` Bagas Sanjaya
2022-12-09 14:23 ` [PATCH v2 05/21] block, blksnap: header file of the module interface Sergei Shtepa
2022-12-09 22:13   ` kernel test robot
2022-12-09 23:14   ` kernel test robot
2022-12-09 14:23 ` [PATCH v2 06/21] block, blksnap: module management interface functions Sergei Shtepa
2022-12-15  9:28   ` Christoph Hellwig
     [not found]   ` <CGME20230103153406eucas1p205c48bd767e6a86f6f1121db7eb5fc19@eucas1p2.samsung.com>
2023-01-03 15:26     ` Pankaj Raghav
2022-12-09 14:23 ` [PATCH v2 07/21] block, blksnap: init() and exit() functions Sergei Shtepa
2022-12-15  9:30   ` Christoph Hellwig
2022-12-09 14:23 ` [PATCH v2 08/21] block, blksnap: interaction with sysfs Sergei Shtepa
2022-12-09 14:23 ` [PATCH v2 09/21] block, blksnap: attaching and detaching the filter and handling I/O units Sergei Shtepa
2022-12-15 10:01   ` Christoph Hellwig
2022-12-09 14:23 ` [PATCH v2 10/21] block, blksnap: map of change block tracking Sergei Shtepa
2022-12-09 14:23 ` [PATCH v2 11/21] block, blksnap: minimum data storage unit of the original block device Sergei Shtepa
2022-12-09 14:23 ` [PATCH v2 12/21] block, blksnap: buffer in memory for the minimum data storage unit Sergei Shtepa
2022-12-09 14:23 ` [PATCH v2 13/21] block, blksnap: functions and structures for performing block I/O operations Sergei Shtepa
2022-12-15 10:06   ` Christoph Hellwig
2022-12-09 14:23 ` [PATCH v2 14/21] block, blksnap: storage for storing difference blocks Sergei Shtepa
2022-12-09 14:23 ` [PATCH v2 15/21] block, blksnap: event queue from the difference storage Sergei Shtepa
2022-12-09 14:23 ` [PATCH v2 16/21] block, blksnap: owner of information about overwritten blocks of the original block device Sergei Shtepa
2022-12-09 14:23 ` [PATCH v2 17/21] block, blksnap: snapshot image " Sergei Shtepa
2022-12-15  9:45   ` Christoph Hellwig
2023-01-01  7:18   ` Hillf Danton
2023-01-02  9:44     ` Sergei Shtepa
2022-12-09 14:23 ` [PATCH v2 18/21] block, blksnap: snapshot Sergei Shtepa
2023-01-01 11:05   ` Hillf Danton
2023-01-02  9:58     ` Sergei Shtepa
2022-12-09 14:23 ` [PATCH v2 19/21] block, blksnap: Kconfig and Makefile Sergei Shtepa
2022-12-09 14:23 ` [PATCH v2 20/21] block, blksnap: adds a blksnap to the kernel tree Sergei Shtepa
2022-12-09 21:53   ` kernel test robot
2022-12-09 14:23 ` [PATCH v2 21/21] block, blksnap: adds a maintainer for new files Sergei Shtepa
2022-12-10  3:23 ` [PATCH v2 00/21] blksnap - block devices snapshots module Bagas Sanjaya
2022-12-10 22:57   ` Sergei Shtepa
2023-01-17 21:04 ` Mike Snitzer
2023-01-18 10:51   ` Hannes Reinecke
2023-01-24 11:34   ` Sergei Shtepa
2023-01-31 20:47     ` [dm-devel] " Mike Snitzer
2023-01-31 20:47       ` Mike Snitzer
2023-02-01  2:32       ` Mason Giles
2023-02-01  2:32         ` [dm-devel] " Mason Giles

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y9mrJJDFnMNWR7Vn@redhat.com \
    --to=snitzer@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=corbet@lwn.net \
    --cc=dm-devel@redhat.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sergei.shtepa@veeam.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.