All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] block: avoid to serialize blkdev_fallocate
@ 2022-05-12 13:48 Ming Lei
  2022-05-13 16:51 ` Bart Van Assche
  2022-05-16  9:52 ` Jan Kara
  0 siblings, 2 replies; 4+ messages in thread
From: Ming Lei @ 2022-05-12 13:48 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Ming Lei, Jan Kara

Commit f278eb3d8178 ("block: hold ->invalidate_lock in blkdev_fallocate")
adds ->invalidate_lock in blkdev_fallocate() for preventing stale data
from being read during fallocate().

However, the side effect is that blkdev_fallocate() becomes serialized
since blkdev_fallocate() always blocks.

Add one atomic fallocate counter for allowing blkdev_fallocate() to
be run concurrently so that discard/write_zero bios from different
fallocate() can be submitted in parallel.

Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/fops.c              | 6 ++++--
 include/linux/blk_types.h | 2 ++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index 9f2ecec406b0..368866b15e68 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -651,7 +651,8 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 	if ((start | len) & (bdev_logical_block_size(bdev) - 1))
 		return -EINVAL;
 
-	filemap_invalidate_lock(inode->i_mapping);
+	if (atomic_inc_return(&bdev->bd_fallocate_count) == 1)
+		filemap_invalidate_lock(inode->i_mapping);
 
 	/* Invalidate the page cache, including dirty pages. */
 	error = truncate_bdev_range(bdev, file->f_mode, start, end);
@@ -679,7 +680,8 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 	}
 
  fail:
-	filemap_invalidate_unlock(inode->i_mapping);
+	if (!atomic_dec_return(&bdev->bd_fallocate_count))
+		filemap_invalidate_unlock(inode->i_mapping);
 	return error;
 }
 
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 1973ef9bd40f..9ccd841ea8ed 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -58,6 +58,8 @@ struct block_device {
 	struct gendisk *	bd_disk;
 	struct request_queue *	bd_queue;
 
+	atomic_t		bd_fallocate_count;
+
 	/* The counter of freeze processes */
 	int			bd_fsfreeze_count;
 	/* Mutex for freeze */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] block: avoid to serialize blkdev_fallocate
  2022-05-12 13:48 [PATCH] block: avoid to serialize blkdev_fallocate Ming Lei
@ 2022-05-13 16:51 ` Bart Van Assche
  2022-05-16  9:52 ` Jan Kara
  1 sibling, 0 replies; 4+ messages in thread
From: Bart Van Assche @ 2022-05-13 16:51 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe; +Cc: linux-block, Jan Kara

On 5/12/22 06:48, Ming Lei wrote:
> Commit f278eb3d8178 ("block: hold ->invalidate_lock in blkdev_fallocate")
> adds ->invalidate_lock in blkdev_fallocate() for preventing stale data
> from being read during fallocate().
> 
> However, the side effect is that blkdev_fallocate() becomes serialized
> since blkdev_fallocate() always blocks.
> 
> Add one atomic fallocate counter for allowing blkdev_fallocate() to
> be run concurrently so that discard/write_zero bios from different
> fallocate() can be submitted in parallel.

Please consider adding a "Fixes:" tag since otherwise this patch won't 
be integrated automatically into any of the stable kernel trees.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] block: avoid to serialize blkdev_fallocate
  2022-05-12 13:48 [PATCH] block: avoid to serialize blkdev_fallocate Ming Lei
  2022-05-13 16:51 ` Bart Van Assche
@ 2022-05-16  9:52 ` Jan Kara
  2022-05-31  7:29   ` Christoph Hellwig
  1 sibling, 1 reply; 4+ messages in thread
From: Jan Kara @ 2022-05-16  9:52 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Jan Kara, Christoph Hellwig

On Thu 12-05-22 21:48:14, Ming Lei wrote:
> Commit f278eb3d8178 ("block: hold ->invalidate_lock in blkdev_fallocate")
> adds ->invalidate_lock in blkdev_fallocate() for preventing stale data
> from being read during fallocate().
> 
> However, the side effect is that blkdev_fallocate() becomes serialized
> since blkdev_fallocate() always blocks.
> 
> Add one atomic fallocate counter for allowing blkdev_fallocate() to
> be run concurrently so that discard/write_zero bios from different
> fallocate() can be submitted in parallel.
> 
> Cc: Jan Kara <jack@suse.cz>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

Couple of points:
* What about blk_ioctl_zeroout() or blk_ioctl_discard()?
* This way processes calling discard / zeroout can easily starve process
  wanting to do read which does not seem ideal. Somewhat related is that I
  know that Christoph wanted to modify how we use filemap_invalidate_lock()
  so that huge zeroouts performed by actually writing zeros will not block
  readers for ages and this seems to be going in somewhat opposite direction
  favoring writers even more.
* I suspect this is going to upset lockdep (and lock owner tracking for the
  purposes of spinning) pretty badly because the lock owner may be returning
  to userspace without unlocking the lock.

								Honza

> ---
>  block/fops.c              | 6 ++++--
>  include/linux/blk_types.h | 2 ++
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/block/fops.c b/block/fops.c
> index 9f2ecec406b0..368866b15e68 100644
> --- a/block/fops.c
> +++ b/block/fops.c
> @@ -651,7 +651,8 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
>  	if ((start | len) & (bdev_logical_block_size(bdev) - 1))
>  		return -EINVAL;
>  
> -	filemap_invalidate_lock(inode->i_mapping);
> +	if (atomic_inc_return(&bdev->bd_fallocate_count) == 1)
> +		filemap_invalidate_lock(inode->i_mapping);
>  
>  	/* Invalidate the page cache, including dirty pages. */
>  	error = truncate_bdev_range(bdev, file->f_mode, start, end);
> @@ -679,7 +680,8 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
>  	}
>  
>   fail:
> -	filemap_invalidate_unlock(inode->i_mapping);
> +	if (!atomic_dec_return(&bdev->bd_fallocate_count))
> +		filemap_invalidate_unlock(inode->i_mapping);
>  	return error;
>  }
>  
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 1973ef9bd40f..9ccd841ea8ed 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -58,6 +58,8 @@ struct block_device {
>  	struct gendisk *	bd_disk;
>  	struct request_queue *	bd_queue;
>  
> +	atomic_t		bd_fallocate_count;
> +
>  	/* The counter of freeze processes */
>  	int			bd_fsfreeze_count;
>  	/* Mutex for freeze */
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] block: avoid to serialize blkdev_fallocate
  2022-05-16  9:52 ` Jan Kara
@ 2022-05-31  7:29   ` Christoph Hellwig
  0 siblings, 0 replies; 4+ messages in thread
From: Christoph Hellwig @ 2022-05-31  7:29 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ming Lei, Jens Axboe, linux-block, Christoph Hellwig

On Mon, May 16, 2022 at 11:52:39AM +0200, Jan Kara wrote:
> * This way processes calling discard / zeroout can easily starve process
>   wanting to do read which does not seem ideal. Somewhat related is that I
>   know that Christoph wanted to modify how we use filemap_invalidate_lock()
>   so that huge zeroouts performed by actually writing zeros will not block
>   readers for ages and this seems to be going in somewhat opposite direction
>   favoring writers even more.
> * I suspect this is going to upset lockdep (and lock owner tracking for the
>   purposes of spinning) pretty badly because the lock owner may be returning
>   to userspace without unlocking the lock.

Yes to all of the above.  I think we need to just pick a smaller
defauly zeroout size (and make that tunable) to avoid holding the lock
over really long operations.  Or turn the invalidate lock into a range
lock..

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-05-31  7:30 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-12 13:48 [PATCH] block: avoid to serialize blkdev_fallocate Ming Lei
2022-05-13 16:51 ` Bart Van Assche
2022-05-16  9:52 ` Jan Kara
2022-05-31  7:29   ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.