io-uring.vger.kernel.org archive mirror
From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: John Garry <john.g.garry@oracle.com>,
	axboe@kernel.dk, kbusch@kernel.org, hch@lst.de, sagi@grimberg.me,
	jejb@linux.ibm.com, martin.petersen@oracle.com,
	djwong@kernel.org, viro@zeniv.linux.org.uk, brauner@kernel.org,
	dchinner@redhat.com, jack@suse.cz
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nvme@lists.infradead.org, linux-fsdevel@vger.kernel.org,
	tytso@mit.edu, jbongio@google.com, linux-scsi@vger.kernel.org,
	ojaswin@linux.ibm.com, linux-aio@kvack.org,
	linux-btrfs@vger.kernel.org, io-uring@vger.kernel.org,
	nilay@linux.ibm.com, John Garry <john.g.garry@oracle.com>
Subject: Re: [PATCH v4 07/11] block: Add fops atomic write support
Date: Sun, 25 Feb 2024 20:16:15 +0530
Message-ID: <87cysk1u14.fsf@doe.com>
In-Reply-To: <20240219130109.341523-8-john.g.garry@oracle.com>

John Garry <john.g.garry@oracle.com> writes:

> Support atomic writes by submitting a single BIO with the REQ_ATOMIC set.
>
> It must be ensured that the atomic write adheres to its rules, like
> naturally aligned offset, so call blkdev_dio_invalid() ->
> blkdev_atomic_write_valid() [with renaming blkdev_dio_unaligned() to
> blkdev_dio_invalid()] for this purpose.
>
> In blkdev_direct_IO(), if the nr_pages exceeds BIO_MAX_VECS, then we cannot
> produce a single BIO, so error in this case.

BIO_MAX_VECS is 256, so this gives a limit of around 1MB with a 4k page size.
Is there any mention of why this limit exists for now? Is it due to code
complexity that we only support a single bio?
As I see it, you have still enabled request merging in the block layer for
atomic requests, so it can essentially submit bio chains to the device
driver? So why not support the case where a user submits a request larger
than 1 MB?
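
To spell out the arithmetic behind the 1MB figure, here is a rough userspace
sketch (not kernel code). It assumes the worst case of one page per bio_vec;
the helper names are made up for illustration, and only BIO_MAX_VECS = 256
comes from the discussion above.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIO_MAX_VECS 256U /* value quoted in the review above */

/* Largest payload a single bio can carry if every bio_vec holds one page. */
static uint64_t max_single_bio_bytes(uint64_t page_size)
{
	return (uint64_t)BIO_MAX_VECS * page_size;
}

/* Number of pages touched by a buffer of len bytes (len > 0) at buf_addr. */
static uint64_t pages_spanned(uint64_t buf_addr, uint64_t len,
			      uint64_t page_size)
{
	uint64_t first = buf_addr / page_size;
	uint64_t last = (buf_addr + len - 1) / page_size;

	return last - first + 1;
}

/* Hypothetical predicate: would this buffer fit the single-bio path? */
static bool fits_single_bio(uint64_t buf_addr, uint64_t len,
			    uint64_t page_size)
{
	return pages_spanned(buf_addr, len, page_size) <= BIO_MAX_VECS;
}
```

Under these assumptions a page-aligned 1MB buffer needs exactly 256 pages with
4k pages, while the same length starting 512 bytes into a page spans 257 pages
and would no longer fit.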

>
> Finally set FMODE_CAN_ATOMIC_WRITE when the bdev can support atomic writes
> and the associated file flag is for O_DIRECT.
>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  block/fops.c | 31 ++++++++++++++++++++++++++++---
>  1 file changed, 28 insertions(+), 3 deletions(-)
>
> diff --git a/block/fops.c b/block/fops.c
> index 28382b4d097a..563189c2fc5a 100644
> --- a/block/fops.c
> +++ b/block/fops.c
> @@ -34,13 +34,27 @@ static blk_opf_t dio_bio_write_op(struct kiocb *iocb)
>  	return opf;
>  }
>  
> -static bool blkdev_dio_unaligned(struct block_device *bdev, loff_t pos,
> -			      struct iov_iter *iter)
> +static bool blkdev_atomic_write_valid(struct block_device *bdev, loff_t pos,
> +				      struct iov_iter *iter)
>  {
> +	struct request_queue *q = bdev_get_queue(bdev);
> +	unsigned int min_bytes = queue_atomic_write_unit_min_bytes(q);
> +	unsigned int max_bytes = queue_atomic_write_unit_max_bytes(q);
> +
> +	return atomic_write_valid(pos, iter, min_bytes, max_bytes);

generic_atomic_write_valid() would be a better name for this function. However,
I have already commented on this previously.
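
For readers following along, the kind of check such a helper performs could
look roughly like the userspace sketch below. Only the "naturally aligned
offset" rule and the min/max unit bounds are taken from the patch description;
the power-of-two length requirement and all names here are my assumptions, so
treat this as an illustration rather than the actual helper.

```c
#include <stdbool.h>
#include <stdint.h>

static bool sketch_is_power_of_2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical stand-in for the validity helper: returns true when a write
 * of len bytes at offset pos satisfies the assumed atomic write rules.
 */
static bool sketch_atomic_write_valid(uint64_t pos, uint64_t len,
				      uint64_t unit_min, uint64_t unit_max)
{
	/* Assumed rule: length must be a power of two. */
	if (!sketch_is_power_of_2(len))
		return false;

	/* Length must lie within the queue's atomic write unit bounds. */
	if (len < unit_min || len > unit_max)
		return false;

	/* Natural alignment: the offset must be a multiple of the length. */
	if (pos & (len - 1))
		return false;

	return true;
}
```

With unit_min = 4096 and unit_max = 65536, an 8192-byte write at offset 8192
passes, while the same write at offset 4096 fails the natural alignment test.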

> +}
> +
> +static bool blkdev_dio_invalid(struct block_device *bdev, loff_t pos,
> +				struct iov_iter *iter, bool atomic_write)

Perhaps name the bool "is_atomic" or "is_atomic_write"?
We already know that only atomic writes are supported, since an RWF_ATOMIC
operation returns -EOPNOTSUPP for reads in kiocb_set_rw_flags().
So we may as well use "is_atomic" for the bools.
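
To illustrate why the "write" part of the name is redundant, here is a
simplified userspace model of that flag check. The flag value, the function
name, and the second check against the file's capability are assumptions for
illustration only; the one behaviour taken from this thread is that RWF_ATOMIC
on a read yields -EOPNOTSUPP.

```c
#include <errno.h>
#include <stdbool.h>

#define SKETCH_RWF_ATOMIC 0x40u /* hypothetical flag value */

/*
 * Hypothetical model of the flag validation: by the time a request reaches
 * the direct I/O path with the atomic flag set, it can only be a write.
 */
static int sketch_check_rw_flags(unsigned int flags, bool is_write,
				 bool file_can_atomic_write)
{
	if (flags & SKETCH_RWF_ATOMIC) {
		/* Atomic reads are rejected up front. */
		if (!is_write)
			return -EOPNOTSUPP;
		/* Assumed: the file must have been opened atomic-capable. */
		if (!file_can_atomic_write)
			return -EOPNOTSUPP;
	}
	return 0;
}
```

Since the read case never gets past this point, a bool derived from the atomic
flag later on already implies a write.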

> +{
> +	if (atomic_write && !blkdev_atomic_write_valid(bdev, pos, iter))
> +		return true;
> +
>  	return pos & (bdev_logical_block_size(bdev) - 1) ||
>  		!bdev_iter_is_aligned(bdev, iter);
>  }
>  
> +
>  #define DIO_INLINE_BIO_VECS 4
>  
>  static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
> @@ -71,6 +85,8 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
>  	}
>  	bio.bi_iter.bi_sector = pos >> SECTOR_SHIFT;
>  	bio.bi_ioprio = iocb->ki_ioprio;
> +	if (iocb->ki_flags & IOCB_ATOMIC)
> +		bio.bi_opf |= REQ_ATOMIC;
>  
>  	ret = bio_iov_iter_get_pages(&bio, iter);
>  	if (unlikely(ret))
> @@ -341,6 +357,9 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
>  		task_io_account_write(bio->bi_iter.bi_size);
>  	}
>  
> +	if (iocb->ki_flags & IOCB_ATOMIC)
> +		bio->bi_opf |= REQ_ATOMIC;
> +
>  	if (iocb->ki_flags & IOCB_NOWAIT)
>  		bio->bi_opf |= REQ_NOWAIT;
>  
> @@ -357,13 +376,14 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
>  static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>  {
>  	struct block_device *bdev = I_BDEV(iocb->ki_filp->f_mapping->host);
> +	bool atomic_write = iocb->ki_flags & IOCB_ATOMIC;

ditto, bool is_atomic perhaps?

>  	loff_t pos = iocb->ki_pos;
>  	unsigned int nr_pages;
>  
>  	if (!iov_iter_count(iter))
>  		return 0;
>  
> -	if (blkdev_dio_unaligned(bdev, pos, iter))
> +	if (blkdev_dio_invalid(bdev, pos, iter, atomic_write))
>  		return -EINVAL;
>  
>  	nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS + 1);
> @@ -371,6 +391,8 @@ static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>  		if (is_sync_kiocb(iocb))
>  			return __blkdev_direct_IO_simple(iocb, iter, nr_pages);
>  		return __blkdev_direct_IO_async(iocb, iter, nr_pages);
> +	} else if (atomic_write) {
> +		return -EINVAL;
>  	}
>  	return __blkdev_direct_IO(iocb, iter, bio_max_segs(nr_pages));
>  }
> @@ -616,6 +638,9 @@ static int blkdev_open(struct inode *inode, struct file *filp)
>  	if (bdev_nowait(handle->bdev))
>  		filp->f_mode |= FMODE_NOWAIT;
>  
> +	if (bdev_can_atomic_write(handle->bdev) && filp->f_flags & O_DIRECT)
> +		filp->f_mode |= FMODE_CAN_ATOMIC_WRITE;
> +
>  	filp->f_mapping = handle->bdev->bd_inode->i_mapping;
>  	filp->f_wb_err = filemap_sample_wb_err(filp->f_mapping);
>  	filp->private_data = handle;
> -- 
> 2.31.1
