io-uring.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Damien Le Moal <Damien.LeMoal@wdc.com>
To: Kanchan Joshi <joshi.k@samsung.com>,
	"axboe@kernel.dk" <axboe@kernel.dk>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	"bcrl@kvack.org" <bcrl@kvack.org>
Cc: "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-aio@kvack.org" <linux-aio@kvack.org>,
	"io-uring@vger.kernel.org" <io-uring@vger.kernel.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"selvakuma.s1@samsung.com" <selvakuma.s1@samsung.com>,
	"nj.shetty@samsung.com" <nj.shetty@samsung.com>,
	"javier.gonz@samsung.com" <javier.gonz@samsung.com>
Subject: Re: [PATCH 1/3] fs,block: Introduce IOCB_ZONE_APPEND and direct-io handling
Date: Thu, 18 Jun 2020 07:16:19 +0000	[thread overview]
Message-ID: <CY4PR04MB3751810405442801C115CAEBE79B0@CY4PR04MB3751.namprd04.prod.outlook.com> (raw)
In-Reply-To: 1592414619-5646-2-git-send-email-joshi.k@samsung.com

On 2020/06/18 2:27, Kanchan Joshi wrote:
> From: Selvakumar S <selvakuma.s1@samsung.com>
> 
> Introduce IOCB_ZONE_APPEND flag, which is set in kiocb->ki_flags for
> zone-append. Direct I/O submission path uses this flag to send bio with
> append op. And completion path uses the same to return zone-relative
> offset to upper layer.
> 
> Signed-off-by: SelvaKumar S <selvakuma.s1@samsung.com>
> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> Signed-off-by: Javier Gonzalez <javier.gonz@samsung.com>
> ---
>  fs/block_dev.c     | 19 ++++++++++++++++++-
>  include/linux/fs.h |  1 +
>  2 files changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 47860e5..4c84b4d0 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -185,6 +185,10 @@ static unsigned int dio_bio_write_op(struct kiocb *iocb)
>  	/* avoid the need for a I/O completion work item */
>  	if (iocb->ki_flags & IOCB_DSYNC)
>  		op |= REQ_FUA;
> +#ifdef CONFIG_BLK_DEV_ZONED
> +	if (iocb->ki_flags & IOCB_ZONE_APPEND)
> +		op |= REQ_OP_ZONE_APPEND | REQ_NOMERGE;
> +#endif

No need for the #ifdef. And no need for the REQ_NOMERGE either since
REQ_OP_ZONE_APPEND requests are defined as not mergeable already.

>  	return op;
>  }
>  
> @@ -295,6 +299,14 @@ static int blkdev_iopoll(struct kiocb *kiocb, bool wait)
>  	return blk_poll(q, READ_ONCE(kiocb->ki_cookie), wait);
>  }
>  
> +#ifdef CONFIG_BLK_DEV_ZONED
> +static inline long blkdev_bio_end_io_append(struct bio *bio)
> +{
> +	return (bio->bi_iter.bi_sector %
> +		blk_queue_zone_sectors(bio->bi_disk->queue)) << SECTOR_SHIFT;

A zone size is at most 4G sectors as defined by the queue chunk_sectors limit
(unsigned int). It means that the return value here can overflow due to the
shift, at least on 32bits arch.

And as Pavel already commented, zone sizes are power of 2 so you can use
bitmasks instead of costly divisions.

> +}
> +#endif
> +
>  static void blkdev_bio_end_io(struct bio *bio)
>  {
>  	struct blkdev_dio *dio = bio->bi_private;
> @@ -307,15 +319,20 @@ static void blkdev_bio_end_io(struct bio *bio)
>  		if (!dio->is_sync) {
>  			struct kiocb *iocb = dio->iocb;
>  			ssize_t ret;
> +			long res = 0;
>  
>  			if (likely(!dio->bio.bi_status)) {
>  				ret = dio->size;
>  				iocb->ki_pos += ret;
> +#ifdef CONFIG_BLK_DEV_ZONED
> +				if (iocb->ki_flags & IOCB_ZONE_APPEND)
> +					res = blkdev_bio_end_io_append(bio);

overflow... And no need for the #ifdef.

> +#endif
>  			} else {
>  				ret = blk_status_to_errno(dio->bio.bi_status);
>  			}
>  
> -			dio->iocb->ki_complete(iocb, ret, 0);
> +			dio->iocb->ki_complete(iocb, ret, res);

ki_complete interface also needs to be changed to have a 64bit last argument to
avoid overflow.

And this all does not work anyway because __blkdev_direct_IO() and
__blkdev_direct_IO_simple() both call bio_iov_iter_get_pages() *before*
dio_bio_write_op() is called. This means that bio_iov_iter_get_pages() will not
see that it needs to get the pages for a zone append command and so will not
call __bio_iov_append_get_pages(). That leads to BIO split and potential
corruption of the user data as fragments of the kiocb may get reordered.

There is a lot more to do to the block_dev direct IO code for this to work.


>  			if (dio->multi_bio)
>  				bio_put(&dio->bio);
>  		} else {
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 6c4ab4d..dc547b9 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -315,6 +315,7 @@ enum rw_hint {
>  #define IOCB_SYNC		(1 << 5)
>  #define IOCB_WRITE		(1 << 6)
>  #define IOCB_NOWAIT		(1 << 7)
> +#define IOCB_ZONE_APPEND	(1 << 8)
>  
>  struct kiocb {
>  	struct file		*ki_filp;
> 


-- 
Damien Le Moal
Western Digital Research

  parent reply	other threads:[~2020-06-18  7:16 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20200617172653epcas5p488de50090415eb802e62acc0e23d8812@epcas5p4.samsung.com>
2020-06-17 17:23 ` [PATCH 0/3] zone-append support in aio and io-uring Kanchan Joshi
     [not found]   ` <CGME20200617172702epcas5p4dbf4729d31d9a85ab1d261d04f238e61@epcas5p4.samsung.com>
2020-06-17 17:23     ` [PATCH 1/3] fs,block: Introduce IOCB_ZONE_APPEND and direct-io handling Kanchan Joshi
2020-06-17 19:02       ` Pavel Begunkov
2020-06-18  7:16       ` Damien Le Moal [this message]
2020-06-18 18:35         ` Kanchan Joshi
     [not found]   ` <CGME20200617172706epcas5p4dcbc164063f58bad95b211b9d6dfbfa9@epcas5p4.samsung.com>
2020-06-17 17:23     ` [PATCH 2/3] aio: add support for zone-append Kanchan Joshi
2020-06-18  7:33       ` Damien Le Moal
     [not found]   ` <CGME20200617172713epcas5p352f2907a12bd4ee3c97be1c7d8e1569e@epcas5p3.samsung.com>
2020-06-17 17:23     ` [PATCH 3/3] io_uring: " Kanchan Joshi
2020-06-17 18:55       ` Pavel Begunkov
2020-06-18  7:39       ` Damien Le Moal
2020-06-18  8:35         ` javier.gonz@samsung.com
2020-06-18  8:47           ` Damien Le Moal
2020-06-18  9:11             ` javier.gonz@samsung.com
2020-06-19  9:41               ` javier.gonz@samsung.com
2020-06-19 11:15                 ` Matias Bjørling
2020-06-19 14:18                   ` Jens Axboe
2020-06-19 15:14                     ` Matias Bjørling
2020-06-19 15:20                       ` Jens Axboe
2020-06-19 15:40                         ` Matias Bjørling
2020-06-19 15:44                           ` Jens Axboe
2020-06-21 18:55                             ` javier.gonz@samsung.com
2020-06-19 14:15                 ` Jens Axboe
2020-06-19 14:59                   ` Pavel Begunkov
2020-06-19 15:02                     ` Jens Axboe
2020-06-21 18:52                       ` javier.gonz@samsung.com
2020-06-17 17:42   ` [PATCH 0/3] zone-append support in aio and io-uring Matthew Wilcox
2020-06-18  6:56   ` Christoph Hellwig
2020-06-18  8:29     ` Javier González
2020-06-18 17:52     ` Kanchan Joshi
2020-06-19  3:08       ` Damien Le Moal
2020-06-19  7:56       ` Christoph Hellwig
2020-06-18  8:04   ` Matias Bjørling
2020-06-18  8:27     ` Javier González
2020-06-18  8:32       ` Matias Bjørling
2020-06-18  8:39         ` Javier González
2020-06-18  8:46           ` Matias Bjørling
2020-06-18 14:16     ` Christoph Hellwig
2020-06-18 19:21     ` Kanchan Joshi
2020-06-18 20:04       ` Matias Bjørling
2020-06-19  1:03         ` Damien Le Moal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CY4PR04MB3751810405442801C115CAEBE79B0@CY4PR04MB3751.namprd04.prod.outlook.com \
    --to=damien.lemoal@wdc.com \
    --cc=axboe@kernel.dk \
    --cc=bcrl@kvack.org \
    --cc=io-uring@vger.kernel.org \
    --cc=javier.gonz@samsung.com \
    --cc=joshi.k@samsung.com \
    --cc=linux-aio@kvack.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nj.shetty@samsung.com \
    --cc=selvakuma.s1@samsung.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).