From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Thu, 19 Jul 2018 20:20:51 +0800 From: Ming Lei To: Jan Kara Cc: Martin Wilck , Jens Axboe , Jan Kara , Hannes Reinecke , Johannes Thumshirn , Kent Overstreet , Christoph Hellwig , linux-block@vger.kernel.org Subject: Re: [PATCH 2/2] blkdev: __blkdev_direct_IO_simple: make sure to fill up the bio Message-ID: <20180719122050.GA5306@ming.t460p> References: <20180718075440.GA15254@ming.t460p> <20180719093918.28876-1-mwilck@suse.com> <20180719093918.28876-3-mwilck@suse.com> <20180719110444.GC20700@ming.t460p> <20180719115616.mje75jts2vekgch6@quack2.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <20180719115616.mje75jts2vekgch6@quack2.suse.cz> List-ID: On Thu, Jul 19, 2018 at 01:56:16PM +0200, Jan Kara wrote: > On Thu 19-07-18 19:04:46, Ming Lei wrote: > > On Thu, Jul 19, 2018 at 11:39:18AM +0200, Martin Wilck wrote: > > > bio_iov_iter_get_pages() returns only pages for a single non-empty > > > segment of the input iov_iter's iovec. This may be much less than the number > > > of pages __blkdev_direct_IO_simple() is supposed to process. Call > > > bio_iov_iter_get_pages() repeatedly until either the requested number > > > of bytes is reached, or bio.bi_io_vec is exhausted. If this is not done, > > > short writes or reads may occur for direct synchronous IOs with multiple > > > iovec slots (such as generated by writev()). In that case, > > > __generic_file_write_iter() falls back to buffered writes, which > > > has been observed to cause data corruption in certain workloads. > > > > > > Note: if segments aren't page-aligned in the input iovec, this patch may > > > result in multiple adjacent slots of the bi_io_vec array to reference the same > > > page (the byte ranges are guaranteed to be disjunct if the preceding patch is > > > applied). We haven't seen problems with that in our and the customer's > > > tests. It'd be possible to detect this situation and merge bi_io_vec slots > > > that refer to the same page, but I prefer to keep it simple for now. > > > > > > Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for simplified bdev direct-io") > > > Signed-off-by: Martin Wilck > > > --- > > > fs/block_dev.c | 8 +++++++- > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > > > diff --git a/fs/block_dev.c b/fs/block_dev.c > > > index 0dd87aa..41643c4 100644 > > > --- a/fs/block_dev.c > > > +++ b/fs/block_dev.c > > > @@ -221,7 +221,12 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter, > > > > > > ret = bio_iov_iter_get_pages(&bio, iter); > > > if (unlikely(ret)) > > > - return ret; > > > + goto out; > > > + > > > + while (ret == 0 && > > > + bio.bi_vcnt < bio.bi_max_vecs && iov_iter_count(iter) > 0) > > > + ret = bio_iov_iter_get_pages(&bio, iter); > > > + > > > ret = bio.bi_iter.bi_size; > > > > > > if (iov_iter_rw(iter) == READ) { > > > @@ -250,6 +255,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter, > > > put_page(bvec->bv_page); > > > } > > > > > > +out: > > > if (vecs != inline_vecs) > > > kfree(vecs); > > > > > > > You might put the 'vecs' leak fix into another patch, and resue the > > current code block for that. > > > > Looks all users of bio_iov_iter_get_pages() need this kind of fix, so > > what do you think about the following way? > > No. AFAICT all the other users of bio_iov_iter_get_pages() are perfectly > fine with it returning less pages and they loop appropriately. OK, but this way still may make one bio to hold more data, especially the comment of bio_iov_iter_get_pages() says that 'Pins as many pages from *iter', so looks it is the correct way to do. thanks, Ming