From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Date: Thu, 19 Jul 2018 13:56:16 +0200
From: Jan Kara
To: Ming Lei
Cc: Martin Wilck, Jens Axboe, Jan Kara, Hannes Reinecke,
	Johannes Thumshirn, Kent Overstreet, Christoph Hellwig,
	linux-block@vger.kernel.org
Subject: Re: [PATCH 2/2] blkdev: __blkdev_direct_IO_simple: make sure to
 fill up the bio
Message-ID: <20180719115616.mje75jts2vekgch6@quack2.suse.cz>
References: <20180718075440.GA15254@ming.t460p>
 <20180719093918.28876-1-mwilck@suse.com>
 <20180719093918.28876-3-mwilck@suse.com>
 <20180719110444.GC20700@ming.t460p>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20180719110444.GC20700@ming.t460p>
List-ID:

On Thu 19-07-18 19:04:46, Ming Lei wrote:
> On Thu, Jul 19, 2018 at 11:39:18AM +0200, Martin Wilck wrote:
> > bio_iov_iter_get_pages() returns pages for only a single non-empty
> > segment of the input iov_iter's iovec. This may be much less than the
> > number of pages __blkdev_direct_IO_simple() is supposed to process.
> > Call bio_iov_iter_get_pages() repeatedly until either the requested
> > number of bytes is reached, or bio.bi_io_vec is exhausted. If this is
> > not done, short writes or reads may occur for direct synchronous IOs
> > with multiple iovec slots (such as generated by writev()). In that
> > case, __generic_file_write_iter() falls back to buffered writes,
> > which has been observed to cause data corruption in certain
> > workloads.
> >
> > Note: if segments aren't page-aligned in the input iovec, this patch
> > may result in multiple adjacent slots of the bi_io_vec array
> > referencing the same page (the byte ranges are guaranteed to be
> > disjoint if the preceding patch is applied). We haven't seen problems
> > with that in our and the customer's tests. It'd be possible to detect
> > this situation and merge bi_io_vec slots that refer to the same page,
> > but I prefer to keep it simple for now.
> >
> > Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for simplified bdev direct-io")
> > Signed-off-by: Martin Wilck
> > ---
> >  fs/block_dev.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/block_dev.c b/fs/block_dev.c
> > index 0dd87aa..41643c4 100644
> > --- a/fs/block_dev.c
> > +++ b/fs/block_dev.c
> > @@ -221,7 +221,12 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
> >
> >  	ret = bio_iov_iter_get_pages(&bio, iter);
> >  	if (unlikely(ret))
> > -		return ret;
> > +		goto out;
> > +
> > +	while (ret == 0 &&
> > +	       bio.bi_vcnt < bio.bi_max_vecs && iov_iter_count(iter) > 0)
> > +		ret = bio_iov_iter_get_pages(&bio, iter);
> > +
> >  	ret = bio.bi_iter.bi_size;
> >
> >  	if (iov_iter_rw(iter) == READ) {
> > @@ -250,6 +255,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
> >  		put_page(bvec->bv_page);
> >  	}
> >
> > +out:
> >  	if (vecs != inline_vecs)
> >  		kfree(vecs);
> >
>
> You might put the 'vecs' leak fix into another patch, and reuse the
> current code block for that.
>
> Looks like all users of bio_iov_iter_get_pages() need this kind of fix,
> so what do you think about the following way?

No. AFAICT all the other users of bio_iov_iter_get_pages() are perfectly
fine with it returning fewer pages than requested, and they loop
appropriately.

								Honza
-- 
Jan Kara
SUSE Labs, CR