Date: Thu, 19 Jul 2018 12:45:51 +0200
From: Jan Kara
To: Martin Wilck
Cc: Jens Axboe, Ming Lei, Jan Kara, Hannes Reinecke, Johannes Thumshirn,
    Kent Overstreet, Christoph Hellwig, linux-block@vger.kernel.org
Subject: Re: [PATCH 2/2] blkdev: __blkdev_direct_IO_simple: make sure to fill up the bio
Message-ID: <20180719104551.jqndys6uxgglsbfh@quack2.suse.cz>
References: <20180718075440.GA15254@ming.t460p>
 <20180719093918.28876-1-mwilck@suse.com>
 <20180719093918.28876-3-mwilck@suse.com>
In-Reply-To: <20180719093918.28876-3-mwilck@suse.com>

On Thu 19-07-18 11:39:18, Martin Wilck wrote:
> bio_iov_iter_get_pages() returns only pages for a single non-empty
> segment of the input iov_iter's iovec. This may be much less than the number
> of pages __blkdev_direct_IO_simple() is supposed to process. Call
> bio_iov_iter_get_pages() repeatedly until either the requested number
> of bytes is reached, or bio.bi_io_vec is exhausted. If this is not done,
> short writes or reads may occur for direct synchronous IOs with multiple
> iovec slots (such as those generated by writev()). In that case,
> __generic_file_write_iter() falls back to buffered writes, which
> has been observed to cause data corruption in certain workloads.
>
> Note: if segments aren't page-aligned in the input iovec, this patch may
> result in multiple adjacent slots of the bi_io_vec array referencing the same
> page (the byte ranges are guaranteed to be disjoint if the preceding patch is
> applied). We haven't seen problems with that in our tests or the customer's.
> It'd be possible to detect this situation and merge bi_io_vec slots
> that refer to the same page, but I prefer to keep it simple for now.
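To make the short-IO problem concrete, here is a userspace model in C; it is not kernel code. Every `model_*` name is hypothetical: `model_get_pages()` stands in for `bio_iov_iter_get_pages()` under the assumption, taken from the patch description, that each call adds pages from only one non-empty segment, and `model_fill()` mirrors the retry loop the patch adds. Addresses are plain integers and a 4096-byte page is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model, not kernel code: every model_* name is hypothetical. */
#define MODEL_PAGE_SIZE 4096u
#define MODEL_MAX_VECS 16u

struct model_seg { size_t base, len; };          /* one iovec slot */

struct model_iter {                              /* stands in for iov_iter */
	const struct model_seg *segs;
	size_t nr_segs;
	size_t idx;                              /* current segment */
	size_t skip;                             /* bytes consumed in segs[idx] */
};

struct model_vec { size_t page, offset, len; };  /* stands in for bio_vec */

struct model_bio {                               /* max_vecs <= MODEL_MAX_VECS */
	struct model_vec vecs[MODEL_MAX_VECS];
	unsigned vcnt, max_vecs;
};

/* Bytes still left in the iterator, like iov_iter_count(). */
static size_t model_iter_count(const struct model_iter *it)
{
	size_t n = 0;

	for (size_t i = it->idx; i < it->nr_segs; i++)
		n += it->segs[i].len - (i == it->idx ? it->skip : 0);
	return n;
}

/* Models the behaviour described in the patch: each call adds pages from
 * at most ONE non-empty segment, then returns 0. */
static int model_get_pages(struct model_bio *bio, struct model_iter *it)
{
	/* skip empty or fully consumed segments */
	while (it->idx < it->nr_segs && it->skip == it->segs[it->idx].len) {
		it->idx++;
		it->skip = 0;
	}
	if (it->idx == it->nr_segs)
		return 0;

	const struct model_seg *s = &it->segs[it->idx];

	while (it->skip < s->len && bio->vcnt < bio->max_vecs) {
		size_t addr = s->base + it->skip;
		size_t off = addr % MODEL_PAGE_SIZE;
		size_t len = MODEL_PAGE_SIZE - off;     /* up to the page end */

		if (len > s->len - it->skip)
			len = s->len - it->skip;
		bio->vecs[bio->vcnt++] =
			(struct model_vec){ addr / MODEL_PAGE_SIZE, off, len };
		it->skip += len;
	}
	if (it->skip == s->len) {                       /* segment done */
		it->idx++;
		it->skip = 0;
	}
	return 0;
}

/* The loop added by the patch: call again until the iterator is drained,
 * the vec array is full, or an error is returned. */
static int model_fill(struct model_bio *bio, struct model_iter *it)
{
	int ret = model_get_pages(bio, it);

	while (ret == 0 && bio->vcnt < bio->max_vecs && model_iter_count(it) > 0)
		ret = model_get_pages(bio, it);
	assert(bio->vcnt <= bio->max_vecs);
	return ret;
}
```

For example, with two contiguous writev() buffers covering bytes [0, 1000) and [1000, 6000), one call to `model_get_pages()` gathers only the first 1000 bytes, while `model_fill()` gathers all 6000 bytes in three vecs. Vecs 0 and 1 both reference page 0, with the disjoint ranges [0, 1000) and [1000, 4096), which is the shared-page situation the patch's note describes.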
>
> Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for simplified bdev direct-io")
> Signed-off-by: Martin Wilck
> ---
>  fs/block_dev.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 0dd87aa..41643c4 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -221,7 +221,12 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
>  
>  	ret = bio_iov_iter_get_pages(&bio, iter);
>  	if (unlikely(ret))
> -		return ret;
> +		goto out;
> +
> +	while (ret == 0 &&
> +	       bio.bi_vcnt < bio.bi_max_vecs && iov_iter_count(iter) > 0)
> +		ret = bio_iov_iter_get_pages(&bio, iter);
> +

I have two suggestions here (posting them now in public):

The condition bio.bi_vcnt < bio.bi_max_vecs should always be true: we made
sure we have enough vecs for the pages in iter. So I'd WARN if it isn't true.

Secondly, I don't think it is good to discard the error from
bio_iov_iter_get_pages() here and just submit a partial IO. That would again
lead to part of the IO being done as direct IO and part being attempted as
buffered IO. Also, the "slow" direct IO path in __blkdev_direct_IO() behaves
differently: it aborts and returns an error if bio_iov_iter_get_pages() ever
returns an error. IMO we should do the same here.

								Honza
-- 
Jan Kara
SUSE Labs, CR
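The two review suggestions can be sketched as a userspace policy model in C. Everything here is hypothetical: `sim_get_pages()` stands in for `bio_iov_iter_get_pages()` and, to keep the sketch short, assumes each segment fits in a single vec; `SIM_WARN_ON()` only imitates the kernel's WARN_ON() by printing and counting. Returning -EINVAL after the warning is this sketch's own assumption; the review only asks for a WARN.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace policy model, not kernel code: every sim_* name is hypothetical. */
#define SIM_MAX_VECS 8u

struct sim_bio {
	unsigned vcnt, max_vecs;
	size_t bytes;                   /* bytes gathered so far */
};

struct sim_iter {
	const size_t *seg_len;          /* lengths of the iovec segments */
	size_t nr_segs, idx;
	size_t fail_idx;                /* segment whose pinning fails; nr_segs = never */
};

/* Imitates WARN_ON(): report the condition, count it, return it. */
static unsigned sim_warnings;
#define SIM_WARN_ON(cond) \
	((cond) ? (sim_warnings++, fprintf(stderr, "WARNING: %s\n", #cond), 1) : 0)

/* Stand-in for bio_iov_iter_get_pages(): one segment per call, and each
 * segment is assumed to fit in a single vec. Returns 0 or a negative errno. */
static int sim_get_pages(struct sim_bio *bio, struct sim_iter *it)
{
	if (it->idx == it->fail_idx)
		return -EFAULT;         /* e.g. an unmapped user buffer */
	bio->vcnt++;
	bio->bytes += it->seg_len[it->idx++];
	return 0;
}

/* Fill policy from the review: WARN if the vecs run out (the bio was sized
 * for the whole iter, so that indicates a bug) and propagate any error
 * instead of submitting a partial direct IO. Returns bytes or -errno. */
static long sim_fill(struct sim_bio *bio, struct sim_iter *it)
{
	while (it->idx < it->nr_segs) {
		if (SIM_WARN_ON(bio->vcnt >= bio->max_vecs))
			return -EINVAL; /* assumption: the review only asks for a WARN */

		int ret = sim_get_pages(bio, it);

		if (ret)
			return ret;     /* abort, as __blkdev_direct_IO() does */
	}
	return (long)bio->bytes;
}
```

With three segments of 1000, 5000 and 2000 bytes, `sim_fill()` returns 8000 when nothing fails; if pinning fails at the second segment it returns -EFAULT instead of submitting 1000 bytes as a partial direct IO; and with room for only two vecs it warns once and returns -EINVAL. A real implementation would also have to release any pages already pinned before returning the error; the model has none to release.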