From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx1.redhat.com ([209.132.183.28]:38782 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S935118AbdJQKoG (ORCPT ); Tue, 17 Oct 2017 06:44:06 -0400 Date: Tue, 17 Oct 2017 12:44:02 +0200 From: Lukas Czerner To: Jan Kara Cc: linux-fsdevel@vger.kernel.org, eguan@redhat.com, axboe@kernel.dk Subject: Re: [PATCH] fs: Avoid invalidation in interrupt context in dio_complete() Message-ID: <20171017104402.mm45va67ejxtgjta@rh_laptop> References: <1508231484-30966-1-git-send-email-lczerner@redhat.com> <20171017095356.GQ9762@quack2.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20171017095356.GQ9762@quack2.suse.cz> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Tue, Oct 17, 2017 at 11:53:56AM +0200, Jan Kara wrote: > On Tue 17-10-17 11:11:24, Lukas Czerner wrote: > > Currently we try to defer completion of async DIO to the process context > > in case there are any mapped pages associated with the inode so that we > > can invalidate the pages when the IO completes. However the check is racy > > and the pages can be mapped afterwards. If this happens we might end up > > calling invalidate_inode_pages2_range() in dio_complete() in interrupt > > context which could sleep. This can be reproduced by generic/451. > > > > Fix this by passing the information whether we can or can't invalidate > > to the dio_complete(). Thanks Eryu Guan for reporting this and Jan Kara > > for suggesting a fix. > > > > Fixes: 332391a9935d ("fs: Fix page cache inconsistency when mixing buffered and AIO DIO") > > Signed-off-by: Lukas Czerner > > Reported-by: Eryu Guan > > Looks good to me. You can add: > > Reviewed-by: Jan Kara > > BTW, I guess you should CC someone who can merge the patch... Jens, could you please take this through your tree ? Thanks! -Lukas > > Honza > > > --- > > fs/direct-io.c | 19 +++++++++++++------ > > 1 file changed, 13 insertions(+), 6 deletions(-) > > > > diff --git a/fs/direct-io.c b/fs/direct-io.c > > index 96415c6..5632548 100644 > > --- a/fs/direct-io.c > > +++ b/fs/direct-io.c > > @@ -45,6 +45,12 @@ > > #define DIO_PAGES 64 > > > > /* > > + * Flags for dio_complete() > > + */ > > +#define DIO_COMPLETE_ASYNC 0x01 /* This is async IO */ > > +#define DIO_COMPLETE_INVALIDATE 0x02 /* Can invalidate pages */ > > + > > +/* > > * This code generally works in units of "dio_blocks". A dio_block is > > * somewhere between the hard sector size and the filesystem block size. it > > * is determined on a per-invocation basis. When talking to the filesystem > > @@ -225,7 +231,7 @@ static inline struct page *dio_get_page(struct dio *dio, > > * filesystems can use it to hold additional state between get_block calls and > > * dio_complete. > > */ > > -static ssize_t dio_complete(struct dio *dio, ssize_t ret, bool is_async) > > +static ssize_t dio_complete(struct dio *dio, ssize_t ret, unsigned int flags) > > { > > loff_t offset = dio->iocb->ki_pos; > > ssize_t transferred = 0; > > @@ -266,7 +272,8 @@ static ssize_t dio_complete(struct dio *dio, ssize_t ret, bool is_async) > > * one is a pretty crazy thing to do, so we don't support it 100%. If > > * this invalidation fails, tough, the write still worked... > > */ > > - if (ret > 0 && dio->op == REQ_OP_WRITE && > > + if (flags & DIO_COMPLETE_INVALIDATE && > > + ret > 0 && dio->op == REQ_OP_WRITE && > > dio->inode->i_mapping->nrpages) { > > err = invalidate_inode_pages2_range(dio->inode->i_mapping, > > offset >> PAGE_SHIFT, > > @@ -285,7 +292,7 @@ static ssize_t dio_complete(struct dio *dio, ssize_t ret, bool is_async) > > if (!(dio->flags & DIO_SKIP_DIO_COUNT)) > > inode_dio_end(dio->inode); > > > > - if (is_async) { > > + if (flags & DIO_COMPLETE_ASYNC) { > > /* > > * generic_write_sync expects ki_pos to have been updated > > * already, but the submission path only does this for > > @@ -306,7 +313,7 @@ static void dio_aio_complete_work(struct work_struct *work) > > { > > struct dio *dio = container_of(work, struct dio, complete_work); > > > > - dio_complete(dio, 0, true); > > + dio_complete(dio, 0, DIO_COMPLETE_ASYNC | DIO_COMPLETE_INVALIDATE); > > } > > > > static blk_status_t dio_bio_complete(struct dio *dio, struct bio *bio); > > @@ -348,7 +355,7 @@ static void dio_bio_end_aio(struct bio *bio) > > queue_work(dio->inode->i_sb->s_dio_done_wq, > > &dio->complete_work); > > } else { > > - dio_complete(dio, 0, true); > > + dio_complete(dio, 0, DIO_COMPLETE_ASYNC); > > } > > } > > } > > @@ -1360,7 +1367,7 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, > > dio_await_completion(dio); > > > > if (drop_refcount(dio) == 0) { > > - retval = dio_complete(dio, retval, false); > > + retval = dio_complete(dio, retval, DIO_COMPLETE_INVALIDATE); > > } else > > BUG_ON(retval != -EIOCBQUEUED); > > > > -- > > 2.7.5 > > > -- > Jan Kara > SUSE Labs, CR