Re: Loophole in async page I/O

From: Matthew Wilcox <willy@infradead.org>
To: Hao_Xu <haoxu@linux.alibaba.com>
Cc: io-uring@vger.kernel.org, Johannes Weiner <hannes@cmpxchg.org>,
	Jens Axboe <axboe@kernel.dk>
Subject: Re: Loophole in async page I/O
Date: Tue, 13 Oct 2020 13:01:19 +0100
Message-ID: <20201013120119.GD20115@casper.infradead.org>
In-Reply-To: <6e341fd1-bd2a-7774-5323-41f3a0531295@linux.alibaba.com>

On Tue, Oct 13, 2020 at 01:13:48PM +0800, Hao_Xu wrote:
> On 2020/10/13 at 5:13 AM, Matthew Wilcox wrote:
> > This one's pretty unlikely, but there's a case in buffered reads where
> > an IOCB_WAITQ read can end up sleeping.
> > 
> > generic_file_buffered_read():
> >                  page = find_get_page(mapping, index);
> > ...
> >                  if (!PageUptodate(page)) {
> > ...
> >                          if (iocb->ki_flags & IOCB_WAITQ) {
> > ...
> >                                  error = wait_on_page_locked_async(page,
> >                                                                  iocb->ki_waitq);
> > wait_on_page_locked_async():
> >          if (!PageLocked(page))
> >                  return 0;
> > (back to generic_file_buffered_read):
> >                          if (!mapping->a_ops->is_partially_uptodate(page,
> >                                                          offset, iter->count))
> >                                  goto page_not_up_to_date_locked;
> > 
> > page_not_up_to_date_locked:
> >                  if (iocb->ki_flags & (IOCB_NOIO | IOCB_NOWAIT)) {
> >                          unlock_page(page);
> >                          put_page(page);
> >                          goto would_block;
> >                  }
> > ...
> >                  error = mapping->a_ops->readpage(filp, page);
> > (will unlock page on I/O completion)
> >                  if (!PageUptodate(page)) {
> >                          error = lock_page_killable(page);
> > 
> > So if we have IOCB_WAITQ set but IOCB_NOWAIT clear, we'll call ->readpage()
> > and wait for the I/O to complete.  I can't quite figure out if this is
> > intentional -- I think not; if I understand the semantics right, we
> > should be returning -EIOCBQUEUED and punting to an I/O thread to
> > kick off the I/O and wait.
> > 
> > I think the right fix is to return -EIOCBQUEUED from
> > wait_on_page_locked_async() if the page isn't locked.  ie this:
> > 
> > @@ -1258,7 +1258,7 @@ static int wait_on_page_locked_async(struct page *page,
> >                                       struct wait_page_queue *wait)
> >   {
> >          if (!PageLocked(page))
> > -               return 0;
> > +               return -EIOCBQUEUED;
> >          return __wait_on_page_locked_async(compound_head(page), wait, false);
> >   }
> > But as I said, I'm not sure what the semantics are supposed to be.
> > 
> Hi Matthew,
> Which kernel version are you using? I believe I've fixed this case in
> commit c8d317aa1887b40b188ec3aaa6e9e524333caed1

Ah, I don't have that commit in my tree.
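
For anyone following along without the source open, the branch quoted
above can be modelled in userspace roughly as below.  This is only a
sketch of the quoted page_not_up_to_date_locked test as it appears in my
tree, not of the code after the commit mentioned above.  The MODEL_*
flag values, the enum and the function name are made up for
illustration; the real flags are defined in include/linux/fs.h.

```c
/* Hypothetical stand-ins for the kernel's IOCB_* flags (values are arbitrary). */
#define MODEL_IOCB_NOWAIT	(1 << 0)
#define MODEL_IOCB_WAITQ	(1 << 1)
#define MODEL_IOCB_NOIO		(1 << 2)

enum model_outcome {
	MODEL_WOULD_BLOCK,	/* unlock, put the page, goto would_block */
	MODEL_SLEEPS,		/* call ->readpage(), then lock_page_killable() */
};

/*
 * Models the quoted test at page_not_up_to_date_locked for a page that
 * is not uptodate.  Only IOCB_NOIO and IOCB_NOWAIT are checked there,
 * so IOCB_WAITQ on its own does not prevent the sleeping path.
 */
static enum model_outcome model_not_uptodate(int ki_flags)
{
	if (ki_flags & (MODEL_IOCB_NOIO | MODEL_IOCB_NOWAIT))
		return MODEL_WOULD_BLOCK;
	return MODEL_SLEEPS;
}
```

With this model, model_not_uptodate(MODEL_IOCB_WAITQ) yields MODEL_SLEEPS,
which is the loophole: the caller asked for a waitqueue callback but the
read ends up waiting for the I/O.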

Nevertheless, there is still a problem.  The ->readpage implementation
is not required to execute asynchronously.  For example, it may enter
page reclaim by using GFP_KERNEL.  Indeed, I feel it is better if it
works synchronously as it can then report the actual error from an I/O
instead of the almost-meaningless -EIO.
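
To illustrate the error-reporting point, here is a small userspace model,
not kernel code.  All names (struct model_page, model_readpage_async,
model_readpage_sync, model_read) are hypothetical, and -ETIMEDOUT is just
an arbitrary example of a specific I/O error.  It assumes the
asynchronous style records failure only as a flag on the page at
completion time, so the reader has nothing better than -EIO to return.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the page state the reader can inspect. */
struct model_page {
	bool uptodate;	/* stands in for PageUptodate() */
	bool error;	/* stands in for an error flag set at I/O completion */
};

/* Asynchronous style: submission succeeds; the result arrives later as flags. */
static int model_readpage_async(struct model_page *page, int io_result)
{
	if (io_result == 0)
		page->uptodate = true;
	else
		page->error = true;	/* the specific errno is lost here */
	return 0;
}

/* Synchronous style: the I/O result is the return value. */
static int model_readpage_sync(struct model_page *page, int io_result)
{
	if (io_result == 0) {
		page->uptodate = true;
		return 0;
	}
	return io_result;
}

/* What the reader can report back to its caller in each case. */
static int model_read(struct model_page *page,
		      int (*readpage)(struct model_page *, int),
		      int io_result)
{
	int error = readpage(page, io_result);

	if (error)
		return error;		/* actual error from the I/O */
	if (!page->uptodate)
		return -EIO;		/* all we know is that it failed */
	return 0;
}
```

In this model a failing I/O with result -ETIMEDOUT comes back as -EIO
through model_readpage_async and as -ETIMEDOUT through
model_readpage_sync.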

This patch series documents 12 filesystems which implement ->readpage
in a synchronous way today (for at least some cases) and converts iomap
to be synchronous (making two more filesystems synchronous).

https://lore.kernel.org/linux-fsdevel/20201009143104.22673-1-willy@infradead.org/