Loophole in async page I/O

From: Matthew Wilcox <willy@infradead.org>
To: io-uring@vger.kernel.org
Cc: Johannes Weiner <hannes@cmpxchg.org>, Jens Axboe <axboe@kernel.dk>
Subject: Loophole in async page I/O
Date: Mon, 12 Oct 2020 22:13:55 +0100	[thread overview]
Message-ID: <20201012211355.GC20115@casper.infradead.org> (raw)

This one's pretty unlikely, but there's a case in buffered reads where
an IOCB_WAITQ read can end up sleeping.

generic_file_buffered_read():
                page = find_get_page(mapping, index);
...
                if (!PageUptodate(page)) {
...
                        if (iocb->ki_flags & IOCB_WAITQ) {
...
                                error = wait_on_page_locked_async(page,
                                                                iocb->ki_waitq);
wait_on_page_locked_async():
        if (!PageLocked(page))
                return 0;
(back to generic_file_buffered_read):
                        if (!mapping->a_ops->is_partially_uptodate(page,
                                                        offset, iter->count))
                                goto page_not_up_to_date_locked;

page_not_up_to_date_locked:
                if (iocb->ki_flags & (IOCB_NOIO | IOCB_NOWAIT)) {
                        unlock_page(page);
                        put_page(page);
                        goto would_block;
                }
...
                error = mapping->a_ops->readpage(filp, page);
(will unlock page on I/O completion)
                if (!PageUptodate(page)) {
                        error = lock_page_killable(page);

So if we have IOCB_WAITQ set but IOCB_NOWAIT clear, we'll call ->readpage()
and wait for the I/O to complete.  I can't quite figure out if this is
intentional -- I think not; if I understand the semantics right, we
should be returning -EIOCBQUEUED and punting to an I/O thread to
kick off the I/O and wait.

I think the right fix is to return -EIOCBQUEUED from
wait_on_page_locked_async() if the page isn't locked.  ie this:

@@ -1258,7 +1258,7 @@ static int wait_on_page_locked_async(struct page *page,
                                     struct wait_page_queue *wait)
 {
        if (!PageLocked(page))
-               return 0;
+               return -EIOCBQUEUED;
        return __wait_on_page_locked_async(compound_head(page), wait, false);
 }
 
But as I said, I'm not sure what the semantics are supposed to be.