On 10/12/20 11:31 PM, Hao_Xu wrote: > 在 2020/10/13 上午6:08, Jens Axboe 写道: >> On 10/12/20 3:13 PM, Matthew Wilcox wrote: >>> This one's pretty unlikely, but there's a case in buffered reads where >>> an IOCB_WAITQ read can end up sleeping. >>> >>> generic_file_buffered_read(): >>> page = find_get_page(mapping, index); >>> ... >>> if (!PageUptodate(page)) { >>> ... >>> if (iocb->ki_flags & IOCB_WAITQ) { >>> ... >>> error = wait_on_page_locked_async(page, >>> iocb->ki_waitq); >>> wait_on_page_locked_async(): >>> if (!PageLocked(page)) >>> return 0; >>> (back to generic_file_buffered_read): >>> if (!mapping->a_ops->is_partially_uptodate(page, >>> offset, iter->count)) >>> goto page_not_up_to_date_locked; >>> >>> page_not_up_to_date_locked: >>> if (iocb->ki_flags & (IOCB_NOIO | IOCB_NOWAIT)) { >>> unlock_page(page); >>> put_page(page); >>> goto would_block; >>> } >>> ... >>> error = mapping->a_ops->readpage(filp, page); >>> (will unlock page on I/O completion) >>> if (!PageUptodate(page)) { >>> error = lock_page_killable(page); >>> >>> So if we have IOCB_WAITQ set but IOCB_NOWAIT clear, we'll call ->readpage() >>> and wait for the I/O to complete. I can't quite figure out if this is >>> intentional -- I think not; if I understand the semantics right, we >>> should be returning -EIOCBQUEUED and punting to an I/O thread to >>> kick off the I/O and wait. >>> >>> I think the right fix is to return -EIOCBQUEUED from >>> wait_on_page_locked_async() if the page isn't locked. ie this: >>> >>> @@ -1258,7 +1258,7 @@ static int wait_on_page_locked_async(struct page *page, >>> struct wait_page_queue *wait) >>> { >>> if (!PageLocked(page)) >>> - return 0; >>> + return -EIOCBQUEUED; >>> return __wait_on_page_locked_async(compound_head(page), wait, false); >>> } >>> >>> But as I said, I'm not sure what the semantics are supposed to be. >> >> If NOWAIT isn't set, then the issue attempt is from the helper thread >> already, and IOCB_WAITQ shouldn't be set either (the latter doesn't >> matter for this discussion). So it's totally fine and expected to block >> at that point. >> >> Hmm actually, I believe that: >> >> commit c8d317aa1887b40b188ec3aaa6e9e524333caed1 >> Author: Hao Xu >> Date: Tue Sep 29 20:00:45 2020 +0800 >> >> io_uring: fix async buffered reads when readahead is disabled >> >> maybe messed up that case, so we could block off the retry-path. I'll >> take a closer look, looks like that can be the case if read-ahead is >> disabled. >> >> In general, we can only return -EIOCBQUEUED if the IO has been started >> or is in progress already. That means we can safely rely on being told >> when it's unlocked/done. If we need to block, we should be returning >> -EAGAIN, which would punt to a worker thread. >> > Hi Jens, > My undertanding of io_uring buffered reads process after the commit > c8d317aa1887b40b188ec3aaa6e9e524333caed1 has been merged is: > the first io_uring IO try is with IOCB_NOWAIT, the second retry in the > same context is with IOCB_WAITQ but without IOCB_NOWAIT. > so in Matthew's case, lock_page_async() will be called after calling > mapping->a_ops->readpage(), So it won't end up sleeping. > Actually this case is what happens when readahead is disabled or somehow > skipped for reasons like blk_cgroup_congested() returns true. And this > case is my commit c8d317aa1887b40b188e for. Well, try the patches. I agree it's not going to sleep with the previous fix, but we're definitely driving a lower utilization by not utilizing read-ahead even if disabled. Re-run your previous tests with these two applied and see what you get. -- Jens Axboe