io-uring.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Martin Raiber <martin@urbackup.org>
To: dsterba@suse.cz, linux-btrfs@vger.kernel.org, io-uring@vger.kernel.org
Subject: Re: [PATCH] btrfs: Prevent nowait or async read from doing sync IO
Date: Mon, 8 Mar 2021 19:03:30 +0000	[thread overview]
Message-ID: <0102017813390ff7-879ae32a-99ac-4c52-a01d-58c85686d2f8-000000@eu-west-1.amazonses.com> (raw)
In-Reply-To: <20210226170030.GN7604@twin.jikos.cz>

On 26.02.2021 18:00 David Sterba wrote:
> On Fri, Jan 08, 2021 at 12:02:48AM +0000, Martin Raiber wrote:
>> When reading from btrfs file via io_uring I get following
>> call traces:
>>
>> [<0>] wait_on_page_bit+0x12b/0x270
>> [<0>] read_extent_buffer_pages+0x2ad/0x360
>> [<0>] btree_read_extent_buffer_pages+0x97/0x110
>> [<0>] read_tree_block+0x36/0x60
>> [<0>] read_block_for_search.isra.0+0x1a9/0x360
>> [<0>] btrfs_search_slot+0x23d/0x9f0
>> [<0>] btrfs_lookup_csum+0x75/0x170
>> [<0>] btrfs_lookup_bio_sums+0x23d/0x630
>> [<0>] btrfs_submit_data_bio+0x109/0x180
>> [<0>] submit_one_bio+0x44/0x70
>> [<0>] extent_readahead+0x37a/0x3a0
>> [<0>] read_pages+0x8e/0x1f0
>> [<0>] page_cache_ra_unbounded+0x1aa/0x1f0
>> [<0>] generic_file_buffered_read+0x3eb/0x830
>> [<0>] io_iter_do_read+0x1a/0x40
>> [<0>] io_read+0xde/0x350
>> [<0>] io_issue_sqe+0x5cd/0xed0
>> [<0>] __io_queue_sqe+0xf9/0x370
>> [<0>] io_submit_sqes+0x637/0x910
>> [<0>] __x64_sys_io_uring_enter+0x22e/0x390
>> [<0>] do_syscall_64+0x33/0x80
>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>
>> Prevent those by setting IOCB_NOIO before calling
>> generic_file_buffered_read.
>>
>> Async read has the same problem. So disable that by removing
>> FMODE_BUF_RASYNC. This was added with commit
>> 8730f12b7962b21ea9ad2756abce1e205d22db84 ("btrfs: flag files as
>> supporting buffered async reads") with 5.9. Io_uring will read
>> the data via worker threads if it can't be read without sync IO
>> this way.
>>
>> Signed-off-by: Martin Raiber <martin@urbackup.org>
>> ---
>>  fs/btrfs/file.c | 15 +++++++++++++--
>>  1 file changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>> index 0e41459b8..8bb561f6d 100644
>> --- a/fs/btrfs/file.c
>> +++ b/fs/btrfs/file.c
>> @@ -3589,7 +3589,7 @@ static loff_t btrfs_file_llseek(struct file *file, loff_t offset, int whence)
>>  
>>  static int btrfs_file_open(struct inode *inode, struct file *filp)
>>  {
>> -	filp->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC;
>> +	filp->f_mode |= FMODE_NOWAIT;
>>  	return generic_file_open(inode, filp);
>>  }
>>  
>> @@ -3639,7 +3639,18 @@ static ssize_t btrfs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
>>  			return ret;
>>  	}
>>  
>> -	return generic_file_buffered_read(iocb, to, ret);
>> +	if (iocb->ki_flags & IOCB_NOWAIT)
>> +		iocb->ki_flags |= IOCB_NOIO;
>> +
>> +	ret = generic_file_buffered_read(iocb, to, ret);
>> +
>> +	if (iocb->ki_flags & IOCB_NOWAIT) {
>> +		iocb->ki_flags &= ~IOCB_NOIO;
>> +		if (ret == 0)
>> +			ret = -EAGAIN;
>> +	}
> Christoph has some doubts about the code,
> https://lore.kernel.org/lkml/20210226051626.GA2072@lst.de/
>
> The patch has been in for-next but as I'm not sure it's correct and
> don't have a reproducer, I'll remove it again. We do want to fix the
> warning, maybe there's only something trivial missing but we need to be
> sure, I don't have enough expertise here.

The general gist of the critism is kind of correct. It is generic_file_buffered_read/filemap_read that handles the IOCB_NOIO, however. It is only used from gfs2 since 5.8 and IOCB_NOIO was added to 5.8 with 41da51bce36f44eefc1e3d0f47d18841cbd065ba ....

However, I cannot see how to find out if readahead was called with IOCB_NOWAIT from extent_readahead/btrfs_readahead/readahead_control. So add an additional parameter to address_space_operations.readahead ? As mentioned, not too relevant to btrfs (because of the CRC calculation), but making readahead async in all cases (incl. IOCB_WAITQ) would be the proper solution.

W.r.t. testing: The most low-effort way I can think of is to add an io_uring switch to xfs_io, so that xfstests can be run using io_uring (where possible). Then check via tracing/perf that there aren't any call stacks with both io_uring_enter and wait_on_page_bit (or any other blocking call) in them.


      reply	other threads:[~2021-03-08 19:04 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-08  0:02 [PATCH] btrfs: Prevent nowait or async read from doing sync IO Martin Raiber
2021-01-12 15:36 ` David Sterba
2021-01-12 17:01   ` Pavel Begunkov
2021-01-24 19:09     ` Martin Raiber
2021-02-26 17:00 ` David Sterba
2021-03-08 19:03   ` Martin Raiber [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0102017813390ff7-879ae32a-99ac-4c52-a01d-58c85686d2f8-000000@eu-west-1.amazonses.com \
    --to=martin@urbackup.org \
    --cc=dsterba@suse.cz \
    --cc=io-uring@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).