linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Josef Bacik <josef@toxicpanda.com>
Cc: linux-fsdevel@vger.kernel.org, kernel-team@fb.com,
	viro@zeniv.linux.org.uk, jack@suse.cz,
	linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] fs: use READ_ONCE/WRITE_ONCE with the i_size helpers
Date: Mon, 14 Oct 2019 10:51:25 +0200	[thread overview]
Message-ID: <20191014085125.GB5939@quack2.suse.cz> (raw)
In-Reply-To: <20191011202050.8656-1-josef@toxicpanda.com>

On Fri 11-10-19 16:20:50, Josef Bacik wrote:
> I spent the last few weeks running down a weird regression in btrfs we
> were seeing in production.  It turned out to be introduced by
> 62b37622718c, which took the following
> 
> loff_t isize = i_size_read(inode);
> 
> actual_end = min_t(u64, isize, end + 1);
> 
> and turned it into
> 
> actual_end = min_t(u64, i_size_read(inode), end + 1);
> 
> The problem here is that the compiler is optimizing out the temporary
> variables used in __cmp_once, so the resulting assembly looks like this
> 
> 498             actual_end = min_t(u64, i_size_read(inode), end + 1);
>    0xffffffff814b08c1 <+145>:   48 8b 44 24 28  mov    0x28(%rsp),%rax
>    0xffffffff814b08c6 <+150>:   48 39 45 50     cmp    %rax,0x50(%rbp)
>    0xffffffff814b08ca <+154>:   48 89 c6        mov    %rax,%rsi
>    0xffffffff814b08cd <+157>:   48 0f 46 75 50  cmovbe 0x50(%rbp),%rsi
> 
> as you can see we read the value of the inode to compare, and then we
> read it a second time to assign it.
> 
> This code is simply an optimization, so there's no locking to keep
> i_size from changing, however we really need min_t to actually return
> the minimum value for these two values, which it is failing to do.
> 
> We've reverted that patch for now to fix the problem, but it's only a
> matter of time before the compiler becomes smart enough to optimize out
> the loff_t isize intermediate variable as well.
> 
> Instead we want to make it explicit that i_size_read() should only read
> the value once.  This will keep this class of problem from happening in
> the future, regardless of what the compiler chooses to do.  With this
> change we get the following assembly generated for this code
> 
> 491             actual_end = min_t(u64, i_size_read(inode), end + 1);
>    0xffffffff8148f625 <+149>:   48 8b 44 24 20  mov    0x20(%rsp),%rax
> 
> ./include/linux/compiler.h:
> 199             __READ_ONCE_SIZE;
>    0xffffffff8148f62a <+154>:   4c 8b 75 50     mov    0x50(%rbp),%r14
> 
> fs/btrfs/inode.c:
> 491             actual_end = min_t(u64, i_size_read(inode), end + 1);
>    0xffffffff8148f62e <+158>:   49 39 c6        cmp    %rax,%r14
>    0xffffffff8148f631 <+161>:   4c 0f 47 f0     cmova  %rax,%r14
> 
> and this works out properly, we only read the value once and so we won't
> trip over this problem again.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Yeah, given i_size_read() is specifically meant for unlocked access to
i_size, it makes sense to hide the READ_ONCE() in that function (can
corresponding WRITE_ONCE() in i_size_write()). So feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  include/linux/fs.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index e0d909d35763..0e3f887e2dc5 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -863,7 +863,7 @@ static inline loff_t i_size_read(const struct inode *inode)
>  	preempt_enable();
>  	return i_size;
>  #else
> -	return inode->i_size;
> +	return READ_ONCE(inode->i_size);
>  #endif
>  }
>  
> @@ -885,7 +885,7 @@ static inline void i_size_write(struct inode *inode, loff_t i_size)
>  	inode->i_size = i_size;
>  	preempt_enable();
>  #else
> -	inode->i_size = i_size;
> +	WRITE_ONCE(inode->i_size, i_size);
>  #endif
>  }
>  
> -- 
> 2.21.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  parent reply	other threads:[~2019-10-14  8:51 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-11 20:20 [PATCH] fs: use READ_ONCE/WRITE_ONCE with the i_size helpers Josef Bacik
2019-10-12  0:34 ` Rik van Riel
2019-10-14  8:51 ` Jan Kara [this message]
2019-10-14 12:02 ` David Sterba
2019-10-24 12:08 ` Josef Bacik
2019-10-29 17:22   ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191014085125.GB5939@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).