linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: linux-fsdevel@vger.kernel.org, kernel-team@fb.com,
	viro@ZenIV.linux.org.uk, jack@suse.cz,
	linux-btrfs@vger.kernel.org
Subject: [PATCH] fs: use READ_ONCE/WRITE_ONCE with the i_size helpers
Date: Fri, 11 Oct 2019 16:20:50 -0400	[thread overview]
Message-ID: <20191011202050.8656-1-josef@toxicpanda.com> (raw)

I spent the last few weeks running down a weird regression in btrfs we
were seeing in production.  It turned out to be introduced by
62b37622718c, which took the following

loff_t isize = i_size_read(inode);

actual_end = min_t(u64, isize, end + 1);

and turned it into

actual_end = min_t(u64, i_size_read(inode), end + 1);

The problem here is that the compiler is optimizing out the temporary
variables used in __cmp_once, so the resulting assembly looks like this

498             actual_end = min_t(u64, i_size_read(inode), end + 1);
   0xffffffff814b08c1 <+145>:   48 8b 44 24 28  mov    0x28(%rsp),%rax
   0xffffffff814b08c6 <+150>:   48 39 45 50     cmp    %rax,0x50(%rbp)
   0xffffffff814b08ca <+154>:   48 89 c6        mov    %rax,%rsi
   0xffffffff814b08cd <+157>:   48 0f 46 75 50  cmovbe 0x50(%rbp),%rsi

as you can see we read the value of the inode to compare, and then we
read it a second time to assign it.

This code is simply an optimization, so there's no locking to keep
i_size from changing, however we really need min_t to actually return
the minimum value for these two values, which it is failing to do.

We've reverted that patch for now to fix the problem, but it's only a
matter of time before the compiler becomes smart enough to optimize out
the loff_t isize intermediate variable as well.

Instead we want to make it explicit that i_size_read() should only read
the value once.  This will keep this class of problem from happening in
the future, regardless of what the compiler chooses to do.  With this
change we get the following assembly generated for this code

491             actual_end = min_t(u64, i_size_read(inode), end + 1);
   0xffffffff8148f625 <+149>:   48 8b 44 24 20  mov    0x20(%rsp),%rax

./include/linux/compiler.h:
199             __READ_ONCE_SIZE;
   0xffffffff8148f62a <+154>:   4c 8b 75 50     mov    0x50(%rbp),%r14

fs/btrfs/inode.c:
491             actual_end = min_t(u64, i_size_read(inode), end + 1);
   0xffffffff8148f62e <+158>:   49 39 c6        cmp    %rax,%r14
   0xffffffff8148f631 <+161>:   4c 0f 47 f0     cmova  %rax,%r14

and this works out properly, we only read the value once and so we won't
trip over this problem again.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 include/linux/fs.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index e0d909d35763..0e3f887e2dc5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -863,7 +863,7 @@ static inline loff_t i_size_read(const struct inode *inode)
 	preempt_enable();
 	return i_size;
 #else
-	return inode->i_size;
+	return READ_ONCE(inode->i_size);
 #endif
 }
 
@@ -885,7 +885,7 @@ static inline void i_size_write(struct inode *inode, loff_t i_size)
 	inode->i_size = i_size;
 	preempt_enable();
 #else
-	inode->i_size = i_size;
+	WRITE_ONCE(inode->i_size, i_size);
 #endif
 }
 
-- 
2.21.0


             reply	other threads:[~2019-10-11 20:20 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-11 20:20 Josef Bacik [this message]
2019-10-12  0:34 ` [PATCH] fs: use READ_ONCE/WRITE_ONCE with the i_size helpers Rik van Riel
2019-10-14  8:51 ` Jan Kara
2019-10-14 12:02 ` David Sterba
2019-10-24 12:08 ` Josef Bacik
2019-10-29 17:22   ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191011202050.8656-1-josef@toxicpanda.com \
    --to=josef@toxicpanda.com \
    --cc=jack@suse.cz \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).