All of lore.kernel.org
 help / color / mirror / Atom feed
From: Qian Cai <cai@lca.pw>
To: Marco Elver <elver@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	linux-ext4@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2] ext4: fix a data race in EXT4_I(inode)->i_disksize
Date: Fri, 07 Feb 2020 10:38:50 -0500	[thread overview]
Message-ID: <1581089930.7365.20.camel@lca.pw> (raw)
In-Reply-To: <CANpmjNNqNzfMbFPGkSQgC7Q7yti30K0xcZmUsG9EtVdXsppjnw@mail.gmail.com>

On Fri, 2020-02-07 at 16:12 +0100, Marco Elver wrote:
> On Fri, 7 Feb 2020 at 15:29, Qian Cai <cai@lca.pw> wrote:
> > 
> > EXT4_I(inode)->i_disksize could be accessed concurrently as noticed by
> > KCSAN,
> > 
> >  BUG: KCSAN: data-race in ext4_write_end [ext4] / ext4_writepages [ext4]
> > 
> >  write to 0xffff91c6713b00f8 of 8 bytes by task 49268 on cpu 127:
> >   ext4_write_end+0x4e3/0x750 [ext4]
> >   ext4_update_i_disksize at fs/ext4/ext4.h:3032
> >   (inlined by) ext4_update_inode_size at fs/ext4/ext4.h:3046
> >   (inlined by) ext4_write_end at fs/ext4/inode.c:1287
> >   generic_perform_write+0x208/0x2a0
> >   ext4_buffered_write_iter+0x11f/0x210 [ext4]
> >   ext4_file_write_iter+0xce/0x9e0 [ext4]
> >   new_sync_write+0x29c/0x3b0
> >   __vfs_write+0x92/0xa0
> >   vfs_write+0x103/0x260
> >   ksys_write+0x9d/0x130
> >   __x64_sys_write+0x4c/0x60
> >   do_syscall_64+0x91/0xb47
> >   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > 
> >  read to 0xffff91c6713b00f8 of 8 bytes by task 24872 on cpu 37:
> >   ext4_writepages+0x10ac/0x1d00 [ext4]
> >   mpage_map_and_submit_extent at fs/ext4/inode.c:2468
> >   (inlined by) ext4_writepages at fs/ext4/inode.c:2772
> >   do_writepages+0x5e/0x130
> >   __writeback_single_inode+0xeb/0xb20
> >   writeback_sb_inodes+0x429/0x900
> >   __writeback_inodes_wb+0xc4/0x150
> >   wb_writeback+0x4bd/0x870
> >   wb_workfn+0x6b4/0x960
> >   process_one_work+0x54c/0xbe0
> >   worker_thread+0x80/0x650
> >   kthread+0x1e0/0x200
> >   ret_from_fork+0x27/0x50
> > 
> >  Reported by Kernel Concurrency Sanitizer on:
> >  CPU: 37 PID: 24872 Comm: kworker/u261:2 Tainted: G        W  O L 5.5.0-next-20200204+ #5
> >  Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
> >  Workqueue: writeback wb_workfn (flush-7:0)
> > 
> > Since only the read is operating as lockless (outside of the
> > "i_data_sem"), load tearing could introduce a logic bug. Fix it by
> > adding READ_ONCE() for the read and WRITE_ONCE() for the write.
> > 
> > Signed-off-by: Qian Cai <cai@lca.pw>
> > ---
> > 
> > v2: also add WRITE_ONCE() which is recommended even for fixing load tearing.
> 
> Just a note: I keep seeing 'load tearing' mentioned as the only reason:
> 
>   - The WRITE_ONCE avoids store-tearing (and other optimizations).
> 
>   - We're not only interested in avoiding load/store tearing. There
> are plenty other compiler optimizations that can break concurrent
> code: https://lwn.net/Articles/793253/

I also realized that from that article, store tearing is strictly from multiple
concurrent writers. However, in the sense of without the WRITE_ONCE() here,
compilers could still have 2 store instructions, so

CPU0:	CPU1:
	store #1
read
	store #2

which was not mentioned in that article. I called it also load tearing, but
maybe you will call that store tearing. Do I understand correctly?

> 
> Thanks,
> -- Marco
> 
> 
> >  fs/ext4/ext4.h  | 2 +-
> >  fs/ext4/inode.c | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > index 9a2ee2428ecc..8329ccc82fa9 100644
> > --- a/fs/ext4/ext4.h
> > +++ b/fs/ext4/ext4.h
> > @@ -3029,7 +3029,7 @@ static inline void ext4_update_i_disksize(struct inode *inode, loff_t newsize)
> >                      !inode_is_locked(inode));
> >         down_write(&EXT4_I(inode)->i_data_sem);
> >         if (newsize > EXT4_I(inode)->i_disksize)
> > -               EXT4_I(inode)->i_disksize = newsize;
> > +               WRITE_ONCE(EXT4_I(inode)->i_disksize, newsize);
> >         up_write(&EXT4_I(inode)->i_data_sem);
> >  }
> > 
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index 3313168b680f..6f9862bf63f1 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -2465,7 +2465,7 @@ static int mpage_map_and_submit_extent(handle_t *handle,
> >          * truncate are avoided by checking i_size under i_data_sem.
> >          */
> >         disksize = ((loff_t)mpd->first_page) << PAGE_SHIFT;
> > -       if (disksize > EXT4_I(inode)->i_disksize) {
> > +       if (disksize > READ_ONCE(EXT4_I(inode)->i_disksize)) {
> >                 int err2;
> >                 loff_t i_size;
> > 
> > --
> > 1.8.3.1
> > 

  parent reply	other threads:[~2020-02-07 15:38 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-07 14:29 [PATCH v2] ext4: fix a data race in EXT4_I(inode)->i_disksize Qian Cai
2020-02-07 15:12 ` Marco Elver
2020-02-07 15:25   ` Qian Cai
2020-02-07 15:38   ` Qian Cai [this message]
2020-02-07 16:08     ` Marco Elver
2020-02-20  4:16 ` Theodore Y. Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1581089930.7365.20.camel@lca.pw \
    --to=cai@lca.pw \
    --cc=adilger.kernel@dilger.ca \
    --cc=elver@google.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.