From: Marc Smith <msmith626@gmail.com>
To: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Coly Li <colyli@suse.de>, linux-bcache@vger.kernel.org
Subject: Re: [PATCH 12/12] closures: fix a race on wakeup from closure_sync
Date: Wed, 25 Nov 2020 13:10:47 -0500
Message-ID: <CAH6h+heN_CVDiWDFBfE1utsk8B7u4hUTwD-kQwTy8SUawvRw+A@mail.gmail.com>
In-Reply-To: <CAH6h+heM6asTvfY8zo_-vra8cuavNih1K-jukwQt-E1UoK51eQ@mail.gmail.com>

On Tue, Nov 24, 2020 at 6:07 PM Marc Smith <msmith626@gmail.com> wrote:
>
> Hi,
>
>
> On Mon, Jul 22, 2019 at 1:51 PM Kent Overstreet
> <kent.overstreet@gmail.com> wrote:
> >
> > On Thu, Jul 18, 2019 at 03:46:46PM +0800, Coly Li wrote:
> > > > On 2019/7/16 6:47 PM, Coly Li wrote:
> > > > Hi Kent,
> > > >
> > > > On 2019/6/11 3:14 AM, Kent Overstreet wrote:
> > > >> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
> > > > Acked-by: Coly Li <colyli@suse.de>
> > > >
> > > > I have also received a report of a suspicious closure race condition in
> > > > bcache, and people are asking for this patch to be included in Linux v5.3.
> > > >
> > > > So before this patch gets merged upstream, I plan to rebase it onto
> > > > drivers/md/bcache/closure.c for now. Of course, you remain the author.
> > > >
> > > > Once lib/closure.c is merged upstream, I will rebase all closure usage
> > > > in bcache to use lib/closure.{c,h}.
> > >
> > > Hi Kent,
> > >
> > > The reporter of the race bug replied that the closure race is very rare
> > > to reproduce; after applying the patch and testing, they are not sure
> > > whether their closure race problem is fixed or not.
> > >
> > > I also notice that rcu_read_lock()/rcu_read_unlock() is used here, but it
> > > is not clear to me what the RCU read lock accomplishes in
> > > closure_sync_fn(). I believe you have a reason for using RCU here;
> > > could you please give some hints to help me understand the change better?
> >
> > The race was when a thread using closure_sync() notices cl->s->done == 1 before
> > the thread calling closure_put() calls wake_up_process(). Then, it's possible
> > for that thread to return and exit just before wake_up_process() is called - so
> > we're trying to wake up a process that no longer exists.
> >
> > rcu_read_lock() is sufficient to protect against this, as there's an rcu barrier
> > somewhere in the process teardown path.
>
> Is rcu_read_lock() sufficient even in a kernel built with
> "CONFIG_PREEMPT_NONE=y"? I ask because I hit this panic quite
> frequently, and it sounds like the scenario you described, which this
> patch is supposed to fix/prevent:
> crash> bt
> PID: 0      TASK: ffff88862461ab00  CPU: 2   COMMAND: "swapper/2"
>  #0 [ffffc90000100a88] machine_kexec at ffffffff8103d6b5
>  #1 [ffffc90000100ad0] __crash_kexec at ffffffff8110d37b
>  #2 [ffffc90000100b98] crash_kexec at ffffffff8110e07d
>  #3 [ffffc90000100ba8] oops_end at ffffffff8101a9de
>  #4 [ffffc90000100bc8] no_context at ffffffff81045e99
>  #5 [ffffc90000100c30] async_page_fault at ffffffff81e010cf
>     [exception RIP: atomic_try_cmpxchg+2]
>     RIP: ffffffff810d3e3b  RSP: ffffc90000100ce8  RFLAGS: 00010046
>     RAX: 0000000000000000  RBX: 0000000000000003  RCX: 0000000080080007
>     RDX: 0000000000000001  RSI: ffffc90000100cf4  RDI: 0000000100000a5b
>     RBP: 00000000ffffffff   R8: 0000000000000001   R9: ffffffffa0422d4e
>     R10: ffff888532779000  R11: ffff888532779000  R12: 0000000000000046
>     R13: 0000000000000000  R14: ffff88858a062000  R15: 0000000100000a5b
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #6 [ffffc90000100ce8] _raw_spin_lock_irqsave at ffffffff81cf7d7d
>  #7 [ffffc90000100d08] try_to_wake_up at ffffffff810c1624
>  #8 [ffffc90000100d58] closure_sync_fn at ffffffffa0419b07 [bcache]
>  #9 [ffffc90000100d60] clone_endio at ffffffff81aac48c
> #10 [ffffc90000100d90] call_bio_endio at ffffffff81a78e20
> #11 [ffffc90000100da8] raid_end_bio_io at ffffffff81a78e69
> #12 [ffffc90000100dd8] raid1_end_write_request at ffffffff81a79ad9
> #13 [ffffc90000100e48] blk_update_request at ffffffff814c3ab1
> #14 [ffffc90000100e88] blk_mq_end_request at ffffffff814caaf2
> #15 [ffffc90000100ea0] blk_mq_complete_request at ffffffff814c91c1
> #16 [ffffc90000100ec8] nvme_complete_cqes at ffffffffa002fb03 [nvme]
> #17 [ffffc90000100f08] nvme_irq at ffffffffa002fb7f [nvme]
> #18 [ffffc90000100f30] __handle_irq_event_percpu at ffffffff810e0d60
> #19 [ffffc90000100f70] handle_irq_event_percpu at ffffffff810e0e65
> #20 [ffffc90000100f98] handle_irq_event at ffffffff810e0ecb
> #21 [ffffc90000100fb0] handle_edge_irq at ffffffff810e494d
> #22 [ffffc90000100fc8] do_IRQ at ffffffff81e01900
> --- <IRQ stack> ---
> #23 [ffffc90000077e38] ret_from_intr at ffffffff81e00a0f
>     [exception RIP: default_idle+24]
>     RIP: ffffffff81cf7938  RSP: ffffc90000077ee0  RFLAGS: 00000246
>     RAX: 0000000000000000  RBX: ffff88862461ab00  RCX: ffff888627a96000
>     RDX: 0000000000000001  RSI: 0000000000000002  RDI: 0000000000000001
>     RBP: 0000000000000000   R8: 000000cd42e4dffb   R9: 00023d75261d1b20
>     R10: 00000000000001f0  R11: 0000000000000000  R12: 0000000000000000
>     R13: 0000000000000002  R14: 0000000000000000  R15: 0000000000000000
>     ORIG_RAX: ffffffffffffffdb  CS: 0010  SS: 0018
> #24 [ffffc90000077ee0] do_idle at ffffffff810c4ee4
> #25 [ffffc90000077f20] cpu_startup_entry at ffffffff810c514f
> #26 [ffffc90000077f30] start_secondary at ffffffff81037045
> #27 [ffffc90000077f50] secondary_startup_64 at ffffffff810000d4
> crash> dis closure_sync_fn
> 0xffffffffa0419af4 <closure_sync_fn>:   mov    0x8(%rdi),%rax
> 0xffffffffa0419af8 <closure_sync_fn+4>: movl   $0x1,0x8(%rax)
> 0xffffffffa0419aff <closure_sync_fn+11>:        mov    (%rax),%rdi
> 0xffffffffa0419b02 <closure_sync_fn+14>:        callq  0xffffffff810c185f <wake_up_process>
> 0xffffffffa0419b07 <closure_sync_fn+19>:        retq
> crash>
>
> I'm using Linux 5.4.69.

Alright, I was a bit overzealous with that statement... it has nothing
to do with CONFIG_PREEMPT_NONE / RCU. The problem is that GCC is
optimizing the code and reordering it, so the generated assembly differs
from what you would expect from reading the source:

I'm using GCC 9.1.0; when examining closure_sync_fn() it looks like this:
0000000000008af4 <closure_sync_fn>:
    8af4:       48 8b 47 08             mov    0x8(%rdi),%rax
    8af8:       c7 40 08 01 00 00 00    movl   $0x1,0x8(%rax)
    8aff:       48 8b 38                mov    (%rax),%rdi
    8b02:       e8 00 00 00 00          callq  8b07 <closure_sync_fn+0x13>
    8b07:       c3                      retq

This is what the function looks like in bcache:
static void closure_sync_fn(struct closure *cl)
{
        struct closure_syncer *s = cl->s;
        struct task_struct *p;

        rcu_read_lock();
        p = READ_ONCE(s->task);
        s->done = 1;
        wake_up_process(p);
        rcu_read_unlock();
}
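
For context, the waiting side looks roughly like this (paraphrased from
__closure_sync() in the same file, slightly simplified). Note that the
closure_syncer lives on the waiting thread's stack, so once that thread
observes done == 1 and returns, the memory holding s (including s->task)
can be reused:

void __sched __closure_sync(struct closure *cl)
{
        struct closure_syncer s = { .task = current }; /* on this thread's stack */

        cl->s = &s;
        continue_at(cl, closure_sync_fn, NULL);

        while (1) {
                set_current_state(TASK_UNINTERRUPTIBLE);
                if (s.done)     /* once observed, we break out and s goes away */
                        break;
                schedule();
        }

        __set_current_state(TASK_RUNNING);
}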

In the assembly shown above, "p = READ_ONCE(s->task);" and "s->done =
1;" are swapped, so they effectively execute in this order:
s->done = 1;
p = READ_ONCE(s->task);

I believe the crash is then caused because in some cases p is invalid:
the waiting thread sees done == 1 and finishes before the value of p is
read (since done was set to 1 before p was loaded), and we then call
wake_up_process() with a bad p value. Please let me know if I'm
misunderstanding that.
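
To make the compiler's freedom here concrete, here is a minimal,
hypothetical userspace analogue (not bcache code; the names are made
up). As I understand it, READ_ONCE()/WRITE_ONCE() boil down to volatile
accesses: the compiler may move a plain store across a volatile load,
but it may not reorder two volatile accesses with respect to each other.
Building this with "gcc -O2 -S" and comparing the two functions shows
the difference:

/* reorder_sketch.c - hypothetical analogue of the closure_sync_fn() pattern */
struct task;                            /* opaque, stands in for struct task_struct */

struct syncer {
        struct task     *task;
        int             done;
};

struct task *plain_store(struct syncer *s)
{
        /* volatile load, like READ_ONCE(s->task) */
        struct task *p = *(struct task * volatile *)&s->task;

        s->done = 1;                    /* plain store: may be hoisted above the load */
        return p;
}

struct task *volatile_store(struct syncer *s)
{
        /* volatile load, like READ_ONCE(s->task) */
        struct task *p = *(struct task * volatile *)&s->task;

        *(volatile int *)&s->done = 1;  /* volatile store: must stay after the load */
        return p;
}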

I applied the following patch to bcache in 5.4.69:
--- closure.c.orig 2020-11-25 01:03:03.899962395 -0500
+++ closure.c 2020-11-24 23:59:29.555465288 -0500
@@ -110,7 +110,7 @@
 
         rcu_read_lock();
         p = READ_ONCE(s->task);
-        s->done = 1;
+        WRITE_ONCE(s->done, 1);
         wake_up_process(p);
         rcu_read_unlock();
 }
@@ -123,8 +123,10 @@
         continue_at(cl, closure_sync_fn, NULL);
 
         while (1) {
+                int done;
                 set_current_state(TASK_UNINTERRUPTIBLE);
-                if (s.done)
+                done = READ_ONCE(s.done);
+                if (done)
                         break;
                 schedule();
         }


(I modified __closure_sync() as well to use READ_ONCE(), just to be
safe, but that may not be needed.)
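
For reference, with that change applied closure_sync_fn() reads like
this (same function as above, only the one store changed; since both
accesses are now volatile, my understanding is the compiler is not
allowed to swap them):

static void closure_sync_fn(struct closure *cl)
{
        struct closure_syncer *s = cl->s;
        struct task_struct *p;

        rcu_read_lock();
        p = READ_ONCE(s->task);         /* volatile load of the waiter's task */
        WRITE_ONCE(s->done, 1);         /* volatile store: stays after the load */
        wake_up_process(p);
        rcu_read_unlock();
}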


After applying that change and building with GCC 9.1.0,
closure_sync_fn() looks like this (correct ordering):
0000000000008af4 <closure_sync_fn>:
    8af4:       48 8b 47 08             mov    0x8(%rdi),%rax
    8af8:       48 8b 38                mov    (%rax),%rdi
    8afb:       c7 40 08 01 00 00 00    movl   $0x1,0x8(%rax)
    8b02:       e8 00 00 00 00          callq  8b07 <closure_sync_fn+0x13>
    8b07:       c3                      retq


I also built with an older GCC version and do not see this reordering
in the binary it produces.


--Marc


>
> --Marc

Thread overview: 66+ messages
2019-06-10 19:14 bcachefs status update (it's done cooking; let's get this sucker merged) Kent Overstreet
2019-06-10 19:14 ` [PATCH 01/12] Compiler Attributes: add __flatten Kent Overstreet
2019-06-12 17:16   ` Greg KH
2019-06-10 19:14 ` [PATCH 02/12] locking: SIX locks (shared/intent/exclusive) Kent Overstreet
2019-06-10 19:14 ` [PATCH 03/12] mm: pagecache add lock Kent Overstreet
2019-06-10 19:14 ` [PATCH 04/12] mm: export find_get_pages() Kent Overstreet
2019-06-10 19:14 ` [PATCH 05/12] fs: insert_inode_locked2() Kent Overstreet
2019-06-10 19:14 ` [PATCH 06/12] fs: factor out d_mark_tmpfile() Kent Overstreet
2019-06-10 19:14 ` [PATCH 07/12] Propagate gfp_t when allocating pte entries from __vmalloc Kent Overstreet
2019-06-10 19:14 ` [PATCH 08/12] block: Add some exports for bcachefs Kent Overstreet
2019-06-10 19:14 ` [PATCH 09/12] bcache: optimize continue_at_nobarrier() Kent Overstreet
2019-06-10 19:14 ` [PATCH 10/12] bcache: move closures to lib/ Kent Overstreet
2019-06-11 10:25   ` Coly Li
2019-06-13  7:28   ` Christoph Hellwig
2019-06-13 11:04     ` Kent Overstreet
2019-06-10 19:14 ` [PATCH 11/12] closures: closure_wait_event() Kent Overstreet
2019-06-11 10:25   ` Coly Li
2019-06-12 17:17   ` Greg KH
2019-06-10 19:14 ` [PATCH 12/12] closures: fix a race on wakeup from closure_sync Kent Overstreet
2019-07-16 10:47   ` Coly Li
2019-07-18  7:46     ` Coly Li
2019-07-22 17:22       ` Kent Overstreet
2020-11-24 23:07         ` Marc Smith
2020-11-25 18:10           ` Marc Smith [this message]
2019-06-10 20:46 ` bcachefs status update (it's done cooking; let's get this sucker merged) Linus Torvalds
2019-06-11  1:17   ` Kent Overstreet
2019-06-11  4:33     ` Dave Chinner
2019-06-12 16:21       ` Kent Overstreet
2019-06-12 23:02         ` Dave Chinner
2019-06-13 18:36           ` pagecache locking (was: bcachefs status update) merged) Kent Overstreet
2019-06-13 21:13             ` Andreas Dilger
2019-06-13 21:21               ` Kent Overstreet
2019-06-14  0:35                 ` Dave Chinner
2019-06-13 23:55             ` Dave Chinner
2019-06-14  2:30               ` Linus Torvalds
2019-06-14  7:30                 ` Dave Chinner
2019-06-15  1:15                   ` Linus Torvalds
2019-06-14  3:08               ` Linus Torvalds
2019-06-15  4:01                 ` Linus Torvalds
2019-06-17 22:47                   ` Dave Chinner
2019-06-17 23:38                     ` Linus Torvalds
2019-06-18  4:21                       ` Amir Goldstein
2019-06-19 10:38                         ` Jan Kara
2019-06-19 22:37                           ` Dave Chinner
2019-07-03  0:04                             ` pagecache locking Boaz Harrosh
     [not found]                               ` <DM6PR19MB250857CB8A3A1C8279D6F2F3C5FB0@DM6PR19MB2508.namprd19.prod.outlook.com>
2019-07-03  1:25                                 ` Boaz Harrosh
2019-07-05 23:31                               ` Dave Chinner
2019-07-07 15:05                                 ` Boaz Harrosh
2019-07-07 23:55                                   ` Dave Chinner
2019-07-08 13:31                                 ` Jan Kara
2019-07-09 23:47                                   ` Dave Chinner
2019-07-10  8:41                                     ` Jan Kara
2019-06-14 17:08               ` pagecache locking (was: bcachefs status update) merged) Kent Overstreet
2019-06-19  8:21           ` bcachefs status update (it's done cooking; let's get this sucker merged) Jan Kara
2019-07-03  1:04             ` [PATCH] mm: Support madvise_willneed override by Filesystems Boaz Harrosh
2019-07-03 17:21               ` Jan Kara
2019-07-03 18:03                 ` Boaz Harrosh
2019-06-11  4:55     ` bcachefs status update (it's done cooking; let's get this sucker merged) Linus Torvalds
2019-06-11 14:26       ` Matthew Wilcox
2019-06-11  4:10   ` Dave Chinner
2019-06-11  4:39     ` Linus Torvalds
2019-06-11  7:10       ` Dave Chinner
2019-06-12  2:07         ` Linus Torvalds
2019-07-03  5:59 ` Stefan K
2020-06-10 11:02 ` Stefan K
2020-06-10 11:32   ` Kent Overstreet
