* [5.4-rc1, regression] wb_workfn wakeup oops (was Re: frequent 5.4-rc1 crash?)
From: Dave Chinner @ 2019-10-03  6:40 UTC
  To: Darrick J. Wong; +Cc: xfs, tj, linux-fsdevel, linux-block

[cc linux-fsdevel, linux-block, tejun ]

On Wed, Oct 02, 2019 at 06:52:47PM -0700, Darrick J. Wong wrote:
> Hi everyone,
> 
> Does anyone /else/ see this crash in generic/299 on a V4 filesystem (tho
> afaict V5 configs crash too) and a 5.4-rc1 kernel?  It seems to pop up
> on generic/299 though only 80% of the time.
> 
> --D
> 
> [ 1806.186197] run fstests generic/299 at 2019-10-02 18:15:30
> [ 1808.279874] XFS (sdb): Mounting V4 Filesystem
> [ 1808.283519] XFS (sdb): Ending clean mount
> [ 1808.284530] XFS (sdb): Quotacheck needed: Please wait.
> [ 1808.317062] XFS (sdb): Quotacheck: Done.
> [ 1808.319821] Mounted xfs file system at /opt supports timestamps until 2038 (0x7fffffff)
> [ 1886.218794] BUG: kernel NULL pointer dereference, address: 0000000000000018
> [ 1886.219787] #PF: supervisor read access in kernel mode
> [ 1886.220638] #PF: error_code(0x0000) - not-present page
> [ 1886.221496] PGD 0 P4D 0 
> [ 1886.221970] Oops: 0000 [#1] PREEMPT SMP
> [ 1886.222596] CPU: 2 PID: 227320 Comm: kworker/u10:2 Tainted: G        W         5.4.0-rc1-djw #rc1
> [ 1886.224016] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1 04/01/2014
> [ 1886.225261] Workqueue: writeback wb_workfn (flush-8:16)
> [ 1886.225926] RIP: 0010:__lock_acquire+0x4c3/0x1490
> [ 1886.226595] Code: 00 00 00 48 8b 74 24 48 65 48 33 34 25 28 00 00 00 8b 44 24 04 0f 85 a1 0f 00 00 48 83 c4 50 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <48> 81 3f 20 66 6d 82 41 ba 00 00 00 00 45 0f 45 d0 83 fe 01 0f 87
> [ 1886.229146] RSP: 0000:ffffc900052c3bc0 EFLAGS: 00010002
> [ 1886.230008] RAX: 0000000000000000 RBX: 0000000000000018 RCX: 0000000000000000
> [ 1886.231238] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000018
> [ 1886.236382] RBP: ffff888077f80000 R08: 0000000000000001 R09: 0000000000000001
> [ 1886.241630] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
> [ 1886.243530] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000018
> [ 1886.244669] FS:  0000000000000000(0000) GS:ffff88807e000000(0000) knlGS:0000000000000000
> [ 1886.245941] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1886.246913] CR2: 0000000000000018 CR3: 0000000072f7b003 CR4: 00000000001606a0
> [ 1886.247834] Call Trace:
> [ 1886.248217]  ? mark_held_locks+0x47/0x70
> [ 1886.248810]  ? trace_hardirqs_on_thunk+0x1a/0x20
> [ 1886.249445]  lock_acquire+0x90/0x180
> [ 1886.249876]  ? __wake_up_common_lock+0x62/0xc0
> [ 1886.250577]  _raw_spin_lock_irqsave+0x3e/0x80
> [ 1886.251327]  ? __wake_up_common_lock+0x62/0xc0
> [ 1886.252538]  __wake_up_common_lock+0x62/0xc0
> [ 1886.257318]  wb_workfn+0x10e/0x610
> [ 1886.260171]  ? __lock_acquire+0x268/0x1490
> [ 1886.266124]  ? process_one_work+0x1da/0x5d0
> [ 1886.266941]  process_one_work+0x25b/0x5d0
> [ 1886.267759]  worker_thread+0x3d/0x3a0
> [ 1886.268497]  ? process_one_work+0x5d0/0x5d0
> [ 1886.269285]  kthread+0x121/0x140
> [ 1886.269808]  ? kthread_park+0x80/0x80
> [ 1886.270317]  ret_from_fork+0x3a/0x50
> [ 1886.270811] Modules linked in: xfs libcrc32c dm_flakey ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 ip_set_hash_ip ip_set_hash_net xt_tcpudp bfq xt_set ip_set_hash_mac ip_set nfnetlink ip6table_filter ip6_tables iptable_filter sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: scsi_debug]
> [ 1886.274144] Dumping ftrace buffer:
> [ 1886.274637]    (ftrace buffer empty)
> [ 1886.275129] CR2: 0000000000000018
> [ 1886.275567] ---[ end trace 20db199015efe614 ]---
> [ 1886.278601] RIP: 0010:__lock_acquire+0x4c3/0x1490
> [ 1886.283408] Code: 00 00 00 48 8b 74 24 48 65 48 33 34 25 28 00 00 00 8b 44 24 04 0f 85 a1 0f 00 00 48 83 c4 50 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <48> 81 3f 20 66 6d 82 41 ba 00 00 00 00 45 0f 45 d0 83 fe 01 0f 87
> 

generic/270 w/ xfs-dax on pmem triggers it within 3 runs every time
here. Bisect points to this commit:

$ git bisect bad
5b9cce4c7eb0696558dfd4946074ae1fb9d8f05d is the first bad commit
commit 5b9cce4c7eb0696558dfd4946074ae1fb9d8f05d
Author: Tejun Heo <tj@kernel.org>
Date:   Mon Aug 26 09:06:52 2019 -0700

    writeback: Generalize and expose wb_completion
    
    wb_completion is used to track writeback completions.  We want to use
    it from memcg side for foreign inode flushes.  This patch updates it
    to remember the target waitq instead of assuming bdi->wb_waitq and
    expose it outside of fs-writeback.c.
    
    Reviewed-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

 fs/fs-writeback.c                | 47 ++++++++++++----------------------------
 include/linux/backing-dev-defs.h | 20 +++++++++++++++++
 include/linux/backing-dev.h      |  2 ++
 3 files changed, 36 insertions(+), 33 deletions(-)
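
(For context, the generalized completion that commit introduces looks
roughly like this -- quoting from memory of the 5.4 tree, so the
details may differ:

struct wb_completion {
	atomic_t		cnt;
	wait_queue_head_t	*waitq;
};

#define WB_COMPLETION_INIT(_waitq)	\
	(struct wb_completion){ .cnt = ATOMIC_INIT(1), .waitq = (_waitq) }

#define DEFINE_WB_COMPLETION(cmpl, bdi)	\
	struct wb_completion cmpl = WB_COMPLETION_INIT(&(bdi)->wb_waitq)

i.e. the completion now carries its target waitq instead of implying
bdi->wb_waitq.)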


$ git bisect log
git bisect start
# bad: [3a8e9ac89e6a5106cfb6b85d4c9cf9bfa3519bc7] writeback: add tracepoints for cgroup foreign writebacks
git bisect bad 3a8e9ac89e6a5106cfb6b85d4c9cf9bfa3519bc7
# good: [b8e24a9300b0836a9d39f6b20746766b3b81f1bd] block: annotate refault stalls from IO submission
git bisect good b8e24a9300b0836a9d39f6b20746766b3b81f1bd
# bad: [ed288dc0d4aa29f65bd25b31b5cb866aa5664ff9] writeback: Separate out wb_get_lookup() from wb_get_create()
git bisect bad ed288dc0d4aa29f65bd25b31b5cb866aa5664ff9
# good: [320ea869a12cec206756207c6ca5f817ec45c7f2] block: improve the gap check in __bio_add_pc_page
git bisect good 320ea869a12cec206756207c6ca5f817ec45c7f2
# good: [7ea88e229e9df18ecd624b0d39f3dba87432ba33] null_blk: create a helper for mem-backed ops
git bisect good 7ea88e229e9df18ecd624b0d39f3dba87432ba33
# good: [38b4e09fbccab6457536563823222921c49601bb] null_blk: fix inline misuse
git bisect good 38b4e09fbccab6457536563823222921c49601bb
# bad: [34f8fe501f0624de115d087680c84000b5d9abc9] bdi: Add bdi->id
git bisect bad 34f8fe501f0624de115d087680c84000b5d9abc9
# bad: [5b9cce4c7eb0696558dfd4946074ae1fb9d8f05d] writeback: Generalize and expose wb_completion
git bisect bad 5b9cce4c7eb0696558dfd4946074ae1fb9d8f05d
# first bad commit: [5b9cce4c7eb0696558dfd4946074ae1fb9d8f05d] writeback: Generalize and expose wb_completion
$

Not obvious to me what is wrong with that commit right now, but the
bisect is solid. Kinda surprised to see such significant
fs-writeback changes in 5.4, though, because there was nothing sent
to the -fsdevel list for review in the last dev cycle.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [5.4-rc1, regression] wb_workfn wakeup oops (was Re: frequent 5.4-rc1 crash?)
From: Gao Xiang @ 2019-10-03  8:41 UTC
  To: Dave Chinner; +Cc: Darrick J. Wong, xfs, tj, linux-fsdevel, linux-block

Hi,

On Thu, Oct 03, 2019 at 04:40:22PM +1000, Dave Chinner wrote:
> [cc linux-fsdevel, linux-block, tejun ]
> 
> On Wed, Oct 02, 2019 at 06:52:47PM -0700, Darrick J. Wong wrote:
> > Hi everyone,
> > 
> > Does anyone /else/ see this crash in generic/299 on a V4 filesystem (tho
> > afaict V5 configs crash too) and a 5.4-rc1 kernel?  It seems to pop up
> > on generic/299 though only 80% of the time.
> > 
> > [...]

Just from a quick glance, I guess there could be a race here (a
complete guess):


static void finish_writeback_work(struct bdi_writeback *wb,
                                  struct wb_writeback_work *work)
{
        struct wb_completion *done = work->done;

        if (work->auto_free)
                kfree(work);
        if (done && atomic_dec_and_test(&done->cnt))

        ^^^ here

                wake_up_all(done->waitq);
}

since with the new code the wb_completion that wake_up_all(done->waitq)
dereferences can live entirely on the waiter's stack:

 	if (done && atomic_dec_and_test(&done->cnt))
-		wake_up_all(&wb->bdi->wb_waitq);
+		wake_up_all(done->waitq);
 }
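
For reference, the waiter side shows why @done is stack memory.  This
is roughly the 5.4 code from memory (treat it as a sketch; the exact
lines may differ):

/*
 * @done normally lives on the caller's stack, e.g.:
 *
 *	DEFINE_WB_COMPLETION(done, bdi);
 *	... queue works with work->done = &done ...
 *	wb_wait_for_completion(&done);
 */
void wb_wait_for_completion(struct wb_completion *done)
{
	atomic_dec(&done->cnt);		/* put down the initial count */
	wait_event(*done->waitq, !atomic_read(&done->cnt));
}

As soon as that wait_event() condition becomes true, the caller can
return and reuse the stack slot holding @done, even though the worker
that performed the final atomic_dec_and_test() may not yet have loaded
done->waitq for its wake_up_all().  That load then hits freed stack
memory.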

So the wake_up_all() could hit a freed wb_completion if the on-stack
structure is already gone (the previous wb->bdi was safe, since it is
not on-stack).

Compare the generic on-stack completion, which takes the wait_queue
spin_lock between the test and the wake_up; a sketch of that contrast:
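
Roughly, from memory of kernel/sched/completion.c (a sketch, not a
quote; details may differ):

void complete(struct completion *x)
{
	unsigned long flags;

	spin_lock_irqsave(&x->wait.lock, flags);
	if (x->done != UINT_MAX)
		x->done++;
	__wake_up_locked(&x->wait, TASK_NORMAL, 1);
	spin_unlock_irqrestore(&x->wait.lock, flags);
}

Because wait_for_completion() checks and consumes x->done under the
same x->wait.lock, the waiter cannot observe completion and free @x
while a waker is still inside that critical section.  wb_completion
has no lock covering the gap between the final atomic_dec_and_test()
and the load of done->waitq.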

If I am wrong, ignore me, hmm...

Thanks,
Gao Xiang

> 
> generic/270 w/ xfs-dax on pmem triggers it within 3 runs every time
> here. Bisect points to this commit:
> 
> [...]


* Re: [5.4-rc1, regression] wb_workfn wakeup oops (was Re: frequent 5.4-rc1 crash?)
From: Chris Mason @ 2019-10-03 14:01 UTC
  To: Gao Xiang
  Cc: Dave Chinner, Darrick J. Wong, xfs, tj, linux-fsdevel, linux-block



On 3 Oct 2019, at 4:41, Gao Xiang wrote:

> [...]
>
> So the wake_up_all() could hit a freed wb_completion if the on-stack
> structure is already gone...
>
> [...]
>
> If I am wrong, ignore me, hmm...

It's a good guess ;)  Jens should have this queued up already:

https://lkml.org/lkml/2019/9/23/972
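
The shape of that fix is to never touch *done after the final
decrement, by caching the waitq pointer first.  A sketch of the idea
(not the literal queued patch):

static void finish_writeback_work(struct bdi_writeback *wb,
				  struct wb_writeback_work *work)
{
	struct wb_completion *done = work->done;

	if (work->auto_free)
		kfree(work);
	if (done) {
		wait_queue_head_t *waitq = done->waitq;

		/* @done may be freed once the count hits zero */
		if (atomic_dec_and_test(&done->cnt))
			wake_up_all(waitq);
	}
}

The waitq itself (e.g. bdi->wb_waitq) outlives the waiter's stack
frame, so a late wake_up_all() on it is harmless.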

-chris


* Re: [5.4-rc1, regression] wb_workfn wakeup oops (was Re: frequent 5.4-rc1 crash?)
From: Jens Axboe @ 2019-10-03 14:05 UTC
  To: Chris Mason, Gao Xiang
  Cc: Dave Chinner, Darrick J. Wong, xfs, tj, linux-fsdevel, linux-block

On 10/3/19 8:01 AM, Chris Mason wrote:
> On 3 Oct 2019, at 4:41, Gao Xiang wrote:
> 
>> [...]
> 
> It's a good guess ;)  Jens should have this queued up already:
> 
> https://lkml.org/lkml/2019/9/23/972

Yes indeed, it'll go out today or tomorrow for -rc2.

-- 
Jens Axboe



* Re: [5.4-rc1, regression] wb_workfn wakeup oops (was Re: frequent 5.4-rc1 crash?)
From: Gao Xiang @ 2019-10-03 14:40 UTC
  To: Chris Mason
  Cc: Dave Chinner, Darrick J. Wong, xfs, tj, linux-fsdevel, linux-block

Hi Chris,

On Thu, Oct 03, 2019 at 02:01:13PM +0000, Chris Mason wrote:
> On 3 Oct 2019, at 4:41, Gao Xiang wrote:
> 
> > [...]
> 
> It's a good guess ;)  Jens should have this queued up already:
> 
> https://lkml.org/lkml/2019/9/23/972

Oh, I didn't notice that; it's great that it's already resolved. :)

It was not entirely a guess though; we once had a similar pattern
at a very early stage last year (an IO balance counter plus a
wait_queue, since a full completion is too heavy), which was
resolved in commit 848bd9acdcd0. So I have some experience with
this kind of race; the pattern is sketched below.
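
Roughly, the shape is an atomic in-flight counter paired with a
long-lived waitqueue, which is cheaper than a per-request completion
(illustrative names only, not the actual code from that commit):

struct io_tracker {
	atomic_t		inflight;
	wait_queue_head_t	*waitq;		/* must outlive all users */
};

static void io_done(struct io_tracker *t)
{
	wait_queue_head_t *waitq = t->waitq;

	/* never dereference @t after the final decrement */
	if (atomic_dec_and_test(&t->inflight))
		wake_up_all(waitq);
}

static void io_wait(struct io_tracker *t)
{
	wait_event(*t->waitq, !atomic_read(&t->inflight));
}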

I just read the mailing list regularly and hoped to be of some help
here... Sorry about the noise...

Thanks,
Gao Xiang

> 
> -chris


* Re: [5.4-rc1, regression] wb_workfn wakeup oops (was Re: frequent 5.4-rc1 crash?)
From: Darrick J. Wong @ 2019-10-03 18:37 UTC
  To: Jens Axboe
  Cc: Chris Mason, Gao Xiang, Dave Chinner, xfs, tj, linux-fsdevel,
	linux-block

On Thu, Oct 03, 2019 at 08:05:42AM -0600, Jens Axboe wrote:
> On 10/3/19 8:01 AM, Chris Mason wrote:
> > [...]
> > 
> > It's a good guess ;)  Jens should have this queued up already:
> > 
> > https://lkml.org/lkml/2019/9/23/972
> 
> Yes indeed, it'll go out today or tomorrow for -rc2.

The patch fixes the problems I've been seeing, so:
Tested-by: Darrick J. Wong <darrick.wong@oracle.com>

Thank you for taking care of this. :)

--D

> -- 
> Jens Axboe
> 


* Re: [5.4-rc1, regression] wb_workfn wakeup oops (was Re: frequent 5.4-rc1 crash?)
From: Darrick J. Wong @ 2019-10-06 22:30 UTC
  To: Jens Axboe
  Cc: Chris Mason, Gao Xiang, Dave Chinner, xfs, tj, linux-fsdevel,
	linux-block

On Thu, Oct 03, 2019 at 11:37:46AM -0700, Darrick J. Wong wrote:
> On Thu, Oct 03, 2019 at 08:05:42AM -0600, Jens Axboe wrote:
> > On 10/3/19 8:01 AM, Chris Mason wrote:
> > > [...]
> > > 
> > > It's a good guess ;)  Jens should have this queued up already:
> > > 
> > > https://lkml.org/lkml/2019/9/23/972
> > 
> > Yes indeed, it'll go out today or tomorrow for -rc2.
> 
> The patch fixes the problems I've been seeing, so:
> Tested-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Thank you for taking care of this. :)

Hmm, I don't see this patch in -rc2; did it not go out in time, or were
there further complications?

--D

> --D
> 
> > -- 
> > Jens Axboe
> > 


* Re: [5.4-rc1, regression] wb_workfn wakeup oops (was Re: frequent 5.4-rc1 crash?)
From: Jens Axboe @ 2019-10-08  3:20 UTC
  To: Darrick J. Wong
  Cc: Chris Mason, Gao Xiang, Dave Chinner, xfs, tj, linux-fsdevel,
	linux-block

On 10/6/19 4:30 PM, Darrick J. Wong wrote:
> On Thu, Oct 03, 2019 at 11:37:46AM -0700, Darrick J. Wong wrote:
>> On Thu, Oct 03, 2019 at 08:05:42AM -0600, Jens Axboe wrote:
>>> On 10/3/19 8:01 AM, Chris Mason wrote:
>>>> [...]
>>>>
>>>> It's a good guess ;)  Jens should have this queued up already:
>>>>
>>>> https://lkml.org/lkml/2019/9/23/972
>>>
>>> Yes indeed, it'll go out today or tomorrow for -rc2.
>>
>> The patch fixes the problems I've been seeing, so:
>> Tested-by: Darrick J. Wong <darrick.wong@oracle.com>
>>
>> Thank you for taking care of this. :)
> 
> Hmm, I don't see this patch in -rc2; did it not go out in time, or were
> there further complications?

Andrew had it queued up, apparently my memory was bad. It's in now.

-- 
Jens Axboe


