linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Gao Xiang <hsiangkao@aol.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
	xfs <linux-xfs@vger.kernel.org>,
	tj@kernel.org, linux-fsdevel@vger.kernel.org,
	linux-block@vger.kernel.org
Subject: Re: [5.4-rc1, regression] wb_workfn wakeup oops (was Re: frequent 5.4-rc1 crash?)
Date: Thu, 3 Oct 2019 16:41:56 +0800	[thread overview]
Message-ID: <20191003084149.GA16347@hsiangkao-HP-ZHAN-66-Pro-G1> (raw)
In-Reply-To: <20191003064022.GX16973@dread.disaster.area>

Hi,

On Thu, Oct 03, 2019 at 04:40:22PM +1000, Dave Chinner wrote:
> [cc linux-fsdevel, linux-block, tejun ]
> 
> On Wed, Oct 02, 2019 at 06:52:47PM -0700, Darrick J. Wong wrote:
> > Hi everyone,
> > 
> > Does anyone /else/ see this crash in generic/299 on a V4 filesystem (tho
> > afaict V5 configs crash too) and a 5.4-rc1 kernel?  It seems to pop up
> > on generic/299 though only 80% of the time.
> > 
> > --D
> > 
> > [ 1806.186197] run fstests generic/299 at 2019-10-02 18:15:30
> > [ 1808.279874] XFS (sdb): Mounting V4 Filesystem
> > [ 1808.283519] XFS (sdb): Ending clean mount
> > [ 1808.284530] XFS (sdb): Quotacheck needed: Please wait.
> > [ 1808.317062] XFS (sdb): Quotacheck: Done.
> > [ 1808.319821] Mounted xfs file system at /opt supports timestamps until 2038 (0x7fffffff)
> > [ 1886.218794] BUG: kernel NULL pointer dereference, address: 0000000000000018
> > [ 1886.219787] #PF: supervisor read access in kernel mode
> > [ 1886.220638] #PF: error_code(0x0000) - not-present page
> > [ 1886.221496] PGD 0 P4D 0 
> > [ 1886.221970] Oops: 0000 [#1] PREEMPT SMP
> > [ 1886.222596] CPU: 2 PID: 227320 Comm: kworker/u10:2 Tainted: G        W         5.4.0-rc1-djw #rc1
> > [ 1886.224016] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1 04/01/2014
> > [ 1886.225261] Workqueue: writeback wb_workfn (flush-8:16)
> > [ 1886.225926] RIP: 0010:__lock_acquire+0x4c3/0x1490
> > [ 1886.226595] Code: 00 00 00 48 8b 74 24 48 65 48 33 34 25 28 00 00 00 8b 44 24 04 0f 85 a1 0f 00 00 48 83 c4 50 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <48> 81 3f 20 66 6d 82 41 ba 00 00 00 00 45 0f 45 d0 83 fe 01 0f 87
> > [ 1886.229146] RSP: 0000:ffffc900052c3bc0 EFLAGS: 00010002
> > [ 1886.230008] RAX: 0000000000000000 RBX: 0000000000000018 RCX: 0000000000000000
> > [ 1886.231238] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000018
> > [ 1886.236382] RBP: ffff888077f80000 R08: 0000000000000001 R09: 0000000000000001
> > [ 1886.241630] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
> > [ 1886.243530] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000018
> > [ 1886.244669] FS:  0000000000000000(0000) GS:ffff88807e000000(0000) knlGS:0000000000000000
> > [ 1886.245941] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 1886.246913] CR2: 0000000000000018 CR3: 0000000072f7b003 CR4: 00000000001606a0
> > [ 1886.247834] Call Trace:
> > [ 1886.248217]  ? mark_held_locks+0x47/0x70
> > [ 1886.248810]  ? trace_hardirqs_on_thunk+0x1a/0x20
> > [ 1886.249445]  lock_acquire+0x90/0x180
> > [ 1886.249876]  ? __wake_up_common_lock+0x62/0xc0
> > [ 1886.250577]  _raw_spin_lock_irqsave+0x3e/0x80
> > [ 1886.251327]  ? __wake_up_common_lock+0x62/0xc0
> > [ 1886.252538]  __wake_up_common_lock+0x62/0xc0
> > [ 1886.257318]  wb_workfn+0x10e/0x610
> > [ 1886.260171]  ? __lock_acquire+0x268/0x1490
> > [ 1886.266124]  ? process_one_work+0x1da/0x5d0
> > [ 1886.266941]  process_one_work+0x25b/0x5d0
> > [ 1886.267759]  worker_thread+0x3d/0x3a0
> > [ 1886.268497]  ? process_one_work+0x5d0/0x5d0
> > [ 1886.269285]  kthread+0x121/0x140
> > [ 1886.269808]  ? kthread_park+0x80/0x80
> > [ 1886.270317]  ret_from_fork+0x3a/0x50
> > [ 1886.270811] Modules linked in: xfs libcrc32c dm_flakey ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 ip_set_hash_ip ip_set_hash_net xt_tcpudp bfq xt_set ip_set_hash_mac ip_set nfnetlink ip6table_filter ip6_tables iptable_filter sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: scsi_debug]
> > [ 1886.274144] Dumping ftrace buffer:
> > [ 1886.274637]    (ftrace buffer empty)
> > [ 1886.275129] CR2: 0000000000000018
> > [ 1886.275567] ---[ end trace 20db199015efe614 ]---
> > [ 1886.278601] RIP: 0010:__lock_acquire+0x4c3/0x1490
> > [ 1886.283408] Code: 00 00 00 48 8b 74 24 48 65 48 33 34 25 28 00 00 00 8b 44 24 04 0f 85 a1 0f 00 00 48 83 c4 50 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <48> 81 3f 20 66 6d 82 41 ba 00 00 00 00 45 0f 45 d0 83 fe 01 0f 87
> > 

Just a quick glance, I guess there could is a race between (complete guess):


 160 static void finish_writeback_work(struct bdi_writeback *wb,
 161                                   struct wb_writeback_work *work)
 162 {
 163         struct wb_completion *done = work->done;
 164 
 165         if (work->auto_free)
 166                 kfree(work);
 167         if (done && atomic_dec_and_test(&done->cnt))

 ^^^ here

 168                 wake_up_all(done->waitq);
 169 }

since new wake_up_all(done->waitq); is completely on-stack,
 	if (done && atomic_dec_and_test(&done->cnt))
-		wake_up_all(&wb->bdi->wb_waitq);
+		wake_up_all(done->waitq);
 }

which could cause use after free if on-stack wb_completion is gone...
(however previous wb->bdi is solid since it is not on-stack)

see generic on-stack completion which takes a wait_queue spin_lock between
test and wake_up...

If I am wrong, ignore me, hmm...

Thanks,
Gao Xiang

> 
> generic/270 w/ xfs-dax on pmem triggers it within 3 runs every time
> here. Bisect points to this commit:
> 
> $ git bisect bad
> 5b9cce4c7eb0696558dfd4946074ae1fb9d8f05d is the first bad commit
> commit 5b9cce4c7eb0696558dfd4946074ae1fb9d8f05d
> Author: Tejun Heo <tj@kernel.org>
> Date:   Mon Aug 26 09:06:52 2019 -0700
> 
>     writeback: Generalize and expose wb_completion
>     
>     wb_completion is used to track writeback completions.  We want to use
>     it from memcg side for foreign inode flushes.  This patch updates it
>     to remember the target waitq instead of assuming bdi->wb_waitq and
>     expose it outside of fs-writeback.c.
>     
>     Reviewed-by: Jan Kara <jack@suse.cz>
>     Signed-off-by: Tejun Heo <tj@kernel.org>
>     Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
>  fs/fs-writeback.c                | 47 ++++++++++++----------------------------
>  include/linux/backing-dev-defs.h | 20 +++++++++++++++++
>  include/linux/backing-dev.h      |  2 ++
>  3 files changed, 36 insertions(+), 33 deletions(-)
> 
> 
> $ git bisect log
> git bisect start
> # bad: [3a8e9ac89e6a5106cfb6b85d4c9cf9bfa3519bc7] writeback: add tracepoints for cgroup foreign writebacks
> git bisect bad 3a8e9ac89e6a5106cfb6b85d4c9cf9bfa3519bc7
> # good: [b8e24a9300b0836a9d39f6b20746766b3b81f1bd] block: annotate refault stalls from IO submission
> git bisect good b8e24a9300b0836a9d39f6b20746766b3b81f1bd
> # bad: [ed288dc0d4aa29f65bd25b31b5cb866aa5664ff9] writeback: Separate out wb_get_lookup() from wb_get_create()
> git bisect bad ed288dc0d4aa29f65bd25b31b5cb866aa5664ff9
> # good: [320ea869a12cec206756207c6ca5f817ec45c7f2] block: improve the gap check in __bio_add_pc_page
> git bisect good 320ea869a12cec206756207c6ca5f817ec45c7f2
> # good: [7ea88e229e9df18ecd624b0d39f3dba87432ba33] null_blk: create a helper for mem-backed ops
> git bisect good 7ea88e229e9df18ecd624b0d39f3dba87432ba33
> # good: [38b4e09fbccab6457536563823222921c49601bb] null_blk: fix inline misuse
> git bisect good 38b4e09fbccab6457536563823222921c49601bb
> # bad: [34f8fe501f0624de115d087680c84000b5d9abc9] bdi: Add bdi->id
> git bisect bad 34f8fe501f0624de115d087680c84000b5d9abc9
> # bad: [5b9cce4c7eb0696558dfd4946074ae1fb9d8f05d] writeback: Generalize and expose wb_completion
> git bisect bad 5b9cce4c7eb0696558dfd4946074ae1fb9d8f05d
> # first bad commit: [5b9cce4c7eb0696558dfd4946074ae1fb9d8f05d] writeback: Generalize and expose wb_completio
> $
> 
> Not obvious to me what is wrong with that commit right now, but the
> bisect is solid. Kinda surprised to see such significant
> fs-writeback changes in 5.4, though, because there was nothing sent
> to the -fsdevel list for review in the last dev cycle.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

  reply	other threads:[~2019-10-03  8:42 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20191003015247.GI13108@magnolia>
2019-10-03  6:40 ` [5.4-rc1, regression] wb_workfn wakeup oops (was Re: frequent 5.4-rc1 crash?) Dave Chinner
2019-10-03  8:41   ` Gao Xiang [this message]
2019-10-03 14:01     ` Chris Mason
2019-10-03 14:05       ` Jens Axboe
2019-10-03 18:37         ` Darrick J. Wong
2019-10-06 22:30           ` Darrick J. Wong
2019-10-08  3:20             ` Jens Axboe
2019-10-03 14:40       ` Gao Xiang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191003084149.GA16347@hsiangkao-HP-ZHAN-66-Pro-G1 \
    --to=hsiangkao@aol.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).