From: Boyang Xue <bxue@redhat.com>
To: Roman Gushchin <guro@fb.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>,
	Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org
Subject: Re: Patch 'writeback, cgroup: release dying cgwbs by switching attached inodes' leads to kernel crash
Date: Sat, 17 Jul 2021 20:00:42 +0800
Message-ID: <CAHLe9YZLrYJvuXBiZvu0BLVth0Cuxw4Ja1DKgyH0Q43-V62AsA@mail.gmail.com>
In-Reply-To: <YPHmLwF09QCPB7tw@carbon.dhcp.thefacebook.com>

fstests passed on aarch64, x86_64 and s390x. There's a shortage of
ppc64le systems, so I can't provide ppc64le test results for now, but I
hope to report them next week.

Thanks,
Boyang

On Sat, Jul 17, 2021 at 4:04 AM Roman Gushchin <guro@fb.com> wrote:
>
> On Fri, Jul 16, 2021 at 09:23:40AM -0700, Darrick J. Wong wrote:
> > On Thu, Jul 15, 2021 at 03:28:12PM -0700, Darrick J. Wong wrote:
> > > On Thu, Jul 15, 2021 at 01:08:15PM -0700, Roman Gushchin wrote:
> > > > On Thu, Jul 15, 2021 at 10:10:50AM -0700, Darrick J. Wong wrote:
> > > > > On Thu, Jul 15, 2021 at 11:51:50AM +0800, Boyang Xue wrote:
> > > > > > On Thu, Jul 15, 2021 at 10:36 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > > >
> > > > > > > On Thu, Jul 15, 2021 at 12:22:28AM +0800, Boyang Xue wrote:
> > > > > > > > It's unclear to me where to find the required address for the
> > > > > > > > addr2line command line, i.e.
> > > > > > > >
> > > > > > > > addr2line -e /usr/lib/debug/lib/modules/5.14.0-0.rc1.15.bx.el9.aarch64/vmlinux <what address here?>
> > > > > > >
> > > > > > > ./scripts/faddr2line /usr/lib/debug/lib/modules/5.14.0-0.rc1.15.bx.el9.aarch64/vmlinux cleanup_offline_cgwbs_workfn+0x320/0x394
> > > > > > >
> > > > > >
> > > > > > Thanks! The result is the same as the
> > > > > >
> > > > > > addr2line -i -e /usr/lib/debug/lib/modules/5.14.0-0.rc1.15.bx.el9.aarch64/vmlinux FFFF8000102D6DD0
> > > > > >
> > > > > > But this script is very handy.
> > > > > >
> > > > > > # /usr/src/kernels/5.14.0-0.rc1.15.bx.el9.aarch64/scripts/faddr2line /usr/lib/debug/lib/modules/5.14.0-0.rc1.15.bx.el9.aarch64/vmlinux cleanup_offline_cgwbs_workfn+0x320/0x394
> > > > > > cleanup_offline_cgwbs_workfn+0x320/0x394:
> > > > > > arch_atomic64_fetch_add_unless at /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/atomic-arch-fallback.h:2265
> > > > > > (inlined by) arch_atomic64_add_unless at /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/atomic-arch-fallback.h:2290
> > > > > > (inlined by) atomic64_add_unless at /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/asm-generic/atomic-instrumented.h:1149
> > > > > > (inlined by) atomic_long_add_unless at /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/asm-generic/atomic-long.h:491
> > > > > > (inlined by) percpu_ref_tryget_many at /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/percpu-refcount.h:247
> > > > > > (inlined by) percpu_ref_tryget at /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/percpu-refcount.h:266
> > > > > > (inlined by) wb_tryget at /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/backing-dev-defs.h:227
> > > > > > (inlined by) wb_tryget at /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/backing-dev-defs.h:224
> > > > > > (inlined by) cleanup_offline_cgwbs_workfn at /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/mm/backing-dev.c:679
> > > > > >
> > > > > > # vi /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/mm/backing-dev.c
> > > > > > ```
> > > > > > static void cleanup_offline_cgwbs_workfn(struct work_struct *work)
> > > > > > {
> > > > > >         struct bdi_writeback *wb;
> > > > > >         LIST_HEAD(processed);
> > > > > >
> > > > > >         spin_lock_irq(&cgwb_lock);
> > > > > >
> > > > > >         while (!list_empty(&offline_cgwbs)) {
> > > > > >                 wb = list_first_entry(&offline_cgwbs, struct bdi_writeback,
> > > > > >                                       offline_node);
> > > > > >                 list_move(&wb->offline_node, &processed);
> > > > > >
> > > > > >                 /*
> > > > > >                  * If wb is dirty, cleaning up the writeback by switching
> > > > > >                  * attached inodes will result in an effective removal of any
> > > > > >                  * bandwidth restrictions, which isn't the goal.  Instead,
> > > > > >                  * it can be postponed until the next time, when all io
> > > > > >                  * will be likely completed.  If in the meantime some inodes
> > > > > >                  * will get re-dirtied, they should be eventually switched to
> > > > > >                  * a new cgwb.
> > > > > >                  */
> > > > > >                 if (wb_has_dirty_io(wb))
> > > > > >                         continue;
> > > > > >
> > > > > >                 if (!wb_tryget(wb))  <=== line#679
> > > > > >                         continue;
> > > > > >
> > > > > >                 spin_unlock_irq(&cgwb_lock);
> > > > > >                 while (cleanup_offline_cgwb(wb))
> > > > > >                         cond_resched();
> > > > > >                 spin_lock_irq(&cgwb_lock);
> > > > > >
> > > > > >                 wb_put(wb);
> > > > > >         }
> > > > > >
> > > > > >         if (!list_empty(&processed))
> > > > > >                 list_splice_tail(&processed, &offline_cgwbs);
> > > > > >
> > > > > >         spin_unlock_irq(&cgwb_lock);
> > > > > > }
> > > > > > ```
> > > > > >
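The loop above follows a common pattern: each entry is moved to a local `processed` list while cgwb_lock is held, entries that are dirty or whose reference can't be taken are skipped, the lock is dropped only around the slow per-entry work, and everything visited is spliced back at the end so it can be retried on a later run. A simplified userspace sketch of that pattern, assuming a pthread mutex in place of the spinlock, a singly linked list in place of list_head, and boolean flags standing in for wb_has_dirty_io() and wb_tryget(); all names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bdi_writeback. */
struct item {
	struct item *next;
	bool dirty;	/* models wb_has_dirty_io() */
	bool alive;	/* models whether wb_tryget() would succeed */
	int cleaned;	/* times the slow cleanup ran on this item */
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct item *pending;	/* models the global offline_cgwbs list */

/* Slow work that must not run under list_lock (models cleanup_offline_cgwb()). */
static void slow_cleanup(struct item *it)
{
	it->cleaned++;
}

static void drain_pending(void)
{
	struct item *processed = NULL, **tail = &processed;

	pthread_mutex_lock(&list_lock);
	while (pending) {
		struct item *it = pending;

		/* Move the head to the local list first so the loop always progresses. */
		pending = it->next;
		it->next = NULL;
		*tail = it;
		tail = &it->next;

		if (it->dirty || !it->alive)
			continue;

		/*
		 * The kernel holds a wb reference here to keep the entry valid
		 * while unlocked; this sketch has no refcount and relies on the
		 * caller keeping items alive.
		 */
		pthread_mutex_unlock(&list_lock);
		slow_cleanup(it);
		pthread_mutex_lock(&list_lock);
	}
	/* The loop exits only once pending is empty, so this is the splice back. */
	pending = processed;
	pthread_mutex_unlock(&list_lock);
}
```

Moving each entry off the shared list before possibly skipping it is what keeps the loop from spinning on the same dirty entry; the final splice preserves the original order.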
> > > > > > BTW, this bug can only be reproduced on a non-debug production kernel
> > > > > > build (a.k.a. the kernel rpm package); it's not reproducible on a debug
> > > > > > build with various debug options enabled (a.k.a. the kernel-debug rpm
> > > > > > package).
> > > > >
> > > > > FWIW I've also seen this regularly on x86_64 kernels on ext4 with all
> > > > > default mkfs settings when running generic/256.
> > > >
> > > > Oh, that's useful information, thank you!
> > > >
> > > > Btw, would you mind giving the patch from an earlier message in the thread
> > > > a test? I'd highly appreciate it.
> > > >
> > > > Thanks!
> > >
> > > Will do.
> >
> > fstests passed here, so
> >
> > Tested-by: Darrick J. Wong <djwong@kernel.org>
>
> Great, thank you!
>


Thread overview: 21+ messages
2021-07-14  3:21 Patch 'writeback, cgroup: release dying cgwbs by switching attached inodes' leads to kernel crash Boyang Xue
2021-07-14  3:57 ` Boyang Xue
2021-07-14  4:11 ` Roman Gushchin
2021-07-14  8:44   ` Boyang Xue
2021-07-14  9:26     ` Jan Kara
2021-07-14 16:22       ` Boyang Xue
2021-07-14 23:46         ` Roman Gushchin
2021-07-15  1:42           ` Boyang Xue
2021-07-15  9:31             ` Jan Kara
2021-07-15 16:04               ` Roman Gushchin
2021-07-16  1:37                 ` Boyang Xue
2021-07-15  2:35         ` Matthew Wilcox
2021-07-15  3:51           ` Boyang Xue
2021-07-15 17:10             ` Darrick J. Wong
2021-07-15 20:08               ` Roman Gushchin
2021-07-15 22:28                 ` Darrick J. Wong
2021-07-16 16:23                   ` Darrick J. Wong
2021-07-16 20:03                     ` Roman Gushchin
2021-07-17 12:00                       ` Boyang Xue [this message]
2021-07-22  5:29                         ` Boyang Xue
2021-07-22  5:41                           ` Roman Gushchin
