From: Boyang Xue <bxue@redhat.com>
To: Roman Gushchin <guro@fb.com>
Cc: Jan Kara <jack@suse.cz>, linux-fsdevel@vger.kernel.org
Subject: Re: Patch 'writeback, cgroup: release dying cgwbs by switching attached inodes' leads to kernel crash
Date: Thu, 15 Jul 2021 09:42:06 +0800
Message-ID: <CAHLe9YaNtmJ8xx=A+6Ki+Fc2Kx=5jL745NJ8PL+w95-WhJrG3g@mail.gmail.com>
In-Reply-To: <YO93VTcLDNisdHRf@carbon.dhcp.thefacebook.com>

On Thu, Jul 15, 2021 at 7:46 AM Roman Gushchin <guro@fb.com> wrote:
>
> On Thu, Jul 15, 2021 at 12:22:28AM +0800, Boyang Xue wrote:
> > Hi Jan,
> >
> > On Wed, Jul 14, 2021 at 5:26 PM Jan Kara <jack@suse.cz> wrote:
> > >
> > > On Wed 14-07-21 16:44:33, Boyang Xue wrote:
> > > > Hi Roman,
> > > >
> > > > On Wed, Jul 14, 2021 at 12:12 PM Roman Gushchin <guro@fb.com> wrote:
> > > > >
> > > > > On Wed, Jul 14, 2021 at 11:21:12AM +0800, Boyang Xue wrote:
> > > > > > Hello,
> > > > > >
> > > > > > I'm not sure if this is the right place to report this bug, please
> > > > > > correct me if I'm wrong.
> > > > > >
> > > > > > > I found that kernel-5.14.0-rc1 (built from the Linus tree) crashes when
> > > > > > > running xfstests generic/256 on ext4 [1]. Looking at the call trace,
> > > > > > > it looks like the bug was introduced by the commit
> > > > > >
> > > > > > c22d70a162d3 writeback, cgroup: release dying cgwbs by switching attached inodes
> > > > > >
> > > > > > It only happens on aarch64, not on x86_64, ppc64le and s390x. Testing
> > > > > > was performed with the latest xfstests, and the bug can be reproduced
> > > > > > on ext{2, 3, 4} with {1k, 2k, 4k} block sizes.
> > > > >
> > > > > Hello Boyang,
> > > > >
> > > > > thank you for the report!
> > > > >
> > > > > Do you know on which line the oops happens?
> > > >
> > > > I tried to inspect the vmcore with the crash utility, but
> > > > unfortunately it doesn't work.
> > >
> > > Thanks for the report! Have you tried the addr2line utility? Looking at the
> > > oops I can see:
> >
> > Thanks for the tips!
> >
> > It's unclear to me where to find the address that the addr2line
> > command line requires, i.e.
> >
> > addr2line -e /usr/lib/debug/lib/modules/5.14.0-0.rc1.15.bx.el9.aarch64/vmlinux
> > <what address here?>
>
> You can use $nm <vmlinux> to get the address of cleanup_offline_cgwbs_workfn()
> and then add 0x320.

Thanks! Hope the following helps:

# grep cleanup_offline_cgwbs_workfn /boot/System.map-5.14.0-0.rc1.15.bx.el9.aarch64
ffff8000102d6ab0 t cleanup_offline_cgwbs_workfn

## ffff8000102d6ab0+0x320=FFFF8000102D6DD0
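
For anyone repeating this step, the lookup and the addition can be scripted. This is only a minimal sketch: it assumes the usual three-column System.map layout (`address type name`), and the helper name `symbol_plus_offset` is made up for illustration.

```python
def symbol_plus_offset(system_map_text, symbol, offset):
    """Return the absolute address of symbol+offset as a lowercase hex string.

    Hypothetical helper; assumes System.map lines of the form
    "<hex address> <type> <name>".
    """
    for line in system_map_text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[2] == symbol:
            return format(int(fields[0], 16) + offset, "x")
    raise KeyError(symbol)


# The System.map line quoted above, plus the 0x320 offset from the oops.
sample = "ffff8000102d6ab0 t cleanup_offline_cgwbs_workfn\n"
print(symbol_plus_offset(sample, "cleanup_offline_cgwbs_workfn", 0x320))
# prints ffff8000102d6dd0
```

In practice the text would come from reading the System.map file (or `nm vmlinux` output, which uses the same layout), and the result is what gets passed to addr2line.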

# addr2line -e /usr/lib/debug/lib/modules/5.14.0-0.rc1.15.bx.el9.aarch64/vmlinux FFFF8000102D6DD0
/usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/atomic-arch-fallback.h:2265
# vi /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/atomic-arch-fallback.h
```
arch_atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
{
        s64 c = arch_atomic64_read(v); <=== line#2265

        do {
                if (unlikely(c == u))
                        break;
        } while (!arch_atomic64_try_cmpxchg(v, &c, c + a));

        return c;
}
```

# addr2line -i -e /usr/lib/debug/lib/modules/5.14.0-0.rc1.15.bx.el9.aarch64/vmlinux FFFF8000102D6DD0
/usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/atomic-arch-fallback.h:2265
/usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/atomic-arch-fallback.h:2290
/usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/asm-generic/atomic-instrumented.h:1149
/usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/asm-generic/atomic-long.h:491
/usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/percpu-refcount.h:247
/usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/percpu-refcount.h:266
/usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/backing-dev-defs.h:227
/usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/./include/linux/backing-dev-defs.h:224
/usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/mm/backing-dev.c:679
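
With -i, addr2line prints the innermost inlined frame first and the enclosing non-inlined function last, so the list above reads bottom-up as a call chain. Here is a rough sketch for turning that output into an indented chain. The helper name `inline_chain` is hypothetical, the sample paths are shortened from the output above, and it assumes every line resolved to `path:line`.

```python
import os


def inline_chain(addr2line_output):
    """Return (file basename, line number) pairs, outermost caller first.

    Hypothetical helper; assumes each line is "path:line", optionally
    followed by a " (discriminator N)" suffix.
    """
    frames = []
    for line in addr2line_output.strip().splitlines():
        path, _, rest = line.strip().rpartition(":")
        lineno = int(rest.split()[0])  # drop any trailing discriminator note
        frames.append((os.path.basename(path), lineno))
    return frames[::-1]  # addr2line -i lists the innermost frame first


# Shortened sample: four of the nine frames shown above.
sample = """\
./include/linux/atomic-arch-fallback.h:2265
./include/linux/percpu-refcount.h:266
./include/linux/backing-dev-defs.h:227
mm/backing-dev.c:679
"""

for depth, (name, lineno) in enumerate(inline_chain(sample)):
    print("  " * depth + f"{name}:{lineno}")
```

The first line printed is then the real call site (mm/backing-dev.c:679) and the most deeply indented one is where the faulting instruction lives.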
# vi /usr/src/debug/kernel-5.14.0-0.rc1.15.bx/linux-5.14.0-0.rc1.15.bx.el9.aarch64/mm/backing-dev.c
```
static void cleanup_offline_cgwbs_workfn(struct work_struct *work)
{
        struct bdi_writeback *wb;
        LIST_HEAD(processed);

        spin_lock_irq(&cgwb_lock);

        while (!list_empty(&offline_cgwbs)) {
                wb = list_first_entry(&offline_cgwbs, struct bdi_writeback,
                                      offline_node);
                list_move(&wb->offline_node, &processed);

                /*
                 * If wb is dirty, cleaning up the writeback by switching
                 * attached inodes will result in an effective removal of any
                 * bandwidth restrictions, which isn't the goal.  Instead,
                 * it can be postponed until the next time, when all io
                 * will be likely completed.  If in the meantime some inodes
                 * will get re-dirtied, they should be eventually switched to
                 * a new cgwb.
                 */
                if (wb_has_dirty_io(wb))
                        continue;

                if (!wb_tryget(wb))  <=== line#679
                        continue;

                spin_unlock_irq(&cgwb_lock);
                while (cleanup_offline_cgwb(wb))
                        cond_resched();
                spin_lock_irq(&cgwb_lock);

                wb_put(wb);
        }

        if (!list_empty(&processed))
                list_splice_tail(&processed, &offline_cgwbs);

        spin_unlock_irq(&cgwb_lock);
}
```

>
> Alternatively, maybe you can put the image you're using somewhere?

I've uploaded the rpms to Google Drive:
https://drive.google.com/drive/folders/1aw-WK2yWD11UWB059bJt6WKNW1OP_fex?usp=sharing

>
> I'm working on setting up my arm64 machine to reproduce the problem, but it takes
> time, and I'm not sure I'll be able to reproduce it in qemu running on top of x86.

Thanks! In my tests it's only reproducible on aarch64 and ppc64le. I'm
happy to help test a patch if that would be useful.

>
> Thanks!
>

