linux-kernel.vger.kernel.org archive mirror
* BUG: soft lockup in __kmalloc_node() with KFENCE enabled
@ 2021-10-10 13:53 Andrea Righi
  2021-10-11  6:00 ` Marco Elver
  0 siblings, 1 reply; 15+ messages in thread
From: Andrea Righi @ 2021-10-10 13:53 UTC (permalink / raw)
  To: Alexander Potapenko, Marco Elver, Dmitry Vyukov
  Cc: kasan-dev, linux-mm, linux-kernel

I can systematically reproduce the following soft lockup w/ the latest
5.15-rc4 kernel (and all the 5.14, 5.13 and 5.12 kernels that I've
tested so far).

I've found this issue by running systemd autopkgtest (I'm using the
latest systemd in Ubuntu - 248.3-1ubuntu7 - but it should happen with
any recent version of systemd).

I'm running this test inside a local KVM instance and apparently systemd
is starting up its own KVM instances to run its tests, so the context is
a nested KVM scenario (even if I don't think the nested KVM part really
matters).

Here's the oops:

[   36.466565] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [udevadm:333]
[   36.466565] Modules linked in: btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear psmouse floppy
[   36.466565] CPU: 0 PID: 333 Comm: udevadm Not tainted 5.15-rc4
[   36.466565] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[   36.466565] RIP: 0010:__kmalloc_node+0xcf/0x370
[   36.466565] Code: 00 85 c0 74 20 49 8b 45 00 a8 03 0f 85 8e 02 00 00 65 48 ff 08 e8 01 7a e3 ff 45 31 ff e9 a3 00 00 00 45 31 ed 4d 85 f6 74 f0 <0f> 1f 44 00 00 48 c7 45 c8 00 00 00 00 4d 8b 06 65 4c 03 05 19 87
[   36.466565] RSP: 0018:ffffb512802c3aa8 EFLAGS: 00000286
[   36.466565] RAX: 0000000000000000 RBX: 0000000000000dc0 RCX: 0000000000000000
[   36.466565] RDX: 0000000000000000 RSI: 0000000000000dc0 RDI: ffff998601042700
[   36.466565] RBP: ffffb512802c3af0 R08: ffffeb90800b5c00 R09: 0000000000000000
[   36.466565] R10: 0000000000000293 R11: ffff99861e02f448 R12: 0000000000000dc0
[   36.466565] R13: 0000000000000000 R14: ffff998601042700 R15: ffffffff8b519c4d
[   36.466565] FS:  00007f08d0674d00(0000) GS:ffff99861e000000(0000) knlGS:0000000000000000
[   36.466565] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   36.466565] CR2: 00005622f86e3108 CR3: 00000000033c8000 CR4: 00000000000006f0
[   36.466565] Call Trace:
[   36.466565]  memcg_alloc_page_obj_cgroups+0x3d/0x90
[   36.466565]  allocate_slab+0xc1/0x440
[   36.466565]  ? get_any_partial+0x85/0x190
[   36.466565]  new_slab+0x3a/0x60
[   36.466565]  ___slab_alloc+0x562/0x6b0
[   36.466565]  ? __alloc_file+0x28/0xe0
[   36.466565]  ? __alloc_file+0x28/0xe0
[   36.466565]  ? dput+0x84/0x180
[   36.466565]  ? step_into+0x312/0x380
[   36.466565]  kmem_cache_alloc+0x299/0x2d0
[   36.466565]  ? __alloc_file+0x28/0xe0
[   36.466565]  __alloc_file+0x28/0xe0
[   36.466565]  alloc_empty_file+0x45/0xc0
[   36.466565]  path_openat+0x47/0x2b0
[   36.466565]  do_filp_open+0xb2/0x150
[   36.466565]  ? putname+0x55/0x60
[   36.466565]  ? __check_object_size.part.0+0x128/0x150
[   36.466565]  ? __check_object_size+0x19/0x20
[   36.466565]  ? alloc_fd+0x53/0x170
[   36.466565]  do_sys_openat2+0x9b/0x160
[   36.466565]  __x64_sys_openat+0x55/0x90
[   36.466565]  do_syscall_64+0x5c/0xc0
[   36.466565]  ? exit_to_user_mode_prepare+0x37/0xb0
[   36.466565]  ? syscall_exit_to_user_mode+0x27/0x50
[   36.466565]  ? __x64_sys_newfstatat+0x1c/0x20
[   36.466565]  ? do_syscall_64+0x69/0xc0
[   36.466565]  ? do_syscall_64+0x69/0xc0
[   36.466565]  ? do_syscall_64+0x69/0xc0
[   36.466565]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   36.466565] RIP: 0033:0x7f08d0c0466b
[   36.466565] Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 4c 24 28 64 48 2b 0c 25
[   36.466565] RSP: 002b:00007ffe10be08f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[   36.466565] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f08d0c0466b
[   36.466565] RDX: 0000000000080000 RSI: 00007ffe10be0a70 RDI: 00000000ffffff9c
[   36.466565] RBP: 00007ffe10be0a70 R08: fefefefefefefeff R09: fefefeff716e6c71
[   36.466565] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000080000
[   36.466565] R13: 00005622f86e2520 R14: 0000000000000003 R15: 00007ffe10be0990

It looks like this is pointing at asm_volatile_goto():

crash> gdb list *(__kmalloc_node+0xcf)
0xffffffff8b4f8e3f is in __kmalloc_node (/build/impish/arch/x86/include/asm/jump_label.h:27).
22      
23      #ifdef CONFIG_STACK_VALIDATION
24      
25      static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
26      {
27              asm_volatile_goto("1:"
28                      "jmp %l[l_yes] # objtool NOPs this \n\t"
29                      JUMP_TABLE_ENTRY
30                      : :  "i" (key), "i" (2 | branch) : : l_yes);
31      
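
For context, this is the static-branch (jump label) pattern that KFENCE's allocation gate relies on when CONFIG_KFENCE_STATIC_KEYS=y. A minimal sketch of the pattern using the kernel-internal API follows; it only builds in-tree, and the names below are illustrative, not KFENCE's actual symbols:

```c
#include <linux/jump_label.h>	/* kernel-internal header; not standalone-compilable */

/* Key is disabled by default: the branch body is skipped via a NOP. */
static DEFINE_STATIC_KEY_FALSE(sample_gate);

static void *maybe_sample_alloc(size_t size)
{
	/*
	 * static_branch_unlikely() compiles to a NOP while the key is
	 * disabled. Enabling the key live-patches that NOP into a jump,
	 * which goes through text_poke and a transient int3 breakpoint.
	 */
	if (static_branch_unlikely(&sample_gate))
		return sample_alloc(size);	/* hypothetical helper */
	return NULL;
}

/* Elsewhere, e.g. from a timer: static_branch_enable(&sample_gate); */
```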

And it seems that we're hitting an int3:

crash> bt 1
PID: 1      TASK: ffff998601248000  CPU: 3   COMMAND: "systemd"
    [exception RIP: poke_int3_handler+89]
    RIP: ffffffff8be6ac89  RSP: ffffb51280013b80  RFLAGS: 00000046
    RAX: 000000008c0010a7  RBX: 0000000000000000  RCX: ffffb51280013ba8
    RDX: 0000000000000000  RSI: ffffffff8c000a99  RDI: ffffb51280013ba8
    RBP: ffffb51280013b98   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: ffffb51280013ba8
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    CS: 0010  SS: 0018
 #0 [ffffb51280013b80] exc_int3 at ffffffff8be69e80
 #1 [ffffb51280013ba0] asm_exc_int3 at ffffffff8c000aa1
 #2 [ffffb51280013c28] __kmalloc_node at ffffffff8b4f8e40
 #3 [ffffb51280013c80] __kmalloc_node at ffffffff8b4f8e16
 #4 [ffffb51280013cd8] kvmalloc_node at ffffffff8b4885c8
 #5 [ffffb51280013d00] seq_read_iter at ffffffff8b56139c
 #6 [ffffb51280013d60] seq_read at ffffffff8b561585
 #7 [ffffb51280013de8] vfs_read at ffffffff8b531b0f
 #8 [ffffb51280013e28] ksys_read at ffffffff8b5321d7
 #9 [ffffb51280013e68] __x64_sys_read at ffffffff8b532269
#10 [ffffb51280013e78] do_syscall_64 at ffffffff8be6900c
#11 [ffffb51280013f50] entry_SYSCALL_64_after_hwframe at ffffffff8c00007c
    RIP: 00007fc076f2e912  RSP: 00007ffde04ba888  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 000055e3b310bc30  RCX: 00007fc076f2e912
    RDX: 0000000000000400  RSI: 000055e3b31415e0  RDI: 0000000000000029
    RBP: 00007fc077031560   R8: 0000000000000029   R9: 0000000000000050
    R10: 0000000000001000  R11: 0000000000000246  R12: 00007fc07679a6c0
    R13: 0000000000000d68  R14: 00007fc077030960  R15: 0000000000000d68
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b

If I disable CONFIG_KFENCE the soft lockup doesn't happen and the
systemd autopkgtest completes just fine.

We've decided to disable KFENCE in the latest Ubuntu Impish kernel
(5.13) for now, because of this issue, but I'm still investigating to
better understand the problem.
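
(For completeness: on kernels built with CONFIG_KFENCE=y, KFENCE can also be switched off at boot without a rebuild, via its documented sample-interval parameter, e.g.:)

```
# kernel command line: a sample interval of 0 disables KFENCE
kfence.sample_interval=0
```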

Any hint / suggestion?

Thanks,
-Andrea

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: BUG: soft lockup in __kmalloc_node() with KFENCE enabled
  2021-10-10 13:53 BUG: soft lockup in __kmalloc_node() with KFENCE enabled Andrea Righi
@ 2021-10-11  6:00 ` Marco Elver
  2021-10-11  6:32   ` Andrea Righi
  0 siblings, 1 reply; 15+ messages in thread
From: Marco Elver @ 2021-10-11  6:00 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Alexander Potapenko, Dmitry Vyukov, kasan-dev, linux-mm, linux-kernel

On Sun, 10 Oct 2021 at 15:53, Andrea Righi <andrea.righi@canonical.com> wrote:
> [...]
>
> Any hint / suggestion?

Can you confirm this is not a QEMU TCG instance? There's been a known
issue with it: https://bugs.launchpad.net/qemu/+bug/1920934

One thing I've been wondering is if we can make
CONFIG_KFENCE_STATIC_KEYS=n the default, because the static keys
approach is becoming more trouble than it's worth. Doing so requires us
to re-benchmark the defaults. If you're thinking of turning KFENCE on
by default (i.e. CONFIG_KFENCE_SAMPLE_INTERVAL non-zero), you could
make this decision for Ubuntu with whatever sample interval you choose.
We've found that for large deployments a sample interval of 500ms or
above is more than adequate.
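
Concretely, such a distro config could look like this (an illustrative fragment, not a recommendation beyond the 500ms figure; the interval is in milliseconds):

```
CONFIG_KFENCE=y
# avoid the static-keys allocation fast path
CONFIG_KFENCE_STATIC_KEYS=n
# default sample interval in ms (0 would leave KFENCE off by default)
CONFIG_KFENCE_SAMPLE_INTERVAL=500
```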

Thanks,
-- Marco


* Re: BUG: soft lockup in __kmalloc_node() with KFENCE enabled
  2021-10-11  6:00 ` Marco Elver
@ 2021-10-11  6:32   ` Andrea Righi
  2021-10-11  6:48     ` Marco Elver
  0 siblings, 1 reply; 15+ messages in thread
From: Andrea Righi @ 2021-10-11  6:32 UTC (permalink / raw)
  To: Marco Elver
  Cc: Alexander Potapenko, Dmitry Vyukov, kasan-dev, linux-mm, linux-kernel

On Mon, Oct 11, 2021 at 08:00:00AM +0200, Marco Elver wrote:
> On Sun, 10 Oct 2021 at 15:53, Andrea Righi <andrea.righi@canonical.com> wrote:
> > [...]
> 
> Can you confirm this is not a QEMU TCG instance? There's been a known
> issue with it: https://bugs.launchpad.net/qemu/+bug/1920934

It looks like systemd is running qemu-system-x86 without any "accel"
options, so IIUC the instance shouldn't use TCG. Is this a correct
assumption or is there a better way to check?
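
A quick host-side heuristic (a sketch, assuming a Linux host): check whether /dev/kvm exists as a character device. Note that even when it does, QEMU typically still defaults to TCG unless -enable-kvm (or -accel kvm) is passed explicitly.

```shell
#!/bin/sh
# Heuristic: /dev/kvm present as a char device means KVM acceleration
# is available on this host; otherwise QEMU can only use TCG.
detect_accel() {
    if [ -c /dev/kvm ]; then
        echo kvm
    else
        echo tcg
    fi
}

detect_accel
```

For a running guest, `info kvm` at the QEMU monitor prompt reports whether KVM is actually in use.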

> 
> One thing I've been wondering is if we can make
> CONFIG_KFENCE_STATIC_KEYS=n the default, because the static keys
> approach is becoming more trouble than it's worth. Doing so requires us
> to re-benchmark the defaults. If you're thinking of turning KFENCE on
> by default (i.e. CONFIG_KFENCE_SAMPLE_INTERVAL non-zero), you could
> make this decision for Ubuntu with whatever sample interval you choose.
> We've found that for large deployments a sample interval of 500ms or
> above is more than adequate.

Another thing that I forgot to mention is that with
CONFIG_KFENCE_STATIC_KEYS=n the soft lockup doesn't seem to happen.

Thanks,
-Andrea


* Re: BUG: soft lockup in __kmalloc_node() with KFENCE enabled
  2021-10-11  6:32   ` Andrea Righi
@ 2021-10-11  6:48     ` Marco Elver
  2021-10-11  7:10       ` Andrea Righi
  0 siblings, 1 reply; 15+ messages in thread
From: Marco Elver @ 2021-10-11  6:48 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Alexander Potapenko, Dmitry Vyukov, kasan-dev, linux-mm, linux-kernel

On Mon, 11 Oct 2021 at 08:32, Andrea Righi <andrea.righi@canonical.com> wrote:
> On Mon, Oct 11, 2021 at 08:00:00AM +0200, Marco Elver wrote:
> > On Sun, 10 Oct 2021 at 15:53, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > [...]
> >
> > Can you confirm this is not a QEMU TCG instance? There's been a known
> > issue with it: https://bugs.launchpad.net/qemu/+bug/1920934
>
> It looks like systemd is running qemu-system-x86 without any "accel"
> options, so IIUC the instance shouldn't use TCG. Is this a correct
> assumption or is there a better way to check?

AFAIK, the default is TCG if nothing else is requested. What was the
command line?

> > One thing I've been wondering is if we can make
> > CONFIG_KFENCE_STATIC_KEYS=n the default, because the static keys
> > approach is becoming more trouble than it's worth. Doing so requires us
> > to re-benchmark the defaults. If you're thinking of turning KFENCE on
> > by default (i.e. CONFIG_KFENCE_SAMPLE_INTERVAL non-zero), you could
> > make this decision for Ubuntu with whatever sample interval you choose.
> > We've found that for large deployments a sample interval of 500ms or
> > above is more than adequate.
>
> Another thing that I forgot to mention is that with
> CONFIG_KFENCE_STATIC_KEYS=n the soft lockup doesn't seem to happen.

Thanks for confirming.

Thanks,
-- Marco


* Re: BUG: soft lockup in __kmalloc_node() with KFENCE enabled
  2021-10-11  6:48     ` Marco Elver
@ 2021-10-11  7:10       ` Andrea Righi
  2021-10-11  7:19         ` Dmitry Vyukov
  0 siblings, 1 reply; 15+ messages in thread
From: Andrea Righi @ 2021-10-11  7:10 UTC (permalink / raw)
  To: Marco Elver
  Cc: Alexander Potapenko, Dmitry Vyukov, kasan-dev, linux-mm, linux-kernel

On Mon, Oct 11, 2021 at 08:48:29AM +0200, Marco Elver wrote:
> On Mon, 11 Oct 2021 at 08:32, Andrea Righi <andrea.righi@canonical.com> wrote:
> > On Mon, Oct 11, 2021 at 08:00:00AM +0200, Marco Elver wrote:
> > > On Sun, 10 Oct 2021 at 15:53, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > [...]
> > >
> > > Can you confirm this is not a QEMU TCG instance? There's been a known
> > > issue with it: https://bugs.launchpad.net/qemu/+bug/1920934
> >
> > It looks like systemd is running qemu-system-x86 without any "accel"
> > options, so IIUC the instance shouldn't use TCG. Is this a correct
> > assumption or is there a better way to check?
> 
> AFAIK, the default is TCG if nothing else is requested. What was the
> command line?

This is the full command line of what systemd is running:

  /bin/qemu-system-x86_64 -smp 4 -net none -m 512M -nographic -vga none -kernel /boot/vmlinuz-5.15-rc4 -drive format=raw,cache=unsafe,file=/var/tmp/systemd-test.sI1nrh/badid.img -initrd /boot/initrd.img-5.15-rc4 -append  root=/dev/sda1 rw raid=noautodetect rd.luks=0 loglevel=2 init=/lib/systemd/systemd console=ttyS0 selinux=0  SYSTEMD_UNIT_PATH=/usr/lib/systemd/tests/testdata/testsuite-14.units:/usr/lib/systemd/tests/testdata/units: systemd.unit=testsuite.target systemd.wants=testsuite-14.service systemd.wants=end.service

And this is running inside a KVM instance (so a nested KVM scenario).

-Andrea


* Re: BUG: soft lockup in __kmalloc_node() with KFENCE enabled
  2021-10-11  7:10       ` Andrea Righi
@ 2021-10-11  7:19         ` Dmitry Vyukov
  2021-10-11  9:23           ` Andrea Righi
  0 siblings, 1 reply; 15+ messages in thread
From: Dmitry Vyukov @ 2021-10-11  7:19 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Marco Elver, Alexander Potapenko, kasan-dev, linux-mm, linux-kernel

On Mon, 11 Oct 2021 at 09:10, Andrea Righi <andrea.righi@canonical.com> wrote:
>
> On Mon, Oct 11, 2021 at 08:48:29AM +0200, Marco Elver wrote:
> > On Mon, 11 Oct 2021 at 08:32, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > On Mon, Oct 11, 2021 at 08:00:00AM +0200, Marco Elver wrote:
> > > > On Sun, 10 Oct 2021 at 15:53, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > [...]
> > > >
> > > > Can you confirm this is not a QEMU TCG instance? There's been a known
> > > > issue with it: https://bugs.launchpad.net/qemu/+bug/1920934
> > >
> > > It looks like systemd is running qemu-system-x86 without any "accel"
> > > options, so IIUC the instance shouldn't use TCG. Is this a correct
> > > assumption or is there a better way to check?
> >
> > AFAIK, the default is TCG if nothing else is requested. What was the
> > command line?
>
> This is the full command line of what systemd is running:
>
>   /bin/qemu-system-x86_64 -smp 4 -net none -m 512M -nographic -vga none -kernel /boot/vmlinuz-5.15-rc4 -drive format=raw,cache=unsafe,file=/var/tmp/systemd-test.sI1nrh/badid.img -initrd /boot/initrd.img-5.15-rc4 -append  root=/dev/sda1 rw raid=noautodetect rd.luks=0 loglevel=2 init=/lib/systemd/systemd console=ttyS0 selinux=0  SYSTEMD_UNIT_PATH=/usr/lib/systemd/tests/testdata/testsuite-14.units:/usr/lib/systemd/tests/testdata/units: systemd.unit=testsuite.target systemd.wants=testsuite-14.service systemd.wants=end.service
>
> And this is running inside a KVM instance (so a nested KVM scenario).

Hi Andrea,

I think you need to pass -enable-kvm to make it a "nested KVM
scenario", otherwise it's TCG emulation.
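
Applied to the command line quoted above, that would be something like (a sketch; only the added flag changes, everything else stays as before):

```
/bin/qemu-system-x86_64 -enable-kvm -smp 4 -net none -m 512M -nographic ... (remaining options unchanged)
```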

You seem to use the default 20s stall timeout. FWIW syzbot uses 160
secs timeout for TCG emulation to avoid false positive warnings:
https://github.com/google/syzkaller/blob/838e7e2cd9228583ca33c49a39aea4d863d3e36d/dashboard/config/linux/upstream-arm64-kasan.config#L509
There are a number of other timeouts raised as well, some as high as
420 seconds.
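
The soft lockup timeout itself derives from watchdog_thresh: the warning fires at roughly twice its value, so the default of 10s yields the 20s seen in the report. It can be raised on the kernel command line (or via the kernel.watchdog_thresh sysctl), e.g.:

```
# 80s threshold -> soft lockup reported after ~160s
watchdog_thresh=80
```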


* Re: BUG: soft lockup in __kmalloc_node() with KFENCE enabled
  2021-10-11  7:19         ` Dmitry Vyukov
@ 2021-10-11  9:23           ` Andrea Righi
  2021-10-11  9:52             ` Andrea Righi
  0 siblings, 1 reply; 15+ messages in thread
From: Andrea Righi @ 2021-10-11  9:23 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Marco Elver, Alexander Potapenko, kasan-dev, linux-mm, linux-kernel

On Mon, Oct 11, 2021 at 09:19:48AM +0200, Dmitry Vyukov wrote:
> On Mon, 11 Oct 2021 at 09:10, Andrea Righi <andrea.righi@canonical.com> wrote:
> >
> > On Mon, Oct 11, 2021 at 08:48:29AM +0200, Marco Elver wrote:
> > > On Mon, 11 Oct 2021 at 08:32, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > On Mon, Oct 11, 2021 at 08:00:00AM +0200, Marco Elver wrote:
> > > > > On Sun, 10 Oct 2021 at 15:53, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > > [...]
> > > > >
> > > > > Can you confirm this is not a QEMU TCG instance? There's been a known
> > > > > issue with it: https://bugs.launchpad.net/qemu/+bug/1920934
> > > >
> > > > It looks like systemd is running qemu-system-x86 without any "accel"
> > > > options, so IIUC the instance shouldn't use TCG. Is this a correct
> > > > assumption or is there a better way to check?
> > >
> > > AFAIK, the default is TCG if nothing else is requested. What was the
> > > command line?
> >
> > This is the full command line of what systemd is running:
> >
> >   /bin/qemu-system-x86_64 -smp 4 -net none -m 512M -nographic -vga none -kernel /boot/vmlinuz-5.15-rc4 -drive format=raw,cache=unsafe,file=/var/tmp/systemd-test.sI1nrh/badid.img -initrd /boot/initrd.img-5.15-rc4 -append  root=/dev/sda1 rw raid=noautodetect rd.luks=0 loglevel=2 init=/lib/systemd/systemd console=ttyS0 selinux=0  SYSTEMD_UNIT_PATH=/usr/lib/systemd/tests/testdata/testsuite-14.units:/usr/lib/systemd/tests/testdata/units: systemd.unit=testsuite.target systemd.wants=testsuite-14.service systemd.wants=end.service
> >
> > And this is running inside a KVM instance (so a nested KVM scenario).
> 
> Hi Andrea,
> 
> I think you need to pass -enable-kvm to make it "nested KVM scenario",
> otherwise it's TCG emulation.

So, IIUC I might actually be hitting the QEMU TCG issue mentioned by
Marco, right?

> 
> You seem to use the default 20s stall timeout. FWIW syzbot uses 160
> secs timeout for TCG emulation to avoid false positive warnings:
> https://github.com/google/syzkaller/blob/838e7e2cd9228583ca33c49a39aea4d863d3e36d/dashboard/config/linux/upstream-arm64-kasan.config#L509
> There are a number of other timeouts raised as well, some as high as
> 420 seconds.

I see, I'll try with these settings and see if I can still hit the soft
lockup messages.

Thanks,
-Andrea


* Re: BUG: soft lockup in __kmalloc_node() with KFENCE enabled
  2021-10-11  9:23           ` Andrea Righi
@ 2021-10-11  9:52             ` Andrea Righi
  2021-10-11 10:03               ` Marco Elver
  0 siblings, 1 reply; 15+ messages in thread
From: Andrea Righi @ 2021-10-11  9:52 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Marco Elver, Alexander Potapenko, kasan-dev, linux-mm, linux-kernel

On Mon, Oct 11, 2021 at 11:23:32AM +0200, Andrea Righi wrote:
...
> > You seem to use the default 20s stall timeout. FWIW syzbot uses 160
> > secs timeout for TCG emulation to avoid false positive warnings:
> > https://github.com/google/syzkaller/blob/838e7e2cd9228583ca33c49a39aea4d863d3e36d/dashboard/config/linux/upstream-arm64-kasan.config#L509
> > There are a number of other timeouts raised as well, some as high as
> > 420 seconds.
> 
> I see, I'll try with these settings and see if I can still hit the soft
> lockup messages.

Still getting soft lockup messages even with the new timeout settings:

[  462.663766] watchdog: BUG: soft lockup - CPU#2 stuck for 430s! [systemd-udevd:168]
[  462.755758] watchdog: BUG: soft lockup - CPU#3 stuck for 430s! [systemd-udevd:171]
[  924.663765] watchdog: BUG: soft lockup - CPU#2 stuck for 861s! [systemd-udevd:168]
[  924.755767] watchdog: BUG: soft lockup - CPU#3 stuck for 861s! [systemd-udevd:171]

-Andrea


* Re: BUG: soft lockup in __kmalloc_node() with KFENCE enabled
  2021-10-11  9:52             ` Andrea Righi
@ 2021-10-11 10:03               ` Marco Elver
  2021-10-11 14:42                 ` Andrea Righi
  0 siblings, 1 reply; 15+ messages in thread
From: Marco Elver @ 2021-10-11 10:03 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Dmitry Vyukov, Alexander Potapenko, kasan-dev, linux-mm, linux-kernel

On Mon, 11 Oct 2021 at 11:53, Andrea Righi <andrea.righi@canonical.com> wrote:
> On Mon, Oct 11, 2021 at 11:23:32AM +0200, Andrea Righi wrote:
> ...
> > > You seem to use the default 20s stall timeout. FWIW syzbot uses 160
> > > secs timeout for TCG emulation to avoid false positive warnings:
> > > https://github.com/google/syzkaller/blob/838e7e2cd9228583ca33c49a39aea4d863d3e36d/dashboard/config/linux/upstream-arm64-kasan.config#L509
> > > There are a number of other timeouts raised as well, some as high as
> > > 420 seconds.
> >
> > I see, I'll try with these settings and see if I can still hit the soft
> > lockup messages.
>
> Still getting soft lockup messages even with the new timeout settings:
>
> [  462.663766] watchdog: BUG: soft lockup - CPU#2 stuck for 430s! [systemd-udevd:168]
> [  462.755758] watchdog: BUG: soft lockup - CPU#3 stuck for 430s! [systemd-udevd:171]
> [  924.663765] watchdog: BUG: soft lockup - CPU#2 stuck for 861s! [systemd-udevd:168]
> [  924.755767] watchdog: BUG: soft lockup - CPU#3 stuck for 861s! [systemd-udevd:171]

The lockups are expected if you're hitting the TCG bug I linked. Try
to pass '-enable-kvm' to the inner qemu instance (my bad if you
already have), assuming that's somehow easy to do.

Thanks,
-- Marco


* Re: BUG: soft lockup in __kmalloc_node() with KFENCE enabled
  2021-10-11 10:03               ` Marco Elver
@ 2021-10-11 14:42                 ` Andrea Righi
  2021-10-11 15:00                   ` Dmitry Vyukov
  2021-10-20  6:00                   ` Marco Elver
  0 siblings, 2 replies; 15+ messages in thread
From: Andrea Righi @ 2021-10-11 14:42 UTC (permalink / raw)
  To: Marco Elver
  Cc: Dmitry Vyukov, Alexander Potapenko, kasan-dev, linux-mm, linux-kernel

On Mon, Oct 11, 2021 at 12:03:52PM +0200, Marco Elver wrote:
> On Mon, 11 Oct 2021 at 11:53, Andrea Righi <andrea.righi@canonical.com> wrote:
> > On Mon, Oct 11, 2021 at 11:23:32AM +0200, Andrea Righi wrote:
> > ...
> > > > You seem to use the default 20s stall timeout. FWIW syzbot uses 160
> > > > secs timeout for TCG emulation to avoid false positive warnings:
> > > > https://github.com/google/syzkaller/blob/838e7e2cd9228583ca33c49a39aea4d863d3e36d/dashboard/config/linux/upstream-arm64-kasan.config#L509
> > > > There are a number of other timeouts raised as well, some as high as
> > > > 420 seconds.
> > >
> > > I see, I'll try with these settings and see if I can still hit the soft
> > > lockup messages.
> >
> > Still getting soft lockup messages even with the new timeout settings:
> >
> > [  462.663766] watchdog: BUG: soft lockup - CPU#2 stuck for 430s! [systemd-udevd:168]
> > [  462.755758] watchdog: BUG: soft lockup - CPU#3 stuck for 430s! [systemd-udevd:171]
> > [  924.663765] watchdog: BUG: soft lockup - CPU#2 stuck for 861s! [systemd-udevd:168]
> > [  924.755767] watchdog: BUG: soft lockup - CPU#3 stuck for 861s! [systemd-udevd:171]
> 
> The lockups are expected if you're hitting the TCG bug I linked. Try
> to pass '-enable-kvm' to the inner qemu instance (my bad if you
> already have), assuming that's somehow easy to do.

If I add '-enable-kvm' I can trigger other random panics (almost
immediately), like this one for example:

[21383.189976] BUG: kernel NULL pointer dereference, address: 0000000000000098
[21383.190633] #PF: supervisor read access in kernel mode
[21383.191072] #PF: error_code(0x0000) - not-present page
[21383.191529] PGD 0 P4D 0 
[21383.191771] Oops: 0000 [#1] SMP NOPTI
[21383.192113] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15-rc4
[21383.192757] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
[21383.193414] RIP: 0010:wb_timer_fn+0x44/0x3c0
[21383.193855] Code: 41 8b 9c 24 98 00 00 00 41 8b 94 24 b8 00 00 00 41 8b 84 24 d8 00 00 00 4d 8b 74 24 28 01 d3 01 c3 49 8b 44 24 60 48 8b 40 78 <4c> 8b b8 98 00 00 00 4d 85 f6 0f 84 c4 00 00 00 49 83 7c 24 30 00
[21383.195366] RSP: 0018:ffffbcd140003e68 EFLAGS: 00010246
[21383.195842] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
[21383.196425] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9a3521f4fd80
[21383.197010] RBP: ffffbcd140003e90 R08: 0000000000000000 R09: 0000000000000000
[21383.197594] R10: 0000000000000004 R11: 000000000000000f R12: ffff9a34c75c4900
[21383.198178] R13: ffff9a34c3906de0 R14: 0000000000000000 R15: ffff9a353dc18c00
[21383.198763] FS:  0000000000000000(0000) GS:ffff9a353dc00000(0000) knlGS:0000000000000000
[21383.199558] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[21383.200212] CR2: 0000000000000098 CR3: 0000000005f54000 CR4: 00000000000006f0
[21383.200930] Call Trace:
[21383.201210]  <IRQ>
[21383.201461]  ? blk_stat_free_callback_rcu+0x30/0x30
[21383.202692]  blk_stat_timer_fn+0x138/0x140
[21383.203180]  call_timer_fn+0x2b/0x100
[21383.203666]  __run_timers.part.0+0x1d1/0x240
[21383.204227]  ? kvm_clock_get_cycles+0x11/0x20
[21383.204815]  ? ktime_get+0x3e/0xa0
[21383.205309]  ? native_apic_msr_write+0x2c/0x30
[21383.205914]  ? lapic_next_event+0x20/0x30
[21383.206412]  ? clockevents_program_event+0x94/0xf0
[21383.206873]  run_timer_softirq+0x2a/0x50
[21383.207260]  __do_softirq+0xcb/0x26f
[21383.207647]  irq_exit_rcu+0x8c/0xb0
[21383.208010]  sysvec_apic_timer_interrupt+0x7c/0x90
[21383.208464]  </IRQ>
[21383.208713]  asm_sysvec_apic_timer_interrupt+0x12/0x20

I think the systemd autotests used to use -enable-kvm, but it was
removed because it introduced too many problems in the nested KVM
context. I'm not sure about the nature of those problems, though; I can
investigate a bit and see if I can understand what exactly they were.

-Andrea


* Re: BUG: soft lockup in __kmalloc_node() with KFENCE enabled
  2021-10-11 14:42                 ` Andrea Righi
@ 2021-10-11 15:00                   ` Dmitry Vyukov
  2021-10-11 15:11                     ` Andrea Righi
  2021-10-20  6:00                   ` Marco Elver
  1 sibling, 1 reply; 15+ messages in thread
From: Dmitry Vyukov @ 2021-10-11 15:00 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Marco Elver, Alexander Potapenko, kasan-dev, linux-mm, linux-kernel

On Mon, 11 Oct 2021 at 16:42, Andrea Righi <andrea.righi@canonical.com> wrote:
>
> On Mon, Oct 11, 2021 at 12:03:52PM +0200, Marco Elver wrote:
> > On Mon, 11 Oct 2021 at 11:53, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > On Mon, Oct 11, 2021 at 11:23:32AM +0200, Andrea Righi wrote:
> > > ...
> > > > > You seem to use the default 20s stall timeout. FWIW syzbot uses 160
> > > > > secs timeout for TCG emulation to avoid false positive warnings:
> > > > > https://github.com/google/syzkaller/blob/838e7e2cd9228583ca33c49a39aea4d863d3e36d/dashboard/config/linux/upstream-arm64-kasan.config#L509
> > > > > There are a number of other timeouts raised as well, some as high as
> > > > > 420 seconds.
> > > >
> > > > I see, I'll try with these settings and see if I can still hit the soft
> > > > lockup messages.
> > >
> > > Still getting soft lockup messages even with the new timeout settings:
> > >
> > > [  462.663766] watchdog: BUG: soft lockup - CPU#2 stuck for 430s! [systemd-udevd:168]
> > > [  462.755758] watchdog: BUG: soft lockup - CPU#3 stuck for 430s! [systemd-udevd:171]
> > > [  924.663765] watchdog: BUG: soft lockup - CPU#2 stuck for 861s! [systemd-udevd:168]
> > > [  924.755767] watchdog: BUG: soft lockup - CPU#3 stuck for 861s! [systemd-udevd:171]
> >
> > The lockups are expected if you're hitting the TCG bug I linked. Try
> > to pass '-enable-kvm' to the inner qemu instance (my bad if you
> > already have), assuming that's somehow easy to do.
>
> If I add '-enable-kvm' I can trigger other random panics (almost
> immediately), like this one for example:
>
> [21383.189976] BUG: kernel NULL pointer dereference, address: 0000000000000098
> [21383.190633] #PF: supervisor read access in kernel mode
> [21383.191072] #PF: error_code(0x0000) - not-present page
> [21383.191529] PGD 0 P4D 0
> [21383.191771] Oops: 0000 [#1] SMP NOPTI
> [21383.192113] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15-rc4
> [21383.192757] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
> [21383.193414] RIP: 0010:wb_timer_fn+0x44/0x3c0
> [21383.193855] Code: 41 8b 9c 24 98 00 00 00 41 8b 94 24 b8 00 00 00 41 8b 84 24 d8 00 00 00 4d 8b 74 24 28 01 d3 01 c3 49 8b 44 24 60 48 8b 40 78 <4c> 8b b8 98 00 00 00 4d 85 f6 0f 84 c4 00 00 00 49 83 7c 24 30 00
> [21383.195366] RSP: 0018:ffffbcd140003e68 EFLAGS: 00010246
> [21383.195842] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
> [21383.196425] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9a3521f4fd80
> [21383.197010] RBP: ffffbcd140003e90 R08: 0000000000000000 R09: 0000000000000000
> [21383.197594] R10: 0000000000000004 R11: 000000000000000f R12: ffff9a34c75c4900
> [21383.198178] R13: ffff9a34c3906de0 R14: 0000000000000000 R15: ffff9a353dc18c00
> [21383.198763] FS:  0000000000000000(0000) GS:ffff9a353dc00000(0000) knlGS:0000000000000000
> [21383.199558] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [21383.200212] CR2: 0000000000000098 CR3: 0000000005f54000 CR4: 00000000000006f0
> [21383.200930] Call Trace:
> [21383.201210]  <IRQ>
> [21383.201461]  ? blk_stat_free_callback_rcu+0x30/0x30
> [21383.202692]  blk_stat_timer_fn+0x138/0x140
> [21383.203180]  call_timer_fn+0x2b/0x100
> [21383.203666]  __run_timers.part.0+0x1d1/0x240
> [21383.204227]  ? kvm_clock_get_cycles+0x11/0x20
> [21383.204815]  ? ktime_get+0x3e/0xa0
> [21383.205309]  ? native_apic_msr_write+0x2c/0x30
> [21383.205914]  ? lapic_next_event+0x20/0x30
> [21383.206412]  ? clockevents_program_event+0x94/0xf0
> [21383.206873]  run_timer_softirq+0x2a/0x50
> [21383.207260]  __do_softirq+0xcb/0x26f
> [21383.207647]  irq_exit_rcu+0x8c/0xb0
> [21383.208010]  sysvec_apic_timer_interrupt+0x7c/0x90
> [21383.208464]  </IRQ>
> [21383.208713]  asm_sysvec_apic_timer_interrupt+0x12/0x20
>
> I think the systemd autotests used to use -enable-kvm, but it was
> removed because it introduced too many problems in the nested KVM
> context. I'm not sure about the nature of those problems, though; I can
> investigate a bit and see if I can understand what exactly they were.

This looks like just a plain bug in wb_timer_fn, not something related
to virtualization.
Do you have this fix?
https://syzkaller.appspot.com/bug?extid=aa0801b6b32dca9dda82


* Re: BUG: soft lockup in __kmalloc_node() with KFENCE enabled
  2021-10-11 15:00                   ` Dmitry Vyukov
@ 2021-10-11 15:11                     ` Andrea Righi
  2021-10-11 15:16                       ` Dmitry Vyukov
  0 siblings, 1 reply; 15+ messages in thread
From: Andrea Righi @ 2021-10-11 15:11 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Marco Elver, Alexander Potapenko, kasan-dev, linux-mm, linux-kernel

On Mon, Oct 11, 2021 at 05:00:15PM +0200, Dmitry Vyukov wrote:
> On Mon, 11 Oct 2021 at 16:42, Andrea Righi <andrea.righi@canonical.com> wrote:
> >
> > On Mon, Oct 11, 2021 at 12:03:52PM +0200, Marco Elver wrote:
> > > On Mon, 11 Oct 2021 at 11:53, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > On Mon, Oct 11, 2021 at 11:23:32AM +0200, Andrea Righi wrote:
> > > > ...
> > > > > > You seem to use the default 20s stall timeout. FWIW syzbot uses 160
> > > > > > secs timeout for TCG emulation to avoid false positive warnings:
> > > > > > https://github.com/google/syzkaller/blob/838e7e2cd9228583ca33c49a39aea4d863d3e36d/dashboard/config/linux/upstream-arm64-kasan.config#L509
> > > > > > There are a number of other timeouts raised as well, some as high as
> > > > > > 420 seconds.
> > > > >
> > > > > I see, I'll try with these settings and see if I can still hit the soft
> > > > > lockup messages.
> > > >
> > > > Still getting soft lockup messages even with the new timeout settings:
> > > >
> > > > [  462.663766] watchdog: BUG: soft lockup - CPU#2 stuck for 430s! [systemd-udevd:168]
> > > > [  462.755758] watchdog: BUG: soft lockup - CPU#3 stuck for 430s! [systemd-udevd:171]
> > > > [  924.663765] watchdog: BUG: soft lockup - CPU#2 stuck for 861s! [systemd-udevd:168]
> > > > [  924.755767] watchdog: BUG: soft lockup - CPU#3 stuck for 861s! [systemd-udevd:171]
> > >
> > > The lockups are expected if you're hitting the TCG bug I linked. Try
> > > to pass '-enable-kvm' to the inner qemu instance (my bad if you
> > > already have), assuming that's somehow easy to do.
> >
> > If I add '-enable-kvm' I can trigger other random panics (almost
> > immediately), like this one for example:
> >
> > [21383.189976] BUG: kernel NULL pointer dereference, address: 0000000000000098
> > [21383.190633] #PF: supervisor read access in kernel mode
> > [21383.191072] #PF: error_code(0x0000) - not-present page
> > [21383.191529] PGD 0 P4D 0
> > [21383.191771] Oops: 0000 [#1] SMP NOPTI
> > [21383.192113] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15-rc4
> > [21383.192757] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
> > [21383.193414] RIP: 0010:wb_timer_fn+0x44/0x3c0
> > [21383.193855] Code: 41 8b 9c 24 98 00 00 00 41 8b 94 24 b8 00 00 00 41 8b 84 24 d8 00 00 00 4d 8b 74 24 28 01 d3 01 c3 49 8b 44 24 60 48 8b 40 78 <4c> 8b b8 98 00 00 00 4d 85 f6 0f 84 c4 00 00 00 49 83 7c 24 30 00
> > [21383.195366] RSP: 0018:ffffbcd140003e68 EFLAGS: 00010246
> > [21383.195842] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
> > [21383.196425] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9a3521f4fd80
> > [21383.197010] RBP: ffffbcd140003e90 R08: 0000000000000000 R09: 0000000000000000
> > [21383.197594] R10: 0000000000000004 R11: 000000000000000f R12: ffff9a34c75c4900
> > [21383.198178] R13: ffff9a34c3906de0 R14: 0000000000000000 R15: ffff9a353dc18c00
> > [21383.198763] FS:  0000000000000000(0000) GS:ffff9a353dc00000(0000) knlGS:0000000000000000
> > [21383.199558] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [21383.200212] CR2: 0000000000000098 CR3: 0000000005f54000 CR4: 00000000000006f0
> > [21383.200930] Call Trace:
> > [21383.201210]  <IRQ>
> > [21383.201461]  ? blk_stat_free_callback_rcu+0x30/0x30
> > [21383.202692]  blk_stat_timer_fn+0x138/0x140
> > [21383.203180]  call_timer_fn+0x2b/0x100
> > [21383.203666]  __run_timers.part.0+0x1d1/0x240
> > [21383.204227]  ? kvm_clock_get_cycles+0x11/0x20
> > [21383.204815]  ? ktime_get+0x3e/0xa0
> > [21383.205309]  ? native_apic_msr_write+0x2c/0x30
> > [21383.205914]  ? lapic_next_event+0x20/0x30
> > [21383.206412]  ? clockevents_program_event+0x94/0xf0
> > [21383.206873]  run_timer_softirq+0x2a/0x50
> > [21383.207260]  __do_softirq+0xcb/0x26f
> > [21383.207647]  irq_exit_rcu+0x8c/0xb0
> > [21383.208010]  sysvec_apic_timer_interrupt+0x7c/0x90
> > [21383.208464]  </IRQ>
> > [21383.208713]  asm_sysvec_apic_timer_interrupt+0x12/0x20
> >
> > I think the systemd autotests used to use -enable-kvm, but it was
> > removed because it introduced too many problems in the nested KVM
> > context. I'm not sure about the nature of those problems, though; I can
> > investigate a bit and see if I can understand what exactly they were.
> 
> This looks like just a plain bug in wb_timer_fn, not something related
> to virtualization.
> Do you have this fix?
> https://syzkaller.appspot.com/bug?extid=aa0801b6b32dca9dda82

Yes, it looks like I have this:

 d152c682f03c block: add an explicit ->disk backpointer to the request_queue

-Andrea


* Re: BUG: soft lockup in __kmalloc_node() with KFENCE enabled
  2021-10-11 15:11                     ` Andrea Righi
@ 2021-10-11 15:16                       ` Dmitry Vyukov
  0 siblings, 0 replies; 15+ messages in thread
From: Dmitry Vyukov @ 2021-10-11 15:16 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Marco Elver, Alexander Potapenko, kasan-dev, linux-mm, linux-kernel

On Mon, 11 Oct 2021 at 17:11, Andrea Righi <andrea.righi@canonical.com> wrote:
>
> On Mon, Oct 11, 2021 at 05:00:15PM +0200, Dmitry Vyukov wrote:
> > On Mon, 11 Oct 2021 at 16:42, Andrea Righi <andrea.righi@canonical.com> wrote:
> > >
> > > On Mon, Oct 11, 2021 at 12:03:52PM +0200, Marco Elver wrote:
> > > > On Mon, 11 Oct 2021 at 11:53, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > > On Mon, Oct 11, 2021 at 11:23:32AM +0200, Andrea Righi wrote:
> > > > > ...
> > > > > > > You seem to use the default 20s stall timeout. FWIW syzbot uses 160
> > > > > > > secs timeout for TCG emulation to avoid false positive warnings:
> > > > > > > https://github.com/google/syzkaller/blob/838e7e2cd9228583ca33c49a39aea4d863d3e36d/dashboard/config/linux/upstream-arm64-kasan.config#L509
> > > > > > > There are a number of other timeouts raised as well, some as high as
> > > > > > > 420 seconds.
> > > > > >
> > > > > > I see, I'll try with these settings and see if I can still hit the soft
> > > > > > lockup messages.
> > > > >
> > > > > Still getting soft lockup messages even with the new timeout settings:
> > > > >
> > > > > [  462.663766] watchdog: BUG: soft lockup - CPU#2 stuck for 430s! [systemd-udevd:168]
> > > > > [  462.755758] watchdog: BUG: soft lockup - CPU#3 stuck for 430s! [systemd-udevd:171]
> > > > > [  924.663765] watchdog: BUG: soft lockup - CPU#2 stuck for 861s! [systemd-udevd:168]
> > > > > [  924.755767] watchdog: BUG: soft lockup - CPU#3 stuck for 861s! [systemd-udevd:171]
> > > >
> > > > The lockups are expected if you're hitting the TCG bug I linked. Try
> > > > to pass '-enable-kvm' to the inner qemu instance (my bad if you
> > > > already have), assuming that's somehow easy to do.
> > >
> > > If I add '-enable-kvm' I can trigger other random panics (almost
> > > immediately), like this one for example:
> > >
> > > [21383.189976] BUG: kernel NULL pointer dereference, address: 0000000000000098
> > > [21383.190633] #PF: supervisor read access in kernel mode
> > > [21383.191072] #PF: error_code(0x0000) - not-present page
> > > [21383.191529] PGD 0 P4D 0
> > > [21383.191771] Oops: 0000 [#1] SMP NOPTI
> > > [21383.192113] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15-rc4
> > > [21383.192757] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
> > > [21383.193414] RIP: 0010:wb_timer_fn+0x44/0x3c0
> > > [21383.193855] Code: 41 8b 9c 24 98 00 00 00 41 8b 94 24 b8 00 00 00 41 8b 84 24 d8 00 00 00 4d 8b 74 24 28 01 d3 01 c3 49 8b 44 24 60 48 8b 40 78 <4c> 8b b8 98 00 00 00 4d 85 f6 0f 84 c4 00 00 00 49 83 7c 24 30 00
> > > [21383.195366] RSP: 0018:ffffbcd140003e68 EFLAGS: 00010246
> > > [21383.195842] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
> > > [21383.196425] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9a3521f4fd80
> > > [21383.197010] RBP: ffffbcd140003e90 R08: 0000000000000000 R09: 0000000000000000
> > > [21383.197594] R10: 0000000000000004 R11: 000000000000000f R12: ffff9a34c75c4900
> > > [21383.198178] R13: ffff9a34c3906de0 R14: 0000000000000000 R15: ffff9a353dc18c00
> > > [21383.198763] FS:  0000000000000000(0000) GS:ffff9a353dc00000(0000) knlGS:0000000000000000
> > > [21383.199558] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [21383.200212] CR2: 0000000000000098 CR3: 0000000005f54000 CR4: 00000000000006f0
> > > [21383.200930] Call Trace:
> > > [21383.201210]  <IRQ>
> > > [21383.201461]  ? blk_stat_free_callback_rcu+0x30/0x30
> > > [21383.202692]  blk_stat_timer_fn+0x138/0x140
> > > [21383.203180]  call_timer_fn+0x2b/0x100
> > > [21383.203666]  __run_timers.part.0+0x1d1/0x240
> > > [21383.204227]  ? kvm_clock_get_cycles+0x11/0x20
> > > [21383.204815]  ? ktime_get+0x3e/0xa0
> > > [21383.205309]  ? native_apic_msr_write+0x2c/0x30
> > > [21383.205914]  ? lapic_next_event+0x20/0x30
> > > [21383.206412]  ? clockevents_program_event+0x94/0xf0
> > > [21383.206873]  run_timer_softirq+0x2a/0x50
> > > [21383.207260]  __do_softirq+0xcb/0x26f
> > > [21383.207647]  irq_exit_rcu+0x8c/0xb0
> > > [21383.208010]  sysvec_apic_timer_interrupt+0x7c/0x90
> > > [21383.208464]  </IRQ>
> > > [21383.208713]  asm_sysvec_apic_timer_interrupt+0x12/0x20
> > >
> > > I think the systemd autotests used to use -enable-kvm, but it was
> > > removed because it introduced too many problems in the nested KVM
> > > context. I'm not sure about the nature of those problems, though; I can
> > > investigate a bit and see if I can understand what exactly they were.
> >
> > This looks like just a plain bug in wb_timer_fn, not something related
> > to virtualization.
> > Do you have this fix?
> > https://syzkaller.appspot.com/bug?extid=aa0801b6b32dca9dda82
>
> Yes, it looks like I have this:
>
>  d152c682f03c block: add an explicit ->disk backpointer to the request_queue

Then there is another bug in wb_timer_fn I guess...

Don't know if this is the same or something else:
https://lore.kernel.org/lkml/CAHbLzkrdGva2dzO36r62LKv_ip5trbMK0BO3vCeSBk2_7OE-zA@mail.gmail.com/

There also were some data races in this function:
https://groups.google.com/g/syzkaller-upstream-moderation/search?q=wb_timer_fn


* Re: BUG: soft lockup in __kmalloc_node() with KFENCE enabled
  2021-10-11 14:42                 ` Andrea Righi
  2021-10-11 15:00                   ` Dmitry Vyukov
@ 2021-10-20  6:00                   ` Marco Elver
  2021-10-20  6:11                     ` Andrea Righi
  1 sibling, 1 reply; 15+ messages in thread
From: Marco Elver @ 2021-10-20  6:00 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Dmitry Vyukov, Alexander Potapenko, kasan-dev, linux-mm, linux-kernel

On Mon, 11 Oct 2021 at 16:42, Andrea Righi <andrea.righi@canonical.com> wrote:
> On Mon, Oct 11, 2021 at 12:03:52PM +0200, Marco Elver wrote:
> > On Mon, 11 Oct 2021 at 11:53, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > On Mon, Oct 11, 2021 at 11:23:32AM +0200, Andrea Righi wrote:
> > > ...
> > > > > You seem to use the default 20s stall timeout. FWIW syzbot uses 160
> > > > > secs timeout for TCG emulation to avoid false positive warnings:
> > > > > https://github.com/google/syzkaller/blob/838e7e2cd9228583ca33c49a39aea4d863d3e36d/dashboard/config/linux/upstream-arm64-kasan.config#L509
> > > > > There are a number of other timeouts raised as well, some as high as
> > > > > 420 seconds.
> > > >
> > > > I see, I'll try with these settings and see if I can still hit the soft
> > > > lockup messages.
> > >
> > > Still getting soft lockup messages even with the new timeout settings:
> > >
> > > [  462.663766] watchdog: BUG: soft lockup - CPU#2 stuck for 430s! [systemd-udevd:168]
> > > [  462.755758] watchdog: BUG: soft lockup - CPU#3 stuck for 430s! [systemd-udevd:171]
> > > [  924.663765] watchdog: BUG: soft lockup - CPU#2 stuck for 861s! [systemd-udevd:168]
> > > [  924.755767] watchdog: BUG: soft lockup - CPU#3 stuck for 861s! [systemd-udevd:171]
> >
> > The lockups are expected if you're hitting the TCG bug I linked. Try
> > to pass '-enable-kvm' to the inner qemu instance (my bad if you
> > already have), assuming that's somehow easy to do.
>
> If I add '-enable-kvm' I can trigger other random panics (almost
> immediately), like this one for example:

Just FYI: https://lkml.kernel.org/r/20211019102524.2807208-2-elver@google.com

But you can already flip that switch in your config
(CONFIG_KFENCE_STATIC_KEYS=n), which we recommend as a default now.

As a side-effect it'd also make your QEMU TCG tests pass.
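For reference, the recommended change looks like this in .config syntax
(sample interval shown at its upstream default; check your tree's
Kconfig help for the exact defaults):

```
CONFIG_KFENCE=y
# CONFIG_KFENCE_STATIC_KEYS is not set
CONFIG_KFENCE_SAMPLE_INTERVAL=100
```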


* Re: BUG: soft lockup in __kmalloc_node() with KFENCE enabled
  2021-10-20  6:00                   ` Marco Elver
@ 2021-10-20  6:11                     ` Andrea Righi
  0 siblings, 0 replies; 15+ messages in thread
From: Andrea Righi @ 2021-10-20  6:11 UTC (permalink / raw)
  To: Marco Elver
  Cc: Dmitry Vyukov, Alexander Potapenko, kasan-dev, linux-mm, linux-kernel

On Wed, Oct 20, 2021 at 08:00:00AM +0200, Marco Elver wrote:
> On Mon, 11 Oct 2021 at 16:42, Andrea Righi <andrea.righi@canonical.com> wrote:
> > On Mon, Oct 11, 2021 at 12:03:52PM +0200, Marco Elver wrote:
> > > On Mon, 11 Oct 2021 at 11:53, Andrea Righi <andrea.righi@canonical.com> wrote:
> > > > On Mon, Oct 11, 2021 at 11:23:32AM +0200, Andrea Righi wrote:
> > > > ...
> > > > > > You seem to use the default 20s stall timeout. FWIW syzbot uses 160
> > > > > > secs timeout for TCG emulation to avoid false positive warnings:
> > > > > > https://github.com/google/syzkaller/blob/838e7e2cd9228583ca33c49a39aea4d863d3e36d/dashboard/config/linux/upstream-arm64-kasan.config#L509
> > > > > > There are a number of other timeouts raised as well, some as high as
> > > > > > 420 seconds.
> > > > >
> > > > > I see, I'll try with these settings and see if I can still hit the soft
> > > > > lockup messages.
> > > >
> > > > Still getting soft lockup messages even with the new timeout settings:
> > > >
> > > > [  462.663766] watchdog: BUG: soft lockup - CPU#2 stuck for 430s! [systemd-udevd:168]
> > > > [  462.755758] watchdog: BUG: soft lockup - CPU#3 stuck for 430s! [systemd-udevd:171]
> > > > [  924.663765] watchdog: BUG: soft lockup - CPU#2 stuck for 861s! [systemd-udevd:168]
> > > > [  924.755767] watchdog: BUG: soft lockup - CPU#3 stuck for 861s! [systemd-udevd:171]
> > >
> > > The lockups are expected if you're hitting the TCG bug I linked. Try
> > > to pass '-enable-kvm' to the inner qemu instance (my bad if you
> > > already have), assuming that's somehow easy to do.
> >
> > If I add '-enable-kvm' I can trigger other random panics (almost
> > immediately), like this one for example:
> 
> Just FYI: https://lkml.kernel.org/r/20211019102524.2807208-2-elver@google.com
> 
> But you can already flip that switch in your config
> (CONFIG_KFENCE_STATIC_KEYS=n), which we recommend as a default now.
> 
> As a side-effect it'd also make your QEMU TCG tests pass.

Cool! Thanks for the update!

And about the other panic that I was getting it seems to be fixed by
this one:
https://lore.kernel.org/lkml/YW6N2qXpBU3oc50q@arighi-desktop/T/#u

-Andrea


end of thread, other threads:[~2021-10-20  6:11 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-10 13:53 BUG: soft lockup in __kmalloc_node() with KFENCE enabled Andrea Righi
2021-10-11  6:00 ` Marco Elver
2021-10-11  6:32   ` Andrea Righi
2021-10-11  6:48     ` Marco Elver
2021-10-11  7:10       ` Andrea Righi
2021-10-11  7:19         ` Dmitry Vyukov
2021-10-11  9:23           ` Andrea Righi
2021-10-11  9:52             ` Andrea Righi
2021-10-11 10:03               ` Marco Elver
2021-10-11 14:42                 ` Andrea Righi
2021-10-11 15:00                   ` Dmitry Vyukov
2021-10-11 15:11                     ` Andrea Righi
2021-10-11 15:16                       ` Dmitry Vyukov
2021-10-20  6:00                   ` Marco Elver
2021-10-20  6:11                     ` Andrea Righi

This is a public inbox; see the mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for the NNTP newsgroup(s).