linux-xfs.vger.kernel.org archive mirror
* [Syzkaller & bisect] There is "soft lockup in __cleanup_mnt" in v6.4-rc3 kernel
@ 2023-05-25  2:59 Pengfei Xu
  2023-05-25  3:51 ` Eric Sandeen
  0 siblings, 1 reply; 11+ messages in thread
From: Pengfei Xu @ 2023-05-25  2:59 UTC (permalink / raw)
  To: dchinner; +Cc: djwong, heng.su, linux-xfs, linux-fsdevel, lkp

Hi Dave,

Greeting!

Platform: Alder lake
There is "soft lockup in __cleanup_mnt" in v6.4-rc3 kernel.

Syzkaller analysis repro.report and bisect detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230524_140757___cleanup_mnt
Guest machine info: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/machineInfo0
Reproduced code: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/repro.c
Reproduced syscall: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/repro.prog
Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/bisect_info.log
Kconfig origin: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/kconfig_origin

The suspected commit is as follows; 2 commits had to be skipped due to build errors, so only a "possible first bad commit" could be identified:
 # possible first bad commit: [f8f1ed1ab3babad46b25e2dbe8de43b33fe7aaa6] xfs: return a referenced perag from filestreams allocator
 # possible first bad commit: [571e259282a43f58b1f70dcbf2add20d8c83a72b] xfs: pass perag to filestreams tracing  (skip)
"
fs/xfs/xfs_filestream.c: In function ‘xfs_filestream_pick_ag’:
fs/xfs/xfs_filestream.c:92:4: error: label ‘next_ag’ used but not defined
    goto next_ag;
    ^~~~
make[3]: *** [scripts/Makefile.build:252: fs/xfs/xfs_filestream.o] Error 1
make[2]: *** [scripts/Makefile.build:504: fs/xfs] Error 2
make[1]: *** [scripts/Makefile.build:504: fs] Error 2
make: *** [Makefile:2021: .] Error 2
"

 # possible first bad commit: [eb70aa2d8ed9a6fc3525f305226c550524390cd2] xfs: use for_each_perag_wrap in xfs_filestream_pick_ag (skip)
"
fs/xfs/xfs_filestream.c: In function ‘xfs_filestream_pick_ag’:
fs/xfs/xfs_filestream.c:111:4: error: label ‘next_ag’ used but not defined
    goto next_ag;
    ^~~~
make[3]: *** [scripts/Makefile.build:252: fs/xfs/xfs_filestream.o] Error 1
make[2]: *** [scripts/Makefile.build:504: fs/xfs] Error 2
make[1]: *** [scripts/Makefile.build:504: fs] Error 2
make: *** [Makefile:2021: .] Error 2
"

"
[   29.223473] XFS (loop0): Unmounting Filesystem d408de26-55fb-48ab-a8ab-aacedb20f9dd
[   29.223942] XFS (loop0): SB summary counter sanity check failed
[   29.224173] XFS (loop0): Metadata corruption detected at xfs_sb_write_verify+0x7d/0x180, xfs_sb block 0x0 
[   29.224544] XFS (loop0): Unmount and run xfs_repair
[   29.224731] XFS (loop0): First 128 bytes of corrupted metadata buffer:
[   29.224979] 00000000: 58 46 53 42 00 00 04 00 00 00 00 00 00 00 80 00  XFSB............
[   29.225304] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[   29.225603] 00000020: d4 08 de 26 55 fb 48 ab a8 ab aa ce db 20 f9 dd  ...&U.H...... ..
[   29.225902] 00000030: 00 00 00 00 00 00 40 08 00 00 00 00 00 00 00 20  ......@........ 
[   29.226200] 00000040: 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 22  .......!......."
[   29.226503] 00000050: 00 00 00 04 00 00 40 00 00 00 00 02 00 00 00 00  ......@.........
[   29.226802] 00000060: 00 00 04 98 b4 f5 02 00 02 00 00 02 00 00 00 00  ................
[   29.227101] 00000070: 00 00 00 00 00 00 00 00 0a 09 09 01 0e 00 00 14  ................
[   29.228273] XFS (loop0): Corruption of in-memory data (0x8) detected at _xfs_buf_ioapply+0x5d8/0x5f0 (fs/xfs/xfs_buf.c:1552).  Shutting down filesystem.
[   29.228788] XFS (loop0): Please unmount the filesystem and rectify the problem(s)
[   56.322257] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [repro:529]
[   56.322608] Modules linked in:
[   56.322733] irq event stamp: 22632
[   56.322866] hardirqs last  enabled at (22631): [<ffffffff82fad69e>] irqentry_exit+0x3e/0xa0
[   56.323185] hardirqs last disabled at (22632): [<ffffffff82fab7b3>] sysvec_apic_timer_interrupt+0x13/0xe0
[   56.323550] softirqs last  enabled at (9060): [<ffffffff82fcf8e9>] __do_softirq+0x2d9/0x3c3
[   56.323865] softirqs last disabled at (8463): [<ffffffff81126714>] irq_exit_rcu+0xc4/0x100
[   56.324179] CPU: 1 PID: 529 Comm: repro Not tainted 6.4.0-rc3-44c026a73be8+ #1
[   56.324455] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[   56.324877] RIP: 0010:write_comp_data+0x0/0x90
[   56.325056] Code: 85 d2 74 0b 8b 86 c8 1d 00 00 39 f8 0f 94 c0 5d c3 cc cc cc cc 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <55> 48 89 e5 41 57 49 89 d7 41 56 49 89 fe bf 03 00 00 00 41 55 49
[   56.325736] RSP: 0018:ffffc90000f5bc60 EFLAGS: 00000246
[   56.325936] RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffffff81a138ea
[   56.326204] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000001
[   56.326475] RBP: ffffc90000f5bc68 R08: 0000000000000000 R09: 000000000000001c
[   56.326744] R10: 0000000000000001 R11: ffffffff83d64580 R12: ffffffff81ac0c81
[   56.327011] R13: 0000000000000000 R14: 0000000000000001 R15: ffff8880134bf900
[   56.327278] FS:  00007f85f5814740(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
[   56.327580] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   56.327798] CR2: 00007fe75831f000 CR3: 000000000deb4004 CR4: 0000000000770ee0
[   56.328067] PKRU: 55555554
[   56.328176] Call Trace:
[   56.328273]  <TASK>
[   56.328359]  ? __sanitizer_cov_trace_const_cmp1+0x1e/0x30
[   56.328571]  xfs_perag_grab_tag+0x27a/0x460
[   56.328745]  xfs_icwalk+0x31/0xf0
[   56.328884]  xfs_reclaim_inodes+0xc6/0x140
[   56.329051]  xfs_unmount_flush_inodes+0x63/0x80
[   56.329235]  xfs_unmountfs+0x69/0x1f0
[   56.329389]  xfs_fs_put_super+0x5a/0x120
[   56.329548]  ? __pfx_xfs_fs_put_super+0x10/0x10
[   56.329730]  generic_shutdown_super+0xac/0x240
[   56.329909]  kill_block_super+0x46/0x90
[   56.330063]  deactivate_locked_super+0x52/0xb0
[   56.330242]  deactivate_super+0xb3/0xd0
[   56.330400]  cleanup_mnt+0x15e/0x1e0
[   56.330553]  __cleanup_mnt+0x1f/0x30
[   56.330704]  task_work_run+0xb6/0x120
[   56.330853]  exit_to_user_mode_prepare+0x200/0x210
[   56.331045]  syscall_exit_to_user_mode+0x2d/0x60
[   56.331229]  do_syscall_64+0x4a/0x90
[   56.331379]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[   56.331575] RIP: 0033:0x7f85f59407db
[   56.331718] Code: 96 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 95 96 0c 00 f7 d8 64 89 01 48
[   56.332395] RSP: 002b:00007ffd74badbc8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a6
[   56.332680] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f85f59407db
[   56.332945] RDX: 0000000000000000 RSI: 000000000000000a RDI: 00007ffd74badc70
[   56.333216] RBP: 00007ffd74baecb0 R08: 0000000000fb4333 R09: 0000000000000009
[   56.333486] R10: 0000000000404071 R11: 0000000000000202 R12: 00000000004012c0
[   56.333752] R13: 00007ffd74baedf0 R14: 0000000000000000 R15: 0000000000000000
[   56.334026]  </TASK>
[   56.334116] Kernel panic - not syncing: softlockup: hung tasks
"
I hope this report is accurate and helpful this time.

Thanks!

---

If you don't need the following environment to reproduce the problem, or if
you already have one, please ignore the information below.

How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh  // it needs qemu-system-x86_64; I used v7.1.0
  // start3.sh will load the bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
  // You can change bzImage_xxx to whichever kernel image you want
  // You may need to remove the line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for a different qemu version
You can use the command below to log in; there is no password for root.
ssh -p 10023 root@localhost

After logging into the VM (virtual machine) successfully, you can transfer
the reproducer binary to the VM as shown below and reproduce the problem there:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@localhost:/root/

Get the bzImage for the target kernel:
Please use the target kconfig and copy it to kernel_src/.config
make olddefconfig
make -jx bzImage           // x should be equal to or less than the number of CPUs your machine has

Point start3.sh above at the resulting bzImage file to load the target kernel in the VM.
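
As a minimal consolidated sketch of the same steps (the kernel_src and
repro_vm_env paths are assumptions, adjust them to your checkout;
arch/x86/boot/bzImage is the usual x86_64 output location):

cp kconfig_origin kernel_src/.config       # target kconfig from the report link above
cd kernel_src
make olddefconfig
make -j"$(nproc)" bzImage                  # build with one job per CPU
cp arch/x86/boot/bzImage ../repro_vm_env/  # then point start3.sh at this file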


Tips:
If you already have qemu-system-x86_64, please ignore the info below.
If you want to install qemu v7.1.0:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
yum -y install libslirp-devel.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
make
make install

Thanks!
BR.


* Re: [Syzkaller & bisect] There is "soft lockup in __cleanup_mnt" in v6.4-rc3 kernel
  2023-05-25  2:59 [Syzkaller & bisect] There is "soft lockup in __cleanup_mnt" in v6.4-rc3 kernel Pengfei Xu
@ 2023-05-25  3:51 ` Eric Sandeen
  2023-05-25  5:44   ` Pengfei Xu
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Sandeen @ 2023-05-25  3:51 UTC (permalink / raw)
  To: Pengfei Xu, dchinner; +Cc: djwong, heng.su, linux-xfs, linux-fsdevel, lkp

On 5/24/23 9:59 PM, Pengfei Xu wrote:
> Hi Dave,
> 
> Greeting!
> 
> Platform: Alder lake
> There is "soft lockup in __cleanup_mnt" in v6.4-rc3 kernel.
> 
> Syzkaller analysis repro.report and bisect detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230524_140757___cleanup_mnt
> Guest machine info: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/machineInfo0
> Reproduced code: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/repro.c
> Reproduced syscall: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/repro.prog
> Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/bisect_info.log
> Kconfig origin: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/kconfig_origin

There was a lot of discussion yesterday about how turning the crank on 
syzkaller and throwing un-triaged bug reports over the wall at 
stressed-out xfs developers isn't particularly helpful.

There was also a very specific concern raised in that discussion:

> IOWs, the bug report is deficient and not complete, and so I'm
> forced to spend unnecessary time trying to work out how to extract
> the filesystem image from a weird syzkaller report that is basically
> just a bunch of undocumented blobs in a github tree.

but here we are again, with another undocumented blob in a github tree, 
and no meaningful attempt at triage.

Syzbot at least is now providing filesystem images[1], which relieves 
some of the burden on the filesystem developers you're expecting to fix 
these bugs.

Perhaps before you send the /next/ filesystem-related syzkaller report, 
you can at least work out how to provide a standard filesystem image as 
part of the reproducer, one that can be examined with normal filesystem 
development and debugging tools?

[1]
https://lore.kernel.org/lkml/0000000000001f239205fb969174@google.com/T/




* Re: [Syzkaller & bisect] There is "soft lockup in __cleanup_mnt" in v6.4-rc3 kernel
  2023-05-25  3:51 ` Eric Sandeen
@ 2023-05-25  5:44   ` Pengfei Xu
  2023-05-25  6:15     ` Dave Chinner
  2023-05-25 14:17     ` Eric Sandeen
  0 siblings, 2 replies; 11+ messages in thread
From: Pengfei Xu @ 2023-05-25  5:44 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: dchinner, djwong, heng.su, linux-xfs, linux-fsdevel, lkp

On 2023-05-24 at 22:51:27 -0500, Eric Sandeen wrote:
> On 5/24/23 9:59 PM, Pengfei Xu wrote:
> > Hi Dave,
> > 
> > Greeting!
> > 
> > Platform: Alder lake
> > There is "soft lockup in __cleanup_mnt" in v6.4-rc3 kernel.
> > 
> > Syzkaller analysis repro.report and bisect detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230524_140757___cleanup_mnt
> > Guest machine info: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/machineInfo0
> > Reproduced code: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/repro.c
> > Reproduced syscall: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/repro.prog
> > Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/bisect_info.log
> > Kconfig origin: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/kconfig_origin
> 
> There was a lot of discussion yesterday about how turning the crank on
> syzkaller and throwing un-triaged bug reports over the wall at stressed-out
> xfs developers isn't particularly helpful.
> 
> There was also a very specific concern raised in that discussion:
> 
> > IOWs, the bug report is deficient and not complete, and so I'm
> > forced to spend unnecessary time trying to work out how to extract
> > the filesystem image from a weird syzkaller report that is basically
> > just a bunch of undocumented blobs in a github tree.
> 
> but here we are again, with another undocumented blob in a github tree, and
> no meaningful attempt at triage.
> 
> Syzbot at least is now providing filesystem images[1], which relieves some
> of the burden on the filesystem developers you're expecting to fix these
> bugs.
> 
> Perhaps before you send the /next/ filesystem-related syzkaller report, you
> can at least work out how to provide a standard filesystem image as part of
> the reproducer, one that can be examined with normal filesystem development
> and debugging tools?
> 
  There is a standard filesystem image after:

git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz

The image is named centos8_3.img and is booted by start3.sh.

There is a v6.4-rc3 bzImage at: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/bzImage_v64rc3
You can use it to boot the v6.4-rc3 kernel.

./start3.sh  // it needs qemu-system-x86_64; I used v7.1.0
  // start3.sh will load the bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
  // You can change bzImage_xxx to whichever kernel image you want
You can use the command below to log in; there is no password for root.
ssh -p 10023 root@localhost

After logging into the VM (virtual machine) successfully, you can transfer
the reproducer binary to the VM as shown below and reproduce the problem there:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@localhost:/root/

Then you can easily reproduce this issue in the above environment.

Thanks!
BR.

> [1]
> https://lore.kernel.org/lkml/0000000000001f239205fb969174@google.com/T/
> 
> 


* Re: [Syzkaller & bisect] There is "soft lockup in __cleanup_mnt" in v6.4-rc3 kernel
  2023-05-25  5:44   ` Pengfei Xu
@ 2023-05-25  6:15     ` Dave Chinner
  2023-05-25 17:55       ` Theodore Ts'o
  2023-05-26  4:55       ` Pengfei Xu
  2023-05-25 14:17     ` Eric Sandeen
  1 sibling, 2 replies; 11+ messages in thread
From: Dave Chinner @ 2023-05-25  6:15 UTC (permalink / raw)
  To: Pengfei Xu
  Cc: Eric Sandeen, dchinner, djwong, heng.su, linux-xfs, linux-fsdevel, lkp

On Thu, May 25, 2023 at 01:44:31PM +0800, Pengfei Xu wrote:
> On 2023-05-24 at 22:51:27 -0500, Eric Sandeen wrote:
> > On 5/24/23 9:59 PM, Pengfei Xu wrote:
> > > Hi Dave,
> > > 
> > > Greeting!
> > > 
> > > Platform: Alder lake
> > > There is "soft lockup in __cleanup_mnt" in v6.4-rc3 kernel.
> > > 
> > > Syzkaller analysis repro.report and bisect detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230524_140757___cleanup_mnt
> > > Guest machine info: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/machineInfo0
> > > Reproduced code: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/repro.c
> > > Reproduced syscall: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/repro.prog
> > > Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/bisect_info.log
> > > Kconfig origin: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/kconfig_origin
> > 
> > There was a lot of discussion yesterday about how turning the crank on
> > syzkaller and throwing un-triaged bug reports over the wall at stressed-out
> > xfs developers isn't particularly helpful.
> > 
> > There was also a very specific concern raised in that discussion:
> > 
> > > IOWs, the bug report is deficient and not complete, and so I'm
> > > forced to spend unnecessary time trying to work out how to extract
> > > the filesystem image from a weird syzkaller report that is basically
> > > just a bunch of undocumented blobs in a github tree.
> > 
> > but here we are again, with another undocumented blob in a github tree, and
> > no meaningful attempt at triage.
> > 
> > Syzbot at least is now providing filesystem images[1], which relieves some
> > of the burden on the filesystem developers you're expecting to fix these
> > bugs.
> > 
> > Perhaps before you send the /next/ filesystem-related syzkaller report, you
> > can at least work out how to provide a standard filesystem image as part of
> > the reproducer, one that can be examined with normal filesystem development
> > and debugging tools?
> > 
>   There is a standard filesystem image after
> 
> git clone https://gitlab.com/xupengfe/repro_vm_env.git
> cd repro_vm_env
> tar -xvf repro_vm_env.tar.gz
> image is named as centos8_3.img, and will boot by start3.sh.

No. That is not the filesystem image that is being asked for. The
syzkaller reproducer (i.e. what you call repro.c) constructs a
filesystem image in its own memory, which it then mounts and runs
the test operations on.  That's the filesystem image that we need
extracted into a separate image file, because that's the one that is
corrupted and that we need to look at when triaging these issues.
Google's syzbot does this now, so your syzkaller bot should also be
able to do it.
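
For what it's worth, a minimal sketch of capturing that in-memory image
from inside the guest, assuming the reproducer mounts it through a loop
device (the /dev/loop0 name is an assumption and may differ), would be:

losetup -a                             # find which loop device backs the reproducer's mount
dd if=/dev/loop0 of=file0.img bs=4096  # dump the raw filesystem image before it gets unmounted
gzip file0.img                         # compress it so it can be attached to the report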

Please go and talk to the syzkaller authors to find out how they
extract filesystem images from the reproducer, and any other
information they've also been asked to provide for triage
purposes.

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [Syzkaller & bisect] There is "soft lockup in __cleanup_mnt" in v6.4-rc3 kernel
  2023-05-25  5:44   ` Pengfei Xu
  2023-05-25  6:15     ` Dave Chinner
@ 2023-05-25 14:17     ` Eric Sandeen
  1 sibling, 0 replies; 11+ messages in thread
From: Eric Sandeen @ 2023-05-25 14:17 UTC (permalink / raw)
  To: Pengfei Xu; +Cc: dchinner, djwong, heng.su, linux-xfs, linux-fsdevel, lkp

On 5/25/23 12:44 AM, Pengfei Xu wrote:
> On 2023-05-24 at 22:51:27 -0500, Eric Sandeen wrote:
>> On 5/24/23 9:59 PM, Pengfei Xu wrote:
>>> Hi Dave,
>>>
>>> Greeting!
>>>
>>> Platform: Alder lake
>>> There is "soft lockup in __cleanup_mnt" in v6.4-rc3 kernel.
>>>
>>> Syzkaller analysis repro.report and bisect detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230524_140757___cleanup_mnt
>>> Guest machine info: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/machineInfo0
>>> Reproduced code: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/repro.c
>>> Reproduced syscall: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/repro.prog
>>> Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/bisect_info.log
>>> Kconfig origin: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/kconfig_origin
>>
>> There was a lot of discussion yesterday about how turning the crank on
>> syzkaller and throwing un-triaged bug reports over the wall at stressed-out
>> xfs developers isn't particularly helpful.
>>
>> There was also a very specific concern raised in that discussion:
>>
>>> IOWs, the bug report is deficient and not complete, and so I'm
>>> forced to spend unnecessary time trying to work out how to extract
>>> the filesystem image from a weird syzkaller report that is basically
>>> just a bunch of undocumented blobs in a github tree.
>>
>> but here we are again, with another undocumented blob in a github tree, and
>> no meaningful attempt at triage.
>>
>> Syzbot at least is now providing filesystem images[1], which relieves some
>> of the burden on the filesystem developers you're expecting to fix these
>> bugs.
>>
>> Perhaps before you send the /next/ filesystem-related syzkaller report, you
>> can at least work out how to provide a standard filesystem image as part of
>> the reproducer, one that can be examined with normal filesystem development
>> and debugging tools?
>>
>    There is a standard filesystem image after
> 
> git clone https://gitlab.com/xupengfe/repro_vm_env.git
> cd repro_vm_env
> tar -xvf repro_vm_env.tar.gz
> image is named as centos8_3.img, and will boot by start3.sh.

Honestly, this suggests to me that you don't really have much 
understanding at all about the bugs you're reporting.

> There is bzImage v6.4-rc3 in link: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/bzImage_v64rc3
> You could use it to boot v6.4-rc3 kernel.
> 
> ./start3.sh  // it needs qemu-system-x86_64 and I used v7.1.0
>    // start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
>    // You could change the bzImage_xxx as you want
> You could use below command to log in, there is no password for root.
> ssh -p 10023 root@localhost
> 
> After login vm(virtual machine) successfully, you could transfer reproduced
> binary to the vm by below way, and reproduce the problem in vm:
> gcc -pthread -o repro repro.c
> scp -P 10023 repro root@localhost:/root/
> 
> Then you could reproduce this issue easily in above environment.

You seem to be suggesting that the xfs developers should go do /more 
work/ to get to the bare minimum of a decent fuzzed filesystem bug 
report, instead of you doing a little bit of prep work yourself by 
providing the fuzzed filesystem image itself?

Your github account says you are "looking to collaborate on Linux kernel 
learning" - tossing auto-generated and difficult-to-triage bug reports 
at other developers is not collaboration. Wouldn't it be more 
interesting to take the time to understand the reports you're 
generating, find ways to make them more accessible/debuggable, and/or 
take some time to look into the problems yourself, in order to learn 
about the code you're turning the crank on?

> Thanks!
> BR.
> 
>> [1]
>> https://lore.kernel.org/lkml/0000000000001f239205fb969174@google.com/T/
>>
>>
> 



* Re: [Syzkaller & bisect] There is "soft lockup in __cleanup_mnt" in v6.4-rc3 kernel
  2023-05-25  6:15     ` Dave Chinner
@ 2023-05-25 17:55       ` Theodore Ts'o
  2023-05-26  6:43         ` Pengfei Xu
  2023-05-26 17:42         ` Dave Hansen
  2023-05-26  4:55       ` Pengfei Xu
  1 sibling, 2 replies; 11+ messages in thread
From: Theodore Ts'o @ 2023-05-25 17:55 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Pengfei Xu, Eric Sandeen, dchinner, djwong, heng.su, linux-xfs,
	linux-fsdevel, lkp, Aleksandr Nogikh, Dmitry Vyukov

On Thu, May 25, 2023 at 04:15:01PM +1000, Dave Chinner wrote:
> Google's syzbot does this now, so your syzkaller bot should also be
> able to do it....
>
> Please go and talk to the syzkaller authors to find out how they
> extract filesystem images from the reproducer, and any other
> information they've also been asked to provide for triage
> purposes.

Pengfei,

What is it that your syzkaller instance is doing that Google's upstream
syzkaller instance is not?  Google's syzkaller team has been
very responsive at improving syzkaller's Web UI, including making it
easy to get artifacts from the syzkaller instance and requesting that
their bot test a particular git tree or patch (since sometimes a
reproducer doesn't easily reproduce on KVM, but reproduces easily in
their Google Cloud VM environment).

So if there is some unique feature which you've added to your syzbot
instances, maybe you can contribute that change upstream, so that
everyone can benefit?  From an upstream developer's perspective, it
also means that I can very easily take a look at the currently active
syzbot reports for a particular subsystem --- for example:

       https://syzkaller.appspot.com/upstream/s/ext4

... and I can see how often a particular syzbot issue reproduces, and
it makes it easier for me to prioritize which syzbot report I should
work on next.  If there is a discussion on a particular report, I can
get a link to that discussion on lore.kernel.org; and once a patch has
been submitted, there is an indication on the dashboard that there is
a PATCH associated with that particular report.

For example, take a look at this report:

	https://syzkaller.appspot.com/bug?extid=e44749b6ba4d0434cd47

... and look at the contents under the Discussion section; and then
open up the "Last patch testing requests" collapsible section.

These are some of the reasons why using Google's instance of syzkaller
is a huge value add --- and quite frankly, it means that I will
prioritize looking at syzkaller reports on the syzkaller.appspot.com
dashboard, where I can easily prioritize which reports are most useful
for me to look at next, over those that you and others might forward
from some company's private syzkaller instance.  It's just far more
productive for me as an upstream maintainer.

Bottom line, having various companies run their own private instances
of syzkaller is much less useful for the upstream community.  If Intel
feels that it's useful to run their own instance, maybe there's some
way you can work with Google syzkaller team so you don't have to do
that?

Are there some improvements to the syzkaller code base Intel would be
willing to contribute to the upstream syzkaller code base at
https://github.com/google/syzkaller?  Or is there some other reason
why Intel is running its own syzkaller instance?

Cheers,

						- Ted


* Re: [Syzkaller & bisect] There is "soft lockup in __cleanup_mnt" in v6.4-rc3 kernel
  2023-05-25  6:15     ` Dave Chinner
  2023-05-25 17:55       ` Theodore Ts'o
@ 2023-05-26  4:55       ` Pengfei Xu
  1 sibling, 0 replies; 11+ messages in thread
From: Pengfei Xu @ 2023-05-26  4:55 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Eric Sandeen, dchinner, djwong, heng.su, linux-xfs,
	linux-fsdevel, lkp, nogikh

Hi Dave,

On 2023-05-25 at 16:15:01 +1000, Dave Chinner wrote:
> On Thu, May 25, 2023 at 01:44:31PM +0800, Pengfei Xu wrote:
> > On 2023-05-24 at 22:51:27 -0500, Eric Sandeen wrote:
> > > On 5/24/23 9:59 PM, Pengfei Xu wrote:
> > > > Hi Dave,
> > > > 
> > > > Greeting!
> > > > 
> > > > Platform: Alder lake
> > > > There is "soft lockup in __cleanup_mnt" in v6.4-rc3 kernel.
> > > > 
> > > > Syzkaller analysis repro.report and bisect detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230524_140757___cleanup_mnt
> > > > Guest machine info: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/machineInfo0
> > > > Reproduced code: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/repro.c
> > > > Reproduced syscall: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/repro.prog
> > > > Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/bisect_info.log
> > > > Kconfig origin: https://github.com/xupengfe/syzkaller_logs/blob/main/230524_140757___cleanup_mnt/kconfig_origin
> > > 
> > > There was a lot of discussion yesterday about how turning the crank on
> > > syzkaller and throwing un-triaged bug reports over the wall at stressed-out
> > > xfs developers isn't particularly helpful.
> > > 
> > > There was also a very specific concern raised in that discussion:
> > > 
> > > > IOWs, the bug report is deficient and not complete, and so I'm
> > > > forced to spend unnecessary time trying to work out how to extract
> > > > the filesystem image from a weird syzkaller report that is basically
> > > > just a bunch of undocumented blobs in a github tree.
> > > 
> > > but here we are again, with another undocumented blob in a github tree, and
> > > no meaningful attempt at triage.
> > > 
> > > Syzbot at least is now providing filesystem images[1], which relieves some
> > > of the burden on the filesystem developers you're expecting to fix these
> > > bugs.
> > > 
> > > Perhaps before you send the /next/ filesystem-related syzkaller report, you
> > > can at least work out how to provide a standard filesystem image as part of
> > > the reproducer, one that can be examined with normal filesystem development
> > > and debugging tools?
> > > 
> >   There is a standard filesystem image after
> > 
> > git clone https://gitlab.com/xupengfe/repro_vm_env.git
> > cd repro_vm_env
> > tar -xvf repro_vm_env.tar.gz
> > image is named as centos8_3.img, and will boot by start3.sh.
> 
> No. That is not the filesystem image that is being asked for. The
> syzkaller reproducer (i.e. what you call repro.c) contructs a
> filesystem image in it's own memory which it then mounts and runs
> the test operations on.  That's the filesystem image that we need
> extracted into a separate image file because that's the one that is
> corrupted and we need to look at when triaging these issues.
> Google's syzbot does this now, so your syzkaller bot should also be
> able to do it.
> 
> Please go and talk to the syzkaller authors to find out how they
> extract filesystem images from the reproducer, and any other
> information they've also been asked to provide for triage
> purposes.
> 
  Thanks for Dave Chinner's patient suggestion!
  Thanks to syzkaller maintainer Aleksandr Nogikh for the guidance!
  I put the generated filesystem image file0.gz that the reproducer mounts at:
https://github.com/xupengfe/syzkaller_logs/raw/main/230524_140757___cleanup_mnt/file0.gz
  You can "gunzip file0.gz" to get file0.
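
  For reference, a quick sketch of examining the extracted image with
  standard xfsprogs tools (assuming xfs_db and xfs_repair are installed;
  both invocations below leave the image unmodified):

gunzip file0.gz
xfs_db -r -c "sb 0" -c "print" file0   # dump superblock fields read-only
xfs_repair -n file0                    # report corruption without modifying the image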

  Thanks!
  BR.

> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


* Re: [Syzkaller & bisect] There is "soft lockup in __cleanup_mnt" in v6.4-rc3 kernel
  2023-05-25 17:55       ` Theodore Ts'o
@ 2023-05-26  6:43         ` Pengfei Xu
  2023-05-26 17:42         ` Dave Hansen
  1 sibling, 0 replies; 11+ messages in thread
From: Pengfei Xu @ 2023-05-26  6:43 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Dave Chinner, Eric Sandeen, dchinner, djwong, heng.su, linux-xfs,
	linux-fsdevel, lkp, Aleksandr Nogikh, Dmitry Vyukov

Hi Ted,

On 2023-05-25 at 13:55:42 -0400, Theodore Ts'o wrote:
> On Thu, May 25, 2023 at 04:15:01PM +1000, Dave Chinner wrote:
> > Google's syzbot does this now, so your syzkaller bot should also be
> > able to do it....
> >
> > Please go and talk to the syzkaller authors to find out how they
> > extract filesystem images from the reproducer, and any other
> > information they've also been asked to provide for triage
> > purposes.
> 
> Pengfei,
> 
> What is it that your syzkaller instance doing that Google's upstream
> syzkaller instance is not doing?  Google's syzkaller's team is been
> very responsive at improving syzkaller's Web UI, including making it
> easy to get artifacts from the syzkaller instance, requesting that
> their bot to test a particular git tree or patch (since sometimes
> reproducer doesn't easily reproduce on KVM, but easily reproduces in
> their Google Cloud VM environment).
> 
> So if there is some unique feature which you've added to your syzbot
> instances, maybe you can contribute that change upstream, so that
> everyone can benefit?  From an upstream developer's perspective, it
> also means that I can very easily take a look at the currently active
> syzbot reports for a particular subsystem --- for example:
> 
>        https://syzkaller.appspot.com/upstream/s/ext4
> 
> ... and I can see how often a particular syzbot issue reproduces, and
> it makes it easier for me to prioritize which syzbot report I should
> work on next.  If there is a discussion on a particular report, I can
> get a link to that discussion on lore.kernel.org; and once a patch has
> been submitted, there is an indication on the dashboard that there is
> a PATCH associated with that particular report.
> 
> For example, take a look at this report:
> 
> 	https://syzkaller.appspot.com/bug?extid=e44749b6ba4d0434cd47
> 
> ... and look at the contents under the Discussion section; and then
> open up the "Last patch testing requests" collapsible section.
> 
> These are some of the reasons why using Google's instance of syzkaller
> is a huge value add --- and quite frankly, it means that I will
> prioritize looking at syzkaller reports on the syzkaller.appspot.com
> dashboard, where I can easily prioritize which reports are most useful
> for me to look at next, over those that you and others might forward
> from some company's private syzkaller instance.  It's just far more
> productive for me as an upstream maintainer.
> 
> Bottom line, having various companies run their own private instances
> of syzkaller is much less useful for the upstream community.  If Intel
> feels that it's useful to run their own instance, maybe there's some
> way you can work with Google syzkaller team so you don't have to do
> that?
> 
> Are there some improvements to the syzkaller code base Intel would be
> willing to contribute to the upstream syzkaller code base at
> https://github.com/google/syzkaller?  Or is there some other reason
> why Intel is running its own syzkaller instance?
> 
  Yes, I agree that we should work together to improve syzkaller whenever some
  coverage or feature is not yet supported, so that others can benefit from it.
  For example, I added the IOMMUFD syscall descriptions and user-space
  SHSTK (shadow stack) tests for x86 platforms to syzkaller.

  Syzkaller is an unsupervised coverage-guided kernel fuzzer.
  According to my observation, some issues are platform dependent.
  Intel-specific platforms can find mainline kernel bugs with the syzkaller
  tool that syzbot (https://syzkaller.appspot.com/upstream) does not find.
  There are also some special configurations Intel cares about, such as
  IOMMUFD and user-space SHSTK (shadow stack); SHSTK needs both hardware and
  qemu support, so we can cover some special situations and find more bugs.
  For example, IOMMU-related issues:
  Report: https://lore.kernel.org/all/ZBE1k040xAhIuTmq@xpf.sh.intel.com/
  Patch and verified: https://lore.kernel.org/linux-iommu/ZCfN0MSBxfYTm7kI@xpf.sh.intel.com/

  In order to get these bugs fixed, it makes sense for us to report the issues
  to the Linux kernel community if no one has already reported them.

  Thanks!
  BR.

> Cheers,
> 
> 						- Ted


* Re: [Syzkaller & bisect] There is "soft lockup in __cleanup_mnt" in v6.4-rc3 kernel
  2023-05-25 17:55       ` Theodore Ts'o
  2023-05-26  6:43         ` Pengfei Xu
@ 2023-05-26 17:42         ` Dave Hansen
  2023-05-26 20:54           ` Theodore Ts'o
  1 sibling, 1 reply; 11+ messages in thread
From: Dave Hansen @ 2023-05-26 17:42 UTC (permalink / raw)
  To: Theodore Ts'o, Dave Chinner
  Cc: Pengfei Xu, Eric Sandeen, dchinner, djwong, heng.su, linux-xfs,
	linux-fsdevel, lkp, Aleksandr Nogikh, Dmitry Vyukov, Li, Philip

On 5/25/23 10:55, Theodore Ts'o wrote:
> Bottom line, having various companies run their own private instances
> of syzkaller is much less useful for the upstream community.

Yes, totally agree.

> If Intel feels that it's useful to run their own instance, maybe
> there's some way you can work with Google syzkaller team so you don't
> have to do that?
I actually don't know why or when Intel started doing this.  0day in
general runs on a pretty diverse set of systems and I suspect this was
an attempt to leverage that.  Philip, do you know the history here?

Pengfei, is there a list somewhere of the things that you think are
missing from Google's syzkaller instance?  If not, could you make one,
please?


* Re: [Syzkaller & bisect] There is "soft lockup in __cleanup_mnt" in v6.4-rc3 kernel
  2023-05-26 17:42         ` Dave Hansen
@ 2023-05-26 20:54           ` Theodore Ts'o
  2023-05-26 21:20             ` Dave Hansen
  0 siblings, 1 reply; 11+ messages in thread
From: Theodore Ts'o @ 2023-05-26 20:54 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Dave Chinner, Pengfei Xu, Eric Sandeen, dchinner, djwong,
	heng.su, linux-xfs, linux-fsdevel, lkp, Aleksandr Nogikh,
	Dmitry Vyukov, Li, Philip

On Fri, May 26, 2023 at 10:42:55AM -0700, Dave Hansen wrote:
> 
> > If Intel feels that it's useful to run their own instance, maybe
> > there's some way you can work with Google syzkaller team so you don't
> > have to do that?
>
> I actually don't know why or when Intel started doing this.  0day in
> general runs on a pretty diverse set of systems and I suspect this was
> an attempt to leverage that.  Philip, do you know the history here?

Yeah, I think that's at least part of the issue.  Looking at some of
the reports, the reported architectures were Tiger Lake and Alder
Lake.  According to Pengfei, part of this was to test features that
require newer CPU capabilities, such as CET / Shadow Stack.  Now, I could
be wrong, because Intel's CPU naming scheme is too complex for my tiny
brain and makes my head spin.  It's really hard to map the names used
for mobile processors to those used by Xeon server class platforms,
but I *think*, if Intel's Product Managers haven't confused me
hopelessly, Google Cloud's C3 VMs, which use Sapphire Rapids, should
have those hardware features which are in Tiger Lake and Alder Lake,
while Google Cloud's N2 VMs, which use Ice Lake processors, are
too old.  Can someone confirm if I got that right?

So this might be an issue of Intel submitting the relevant syzkaller
commits that add support for testing Shadow Stack, CET, IOMMUFD, etc.,
where needed to the upstream syzkaller git repo --- and then
convincing the Google Syzkaller team to spin up some of the test VMs
on the much more expensive (per CPU/hour) C3 VM's.  The former is
probably something that is just a matter of standard open source
upstreaming.  The latter might be more complicated, and might require
some private negotiations between companies to address the cost
differential and availability of C3 VM's.


The other thing that's probably worth considering here is that
hopefully many of these reports are ones that aren't *actually*
architecture dependent, but for some reason are just results that one
syzkaller instance has found and another syzkaller instance has not
yet found.  So perhaps there can be some kind of syzkaller state
export/import scheme so that a report can be transferred from one
syzkaller instance to another.  That way, upstream developers would
have a single syzkaller dashboard to pay attention to and get regular
information about how often a particular report is getting triggered,
and if the information behind the report can get fed into the receiving
syzkaller instance's fuzzing seed library, it might improve the test
coverage for other kernels that Intel doesn't have the business case
to test (e.g., Android kernels, kernels compiled for arm64 and RISC-V,
etc.)
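
As a rough sketch of what such an export/import could look like today
(this is an assumption on my part, not something I've verified end to
end): syzkaller ships a syz-db helper that can unpack a workdir
corpus.db into individual program files and pack a directory of
programs back into a corpus.db, so one instance's corpus could in
principle be exported and used to seed another instance:

./bin/syz-db unpack workdir/corpus.db exported-corpus   # dump the private instance's corpus as programs
./bin/syz-db pack exported-corpus corpus.db             # build a corpus.db to seed the receiving instance

The exact arguments and merge behaviour would need to be checked
against the syzkaller documentation.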

After all, looking at the report which kicked off this thread ("soft
lockup in __cleanup_mnt"), I don't think this is something that should
be hardware specific; and yet, this report appears not to exist in
Google's syzkaller instance.  If we could import the fuzzing seed for
this and similar reports into Google's syzkaller instance, it seems to
me that this would be a Good Thing.

Cheers,

						- Ted


* Re: [Syzkaller & bisect] There is "soft lockup in __cleanup_mnt" in v6.4-rc3 kernel
  2023-05-26 20:54           ` Theodore Ts'o
@ 2023-05-26 21:20             ` Dave Hansen
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Hansen @ 2023-05-26 21:20 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Dave Chinner, Pengfei Xu, Eric Sandeen, dchinner, djwong,
	heng.su, linux-xfs, linux-fsdevel, lkp, Aleksandr Nogikh,
	Dmitry Vyukov, Li, Philip

On 5/26/23 13:54, Theodore Ts'o wrote:
> On Fri, May 26, 2023 at 10:42:55AM -0700, Dave Hansen wrote:
>>
>>> If Intel feels that it's useful to run their own instance, maybe
>>> there's some way you can work with Google syzkaller team so you don't
>>> have to do that?
>>
>> I actually don't know why or when Intel started doing this.  0day in
>> general runs on a pretty diverse set of systems and I suspect this was
>> an attempt to leverage that.  Philip, do you know the history here?
> 
> Yeah, I think that's at least part of the issue.  Looking at some of
> the reports that, the reported architecture was Tiger Lake and Adler
> Lake.  According to Pengfei, part of this was to test features that
> require newer cpu features, such as CET / Shadow Stack.  Now, I could
> be wrong, because Intel's CPU naming scheme is too complex for my tiny
> brain and makes my head spin.  It's really hard to map the names used
> for mobile processors to those used by Xeon server class platforms,
> but I *think*, if Intel's Product Managers haven't confused me
> hopelessly, Google Cloud's C3 VM's, which use Sapphire Rapids, should
> have those hardware features which are in Tiger Lake and Adler Lake,
> while the Google Cloud's N2 VM's, which use Ice Lake processors, are
> too old.  Can someone confirm if I got that right?

That's roughly right.  *But*, there are things that got removed from
Tiger->Alder Lake like AVX-512 and things that the Xeons have that the
client CPUs don't, like SGX.

Shadow stacks are definitely one of the things that got added from Ice
Lake => Sapphire Rapids.

But like you mentioned below, I don't see any actual evidence that
"newer" hardware is implicated here at all.

> So this might be an issue of Intel submitting the relevant syzkaller
> commits that add support for testing Shadow Stack, CET, IOMMUFD, etc.,
> where needed to the upstream syzkaller git repo --- and then
> convincing the Google Syzkaller team to turn up run some of test VM's
> on the much more expensive (per CPU/hour) C3 VM's.  The former is
> probably something that is just a matter of standard open source
> upstreaming.  The latter might be more complicated, and might require
> some private negotiations between companies to address the cost
> differential and availability of C3 VM's.

Yeah, absolutely.

If Intel keeps up with its own instance of syzkaller, Intel should
constantly be asking itself why the Google instance isn't hitting the
same bugs and how we can close the gap if there is one.

> The other thing that's probably worth considering here is that
> hopefully many of these reports are one that aren't *actually*
> architecture dependent, but for some reason, are just results that one
> syzkaller's instance has found, but another syzkaller instance has not
> yet found.  So perhaps there can be some kind of syzkaller state
> export/import scheme so that a report that be transferred from one
> syzkaller instance to another.  That way, upstream developers would
> have a single syzkaller dashboard to pay attention to, get regular
> information about how often a particular report is getting triggered,
> and if the information behind the report can get fed into receiving
> syzkaller's instance's fuzzing seed library, it might improve the test
> coverage for other kernels that Intel doesn't have the business case
> to test (e.g., Android kernels, kernels compiled for arm64 and RISC-V,
> etc.)

Absolutely, a unified view of all of the instances would be really nice.

> After all, looking at the report which kicked off this thread ("soft
> lockup in __cleanup_mnt"), I don't think this is something that should
> be hardware specific; and yet, this report appears not to exist in
> Google's syzkaller instance.  If we could import the fuzzing seed for
> this and similar reports into Google's syzkaller instance, it seems to
> me that this would be a Good Thing.

Very true.  I don't see anything obviously Intel-specific here.  One of
the first questions we should be asking ourselves is why _we_ hit this
and Google didn't.

