linuxppc-dev.lists.ozlabs.org archive mirror
* [Bug 214913] New: [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
@ 2021-11-02  9:27 bugzilla-daemon
From: bugzilla-daemon @ 2021-11-02  9:27 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=214913

            Bug ID: 214913
           Summary: [xfstests generic/051] BUG: Kernel NULL pointer
                    dereference on read at 0x00000108 NIP
                    [c0000000000372e4] tm_cgpr_active+0x14/0x40
           Product: Platform Specific/Hardware
           Version: 2.5
    Kernel Version: mainline linux v5.15
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: PPC-64
          Assignee: platform_ppc-64@kernel-bugs.osdl.org
          Reporter: zlang@redhat.com
        Regression: No

xfstests generic/051 and some similar test cases always hit a kernel panic on
XFS.
From the call trace, it doesn't look like an XFS bug. Since I can only
reproduce it on ppc64le, I'm reporting it against PPC64 first.

[  740.492561] run fstests generic/051 at 2021-11-01 12:40:42 
[  742.806962] XFS (sda3): Mounting V5 Filesystem 
[  742.925825] XFS (sda3): Ending clean mount 
[  742.955028] XFS (sda3): User initiated shutdown received. 
[  742.955201] XFS (sda3): Metadata I/O Error (0x4) detected at
xfs_fs_goingdown+0x68/0x160 [xfs] (fs/xfs/xfs_fsops.c:497).  Shutting down
filesystem. 
[  742.955370] XFS (sda3): Please unmount the filesystem and rectify the
problem(s) 
[  742.973098] XFS (sda3): Unmounting Filesystem 
[  744.352066] XFS (sda3): Mounting V5 Filesystem 
[  744.425758] XFS (sda3): Ending clean mount 
[  775.192100] XFS (sda3): Unmounting Filesystem 
[  776.116445] XFS (sda3): Mounting V5 Filesystem 
[  777.331381] XFS (sda3): Ending clean mount 
[  800.111560] restraintd[1327]: *** Current Time: Mon Nov 01 12:41:42 2021 
Localwatchdog at: Wed Nov 03 12:31:42 2021 
[  813.403287] XFS (sda3): User initiated shutdown received. 
[  813.403380] XFS (sda3): Log I/O Error (0x6) detected at
xfs_fs_goingdown+0xf8/0x160 [xfs] (fs/xfs/xfs_fsops.c:500).  Shutting down
filesystem. 
[  813.403514] XFS (sda3): Please unmount the filesystem and rectify the
problem(s) 
[  813.418455] sda3: writeback error on inode 60042, offset 63640576, sector
2306320 
[  813.418484] sda3: writeback error on inode 81161, offset 13091840, sector
2306496 
[  813.428831] sda3: writeback error on inode 16878782, offset 30536704, sector
18080754 
[  813.429026] Kernel attempted to read user page (108) - exploit attempt?
(uid: 0) 
[  813.429068] BUG: Kernel NULL pointer dereference on read at 0x00000108 
[  813.429085] Faulting instruction address: 0xc0000000000372e4 
[  813.429102] Oops: Kernel access of bad area, sig: 11 [#1] 
[  813.429117] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries 
[  813.429133] Modules linked in: bonding rfkill tls sunrpc pseries_rng drm
fuse drm_panel_orientation_quirks xfs libcrc32c sd_mod t10_pi sg ibmvscsi
ibmveth scsi_transport_srp vmx_crypto 
[  813.429202] CPU: 3 PID: 94001 Comm: fsstress Kdump: loaded Tainted: G       
W         5.15.0 #1 
[  813.429216] NIP:  c0000000000372e4 LR: c0000000006d9e48 CTR:
c0000000000372d0 
[  813.429227] REGS: c000000064ba7440 TRAP: 0300   Tainted: G        W         
(5.15.0) 
[  813.429238] MSR:  800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> 
CR: 88004280  XER: 00000000 
[  813.429272] CFAR: c00000000000cb1c DAR: 0000000000000108 DSISR: 40000000
IRQMASK: 0  
[  813.429272] GPR00: c0000000006d9e48 c000000064ba76e0 c000000002cdc400
0000000000000000  
[  813.429272] GPR04: c000000002c3ac50 0000000000000000 0000000000000000
c00000004d174000  
[  813.429272] GPR08: c0000000013d21d8 0000000000000000 0000000000000012
0000000000000000  
[  813.429272] GPR12: c0000000000372d0 c000000007fccb00 0000000000000000
0000000000000005  
[  813.429272] GPR16: 0000000000000000 c0000000d19fa900 c000000001365bb0
c000000003fc26b4  
[  813.429272] GPR20: c0000000d19fb338 0000000000040100 0000000000000001
0000000000000001  
[  813.429272] GPR24: c00000000135d2e0 00000000ffffffff c000000064ba7968
c000000001091ef8  
[  813.429272] GPR28: 0000000000000108 0000000000000004 c0000000cc456400
c000000002c3ac50  
[  813.429396] NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40 
[  813.429420] LR [c0000000006d9e48] fill_thread_core_info+0x158/0x250 
[  813.429435] Call Trace: 
[  813.429443] [c000000064ba76e0] [c0000000006d9eb8]
fill_thread_core_info+0x1c8/0x250 (unreliable) 
[  813.429465] [c000000064ba7760] [c0000000006dac70]
fill_note_info.constprop.0+0x240/0x420 
[  813.429480] [c000000064ba77d0] [c0000000006daf3c] elf_core_dump+0xec/0x5e0 
[  813.429494] [c000000064ba79e0] [c0000000006e1edc] do_coredump+0x32c/0xc10 
[  813.429507] [c000000064ba7bb0] [c000000000187adc] get_signal+0x52c/0x910 
[  813.429519] [c000000064ba7ca0] [c000000000021b9c] do_signal+0x7c/0x330 
[  813.429533] [c000000064ba7d40] [c000000000022e00]
do_notify_resume+0xb0/0x140 
[  813.429548] [c000000064ba7d70] [c000000000031330]
interrupt_exit_user_prepare_main+0x220/0x280 
[  813.429562] [c000000064ba7de0] [c000000000031804]
syscall_exit_prepare+0xe4/0x1e0 
[  813.429575] [c000000064ba7e10] [c00000000000c174]
system_call_vectored_common+0xf4/0x278 
[  813.429589] --- interrupt: 3000 at 0x7fffa9c7667c 
[  813.429600] NIP:  00007fffa9c7667c LR: 0000000000000000 CTR:
0000000000000000 
[  813.429610] REGS: c000000064ba7e80 TRAP: 3000   Tainted: G        W         
(5.15.0) 
[  813.429621] MSR:  800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE>  CR: 44004402 
XER: 00000000 
[  813.429647] IRQMASK: 0  
[  813.429647] GPR00: 00000000000000fa 00007fffefa13e10 00007fffa9e17100
0000000000000000  
[  813.429647] GPR04: 0000000000016f31 0000000000000006 0000000000000008
00000000ffffffff  
[  813.429647] GPR08: 0000000000000000 0000000000000000 0000000000000000
0000000000000000  
[  813.429647] GPR12: 0000000000000000 00007fffa9f2b040 0000000000000000
0000000000000000  
[  813.429647] GPR16: 0000000000000000 0000000000000000 0000000000000000
0000000010030de4  
[  813.429647] GPR20: 00000000100158c8 0000000000000000 0000000000000000
0000000010003d60  
[  813.429647] GPR24: 0000000000000001 0000000010012c60 00000000100137c8
0000000000000006  
[  813.429647] GPR28: 0000000000000005 ffffffffffffffff 00007fffa9f23840
0000000000016f31  
[  813.429776] NIP [00007fffa9c7667c] 0x7fffa9c7667c 
[  813.429789] LR [0000000000000000] 0x0 
[  813.429799] --- interrupt: 3000 
[  813.429808] Instruction dump: 
[  813.429816] 4bfe8345 60000000 e8010040 38210030 ebe1fff8 7c0803a6 4e800020
7c0802a6  
[  813.429839] 60000000 60000000 e92329c0 38600000 <e9290108> 7929e844 79291f43
4d820020  
[  813.429863] ---[ end trace 8a41ad95f224ad91 ]--- 
[  813.431701]  
[  813.431723] BUG: sleeping function called from invalid context at
kernel/locking/mutex.c:573 
[  813.431733] in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 94001,
name: fsstress 
[  813.431744] INFO: lockdep is turned off. 
[  813.431750] irq event stamp: 1270330 
[  813.431756] hardirqs last  enabled at (1270329): [<c000000000589680>]
___slab_alloc+0xc40/0xf60 
[  813.431769] hardirqs last disabled at (1270330): [<c00000000009a4cc>]
interrupt_enter_prepare.constprop.0+0x10c/0x200 
[  813.431784] softirqs last  enabled at (1269500): [<c008000001dc61dc>]
__rhashtable_insert_fast.constprop.0+0x3d4/0x7c0 [xfs] 
[  813.431932] softirqs last disabled at (1269498): [<c008000001dc5ef8>]
__rhashtable_insert_fast.constprop.0+0xf0/0x7c0 [xfs] 
[  813.432045] CPU: 3 PID: 94001 Comm: fsstress Kdump: loaded Tainted: G      D
W         5.15.0 #1 
[  813.432056] Call Trace: 
[  813.432060] [c000000064ba6f20] [c00000000093e5d8] dump_stack_lvl+0xac/0x108
(unreliable) 
[  813.432075] [c000000064ba6f60] [c0000000001b991c] ___might_sleep+0x2dc/0x300 
[  813.432087] [c000000064ba6ff0] [c00000000107703c] __mutex_lock+0x6c/0x9e0 
[  813.432098] [c000000064ba7100] [c00000000069f678]
io_uring_del_tctx_node+0x78/0x170 
[  813.432111] [c000000064ba7140] [c0000000006b4c28]
io_uring_cancel_generic+0x248/0x3e0 
[  813.432122] [c000000064ba7200] [c00000000016ff70] do_exit+0xf0/0x700 
[  813.432135] [c000000064ba7290] [c00000000002b060] oops_end+0x1d0/0x200 
[  813.432148] [c000000064ba7310] [c000000000092ac4]
__bad_page_fault+0x174/0x190 
[  813.432177] [c000000064ba7380] [c00000000009c508]
__do_hash_fault+0x148/0x1f0 
[  813.432196] [c000000064ba73b0] [c00000000009c5d8] do_hash_fault+0x28/0x60 
[  813.432211] [c000000064ba73d0] [c00000000000891c]
data_access_common_virt+0x19c/0x1f0 
[  813.432226] --- interrupt: 300 at tm_cgpr_active+0x14/0x40 
[  813.432234] NIP:  c0000000000372e4 LR: c0000000006d9e48 CTR:
c0000000000372d0 
[  813.432244] REGS: c000000064ba7440 TRAP: 0300   Tainted: G      D W         
(5.15.0) 
[  813.432253] MSR:  800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> 
CR: 88004280  XER: 00000000 
[  813.432286] CFAR: c00000000000cb1c DAR: 0000000000000108 DSISR: 40000000
IRQMASK: 0  
[  813.432286] GPR00: c0000000006d9e48 c000000064ba76e0 c000000002cdc400
0000000000000000  
[  813.432286] GPR04: c000000002c3ac50 0000000000000000 0000000000000000
c00000004d174000  
[  813.432286] GPR08: c0000000013d21d8 0000000000000000 0000000000000012
0000000000000000  
[  813.432286] GPR12: c0000000000372d0 c000000007fccb00 0000000000000000
0000000000000005  
[  813.432286] GPR16: 0000000000000000 c0000000d19fa900 c000000001365bb0
c000000003fc26b4  
[  813.432286] GPR20: c0000000d19fb338 0000000000040100 0000000000000001
0000000000000001  
[  813.432286] GPR24: c00000000135d2e0 00000000ffffffff c000000064ba7968
c000000001091ef8  
[  813.432286] GPR28: 0000000000000108 0000000000000004 c0000000cc456400
c000000002c3ac50  
[  813.432402] NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40 
[  813.432412] LR [c0000000006d9e48] fill_thread_core_info+0x158/0x250 
[  813.432424] --- interrupt: 300 
[  813.432429] [c000000064ba76e0] [c0000000006d9eb8]
fill_thread_core_info+0x1c8/0x250 (unreliable) 
[  813.432443] [c000000064ba7760] [c0000000006dac70]
fill_note_info.constprop.0+0x240/0x420 
[  813.432455] [c000000064ba77d0] [c0000000006daf3c] elf_core_dump+0xec/0x5e0 
[  813.432467] [c000000064ba79e0] [c0000000006e1edc] do_coredump+0x32c/0xc10 
[  813.432479] [c000000064ba7bb0] [c000000000187adc] get_signal+0x52c/0x910 
[  813.432492] [c000000064ba7ca0] [c000000000021b9c] do_signal+0x7c/0x330 
[  813.432518] [c000000064ba7d40] [c000000000022e00]
do_notify_resume+0xb0/0x140 
[  813.432537] [c000000064ba7d70] [c000000000031330]
interrupt_exit_user_prepare_main+0x220/0x280 
[  813.432556] [c000000064ba7de0] [c000000000031804]
syscall_exit_prepare+0xe4/0x1e0 
[  813.432571] [c000000064ba7e10] [c00000000000c174]
system_call_vectored_common+0xf4/0x278 
[  813.432585] --- interrupt: 3000 at 0x7fffa9c7667c 
[  813.432595] NIP:  00007fffa9c7667c LR: 0000000000000000 CTR:
0000000000000000 
[  813.432605] REGS: c000000064ba7e80 TRAP: 3000   Tainted: G      D W         
(5.15.0) 
[  813.432615] MSR:  800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE>  CR: 44004402 
XER: 00000000 
[  813.432641] IRQMASK: 0  
[  813.432641] GPR00: 00000000000000fa 00007fffefa13e10 00007fffa9e17100
0000000000000000  
[  813.432641] GPR04: 0000000000016f31 0000000000000006 0000000000000008
00000000ffffffff  
[  813.432641] GPR08: 0000000000000000 0000000000000000 0000000000000000
0000000000000000  
[  813.432641] GPR12: 0000000000000000 00007fffa9f2b040 0000000000000000
0000000000000000  
[  813.432641] GPR16: 0000000000000000 0000000000000000 0000000000000000
0000000010030de4  
[  813.432641] GPR20: 00000000100158c8 0000000000000000 0000000000000000
0000000010003d60  
[  813.432641] GPR24: 0000000000000001 0000000010012c60 00000000100137c8
0000000000000006  
[  813.432641] GPR28: 0000000000000005 ffffffffffffffff 00007fffa9f23840
0000000000016f31  
[  813.432761] NIP [00007fffa9c7667c] 0x7fffa9c7667c 
[  813.432770] LR [0000000000000000] 0x0 
[  813.432777] --- interrupt: 3000 
[  860.223013] restraintd[1327]: *** Current Time: Mon Nov 01 12:42:42 2021 
Localwatchdog at: Wed Nov 03 12:31:42 2021 


I reproduced this bug on Linux HEAD=8bb7eca972ad. The steps to reproduce it
are:
1) git clone git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
2) build xfstests
3) run generic/051 on ppc64le on XFS.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

* [Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
From: bugzilla-daemon @ 2021-11-02  9:29 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=214913

--- Comment #1 from Zorro Lang (zlang@redhat.com) ---
Created attachment 299403
  --> https://bugzilla.kernel.org/attachment.cgi?id=299403&action=edit
.config file

* [Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
From: bugzilla-daemon @ 2021-11-04  5:45 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=214913

Michael Ellerman (michael@ellerman.id.au) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |michael@ellerman.id.au

--- Comment #2 from Michael Ellerman (michael@ellerman.id.au) ---
Thanks for the report. I agree this looks like a powerpc bug, not an XFS bug.

I won't have time to look at this until next week probably, unless someone
beats me to it.

* [Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
From: bugzilla-daemon @ 2021-11-04  8:15 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=214913

Michal Suchanek (hramrach@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hramrach@gmail.com

--- Comment #3 from Michal Suchanek (hramrach@gmail.com) ---
What CPU is this?

Does it go away if you boot with ppc_tm=off?

* [Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
From: bugzilla-daemon @ 2021-11-05 11:53 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=214913

--- Comment #4 from Zorro Lang (zlang@redhat.com) ---
(In reply to Michal Suchanek from comment #3)
> What CPU is this?
> 
> Does it go away if you boot with ppc_tm=off

(In reply to Michael Ellerman from comment #2)
> Thanks for the report, I agree this looks like a powerpc bug not an XFS bug.
> 
> I won't have time to look at this until next week probably, unless someone
> beats me to it.

Thanks for your reply. (Un)fortunately, because mainline keeps moving, I can no
longer reproduce this panic on the latest mainline master branch, whose HEAD
commit is 7ddb58cb0eca. There are many changes between 8bb7eca972ad (v5.15) and
7ddb58cb0eca (v5.15+), and I can't tell which commit fixed this bug or merely
hid it. Do you know of a known issue along these lines that has been fixed?

Thanks,
Zorro

* [Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
From: bugzilla-daemon @ 2021-12-09 11:43 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=214913

Michael Ellerman (michael@ellerman.id.au) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO

--- Comment #5 from Michael Ellerman (michael@ellerman.id.au) ---
Sorry I don't have any idea which commit could have fixed this.

The process that crashed was "fsstress", do you know if it uses io_uring?

* [Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
From: bugzilla-daemon @ 2022-12-11 13:13 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=214913

--- Comment #6 from Zorro Lang (zlang@redhat.com) ---
FYI, I still hit this issue on Linux 6.1.0-rc8+, and it's nearly 100%
reproducible.

[ 1581.047788] run fstests generic/051 at 2022-12-10 11:28:27 
[ 1582.574596] XFS (sda3): Mounting V5 Filesystem 
[ 1582.638653] XFS (sda3): Ending clean mount 
[ 1582.646329] XFS (sda3): User initiated shutdown received. 
[ 1582.646397] XFS (sda3): Metadata I/O Error (0x4) detected at
xfs_fs_goingdown+0x68/0x160 [xfs] (fs/xfs/xfs_fsops.c:483).  Shutting down
filesystem. 
[ 1582.646506] XFS (sda3): Please unmount the filesystem and rectify the
problem(s) 
[ 1582.692102] XFS (sda3): Unmounting Filesystem 
[ 1584.011651] XFS (sda3): Mounting V5 Filesystem 
[ 1584.123764] XFS (sda3): Ending clean mount 
[ 1605.168286] restraintd[3598]: *** Current Time: Sat Dec 10 11:28:52 2022 
Localwatchdog at: Mon Dec 12 11:03:52 2022 
[ 1614.846132] XFS (sda3): Unmounting Filesystem 
[ 1615.569693] XFS (sda3): Mounting V5 Filesystem 
[ 1615.725272] XFS (sda3): Ending clean mount 
[ 1650.793064] XFS (sda3): User initiated shutdown received. 
[ 1650.793108] XFS (sda3): Log I/O Error (0x6) detected at
xfs_fs_goingdown+0xf8/0x160 [xfs] (fs/xfs/xfs_fsops.c:486).  Shutting down
filesystem. 
[ 1650.793200] XFS (sda3): Please unmount the filesystem and rectify the
problem(s) 
[ 1650.801605] Kernel attempted to read user page (108) - exploit attempt?
(uid: 0) 
[ 1650.801625] BUG: Kernel NULL pointer dereference on read at 0x00000108 
[ 1650.801638] Faulting instruction address: 0xc000000000036154 
[ 1650.801652] Oops: Kernel access of bad area, sig: 11 [#1] 
[ 1650.801660] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries 
[ 1650.801671] Modules linked in: dm_flakey dm_mod bonding tls rfkill sunrpc
pseries_rng drm fuse drm_panel_orientation_quirks xfs libcrc32c sd_mod t10_pi
sg ibmvscsi ibmveth scsi_transport_srp vmx_crypto 
[ 1650.801727] CPU: 0 PID: 382724 Comm: fsstress Kdump: loaded Not tainted
6.1.0-rc8+ #1 
[ 1650.801739] Hardware name: IBM,8375-42A POWER9 (raw) 0x4e0202 0xf000005
of:IBM,FW940.02 (VL940_041) hv:phyp pSeries 
[ 1650.801743] Kernel attempted to read user page (108) - exploit attempt?
(uid: 0) 
[ 1650.801748] NIP:  c000000000036154 LR: c0000000006f67b4 CTR:
c000000000036140 
[ 1650.801755] BUG: Kernel NULL pointer dereference on read at 0x00000108 
[ 1650.801759] REGS: c00000004eb7b480 TRAP: 0300   Not tainted  (6.1.0-rc8+) 
[ 1650.801764] Faulting instruction address: 0xc000000000036154 
[ 1650.801769] MSR:  800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> 
CR: 88004400  XER: 00000000 
[ 1650.801809] CFAR: c00000000000c9d4 DAR: 0000000000000108 DSISR: 40000000
IRQMASK: 0  
[ 1650.801809] GPR00: c0000000006f67b4 c00000004eb7b720 c0000000016c0600
0000000000000000  
[ 1650.801809] GPR04: c000000001690ef8 0000000000000000 0000000000000000
c00000004b72a900  
[ 1650.801809] GPR08: c000000001506ee8 0000000000000000 0000000000000009
0000000000000000  
[ 1650.801809] GPR12: c000000000036140 c0000000051e0000 0000000000000000
00007fff96f879b0  
[ 1650.801809] GPR16: 00007fff970941d0 ffffffffffffffff 0000000000000005
c00000004484a400  
[ 1650.801809] GPR20: c00000004484aeb8 0000000000040100 0000000000000001
c000000001489d58  
[ 1650.801809] GPR24: 00000000ffffffff c00000004eb7b8b0 0000000000000004
c0000000011531e8  
[ 1650.801809] GPR28: 0000000000000108 c00000004be38400 0000000000000004
c000000001690ef8  
[ 1650.801927] NIP [c000000000036154] tm_cgpr_active+0x14/0x40 
[ 1650.801939] LR [c0000000006f67b4] fill_thread_core_info+0x1d4/0x290 
[ 1650.801951] Call Trace: 
[ 1650.801955] [c00000004eb7b720] [c0000000006f673c]
fill_thread_core_info+0x15c/0x290 (unreliable) 
[ 1650.801971] [c00000004eb7b7a0] [c0000000006f6fd4] fill_note_info+0x1f4/0x390 
[ 1650.801984] [c00000004eb7b810] [c0000000006f71fc] elf_core_dump+0x8c/0x580 
[ 1650.801997] [c00000004eb7ba00] [c0000000006fcc10] do_coredump+0x330/0xca0 
[ 1650.802012] [c00000004eb7bbd0] [c000000000174f94] get_signal+0x7f4/0x8f0 
[ 1650.802024] [c00000004eb7bcb0] [c000000000020d2c] do_signal+0x7c/0x330 
[ 1650.802036] [c00000004eb7bd50] [c000000000022010]
do_notify_resume+0xb0/0x140 
[ 1650.802049] [c00000004eb7bd80] [c000000000030550]
interrupt_exit_user_prepare_main+0x1d0/0x290 
[ 1650.802062] [c00000004eb7bde0] [c0000000000306f4]
syscall_exit_prepare+0xe4/0x1f0 
[ 1650.802074] [c00000004eb7be10] [c00000000000bffc]
system_call_vectored_common+0xfc/0x280 
[ 1650.802089] --- interrupt: 3000 at 0x7fff96de315c 
[ 1650.802099] NIP:  00007fff96de315c LR: 0000000000000000 CTR:
0000000000000000 
[ 1650.802107] REGS: c00000004eb7be80 TRAP: 3000   Not tainted  (6.1.0-rc8+) 
[ 1650.802115] MSR:  800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE>  CR: 42004404 
XER: 00000000 
[ 1650.802141] IRQMASK: 0  
[ 1650.802141] GPR00: 00000000000000fa 00007fffc54a96a0 00007fff96f87200
0000000000000000  
[ 1650.802141] GPR04: 000000000005d704 0000000000000006 0000000000000000
0000000000000000  
[ 1650.802141] GPR08: 00007fff96f81f68 0000000000000000 0000000000000000
0000000000000000  
[ 1650.802141] GPR12: 0000000000000000 00007fff9709b1c0 0000000000000000
00007fff96f879b0  
[ 1650.802141] GPR16: 00007fff970941d0 ffffffffffffffff 0000000010030bec
00000000100152e8  
[ 1650.802141] GPR20: 0000000000000000 0000000000000000 00007fffc54bdfee
0000000000000001  
[ 1650.802141] GPR24: 0000000010009800 00000000100131a8 8f5c28f5c28f5c29
028f5c28f5c28f5c  
[ 1650.802141] GPR28: 0000000000000006 ffffffffffffffff 00007fff97093980
000000000005d704  
[ 1650.802249] NIP [00007fff96de315c] 0x7fff96de315c 
[ 1650.802258] LR [0000000000000000] 0x0 
[ 1650.802266] --- interrupt: 3000 
[ 1650.802272] Instruction dump: 
[ 1650.802279] 4bfe87d5 60000000 e8010040 38210030 ebe1fff8 7c0803a6 4e800020
7c0802a6  
[ 1650.802305] 60000000 60000000 e9232aa0 38600000 <e9290108> 7929e844 79291f43
41820008  
[ 1650.802330] ---[ end trace 0000000000000000 ]--- 
[ 1650.813469]  
[ 1650.813475] Oops: Kernel access of bad area, sig: 11 [#2] 
[ 1650.813480] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries 
[ 1650.813488] Modules linked in: dm_flakey dm_mod bonding tls rfkill sunrpc
pseries_rng drm fuse drm_panel_orientation_quirks xfs libcrc32c sd_mod t10_pi
sg ibmvscsi ibmveth scsi_transport_srp vmx_crypto 
[ 1650.813524] CPU: 4 PID: 382723 Comm: fsstress Kdump: loaded Tainted: G     
D            6.1.0-rc8+ #1 
[ 1650.813532] Hardware name: IBM,8375-42A POWER9 (raw) 0x4e0202 0xf000005
of:IBM,FW940.02 (VL940_041) hv:phyp pSeries 
[ 1650.813537] NIP:  c000000000036154 LR: c0000000006f67b4 CTR:
c000000000036140 
[ 1650.813541] REGS: c00000004eb4b480 TRAP: 0300   Tainted: G      D           
 (6.1.0-rc8+) 
[ 1650.813546] MSR:  800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> 
CR: 88004400  XER: 20040000 
[ 1650.813562] CFAR: c00000000000c9d4 DAR: 0000000000000108 DSISR: 40000000
IRQMASK: 0  
[ 1650.813562] GPR00: c0000000006f67b4 c00000004eb4b720 c0000000016c0600
0000000000000000  
[ 1650.813562] GPR04: c000000001690ef8 0000000000000000 0000000000000000
c0000000437e4800  
[ 1650.813562] GPR08: c000000001506ee8 0000000000000000 0000000000000009
0000000000000000  
[ 1650.813562] GPR12: c000000000036140 c00000000ffcc480 0000000000000000
00007fff96f879b0  
[ 1650.813562] GPR16: 00007fff970941d0 ffffffffffffffff 0000000000000005
c000000044810e00  
[ 1650.813562] GPR20: c0000000448118b8 0000000000040100 0000000000000001
c000000001489d58  
[ 1650.813562] GPR24: 00000000ffffffff c00000004eb4b8b0 0000000000000004
c0000000011531e8  
[ 1650.813562] GPR28: 0000000000000108 c00000003235f000 0000000000000004
c000000001690ef8  
[ 1650.813619] NIP [c000000000036154] tm_cgpr_active+0x14/0x40 
[ 1650.813625] LR [c0000000006f67b4] fill_thread_core_info+0x1d4/0x290 
[ 1650.813632] Call Trace: 
[ 1650.813634] [c00000004eb4b720] [c0000000006f673c]
fill_thread_core_info+0x15c/0x290 (unreliable) 
[ 1650.813643] [c00000004eb4b7a0] [c0000000006f6fd4] fill_note_info+0x1f4/0x390 
[ 1650.813650] [c00000004eb4b810] [c0000000006f71fc] elf_core_dump+0x8c/0x580 
[ 1650.813657] [c00000004eb4ba00] [c0000000006fcc10] do_coredump+0x330/0xca0 
[ 1650.813662] [c00000004eb4bbd0] [c000000000174f94] get_signal+0x7f4/0x8f0 
[ 1650.813668] [c00000004eb4bcb0] [c000000000020d2c] do_signal+0x7c/0x330 
[ 1650.813674] [c00000004eb4bd50] [c000000000022010]
do_notify_resume+0xb0/0x140 
[ 1650.813681] [c00000004eb4bd80] [c000000000030550]
interrupt_exit_user_prepare_main+0x1d0/0x290 
[ 1650.813687] [c00000004eb4bde0] [c0000000000306f4]
syscall_exit_prepare+0xe4/0x1f0 
[ 1650.813693] [c00000004eb4be10] [c00000000000bffc]
system_call_vectored_common+0xfc/0x280 
[ 1650.813700] --- interrupt: 3000 at 0x7fff96de315c 
[ 1650.813705] NIP:  00007fff96de315c LR: 0000000000000000 CTR:
0000000000000000 
[ 1650.813709] REGS: c00000004eb4be80 TRAP: 3000   Tainted: G      D           
 (6.1.0-rc8+) 
[ 1650.813713] MSR:  800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE>  CR: 42004404 
XER: 00000000 
[ 1650.813725] IRQMASK: 0  
[ 1650.813725] GPR00: 00000000000000fa 00007fffc54a9b90 00007fff96f87200
0000000000000000  
[ 1650.813725] GPR04: 000000000005d703 0000000000000006 0000000000000000
0000000000000000  
[ 1650.813725] GPR08: 00007fff96f81f68 0000000000000000 0000000000000000
0000000000000000  
[ 1650.813725] GPR12: 0000000000000000 00007fff9709b1c0 0000000000000000
00007fff96f879b0  
[ 1650.813725] GPR16: 00007fff970941d0 ffffffffffffffff 0000000010030bec
00000000100152e8  
[ 1650.813725] GPR20: 0000000000000000 0000000000000000 00007fffc54bdfee
0000000000000001  
[ 1650.813725] GPR24: 0000000010010460 00000000100131a8 8f5c28f5c28f5c29
028f5c28f5c28f5c  
[ 1650.813725] GPR28: 0000000000000006 0000000000000005 00007fff97093980
000000000005d703  
[ 1650.813778] NIP [00007fff96de315c] 0x7fff96de315c 
[ 1650.813782] LR [0000000000000000] 0x0 
[ 1650.813785] --- interrupt: 3000 
[ 1650.813788] Instruction dump: 
[ 1650.813791] 4bfe87d5 60000000 e8010040 38210030 ebe1fff8 7c0803a6 4e800020
7c0802a6  
[ 1650.813801] 60000000 60000000 e9232aa0 38600000 <e9290108> 7929e844 79291f43
41820008  
[ 1650.813811] ---[ end trace 0000000000000000 ]---

* [Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
From: bugzilla-daemon @ 2022-12-11 13:19 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=214913

--- Comment #7 from Zorro Lang (zlang@redhat.com) ---
(In reply to Michael Ellerman from comment #5)
> Sorry I don't have any idea which commit could have fixed this.
> 
> The process that crashed was "fsstress", do you know if it uses io_uring?

Yes, fsstress has io_uring read/write operations, and the kernel .config
file (see the attachment) has CONFIG_IO_URING=y.

* Re: [Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
From: Nicholas Piggin @ 2022-12-12  3:52 UTC (permalink / raw)
  To: bugzilla-daemon, linuxppc-dev

On Sun Dec 11, 2022 at 11:19 PM AEST,  wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=214913
>
> --- Comment #7 from Zorro Lang (zlang@redhat.com) ---
> (In reply to Michael Ellerman from comment #5)
> > Sorry I don't have any idea which commit could have fixed this.
> > 
> > The process that crashed was "fsstress", do you know if it uses io_uring?
>
> Yes, fsstress has io_uring read/write operations. And from the kernel .config
> file(as attachment), the CONFIG_IO_URING=y

The task being dumped seems like it's lost its task->thread.regs. The
NULL pointer is here:

int tm_cgpr_active(struct task_struct *target, const struct user_regset *regset)
{
        if (!cpu_has_feature(CPU_FTR_TM))
                return -ENODEV;

        if (!MSR_TM_ACTIVE(target->thread.regs->msr))
                return 0;

        return regset->n;
}

The fault is on that regs->msr dereference; r9 contains the regs pointer.

The "kernel attempt to read user page - exploit attempt?" message is, I
think, a red herring: it comes up because of the NULL deref (I thought
we fixed that).

Anyway, I'm not sure how we could lose regs; all user threads should
have them set to non-NULL. It doesn't look like we can collect threads
for dumping before we have called copy_thread(), which is where they
get thread.regs set, and AFAIK it's not supposed to change after that.

Would you be able to try this patch? Hopefully it catches the problem
thread on the exit side and gives a clue why regs is NULL.

Thanks,
Nick

---

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 6a11025e5850..ece63b3d2304 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1898,9 +1898,21 @@ static int fill_note_info(struct elfhdr *elf, int phdrs,
 	/*
 	 * Now fill in each thread's information.
 	 */
-	for (t = info->thread; t != NULL; t = t->next)
+	for (t = info->thread; t != NULL; t = t->next) {
+		if (!t->task) {
+			WARN_ON(1);
+			printk("core info lost task\n");
+			continue;
+		}
+		if (!t->task->thread.regs) {
+			WARN_ON(1);
+			printk("lost regs pid:%d (current->pid:%d)\n", t->task->pid, current->pid);
+			continue;
+		}
+
 		if (!fill_thread_core_info(t, view, cprm->siginfo->si_signo, info))
 			return 0;
+	}
 
 	/*
 	 * Fill in the two process-wide notes.
diff --git a/kernel/exit.c b/kernel/exit.c
index 35e0a31a0315..6820fe333081 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -366,6 +366,8 @@ static void coredump_task_exit(struct task_struct *tsk)
 	if (core_state) {
 		struct core_thread self;
 
+		WARN_ON(!current->thread.regs);
+
 		self.task = current;
 		if (self.task->flags & PF_SIGNALED)
 			self.next = xchg(&core_state->dumper.next, &self);

* [Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
From: bugzilla-daemon @ 2022-12-12  3:52 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=214913

--- Comment #8 from npiggin@gmail.com ---
On Sun Dec 11, 2022 at 11:19 PM AEST,  wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=214913
>
> --- Comment #7 from Zorro Lang (zlang@redhat.com) ---
> (In reply to Michael Ellerman from comment #5)
> > Sorry I don't have any idea which commit could have fixed this.
> > 
> > The process that crashed was "fsstress", do you know if it uses io_uring?
>
> Yes, fsstress has io_uring read/write operations. And from the kernel .config
> file(as attachment), the CONFIG_IO_URING=y

The task being dumped seems like it's lost its task->thread.regs. The
NULL pointer is here:

int tm_cgpr_active(struct task_struct *target, const struct user_regset *regset)
{
        if (!cpu_has_feature(CPU_FTR_TM))
                return -ENODEV;

        if (!MSR_TM_ACTIVE(target->thread.regs->msr))
                return 0;

        return regset->n;
}

The fault is on that regs->msr dereference; r9 contains the regs pointer.

The "kernel attempt to read user page - exploit attempt?" message is, I
think, a red herring: it comes up because of the NULL deref (I thought
we fixed that).

Anyway, I'm not sure how we could lose regs; all user threads should
have them set to non-NULL. It doesn't look like we can collect threads
for dumping before we have called copy_thread(), which is where they
get thread.regs set, and AFAIK it's not supposed to change after that.

Would you be able to try this patch? Hopefully it catches the problem
thread on the exit side and gives a clue why regs is NULL.

Thanks,
Nick

---

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 6a11025e5850..ece63b3d2304 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1898,9 +1898,21 @@ static int fill_note_info(struct elfhdr *elf, int phdrs,
        /*
         * Now fill in each thread's information.
         */
-       for (t = info->thread; t != NULL; t = t->next)
+       for (t = info->thread; t != NULL; t = t->next) {
+               if (!t->task) {
+                       WARN_ON(1);
+                       printk("core info lost task\n");
+                       continue;
+               }
+               if (!t->task->thread.regs) {
+                       WARN_ON(1);
+                       printk("lost regs pid:%d (current->pid:%d)\n", t->task->pid, current->pid);
+                       continue;
+               }
+
                if (!fill_thread_core_info(t, view, cprm->siginfo->si_signo, info))
                        return 0;
+       }

        /*
         * Fill in the two process-wide notes.
diff --git a/kernel/exit.c b/kernel/exit.c
index 35e0a31a0315..6820fe333081 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -366,6 +366,8 @@ static void coredump_task_exit(struct task_struct *tsk)
        if (core_state) {
                struct core_thread self;

+               WARN_ON(!current->thread.regs);
+
                self.task = current;
                if (self.task->flags & PF_SIGNALED)
                        self.next = xchg(&core_state->dumper.next, &self);

* [Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
From: bugzilla-daemon @ 2022-12-12  5:57 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=214913

--- Comment #9 from Michael Ellerman (michael@ellerman.id.au) ---
I assume it's an io_uring IO worker.

They're created via create_io_worker() -> create_io_thread().

They pass a non-NULL `args->fn` to copy_process() -> copy_thread(), so we end
up in the "kernel thread" branch of the if, which sets p->thread.regs = NULL.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

* Re: [Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
From: Nicholas Piggin @ 2022-12-12  7:19 UTC (permalink / raw)
  To: bugzilla-daemon, linuxppc-dev; +Cc: Eric Biederman

On Mon Dec 12, 2022 at 3:57 PM AEST,  wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=214913
>
> --- Comment #9 from Michael Ellerman (michael@ellerman.id.au) ---
> I assume it's an io_uring IO worker.
>
> They're created via create_io_worker() -> create_io_thread().
>
> They pass a non-NULL `args->fn` to copy_process() -> copy_thread(), so we end
> up in the "kernel thread" branch of the if, which sets p->thread.regs = NULL.

Hmm, you might be right. These things are created with the memory and
thread/signal context shared with the userspace process.

Still, it doesn't seem like they should be involved in core dumping;
pt_regs would have no meaning even if we did set something there. How
best to catch these and filter them out of the core dump? Check for
PF_IO_WORKER in the coredump gathering?

Thanks,
Nick

* [Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
From: bugzilla-daemon @ 2022-12-12  7:19 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=214913

--- Comment #10 from npiggin@gmail.com ---
On Mon Dec 12, 2022 at 3:57 PM AEST,  wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=214913
>
> --- Comment #9 from Michael Ellerman (michael@ellerman.id.au) ---
> I assume it's an io_uring IO worker.
>
> They're created via create_io_worker() -> create_io_thread().
>
> They pass a non-NULL `args->fn` to copy_process() -> copy_thread(), so we end
> up in the "kernel thread" branch of the if, which sets p->thread.regs = NULL.

Hmm, you might be right. These things are created with the memory and
thread/signal context shared with the userspace process.

Still, it doesn't seem like they should be involved in core dumping;
pt_regs would have no meaning even if we did set something there. How
best to catch these and filter them out of the core dump? Check for
PF_IO_WORKER in the coredump gathering?

Thanks,
Nick

* Re: [Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
From: Christophe Leroy @ 2022-12-12  7:30 UTC (permalink / raw)
  To: Nicholas Piggin, bugzilla-daemon, linuxppc-dev



On 12/12/2022 at 04:52, Nicholas Piggin wrote:
> On Sun Dec 11, 2022 at 11:19 PM AEST,  wrote:
>> https://bugzilla.kernel.org/show_bug.cgi?id=214913
>>
>> --- Comment #7 from Zorro Lang (zlang@redhat.com) ---
>> (In reply to Michael Ellerman from comment #5)
>>> Sorry I don't have any idea which commit could have fixed this.
>>>
>>> The process that crashed was "fsstress", do you know if it uses io_uring?
>>
>> Yes, fsstress has io_uring read/write operations. And from the kernel .config
>> file(as attachment), the CONFIG_IO_URING=y
> 
> The task being dumped seems like it's lost its task->thread.regs. The
> NULL pointer is here:
> 
> int tm_cgpr_active(struct task_struct *target, const struct user_regset *regset)
> {
>          if (!cpu_has_feature(CPU_FTR_TM))
>                  return -ENODEV;
> 
>          if (!MSR_TM_ACTIVE(target->thread.regs->msr))
>                  return 0;
> 
>          return regset->n;
> }
> 
> On that regs->msr deref. r9 contains the regs pointer.
> 
> The kernel attempt to read user page - exploit attempt? message is
> I think a red herring it's coming up because of the NULL deref I
> think (I thought we fixed that).
> 

No, we didn't fix that; my patch was rejected, see
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/8b865b93d25c15c8e6d41e71c368bfc28da4489d.1606816701.git.christophe.leroy@csgroup.eu/

The reason for the rejection was:

   The first page can be mapped if mmap_min_addr is 0.

   Blocking all faults to the first page would potentially break any
   program that does that.

   Also if there is something mapped at 0 it's a good chance it is an
   exploit attempt :)



Christophe

* [Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
From: bugzilla-daemon @ 2022-12-12  7:30 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=214913

--- Comment #11 from Christophe Leroy (christophe.leroy@csgroup.eu) ---
On 12/12/2022 at 04:52, Nicholas Piggin wrote:
> On Sun Dec 11, 2022 at 11:19 PM AEST,  wrote:
>> https://bugzilla.kernel.org/show_bug.cgi?id=214913
>>
>> --- Comment #7 from Zorro Lang (zlang@redhat.com) ---
>> (In reply to Michael Ellerman from comment #5)
>>> Sorry I don't have any idea which commit could have fixed this.
>>>
>>> The process that crashed was "fsstress", do you know if it uses io_uring?
>>
>> Yes, fsstress has io_uring read/write operations. And from the kernel .config
>> file(as attachment), the CONFIG_IO_URING=y
> 
> The task being dumped seems like it's lost its task->thread.regs. The
> NULL pointer is here:
> 
> int tm_cgpr_active(struct task_struct *target, const struct user_regset *regset)
> {
>          if (!cpu_has_feature(CPU_FTR_TM))
>                  return -ENODEV;
> 
>          if (!MSR_TM_ACTIVE(target->thread.regs->msr))
>                  return 0;
> 
>          return regset->n;
> }
> 
> On that regs->msr deref. r9 contains the regs pointer.
> 
> The kernel attempt to read user page - exploit attempt? message is
> I think a red herring it's coming up because of the NULL deref I
> think (I thought we fixed that).
> 

No, we didn't fix that; my patch was rejected, see
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/8b865b93d25c15c8e6d41e71c368bfc28da4489d.1606816701.git.christophe.leroy@csgroup.eu/

The reason for the rejection was:

   The first page can be mapped if mmap_min_addr is 0.

   Blocking all faults to the first page would potentially break any
   program that does that.

   Also if there is something mapped at 0 it's a good chance it is an
   exploit attempt :)



Christophe

Thread overview: 15+ messages
2021-11-02  9:27 [Bug 214913] New: [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40 bugzilla-daemon
2021-11-02  9:29 ` [Bug 214913] " bugzilla-daemon
2021-11-04  5:45 ` bugzilla-daemon
2021-11-04  8:15 ` bugzilla-daemon
2021-11-05 11:53 ` bugzilla-daemon
2021-12-09 11:43 ` bugzilla-daemon
2022-12-11 13:13 ` bugzilla-daemon
2022-12-11 13:19 ` bugzilla-daemon
2022-12-12  3:52   ` Nicholas Piggin
2022-12-12  7:30     ` Christophe Leroy
2022-12-12  3:52 ` bugzilla-daemon
2022-12-12  5:57 ` bugzilla-daemon
2022-12-12  7:19   ` Nicholas Piggin
2022-12-12  7:19 ` bugzilla-daemon
2022-12-12  7:30 ` bugzilla-daemon
