* [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host
@ 2017-06-28  8:32 Eryu Guan
  2017-06-28 17:16 ` Balbir Singh
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Eryu Guan @ 2017-06-28  8:32 UTC
  To: linuxppc-dev; +Cc: Balbir Singh, liwan

Hi all,

Li Wang and I are constantly seeing ppc64le hosts crash due to a bad
page access. It doesn't reproduce on every ppc64le host we've tested,
but it usually happens during filesystem testing.

[  207.403459] Unable to handle kernel paging request for unaligned access at address 0xc0000001c52c5e7f
[  207.403470] Faulting instruction address: 0xc0000000004d470c
[  207.403475] Oops: Kernel access of bad area, sig: 7 [#1]
[  207.403477] SMP NR_CPUS=2048
[  207.403478] NUMA
[  207.403480] pSeries
[  207.403483] Modules linked in: ext4 jbd2 mbcache sg pseries_rng ghash_generic gf128mul xts vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ibmveth ibmvscsi scsi_transport_srp
[  207.403503] CPU: 0 PID: 2263 Comm: mount Not tainted 4.12.0-rc7 #26
[  207.403506] task: c0000003ef2fde00 task.stack: c0000003de394000
[  207.403509] NIP: c0000000004d470c LR: c00000000011cd24 CTR: c000000000130de0
[  207.403512] REGS: c0000003de397450 TRAP: 0600   Not tainted  (4.12.0-rc7)
[  207.403515] MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>
[  207.403521]   CR: 28028844  XER: 00000001
[  207.403525] CFAR: c00000000011cd20 DAR: c0000001c52c5e7f DSISR: 00000000 SOFTE: 0
[  207.403525] GPR00: c00000000011cce8 c0000003de3976d0 c000000001049500 c0000003f2c6ec20
[  207.403525] GPR04: c0000003f2c6ec20 c0000001c52c5e7f 0000000000000000 0000000000000001
[  207.403525] GPR08: 000c5543cab19830 0000000198e19900 0000000000000008 0000000000000000
[  207.403525] GPR12: c000000000130de0 c00000000fac0000 0000000000000000 c0000003f1328000
[  207.403525] GPR16: 0000000000000000 c0000003de700400 0000000000000000 c0000003de700594
[  207.403525] GPR20: 0000000000000002 0000000000000000 0000000000004000 c000000000cc5780
[  207.403525] GPR24: 00000001c45ffc5f 0000000000000000 00000001c45ffc5f c00000000107dd00
[  207.403525] GPR28: c0000003f2c6f434 0000000000000004 0000000000000800 c0000003f2c6ec00
[  207.403567] NIP [c0000000004d470c] llist_add_batch+0xc/0x40
[  207.403571] LR [c00000000011cd24] try_to_wake_up+0x4a4/0x5b0
[  207.403573] Call Trace:
[  207.403576] [c0000003de3976d0] [c00000000011cce8] try_to_wake_up+0x468/0x5b0 (unreliable)
[  207.403581] [c0000003de397750] [c000000000102cc8] create_worker+0x148/0x250
[  207.403585] [c0000003de3977f0] [c000000000105e7c] alloc_unbound_pwq+0x3bc/0x4c0
[  207.403589] [c0000003de397850] [c0000000001064bc] apply_wqattrs_prepare+0x2ac/0x320
[  207.403593] [c0000003de3978c0] [c00000000010656c] apply_workqueue_attrs_locked+0x3c/0xa0
[  207.403597] [c0000003de3978f0] [c000000000106acc] apply_workqueue_attrs+0x4c/0x80
[  207.403601] [c0000003de397930] [c00000000010866c] __alloc_workqueue_key+0x16c/0x4e0
[  207.403615] [c0000003de3979f0] [d000000013de5ce0] ext4_fill_super+0x1c70/0x3390 [ext4]
[  207.403620] [c0000003de397b30] [c00000000031739c] mount_bdev+0x21c/0x250
[  207.403633] [c0000003de397bd0] [d000000013dddb80] ext4_mount+0x20/0x40 [ext4]
[  207.403637] [c0000003de397bf0] [c000000000318944] mount_fs+0x74/0x210
[  207.403641] [c0000003de397ca0] [c000000000340638] vfs_kern_mount+0x68/0x1d0
[  207.403644] [c0000003de397d10] [c000000000345348] do_mount+0x278/0xef0
[  207.403648] [c0000003de397de0] [c0000000003463e4] SyS_mount+0x94/0x100
[  207.403652] [c0000003de397e30] [c00000000000af84] system_call+0x38/0xe0
[  207.403655] Instruction dump:
[  207.403658] 60420000 38600000 4e800020 60000000 60420000 7c832378 4e800020 60000000
[  207.403663] 60000000 e9250000 f9240000 7c0004ac <7d4028a8> 7c2a4800 40c20010 7c6029ad
[  207.403669] ---[ end trace 4fa94bf890f28f69 ]---

Today I finally found a host that reliably triggers the crash when
mounting an ext4 filesystem, and I ran a git bisect on it. The first bad
commit is:

commit 9c355917fcf006af47ffaa5ae43a1a804764a6f6
Author: Balbir Singh <bsingharora@gmail.com>
Date:   Wed Apr 12 16:35:19 2017 +1000

    powerpc/tracing: Allow tracing of mmap syscalls
    
    Currently sys_mmap() and sys_mmap2() (32-bit only), are not visible to the
    syscall tracing machinery. This means users are not able to see the execution of
    mmap() syscalls using the syscall tracer.
    
    Fix that by using SYSCALL_DEFINE6 for sys_mmap() and sys_mmap2() so that the
    meta-data associated with these syscalls is visible to the syscall tracer.
    
    A side-effect of this change is that the return type has changed from unsigned
    long to long. However this should have no effect, the only code in the kernel
    which uses the result of these syscalls is in the syscall return path, which is
    written in asm and treats the result as unsigned regardless.
    
    Example output:
      cat-3399  [001] ....   196.542410: sys_mmap(addr: 7fff922a0000, len: 20000, prot: 3, flags: 812, fd: 3, offset: 1b0000)
      cat-3399  [001] ....   196.542443: sys_mmap -> 0x7fff922a0000
      cat-3399  [001] ....   196.542668: sys_munmap(addr: 7fff922c0000, len: 6d2c)
      cat-3399  [001] ....   196.542677: sys_munmap -> 0x0
    
    Signed-off-by: Balbir Singh <bsingharora@gmail.com>
    [mpe: Massage change log, add detail on return type change]
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

I've also confirmed that reverting the above commit 'resolves' the crash.
I've appended the host's memory and CPU information to the end of this
email; if you need more detailed information, please let me know.

Thanks,
Eryu

[root@ibm-p8-03-lp6 ~]# free
              total        used        free      shared  buff/cache   available
Mem:       18756864      399552    17880704       12672      476608    17470592
Swap:       7864256           0     7864256
[root@ibm-p8-03-lp6 ~]# lscpu
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    8
Core(s) per socket:    1
Socket(s):             2
NUMA node(s):          3
Model:                 2.1 (pvr 004b 0201)
Model name:            POWER8 (architected), altivec supported
Hypervisor vendor:     (null)
Virtualization type:   full
L1d cache:             64K
L1i cache:             32K
NUMA node0 CPU(s):     0-7
NUMA node2 CPU(s):     8-15
NUMA node3 CPU(s):

* Re: [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host
  2017-06-28  8:32 [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host Eryu Guan
@ 2017-06-28 17:16 ` Balbir Singh
  2017-06-29  3:41   ` Eryu Guan
  2017-06-29  4:54 ` [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host Michael Ellerman
  2017-06-29 10:27 ` Michael Ellerman
  2 siblings, 1 reply; 23+ messages in thread
From: Balbir Singh @ 2017-06-28 17:16 UTC
  To: Eryu Guan; +Cc: open list:LINUX FOR POWERPC (32-BIT AND 64-BIT), liwan

On Wed, Jun 28, 2017 at 6:32 PM, Eryu Guan <eguan@redhat.com> wrote:
> Hi all,
>
> Li Wang and I are constantly seeing ppc64le hosts crashing due to bad
> page access. But it's not reproducing on every ppc64le host we've
> tested, but it usually happened in filesystem testings.
>
> [  207.403459] Unable to handle kernel paging request for unaligned access at address 0xc0000001c52c5e7f
> [  207.403470] Faulting instruction address: 0xc0000000004d470c
> [  207.403475] Oops: Kernel access of bad area, sig: 7 [#1]
> [  207.403477] SMP NR_CPUS=2048
> [  207.403478] NUMA
> [  207.403480] pSeries
> [  207.403483] Modules linked in: ext4 jbd2 mbcache sg pseries_rng ghash_generic gf128mul xts vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ibmveth ibmvscsi scsi_transport_srp
> [  207.403503] CPU: 0 PID: 2263 Comm: mount Not tainted 4.12.0-rc7 #26
> [  207.403506] task: c0000003ef2fde00 task.stack: c0000003de394000
> [  207.403509] NIP: c0000000004d470c LR: c00000000011cd24 CTR: c000000000130de0
> [  207.403512] REGS: c0000003de397450 TRAP: 0600   Not tainted  (4.12.0-rc7)
> [  207.403515] MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>
> [  207.403521]   CR: 28028844  XER: 00000001
> [  207.403525] CFAR: c00000000011cd20 DAR: c0000001c52c5e7f DSISR: 00000000 SOFTE: 0
> [  207.403525] GPR00: c00000000011cce8 c0000003de3976d0 c000000001049500 c0000003f2c6ec20
> [  207.403525] GPR04: c0000003f2c6ec20 c0000001c52c5e7f 0000000000000000 0000000000000001
> [  207.403525] GPR08: 000c5543cab19830 0000000198e19900 0000000000000008 0000000000000000
> [  207.403525] GPR12: c000000000130de0 c00000000fac0000 0000000000000000 c0000003f1328000
> [  207.403525] GPR16: 0000000000000000 c0000003de700400 0000000000000000 c0000003de700594
> [  207.403525] GPR20: 0000000000000002 0000000000000000 0000000000004000 c000000000cc5780
> [  207.403525] GPR24: 00000001c45ffc5f 0000000000000000 00000001c45ffc5f c00000000107dd00
> [  207.403525] GPR28: c0000003f2c6f434 0000000000000004 0000000000000800 c0000003f2c6ec00
> [  207.403567] NIP [c0000000004d470c] llist_add_batch+0xc/0x40
> [  207.403571] LR [c00000000011cd24] try_to_wake_up+0x4a4/0x5b0
> [  207.403573] Call Trace:
> [  207.403576] [c0000003de3976d0] [c00000000011cce8] try_to_wake_up+0x468/0x5b0 (unreliable)
> [  207.403581] [c0000003de397750] [c000000000102cc8] create_worker+0x148/0x250
> [  207.403585] [c0000003de3977f0] [c000000000105e7c] alloc_unbound_pwq+0x3bc/0x4c0
> [  207.403589] [c0000003de397850] [c0000000001064bc] apply_wqattrs_prepare+0x2ac/0x320
> [  207.403593] [c0000003de3978c0] [c00000000010656c] apply_workqueue_attrs_locked+0x3c/0xa0
> [  207.403597] [c0000003de3978f0] [c000000000106acc] apply_workqueue_attrs+0x4c/0x80
> [  207.403601] [c0000003de397930] [c00000000010866c] __alloc_workqueue_key+0x16c/0x4e0
> [  207.403615] [c0000003de3979f0] [d000000013de5ce0] ext4_fill_super+0x1c70/0x3390 [ext4]
> [  207.403620] [c0000003de397b30] [c00000000031739c] mount_bdev+0x21c/0x250
> [  207.403633] [c0000003de397bd0] [d000000013dddb80] ext4_mount+0x20/0x40 [ext4]
> [  207.403637] [c0000003de397bf0] [c000000000318944] mount_fs+0x74/0x210
> [  207.403641] [c0000003de397ca0] [c000000000340638] vfs_kern_mount+0x68/0x1d0
> [  207.403644] [c0000003de397d10] [c000000000345348] do_mount+0x278/0xef0
> [  207.403648] [c0000003de397de0] [c0000000003463e4] SyS_mount+0x94/0x100
> [  207.403652] [c0000003de397e30] [c00000000000af84] system_call+0x38/0xe0
> [  207.403655] Instruction dump:
> [  207.403658] 60420000 38600000 4e800020 60000000 60420000 7c832378 4e800020 60000000
> [  207.403663] 60000000 e9250000 f9240000 7c0004ac <7d4028a8> 7c2a4800 40c20010 7c6029ad
> [  207.403669] ---[ end trace 4fa94bf890f28f69 ]---
>
> Today I've finally found a host that could reliably trigger the crash by
> mounting an ext4 filesystem and I've done a git bisect. The first bad
> pointed to this commit:

Thanks for the excellent bug report. I am a little lost on the stack
trace: it shows a bad page access, and we think that is triggered by the
mmap changes? The patch changed the return type to integrate the call
into trace-cmd. Could you point me to the tests that can help
reproduce the crash? Could you also suggest how long I should run the
test cases?

Balbir Singh

* Re: [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host
  2017-06-28 17:16 ` Balbir Singh
@ 2017-06-29  3:41   ` Eryu Guan
  2017-06-29  8:47     ` Balbir Singh
  0 siblings, 1 reply; 23+ messages in thread
From: Eryu Guan @ 2017-06-29  3:41 UTC
  To: Balbir Singh; +Cc: open list:LINUX FOR POWERPC (32-BIT AND 64-BIT), liwan

On Thu, Jun 29, 2017 at 03:16:10AM +1000, Balbir Singh wrote:
> On Wed, Jun 28, 2017 at 6:32 PM, Eryu Guan <eguan@redhat.com> wrote:
> > Hi all,
> >
> > Li Wang and I are constantly seeing ppc64le hosts crashing due to bad
> > page access. But it's not reproducing on every ppc64le host we've
> > tested, but it usually happened in filesystem testings.
> >
> > [  207.403459] Unable to handle kernel paging request for unaligned access at address 0xc0000001c52c5e7f
> > [  207.403470] Faulting instruction address: 0xc0000000004d470c
> > [  207.403475] Oops: Kernel access of bad area, sig: 7 [#1]
> > [  207.403477] SMP NR_CPUS=2048
> > [  207.403478] NUMA
> > [  207.403480] pSeries
> > [  207.403483] Modules linked in: ext4 jbd2 mbcache sg pseries_rng ghash_generic gf128mul xts vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ibmveth ibmvscsi scsi_transport_srp
> > [  207.403503] CPU: 0 PID: 2263 Comm: mount Not tainted 4.12.0-rc7 #26
> > [  207.403506] task: c0000003ef2fde00 task.stack: c0000003de394000
> > [  207.403509] NIP: c0000000004d470c LR: c00000000011cd24 CTR: c000000000130de0
> > [  207.403512] REGS: c0000003de397450 TRAP: 0600   Not tainted  (4.12.0-rc7)
> > [  207.403515] MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>
> > [  207.403521]   CR: 28028844  XER: 00000001
> > [  207.403525] CFAR: c00000000011cd20 DAR: c0000001c52c5e7f DSISR: 00000000 SOFTE: 0
> > [  207.403525] GPR00: c00000000011cce8 c0000003de3976d0 c000000001049500 c0000003f2c6ec20
> > [  207.403525] GPR04: c0000003f2c6ec20 c0000001c52c5e7f 0000000000000000 0000000000000001
> > [  207.403525] GPR08: 000c5543cab19830 0000000198e19900 0000000000000008 0000000000000000
> > [  207.403525] GPR12: c000000000130de0 c00000000fac0000 0000000000000000 c0000003f1328000
> > [  207.403525] GPR16: 0000000000000000 c0000003de700400 0000000000000000 c0000003de700594
> > [  207.403525] GPR20: 0000000000000002 0000000000000000 0000000000004000 c000000000cc5780
> > [  207.403525] GPR24: 00000001c45ffc5f 0000000000000000 00000001c45ffc5f c00000000107dd00
> > [  207.403525] GPR28: c0000003f2c6f434 0000000000000004 0000000000000800 c0000003f2c6ec00
> > [  207.403567] NIP [c0000000004d470c] llist_add_batch+0xc/0x40
> > [  207.403571] LR [c00000000011cd24] try_to_wake_up+0x4a4/0x5b0
> > [  207.403573] Call Trace:
> > [  207.403576] [c0000003de3976d0] [c00000000011cce8] try_to_wake_up+0x468/0x5b0 (unreliable)
> > [  207.403581] [c0000003de397750] [c000000000102cc8] create_worker+0x148/0x250
> > [  207.403585] [c0000003de3977f0] [c000000000105e7c] alloc_unbound_pwq+0x3bc/0x4c0
> > [  207.403589] [c0000003de397850] [c0000000001064bc] apply_wqattrs_prepare+0x2ac/0x320
> > [  207.403593] [c0000003de3978c0] [c00000000010656c] apply_workqueue_attrs_locked+0x3c/0xa0
> > [  207.403597] [c0000003de3978f0] [c000000000106acc] apply_workqueue_attrs+0x4c/0x80
> > [  207.403601] [c0000003de397930] [c00000000010866c] __alloc_workqueue_key+0x16c/0x4e0
> > [  207.403615] [c0000003de3979f0] [d000000013de5ce0] ext4_fill_super+0x1c70/0x3390 [ext4]
> > [  207.403620] [c0000003de397b30] [c00000000031739c] mount_bdev+0x21c/0x250
> > [  207.403633] [c0000003de397bd0] [d000000013dddb80] ext4_mount+0x20/0x40 [ext4]
> > [  207.403637] [c0000003de397bf0] [c000000000318944] mount_fs+0x74/0x210
> > [  207.403641] [c0000003de397ca0] [c000000000340638] vfs_kern_mount+0x68/0x1d0
> > [  207.403644] [c0000003de397d10] [c000000000345348] do_mount+0x278/0xef0
> > [  207.403648] [c0000003de397de0] [c0000000003463e4] SyS_mount+0x94/0x100
> > [  207.403652] [c0000003de397e30] [c00000000000af84] system_call+0x38/0xe0
> > [  207.403655] Instruction dump:
> > [  207.403658] 60420000 38600000 4e800020 60000000 60420000 7c832378 4e800020 60000000
> > [  207.403663] 60000000 e9250000 f9240000 7c0004ac <7d4028a8> 7c2a4800 40c20010 7c6029ad
> > [  207.403669] ---[ end trace 4fa94bf890f28f69 ]---
> >
> > Today I've finally found a host that could reliably trigger the crash by
> > mounting an ext4 filesystem and I've done a git bisect. The first bad
> > pointed to this commit:
> 
> Thanks for the excellent bug report, I am a little lost on the stack
> trace, it shows a bad page access that we think is triggered by the
> mmap changes? The patch changed the return type to integrate the call
> into trace-cmd. Could you point me to the tests that can help
> reproduce the crash. Could you also suggest how long to try the test
> cases for?

Sorry, I should have provided it in the first place. It's as simple as
mounting an ext4 filesystem on my test ppc64le host, i.e.

mkdir -p /mnt/ext4
mkfs -t ext4 -F /dev/sda5
mount /dev/sda5 /mnt/ext4

The kernel crashes right after the mount command, and it reproduces 100%
of the time for me. I've tried the same reproducer on other ppc64 and
ppc64le hosts, but not all of them reproduce the crash.

BTW, I just reverted the commit in question (9c355917fc) on top of the
v4.12-rc7 kernel and the crash is gone.

Thanks,
Eryu

* Re: [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host
  2017-06-28  8:32 [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host Eryu Guan
  2017-06-28 17:16 ` Balbir Singh
@ 2017-06-29  4:54 ` Michael Ellerman
  2017-06-29 10:27 ` Michael Ellerman
  2 siblings, 0 replies; 23+ messages in thread
From: Michael Ellerman @ 2017-06-29  4:54 UTC
  To: Eryu Guan, linuxppc-dev; +Cc: liwan

Hi Eryu,

Thanks for the bug report.

Eryu Guan <eguan@redhat.com> writes:
> Hi all,
>
> Li Wang and I are constantly seeing ppc64le hosts crashing due to bad

I'm curious why you're seeing this and not other folks. What compiler
are you using?

> page access. But it's not reproducing on every ppc64le host we've
> tested, but it usually happened in filesystem testings.
>
> [  207.403459] Unable to handle kernel paging request for unaligned access at address 0xc0000001c52c5e7f
                                                            ^^^^^^^^^                                    ^

> [  207.403470] Faulting instruction address: 0xc0000000004d470c

Which is:

ldarx   r10,0,r5

r5 = c0000001c52c5e7f 

So that makes sense, if you ldarx an unaligned address you get an
alignment fault.
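
For anyone who wants to check that decode by hand, here is a minimal
userspace sketch. It splits the marked word from the instruction dump,
7d4028a8, into its X-form fields and tests whether the address in r5 is
doubleword aligned. The struct and function names are hypothetical and
only for illustration; the opcode numbers (primary 31, extended 84 for
ldarx) are taken from the Power ISA.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of a PowerPC X-form instruction word such as ldarx RT,RA,RB. */
struct xform {
	unsigned int opcode;	/* primary opcode (top 6 bits) */
	unsigned int rt;	/* target register */
	unsigned int ra;	/* base register; 0 means the value 0, not r0 */
	unsigned int rb;	/* index register */
	unsigned int xo;	/* 10-bit extended opcode */
};

/* Hypothetical helper: pull the X-form fields out of a 32-bit word. */
static struct xform decode_xform(uint32_t insn)
{
	struct xform x = {
		.opcode = (insn >> 26) & 0x3f,
		.rt     = (insn >> 21) & 0x1f,
		.ra     = (insn >> 16) & 0x1f,
		.rb     = (insn >> 11) & 0x1f,
		.xo     = (insn >> 1) & 0x3ff,
	};
	return x;
}

/* ldarx is primary opcode 31 with extended opcode 84. */
static bool is_ldarx(uint32_t insn)
{
	struct xform x = decode_xform(insn);

	return x.opcode == 31 && x.xo == 84;
}

/* ldarx loads a doubleword, so its address must be a multiple of 8. */
static bool is_doubleword_aligned(uint64_t ea)
{
	return (ea & 7) == 0;
}
```

Feeding it 0x7d4028a8 gives RT=10, RA=0, RB=5, i.e. ldarx r10,0,r5, and
0xc0000001c52c5e7f fails the alignment test because its low three bits
are all set.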

> [  207.403475] Oops: Kernel access of bad area, sig: 7 [#1]
> [  207.403477] SMP NR_CPUS=2048
> [  207.403478] NUMA
> [  207.403480] pSeries
> [  207.403483] Modules linked in: ext4 jbd2 mbcache sg pseries_rng ghash_generic gf128mul xts vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ibmveth ibmvscsi scsi_transport_srp
> [  207.403503] CPU: 0 PID: 2263 Comm: mount Not tainted 4.12.0-rc7 #26
> [  207.403506] task: c0000003ef2fde00 task.stack: c0000003de394000
> [  207.403509] NIP: c0000000004d470c LR: c00000000011cd24 CTR: c000000000130de0
> [  207.403512] REGS: c0000003de397450 TRAP: 0600   Not tainted  (4.12.0-rc7)
> [  207.403515] MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>
> [  207.403521]   CR: 28028844  XER: 00000001
> [  207.403525] CFAR: c00000000011cd20 DAR: c0000001c52c5e7f DSISR: 00000000 SOFTE: 0
> [  207.403525] GPR00: c00000000011cce8 c0000003de3976d0 c000000001049500 c0000003f2c6ec20
> [  207.403525] GPR04: c0000003f2c6ec20 c0000001c52c5e7f 0000000000000000 0000000000000001
> [  207.403525] GPR08: 000c5543cab19830 0000000198e19900 0000000000000008 0000000000000000
> [  207.403525] GPR12: c000000000130de0 c00000000fac0000 0000000000000000 c0000003f1328000
> [  207.403525] GPR16: 0000000000000000 c0000003de700400 0000000000000000 c0000003de700594
> [  207.403525] GPR20: 0000000000000002 0000000000000000 0000000000004000 c000000000cc5780
> [  207.403525] GPR24: 00000001c45ffc5f 0000000000000000 00000001c45ffc5f c00000000107dd00
> [  207.403525] GPR28: c0000003f2c6f434 0000000000000004 0000000000000800 c0000003f2c6ec00
> [  207.403567] NIP [c0000000004d470c] llist_add_batch+0xc/0x40

bool llist_add_batch(struct llist_node *new_first, struct llist_node *new_last,
		     struct llist_head *head)
{
	struct llist_node *first;

	do {
		new_last->next = first = ACCESS_ONCE(head->first);
	} while (cmpxchg(&head->first, first, new_first) != first);

So it's the cmpxchg().

__cmpxchg_u64(volatile unsigned long *p, unsigned long old, unsigned long new)
{
	unsigned long prev;

	__asm__ __volatile__ (
	PPC_ATOMIC_ENTRY_BARRIER
"1:	ldarx	%0,0,%2		# __cmpxchg_u64\n\

> [  207.403571] LR [c00000000011cd24] try_to_wake_up+0x4a4/0x5b0

try_to_wake_up(p, ..)
  -> ttwu_queue(p, cpu, wake_flags);
     -> ttwu_queue_remote(p, cpu, wake_flags);

static void ttwu_queue_remote(struct task_struct *p, int cpu, int wake_flags)
{
	struct rq *rq = cpu_rq(cpu);

	p->sched_remote_wakeup = !!(wake_flags & WF_MIGRATED);

	if (llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list)) {

static inline bool llist_add(struct llist_node *new, struct llist_head *head)
{
	return llist_add_batch(new, new, head);
}

So the cmpxchg is:

        cmpxchg(&head->first, first, new_first) != first

Where head is &cpu_rq(cpu)->wake_list.
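
As a rough userspace analogue of that loop, the sketch below uses C11
atomics in place of the kernel's cmpxchg(). The lf_node/lf_head types
and the function names are made up for illustration, and this is not
the kernel implementation. What it shows is that the compare-and-swap
reads and writes through head, so a bad head pointer faults right at
the atomic access.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for llist_node and llist_head. */
struct lf_node {
	struct lf_node *next;
};

struct lf_head {
	_Atomic(struct lf_node *) first;
};

/*
 * Push the chain new_first..new_last onto the list.
 * Returns true if the list was empty before the push.
 */
static bool lf_add_batch(struct lf_node *new_first, struct lf_node *new_last,
			 struct lf_head *head)
{
	/* This load dereferences head; a bogus head faults here. */
	struct lf_node *first = atomic_load(&head->first);

	do {
		new_last->next = first;
		/* On failure the call refreshes 'first' with the current head. */
	} while (!atomic_compare_exchange_weak(&head->first, &first, new_first));

	return first == NULL;
}

/* Single-node push, mirroring how llist_add() wraps the batch variant. */
static bool lf_add(struct lf_node *node, struct lf_head *head)
{
	return lf_add_batch(node, node, head);
}
```

On ppc64 a compiler typically turns the 64-bit compare-and-swap into an
ldarx/stdcx. pair, which is why the faulting instruction lands inside
the list add rather than in its caller.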

cpu came from try_to_wake_up() which did:

	cpu = select_task_rq(p, p->wake_cpu, SD_BALANCE_WAKE, wake_flags);


So possibly the cpu value is bogus. Or ..

#define cpu_rq(cpu)		(&per_cpu(runqueues, (cpu)))

That runqueues variable has become corrupted?
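
To illustrate the first possibility, here is a small model of how a
per-CPU address is formed: the variable's base plus an offset looked up
by CPU number. Everything named model_* is hypothetical, the table size
of 16 is just the CPU count from the lscpu output, and the real kernel
lookup is arch specific. It only shows why an out-of-range cpu would
produce an arbitrary address unless it is range checked first.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical CPU count; 16 matches the lscpu output for this host. */
#define MODEL_NR_CPUS 16

/* Hypothetical stand-in for the per-CPU offset table. */
static uintptr_t model_cpu_offset[MODEL_NR_CPUS];

/* A CPU number is only usable as an index if it is below the CPU count. */
static bool model_cpu_valid(unsigned int cpu)
{
	return cpu < MODEL_NR_CPUS;
}

/*
 * Model of per_cpu(var, cpu): base address of the variable plus the
 * offset for that CPU. Returns false and leaves *addr untouched rather
 * than indexing past the end of the table.
 */
static bool model_per_cpu_addr(uintptr_t var_base, unsigned int cpu,
			       uintptr_t *addr)
{
	if (!model_cpu_valid(cpu))
		return false;

	*addr = var_base + model_cpu_offset[cpu];
	return true;
}
```

Without the range check, the lookup would read whatever sits past the
end of the table and add it to the base, which can easily give an
address that is neither valid nor aligned.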

You might be able to work it out from the register dump and the full
disassembly of the kernel. Or you could add some printk()s in there and
reproduce it.

cheers

> [  207.403573] Call Trace:
> [  207.403576] [c0000003de3976d0] [c00000000011cce8] try_to_wake_up+0x468/0x5b0 (unreliable)
> [  207.403581] [c0000003de397750] [c000000000102cc8] create_worker+0x148/0x250
> [  207.403585] [c0000003de3977f0] [c000000000105e7c] alloc_unbound_pwq+0x3bc/0x4c0
> [  207.403589] [c0000003de397850] [c0000000001064bc] apply_wqattrs_prepare+0x2ac/0x320
> [  207.403593] [c0000003de3978c0] [c00000000010656c] apply_workqueue_attrs_locked+0x3c/0xa0
> [  207.403597] [c0000003de3978f0] [c000000000106acc] apply_workqueue_attrs+0x4c/0x80
> [  207.403601] [c0000003de397930] [c00000000010866c] __alloc_workqueue_key+0x16c/0x4e0
> [  207.403615] [c0000003de3979f0] [d000000013de5ce0] ext4_fill_super+0x1c70/0x3390 [ext4]
> [  207.403620] [c0000003de397b30] [c00000000031739c] mount_bdev+0x21c/0x250
> [  207.403633] [c0000003de397bd0] [d000000013dddb80] ext4_mount+0x20/0x40 [ext4]
> [  207.403637] [c0000003de397bf0] [c000000000318944] mount_fs+0x74/0x210
> [  207.403641] [c0000003de397ca0] [c000000000340638] vfs_kern_mount+0x68/0x1d0
> [  207.403644] [c0000003de397d10] [c000000000345348] do_mount+0x278/0xef0
> [  207.403648] [c0000003de397de0] [c0000000003463e4] SyS_mount+0x94/0x100
> [  207.403652] [c0000003de397e30] [c00000000000af84] system_call+0x38/0xe0
> [  207.403655] Instruction dump:
> [  207.403658] 60420000 38600000 4e800020 60000000 60420000 7c832378 4e800020 60000000
> [  207.403663] 60000000 e9250000 f9240000 7c0004ac <7d4028a8> 7c2a4800 40c20010 7c6029ad
> [  207.403669] ---[ end trace 4fa94bf890f28f69 ]---
>
> Today I've finally found a host that could reliably trigger the crash by
> mounting an ext4 filesystem and I've done a git bisect. The first bad
> pointed to this commit:
>
> commit 9c355917fcf006af47ffaa5ae43a1a804764a6f6
> Author: Balbir Singh <bsingharora@gmail.com>
> Date:   Wed Apr 12 16:35:19 2017 +1000
>
>     powerpc/tracing: Allow tracing of mmap syscalls
>     
>     Currently sys_mmap() and sys_mmap2() (32-bit only), are not visible to the
>     syscall tracing machinery. This means users are not able to see the execution of
>     mmap() syscalls using the syscall tracer.
>     
>     Fix that by using SYSCALL_DEFINE6 for sys_mmap() and sys_mmap2() so that the
>     meta-data associated with these syscalls is visible to the syscall tracer.
>     
>     A side-effect of this change is that the return type has changed from unsigned
>     long to long. However this should have no effect, the only code in the kernel
>     which uses the result of these syscalls is in the syscall return path, which is
>     written in asm and treats the result as unsigned regardless.
>     
>     Example output:
>       cat-3399  [001] ....   196.542410: sys_mmap(addr: 7fff922a0000, len: 20000, prot: 3, flags: 812, fd: 3, offset: 1b0000)
>       cat-3399  [001] ....   196.542443: sys_mmap -> 0x7fff922a0000
>       cat-3399  [001] ....   196.542668: sys_munmap(addr: 7fff922c0000, len: 6d2c)
>       cat-3399  [001] ....   196.542677: sys_munmap -> 0x0
>     
>     Signed-off-by: Balbir Singh <bsingharora@gmail.com>
>     [mpe: Massage change log, add detail on return type change]
>     Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
>
> And I've confirmed that reverting above commit 'resolves' the crash. I
> appended memory and cpu information of the host to the end of this
> email, if you need more detailed information please let me know.
>
> Thanks,
> Eryu
>
> [root@ibm-p8-03-lp6 ~]# free
>               total        used        free      shared  buff/cache   available
> Mem:       18756864      399552    17880704       12672      476608    17470592
> Swap:       7864256           0     7864256
> [root@ibm-p8-03-lp6 ~]# lscpu
> Architecture:          ppc64le
> Byte Order:            Little Endian
> CPU(s):                16
> On-line CPU(s) list:   0-15
> Thread(s) per core:    8
> Core(s) per socket:    1
> Socket(s):             2
> NUMA node(s):          3
> Model:                 2.1 (pvr 004b 0201)
> Model name:            POWER8 (architected), altivec supported
> Hypervisor vendor:     (null)
> Virtualization type:   full
> L1d cache:             64K
> L1i cache:             32K
> NUMA node0 CPU(s):     0-7
> NUMA node2 CPU(s):     8-15
> NUMA node3 CPU(s):

* Re: [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host
  2017-06-29  3:41   ` Eryu Guan
@ 2017-06-29  8:47     ` Balbir Singh
  2017-06-29  9:04       ` Eryu Guan
  2017-06-29 10:05       ` Eryu Guan
  0 siblings, 2 replies; 23+ messages in thread
From: Balbir Singh @ 2017-06-29  8:47 UTC
  To: Eryu Guan; +Cc: open list:LINUX FOR POWERPC (32-BIT AND 64-BIT), liwan

On Thu, Jun 29, 2017 at 1:41 PM, Eryu Guan <eguan@redhat.com> wrote:
> On Thu, Jun 29, 2017 at 03:16:10AM +1000, Balbir Singh wrote:
>> On Wed, Jun 28, 2017 at 6:32 PM, Eryu Guan <eguan@redhat.com> wrote:
<snip>
>> Thanks for the excellent bug report, I am a little lost on the stack
>> trace, it shows a bad page access that we think is triggered by the
>> mmap changes? The patch changed the return type to integrate the call
>> into trace-cmd. Could you point me to the tests that can help
>> reproduce the crash. Could you also suggest how long to try the test
>> cases for?
>
> Sorry, I should have provided it in the first place. It's as simple as
> mounting an ext4 filesystem on my test ppc64le host, i.e.
>
> mkdir -p /mnt/ext4
> mkfs -t ext4 -F /dev/sda5
> mount /dev/sda5 /mnt/ext4
>

I tried this test a few times with the kernel and could not reproduce it.
Could you please share the config and compiler details? I'll retry with -rc7.

In the meantime, does enabling kmemleak, DEBUG_PAGEALLOC,
slub/slab debug, list corruption checks, etc. catch anything at the time
of the corruption?

Thanks,
Balbir Singh.

* Re: [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host
  2017-06-29  8:47     ` Balbir Singh
@ 2017-06-29  9:04       ` Eryu Guan
  2017-06-29 10:05       ` Eryu Guan
  1 sibling, 0 replies; 23+ messages in thread
From: Eryu Guan @ 2017-06-29  9:04 UTC
  To: Balbir Singh; +Cc: open list:LINUX FOR POWERPC (32-BIT AND 64-BIT), liwan

On Thu, Jun 29, 2017 at 06:47:50PM +1000, Balbir Singh wrote:
> On Thu, Jun 29, 2017 at 1:41 PM, Eryu Guan <eguan@redhat.com> wrote:
> > On Thu, Jun 29, 2017 at 03:16:10AM +1000, Balbir Singh wrote:
> >> On Wed, Jun 28, 2017 at 6:32 PM, Eryu Guan <eguan@redhat.com> wrote:
> <snip>
> >> Thanks for the excellent bug report, I am a little lost on the stack
> >> trace, it shows a bad page access that we think is triggered by the
> >> mmap changes? The patch changed the return type to integrate the call
> >> into trace-cmd. Could you point me to the tests that can help
> >> reproduce the crash. Could you also suggest how long to try the test
> >> cases for?
> >
> > Sorry, I should have provided it in the first place. It's as simple as
> > mounting an ext4 filesystem on my test ppc64le host, i.e.
> >
> > mkdir -p /mnt/ext4
> > mkfs -t ext4 -F /dev/sda5
> > mount /dev/sda5 /mnt/ext4
> >
> 
> I tried this test a few times with the kernel and could not reproduce it.

Right, it doesn't reproduce on every host. I'm not yet sure what makes
my test host unique.

> Could you please share the config and compiler details, I'll retry with -rc7.

[root@ibm-p8-03-lp6 ~]# gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[root@ibm-p8-03-lp6 ~]# rpm -q gcc
gcc-4.8.5-16.el7.ppc64le

I've attached the kernel config file.

> 
> In the meanwhile, does enabling kmemleak, DEBUG_PAGE_ALLOC,
> slub/slab debug, list corruption, etc catch anything at the time of the
> corruption?

OK, I'll retry with a debug kernel and report back.

Thanks,
Eryu

[-- Attachment #2: config-ppc64le.bz2 --]
[-- Type: application/x-bzip2, Size: 32218 bytes --]

* Re: [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host
  2017-06-29  8:47     ` Balbir Singh
  2017-06-29  9:04       ` Eryu Guan
@ 2017-06-29 10:05       ` Eryu Guan
  2017-06-29 11:12         ` Michael Ellerman
  1 sibling, 1 reply; 23+ messages in thread
From: Eryu Guan @ 2017-06-29 10:05 UTC
  To: Balbir Singh; +Cc: open list:LINUX FOR POWERPC (32-BIT AND 64-BIT), liwan

On Thu, Jun 29, 2017 at 06:47:50PM +1000, Balbir Singh wrote:
> On Thu, Jun 29, 2017 at 1:41 PM, Eryu Guan <eguan@redhat.com> wrote:
> > On Thu, Jun 29, 2017 at 03:16:10AM +1000, Balbir Singh wrote:
> >> On Wed, Jun 28, 2017 at 6:32 PM, Eryu Guan <eguan@redhat.com> wrote:
> <snip>
> >> Thanks for the excellent bug report, I am a little lost on the stack
> >> trace, it shows a bad page access that we think is triggered by the
> >> mmap changes? The patch changed the return type to integrate the call
> >> into trace-cmd. Could you point me to the tests that can help
> >> reproduce the crash. Could you also suggest how long to try the test
> >> cases for?
> >
> > Sorry, I should have provided it in the first place. It's as simple as
> > mounting an ext4 filesystem on my test ppc64le host, i.e.
> >
> > mkdir -p /mnt/ext4
> > mkfs -t ext4 -F /dev/sda5
> > mount /dev/sda5 /mnt/ext4
> >
> 
> I tried this test a few times with the kernel and could not reproduce it.
> Could you please share the config and compiler details, I'll retry with -rc7.
> 
> In the meanwhile, does enabling kmemleak, DEBUG_PAGE_ALLOC,
> slub/slab debug, list corruption, etc catch anything at the time of the
> corruption?

Testing with a debug kernel (config file attached) didn't trigger the
kernel crash, only warnings:

[   99.686770] ------------[ cut here ]------------
[   99.686868] WARNING: CPU: 1 PID: 2272 at ./include/linux/cpumask.h:121 try_to_wake_up+0x17c/0x8f0
[   99.686873] Modules linked in: ext4 jbd2 mbcache sg pseries_rng ghash_generic gf128mul xts vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp
[   99.686950] CPU: 1 PID: 2272 Comm: mount Not tainted 4.12.0-rc7.debug #28
[   99.686955] task: c0000003f00b7b00 task.stack: c0000003f25e0000
[   99.686959] NIP: c0000000001359ec LR: c000000000135ed4 CTR: c00000000016f940
[   99.686964] REGS: c0000003f25e3420 TRAP: 0700   Not tainted  (4.12.0-rc7.debug)
[   99.686968] MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>
[   99.686994]   CR: 28028822  XER: 00000001
[   99.687000] CFAR: c000000000135cb4 SOFTE: 0
[   99.687000] GPR00: c000000000135da0 c0000003f25e36a0 c000000001751800 00000000000000a0
[   99.687000] GPR04: 00000000000000a0 00000000000000c0 0000000000000000 0000000000000000
[   99.687000] GPR08: ffffffffffffffff 00000000000000a0 0000000000000000 00000000000041e0
[   99.687000] GPR12: 0000000000008800 c00000000fac0a80 0000000000000002 c0000003fd20b000
[   99.687000] GPR16: c0000003cabb0400 0000000000000000 0000000000000000 0000000000000002
[   99.687000] GPR20: 0000000000000000 c0000003f7a59d60 c000000001326300 c000000001795d00
[   99.687000] GPR24: c000000001799d48 0000000000000000 c00000000179a294 c0000003ec786be8
[   99.687000] GPR28: 0000000000000000 c0000003ec786680 00000000000000a0 c0000003ec786300
[   99.687083] NIP [c0000000001359ec] try_to_wake_up+0x17c/0x8f0
[   99.687088] LR [c000000000135ed4] try_to_wake_up+0x664/0x8f0
[   99.687092] Call Trace:
[   99.687095] [c0000003f25e36a0] [c000000000135da0] try_to_wake_up+0x530/0x8f0 (unreliable)
[   99.687104] [c0000003f25e3730] [c000000000114ea8] create_worker+0x148/0x220
[   99.687110] [c0000003f25e37d0] [c00000000011a418] alloc_unbound_pwq+0x4c8/0x620
[   99.687117] [c0000003f25e3830] [c00000000011a9c4] apply_wqattrs_prepare+0x1f4/0x340
[   99.687123] [c0000003f25e38a0] [c00000000011ab4c] apply_workqueue_attrs_locked+0x3c/0xa0
[   99.687130] [c0000003f25e38d0] [c00000000011b094] apply_workqueue_attrs+0x54/0x90
[   99.687137] [c0000003f25e3910] [c00000000011d674] __alloc_workqueue_key+0x184/0x5b0
[   99.687155] [c0000003f25e39d0] [d000000013dd1768] ext4_fill_super+0x1c68/0x33e0 [ext4]
[   99.687162] [c0000003f25e3b10] [c000000000390f7c] mount_bdev+0x22c/0x260
[   99.687178] [c0000003f25e3bb0] [d000000013dc9020] ext4_mount+0x20/0x40 [ext4]
[   99.687184] [c0000003f25e3bd0] [c0000000003923c4] mount_fs+0x74/0x210
[   99.687191] [c0000003f25e3c80] [c0000000003c0688] vfs_kern_mount+0x78/0x220
[   99.687197] [c0000003f25e3d00] [c0000000003c6044] do_mount+0x254/0xf70
[   99.687204] [c0000003f25e3de0] [c0000000003c7184] SyS_mount+0x94/0x100
[   99.687210] [c0000003f25e3e30] [c00000000000b190] system_call+0x38/0xe0
[   99.687215] Instruction dump:
[   99.687220] 409d000c 39200004 9121002c 387d0018 4803be2d 60000000 7fa3eb78 48911321
[   99.687236] 60000000 2fb70000 409e0124 480001e0 <0fe00000> 7fca3670 7d4a0194 57c906be
[   99.687252] ---[ end trace e80d5ad75ae4c2a0 ]---
[   99.691902] EXT4-fs (sda5): mounted filesystem with ordered data mode. Opts: (null)

Thanks,
Eryu

[-- Attachment #2: config-ppc64le-debug.bz2 --]
[-- Type: application/x-bzip2, Size: 32415 bytes --]

* Re: [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host
  2017-06-28  8:32 [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host Eryu Guan
  2017-06-28 17:16 ` Balbir Singh
  2017-06-29  4:54 ` [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host Michael Ellerman
@ 2017-06-29 10:27 ` Michael Ellerman
  2017-06-29 10:33   ` Eryu Guan
  2 siblings, 1 reply; 23+ messages in thread
From: Michael Ellerman @ 2017-06-29 10:27 UTC (permalink / raw)
  To: Eryu Guan, linuxppc-dev; +Cc: liwan

Eryu Guan <eguan@redhat.com> writes:

> Hi all,
>
> Li Wang and I are constantly seeing ppc64le hosts crashing due to bad
> page access. But it's not reproducing on every ppc64le host we've
> tested, but it usually happened in filesystem testings.

<snip>

> And I've confirmed that reverting above commit 'resolves' the crash.

Do you mean ~v4.12-rc7 with that commit reverted no longer triggers the
crash?

cheers

* Re: [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host
  2017-06-29 10:27 ` Michael Ellerman
@ 2017-06-29 10:33   ` Eryu Guan
  2017-06-29 12:13     ` Michael Ellerman
  0 siblings, 1 reply; 23+ messages in thread
From: Eryu Guan @ 2017-06-29 10:33 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, liwan

On Thu, Jun 29, 2017 at 08:27:11PM +1000, Michael Ellerman wrote:
> Eryu Guan <eguan@redhat.com> writes:
> 
> > Hi all,
> >
> > Li Wang and I are constantly seeing ppc64le hosts crashing due to bad
> > page access. But it's not reproducing on every ppc64le host we've
> > tested, but it usually happened in filesystem testings.
> 
> <snip>
> 
> > And I've confirmed that reverting above commit 'resolves' the crash.
> 
> Do you mean ~v4.12-rc7 with that commit reverted no longer triggers the
> crash?

Correct. I also confirmed that reverting it when it was HEAD fixed the
crash.

Thanks,
Eryu

* Re: [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host
  2017-06-29 10:05       ` Eryu Guan
@ 2017-06-29 11:12         ` Michael Ellerman
  2017-06-29 11:39           ` Eryu Guan
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Ellerman @ 2017-06-29 11:12 UTC (permalink / raw)
  To: Eryu Guan, Balbir Singh
  Cc: liwan, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)

Eryu Guan <eguan@redhat.com> writes:

> On Thu, Jun 29, 2017 at 06:47:50PM +1000, Balbir Singh wrote:
>> On Thu, Jun 29, 2017 at 1:41 PM, Eryu Guan <eguan@redhat.com> wrote:
>> > On Thu, Jun 29, 2017 at 03:16:10AM +1000, Balbir Singh wrote:
>> >> On Wed, Jun 28, 2017 at 6:32 PM, Eryu Guan <eguan@redhat.com> wrote:
>> <snip>
>> >> Thanks for the excellent bug report, I am a little lost on the stack
>> >> trace, it shows a bad page access that we think is triggered by the
>> >> mmap changes? The patch changed the return type to integrate the call
>> >> into trace-cmd. Could you point me to the tests that can help
>> >> reproduce the crash. Could you also suggest how long to try the test
>> >> cases for?
>> >
>> > Sorry, I should have provided it in the first place. It's as simple as
>> > mounting an ext4 filesystem on my test ppc64le host, i.e.
>> >
>> > mkdir -p /mnt/ext4
>> > mkfs -t ext4 -F /dev/sda5
>> > mount /dev/sda5 /mnt/ext4
>> 
>> I tried this test a few times with the kernel and could not reproduce it.
>> Could you please share the config and compiler details, I'll retry with -rc7.
>> 
>> In the meanwhile, does enabling kmemleak, DEBUG_PAGE_ALLOC,
>> slub/slab debug, list corruption, etc catch anything at the time of the
>> corruption?
>
> Testing with debug kernel (config file attached) didn't trigger kernel
> crash, but only warnings

But the warning says try_to_wake_up() is using a CPU number that's out
of bounds, which means that when you look up the runqueue for that CPU you
just get junk, and that's what was triggering the crash in your previous
report.

So at least that part of the mystery is solved.
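
To make the failure mode concrete, here is a simplified userspace model of a
per-CPU lookup. It is only a sketch: the names (rq_demo, runqueues_demo,
rq_lookup_checked, rq_mark_running) and the CPU count of 160 are made up for
illustration, and the kernel's real per-CPU machinery works differently. The
point is just that an index past the end of the table has to be rejected,
otherwise the caller dereferences whatever happens to live there.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_DEMO 160	/* hypothetical CPU count for this sketch */

/* Stand-in for a per-CPU runqueue; not the kernel's struct rq. */
struct rq_demo {
	unsigned int nr_running;
};

static struct rq_demo runqueues_demo[NR_CPUS_DEMO];

/* Checked lookup: returns NULL rather than indexing past the array. */
static struct rq_demo *rq_lookup_checked(unsigned int cpu)
{
	if (cpu >= NR_CPUS_DEMO)
		return NULL;
	return &runqueues_demo[cpu];
}

/* Account one runnable task on @cpu; 0 on success, -1 if @cpu is bogus. */
static int rq_mark_running(unsigned int cpu)
{
	struct rq_demo *rq = rq_lookup_checked(cpu);

	if (!rq)
		return -1;
	rq->nr_running++;
	assert(rq->nr_running > 0);	/* sanity check: counter did not wrap */
	return 0;
}
```

In this model rq_mark_running(160) fails cleanly with -1, whereas an
unchecked runqueues_demo[160] would touch memory outside the array.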

> [   99.686770] ------------[ cut here ]------------
> [   99.686868] WARNING: CPU: 1 PID: 2272 at ./include/linux/cpumask.h:121 try_to_wake_up+0x17c/0x8f0

static inline unsigned int cpumask_check(unsigned int cpu)
{
#ifdef CONFIG_DEBUG_PER_CPU_MAPS
	WARN_ON_ONCE(cpu >= nr_cpumask_bits);
#endif /* CONFIG_DEBUG_PER_CPU_MAPS */
	return cpu;
}

> [   99.686873] Modules linked in: ext4 jbd2 mbcache sg pseries_rng ghash_generic gf128mul xts vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp
> [   99.686950] CPU: 1 PID: 2272 Comm: mount Not tainted 4.12.0-rc7.debug #28
> [   99.686955] task: c0000003f00b7b00 task.stack: c0000003f25e0000
> [   99.686959] NIP: c0000000001359ec LR: c000000000135ed4 CTR: c00000000016f940
> [   99.686964] REGS: c0000003f25e3420 TRAP: 0700   Not tainted  (4.12.0-rc7.debug)
> [   99.686968] MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>
> [   99.686994]   CR: 28028822  XER: 00000001
> [   99.687000] CFAR: c000000000135cb4 SOFTE: 0
> [   99.687000] GPR00: c000000000135da0 c0000003f25e36a0 c000000001751800 00000000000000a0
> [   99.687000] GPR04: 00000000000000a0 00000000000000c0 0000000000000000 0000000000000000
> [   99.687000] GPR08: ffffffffffffffff 00000000000000a0 0000000000000000 00000000000041e0
> [   99.687000] GPR12: 0000000000008800 c00000000fac0a80 0000000000000002 c0000003fd20b000
> [   99.687000] GPR16: c0000003cabb0400 0000000000000000 0000000000000000 0000000000000002
> [   99.687000] GPR20: 0000000000000000 c0000003f7a59d60 c000000001326300 c000000001795d00
> [   99.687000] GPR24: c000000001799d48 0000000000000000 c00000000179a294 c0000003ec786be8
> [   99.687000] GPR28: 0000000000000000 c0000003ec786680 00000000000000a0 c0000003ec786300
> [   99.687083] NIP [c0000000001359ec] try_to_wake_up+0x17c/0x8f0
> [   99.687088] LR [c000000000135ed4] try_to_wake_up+0x664/0x8f0
> [   99.687092] Call Trace:
> [   99.687095] [c0000003f25e36a0] [c000000000135da0] try_to_wake_up+0x530/0x8f0 (unreliable)
> [   99.687104] [c0000003f25e3730] [c000000000114ea8] create_worker+0x148/0x220
> [   99.687110] [c0000003f25e37d0] [c00000000011a418] alloc_unbound_pwq+0x4c8/0x620
> [   99.687117] [c0000003f25e3830] [c00000000011a9c4] apply_wqattrs_prepare+0x1f4/0x340
> [   99.687123] [c0000003f25e38a0] [c00000000011ab4c] apply_workqueue_attrs_locked+0x3c/0xa0
> [   99.687130] [c0000003f25e38d0] [c00000000011b094] apply_workqueue_attrs+0x54/0x90
> [   99.687137] [c0000003f25e3910] [c00000000011d674] __alloc_workqueue_key+0x184/0x5b0

We had a similar bug a few months back, caused by task->cpus_allowed
being fubar.

This looks similar, but different.

Can you try this debug patch? It might get us one step closer to the culprit.

cheers

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 803c3bc274c4..b7b712ad6778 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1565,6 +1565,14 @@ int select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags)
 	else
 		cpu = cpumask_any(&p->cpus_allowed);
 
+	if (cpu >= nr_cpumask_bits) {
+		printk("%s: CPU %d out of range for task %p (%s)\n", __func__,
+			cpu, p, p->comm);
+		printk("p->cpus_allowed: %*pbl\n", cpumask_pr_args(&p->cpus_allowed));
+		dump_stack();
+		cpu = 0;
+	}
+
 	/*
 	 * In order not to call set_task_cpu() on a blocking task we need
 	 * to rely on ttwu() to place the task on a valid ->cpus_allowed

* Re: [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host
  2017-06-29 11:12         ` Michael Ellerman
@ 2017-06-29 11:39           ` Eryu Guan
  2017-06-29 12:06             ` kworker with empty task->cpus_allowed (was Re: [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host) Michael Ellerman
  0 siblings, 1 reply; 23+ messages in thread
From: Eryu Guan @ 2017-06-29 11:39 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Balbir Singh, liwan, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)

On Thu, Jun 29, 2017 at 09:12:55PM +1000, Michael Ellerman wrote:
> Eryu Guan <eguan@redhat.com> writes:
> 
> > On Thu, Jun 29, 2017 at 06:47:50PM +1000, Balbir Singh wrote:
> >> On Thu, Jun 29, 2017 at 1:41 PM, Eryu Guan <eguan@redhat.com> wrote:
> >> > On Thu, Jun 29, 2017 at 03:16:10AM +1000, Balbir Singh wrote:
> >> >> On Wed, Jun 28, 2017 at 6:32 PM, Eryu Guan <eguan@redhat.com> wrote:
> >> <snip>
> >> >> Thanks for the excellent bug report, I am a little lost on the stack
> >> >> trace, it shows a bad page access that we think is triggered by the
> >> >> mmap changes? The patch changed the return type to integrate the call
> >> >> into trace-cmd. Could you point me to the tests that can help
> >> >> reproduce the crash. Could you also suggest how long to try the test
> >> >> cases for?
> >> >
> >> > Sorry, I should have provided it in the first place. It's as simple as
> >> > mounting an ext4 filesystem on my test ppc64le host, i.e.
> >> >
> >> > mkdir -p /mnt/ext4
> >> > mkfs -t ext4 -F /dev/sda5
> >> > mount /dev/sda5 /mnt/ext4
> >> 
> >> I tried this test a few times with the kernel and could not reproduce it.
> >> Could you please share the config and compiler details, I'll retry with -rc7.
> >> 
> >> In the meanwhile, does enabling kmemleak, DEBUG_PAGE_ALLOC,
> >> slub/slab debug, list corruption, etc catch anything at the time of the
> >> corruption?
> >
> > Testing with debug kernel (config file attached) didn't trigger kernel
> > crash, but only warnings
> 
> But the warning says try_to_wake_up() is using a CPU number that's out
> of bounds, which means when you lookup the runqueue for that CPU you
> just get junk, and that's what was triggering the crash in your previous
> report.
> 
> So at least that part of the mystery is solved.
> 
> > [   99.686770] ------------[ cut here ]------------
> > [   99.686868] WARNING: CPU: 1 PID: 2272 at ./include/linux/cpumask.h:121 try_to_wake_up+0x17c/0x8f0
> 
> static inline unsigned int cpumask_check(unsigned int cpu)
> {
> #ifdef CONFIG_DEBUG_PER_CPU_MAPS
> 	WARN_ON_ONCE(cpu >= nr_cpumask_bits);
> #endif /* CONFIG_DEBUG_PER_CPU_MAPS */
> 	return cpu;
> }
> 
> > [   99.686873] Modules linked in: ext4 jbd2 mbcache sg pseries_rng ghash_generic gf128mul xts vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp
> > [   99.686950] CPU: 1 PID: 2272 Comm: mount Not tainted 4.12.0-rc7.debug #28
> > [   99.686955] task: c0000003f00b7b00 task.stack: c0000003f25e0000
> > [   99.686959] NIP: c0000000001359ec LR: c000000000135ed4 CTR: c00000000016f940
> > [   99.686964] REGS: c0000003f25e3420 TRAP: 0700   Not tainted  (4.12.0-rc7.debug)
> > [   99.686968] MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>
> > [   99.686994]   CR: 28028822  XER: 00000001
> > [   99.687000] CFAR: c000000000135cb4 SOFTE: 0
> > [   99.687000] GPR00: c000000000135da0 c0000003f25e36a0 c000000001751800 00000000000000a0
> > [   99.687000] GPR04: 00000000000000a0 00000000000000c0 0000000000000000 0000000000000000
> > [   99.687000] GPR08: ffffffffffffffff 00000000000000a0 0000000000000000 00000000000041e0
> > [   99.687000] GPR12: 0000000000008800 c00000000fac0a80 0000000000000002 c0000003fd20b000
> > [   99.687000] GPR16: c0000003cabb0400 0000000000000000 0000000000000000 0000000000000002
> > [   99.687000] GPR20: 0000000000000000 c0000003f7a59d60 c000000001326300 c000000001795d00
> > [   99.687000] GPR24: c000000001799d48 0000000000000000 c00000000179a294 c0000003ec786be8
> > [   99.687000] GPR28: 0000000000000000 c0000003ec786680 00000000000000a0 c0000003ec786300
> > [   99.687083] NIP [c0000000001359ec] try_to_wake_up+0x17c/0x8f0
> > [   99.687088] LR [c000000000135ed4] try_to_wake_up+0x664/0x8f0
> > [   99.687092] Call Trace:
> > [   99.687095] [c0000003f25e36a0] [c000000000135da0] try_to_wake_up+0x530/0x8f0 (unreliable)
> > [   99.687104] [c0000003f25e3730] [c000000000114ea8] create_worker+0x148/0x220
> > [   99.687110] [c0000003f25e37d0] [c00000000011a418] alloc_unbound_pwq+0x4c8/0x620
> > [   99.687117] [c0000003f25e3830] [c00000000011a9c4] apply_wqattrs_prepare+0x1f4/0x340
> > [   99.687123] [c0000003f25e38a0] [c00000000011ab4c] apply_workqueue_attrs_locked+0x3c/0xa0
> > [   99.687130] [c0000003f25e38d0] [c00000000011b094] apply_workqueue_attrs+0x54/0x90
> > [   99.687137] [c0000003f25e3910] [c00000000011d674] __alloc_workqueue_key+0x184/0x5b0
> 
> We had a similar bug a few months back, caused by task->cpus_allowed
> being fubar.
> 
> This looks similar, but different.
> 
> Can you try this debug patch? It might get us one step closer to the culprit.

[   69.039219] select_task_rq: CPU 160 out of range for task c0000003f0772780 (kworker/u321:0)
[   69.039312] p->cpus_allowed:
[   69.039317] CPU: 11 PID: 2230 Comm: mount Not tainted 4.12.0-rc7.debug+ #29
[   69.039322] Call Trace:
[   69.039328] [c0000003eee1b620] [c000000000a55f28] dump_stack+0xe8/0x154 (unreliable)
[   69.039338] [c0000003eee1b660] [c000000000135a2c] try_to_wake_up+0x1bc/0x940
[   69.039345] [c0000003eee1b730] [c000000000114ea8] create_worker+0x148/0x220
[   69.039352] [c0000003eee1b7d0] [c00000000011a418] alloc_unbound_pwq+0x4c8/0x620
[   69.039358] [c0000003eee1b830] [c00000000011a9c4] apply_wqattrs_prepare+0x1f4/0x340
[   69.039365] [c0000003eee1b8a0] [c00000000011ab4c] apply_workqueue_attrs_locked+0x3c/0xa0
[   69.039372] [c0000003eee1b8d0] [c00000000011b094] apply_workqueue_attrs+0x54/0x90
[   69.039378] [c0000003eee1b910] [c00000000011d674] __alloc_workqueue_key+0x184/0x5b0
[   69.039399] [c0000003eee1b9d0] [d0000000141f1768] ext4_fill_super+0x1c68/0x33e0 [ext4]
[   69.039406] [c0000003eee1bb10] [c00000000039101c] mount_bdev+0x22c/0x260
[   69.039425] [c0000003eee1bbb0] [d0000000141e9020] ext4_mount+0x20/0x40 [ext4]
[   69.039431] [c0000003eee1bbd0] [c000000000392464] mount_fs+0x74/0x210
[   69.039438] [c0000003eee1bc80] [c0000000003c0728] vfs_kern_mount+0x78/0x220
[   69.039444] [c0000003eee1bd00] [c0000000003c60e4] do_mount+0x254/0xf70
[   69.039451] [c0000003eee1bde0] [c0000000003c7224] SyS_mount+0x94/0x100
[   69.039458] [c0000003eee1be30] [c00000000000b190] system_call+0x38/0xe0
[   69.044301] EXT4-fs (sda5): mounted filesystem with ordered data mode. Opts: (null)

I applied this patch on top of the 4.12-rc7 kernel, built with debug options
enabled. The kernel didn't print the warning messages and didn't crash either.

Thanks,
Eryu

> 
> cheers
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 803c3bc274c4..b7b712ad6778 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1565,6 +1565,14 @@ int select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags)
>  	else
>  		cpu = cpumask_any(&p->cpus_allowed);
>  
> +	if (cpu >= nr_cpumask_bits) {
> +		printk("%s: CPU %d out of range for task %p (%s)\n", __func__,
> +			cpu, p, p->comm);
> +		printk("p->cpus_allowed: %*pbl\n", cpumask_pr_args(&p->cpus_allowed));
> +		dump_stack();
> +		cpu = 0;
> +	}
> +
>  	/*
>  	 * In order not to call set_task_cpu() on a blocking task we need
>  	 * to rely on ttwu() to place the task on a valid ->cpus_allowed

* kworker with empty task->cpus_allowed (was Re: [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host)
  2017-06-29 11:39           ` Eryu Guan
@ 2017-06-29 12:06             ` Michael Ellerman
  2017-06-29 13:59               ` Eryu Guan
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Ellerman @ 2017-06-29 12:06 UTC (permalink / raw)
  To: Eryu Guan, tj
  Cc: Balbir Singh, liwan, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)

Eryu Guan <eguan@redhat.com> writes:

> On Thu, Jun 29, 2017 at 09:12:55PM +1000, Michael Ellerman wrote:
>> Eryu Guan <eguan@redhat.com> writes:
>> 
>> > On Thu, Jun 29, 2017 at 06:47:50PM +1000, Balbir Singh wrote:
>> >> On Thu, Jun 29, 2017 at 1:41 PM, Eryu Guan <eguan@redhat.com> wrote:
>> >> > On Thu, Jun 29, 2017 at 03:16:10AM +1000, Balbir Singh wrote:
>> >> >> On Wed, Jun 28, 2017 at 6:32 PM, Eryu Guan <eguan@redhat.com> wrote:
>> >> <snip>
>> >> >> Thanks for the excellent bug report, I am a little lost on the stack
>> >> >> trace, it shows a bad page access that we think is triggered by the
>> >> >> mmap changes? The patch changed the return type to integrate the call
>> >> >> into trace-cmd. Could you point me to the tests that can help
>> >> >> reproduce the crash. Could you also suggest how long to try the test
>> >> >> cases for?
>> >> >
>> >> > Sorry, I should have provided it in the first place. It's as simple as
>> >> > mounting an ext4 filesystem on my test ppc64le host, i.e.
>> >> >
>> >> > mkdir -p /mnt/ext4
>> >> > mkfs -t ext4 -F /dev/sda5
>> >> > mount /dev/sda5 /mnt/ext4
>> >> 
>> >> I tried this test a few times with the kernel and could not reproduce it.
>> >> Could you please share the config and compiler details, I'll retry with -rc7.
>> >> 
>> >> In the meanwhile, does enabling kmemleak, DEBUG_PAGE_ALLOC,
>> >> slub/slab debug, list corruption, etc catch anything at the time of the
>> >> corruption?
>> >
>> > Testing with debug kernel (config file attached) didn't trigger kernel
>> > crash, but only warnings
>> 
>> But the warning says try_to_wake_up() is using a CPU number that's out
>> of bounds, which means when you lookup the runqueue for that CPU you
>> just get junk, and that's what was triggering the crash in your previous
>> report.
>> 
>> So at least that part of the mystery is solved.
>> 
>> > [   99.686770] ------------[ cut here ]------------
>> > [   99.686868] WARNING: CPU: 1 PID: 2272 at ./include/linux/cpumask.h:121 try_to_wake_up+0x17c/0x8f0
>> 
>> static inline unsigned int cpumask_check(unsigned int cpu)
>> {
>> #ifdef CONFIG_DEBUG_PER_CPU_MAPS
>> 	WARN_ON_ONCE(cpu >= nr_cpumask_bits);
>> #endif /* CONFIG_DEBUG_PER_CPU_MAPS */
>> 	return cpu;
>> }
>> 
>> > [   99.686873] Modules linked in: ext4 jbd2 mbcache sg pseries_rng ghash_generic gf128mul xts vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp
>> > [   99.686950] CPU: 1 PID: 2272 Comm: mount Not tainted 4.12.0-rc7.debug #28
>> > [   99.686955] task: c0000003f00b7b00 task.stack: c0000003f25e0000
>> > [   99.686959] NIP: c0000000001359ec LR: c000000000135ed4 CTR: c00000000016f940
>> > [   99.686964] REGS: c0000003f25e3420 TRAP: 0700   Not tainted  (4.12.0-rc7.debug)
>> > [   99.686968] MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>
>> > [   99.686994]   CR: 28028822  XER: 00000001
>> > [   99.687000] CFAR: c000000000135cb4 SOFTE: 0
>> > [   99.687000] GPR00: c000000000135da0 c0000003f25e36a0 c000000001751800 00000000000000a0
>> > [   99.687000] GPR04: 00000000000000a0 00000000000000c0 0000000000000000 0000000000000000
>> > [   99.687000] GPR08: ffffffffffffffff 00000000000000a0 0000000000000000 00000000000041e0
>> > [   99.687000] GPR12: 0000000000008800 c00000000fac0a80 0000000000000002 c0000003fd20b000
>> > [   99.687000] GPR16: c0000003cabb0400 0000000000000000 0000000000000000 0000000000000002
>> > [   99.687000] GPR20: 0000000000000000 c0000003f7a59d60 c000000001326300 c000000001795d00
>> > [   99.687000] GPR24: c000000001799d48 0000000000000000 c00000000179a294 c0000003ec786be8
>> > [   99.687000] GPR28: 0000000000000000 c0000003ec786680 00000000000000a0 c0000003ec786300
>> > [   99.687083] NIP [c0000000001359ec] try_to_wake_up+0x17c/0x8f0
>> > [   99.687088] LR [c000000000135ed4] try_to_wake_up+0x664/0x8f0
>> > [   99.687092] Call Trace:
>> > [   99.687095] [c0000003f25e36a0] [c000000000135da0] try_to_wake_up+0x530/0x8f0 (unreliable)
>> > [   99.687104] [c0000003f25e3730] [c000000000114ea8] create_worker+0x148/0x220
>> > [   99.687110] [c0000003f25e37d0] [c00000000011a418] alloc_unbound_pwq+0x4c8/0x620
>> > [   99.687117] [c0000003f25e3830] [c00000000011a9c4] apply_wqattrs_prepare+0x1f4/0x340
>> > [   99.687123] [c0000003f25e38a0] [c00000000011ab4c] apply_workqueue_attrs_locked+0x3c/0xa0
>> > [   99.687130] [c0000003f25e38d0] [c00000000011b094] apply_workqueue_attrs+0x54/0x90
>> > [   99.687137] [c0000003f25e3910] [c00000000011d674] __alloc_workqueue_key+0x184/0x5b0
>> 
>> We had a similar bug a few months back, caused by task->cpus_allowed
>> being fubar.
>> 
>> This looks similar, but different.
>> 
>> Can you try this debug patch? It might get us one step closer to the culprit.
>
> [   69.039219] select_task_rq: CPU 160 out of range for task c0000003f0772780 (kworker/u321:0)
> [   69.039312] p->cpus_allowed:
> [   69.039317] CPU: 11 PID: 2230 Comm: mount Not tainted 4.12.0-rc7.debug+ #29
> [   69.039322] Call Trace:
> [   69.039328] [c0000003eee1b620] [c000000000a55f28] dump_stack+0xe8/0x154 (unreliable)
> [   69.039338] [c0000003eee1b660] [c000000000135a2c] try_to_wake_up+0x1bc/0x940
> [   69.039345] [c0000003eee1b730] [c000000000114ea8] create_worker+0x148/0x220
> [   69.039352] [c0000003eee1b7d0] [c00000000011a418] alloc_unbound_pwq+0x4c8/0x620
> [   69.039358] [c0000003eee1b830] [c00000000011a9c4] apply_wqattrs_prepare+0x1f4/0x340
> [   69.039365] [c0000003eee1b8a0] [c00000000011ab4c] apply_workqueue_attrs_locked+0x3c/0xa0
> [   69.039372] [c0000003eee1b8d0] [c00000000011b094] apply_workqueue_attrs+0x54/0x90
> [   69.039378] [c0000003eee1b910] [c00000000011d674] __alloc_workqueue_key+0x184/0x5b0
> [   69.039399] [c0000003eee1b9d0] [d0000000141f1768] ext4_fill_super+0x1c68/0x33e0 [ext4]
> [   69.039406] [c0000003eee1bb10] [c00000000039101c] mount_bdev+0x22c/0x260
> [   69.039425] [c0000003eee1bbb0] [d0000000141e9020] ext4_mount+0x20/0x40 [ext4]
> [   69.039431] [c0000003eee1bbd0] [c000000000392464] mount_fs+0x74/0x210
> [   69.039438] [c0000003eee1bc80] [c0000000003c0728] vfs_kern_mount+0x78/0x220
> [   69.039444] [c0000003eee1bd00] [c0000000003c60e4] do_mount+0x254/0xf70
> [   69.039451] [c0000003eee1bde0] [c0000000003c7224] SyS_mount+0x94/0x100
> [   69.039458] [c0000003eee1be30] [c00000000000b190] system_call+0x38/0xe0
> [   69.044301] EXT4-fs (sda5): mounted filesystem with ordered data mode. Opts: (null)
>
> I applied this patch on top of 4.12-rc7 kernel, built with debug options
> enabled.

So the question is: why does kworker/u321:0 have an empty task->cpus_allowed?
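
As background on why an empty mask shows up as "CPU 160": cpumask_any() is a
find-first-set-bit search, and that kind of search returns the size of the
mask when no bit is set. The userspace sketch below mimics that behaviour. It
is not kernel code: struct mask, mask_set(), first_set_bit(), cpu_in_range()
and NBITS are stand-ins invented for illustration, and NBITS = 160 is an
assumption taken from the value the debug printk reported on this host.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NBITS		160	/* assumed mask size, from the "CPU 160" message */
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define NWORDS		((NBITS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Stand-in for a CPU mask: one bit per possible CPU. */
struct mask {
	unsigned long bits[NWORDS];
};

/* Mark @bit as set in @m. */
static void mask_set(struct mask *m, unsigned int bit)
{
	assert(bit < NBITS);
	m->bits[bit / BITS_PER_WORD] |= 1UL << (bit % BITS_PER_WORD);
}

/* Return the index of the first set bit, or NBITS if the mask is empty. */
static unsigned int first_set_bit(const struct mask *m)
{
	for (unsigned int bit = 0; bit < NBITS; bit++)
		if (m->bits[bit / BITS_PER_WORD] & (1UL << (bit % BITS_PER_WORD)))
			return bit;
	return NBITS;
}

/* Same test as the debug patch: anything >= NBITS is not a usable CPU. */
static int cpu_in_range(unsigned int cpu)
{
	return cpu < NBITS;
}
```

With an all-zero mask, first_set_bit() returns NBITS, which cpu_in_range()
rejects. That matches the pairing in the log of an empty p->cpus_allowed
with a CPU number equal to the mask size.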

It's late here, but can you try this as well?

cheers


diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c74bf39ef764..da4e0f969239 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1780,9 +1780,14 @@ static struct worker *create_worker(struct worker_pool *pool)
 	if (IS_ERR(worker->task))
 		goto fail;
 
+	WARN_ON(cpumask_empty(&worker->task->cpus_allowed));
+
 	set_user_nice(worker->task, pool->attrs->nice);
 	kthread_bind_mask(worker->task, pool->attrs->cpumask);
 
+	WARN_ON(cpumask_empty(&worker->task->cpus_allowed));
+	WARN_ON(cpumask_empty(pool->attrs->cpumask));
+
 	/* successful, attach the worker to the pool */
 	worker_attach_to_pool(worker, pool);
 

* Re: [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host
  2017-06-29 10:33   ` Eryu Guan
@ 2017-06-29 12:13     ` Michael Ellerman
  0 siblings, 0 replies; 23+ messages in thread
From: Michael Ellerman @ 2017-06-29 12:13 UTC (permalink / raw)
  To: Eryu Guan; +Cc: linuxppc-dev, liwan

Eryu Guan <eguan@redhat.com> writes:

> On Thu, Jun 29, 2017 at 08:27:11PM +1000, Michael Ellerman wrote:
>> Eryu Guan <eguan@redhat.com> writes:
>> 
>> > Hi all,
>> >
>> > Li Wang and I are constantly seeing ppc64le hosts crashing due to bad
>> > page access. But it's not reproducing on every ppc64le host we've
>> > tested, but it usually happened in filesystem testings.
>> 
>> <snip>
>> 
>> > And I've confirmed that reverting above commit 'resolves' the crash.
>> 
>> Do you mean ~v4.12-rc7 with that commit reverted no longer triggers the
>> crash?
>
> Correct. I also confirmed that reverting it when it was HEAD also fixed
> the crash.

OK. The former, reverting it on top of mainline, is more informative.

cheers

* Re: kworker with empty task->cpus_allowed (was Re: [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host)
  2017-06-29 12:06             ` kworker with empty task->cpus_allowed (was Re: [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host) Michael Ellerman
@ 2017-06-29 13:59               ` Eryu Guan
  2017-06-29 14:24                 ` Tejun Heo
  2017-06-30 10:07                 ` Michael Ellerman
  0 siblings, 2 replies; 23+ messages in thread
From: Eryu Guan @ 2017-06-29 13:59 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: tj, Balbir Singh, liwan, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)

On Thu, Jun 29, 2017 at 10:06:31PM +1000, Michael Ellerman wrote:
> Eryu Guan <eguan@redhat.com> writes:
> 
> > On Thu, Jun 29, 2017 at 09:12:55PM +1000, Michael Ellerman wrote:
> >> Eryu Guan <eguan@redhat.com> writes:
> >> 
> >> > On Thu, Jun 29, 2017 at 06:47:50PM +1000, Balbir Singh wrote:
> >> >> On Thu, Jun 29, 2017 at 1:41 PM, Eryu Guan <eguan@redhat.com> wrote:
> >> >> > On Thu, Jun 29, 2017 at 03:16:10AM +1000, Balbir Singh wrote:
> >> >> >> On Wed, Jun 28, 2017 at 6:32 PM, Eryu Guan <eguan@redhat.com> wrote:
> >> >> <snip>
> >> >> >> Thanks for the excellent bug report, I am a little lost on the stack
> >> >> >> trace, it shows a bad page access that we think is triggered by the
> >> >> >> mmap changes? The patch changed the return type to integrate the call
> >> >> >> into trace-cmd. Could you point me to the tests that can help
> >> >> >> reproduce the crash. Could you also suggest how long to try the test
> >> >> >> cases for?
> >> >> >
> >> >> > Sorry, I should have provided it in the first place. It's as simple as
> >> >> > mounting an ext4 filesystem on my test ppc64le host, i.e.
> >> >> >
> >> >> > mkdir -p /mnt/ext4
> >> >> > mkfs -t ext4 -F /dev/sda5
> >> >> > mount /dev/sda5 /mnt/ext4
> >> >> 
> >> >> I tried this test a few times with the kernel and could not reproduce it.
> >> >> Could you please share the config and compiler details, I'll retry with -rc7.
> >> >> 
> >> >> In the meanwhile, does enabling kmemleak, DEBUG_PAGE_ALLOC,
> >> >> slub/slab debug, list corruption, etc catch anything at the time of the
> >> >> corruption?
> >> >
> >> > Testing with debug kernel (config file attached) didn't trigger kernel
> >> > crash, but only warnings
> >> 
> >> But the warning says try_to_wake_up() is using a CPU number that's out
> >> of bounds, which means when you lookup the runqueue for that CPU you
> >> just get junk, and that's what was triggering the crash in your previous
> >> report.
> >> 
> >> So at least that part of the mystery is solved.
> >> 
> >> > [   99.686770] ------------[ cut here ]------------
> >> > [   99.686868] WARNING: CPU: 1 PID: 2272 at ./include/linux/cpumask.h:121 try_to_wake_up+0x17c/0x8f0
> >> 
> >> static inline unsigned int cpumask_check(unsigned int cpu)
> >> {
> >> #ifdef CONFIG_DEBUG_PER_CPU_MAPS
> >> 	WARN_ON_ONCE(cpu >= nr_cpumask_bits);
> >> #endif /* CONFIG_DEBUG_PER_CPU_MAPS */
> >> 	return cpu;
> >> }
> >> 
> >> > [   99.686873] Modules linked in: ext4 jbd2 mbcache sg pseries_rng ghash_generic gf128mul xts vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp
> >> > [   99.686950] CPU: 1 PID: 2272 Comm: mount Not tainted 4.12.0-rc7.debug #28
> >> > [   99.686955] task: c0000003f00b7b00 task.stack: c0000003f25e0000
> >> > [   99.686959] NIP: c0000000001359ec LR: c000000000135ed4 CTR: c00000000016f940
> >> > [   99.686964] REGS: c0000003f25e3420 TRAP: 0700   Not tainted  (4.12.0-rc7.debug)
> >> > [   99.686968] MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>
> >> > [   99.686994]   CR: 28028822  XER: 00000001
> >> > [   99.687000] CFAR: c000000000135cb4 SOFTE: 0
> >> > [   99.687000] GPR00: c000000000135da0 c0000003f25e36a0 c000000001751800 00000000000000a0
> >> > [   99.687000] GPR04: 00000000000000a0 00000000000000c0 0000000000000000 0000000000000000
> >> > [   99.687000] GPR08: ffffffffffffffff 00000000000000a0 0000000000000000 00000000000041e0
> >> > [   99.687000] GPR12: 0000000000008800 c00000000fac0a80 0000000000000002 c0000003fd20b000
> >> > [   99.687000] GPR16: c0000003cabb0400 0000000000000000 0000000000000000 0000000000000002
> >> > [   99.687000] GPR20: 0000000000000000 c0000003f7a59d60 c000000001326300 c000000001795d00
> >> > [   99.687000] GPR24: c000000001799d48 0000000000000000 c00000000179a294 c0000003ec786be8
> >> > [   99.687000] GPR28: 0000000000000000 c0000003ec786680 00000000000000a0 c0000003ec786300
> >> > [   99.687083] NIP [c0000000001359ec] try_to_wake_up+0x17c/0x8f0
> >> > [   99.687088] LR [c000000000135ed4] try_to_wake_up+0x664/0x8f0
> >> > [   99.687092] Call Trace:
> >> > [   99.687095] [c0000003f25e36a0] [c000000000135da0] try_to_wake_up+0x530/0x8f0 (unreliable)
> >> > [   99.687104] [c0000003f25e3730] [c000000000114ea8] create_worker+0x148/0x220
> >> > [   99.687110] [c0000003f25e37d0] [c00000000011a418] alloc_unbound_pwq+0x4c8/0x620
> >> > [   99.687117] [c0000003f25e3830] [c00000000011a9c4] apply_wqattrs_prepare+0x1f4/0x340
> >> > [   99.687123] [c0000003f25e38a0] [c00000000011ab4c] apply_workqueue_attrs_locked+0x3c/0xa0
> >> > [   99.687130] [c0000003f25e38d0] [c00000000011b094] apply_workqueue_attrs+0x54/0x90
> >> > [   99.687137] [c0000003f25e3910] [c00000000011d674] __alloc_workqueue_key+0x184/0x5b0
> >> 
> >> We had a similar bug a few months back, caused by task->cpus_allowed
> >> being fubar.
> >> 
> >> This looks similar, but different.
> >> 
> >> Can you try this debug patch? It might get us one step closer to the culprit.
> >
> > [   69.039219] select_task_rq: CPU 160 out of range for task c0000003f0772780 (kworker/u321:0)
> > [   69.039312] p->cpus_allowed:
> > [   69.039317] CPU: 11 PID: 2230 Comm: mount Not tainted 4.12.0-rc7.debug+ #29
> > [   69.039322] Call Trace:
> > [   69.039328] [c0000003eee1b620] [c000000000a55f28] dump_stack+0xe8/0x154 (unreliable)
> > [   69.039338] [c0000003eee1b660] [c000000000135a2c] try_to_wake_up+0x1bc/0x940
> > [   69.039345] [c0000003eee1b730] [c000000000114ea8] create_worker+0x148/0x220
> > [   69.039352] [c0000003eee1b7d0] [c00000000011a418] alloc_unbound_pwq+0x4c8/0x620
> > [   69.039358] [c0000003eee1b830] [c00000000011a9c4] apply_wqattrs_prepare+0x1f4/0x340
> > [   69.039365] [c0000003eee1b8a0] [c00000000011ab4c] apply_workqueue_attrs_locked+0x3c/0xa0
> > [   69.039372] [c0000003eee1b8d0] [c00000000011b094] apply_workqueue_attrs+0x54/0x90
> > [   69.039378] [c0000003eee1b910] [c00000000011d674] __alloc_workqueue_key+0x184/0x5b0
> > [   69.039399] [c0000003eee1b9d0] [d0000000141f1768] ext4_fill_super+0x1c68/0x33e0 [ext4]
> > [   69.039406] [c0000003eee1bb10] [c00000000039101c] mount_bdev+0x22c/0x260
> > [   69.039425] [c0000003eee1bbb0] [d0000000141e9020] ext4_mount+0x20/0x40 [ext4]
> > [   69.039431] [c0000003eee1bbd0] [c000000000392464] mount_fs+0x74/0x210
> > [   69.039438] [c0000003eee1bc80] [c0000000003c0728] vfs_kern_mount+0x78/0x220
> > [   69.039444] [c0000003eee1bd00] [c0000000003c60e4] do_mount+0x254/0xf70
> > [   69.039451] [c0000003eee1bde0] [c0000000003c7224] SyS_mount+0x94/0x100
> > [   69.039458] [c0000003eee1be30] [c00000000000b190] system_call+0x38/0xe0
> > [   69.044301] EXT4-fs (sda5): mounted filesystem with ordered data mode. Opts: (null)
> >
> > I applied this patch on top of 4.12-rc7 kernel, built with debug options
> > enabled.
> 
> So the question is why does kworker/u321:0 have an empty task->cpus_allowed ?
> 
> It's late here, but can you try this as well?
> 
> cheers
> 

I have to update the patch a bit to make it compile.

> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index c74bf39ef764..da4e0f969239 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1780,9 +1780,14 @@ static struct worker *create_worker(struct worker_pool *pool)
>  	if (IS_ERR(worker->task))
>  		goto fail;
>  
> +	WARN_ON(cpumask_empty(worker->task->cpus_allowed));
                             /\
			      &  cpumask_empty expects a pointer
> +
>  	set_user_nice(worker->task, pool->attrs->nice);
>  	kthread_bind_mask(worker->task, pool->attrs->cpumask);
>  
> +	WARN_ON(cpumask_empty(worker->task->cpus_allowed));

same update to this WARN_ON.

> +	WARN_ON(cpumask_empty(pool->attrs->cpumask));

This is not changed.

> +
>  	/* successful, attach the worker to the pool */
>  	worker_attach_to_pool(worker, pool);
>  

Seems only the last two WARN_ON were triggered.
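
To make the sequence of checks easier to follow, here is a minimal userspace sketch in C. It is not kernel code: `struct cpumask`, `bind_mask()` and `run_checks()` are simplified stand-ins invented for illustration, and the 160-bit width is an assumption taken from the "CPU 160" message. It shows why the first check needs `&` (the mask is embedded in the task, and cpumask_empty takes a pointer), and how binding to an empty pool mask would make the last two checks fire while the first stays quiet.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_BITS  160                      /* assumed mask width */
#define NR_WORDS ((NR_BITS + 63) / 64)

struct cpumask { unsigned long long bits[NR_WORDS]; };

/* Like the kernel helper, this takes a pointer to the mask. */
static bool cpumask_empty(const struct cpumask *m)
{
	for (int i = 0; i < NR_WORDS; i++)
		if (m->bits[i])
			return false;
	return true;
}

static void cpumask_set_cpu(unsigned int cpu, struct cpumask *m)
{
	assert(cpu < NR_BITS);
	m->bits[cpu / 64] |= 1ULL << (cpu % 64);
}

/* The mask is embedded in the task, so callers must pass its address. */
struct task { struct cpumask cpus_allowed; };

/* Hypothetical stand-in for kthread_bind_mask(): copy the pool's mask. */
static void bind_mask(struct task *t, const struct cpumask *pool_mask)
{
	t->cpus_allowed = *pool_mask;
}

/* Returns a bitmap of which of the three checks would fire. */
static unsigned int run_checks(const struct cpumask *pool_mask)
{
	struct task t;
	unsigned int fired = 0;

	memset(&t, 0, sizeof(t));
	cpumask_set_cpu(0, &t.cpus_allowed);   /* assumed: new task may run somewhere */

	if (cpumask_empty(&t.cpus_allowed))    /* check 1, note the '&' */
		fired |= 1u << 0;
	bind_mask(&t, pool_mask);
	if (cpumask_empty(&t.cpus_allowed))    /* check 2 */
		fired |= 1u << 1;
	if (cpumask_empty(pool_mask))          /* check 3 */
		fired |= 1u << 2;
	return fired;
}
```

With an all-zero pool mask, `run_checks()` reports checks 2 and 3 only; with any CPU set in the pool mask it reports none.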

[   84.246263] ------------[ cut here ]------------
[   84.246287] WARNING: CPU: 0 PID: 2271 at kernel/workqueue.c:1788 create_worker+0x174/0x2c0
[   84.246292] Modules linked in: ext4 jbd2 mbcache sg pseries_rng ghash_generic gf128mul xts vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp
[   84.246340] CPU: 0 PID: 2271 Comm: mount Not tainted 4.12.0-rc7.debug+ #30
[   84.246345] task: c0000003f7eae680 task.stack: c0000003f4994000
[   84.246350] NIP: c000000000114ed4 LR: c000000000114ec4 CTR: c000000000134380
[   84.246354] REGS: c0000003f49974b0 TRAP: 0700   Not tainted  (4.12.0-rc7.debug+)
[   84.246358] MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>
[   84.246383]   CR: 28028888  XER: 00000001
[   84.246389] CFAR: c000000000581524 SOFTE: 1
[   84.246389] GPR00: c000000000114ea0 c0000003f4997730 c000000001751800 0000000000000001
[   84.246389] GPR04: 00000000000000a0 00000000000000c0 0000000000000000 00000000000c0063
[   84.246389] GPR08: ffffffffffffffff 0000000000000000 0000000000000000 0000000000000062
[   84.246389] GPR12: 0000000048028882 c00000000fac0000 0000000000000002 c0000003fd236000
[   84.246389] GPR16: c0000003ec8b0400 0000000000000000 0000000000000000 0000000000000002
[   84.246389] GPR20: 0000000000000000 c0000000fcd4f160 c0000003fd0387a0 c00000000179a294
[   84.246389] GPR24: c0000000fcd4f000 c00000000179ac70 c000000001935218 c0000003ef6df4a8
[   84.246389] GPR28: c0000003f4997790 00000000000000a0 c0000003fd25bc40 c0000003ef6df000
[   84.246474] NIP [c000000000114ed4] create_worker+0x174/0x2c0
[   84.246478] LR [c000000000114ec4] create_worker+0x164/0x2c0
[   84.246482] Call Trace:
[   84.246486] [c0000003f4997730] [c000000000114ea0] create_worker+0x140/0x2c0 (unreliable)
[   84.246494] [c0000003f49977d0] [c00000000011a4b8] alloc_unbound_pwq+0x4c8/0x620
[   84.246501] [c0000003f4997830] [c00000000011aa64] apply_wqattrs_prepare+0x1f4/0x340
[   84.246507] [c0000003f49978a0] [c00000000011abec] apply_workqueue_attrs_locked+0x3c/0xa0
[   84.246514] [c0000003f49978d0] [c00000000011b134] apply_workqueue_attrs+0x54/0x90
[   84.246521] [c0000003f4997910] [c00000000011d714] __alloc_workqueue_key+0x184/0x5b0
[   84.246539] [c0000003f49979d0] [d0000000149f1768] ext4_fill_super+0x1c68/0x33e0 [ext4]
[   84.246546] [c0000003f4997b10] [c0000000003910bc] mount_bdev+0x22c/0x260
[   84.246562] [c0000003f4997bb0] [d0000000149e9020] ext4_mount+0x20/0x40 [ext4]
[   84.246569] [c0000003f4997bd0] [c000000000392504] mount_fs+0x74/0x210
[   84.246575] [c0000003f4997c80] [c0000000003c07c8] vfs_kern_mount+0x78/0x220
[   84.246582] [c0000003f4997d00] [c0000000003c6184] do_mount+0x254/0xf70
[   84.246588] [c0000003f4997de0] [c0000000003c72c4] SyS_mount+0x94/0x100
[   84.246596] [c0000003f4997e30] [c00000000000b190] system_call+0x38/0xe0
[   84.246601] Instruction dump:
[   84.246606] 3d220005 39298a94 e87e0040 38a00000 83a90000 38630380 7fa4eb78 4846c6a9
[   84.246622] 60000000 7fa31a78 7c630074 7863d182 <0b030000> 3d420005 394a8a94 e93f04b8
[   84.246638] ---[ end trace ad05638ce2893be0 ]---
[   84.246643] ------------[ cut here ]------------
[   84.246648] WARNING: CPU: 0 PID: 2271 at kernel/workqueue.c:1789 create_worker+0x1a8/0x2c0
[   84.246652] Modules linked in: ext4 jbd2 mbcache sg pseries_rng ghash_generic gf128mul xts vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp
[   84.246693] CPU: 0 PID: 2271 Comm: mount Tainted: G        W       4.12.0-rc7.debug+ #30
[   84.246698] task: c0000003f7eae680 task.stack: c0000003f4994000
[   84.246702] NIP: c000000000114f08 LR: c000000000114ef8 CTR: c000000000134380
[   84.246706] REGS: c0000003f49974b0 TRAP: 0700   Tainted: G        W        (4.12.0-rc7.debug+)
[   84.246711] MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>
[   84.246734]   CR: 28028888  XER: 00000001
[   84.246739] CFAR: c000000000581524 SOFTE: 1
[   84.246739] GPR00: c000000000114ea0 c0000003f4997730 c000000001751800 0000000000000001
[   84.246739] GPR04: 00000000000000a0 00000000000000c0 0000000000000000 00000000000c0063
[   84.246739] GPR08: ffffffffffffffff 0000000000000000 0000000000000000 0000000000000062
[   84.246739] GPR12: 0000000048028882 c00000000fac0000 0000000000000002 c0000003fd236000
[   84.246739] GPR16: c0000003ec8b0400 0000000000000000 0000000000000000 0000000000000002
[   84.246739] GPR20: 0000000000000000 c0000000fcd4f160 c0000003fd0387a0 c00000000179a294
[   84.246739] GPR24: c0000000fcd4f000 c00000000179ac70 c000000001935218 c0000003ef6df4a8
[   84.246739] GPR28: c0000003f4997790 00000000000000a0 c0000003fd25bc40 c0000003ef6df000
[   84.246823] NIP [c000000000114f08] create_worker+0x1a8/0x2c0
[   84.246828] LR [c000000000114ef8] create_worker+0x198/0x2c0
[   84.246831] Call Trace:
[   84.246835] [c0000003f4997730] [c000000000114ea0] create_worker+0x140/0x2c0 (unreliable)
[   84.246843] [c0000003f49977d0] [c00000000011a4b8] alloc_unbound_pwq+0x4c8/0x620
[   84.246850] [c0000003f4997830] [c00000000011aa64] apply_wqattrs_prepare+0x1f4/0x340
[   84.246856] [c0000003f49978a0] [c00000000011abec] apply_workqueue_attrs_locked+0x3c/0xa0
[   84.246863] [c0000003f49978d0] [c00000000011b134] apply_workqueue_attrs+0x54/0x90
[   84.246869] [c0000003f4997910] [c00000000011d714] __alloc_workqueue_key+0x184/0x5b0
[   84.246885] [c0000003f49979d0] [d0000000149f1768] ext4_fill_super+0x1c68/0x33e0 [ext4]
[   84.246892] [c0000003f4997b10] [c0000000003910bc] mount_bdev+0x22c/0x260
[   84.246907] [c0000003f4997bb0] [d0000000149e9020] ext4_mount+0x20/0x40 [ext4]
[   84.246914] [c0000003f4997bd0] [c000000000392504] mount_fs+0x74/0x210
[   84.246920] [c0000003f4997c80] [c0000000003c07c8] vfs_kern_mount+0x78/0x220
[   84.246926] [c0000003f4997d00] [c0000000003c6184] do_mount+0x254/0xf70
[   84.246932] [c0000003f4997de0] [c0000000003c72c4] SyS_mount+0x94/0x100
[   84.246939] [c0000003f4997e30] [c00000000000b190] system_call+0x38/0xe0
[   84.246944] Instruction dump:
[   84.246949] 3d420005 394a8a94 e93f04b8 38a00000 83aa0000 e8690008 7fa4eb78 4846c675
[   84.246965] 60000000 7fa31a78 7c630074 7863d182 <0b030000> 7fe4fb78 7fc3f378 4bfffd75
[   84.246981] ---[ end trace ad05638ce2893be1 ]---
[   84.247090] select_task_rq: CPU 160 out of range for task c0000003efbb4980 (kworker/u321:0)
[   84.247243] p->cpus_allowed:
[   84.247248] CPU: 0 PID: 2271 Comm: mount Tainted: G        W       4.12.0-rc7.debug+ #30
[   84.247252] Call Trace:
[   84.247259] [c0000003f4997620] [c000000000a55fc8] dump_stack+0xe8/0x154 (unreliable)
[   84.247268] [c0000003f4997660] [c000000000135acc] try_to_wake_up+0x1bc/0x940
[   84.247275] [c0000003f4997730] [c000000000114f44] create_worker+0x1e4/0x2c0
[   84.247281] [c0000003f49977d0] [c00000000011a4b8] alloc_unbound_pwq+0x4c8/0x620
[   84.247288] [c0000003f4997830] [c00000000011aa64] apply_wqattrs_prepare+0x1f4/0x340
[   84.247295] [c0000003f49978a0] [c00000000011abec] apply_workqueue_attrs_locked+0x3c/0xa0
[   84.247301] [c0000003f49978d0] [c00000000011b134] apply_workqueue_attrs+0x54/0x90
[   84.247308] [c0000003f4997910] [c00000000011d714] __alloc_workqueue_key+0x184/0x5b0
[   84.247325] [c0000003f49979d0] [d0000000149f1768] ext4_fill_super+0x1c68/0x33e0 [ext4]
[   84.247332] [c0000003f4997b10] [c0000000003910bc] mount_bdev+0x22c/0x260
[   84.247348] [c0000003f4997bb0] [d0000000149e9020] ext4_mount+0x20/0x40 [ext4]
[   84.247354] [c0000003f4997bd0] [c000000000392504] mount_fs+0x74/0x210
[   84.247360] [c0000003f4997c80] [c0000000003c07c8] vfs_kern_mount+0x78/0x220
[   84.247367] [c0000003f4997d00] [c0000000003c6184] do_mount+0x254/0xf70
[   84.247373] [c0000003f4997de0] [c0000000003c72c4] SyS_mount+0x94/0x100
[   84.247380] [c0000003f4997e30] [c00000000000b190] system_call+0x38/0xe0
[   84.258971] EXT4-fs (sda5): mounted filesystem with ordered data mode. Opts: (null)
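
One plausible reading of the "CPU 160 out of range" line, sketched in userspace C: a first-set-bit search over an empty mask returns the mask size, and that value then fails the range check. `first_bit()` and `cpu_in_range()` are hypothetical helpers, and the 160-bit width is assumed from the log rather than read from the kernel config.

```c
#include <assert.h>
#include <limits.h>

#define NR_BITS       160   /* assumed: matches the "CPU 160" in the log */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NR_WORDS      ((NR_BITS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Return the index of the first set bit, or nbits when none is set. */
static unsigned int first_bit(const unsigned long *bits, unsigned int nbits)
{
	assert(nbits <= NR_BITS);
	for (unsigned int i = 0; i < nbits; i++)
		if (bits[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD)))
			return i;
	return nbits;
}

/* Range check in the spirit of cpumask_check(). */
static int cpu_in_range(unsigned int cpu)
{
	return cpu < NR_BITS;
}
```

Searching an all-zero array of `NR_WORDS` words yields 160 here, which `cpu_in_range()` rejects.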

Thanks,
Eryu

* Re: kworker with empty task->cpus_allowed (was Re: [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host)
  2017-06-29 13:59               ` Eryu Guan
@ 2017-06-29 14:24                 ` Tejun Heo
  2017-06-30  1:08                   ` Michael Ellerman
  2017-06-30 10:07                 ` Michael Ellerman
  1 sibling, 1 reply; 23+ messages in thread
From: Tejun Heo @ 2017-06-29 14:24 UTC (permalink / raw)
  To: Eryu Guan
  Cc: Michael Ellerman, Balbir Singh, liwan,
	open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)

Hello,

Could be the same problem as the one reported in the following thread.

 http://lkml.kernel.org/r/1497266622.15415.39.camel@abdul.in.ibm.com

The root cause there is ppc arch code not setting up possible cpu <->
numa mapping during boot.

Thanks.

-- 
tejun

* Re: kworker with empty task->cpus_allowed (was Re: [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host)
  2017-06-29 14:24                 ` Tejun Heo
@ 2017-06-30  1:08                   ` Michael Ellerman
  2017-06-30 11:56                     ` Tejun Heo
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Ellerman @ 2017-06-30  1:08 UTC (permalink / raw)
  To: Tejun Heo, Eryu Guan
  Cc: Balbir Singh, liwan, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)

Tejun Heo <tj@kernel.org> writes:

> Hello,
>
> Could be the same problem as the one reported in the following thread.
>
>  http://lkml.kernel.org/r/1497266622.15415.39.camel@abdul.in.ibm.com
>
> The root cause there is ppc arch code not setting up possible cpu <->
> numa mapping during boot.

Huh?

You changed the workqueue code to avoid that in 2186d9f940b6
("workqueue: move wq_numa_init() to workqueue_init()"), didn't you?

cheers

* Re: kworker with empty task->cpus_allowed (was Re: [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host)
  2017-06-29 13:59               ` Eryu Guan
  2017-06-29 14:24                 ` Tejun Heo
@ 2017-06-30 10:07                 ` Michael Ellerman
  2017-06-30 11:47                   ` Eryu Guan
  1 sibling, 1 reply; 23+ messages in thread
From: Michael Ellerman @ 2017-06-30 10:07 UTC (permalink / raw)
  To: Eryu Guan
  Cc: tj, Balbir Singh, liwan, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)

Eryu Guan <eguan@redhat.com> writes:
>
> I have to update the patch a bit to make it compile.

Sure.

>> +	WARN_ON(cpumask_empty(worker->task->cpus_allowed));
>> +	WARN_ON(cpumask_empty(pool->attrs->cpumask));
>
> Seems only the last two WARN_ON were triggered.

OK thanks.

Can you try this patch and see if it changes anything? (with the debug
still applied).

We've been trying to reproduce the bug here but haven't had any luck so far.

cheers

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 4640f6d64f8b..b310ecc07e00 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -733,6 +733,8 @@ void __init setup_per_cpu_areas(void)
 	for_each_possible_cpu(cpu) {
                 __per_cpu_offset[cpu] = delta + pcpu_unit_offsets[cpu];
 		paca[cpu].data_offset = __per_cpu_offset[cpu];
+
+		set_cpu_numa_node(cpu, numa_cpu_lookup_table[cpu]);
 	}
 }
 #endif
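
As a rough illustration of what the added line does, the sketch below models the per-CPU node value in plain C. The names `lookup_table`, `cpu_node`, `set_cpu_node()`, `get_cpu_node()` and `setup_per_cpu()` are stand-ins, the 16-CPU size and node numbers are illustrative, and the zero default before setup is an assumption about the unpatched state.

```c
#include <assert.h>

#define NR_CPUS_SKETCH 16

static int lookup_table[NR_CPUS_SKETCH];   /* stands in for numa_cpu_lookup_table */
static int cpu_node[NR_CPUS_SKETCH];       /* per-CPU node value, zero until set */

static void set_cpu_node(int cpu, int node)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	cpu_node[cpu] = node;
}

static int get_cpu_node(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	return cpu_node[cpu];
}

/* Illustrative early-boot table: CPUs 0-7 on node 0, CPUs 8-15 on node 2. */
static void fill_lookup_table(void)
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		lookup_table[cpu] = cpu < 8 ? 0 : 2;
}

/* Models the loop the patch extends: copy the table into per-CPU state. */
static void setup_per_cpu(void)
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		set_cpu_node(cpu, lookup_table[cpu]);
}
```

In this model every CPU reports node 0 until `setup_per_cpu()` runs, after which CPU 8 reports node 2.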

* Re: kworker with empty task->cpus_allowed (was Re: [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host)
  2017-06-30 10:07                 ` Michael Ellerman
@ 2017-06-30 11:47                   ` Eryu Guan
  2017-07-04  6:26                     ` Michael Ellerman
  0 siblings, 1 reply; 23+ messages in thread
From: Eryu Guan @ 2017-06-30 11:47 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: tj, Balbir Singh, liwan, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)

On Fri, Jun 30, 2017 at 08:07:02PM +1000, Michael Ellerman wrote:
> Eryu Guan <eguan@redhat.com> writes:
> >
> > I have to update the patch a bit to make it compile.
> 
> Sure.
> 
> >> +	WARN_ON(cpumask_empty(worker->task->cpus_allowed));
> >> +	WARN_ON(cpumask_empty(pool->attrs->cpumask));
> >
> > Seems only the last two WARN_ON were triggered.
> 
> OK thanks.
> 
> Can you try this patch and see if it changes anything? (with the debug
> still applied).

This patch fixes the crash for me. After applying this patch (with all
other debug patches still applied), the kernel didn't print any warnings,
call traces, or debug messages.

> 
> We've been trying to reproduce the bug here but haven't had any luck so far.

I'm using this reproducer:
for i in `seq 5`; do
	mkfs -t ext4 -F /dev/sda5 && sleep 3 && mount /dev/sda5 /mnt/ext4 && umount /dev/sda5
done

Thanks,
Eryu

> 
> cheers
> 
> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> index 4640f6d64f8b..b310ecc07e00 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -733,6 +733,8 @@ void __init setup_per_cpu_areas(void)
>  	for_each_possible_cpu(cpu) {
>                  __per_cpu_offset[cpu] = delta + pcpu_unit_offsets[cpu];
>  		paca[cpu].data_offset = __per_cpu_offset[cpu];
> +
> +		set_cpu_numa_node(cpu, numa_cpu_lookup_table[cpu]);
>  	}
>  }
>  #endif

* Re: kworker with empty task->cpus_allowed (was Re: [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host)
  2017-06-30  1:08                   ` Michael Ellerman
@ 2017-06-30 11:56                     ` Tejun Heo
  0 siblings, 0 replies; 23+ messages in thread
From: Tejun Heo @ 2017-06-30 11:56 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Eryu Guan, Balbir Singh, liwan,
	open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)

Hello, Michael.

On Fri, Jun 30, 2017 at 11:08:22AM +1000, Michael Ellerman wrote:
> Tejun Heo <tj@kernel.org> writes:
> 
> > Could be the same problem as the one reported in the following thread.
> >
> >  http://lkml.kernel.org/r/1497266622.15415.39.camel@abdul.in.ibm.com
> >
> > The root cause there is ppc arch code not setting up possible cpu <->
> > numa mapping during boot.
> 
> Huh?
> 
> You changed the workqueue code to avoid that in 2186d9f940b6
> ("workqueue: move wq_numa_init() to workqueue_init()"), didn't you?

That was a different issue.  This one is cpu <-> numa node mapping not
being stable across cpu hotplug.

Thanks.

-- 
tejun

* Re: kworker with empty task->cpus_allowed (was Re: [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host)
  2017-06-30 11:47                   ` Eryu Guan
@ 2017-07-04  6:26                     ` Michael Ellerman
  2017-07-04  8:21                       ` Eryu Guan
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Ellerman @ 2017-07-04  6:26 UTC (permalink / raw)
  To: Eryu Guan
  Cc: tj, Balbir Singh, liwan, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)

Eryu Guan <eguan@redhat.com> writes:
> On Fri, Jun 30, 2017 at 08:07:02PM +1000, Michael Ellerman wrote:
>> 
>> Can you try this patch and see if it changes anything? (with the debug
>> still applied).
>
> This patch fixes the crash for me. After applying this patch (with all
> other debug patches still applied), the kernel didn't print any warnings,
> call traces, or debug messages.

OK. It's not meant to fix it :)

I can't form any connection between your bisection result and that
patch, nothing is making any sense TBH.

What hardware are you on? And are you doing CPU hotplug or anything like that?

Can you back out the last patch I sent and try this?

cheers


diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c74bf39ef764..8ec3841f9689 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3338,6 +3338,8 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
 
 	lockdep_assert_held(&wq_pool_mutex);
 
+	WARN_ON(cpumask_empty(attrs->cpumask));
+
 	/* do we already have a matching pool? */
 	hash_for_each_possible(unbound_pool_hash, pool, hash_node, hash) {
 		if (wqattrs_equal(pool->attrs, attrs)) {
@@ -3366,6 +3368,8 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
 	copy_workqueue_attrs(pool->attrs, attrs);
 	pool->node = target_node;
 
+	WARN_ON(cpumask_empty(pool->attrs->cpumask));
+
 	/*
 	 * no_numa isn't a worker_pool attribute, always clear it.  See
 	 * 'struct workqueue_attrs' comments for detail.
@@ -5494,6 +5498,7 @@ static void __init wq_numa_init(void)
 
 	for_each_possible_cpu(cpu) {
 		node = cpu_to_node(cpu);
+		printk("%s: setting cpu %d on node %d present? %d\n", __func__, cpu, node, cpu_present(cpu));
 		if (WARN_ON(node == NUMA_NO_NODE)) {
 			pr_warn("workqueue: NUMA node mapping not available for cpu%d, disabling NUMA support\n", cpu);
 			/* happens iff arch is bonkers, let's just proceed */
@@ -5502,6 +5507,16 @@ static void __init wq_numa_init(void)
 		cpumask_set_cpu(cpu, tbl[node]);
 	}
 
+	for_each_possible_cpu(cpu) {
+		struct worker_pool *pool;
+
+		for_each_cpu_worker_pool(pool, cpu) {
+			if (cpumask_empty(pool->attrs->cpumask))
+				printk("%s: cpumask EMPTY! for pool %p on cpu %d\n", __func__, pool, cpu);
+			printk("%s: pool %p on cpu %d node = %d\n", __func__, pool, cpu, pool->node);
+		}
+	}
+
 	wq_numa_possible_cpumask = tbl;
 	wq_numa_enabled = true;
 }

* Re: kworker with empty task->cpus_allowed (was Re: [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host)
  2017-07-04  6:26                     ` Michael Ellerman
@ 2017-07-04  8:21                       ` Eryu Guan
  2017-07-04 11:06                         ` Michael Ellerman
  0 siblings, 1 reply; 23+ messages in thread
From: Eryu Guan @ 2017-07-04  8:21 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: tj, Balbir Singh, liwan, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)

[-- Attachment #1: Type: text/plain, Size: 17491 bytes --]

On Tue, Jul 04, 2017 at 04:26:11PM +1000, Michael Ellerman wrote:
> Eryu Guan <eguan@redhat.com> writes:
> > On Fri, Jun 30, 2017 at 08:07:02PM +1000, Michael Ellerman wrote:
> >> 
> >> Can you try this patch and see if it changes anything? (with the debug
> >> still applied).
> >
> > This patch fixes the crash for me. After applying this patch (with all
> > other debug patches still applied), the kernel didn't print any warnings,
> > call traces, or debug messages.
> 
> OK. It's not meant to fix it :)

Understood.

> 
> I can't form any connection between your bisection result and that
> patch, nothing is making any sense TBH.
> 
> What hardware are you on? And are you doing CPU hotplug or anything like that?

It's a "PowerVM" guest (I'm not familiar with powerpc, so I don't know
exactly what that means) running on a Power8 host. I didn't do any CPU
hotplug or anything like that.

lscpu output:
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    8
Core(s) per socket:    1
Socket(s):             2
NUMA node(s):          3
Model:                 2.1 (pvr 004b 0201)
Model name:            POWER8 (architected), altivec supported
Hypervisor vendor:     (null)
Virtualization type:   full
L1d cache:             64K
L1i cache:             32K
NUMA node0 CPU(s):     0-7
NUMA node2 CPU(s):     8-15
NUMA node3 CPU(s):
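
For reference, the CPU-to-node layout above can be inverted into per-node masks with a few lines of C. This is only a model of the lscpu output (CPUs 0-7 on node 0, 8-15 on node 2, none on node 3); `node_of_cpu()` and `build_node_masks()` are hypothetical helpers, and whether the CPU-less node is related to the empty cpumask is not established here.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH  16
#define NR_NODES_SKETCH 4   /* node ids 0..3; lscpu lists 0, 2 and 3 */

/* CPU-to-node mapping as reported by lscpu. */
static int node_of_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	return cpu < 8 ? 0 : 2;
}

/* Build one bitmask of CPUs per node; bit N set means CPU N is on the node. */
static void build_node_masks(uint32_t masks[NR_NODES_SKETCH])
{
	for (int node = 0; node < NR_NODES_SKETCH; node++)
		masks[node] = 0;
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		masks[node_of_cpu(cpu)] |= UINT32_C(1) << cpu;
}
```

In this model node 0 ends up with mask 0x00ff, node 2 with 0xff00, and node 3 with an empty mask, so anything that restricts itself to node 3's CPUs has nothing to run on.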

> 
> Can you back out the last patch I sent and try this?

I've appended the call traces from the test below, and attached the full
dmesg log, which includes the boot log.

[   74.410871] ------------[ cut here ]------------
[   74.410895] WARNING: CPU: 0 PID: 2378 at kernel/workqueue.c:3346 alloc_unbound_pwq+0x320/0x690
[   74.410901] Modules linked in: ext4 jbd2 mbcache sg pseries_rng ghash_generic gf128mul xts vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp
[   74.410949] CPU: 0 PID: 2378 Comm: mount Not tainted 4.12.0.debug+ #35
[   74.410954] task: c0000003f0447280 task.stack: c0000003f039c000
[   74.410959] NIP: c00000000011a310 LR: c00000000011a300 CTR: c00000000011a1e4
[   74.410963] REGS: c0000003f039f550 TRAP: 0700   Not tainted  (4.12.0.debug+)
[   74.410968] MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>
[   74.410993]   CR: 24028888  XER: 00000001
[   74.410998] CFAR: c000000000581584 SOFTE: 1
[   74.410998] GPR00: c00000000011a590 c0000003f039f7d0 c000000001751800 0000000000000001
[   74.410998] GPR04: 00000000000000a0 00000000000000c0 0000000000000000 0000000000000000
[   74.410998] GPR08: ffffffffffffffff 0000000000000000 0000000000000000 0000000000000030
[   74.410998] GPR12: 0000000000000001 c00000000fac0000 0000000000000002 c0000003fd237000
[   74.410998] GPR16: c0000003d1a10400 0000000000000000 0000000000000000 0000000000000002
[   74.410998] GPR20: 0000000000000000 c0000003cb7ac560 c0000003fd0387a0 c00000000179a294
[   74.410998] GPR24: c0000003cb7ac400 c0000003f02349c0 00000000000000a0 c0000003f0234a00
[   74.410998] GPR28: 000000006ca6897b c0000003cb7ac400 c00000000179a294 0000000000000000
[   74.411082] NIP [c00000000011a310] alloc_unbound_pwq+0x320/0x690
[   74.411087] LR [c00000000011a300] alloc_unbound_pwq+0x310/0x690
[   74.411091] Call Trace:
[   74.411095] [c0000003f039f7d0] [c00000000011a590] alloc_unbound_pwq+0x5a0/0x690 (unreliable)
[   74.411103] [c0000003f039f830] [c00000000011aad4] apply_wqattrs_prepare+0x1f4/0x340
[   74.411113] [c0000003f039f8a0] [c00000000011ac5c] apply_workqueue_attrs_locked+0x3c/0xa0
[   74.411120] [c0000003f039f8d0] [c00000000011b1a4] apply_workqueue_attrs+0x54/0x90
[   74.411127] [c0000003f039f910] [c00000000011d774] __alloc_workqueue_key+0x184/0x5b0
[   74.411145] [c0000003f039f9d0] [d000000015211768] ext4_fill_super+0x1c68/0x33e0 [ext4]
[   74.411152] [c0000003f039fb10] [c0000000003910fc] mount_bdev+0x22c/0x260
[   74.411168] [c0000003f039fbb0] [d000000015209020] ext4_mount+0x20/0x40 [ext4]
[   74.411174] [c0000003f039fbd0] [c000000000392544] mount_fs+0x74/0x210
[   74.411181] [c0000003f039fc80] [c0000000003c0808] vfs_kern_mount+0x78/0x220
[   74.411188] [c0000003f039fd00] [c0000000003c61c4] do_mount+0x254/0xf70
[   74.411194] [c0000003f039fde0] [c0000000003c7304] SyS_mount+0x94/0x100
[   74.411201] [c0000003f039fe30] [c00000000000b190] system_call+0x38/0xe0
[   74.411206] Instruction dump:
[   74.411211] 554ac03e 7f8ae050 7b9c0020 2fac0000 409e0290 7f44d378 38a00000 484672cd
[   74.411227] 60000000 7c63d278 7c630074 7863d182 <0b030000> 3ca061c8 3f42001e 60a58647
[   74.411243] ---[ end trace b720011b125c3341 ]---
[   74.411253] ------------[ cut here ]------------
[   74.411258] WARNING: CPU: 0 PID: 2378 at kernel/workqueue.c:3376 alloc_unbound_pwq+0x4b0/0x690
[   74.411262] Modules linked in: ext4 jbd2 mbcache sg pseries_rng ghash_generic gf128mul xts vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp
[   74.411303] CPU: 0 PID: 2378 Comm: mount Tainted: G        W       4.12.0.debug+ #35
[   74.411307] task: c0000003f0447280 task.stack: c0000003f039c000
[   74.411312] NIP: c00000000011a4a0 LR: c00000000011a490 CTR: 0000000000000000
[   74.411316] REGS: c0000003f039f550 TRAP: 0700   Tainted: G        W        (4.12.0.debug+)
[   74.411320] MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>
[   74.411343]   CR: 28028888  XER: 20000001
[   74.411348] CFAR: c000000000581584 SOFTE: 1
[   74.411348] GPR00: c00000000011a474 c0000003f039f7d0 c000000001751800 0000000000000001
[   74.411348] GPR04: 00000000000000a0 00000000000000c0 0000000000000000 0000000000000000
[   74.411348] GPR08: ffffffffffffffff 0000000000000000 0000000000000000 0000000000000000
[   74.411348] GPR12: 0000000000008800 c00000000fac0000 0000000000000002 c0000003fd237000
[   74.411348] GPR16: c0000003d1a10400 0000000000000000 0000000000000000 0000000000000002
[   74.411348] GPR20: 0000000000000000 c0000003cb7ac560 c0000003fd0387a0 c00000000179a294
[   74.411348] GPR24: c0000003cb7ac400 c0000003f11ba800 c000000001935218 c0000003f0234a00
[   74.411348] GPR28: 00000000000000a0 c0000003cb7ac400 c00000000179a294 00000000000000a0
[   74.411431] NIP [c00000000011a4a0] alloc_unbound_pwq+0x4b0/0x690
[   74.411436] LR [c00000000011a490] alloc_unbound_pwq+0x4a0/0x690
[   74.411440] Call Trace:
[   74.411444] [c0000003f039f7d0] [c00000000011a474] alloc_unbound_pwq+0x484/0x690 (unreliable)
[   74.411452] [c0000003f039f830] [c00000000011aad4] apply_wqattrs_prepare+0x1f4/0x340
[   74.411459] [c0000003f039f8a0] [c00000000011ac5c] apply_workqueue_attrs_locked+0x3c/0xa0
[   74.411465] [c0000003f039f8d0] [c00000000011b1a4] apply_workqueue_attrs+0x54/0x90
[   74.411472] [c0000003f039f910] [c00000000011d774] __alloc_workqueue_key+0x184/0x5b0
[   74.411488] [c0000003f039f9d0] [d000000015211768] ext4_fill_super+0x1c68/0x33e0 [ext4]
[   74.411494] [c0000003f039fb10] [c0000000003910fc] mount_bdev+0x22c/0x260
[   74.411510] [c0000003f039fbb0] [d000000015209020] ext4_mount+0x20/0x40 [ext4]
[   74.411516] [c0000003f039fbd0] [c000000000392544] mount_fs+0x74/0x210
[   74.411523] [c0000003f039fc80] [c0000000003c0808] vfs_kern_mount+0x78/0x220
[   74.411529] [c0000003f039fd00] [c0000000003c61c4] do_mount+0x254/0xf70
[   74.411535] [c0000003f039fde0] [c0000000003c7304] SyS_mount+0x94/0x100
[   74.411542] [c0000003f039fe30] [c00000000000b190] system_call+0x38/0xe0
[   74.411547] Instruction dump:
[   74.411552] 4bffa3b9 93f9004c e93904b8 83fe0000 38a00000 7fe4fb78 e8690008 4846713d
[   74.411567] 60000000 7fe31a78 7c630074 7863d182 <0b030000> e93904b8 39400000 7f23cb78
[   74.411584] ---[ end trace b720011b125c3342 ]---
[   74.411704] ------------[ cut here ]------------
[   74.411710] WARNING: CPU: 0 PID: 2378 at kernel/workqueue.c:1788 create_worker+0x174/0x2c0
[   74.411714] Modules linked in: ext4 jbd2 mbcache sg pseries_rng ghash_generic gf128mul xts vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp
[   74.411755] CPU: 0 PID: 2378 Comm: mount Tainted: G        W       4.12.0.debug+ #35
[   74.411759] task: c0000003f0447280 task.stack: c0000003f039c000
[   74.411763] NIP: c000000000114ed4 LR: c000000000114ec4 CTR: c0000000001343e0
[   74.411768] REGS: c0000003f039f4b0 TRAP: 0700   Tainted: G        W        (4.12.0.debug+)
[   74.411772] MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>
[   74.411795]   CR: 28028888  XER: 00000001
[   74.411801] CFAR: c000000000581584 SOFTE: 1
[   74.411801] GPR00: c000000000114ea0 c0000003f039f730 c000000001751800 0000000000000001
[   74.411801] GPR04: 00000000000000a0 00000000000000c0 0000000000000000 00000000000c0063
[   74.411801] GPR08: ffffffffffffffff 0000000000000000 0000000000000000 0000000000000062
[   74.411801] GPR12: 0000000048028882 c00000000fac0000 0000000000000002 c0000003fd237000
[   74.411801] GPR16: c0000003d1a10400 0000000000000000 0000000000000000 0000000000000002
[   74.411801] GPR20: 0000000000000000 c0000003cb7ac560 c0000003fd0387a0 c00000000179a294
[   74.411801] GPR24: c0000003cb7ac400 c0000003f11ba800 c000000001935218 c0000003f11baca8
[   74.411801] GPR28: c0000003f039f790 00000000000000a0 c0000003fd25c000 c0000003f11ba800
[   74.411884] NIP [c000000000114ed4] create_worker+0x174/0x2c0
[   74.411888] LR [c000000000114ec4] create_worker+0x164/0x2c0
[   74.411892] Call Trace:
[   74.411895] [c0000003f039f730] [c000000000114ea0] create_worker+0x140/0x2c0 (unreliable)
[   74.411903] [c0000003f039f7d0] [c00000000011a508] alloc_unbound_pwq+0x518/0x690
[   74.411910] [c0000003f039f830] [c00000000011aad4] apply_wqattrs_prepare+0x1f4/0x340
[   74.411916] [c0000003f039f8a0] [c00000000011ac5c] apply_workqueue_attrs_locked+0x3c/0xa0
[   74.411923] [c0000003f039f8d0] [c00000000011b1a4] apply_workqueue_attrs+0x54/0x90
[   74.411929] [c0000003f039f910] [c00000000011d774] __alloc_workqueue_key+0x184/0x5b0
[   74.411946] [c0000003f039f9d0] [d000000015211768] ext4_fill_super+0x1c68/0x33e0 [ext4]
[   74.411952] [c0000003f039fb10] [c0000000003910fc] mount_bdev+0x22c/0x260
[   74.411968] [c0000003f039fbb0] [d000000015209020] ext4_mount+0x20/0x40 [ext4]
[   74.411974] [c0000003f039fbd0] [c000000000392544] mount_fs+0x74/0x210
[   74.411980] [c0000003f039fc80] [c0000000003c0808] vfs_kern_mount+0x78/0x220
[   74.411986] [c0000003f039fd00] [c0000000003c61c4] do_mount+0x254/0xf70
[   74.411993] [c0000003f039fde0] [c0000000003c7304] SyS_mount+0x94/0x100
[   74.411999] [c0000003f039fe30] [c00000000000b190] system_call+0x38/0xe0
[   74.412004] Instruction dump:
[   74.412009] 3d220005 39298a94 e87e0040 38a00000 83a90000 38630380 7fa4eb78 4846c709
[   74.412025] 60000000 7fa31a78 7c630074 7863d182 <0b030000> 3d420005 394a8a94 e93f04b8
[   74.412041] ---[ end trace b720011b125c3343 ]---
[   74.412046] ------------[ cut here ]------------
[   74.412051] WARNING: CPU: 0 PID: 2378 at kernel/workqueue.c:1789 create_worker+0x1a8/0x2c0
[   74.412055] Modules linked in: ext4 jbd2 mbcache sg pseries_rng ghash_generic gf128mul xts vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp
[   74.412095] CPU: 0 PID: 2378 Comm: mount Tainted: G        W       4.12.0.debug+ #35
[   74.412099] task: c0000003f0447280 task.stack: c0000003f039c000
[   74.412103] NIP: c000000000114f08 LR: c000000000114ef8 CTR: c0000000001343e0
[   74.412108] REGS: c0000003f039f4b0 TRAP: 0700   Tainted: G        W        (4.12.0.debug+)
[   74.412144] MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>
[   74.412167]   CR: 28028888  XER: 00000001
[   74.412172] CFAR: c000000000581584 SOFTE: 1
[   74.412172] GPR00: c000000000114ea0 c0000003f039f730 c000000001751800 0000000000000001
[   74.412172] GPR04: 00000000000000a0 00000000000000c0 0000000000000000 00000000000c0063
[   74.412172] GPR08: ffffffffffffffff 0000000000000000 0000000000000000 0000000000000062
[   74.412172] GPR12: 0000000048028882 c00000000fac0000 0000000000000002 c0000003fd237000
[   74.412172] GPR16: c0000003d1a10400 0000000000000000 0000000000000000 0000000000000002
[   74.412172] GPR20: 0000000000000000 c0000003cb7ac560 c0000003fd0387a0 c00000000179a294
[   74.412172] GPR24: c0000003cb7ac400 c0000003f11ba800 c000000001935218 c0000003f11baca8
[   74.412172] GPR28: c0000003f039f790 00000000000000a0 c0000003fd25c000 c0000003f11ba800
[   74.412255] NIP [c000000000114f08] create_worker+0x1a8/0x2c0
[   74.412259] LR [c000000000114ef8] create_worker+0x198/0x2c0
[   74.412263] Call Trace:
[   74.412267] [c0000003f039f730] [c000000000114ea0] create_worker+0x140/0x2c0 (unreliable)
[   74.412275] [c0000003f039f7d0] [c00000000011a508] alloc_unbound_pwq+0x518/0x690
[   74.412281] [c0000003f039f830] [c00000000011aad4] apply_wqattrs_prepare+0x1f4/0x340
[   74.412288] [c0000003f039f8a0] [c00000000011ac5c] apply_workqueue_attrs_locked+0x3c/0xa0
[   74.412294] [c0000003f039f8d0] [c00000000011b1a4] apply_workqueue_attrs+0x54/0x90
[   74.412301] [c0000003f039f910] [c00000000011d774] __alloc_workqueue_key+0x184/0x5b0
[   74.412317] [c0000003f039f9d0] [d000000015211768] ext4_fill_super+0x1c68/0x33e0 [ext4]
[   74.412323] [c0000003f039fb10] [c0000000003910fc] mount_bdev+0x22c/0x260
[   74.412339] [c0000003f039fbb0] [d000000015209020] ext4_mount+0x20/0x40 [ext4]
[   74.412345] [c0000003f039fbd0] [c000000000392544] mount_fs+0x74/0x210
[   74.412352] [c0000003f039fc80] [c0000000003c0808] vfs_kern_mount+0x78/0x220
[   74.412358] [c0000003f039fd00] [c0000000003c61c4] do_mount+0x254/0xf70
[   74.412364] [c0000003f039fde0] [c0000000003c7304] SyS_mount+0x94/0x100
[   74.412371] [c0000003f039fe30] [c00000000000b190] system_call+0x38/0xe0
[   74.412376] Instruction dump:
[   74.412380] 3d420005 394a8a94 e93f04b8 38a00000 83aa0000 e8690008 7fa4eb78 4846c6d5
[   74.412396] 60000000 7fa31a78 7c630074 7863d182 <0b030000> 7fe4fb78 7fc3f378 4bfffd75
[   74.412412] ---[ end trace b720011b125c3344 ]---
[   74.412524] select_task_rq: CPU 160 out of range for task c0000003f1500000 (kworker/u321:0)
[   74.412612] p->cpus_allowed:
[   74.412616] CPU: 0 PID: 2378 Comm: mount Tainted: G        W       4.12.0.debug+ #35
[   74.412620] Call Trace:
[   74.412625] [c0000003f039f620] [c000000000a562a8] dump_stack+0xe8/0x154 (unreliable)
[   74.412635] [c0000003f039f660] [c000000000135b2c] try_to_wake_up+0x1bc/0x940
[   74.412641] [c0000003f039f730] [c000000000114f44] create_worker+0x1e4/0x2c0
[   74.412647] [c0000003f039f7d0] [c00000000011a508] alloc_unbound_pwq+0x518/0x690
[   74.412654] [c0000003f039f830] [c00000000011aad4] apply_wqattrs_prepare+0x1f4/0x340
[   74.412660] [c0000003f039f8a0] [c00000000011ac5c] apply_workqueue_attrs_locked+0x3c/0xa0
[   74.412667] [c0000003f039f8d0] [c00000000011b1a4] apply_workqueue_attrs+0x54/0x90
[   74.412673] [c0000003f039f910] [c00000000011d774] __alloc_workqueue_key+0x184/0x5b0
[   74.412689] [c0000003f039f9d0] [d000000015211768] ext4_fill_super+0x1c68/0x33e0 [ext4]
[   74.412700] [c0000003f039fb10] [c0000000003910fc] mount_bdev+0x22c/0x260
[   74.412715] [c0000003f039fbb0] [d000000015209020] ext4_mount+0x20/0x40 [ext4]
[   74.412722] [c0000003f039fbd0] [c000000000392544] mount_fs+0x74/0x210
[   74.412728] [c0000003f039fc80] [c0000000003c0808] vfs_kern_mount+0x78/0x220
[   74.412734] [c0000003f039fd00] [c0000000003c61c4] do_mount+0x254/0xf70
[   74.412740] [c0000003f039fde0] [c0000000003c7304] SyS_mount+0x94/0x100
[   74.412749] [c0000003f039fe30] [c00000000000b190] system_call+0x38/0xe0
[   74.420022] EXT4-fs (sda5): mounted filesystem with ordered data mode. Opts: (null)

Thanks,
Eryu
> 
> cheers
> 
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index c74bf39ef764..8ec3841f9689 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -3338,6 +3338,8 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
>  
>  	lockdep_assert_held(&wq_pool_mutex);
>  
> +	WARN_ON(cpumask_empty(attrs->cpumask));
> +
>  	/* do we already have a matching pool? */
>  	hash_for_each_possible(unbound_pool_hash, pool, hash_node, hash) {
>  		if (wqattrs_equal(pool->attrs, attrs)) {
> @@ -3366,6 +3368,8 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
>  	copy_workqueue_attrs(pool->attrs, attrs);
>  	pool->node = target_node;
>  
> +	WARN_ON(cpumask_empty(pool->attrs->cpumask));
> +
>  	/*
>  	 * no_numa isn't a worker_pool attribute, always clear it.  See
>  	 * 'struct workqueue_attrs' comments for detail.
> @@ -5494,6 +5498,7 @@ static void __init wq_numa_init(void)
>  
>  	for_each_possible_cpu(cpu) {
>  		node = cpu_to_node(cpu);
> +		printk("%s: setting cpu %d on node %d present? %d\n", __func__, cpu, node, cpu_present(cpu));
>  		if (WARN_ON(node == NUMA_NO_NODE)) {
>  			pr_warn("workqueue: NUMA node mapping not available for cpu%d, disabling NUMA support\n", cpu);
>  			/* happens iff arch is bonkers, let's just proceed */
> @@ -5502,6 +5507,16 @@ static void __init wq_numa_init(void)
>  		cpumask_set_cpu(cpu, tbl[node]);
>  	}
>  
> +	for_each_possible_cpu(cpu) {
> +		struct worker_pool *pool;
> +
> +		for_each_cpu_worker_pool(pool, cpu) {
> +			if (cpumask_empty(pool->attrs->cpumask))
> +				printk("%s: cpumask EMPTY! for pool %p on cpu %d\n", __func__, pool, cpu);
> +			printk("%s: pool %p on cpu %d node = %d\n", __func__, pool, cpu, pool->node);
> +		}
> +	}
> +
>  	wq_numa_possible_cpumask = tbl;
>  	wq_numa_enabled = true;
>  }

[-- Attachment #2: dmesg.log.bz2 --]
[-- Type: application/x-bzip2, Size: 15301 bytes --]


* Re: kworker with empty task->cpus_allowed (was Re: [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host)
  2017-07-04  8:21                       ` Eryu Guan
@ 2017-07-04 11:06                         ` Michael Ellerman
  2017-07-04 12:12                           ` Eryu Guan
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Ellerman @ 2017-07-04 11:06 UTC (permalink / raw)
  To: Eryu Guan
  Cc: tj, Balbir Singh, liwan, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)

Eryu Guan <eguan@redhat.com> writes:

> On Tue, Jul 04, 2017 at 04:26:11PM +1000, Michael Ellerman wrote:
>> Eryu Guan <eguan@redhat.com> writes:
>> > On Fri, Jun 30, 2017 at 08:07:02PM +1000, Michael Ellerman wrote:
>> >> 
>> >> Can you try this patch and see if it changes anything? (with the debug
>> >> still applied).
>> >
>> > This patch fixes the crash for me. After applying this patch (with all
>> > other debug patches still applied), kernel didn't print any warnings or
>> > calltraces or debug messages.
>> 
>> OK. It's not meant to fix it :)
>
> Understood.
>
>> 
>> I can't form any connection between your bisection result and that
>> patch, nothing is making any sense TBH.
>> 
>> What hardware are you on? And are you doing CPU hotplug or anything like that?
>
> It's a "PowerVM" guest (I'm not familiar with powerpc, so I don't know
> what that means) running on a Power8 host. I didn't do any CPU hotplug or
> anything like that.

OK thanks.

We might have to try to sync up on IRC so we can debug this a bit faster.

Can you try this hunk also?

cheers

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c74bf39ef764..7c55721b1f1d 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3902,6 +3906,7 @@ static int alloc_and_link_pwqs(struct workqueue_struct *wq)
 		     "ordering guarantee broken for workqueue %s\n", wq->name);
 		return ret;
 	} else {
+		WARN_ON(cpumask_empty(unbound_std_wq_attrs[highpri]->cpumask));
 		return apply_workqueue_attrs(wq, unbound_std_wq_attrs[highpri]);
 	}
 }
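
For readers following the debug hunks: the WARN_ON(cpumask_empty(...))
checks are there to catch a CPU mask with no bits set before it reaches a
worker. The user-space sketch below is only a model of that idea, not the
kernel implementation. The model_* names and the value 160 for
MODEL_NR_CPU_IDS are assumptions made for illustration (160 is picked to
match the "CPU 160 out of range" line in the log; the guest's real
nr_cpu_ids is not stated in this thread). It shows why a "first CPU"
search over an empty mask can yield an index one past the last valid CPU.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for nr_cpu_ids; 160 is chosen only to match the log. */
#define MODEL_NR_CPU_IDS 160u
#define MODEL_WORDS ((MODEL_NR_CPU_IDS + 63u) / 64u)

/* Simplified model of a CPU mask: one bit per possible CPU. */
struct model_cpumask {
	uint64_t bits[MODEL_WORDS];
};

/* Mark one CPU as allowed; out-of-range indices are ignored. */
static void model_cpumask_set_cpu(unsigned int cpu, struct model_cpumask *m)
{
	if (cpu < MODEL_NR_CPU_IDS)
		m->bits[cpu / 64u] |= (uint64_t)1 << (cpu % 64u);
}

/*
 * Return the lowest set CPU, or MODEL_NR_CPU_IDS when no bit is set.
 * A caller that uses the result without a range check would then try
 * to run on a CPU that does not exist.
 */
static unsigned int model_cpumask_first(const struct model_cpumask *m)
{
	for (unsigned int cpu = 0; cpu < MODEL_NR_CPU_IDS; cpu++)
		if (m->bits[cpu / 64u] & ((uint64_t)1 << (cpu % 64u)))
			return cpu;
	return MODEL_NR_CPU_IDS;
}

/* The condition the debug WARN_ONs test for: no CPU allowed at all. */
static bool model_cpumask_empty(const struct model_cpumask *m)
{
	return model_cpumask_first(m) >= MODEL_NR_CPU_IDS;
}
```

In this model, a zero-initialised struct model_cpumask is empty and
model_cpumask_first() returns MODEL_NR_CPU_IDS; after
model_cpumask_set_cpu(5, &m) the mask is no longer empty and the first CPU
is 5. Whether the kernel path in this thread fails in exactly this way is
not established by the messages above.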


* Re: kworker with empty task->cpus_allowed (was Re: [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host)
  2017-07-04 11:06                         ` Michael Ellerman
@ 2017-07-04 12:12                           ` Eryu Guan
  0 siblings, 0 replies; 23+ messages in thread
From: Eryu Guan @ 2017-07-04 12:12 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: tj, Balbir Singh, liwan, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)

On Tue, Jul 04, 2017 at 09:06:55PM +1000, Michael Ellerman wrote:
> Eryu Guan <eguan@redhat.com> writes:
> 
> > On Tue, Jul 04, 2017 at 04:26:11PM +1000, Michael Ellerman wrote:
> >> Eryu Guan <eguan@redhat.com> writes:
> >> > On Fri, Jun 30, 2017 at 08:07:02PM +1000, Michael Ellerman wrote:
> >> >> 
> >> >> Can you try this patch and see if it changes anything? (with the debug
> >> >> still applied).
> >> >
> >> > This patch fixes the crash for me. After applying this patch (with all
> >> > other debug patches still applied), kernel didn't print any warnings or
> >> > calltraces or debug messages.
> >> 
> >> OK. It's not meant to fix it :)
> >
> > Understood.
> >
> >> 
> >> I can't form any connection between your bisection result and that
> >> patch, nothing is making any sense TBH.
> >> 
> >> What hardware are you on? And are you doing CPU hotplug or anything like that?
> >
> > It's a "PowerVM" guest (I'm not familiar with powerpc, so I don't know
> > what that means) running on a Power8 host. I didn't do any CPU hotplug or
> > anything like that.
> 
> OK thanks.
> 
> We might have to try to sync up on IRC so we can debug this a bit faster.

Sure, where can I find you? I'm in #xfs on freenode, nick eguan. It will
probably have to be tomorrow, though; I have to take off for today.

> 
> Can you try this hunk also?

This new WARN_ON didn't trigger. (I've left out the other warning messages;
they're the same warnings as in my last reply.)

Thanks,
Eryu

> 
> cheers
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index c74bf39ef764..7c55721b1f1d 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -3902,6 +3906,7 @@ static int alloc_and_link_pwqs(struct workqueue_struct *wq)
>  		     "ordering guarantee broken for workqueue %s\n", wq->name);
>  		return ret;
>  	} else {
> +		WARN_ON(cpumask_empty(unbound_std_wq_attrs[highpri]->cpumask));
>  		return apply_workqueue_attrs(wq, unbound_std_wq_attrs[highpri]);
>  	}
>  }
> 
> 


Thread overview: 23+ messages
2017-06-28  8:32 [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host Eryu Guan
2017-06-28 17:16 ` Balbir Singh
2017-06-29  3:41   ` Eryu Guan
2017-06-29  8:47     ` Balbir Singh
2017-06-29  9:04       ` Eryu Guan
2017-06-29 10:05       ` Eryu Guan
2017-06-29 11:12         ` Michael Ellerman
2017-06-29 11:39           ` Eryu Guan
2017-06-29 12:06             ` kworker with empty task->cpus_allowed (was Re: [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host) Michael Ellerman
2017-06-29 13:59               ` Eryu Guan
2017-06-29 14:24                 ` Tejun Heo
2017-06-30  1:08                   ` Michael Ellerman
2017-06-30 11:56                     ` Tejun Heo
2017-06-30 10:07                 ` Michael Ellerman
2017-06-30 11:47                   ` Eryu Guan
2017-07-04  6:26                     ` Michael Ellerman
2017-07-04  8:21                       ` Eryu Guan
2017-07-04 11:06                         ` Michael Ellerman
2017-07-04 12:12                           ` Eryu Guan
2017-06-29  4:54 ` [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host Michael Ellerman
2017-06-29 10:27 ` Michael Ellerman
2017-06-29 10:33   ` Eryu Guan
2017-06-29 12:13     ` Michael Ellerman
