Linux-mm Archive on lore.kernel.org
* page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang
@ 2019-08-22 21:33 Qian Cai
  2019-08-23 11:37 ` Will Deacon
  2019-09-05 21:08 ` Qian Cai
  0 siblings, 2 replies; 21+ messages in thread
From: Qian Cai @ 2019-08-22 21:33 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Dan Williams, linux-mm, linux-kernel, linux-arm-kernel, Peter Zijlstra

https://raw.githubusercontent.com/cailca/linux-mm/master/arm64.config

Booting an arm64 ThunderX2 server with page_alloc.shuffle=1 [1] +
CONFIG_PROVE_LOCKING=y results in a hang during boot.

[1] https://lore.kernel.org/linux-mm/154899811208.3165233.17623209031065121886.stgit@dwillia2-desk3.amr.corp.intel.com/

...
[  125.142689][    T1] arm-smmu-v3 arm-smmu-v3.2.auto: option mask 0x2
[  125.149687][    T1] arm-smmu-v3 arm-smmu-v3.2.auto: ias 44-bit, oas 44-bit
(features 0x0000170d)
[  125.165198][    T1] arm-smmu-v3 arm-smmu-v3.2.auto: allocated 524288 entries
for cmdq
[  125.239425][ [  125.251484][    T1] arm-smmu-v3 arm-smmu-v3.3.auto: option
mask 0x2
[  125.258233][    T1] arm-smmu-v3 arm-smmu-v3.3.auto: ias 44-bit, oas 44-bit
(features 0x0000170d)
[  125.282750][    T1] arm-smmu-v3 arm-smmu-v3.3.auto: allocated 524288 entries
for cmdq
[  125.320097][    T1] arm-smmu-v3 arm-smmu-v3.3.auto: allocated 524288 entries
for evtq
[  125.332667][    T1] arm-smmu-v3 arm-smmu-v3.4.auto: option mask 0x2
[  125.339427][    T1] arm-smmu-v3 arm-smmu-v3.4.auto: ias 44-bit, oas 44-bit
(features 0x0000170d)
[  125.354846][    T1] arm-smmu-v3 arm-smmu-v3.4.auto: allocated 524288 entries
for cmdq
[  125.375295][    T1] arm-smmu-v3 arm-smmu-v3.4.auto: allocated 524288 entries
for evtq
[  125.387371][    T1] arm-smmu-v3 arm-smmu-v3.5.auto: option mask 0x2
[  125.393955][    T1] arm-smmu-v3 arm-smmu-v3.5.auto: ias 44-bit, oas 44-bit
(features 0x0000170d)
[  125.522605][    T1] arm-smmu-v3 arm-smmu-v3.5.auto: allocated 524288 entries
for cmdq
[  125.543338][    T1] arm-smmu-v3 arm-smmu-v3.5.auto: allocated 524288 entries
for evtq
[  126.694742][    T1] EFI Variables Facility v0.08 2004-May-17
[  126.799291][    T1] NET: Registered protocol family 17
[  126.978632][    T1] zswap: loaded using pool lzo/zbud
[  126.989168][    T1] kmemleak: Kernel memory leak detector initialized
[  126.989191][ T1577] kmemleak: Automatic memory scanning thread started
[  127.044079][ T1335] pcieport 0000:0f:00.0: Adding to iommu group 0
[  127.388074][    T1] Freeing unused kernel memory: 22528K
[  133.527005][    T1] Checked W+X mappings: passed, no W+X pages found
[  133.533474][    T1] Run /init as init process
[  133.727196][    T1] systemd[1]: System time before build time, advancing
clock.
[  134.576021][ T1587] modprobe (1587) used greatest stack depth: 27056 bytes
left
[  134.764026][    T1] systemd[1]: systemd 239 running in system mode. (+PAM
+AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT
+GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-
hierarchy=legacy)
[  134.799044][    T1] systemd[1]: Detected architecture arm64.
[  134.804818][    T1] systemd[1]: Running in initial RAM disk.
<...hang...>

Setting either page_alloc.shuffle=0 or CONFIG_PROVE_LOCKING=n fixes it, and the
boot then continues successfully:


[  121.093846][    T1] systemd[1]: Set hostname to <hpe-apollo-cn99xx>.
[  123.157524][    T1] random: systemd: uninitialized urandom read (16 bytes
read)
[  123.168562][    T1] systemd[1]: Listening on Journal Socket.
[  OK  ] Listening on Journal Socket.
[  123.203932][    T1] random: systemd: uninitialized urandom read (16 bytes
read)
[  123.212813][    T1] systemd[1]: Listening on udev Kernel Socket.
[  OK  ] Listening on udev Kernel Socket.
...
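For what it's worth, the CONFIG_PROVE_LOCKING=n side of the workaround is a
one-line config flip. A minimal sketch on a scratch file follows; the file name
is made up, and in a real kernel tree the usual way is
`scripts/config -d PROVE_LOCKING` followed by `make olddefconfig`:

```shell
# Flip CONFIG_PROVE_LOCKING off in a saved config ("my.config" is a
# scratch stand-in here, not the config linked above).
printf 'CONFIG_PROVE_LOCKING=y\n' > my.config
sed -i 's/^CONFIG_PROVE_LOCKING=y$/# CONFIG_PROVE_LOCKING is not set/' my.config
grep PROVE_LOCKING my.config
# The other workaround needs no rebuild at all: boot with
# page_alloc.shuffle=0 on the kernel command line.
```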


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang
  2019-08-22 21:33 page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang Qian Cai
@ 2019-08-23 11:37 ` Will Deacon
  2019-08-23 11:52   ` Qian Cai
  2019-09-05 21:08 ` Qian Cai
  1 sibling, 1 reply; 21+ messages in thread
From: Will Deacon @ 2019-08-23 11:37 UTC (permalink / raw)
  To: Qian Cai
  Cc: Catalin Marinas, Dan Williams, linux-mm, linux-kernel,
	linux-arm-kernel, Peter Zijlstra

On Thu, Aug 22, 2019 at 05:33:23PM -0400, Qian Cai wrote:
> https://raw.githubusercontent.com/cailca/linux-mm/master/arm64.config
> 
> Booting an arm64 ThunderX2 server with page_alloc.shuffle=1 [1] +
> CONFIG_PROVE_LOCKING=y results in a hang during boot.

Hmm, but the config you link to above has:

# CONFIG_PROVE_LOCKING is not set

so I'm confused. Also, which tree is this?

Will



* Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang
  2019-08-23 11:37 ` Will Deacon
@ 2019-08-23 11:52   ` Qian Cai
  0 siblings, 0 replies; 21+ messages in thread
From: Qian Cai @ 2019-08-23 11:52 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, Dan Williams, linux-mm, linux-kernel,
	linux-arm-kernel, Peter Zijlstra

On Fri, 2019-08-23 at 12:37 +0100, Will Deacon wrote:
> On Thu, Aug 22, 2019 at 05:33:23PM -0400, Qian Cai wrote:
> > https://raw.githubusercontent.com/cailca/linux-mm/master/arm64.config
> > 
> > Booting an arm64 ThunderX2 server with page_alloc.shuffle=1 [1] +
> > CONFIG_PROVE_LOCKING=y results in a hang during boot.
> 
> Hmm, but the config you link to above has:
> 
> # CONFIG_PROVE_LOCKING is not set
> 
> so I'm confused. Also, which tree is this?

I manually turned on CONFIG_PROVE_LOCKING=y on top of that config, and can
reproduce the hang on both the mainline and linux-next trees.
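As an illustration, enabling an option "on top of" a base config can be done by
layering a one-line fragment over it; the file names below are made up for the
sketch, and inside a kernel tree `scripts/kconfig/merge_config.sh` plus
`make olddefconfig` is the standard mechanism:

```shell
# Layer a fragment enabling PROVE_LOCKING over a base config.
# "base.config" and "debug.fragment" are illustrative names only.
printf '# CONFIG_PROVE_LOCKING is not set\n' > base.config
printf 'CONFIG_PROVE_LOCKING=y\n' > debug.fragment
# Concatenate so the fragment's setting comes last; when Kconfig
# reprocesses the combined file, the last assignment wins (this is
# what merge_config.sh relies on).
cat base.config debug.fragment > merged.config
tail -n 1 merged.config
```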



* Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang
  2019-08-22 21:33 page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang Qian Cai
  2019-08-23 11:37 ` Will Deacon
@ 2019-09-05 21:08 ` Qian Cai
  2019-09-10 15:22   ` Qian Cai
  1 sibling, 1 reply; 21+ messages in thread
From: Qian Cai @ 2019-09-05 21:08 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Dan Williams, linux-mm, linux-kernel, linux-arm-kernel,
	Peter Zijlstra, Waiman Long, Thomas Gleixner

Another data point: changing CONFIG_DEBUG_OBJECTS_TIMERS from =y to =n also
fixes it.

On Thu, 2019-08-22 at 17:33 -0400, Qian Cai wrote:
> https://raw.githubusercontent.com/cailca/linux-mm/master/arm64.config
> 
> Booting an arm64 ThunderX2 server with page_alloc.shuffle=1 [1] +
> CONFIG_PROVE_LOCKING=y results in a hang during boot.
> 
> [1] https://lore.kernel.org/linux-mm/154899811208.3165233.17623209031065121886.stgit@dwillia2-desk3.amr.corp.intel.com/
> 
> ...
> [  125.142689][    T1] arm-smmu-v3 arm-smmu-v3.2.auto: option mask 0x2
> [  125.149687][    T1] arm-smmu-v3 arm-smmu-v3.2.auto: ias 44-bit, oas 44-bit
> (features 0x0000170d)
> [  125.165198][    T1] arm-smmu-v3 arm-smmu-v3.2.auto: allocated 524288 entries
> for cmdq
> [  125.239425][ [  125.251484][    T1] arm-smmu-v3 arm-smmu-v3.3.auto: option
> mask 0x2
> [  125.258233][    T1] arm-smmu-v3 arm-smmu-v3.3.auto: ias 44-bit, oas 44-bit
> (features 0x0000170d)
> [  125.282750][    T1] arm-smmu-v3 arm-smmu-v3.3.auto: allocated 524288 entries
> for cmdq
> [  125.320097][    T1] arm-smmu-v3 arm-smmu-v3.3.auto: allocated 524288 entries
> for evtq
> [  125.332667][    T1] arm-smmu-v3 arm-smmu-v3.4.auto: option mask 0x2
> [  125.339427][    T1] arm-smmu-v3 arm-smmu-v3.4.auto: ias 44-bit, oas 44-bit
> (features 0x0000170d)
> [  125.354846][    T1] arm-smmu-v3 arm-smmu-v3.4.auto: allocated 524288 entries
> for cmdq
> [  125.375295][    T1] arm-smmu-v3 arm-smmu-v3.4.auto: allocated 524288 entries
> for evtq
> [  125.387371][    T1] arm-smmu-v3 arm-smmu-v3.5.auto: option mask 0x2
> [  125.393955][    T1] arm-smmu-v3 arm-smmu-v3.5.auto: ias 44-bit, oas 44-bit
> (features 0x0000170d)
> [  125.522605][    T1] arm-smmu-v3 arm-smmu-v3.5.auto: allocated 524288 entries
> for cmdq
> [  125.543338][    T1] arm-smmu-v3 arm-smmu-v3.5.auto: allocated 524288 entries
> for evtq
> [  126.694742][    T1] EFI Variables Facility v0.08 2004-May-17
> [  126.799291][    T1] NET: Registered protocol family 17
> [  126.978632][    T1] zswap: loaded using pool lzo/zbud
> [  126.989168][    T1] kmemleak: Kernel memory leak detector initialized
> [  126.989191][ T1577] kmemleak: Automatic memory scanning thread started
> [  127.044079][ T1335] pcieport 0000:0f:00.0: Adding to iommu group 0
> [  127.388074][    T1] Freeing unused kernel memory: 22528K
> [  133.527005][    T1] Checked W+X mappings: passed, no W+X pages found
> [  133.533474][    T1] Run /init as init process
> [  133.727196][    T1] systemd[1]: System time before build time, advancing
> clock.
> [  134.576021][ T1587] modprobe (1587) used greatest stack depth: 27056 bytes
> left
> [  134.764026][    T1] systemd[1]: systemd 239 running in system mode. (+PAM
> +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT
> +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-
> hierarchy=legacy)
> [  134.799044][    T1] systemd[1]: Detected architecture arm64.
> [  134.804818][    T1] systemd[1]: Running in initial RAM disk.
> <...hang...>
> 
> Setting either page_alloc.shuffle=0 or CONFIG_PROVE_LOCKING=n fixes it, and the
> boot then continues successfully:
> 
> 
> [  121.093846][    T1] systemd[1]: Set hostname to <hpe-apollo-cn99xx>.
> [  123.157524][    T1] random: systemd: uninitialized urandom read (16 bytes
> read)
> [  123.168562][    T1] systemd[1]: Listening on Journal Socket.
> [  OK  ] Listening on Journal Socket.
> [  123.203932][    T1] random: systemd: uninitialized urandom read (16 bytes
> read)
> [  123.212813][    T1] systemd[1]: Listening on udev Kernel Socket.
> [  OK  ] Listening on udev Kernel Socket.
> ...



* Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang
  2019-09-05 21:08 ` Qian Cai
@ 2019-09-10 15:22   ` Qian Cai
  2019-09-10 19:49     ` Qian Cai
                       ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Qian Cai @ 2019-09-10 15:22 UTC (permalink / raw)
  To: Petr Mladek, Steven Rostedt, Catalin Marinas, Will Deacon
  Cc: Sergey Senozhatsky, Dan Williams, linux-mm, linux-kernel,
	linux-arm-kernel, Peter Zijlstra, Waiman Long, Thomas Gleixner

On Thu, 2019-09-05 at 17:08 -0400, Qian Cai wrote:
> Another data point: changing CONFIG_DEBUG_OBJECTS_TIMERS from =y to =n also
> fixes it.
> 
> On Thu, 2019-08-22 at 17:33 -0400, Qian Cai wrote:
> > https://raw.githubusercontent.com/cailca/linux-mm/master/arm64.config
> > 
> > Booting an arm64 ThunderX2 server with page_alloc.shuffle=1 [1] +
> > CONFIG_PROVE_LOCKING=y results in a hang during boot.
> > 
> > [1] https://lore.kernel.org/linux-mm/154899811208.3165233.17623209031065121886.stgit@dwillia2-desk3.amr.corp.intel.com/
> > 
> > ...
> > [  125.142689][    T1] arm-smmu-v3 arm-smmu-v3.2.auto: option mask 0x2
> > [  125.149687][    T1] arm-smmu-v3 arm-smmu-v3.2.auto: ias 44-bit, oas 44-bit
> > (features 0x0000170d)
> > [  125.165198][    T1] arm-smmu-v3 arm-smmu-v3.2.auto: allocated 524288 entries
> > for cmdq
> > [  125.239425][ [  125.251484][    T1] arm-smmu-v3 arm-smmu-v3.3.auto: option
> > mask 0x2
> > [  125.258233][    T1] arm-smmu-v3 arm-smmu-v3.3.auto: ias 44-bit, oas 44-bit
> > (features 0x0000170d)
> > [  125.282750][    T1] arm-smmu-v3 arm-smmu-v3.3.auto: allocated 524288 entries
> > for cmdq
> > [  125.320097][    T1] arm-smmu-v3 arm-smmu-v3.3.auto: allocated 524288 entries
> > for evtq
> > [  125.332667][    T1] arm-smmu-v3 arm-smmu-v3.4.auto: option mask 0x2
> > [  125.339427][    T1] arm-smmu-v3 arm-smmu-v3.4.auto: ias 44-bit, oas 44-bit
> > (features 0x0000170d)
> > [  125.354846][    T1] arm-smmu-v3 arm-smmu-v3.4.auto: allocated 524288 entries
> > for cmdq
> > [  125.375295][    T1] arm-smmu-v3 arm-smmu-v3.4.auto: allocated 524288 entries
> > for evtq
> > [  125.387371][    T1] arm-smmu-v3 arm-smmu-v3.5.auto: option mask 0x2
> > [  125.393955][    T1] arm-smmu-v3 arm-smmu-v3.5.auto: ias 44-bit, oas 44-bit
> > (features 0x0000170d)
> > [  125.522605][    T1] arm-smmu-v3 arm-smmu-v3.5.auto: allocated 524288 entries
> > for cmdq
> > [  125.543338][    T1] arm-smmu-v3 arm-smmu-v3.5.auto: allocated 524288 entries
> > for evtq
> > [  126.694742][    T1] EFI Variables Facility v0.08 2004-May-17
> > [  126.799291][    T1] NET: Registered protocol family 17
> > [  126.978632][    T1] zswap: loaded using pool lzo/zbud
> > [  126.989168][    T1] kmemleak: Kernel memory leak detector initialized
> > [  126.989191][ T1577] kmemleak: Automatic memory scanning thread started
> > [  127.044079][ T1335] pcieport 0000:0f:00.0: Adding to iommu group 0
> > [  127.388074][    T1] Freeing unused kernel memory: 22528K
> > [  133.527005][    T1] Checked W+X mappings: passed, no W+X pages found
> > [  133.533474][    T1] Run /init as init process
> > [  133.727196][    T1] systemd[1]: System time before build time, advancing
> > clock.
> > [  134.576021][ T1587] modprobe (1587) used greatest stack depth: 27056 bytes
> > left
> > [  134.764026][    T1] systemd[1]: systemd 239 running in system mode. (+PAM
> > +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT
> > +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-
> > hierarchy=legacy)
> > [  134.799044][    T1] systemd[1]: Detected architecture arm64.
> > [  134.804818][    T1] systemd[1]: Running in initial RAM disk.
> > <...hang...>
> > 
> > Setting either page_alloc.shuffle=0 or CONFIG_PROVE_LOCKING=n fixes it, and the
> > boot then continues successfully:
> > 
> > 
> > [  121.093846][    T1] systemd[1]: Set hostname to <hpe-apollo-cn99xx>.
> > [  123.157524][    T1] random: systemd: uninitialized urandom read (16 bytes
> > read)
> > [  123.168562][    T1] systemd[1]: Listening on Journal Socket.
> > [  OK  ] Listening on Journal Socket.
> > [  123.203932][    T1] random: systemd: uninitialized urandom read (16 bytes
> > read)
> > [  123.212813][    T1] systemd[1]: Listening on udev Kernel Socket.
> > [  OK  ] Listening on udev Kernel Socket.
> > ...

Not sure whether the arm64 hang is just a side effect of the potential console
deadlock below. The lockdep splat can be reproduced by setting,

CONFIG_DEBUG_OBJECTS_TIMERS=n (=y will lead to the hang above)
CONFIG_PROVE_LOCKING=y
CONFIG_SLAB_FREELIST_RANDOM=y (with page_alloc.shuffle=1)

while compiling kernels:

[ 1078.214683][T43784] WARNING: possible circular locking dependency detected
[ 1078.221550][T43784] 5.3.0-rc7-next-20190904 #14 Not tainted
[ 1078.227112][T43784] ------------------------------------------------------
[ 1078.233976][T43784] vi/43784 is trying to acquire lock:
[ 1078.239192][T43784] ffff008b7cff9290 (&(&zone->lock)->rlock){-.-.}, at:
rmqueue_bulk.constprop.21+0xb0/0x1218
[ 1078.249111][T43784] 
[ 1078.249111][T43784] but task is already holding lock:
[ 1078.256322][T43784] ffff00938db47d40 (&(&port->lock)->rlock){-.-.}, at:
pty_write+0x78/0x100
[ 1078.264760][T43784] 
[ 1078.264760][T43784] which lock already depends on the new lock.
[ 1078.264760][T43784] 
[ 1078.275008][T43784] 
[ 1078.275008][T43784] the existing dependency chain (in reverse order) is:
[ 1078.283869][T43784] 
[ 1078.283869][T43784] -> #3 (&(&port->lock)->rlock){-.-.}:
[ 1078.291350][T43784]        __lock_acquire+0x5c8/0xbb0
[ 1078.296394][T43784]        lock_acquire+0x154/0x428
[ 1078.301266][T43784]        _raw_spin_lock_irqsave+0x80/0xa0
[ 1078.306831][T43784]        tty_port_tty_get+0x28/0x68
[ 1078.311873][T43784]        tty_port_default_wakeup+0x20/0x40
[ 1078.317523][T43784]        tty_port_tty_wakeup+0x38/0x48
[ 1078.322827][T43784]        uart_write_wakeup+0x2c/0x50
[ 1078.327956][T43784]        pl011_tx_chars+0x240/0x260
[ 1078.332999][T43784]        pl011_start_tx+0x24/0xa8
[ 1078.337868][T43784]        __uart_start+0x90/0xa0
[ 1078.342563][T43784]        uart_write+0x15c/0x2c8
[ 1078.347261][T43784]        do_output_char+0x1c8/0x2b0
[ 1078.352304][T43784]        n_tty_write+0x300/0x668
[ 1078.357087][T43784]        tty_write+0x2e8/0x430
[ 1078.361696][T43784]        redirected_tty_write+0xcc/0xe8
[ 1078.367086][T43784]        do_iter_write+0x228/0x270
[ 1078.372041][T43784]        vfs_writev+0x10c/0x1c8
[ 1078.376735][T43784]        do_writev+0xdc/0x180
[ 1078.381257][T43784]        __arm64_sys_writev+0x50/0x60
[ 1078.386476][T43784]        el0_svc_handler+0x11c/0x1f0
[ 1078.391606][T43784]        el0_svc+0x8/0xc
[ 1078.395691][T43784] 
[ 1078.395691][T43784] -> #2 (&port_lock_key){-.-.}:
[ 1078.402561][T43784]        __lock_acquire+0x5c8/0xbb0
[ 1078.407604][T43784]        lock_acquire+0x154/0x428
[ 1078.412474][T43784]        _raw_spin_lock+0x68/0x88
[ 1078.417343][T43784]        pl011_console_write+0x2ac/0x318
[ 1078.422820][T43784]        console_unlock+0x3c4/0x898
[ 1078.427863][T43784]        vprintk_emit+0x2d4/0x460
[ 1078.432732][T43784]        vprintk_default+0x48/0x58
[ 1078.437688][T43784]        vprintk_func+0x194/0x250
[ 1078.442557][T43784]        printk+0xbc/0xec
[ 1078.446732][T43784]        register_console+0x4a8/0x580
[ 1078.451947][T43784]        uart_add_one_port+0x748/0x878
[ 1078.457250][T43784]        pl011_register_port+0x98/0x128
[ 1078.462639][T43784]        sbsa_uart_probe+0x398/0x480
[ 1078.467772][T43784]        platform_drv_probe+0x70/0x108
[ 1078.473075][T43784]        really_probe+0x15c/0x5d8
[ 1078.477944][T43784]        driver_probe_device+0x94/0x1d0
[ 1078.483335][T43784]        __device_attach_driver+0x11c/0x1a8
[ 1078.489072][T43784]        bus_for_each_drv+0xf8/0x158
[ 1078.494201][T43784]        __device_attach+0x164/0x240
[ 1078.499331][T43784]        device_initial_probe+0x24/0x30
[ 1078.504721][T43784]        bus_probe_device+0xf0/0x100
[ 1078.509850][T43784]        device_add+0x63c/0x960
[ 1078.514546][T43784]        platform_device_add+0x1ac/0x3b8
[ 1078.520023][T43784]        platform_device_register_full+0x1fc/0x290
[ 1078.526373][T43784]        acpi_create_platform_device.part.0+0x264/0x3a8
[ 1078.533152][T43784]        acpi_create_platform_device+0x68/0x80
[ 1078.539150][T43784]        acpi_default_enumeration+0x34/0x78
[ 1078.544887][T43784]        acpi_bus_attach+0x340/0x3b8
[ 1078.550015][T43784]        acpi_bus_attach+0xf8/0x3b8
[ 1078.555057][T43784]        acpi_bus_attach+0xf8/0x3b8
[ 1078.560099][T43784]        acpi_bus_attach+0xf8/0x3b8
[ 1078.565142][T43784]        acpi_bus_scan+0x9c/0x100
[ 1078.570015][T43784]        acpi_scan_init+0x16c/0x320
[ 1078.575058][T43784]        acpi_init+0x330/0x3b8
[ 1078.579666][T43784]        do_one_initcall+0x158/0x7ec
[ 1078.584797][T43784]        kernel_init_freeable+0x9a8/0xa70
[ 1078.590360][T43784]        kernel_init+0x18/0x138
[ 1078.595055][T43784]        ret_from_fork+0x10/0x1c
[ 1078.599835][T43784] 
[ 1078.599835][T43784] -> #1 (console_owner){-...}:
[ 1078.606618][T43784]        __lock_acquire+0x5c8/0xbb0
[ 1078.611661][T43784]        lock_acquire+0x154/0x428
[ 1078.616530][T43784]        console_unlock+0x298/0x898
[ 1078.621573][T43784]        vprintk_emit+0x2d4/0x460
[ 1078.626442][T43784]        vprintk_default+0x48/0x58
[ 1078.631398][T43784]        vprintk_func+0x194/0x250
[ 1078.636267][T43784]        printk+0xbc/0xec
[ 1078.640443][T43784]        _warn_unseeded_randomness+0xb4/0xd0
[ 1078.646267][T43784]        get_random_u64+0x4c/0x100
[ 1078.651224][T43784]        add_to_free_area_random+0x168/0x1a0
[ 1078.657047][T43784]        free_one_page+0x3dc/0xd08
[ 1078.662003][T43784]        __free_pages_ok+0x490/0xd00
[ 1078.667132][T43784]        __free_pages+0xc4/0x118
[ 1078.671914][T43784]        __free_pages_core+0x2e8/0x428
[ 1078.677219][T43784]        memblock_free_pages+0xa4/0xec
[ 1078.682522][T43784]        memblock_free_all+0x264/0x330
[ 1078.687825][T43784]        mem_init+0x90/0x148
[ 1078.692259][T43784]        start_kernel+0x368/0x684
[ 1078.697126][T43784] 
[ 1078.697126][T43784] -> #0 (&(&zone->lock)->rlock){-.-.}:
[ 1078.704604][T43784]        check_prev_add+0x120/0x1138
[ 1078.709733][T43784]        validate_chain+0x888/0x1270
[ 1078.714863][T43784]        __lock_acquire+0x5c8/0xbb0
[ 1078.719906][T43784]        lock_acquire+0x154/0x428
[ 1078.724776][T43784]        _raw_spin_lock+0x68/0x88
[ 1078.729645][T43784]        rmqueue_bulk.constprop.21+0xb0/0x1218
[ 1078.735643][T43784]        get_page_from_freelist+0x898/0x24a0
[ 1078.741467][T43784]        __alloc_pages_nodemask+0x2a8/0x1d08
[ 1078.747291][T43784]        alloc_pages_current+0xb4/0x150
[ 1078.752682][T43784]        allocate_slab+0xab8/0x2350
[ 1078.757725][T43784]        new_slab+0x98/0xc0
[ 1078.762073][T43784]        ___slab_alloc+0x66c/0xa30
[ 1078.767029][T43784]        __slab_alloc+0x68/0xc8
[ 1078.771725][T43784]        __kmalloc+0x3d4/0x658
[ 1078.776333][T43784]        __tty_buffer_request_room+0xd4/0x220
[ 1078.782244][T43784]        tty_insert_flip_string_fixed_flag+0x6c/0x128
[ 1078.788849][T43784]        pty_write+0x98/0x100
[ 1078.793370][T43784]        n_tty_write+0x2a0/0x668
[ 1078.798152][T43784]        tty_write+0x2e8/0x430
[ 1078.802760][T43784]        __vfs_write+0x5c/0xb0
[ 1078.807368][T43784]        vfs_write+0xf0/0x230
[ 1078.811890][T43784]        ksys_write+0xd4/0x180
[ 1078.816498][T43784]        __arm64_sys_write+0x4c/0x60
[ 1078.821627][T43784]        el0_svc_handler+0x11c/0x1f0
[ 1078.826756][T43784]        el0_svc+0x8/0xc
[ 1078.830842][T43784] 
[ 1078.830842][T43784] other info that might help us debug this:
[ 1078.830842][T43784] 
[ 1078.840918][T43784] Chain exists of:
[ 1078.840918][T43784]   &(&zone->lock)->rlock --> &port_lock_key --> &(&port-
>lock)->rlock
[ 1078.840918][T43784] 
[ 1078.854731][T43784]  Possible unsafe locking scenario:
[ 1078.854731][T43784] 
[ 1078.862029][T43784]        CPU0                    CPU1
[ 1078.867243][T43784]        ----                    ----
[ 1078.872457][T43784]   lock(&(&port->lock)->rlock);
[ 1078.877238][T43784]                                lock(&port_lock_key);
[ 1078.883929][T43784]                                lock(&(&port->lock)-
>rlock);
[ 1078.891228][T43784]   lock(&(&zone->lock)->rlock);
[ 1078.896010][T43784] 
[ 1078.896010][T43784]  *** DEADLOCK ***
[ 1078.896010][T43784] 
[ 1078.904004][T43784] 5 locks held by vi/43784:
[ 1078.908351][T43784]  #0: ffff000c36240890 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x44/0x50
[ 1078.917133][T43784]  #1: ffff000c36240918 (&tty->atomic_write_lock){+.+.},
at: tty_write_lock+0x24/0x60
[ 1078.926521][T43784]  #2: ffff000c36240aa0 (&o_tty->termios_rwsem/1){++++},
at: n_tty_write+0x108/0x668
[ 1078.935823][T43784]  #3: ffffa0001e0b2360 (&ldata->output_lock){+.+.}, at:
n_tty_write+0x1d0/0x668
[ 1078.944777][T43784]  #4: ffff00938db47d40 (&(&port->lock)->rlock){-.-.}, at:
pty_write+0x78/0x100
[ 1078.953644][T43784] 
[ 1078.953644][T43784] stack backtrace:
[ 1078.959382][T43784] CPU: 97 PID: 43784 Comm: vi Not tainted 5.3.0-rc7-next-
20190904 #14
[ 1078.967376][T43784] Hardware name: HPE Apollo
70             /C01_APACHE_MB         , BIOS L50_5.13_1.11 06/18/2019
[ 1078.977799][T43784] Call trace:
[ 1078.980932][T43784]  dump_backtrace+0x0/0x228
[ 1078.985279][T43784]  show_stack+0x24/0x30
[ 1078.989282][T43784]  dump_stack+0xe8/0x13c
[ 1078.993370][T43784]  print_circular_bug+0x334/0x3d8
[ 1078.998240][T43784]  check_noncircular+0x268/0x310
[ 1079.003022][T43784]  check_prev_add+0x120/0x1138
[ 1079.007631][T43784]  validate_chain+0x888/0x1270
[ 1079.012241][T43784]  __lock_acquire+0x5c8/0xbb0
[ 1079.016763][T43784]  lock_acquire+0x154/0x428
[ 1079.021111][T43784]  _raw_spin_lock+0x68/0x88
[ 1079.025460][T43784]  rmqueue_bulk.constprop.21+0xb0/0x1218
[ 1079.030937][T43784]  get_page_from_freelist+0x898/0x24a0
[ 1079.036240][T43784]  __alloc_pages_nodemask+0x2a8/0x1d08
[ 1079.041542][T43784]  alloc_pages_current+0xb4/0x150
[ 1079.046412][T43784]  allocate_slab+0xab8/0x2350
[ 1079.050934][T43784]  new_slab+0x98/0xc0
[ 1079.054761][T43784]  ___slab_alloc+0x66c/0xa30
[ 1079.059196][T43784]  __slab_alloc+0x68/0xc8
[ 1079.063371][T43784]  __kmalloc+0x3d4/0x658
[ 1079.067458][T43784]  __tty_buffer_request_room+0xd4/0x220
[ 1079.072847][T43784]  tty_insert_flip_string_fixed_flag+0x6c/0x128
[ 1079.078932][T43784]  pty_write+0x98/0x100
[ 1079.082932][T43784]  n_tty_write+0x2a0/0x668
[ 1079.087193][T43784]  tty_write+0x2e8/0x430
[ 1079.091280][T43784]  __vfs_write+0x5c/0xb0
[ 1079.095367][T43784]  vfs_write+0xf0/0x230
[ 1079.099368][T43784]  ksys_write+0xd4/0x180
[ 1079.103455][T43784]  __arm64_sys_write+0x4c/0x60
[ 1079.108064][T43784]  el0_svc_handler+0x11c/0x1f0
[ 1079.112672][T43784]  el0_svc+0x8/0xc
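The splat above is a textbook dependency cycle: zone->lock --> port_lock_key -->
port->lock, with pty_write() now closing the loop back to zone->lock. As a
purely illustrative sketch (this is not how lockdep is implemented), coreutils
tsort(1) flags the same three edges as unorderable:

```shell
# The three lock-dependency edges from the splat (names simplified);
# a topological sort must fail because they form a cycle.
printf '%s\n' \
  'zone_lock port_lock_key' \
  'port_lock_key port_lock' \
  'port_lock zone_lock' | tsort 2>&1 || true
# GNU tsort reports that the input contains a loop and lists the
# participating nodes.
```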



* Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang
  2019-09-10 15:22   ` Qian Cai
@ 2019-09-10 19:49     ` Qian Cai
  2019-09-12  2:28       ` CONFIG_SHUFFLE_PAGE_ALLOCATOR=y lockdep splat (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang) Qian Cai
  2019-09-10 20:35     ` page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang Qian Cai
  2019-09-11  1:10     ` Sergey Senozhatsky
  2 siblings, 1 reply; 21+ messages in thread
From: Qian Cai @ 2019-09-10 19:49 UTC (permalink / raw)
  To: Petr Mladek, Steven Rostedt, Catalin Marinas, Will Deacon,
	Theodore Ts'o
  Cc: Sergey Senozhatsky, Dan Williams, linux-mm, linux-kernel,
	linux-arm-kernel, Peter Zijlstra, Waiman Long, Thomas Gleixner

On Tue, 2019-09-10 at 11:22 -0400, Qian Cai wrote:
> On Thu, 2019-09-05 at 17:08 -0400, Qian Cai wrote:
> > Another data point: changing CONFIG_DEBUG_OBJECTS_TIMERS from =y to =n also
> > fixes it.
> > 
> > On Thu, 2019-08-22 at 17:33 -0400, Qian Cai wrote:
> > > https://raw.githubusercontent.com/cailca/linux-mm/master/arm64.config
> > > 
> > > Booting an arm64 ThunderX2 server with page_alloc.shuffle=1 [1] +
> > > CONFIG_PROVE_LOCKING=y results in a hang during boot.
> > > 
> > > [1] https://lore.kernel.org/linux-mm/154899811208.3165233.17623209031065121886.stgit@dwillia2-desk3.amr.corp.intel.com/
> > > 
> > > ...
> > > [  125.142689][    T1] arm-smmu-v3 arm-smmu-v3.2.auto: option mask 0x2
> > > [  125.149687][    T1] arm-smmu-v3 arm-smmu-v3.2.auto: ias 44-bit, oas 44-bit
> > > (features 0x0000170d)
> > > [  125.165198][    T1] arm-smmu-v3 arm-smmu-v3.2.auto: allocated 524288 entries
> > > for cmdq
> > > [  125.239425][ [  125.251484][    T1] arm-smmu-v3 arm-smmu-v3.3.auto: option
> > > mask 0x2
> > > [  125.258233][    T1] arm-smmu-v3 arm-smmu-v3.3.auto: ias 44-bit, oas 44-bit
> > > (features 0x0000170d)
> > > [  125.282750][    T1] arm-smmu-v3 arm-smmu-v3.3.auto: allocated 524288 entries
> > > for cmdq
> > > [  125.320097][    T1] arm-smmu-v3 arm-smmu-v3.3.auto: allocated 524288 entries
> > > for evtq
> > > [  125.332667][    T1] arm-smmu-v3 arm-smmu-v3.4.auto: option mask 0x2
> > > [  125.339427][    T1] arm-smmu-v3 arm-smmu-v3.4.auto: ias 44-bit, oas 44-bit
> > > (features 0x0000170d)
> > > [  125.354846][    T1] arm-smmu-v3 arm-smmu-v3.4.auto: allocated 524288 entries
> > > for cmdq
> > > [  125.375295][    T1] arm-smmu-v3 arm-smmu-v3.4.auto: allocated 524288 entries
> > > for evtq
> > > [  125.387371][    T1] arm-smmu-v3 arm-smmu-v3.5.auto: option mask 0x2
> > > [  125.393955][    T1] arm-smmu-v3 arm-smmu-v3.5.auto: ias 44-bit, oas 44-bit
> > > (features 0x0000170d)
> > > [  125.522605][    T1] arm-smmu-v3 arm-smmu-v3.5.auto: allocated 524288 entries
> > > for cmdq
> > > [  125.543338][    T1] arm-smmu-v3 arm-smmu-v3.5.auto: allocated 524288 entries
> > > for evtq
> > > [  126.694742][    T1] EFI Variables Facility v0.08 2004-May-17
> > > [  126.799291][    T1] NET: Registered protocol family 17
> > > [  126.978632][    T1] zswap: loaded using pool lzo/zbud
> > > [  126.989168][    T1] kmemleak: Kernel memory leak detector initialized
> > > [  126.989191][ T1577] kmemleak: Automatic memory scanning thread started
> > > [  127.044079][ T1335] pcieport 0000:0f:00.0: Adding to iommu group 0
> > > [  127.388074][    T1] Freeing unused kernel memory: 22528K
> > > [  133.527005][    T1] Checked W+X mappings: passed, no W+X pages found
> > > [  133.533474][    T1] Run /init as init process
> > > [  133.727196][    T1] systemd[1]: System time before build time, advancing
> > > clock.
> > > [  134.576021][ T1587] modprobe (1587) used greatest stack depth: 27056 bytes
> > > left
> > > [  134.764026][    T1] systemd[1]: systemd 239 running in system mode. (+PAM
> > > +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT
> > > +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-
> > > hierarchy=legacy)
> > > [  134.799044][    T1] systemd[1]: Detected architecture arm64.
> > > [  134.804818][    T1] systemd[1]: Running in initial RAM disk.
> > > <...hang...>
> > > 
> > > Setting either page_alloc.shuffle=0 or CONFIG_PROVE_LOCKING=n fixes it, and the
> > > boot then continues successfully:
> > > 
> > > 
> > > [  121.093846][    T1] systemd[1]: Set hostname to <hpe-apollo-cn99xx>.
> > > [  123.157524][    T1] random: systemd: uninitialized urandom read (16 bytes
> > > read)
> > > [  123.168562][    T1] systemd[1]: Listening on Journal Socket.
> > > [  OK  ] Listening on Journal Socket.
> > > [  123.203932][    T1] random: systemd: uninitialized urandom read (16 bytes
> > > read)
> > > [  123.212813][    T1] systemd[1]: Listening on udev Kernel Socket.
> > > [  OK  ] Listening on udev Kernel Socket.
> > > ...
> 
> Not sure whether the arm64 hang is just a side effect of the potential console
> deadlock below. The lockdep splat can be reproduced by setting,
> 
> CONFIG_DEBUG_OBJECTS_TIMERS=n (=y will lead to the hang above)
> CONFIG_PROVE_LOCKING=y
> CONFIG_SLAB_FREELIST_RANDOM=y (with page_alloc.shuffle=1)
> 
> while compiling kernels:
> 
> [ 1078.214683][T43784] WARNING: possible circular locking dependency detected
> [ 1078.221550][T43784] 5.3.0-rc7-next-20190904 #14 Not tainted
> [ 1078.227112][T43784] ------------------------------------------------------
> [ 1078.233976][T43784] vi/43784 is trying to acquire lock:
> [ 1078.239192][T43784] ffff008b7cff9290 (&(&zone->lock)->rlock){-.-.}, at:
> rmqueue_bulk.constprop.21+0xb0/0x1218
> [ 1078.249111][T43784] 
> [ 1078.249111][T43784] but task is already holding lock:
> [ 1078.256322][T43784] ffff00938db47d40 (&(&port->lock)->rlock){-.-.}, at:
> pty_write+0x78/0x100
> [ 1078.264760][T43784] 
> [ 1078.264760][T43784] which lock already depends on the new lock.
> [ 1078.264760][T43784] 
> [ 1078.275008][T43784] 
> [ 1078.275008][T43784] the existing dependency chain (in reverse order) is:
> [ 1078.283869][T43784] 
> [ 1078.283869][T43784] -> #3 (&(&port->lock)->rlock){-.-.}:
> [ 1078.291350][T43784]        __lock_acquire+0x5c8/0xbb0
> [ 1078.296394][T43784]        lock_acquire+0x154/0x428
> [ 1078.301266][T43784]        _raw_spin_lock_irqsave+0x80/0xa0
> [ 1078.306831][T43784]        tty_port_tty_get+0x28/0x68
> [ 1078.311873][T43784]        tty_port_default_wakeup+0x20/0x40
> [ 1078.317523][T43784]        tty_port_tty_wakeup+0x38/0x48
> [ 1078.322827][T43784]        uart_write_wakeup+0x2c/0x50
> [ 1078.327956][T43784]        pl011_tx_chars+0x240/0x260
> [ 1078.332999][T43784]        pl011_start_tx+0x24/0xa8
> [ 1078.337868][T43784]        __uart_start+0x90/0xa0
> [ 1078.342563][T43784]        uart_write+0x15c/0x2c8
> [ 1078.347261][T43784]        do_output_char+0x1c8/0x2b0
> [ 1078.352304][T43784]        n_tty_write+0x300/0x668
> [ 1078.357087][T43784]        tty_write+0x2e8/0x430
> [ 1078.361696][T43784]        redirected_tty_write+0xcc/0xe8
> [ 1078.367086][T43784]        do_iter_write+0x228/0x270
> [ 1078.372041][T43784]        vfs_writev+0x10c/0x1c8
> [ 1078.376735][T43784]        do_writev+0xdc/0x180
> [ 1078.381257][T43784]        __arm64_sys_writev+0x50/0x60
> [ 1078.386476][T43784]        el0_svc_handler+0x11c/0x1f0
> [ 1078.391606][T43784]        el0_svc+0x8/0xc
> [ 1078.395691][T43784] 
> [ 1078.395691][T43784] -> #2 (&port_lock_key){-.-.}:
> [ 1078.402561][T43784]        __lock_acquire+0x5c8/0xbb0
> [ 1078.407604][T43784]        lock_acquire+0x154/0x428
> [ 1078.412474][T43784]        _raw_spin_lock+0x68/0x88
> [ 1078.417343][T43784]        pl011_console_write+0x2ac/0x318
> [ 1078.422820][T43784]        console_unlock+0x3c4/0x898
> [ 1078.427863][T43784]        vprintk_emit+0x2d4/0x460
> [ 1078.432732][T43784]        vprintk_default+0x48/0x58
> [ 1078.437688][T43784]        vprintk_func+0x194/0x250
> [ 1078.442557][T43784]        printk+0xbc/0xec
> [ 1078.446732][T43784]        register_console+0x4a8/0x580
> [ 1078.451947][T43784]        uart_add_one_port+0x748/0x878
> [ 1078.457250][T43784]        pl011_register_port+0x98/0x128
> [ 1078.462639][T43784]        sbsa_uart_probe+0x398/0x480
> [ 1078.467772][T43784]        platform_drv_probe+0x70/0x108
> [ 1078.473075][T43784]        really_probe+0x15c/0x5d8
> [ 1078.477944][T43784]        driver_probe_device+0x94/0x1d0
> [ 1078.483335][T43784]        __device_attach_driver+0x11c/0x1a8
> [ 1078.489072][T43784]        bus_for_each_drv+0xf8/0x158
> [ 1078.494201][T43784]        __device_attach+0x164/0x240
> [ 1078.499331][T43784]        device_initial_probe+0x24/0x30
> [ 1078.504721][T43784]        bus_probe_device+0xf0/0x100
> [ 1078.509850][T43784]        device_add+0x63c/0x960
> [ 1078.514546][T43784]        platform_device_add+0x1ac/0x3b8
> [ 1078.520023][T43784]        platform_device_register_full+0x1fc/0x290
> [ 1078.526373][T43784]        acpi_create_platform_device.part.0+0x264/0x3a8
> [ 1078.533152][T43784]        acpi_create_platform_device+0x68/0x80
> [ 1078.539150][T43784]        acpi_default_enumeration+0x34/0x78
> [ 1078.544887][T43784]        acpi_bus_attach+0x340/0x3b8
> [ 1078.550015][T43784]        acpi_bus_attach+0xf8/0x3b8
> [ 1078.555057][T43784]        acpi_bus_attach+0xf8/0x3b8
> [ 1078.560099][T43784]        acpi_bus_attach+0xf8/0x3b8
> [ 1078.565142][T43784]        acpi_bus_scan+0x9c/0x100
> [ 1078.570015][T43784]        acpi_scan_init+0x16c/0x320
> [ 1078.575058][T43784]        acpi_init+0x330/0x3b8
> [ 1078.579666][T43784]        do_one_initcall+0x158/0x7ec
> [ 1078.584797][T43784]        kernel_init_freeable+0x9a8/0xa70
> [ 1078.590360][T43784]        kernel_init+0x18/0x138
> [ 1078.595055][T43784]        ret_from_fork+0x10/0x1c
> [ 1078.599835][T43784] 
> [ 1078.599835][T43784] -> #1 (console_owner){-...}:
> [ 1078.606618][T43784]        __lock_acquire+0x5c8/0xbb0
> [ 1078.611661][T43784]        lock_acquire+0x154/0x428
> [ 1078.616530][T43784]        console_unlock+0x298/0x898
> [ 1078.621573][T43784]        vprintk_emit+0x2d4/0x460
> [ 1078.626442][T43784]        vprintk_default+0x48/0x58
> [ 1078.631398][T43784]        vprintk_func+0x194/0x250
> [ 1078.636267][T43784]        printk+0xbc/0xec
> [ 1078.640443][T43784]        _warn_unseeded_randomness+0xb4/0xd0
> [ 1078.646267][T43784]        get_random_u64+0x4c/0x100
> [ 1078.651224][T43784]        add_to_free_area_random+0x168/0x1a0
> [ 1078.657047][T43784]        free_one_page+0x3dc/0xd08
> [ 1078.662003][T43784]        __free_pages_ok+0x490/0xd00
> [ 1078.667132][T43784]        __free_pages+0xc4/0x118
> [ 1078.671914][T43784]        __free_pages_core+0x2e8/0x428
> [ 1078.677219][T43784]        memblock_free_pages+0xa4/0xec
> [ 1078.682522][T43784]        memblock_free_all+0x264/0x330
> [ 1078.687825][T43784]        mem_init+0x90/0x148
> [ 1078.692259][T43784]        start_kernel+0x368/0x684
> [ 1078.697126][T43784] 
> [ 1078.697126][T43784] -> #0 (&(&zone->lock)->rlock){-.-.}:
> [ 1078.704604][T43784]        check_prev_add+0x120/0x1138
> [ 1078.709733][T43784]        validate_chain+0x888/0x1270
> [ 1078.714863][T43784]        __lock_acquire+0x5c8/0xbb0
> [ 1078.719906][T43784]        lock_acquire+0x154/0x428
> [ 1078.724776][T43784]        _raw_spin_lock+0x68/0x88
> [ 1078.729645][T43784]        rmqueue_bulk.constprop.21+0xb0/0x1218
> [ 1078.735643][T43784]        get_page_from_freelist+0x898/0x24a0
> [ 1078.741467][T43784]        __alloc_pages_nodemask+0x2a8/0x1d08
> [ 1078.747291][T43784]        alloc_pages_current+0xb4/0x150
> [ 1078.752682][T43784]        allocate_slab+0xab8/0x2350
> [ 1078.757725][T43784]        new_slab+0x98/0xc0
> [ 1078.762073][T43784]        ___slab_alloc+0x66c/0xa30
> [ 1078.767029][T43784]        __slab_alloc+0x68/0xc8
> [ 1078.771725][T43784]        __kmalloc+0x3d4/0x658
> [ 1078.776333][T43784]        __tty_buffer_request_room+0xd4/0x220
> [ 1078.782244][T43784]        tty_insert_flip_string_fixed_flag+0x6c/0x128
> [ 1078.788849][T43784]        pty_write+0x98/0x100
> [ 1078.793370][T43784]        n_tty_write+0x2a0/0x668
> [ 1078.798152][T43784]        tty_write+0x2e8/0x430
> [ 1078.802760][T43784]        __vfs_write+0x5c/0xb0
> [ 1078.807368][T43784]        vfs_write+0xf0/0x230
> [ 1078.811890][T43784]        ksys_write+0xd4/0x180
> [ 1078.816498][T43784]        __arm64_sys_write+0x4c/0x60
> [ 1078.821627][T43784]        el0_svc_handler+0x11c/0x1f0
> [ 1078.826756][T43784]        el0_svc+0x8/0xc
> [ 1078.830842][T43784] 
> [ 1078.830842][T43784] other info that might help us debug this:
> [ 1078.830842][T43784] 
> [ 1078.840918][T43784] Chain exists of:
> [ 1078.840918][T43784]   &(&zone->lock)->rlock --> &port_lock_key --> &(&port->lock)->rlock
> 
> [ 1078.840918][T43784] 
> [ 1078.854731][T43784]  Possible unsafe locking scenario:
> [ 1078.854731][T43784] 
> [ 1078.862029][T43784]        CPU0                    CPU1
> [ 1078.867243][T43784]        ----                    ----
> [ 1078.872457][T43784]   lock(&(&port->lock)->rlock);
> [ 1078.877238][T43784]                                lock(&port_lock_key);
> [ 1078.883929][T43784]                                lock(&(&port->lock)->rlock);
> 
> [ 1078.891228][T43784]   lock(&(&zone->lock)->rlock);
> [ 1078.896010][T43784] 
> [ 1078.896010][T43784]  *** DEADLOCK ***
> [ 1078.896010][T43784] 
> [ 1078.904004][T43784] 5 locks held by vi/43784:
> [ 1078.908351][T43784]  #0: ffff000c36240890 (&tty->ldisc_sem){++++}, at:
> ldsem_down_read+0x44/0x50
> [ 1078.917133][T43784]  #1: ffff000c36240918 (&tty->atomic_write_lock){+.+.},
> at: tty_write_lock+0x24/0x60
> [ 1078.926521][T43784]  #2: ffff000c36240aa0 (&o_tty->termios_rwsem/1){++++},
> at: n_tty_write+0x108/0x668
> [ 1078.935823][T43784]  #3: ffffa0001e0b2360 (&ldata->output_lock){+.+.}, at:
> n_tty_write+0x1d0/0x668
> [ 1078.944777][T43784]  #4: ffff00938db47d40 (&(&port->lock)->rlock){-.-.}, at:
> pty_write+0x78/0x100
> [ 1078.953644][T43784] 
> [ 1078.953644][T43784] stack backtrace:
> [ 1078.959382][T43784] CPU: 97 PID: 43784 Comm: vi Not tainted 5.3.0-rc7-next-20190904 #14
> [ 1078.967376][T43784] Hardware name: HPE Apollo
> 70             /C01_APACHE_MB         , BIOS L50_5.13_1.11 06/18/2019
> [ 1078.977799][T43784] Call trace:
> [ 1078.980932][T43784]  dump_backtrace+0x0/0x228
> [ 1078.985279][T43784]  show_stack+0x24/0x30
> [ 1078.989282][T43784]  dump_stack+0xe8/0x13c
> [ 1078.993370][T43784]  print_circular_bug+0x334/0x3d8
> [ 1078.998240][T43784]  check_noncircular+0x268/0x310
> [ 1079.003022][T43784]  check_prev_add+0x120/0x1138
> [ 1079.007631][T43784]  validate_chain+0x888/0x1270
> [ 1079.012241][T43784]  __lock_acquire+0x5c8/0xbb0
> [ 1079.016763][T43784]  lock_acquire+0x154/0x428
> [ 1079.021111][T43784]  _raw_spin_lock+0x68/0x88
> [ 1079.025460][T43784]  rmqueue_bulk.constprop.21+0xb0/0x1218
> [ 1079.030937][T43784]  get_page_from_freelist+0x898/0x24a0
> [ 1079.036240][T43784]  __alloc_pages_nodemask+0x2a8/0x1d08
> [ 1079.041542][T43784]  alloc_pages_current+0xb4/0x150
> [ 1079.046412][T43784]  allocate_slab+0xab8/0x2350
> [ 1079.050934][T43784]  new_slab+0x98/0xc0
> [ 1079.054761][T43784]  ___slab_alloc+0x66c/0xa30
> [ 1079.059196][T43784]  __slab_alloc+0x68/0xc8
> [ 1079.063371][T43784]  __kmalloc+0x3d4/0x658
> [ 1079.067458][T43784]  __tty_buffer_request_room+0xd4/0x220
> [ 1079.072847][T43784]  tty_insert_flip_string_fixed_flag+0x6c/0x128
> [ 1079.078932][T43784]  pty_write+0x98/0x100
> [ 1079.082932][T43784]  n_tty_write+0x2a0/0x668
> [ 1079.087193][T43784]  tty_write+0x2e8/0x430
> [ 1079.091280][T43784]  __vfs_write+0x5c/0xb0
> [ 1079.095367][T43784]  vfs_write+0xf0/0x230
> [ 1079.099368][T43784]  ksys_write+0xd4/0x180
> [ 1079.103455][T43784]  __arm64_sys_write+0x4c/0x60
> [ 1079.108064][T43784]  el0_svc_handler+0x11c/0x1f0
> [ 1079.112672][T43784]  el0_svc+0x8/0xc

Hmm, it feels like CONFIG_SHUFFLE_PAGE_ALLOCATOR=y introduces a unique
locking pattern that lockdep does not like, via,

allocate_slab
  shuffle_freelist
    get_random_u32

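For reference, SLAB freelist randomization is essentially a Fisher-Yates
shuffle of the per-slab object order, drawing each swap position from the
random subsystem, which is why every new slab can now reach the entropy
pool's locks. A conceptual Python sketch (not the kernel implementation;
names and structure here are illustrative only):

```python
import random

def shuffle_freelist(nr_objects, rng=None):
    """Conceptual model of SLAB freelist randomization: a Fisher-Yates
    shuffle of object indices, where each swap position comes from the
    random subsystem (get_random_u32()/get_random_u64() in the kernel)."""
    rng = rng or random.Random()
    freelist = list(range(nr_objects))
    for i in range(nr_objects - 1, 0, -1):
        # kernel analogue: get_random_u32() % (i + 1)
        j = rng.randrange(i + 1)
        freelist[i], freelist[j] = freelist[j], freelist[i]
    return freelist

# Every freshly allocated slab triggers one such shuffle, so the page/slab
# allocation paths pick up a dependency on the RNG's internal locking.
print(shuffle_freelist(8))
```

The point is not the shuffle itself but the call into the RNG: that call is
the edge lockdep adds between the allocator's locks and the entropy pool's
locks.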
Here is another splat while compiling/installing a kernel,

[ 1254.443119][    C2] WARNING: possible circular locking dependency detected
[ 1254.450038][    C2] 5.3.0-rc5-next-20190822 #1 Not tainted
[ 1254.455559][    C2] ------------------------------------------------------
[ 1254.462988][    C2] swapper/2/0 is trying to acquire lock:
[ 1254.468509][    C2] ffffffffa2925218 (random_write_wait.lock){..-.}, at:
__wake_up_common_lock+0xc6/0x150
[ 1254.478154][    C2] 
[ 1254.478154][    C2] but task is already holding lock:
[ 1254.485896][    C2] ffff88845373fda0 (batched_entropy_u32.lock){-.-.}, at:
get_random_u32+0x4c/0xe0
[ 1254.495007][    C2] 
[ 1254.495007][    C2] which lock already depends on the new lock.
[ 1254.495007][    C2] 
[ 1254.505331][    C2] 
[ 1254.505331][    C2] the existing dependency chain (in reverse order) is:
[ 1254.514755][    C2] 
[ 1254.514755][    C2] -> #3 (batched_entropy_u32.lock){-.-.}:
[ 1254.522553][    C2]        __lock_acquire+0x5b3/0xb40
[ 1254.527638][    C2]        lock_acquire+0x126/0x280
[ 1254.533016][    C2]        _raw_spin_lock_irqsave+0x3a/0x50
[ 1254.538624][    C2]        get_random_u32+0x4c/0xe0
[ 1254.543539][    C2]        allocate_slab+0x6d6/0x19c0
[ 1254.548625][    C2]        new_slab+0x46/0x70
[ 1254.553010][    C2]        ___slab_alloc+0x58b/0x960
[ 1254.558533][    C2]        __slab_alloc+0x43/0x70
[ 1254.563269][    C2]        kmem_cache_alloc+0x354/0x460
[ 1254.568534][    C2]        fill_pool+0x272/0x4b0
[ 1254.573182][    C2]        __debug_object_init+0x86/0x7a0
[ 1254.578615][    C2]        debug_object_init+0x16/0x20
[ 1254.584256][    C2]        hrtimer_init+0x27/0x1e0
[ 1254.589079][    C2]        init_dl_task_timer+0x20/0x40
[ 1254.594342][    C2]        __sched_fork+0x10b/0x1f0
[ 1254.599253][    C2]        init_idle+0xac/0x520
[ 1254.603816][    C2]        fork_idle+0x18c/0x230
[ 1254.608933][    C2]        idle_threads_init+0xf0/0x187
[ 1254.614193][    C2]        smp_init+0x1d/0x12d
[ 1254.618671][    C2]        kernel_init_freeable+0x37e/0x76e
[ 1254.624282][    C2]        kernel_init+0x11/0x12f
[ 1254.629016][    C2]        ret_from_fork+0x27/0x50
[ 1254.634344][    C2] 
[ 1254.634344][    C2] -> #2 (&rq->lock){-.-.}:
[ 1254.640831][    C2]        __lock_acquire+0x5b3/0xb40
[ 1254.645917][    C2]        lock_acquire+0x126/0x280
[ 1254.650827][    C2]        _raw_spin_lock+0x2f/0x40
[ 1254.655741][    C2]        task_fork_fair+0x43/0x200
[ 1254.661213][    C2]        sched_fork+0x29b/0x420
[ 1254.665949][    C2]        copy_process+0xf12/0x3180
[ 1254.670947][    C2]        _do_fork+0xef/0x950
[ 1254.675422][    C2]        kernel_thread+0xa8/0xe0
[ 1254.680244][    C2]        rest_init+0x28/0x311
[ 1254.685298][    C2]        arch_call_rest_init+0xe/0x1b
[ 1254.690558][    C2]        start_kernel+0x6eb/0x724
[ 1254.695469][    C2]        x86_64_start_reservations+0x24/0x26
[ 1254.701339][    C2]        x86_64_start_kernel+0xf4/0xfb
[ 1254.706689][    C2]        secondary_startup_64+0xb6/0xc0
[ 1254.712601][    C2] 
[ 1254.712601][    C2] -> #1 (&p->pi_lock){-.-.}:
[ 1254.719263][    C2]        __lock_acquire+0x5b3/0xb40
[ 1254.724349][    C2]        lock_acquire+0x126/0x280
[ 1254.729260][    C2]        _raw_spin_lock_irqsave+0x3a/0x50
[ 1254.735317][    C2]        try_to_wake_up+0xad/0x1050
[ 1254.740403][    C2]        default_wake_function+0x2f/0x40
[ 1254.745929][    C2]        pollwake+0x10d/0x160
[ 1254.750491][    C2]        __wake_up_common+0xc4/0x2a0
[ 1254.755663][    C2]        __wake_up_common_lock+0xea/0x150
[ 1254.761756][    C2]        __wake_up+0x13/0x20
[ 1254.766230][    C2]        account.constprop.9+0x217/0x340
[ 1254.771754][    C2]        extract_entropy.constprop.7+0xcf/0x220
[ 1254.777886][    C2]        _xfer_secondary_pool+0x19a/0x3d0
[ 1254.783981][    C2]        push_to_pool+0x3e/0x230
[ 1254.788805][    C2]        process_one_work+0x52a/0xb40
[ 1254.794064][    C2]        worker_thread+0x63/0x5b0
[ 1254.798977][    C2]        kthread+0x1df/0x200
[ 1254.803451][    C2]        ret_from_fork+0x27/0x50
[ 1254.808787][    C2] 
[ 1254.808787][    C2] -> #0 (random_write_wait.lock){..-.}:
[ 1254.816409][    C2]        check_prev_add+0x107/0xea0
[ 1254.821494][    C2]        validate_chain+0x8fc/0x1200
[ 1254.826667][    C2]        __lock_acquire+0x5b3/0xb40
[ 1254.831751][    C2]        lock_acquire+0x126/0x280
[ 1254.837189][    C2]        _raw_spin_lock_irqsave+0x3a/0x50
[ 1254.842797][    C2]        __wake_up_common_lock+0xc6/0x150
[ 1254.848408][    C2]        __wake_up+0x13/0x20
[ 1254.852882][    C2]        account.constprop.9+0x217/0x340
[ 1254.858988][    C2]        extract_entropy.constprop.7+0xcf/0x220
[ 1254.865122][    C2]        crng_reseed+0xa1/0x3f0
[ 1254.869859][    C2]        _extract_crng+0xc3/0xd0
[ 1254.874682][    C2]        crng_reseed+0x21b/0x3f0
[ 1254.879505][    C2]        _extract_crng+0xc3/0xd0
[ 1254.884772][    C2]        extract_crng+0x40/0x60
[ 1254.889507][    C2]        get_random_u32+0xb4/0xe0
[ 1254.894417][    C2]        allocate_slab+0x6d6/0x19c0
[ 1254.899501][    C2]        new_slab+0x46/0x70
[ 1254.903886][    C2]        ___slab_alloc+0x58b/0x960
[ 1254.909377][    C2]        __slab_alloc+0x43/0x70
[ 1254.914112][    C2]        kmem_cache_alloc+0x354/0x460
[ 1254.919375][    C2]        __build_skb+0x23/0x60
[ 1254.924024][    C2]        __netdev_alloc_skb+0x127/0x1e0
[ 1254.929470][    C2]        tg3_poll_work+0x11b2/0x1f70 [tg3]
[ 1254.935654][    C2]        tg3_poll_msix+0x67/0x330 [tg3]
[ 1254.941092][    C2]        net_rx_action+0x24e/0x7e0
[ 1254.946089][    C2]        __do_softirq+0x123/0x767
[ 1254.951000][    C2]        irq_exit+0xd6/0xf0
[ 1254.955385][    C2]        do_IRQ+0xe2/0x1a0
[ 1254.960155][    C2]        ret_from_intr+0x0/0x2a
[ 1254.964896][    C2]        cpuidle_enter_state+0x156/0x8e0
[ 1254.970418][    C2]        cpuidle_enter+0x41/0x70
[ 1254.975242][    C2]        call_cpuidle+0x5e/0x90
[ 1254.979975][    C2]        do_idle+0x333/0x370
[ 1254.984972][    C2]        cpu_startup_entry+0x1d/0x1f
[ 1254.990148][    C2]        start_secondary+0x290/0x330
[ 1254.995319][    C2]        secondary_startup_64+0xb6/0xc0
[ 1255.000750][    C2] 
[ 1255.000750][    C2] other info that might help us debug this:
[ 1255.000750][    C2] 
[ 1255.011424][    C2] Chain exists of:
[ 1255.011424][    C2]   random_write_wait.lock --> &rq->lock -->
batched_entropy_u32.lock
[ 1255.011424][    C2] 
[ 1255.025245][    C2]  Possible unsafe locking scenario:
[ 1255.025245][    C2] 
[ 1255.033012][    C2]        CPU0                    CPU1
[ 1255.038270][    C2]        ----                    ----
[ 1255.043526][    C2]   lock(batched_entropy_u32.lock);
[ 1255.048610][    C2]                                lock(&rq->lock);
[ 1255.054918][    C2]                                lock(batched_entropy_u32.lock);
[ 1255.063035][    C2]   lock(random_write_wait.lock);
[ 1255.067945][    C2] 
[ 1255.067945][    C2]  *** DEADLOCK ***
[ 1255.067945][    C2] 
[ 1255.076000][    C2] 1 lock held by swapper/2/0:
[ 1255.080558][    C2]  #0: ffff88845373fda0 (batched_entropy_u32.lock){-.-.},
at: get_random_u32+0x4c/0xe0
[ 1255.090547][    C2] 
[ 1255.090547][    C2] stack backtrace:
[ 1255.096333][    C2] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.3.0-rc5-next-20190822 #1
[ 1255.104473][    C2] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 03/09/2018
[ 1255.114276][    C2] Call Trace:
[ 1255.117439][    C2]  <IRQ>
[ 1255.120169][    C2]  dump_stack+0x86/0xca
[ 1255.124205][    C2]  print_circular_bug.cold.32+0x243/0x26e
[ 1255.129816][    C2]  check_noncircular+0x29e/0x2e0
[ 1255.135221][    C2]  ? __bfs+0x238/0x380
[ 1255.139172][    C2]  ? print_circular_bug+0x120/0x120
[ 1255.144259][    C2]  ? find_usage_forwards+0x7d/0xb0
[ 1255.149260][    C2]  check_prev_add+0x107/0xea0
[ 1255.153823][    C2]  validate_chain+0x8fc/0x1200
[ 1255.159007][    C2]  ? check_prev_add+0xea0/0xea0
[ 1255.163743][    C2]  ? check_usage_backwards+0x210/0x210
[ 1255.169091][    C2]  __lock_acquire+0x5b3/0xb40
[ 1255.173655][    C2]  lock_acquire+0x126/0x280
[ 1255.178041][    C2]  ? __wake_up_common_lock+0xc6/0x150
[ 1255.183732][    C2]  _raw_spin_lock_irqsave+0x3a/0x50
[ 1255.188817][    C2]  ? __wake_up_common_lock+0xc6/0x150
[ 1255.194076][    C2]  __wake_up_common_lock+0xc6/0x150
[ 1255.199163][    C2]  ? __wake_up_common+0x2a0/0x2a0
[ 1255.204078][    C2]  ? rcu_read_lock_any_held.part.5+0x20/0x20
[ 1255.210428][    C2]  __wake_up+0x13/0x20
[ 1255.214379][    C2]  account.constprop.9+0x217/0x340
[ 1255.219377][    C2]  extract_entropy.constprop.7+0xcf/0x220
[ 1255.224987][    C2]  ? crng_reseed+0xa1/0x3f0
[ 1255.229375][    C2]  crng_reseed+0xa1/0x3f0
[ 1255.234122][    C2]  ? rcu_read_lock_sched_held+0xac/0xe0
[ 1255.239556][    C2]  ? check_flags.part.16+0x86/0x220
[ 1255.244641][    C2]  ? extract_entropy.constprop.7+0x220/0x220
[ 1255.250511][    C2]  ? __kasan_check_read+0x11/0x20
[ 1255.255422][    C2]  ? validate_chain+0xab/0x1200
[ 1255.260742][    C2]  ? rcu_read_lock_any_held.part.5+0x20/0x20
[ 1255.266616][    C2]  _extract_crng+0xc3/0xd0
[ 1255.270915][    C2]  crng_reseed+0x21b/0x3f0
[ 1255.275215][    C2]  ? extract_entropy.constprop.7+0x220/0x220
[ 1255.281085][    C2]  ? __kasan_check_write+0x14/0x20
[ 1255.286517][    C2]  ? do_raw_spin_lock+0x118/0x1d0
[ 1255.291428][    C2]  ? rwlock_bug.part.0+0x60/0x60
[ 1255.296251][    C2]  _extract_crng+0xc3/0xd0
[ 1255.300550][    C2]  extract_crng+0x40/0x60
[ 1255.304763][    C2]  get_random_u32+0xb4/0xe0
[ 1255.309640][    C2]  allocate_slab+0x6d6/0x19c0
[ 1255.314203][    C2]  new_slab+0x46/0x70
[ 1255.318066][    C2]  ___slab_alloc+0x58b/0x960
[ 1255.322539][    C2]  ? __build_skb+0x23/0x60
[ 1255.326841][    C2]  ? fault_create_debugfs_attr+0x140/0x140
[ 1255.333048][    C2]  ? __build_skb+0x23/0x60
[ 1255.337348][    C2]  __slab_alloc+0x43/0x70
[ 1255.341559][    C2]  ? __slab_alloc+0x43/0x70
[ 1255.345944][    C2]  ? __build_skb+0x23/0x60
[ 1255.350242][    C2]  kmem_cache_alloc+0x354/0x460
[ 1255.354978][    C2]  ? __netdev_alloc_skb+0x1c6/0x1e0
[ 1255.360626][    C2]  ? trace_hardirqs_on+0x3a/0x160
[ 1255.365535][    C2]  __build_skb+0x23/0x60
[ 1255.369660][    C2]  __netdev_alloc_skb+0x127/0x1e0
[ 1255.374576][    C2]  tg3_poll_work+0x11b2/0x1f70 [tg3]
[ 1255.379750][    C2]  ? find_held_lock+0x11b/0x150
[ 1255.385027][    C2]  ? tg3_tx_recover+0xa0/0xa0 [tg3]
[ 1255.390114][    C2]  ? _raw_spin_unlock_irqrestore+0x38/0x50
[ 1255.395809][    C2]  ? __kasan_check_read+0x11/0x20
[ 1255.400718][    C2]  ? validate_chain+0xab/0x1200
[ 1255.405455][    C2]  ? __wake_up_common+0x2a0/0x2a0
[ 1255.410761][    C2]  ? mark_held_locks+0x34/0xb0
[ 1255.415415][    C2]  tg3_poll_msix+0x67/0x330 [tg3]
[ 1255.420327][    C2]  net_rx_action+0x24e/0x7e0
[ 1255.424800][    C2]  ? find_held_lock+0x11b/0x150
[ 1255.429536][    C2]  ? napi_busy_loop+0x600/0x600
[ 1255.434733][    C2]  ? rcu_read_lock_sched_held+0xac/0xe0
[ 1255.440169][    C2]  ? __do_softirq+0xed/0x767
[ 1255.444642][    C2]  ? rcu_read_lock_any_held.part.5+0x20/0x20
[ 1255.450518][    C2]  ? lockdep_hardirqs_on+0x1b0/0x2a0
[ 1255.455693][    C2]  ? irq_exit+0xd6/0xf0
[ 1255.460280][    C2]  __do_softirq+0x123/0x767
[ 1255.464668][    C2]  irq_exit+0xd6/0xf0
[ 1255.468532][    C2]  do_IRQ+0xe2/0x1a0
[ 1255.472308][    C2]  common_interrupt+0xf/0xf
[ 1255.476694][    C2]  </IRQ>
[ 1255.479509][    C2] RIP: 0010:cpuidle_enter_state+0x156/0x8e0
[ 1255.485750][    C2] Code: bf ff 8b 05 a4 27 2d 01 85 c0 0f 8f 1d 04 00 00 31
ff e8 4d ba 92 ff 80 7d d0 00 0f 85 0b 02 00 00 e8 ae c0 a7 ff fb 45 85 ed <0f>
88 2d 02 00 00 4d 63 fd 49 83 ff 09 0f 87 91 06 00 00 4b 8d 04
[ 1255.505335][    C2] RSP: 0018:ffff888206637cf8 EFLAGS: 00000202 ORIG_RAX:
ffffffffffffffc8
[ 1255.514154][    C2] RAX: 0000000000000000 RBX: ffff889f98b44008 RCX:
ffffffffa116e980
[ 1255.522033][    C2] RDX: 0000000000000007 RSI: dffffc0000000000 RDI:
ffff8882066287ec
[ 1255.529913][    C2] RBP: ffff888206637d48 R08: fffffbfff4557aee R09:
0000000000000000
[ 1255.538278][    C2] R10: 0000000000000000 R11: 0000000000000000 R12:
ffffffffa28e8ac0
[ 1255.546158][    C2] R13: 0000000000000002 R14: 0000012412160253 R15:
ffff889f98b4400c
[ 1255.554040][    C2]  ? lockdep_hardirqs_on+0x1b0/0x2a0
[ 1255.559725][    C2]  ? cpuidle_enter_state+0x152/0x8e0
[ 1255.564898][    C2]  cpuidle_enter+0x41/0x70
[ 1255.569196][    C2]  call_cpuidle+0x5e/0x90
[ 1255.573408][    C2]  do_idle+0x333/0x370
[ 1255.577358][    C2]  ? complete+0x51/0x60
[ 1255.581394][    C2]  ? arch_cpu_idle_exit+0x40/0x40
[ 1255.586777][    C2]  ? complete+0x51/0x60
[ 1255.590814][    C2]  cpu_startup_entry+0x1d/0x1f
[ 1255.595461][    C2]  start_secondary+0x290/0x330
[ 1255.600111][    C2]  ? set_cpu_sibling_map+0x18f0/0x18f0
[ 1255.605460][    C2]  secondary_startup_64+0xb6/0xc0



* Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang
  2019-09-10 15:22   ` Qian Cai
  2019-09-10 19:49     ` Qian Cai
@ 2019-09-10 20:35     ` Qian Cai
  2019-09-11  1:10     ` Sergey Senozhatsky
  2 siblings, 0 replies; 21+ messages in thread
From: Qian Cai @ 2019-09-10 20:35 UTC (permalink / raw)
  To: Petr Mladek, Steven Rostedt, Catalin Marinas, Will Deacon
  Cc: Sergey Senozhatsky, Dan Williams, linux-mm, linux-kernel,
	linux-arm-kernel, Peter Zijlstra, Waiman Long, Thomas Gleixner

On Tue, 2019-09-10 at 11:22 -0400, Qian Cai wrote:
> On Thu, 2019-09-05 at 17:08 -0400, Qian Cai wrote:
> > Another data point: changing CONFIG_DEBUG_OBJECTS_TIMERS from =y to =n also
> > fixes it.
> > 
> > On Thu, 2019-08-22 at 17:33 -0400, Qian Cai wrote:
> > > https://raw.githubusercontent.com/cailca/linux-mm/master/arm64.config
> > > 
> > > Booting an arm64 ThunderX2 server with page_alloc.shuffle=1 [1] +
> > > CONFIG_PROVE_LOCKING=y results in hanging.
> > > 
> > > [1] https://lore.kernel.org/linux-mm/154899811208.3165233.17623209031065121886.s
> > > tgit@dwillia2-desk3.amr.corp.intel.com/
> > > 
> > > ...
> > > [  125.142689][    T1] arm-smmu-v3 arm-smmu-v3.2.auto: option mask 0x2
> > > [  125.149687][    T1] arm-smmu-v3 arm-smmu-v3.2.auto: ias 44-bit, oas 44-bit
> > > (features 0x0000170d)
> > > [  125.165198][    T1] arm-smmu-v3 arm-smmu-v3.2.auto: allocated 524288 entries
> > > for cmdq
> > > [  125.239425][ [  125.251484][    T1] arm-smmu-v3 arm-smmu-v3.3.auto: option
> > > mask 0x2
> > > [  125.258233][    T1] arm-smmu-v3 arm-smmu-v3.3.auto: ias 44-bit, oas 44-bit
> > > (features 0x0000170d)
> > > [  125.282750][    T1] arm-smmu-v3 arm-smmu-v3.3.auto: allocated 524288 entries
> > > for cmdq
> > > [  125.320097][    T1] arm-smmu-v3 arm-smmu-v3.3.auto: allocated 524288 entries
> > > for evtq
> > > [  125.332667][    T1] arm-smmu-v3 arm-smmu-v3.4.auto: option mask 0x2
> > > [  125.339427][    T1] arm-smmu-v3 arm-smmu-v3.4.auto: ias 44-bit, oas 44-bit
> > > (features 0x0000170d)
> > > [  125.354846][    T1] arm-smmu-v3 arm-smmu-v3.4.auto: allocated 524288 entries
> > > for cmdq
> > > [  125.375295][    T1] arm-smmu-v3 arm-smmu-v3.4.auto: allocated 524288 entries
> > > for evtq
> > > [  125.387371][    T1] arm-smmu-v3 arm-smmu-v3.5.auto: option mask 0x2
> > > [  125.393955][    T1] arm-smmu-v3 arm-smmu-v3.5.auto: ias 44-bit, oas 44-bit
> > > (features 0x0000170d)
> > > [  125.522605][    T1] arm-smmu-v3 arm-smmu-v3.5.auto: allocated 524288 entries
> > > for cmdq
> > > [  125.543338][    T1] arm-smmu-v3 arm-smmu-v3.5.auto: allocated 524288 entries
> > > for evtq
> > > [  126.694742][    T1] EFI Variables Facility v0.08 2004-May-17
> > > [  126.799291][    T1] NET: Registered protocol family 17
> > > [  126.978632][    T1] zswap: loaded using pool lzo/zbud
> > > [  126.989168][    T1] kmemleak: Kernel memory leak detector initialized
> > > [  126.989191][ T1577] kmemleak: Automatic memory scanning thread started
> > > [  127.044079][ T1335] pcieport 0000:0f:00.0: Adding to iommu group 0
> > > [  127.388074][    T1] Freeing unused kernel memory: 22528K
> > > [  133.527005][    T1] Checked W+X mappings: passed, no W+X pages found
> > > [  133.533474][    T1] Run /init as init process
> > > [  133.727196][    T1] systemd[1]: System time before build time, advancing
> > > clock.
> > > [  134.576021][ T1587] modprobe (1587) used greatest stack depth: 27056 bytes
> > > left
> > > [  134.764026][    T1] systemd[1]: systemd 239 running in system mode. (+PAM
> > > +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT
> > > +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-
> > > hierarchy=legacy)
> > > [  134.799044][    T1] systemd[1]: Detected architecture arm64.
> > > [  134.804818][    T1] systemd[1]: Running in initial RAM disk.
> > > <...hang...>
> > > 
> > > Setting either page_alloc.shuffle=0 or CONFIG_PROVE_LOCKING=n fixes it and
> > > allows the boot to continue successfully.
> > > 
> > > 
> > > [  121.093846][    T1] systemd[1]: Set hostname to <hpe-apollo-cn99xx>.
> > > [  123.157524][    T1] random: systemd: uninitialized urandom read (16 bytes
> > > read)
> > > [  123.168562][    T1] systemd[1]: Listening on Journal Socket.
> > > [  OK  ] Listening on Journal Socket.
> > > [  123.203932][    T1] random: systemd: uninitialized urandom read (16 bytes
> > > read)
> > > [  123.212813][    T1] systemd[1]: Listening on udev Kernel Socket.
> > > [  OK  ] Listening on udev Kernel Socket.
> > > ...
> 
> Not sure if the arm64 hang is just an effect of the potential console deadlock
> below. The lockdep splat can be reproduced by setting,
> 
> CONFIG_DEBUG_OBJECTS_TIMERS=n (=y will lead to the hang above)
> CONFIG_PROVE_LOCKING=y
> CONFIG_SLAB_FREELIST_RANDOM=y (with page_alloc.shuffle=1)
> 
> while compiling kernels,

This is more than likely the case, as this debug patch alone fixes the hang,

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 9b54cdb301d3..4d5c38035f03 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -2323,7 +2323,7 @@ u64 get_random_u64(void)
            return ret;
 #endif
 
-       warn_unseeded_randomness(&previous);
+       //warn_unseeded_randomness(&previous);
 
        batch = raw_cpu_ptr(&batched_entropy_u64);
        spin_lock_irqsave(&batch->batch_lock, flags);

which strongly suggests that the additional printk() on this path, triggered by
"page_alloc.shuffle=1", causes a real deadlock,

allocate_slab
  shuffle_freelist
    get_random_u64
      warn_unseeded_randomness
        printk

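The circular dependency lockdep reports can be modeled as a cycle in a
lock-ordering graph, where an edge A -> B means "B was acquired while A was
held". A small Python sketch of that check, using simplified lock names
taken from the first splat (the edge comments paraphrase the traces above;
this is a toy illustration, not lockdep's actual algorithm):

```python
from collections import defaultdict

def find_cycle(deps):
    """Tiny lockdep-style checker: deps maps lock A to the locks taken
    while A was held.  Returns one dependency cycle, or None."""
    graph = defaultdict(list, deps)

    def dfs(node, path, visiting):
        if node in visiting:
            # Found a back-edge: return the cycle portion of the path.
            return path[path.index(node):] + [node]
        visiting.add(node)
        for nxt in graph[node]:
            cycle = dfs(nxt, path + [node], visiting)
            if cycle:
                return cycle
        visiting.discard(node)
        return None

    for lock in list(graph):
        cycle = dfs(lock, [], set())
        if cycle:
            return cycle
    return None

# Simplified edges from the splat:
deps = {
    "zone->lock":    ["console_owner"],  # warn_unseeded_randomness printk while freeing pages
    "console_owner": ["port_lock_key"],  # console_unlock -> pl011_console_write
    "port_lock_key": ["port->lock"],     # uart_write_wakeup -> tty_port_tty_get
    "port->lock":    ["zone->lock"],     # pty_write -> kmalloc -> page allocator
}
print(find_cycle(deps))
# -> ['zone->lock', 'console_owner', 'port_lock_key', 'port->lock', 'zone->lock']
```

Commenting out warn_unseeded_randomness() removes the zone->lock ->
console_owner edge, which is why the debug patch above breaks the cycle.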
> 
> [ 1078.214683][T43784] WARNING: possible circular locking dependency detected
> [ 1078.221550][T43784] 5.3.0-rc7-next-20190904 #14 Not tainted
> [ 1078.227112][T43784] ------------------------------------------------------
> [ 1078.233976][T43784] vi/43784 is trying to acquire lock:
> [ 1078.239192][T43784] ffff008b7cff9290 (&(&zone->lock)->rlock){-.-.}, at:
> rmqueue_bulk.constprop.21+0xb0/0x1218
> [ 1078.249111][T43784] 
> [ 1078.249111][T43784] but task is already holding lock:
> [ 1078.256322][T43784] ffff00938db47d40 (&(&port->lock)->rlock){-.-.}, at:
> pty_write+0x78/0x100
> [ 1078.264760][T43784] 
> [ 1078.264760][T43784] which lock already depends on the new lock.
> [ 1078.264760][T43784] 
> [ 1078.275008][T43784] 
> [ 1078.275008][T43784] the existing dependency chain (in reverse order) is:
> [ 1078.283869][T43784] 
> [ 1078.283869][T43784] -> #3 (&(&port->lock)->rlock){-.-.}:
> [ 1078.291350][T43784]        __lock_acquire+0x5c8/0xbb0
> [ 1078.296394][T43784]        lock_acquire+0x154/0x428
> [ 1078.301266][T43784]        _raw_spin_lock_irqsave+0x80/0xa0
> [ 1078.306831][T43784]        tty_port_tty_get+0x28/0x68
> [ 1078.311873][T43784]        tty_port_default_wakeup+0x20/0x40
> [ 1078.317523][T43784]        tty_port_tty_wakeup+0x38/0x48
> [ 1078.322827][T43784]        uart_write_wakeup+0x2c/0x50
> [ 1078.327956][T43784]        pl011_tx_chars+0x240/0x260
> [ 1078.332999][T43784]        pl011_start_tx+0x24/0xa8
> [ 1078.337868][T43784]        __uart_start+0x90/0xa0
> [ 1078.342563][T43784]        uart_write+0x15c/0x2c8
> [ 1078.347261][T43784]        do_output_char+0x1c8/0x2b0
> [ 1078.352304][T43784]        n_tty_write+0x300/0x668
> [ 1078.357087][T43784]        tty_write+0x2e8/0x430
> [ 1078.361696][T43784]        redirected_tty_write+0xcc/0xe8
> [ 1078.367086][T43784]        do_iter_write+0x228/0x270
> [ 1078.372041][T43784]        vfs_writev+0x10c/0x1c8
> [ 1078.376735][T43784]        do_writev+0xdc/0x180
> [ 1078.381257][T43784]        __arm64_sys_writev+0x50/0x60
> [ 1078.386476][T43784]        el0_svc_handler+0x11c/0x1f0
> [ 1078.391606][T43784]        el0_svc+0x8/0xc
> [ 1078.395691][T43784] 
> [ 1078.395691][T43784] -> #2 (&port_lock_key){-.-.}:
> [ 1078.402561][T43784]        __lock_acquire+0x5c8/0xbb0
> [ 1078.407604][T43784]        lock_acquire+0x154/0x428
> [ 1078.412474][T43784]        _raw_spin_lock+0x68/0x88
> [ 1078.417343][T43784]        pl011_console_write+0x2ac/0x318
> [ 1078.422820][T43784]        console_unlock+0x3c4/0x898
> [ 1078.427863][T43784]        vprintk_emit+0x2d4/0x460
> [ 1078.432732][T43784]        vprintk_default+0x48/0x58
> [ 1078.437688][T43784]        vprintk_func+0x194/0x250
> [ 1078.442557][T43784]        printk+0xbc/0xec
> [ 1078.446732][T43784]        register_console+0x4a8/0x580
> [ 1078.451947][T43784]        uart_add_one_port+0x748/0x878
> [ 1078.457250][T43784]        pl011_register_port+0x98/0x128
> [ 1078.462639][T43784]        sbsa_uart_probe+0x398/0x480
> [ 1078.467772][T43784]        platform_drv_probe+0x70/0x108
> [ 1078.473075][T43784]        really_probe+0x15c/0x5d8
> [ 1078.477944][T43784]        driver_probe_device+0x94/0x1d0
> [ 1078.483335][T43784]        __device_attach_driver+0x11c/0x1a8
> [ 1078.489072][T43784]        bus_for_each_drv+0xf8/0x158
> [ 1078.494201][T43784]        __device_attach+0x164/0x240
> [ 1078.499331][T43784]        device_initial_probe+0x24/0x30
> [ 1078.504721][T43784]        bus_probe_device+0xf0/0x100
> [ 1078.509850][T43784]        device_add+0x63c/0x960
> [ 1078.514546][T43784]        platform_device_add+0x1ac/0x3b8
> [ 1078.520023][T43784]        platform_device_register_full+0x1fc/0x290
> [ 1078.526373][T43784]        acpi_create_platform_device.part.0+0x264/0x3a8
> [ 1078.533152][T43784]        acpi_create_platform_device+0x68/0x80
> [ 1078.539150][T43784]        acpi_default_enumeration+0x34/0x78
> [ 1078.544887][T43784]        acpi_bus_attach+0x340/0x3b8
> [ 1078.550015][T43784]        acpi_bus_attach+0xf8/0x3b8
> [ 1078.555057][T43784]        acpi_bus_attach+0xf8/0x3b8
> [ 1078.560099][T43784]        acpi_bus_attach+0xf8/0x3b8
> [ 1078.565142][T43784]        acpi_bus_scan+0x9c/0x100
> [ 1078.570015][T43784]        acpi_scan_init+0x16c/0x320
> [ 1078.575058][T43784]        acpi_init+0x330/0x3b8
> [ 1078.579666][T43784]        do_one_initcall+0x158/0x7ec
> [ 1078.584797][T43784]        kernel_init_freeable+0x9a8/0xa70
> [ 1078.590360][T43784]        kernel_init+0x18/0x138
> [ 1078.595055][T43784]        ret_from_fork+0x10/0x1c
> [ 1078.599835][T43784] 
> [ 1078.599835][T43784] -> #1 (console_owner){-...}:
> [ 1078.606618][T43784]        __lock_acquire+0x5c8/0xbb0
> [ 1078.611661][T43784]        lock_acquire+0x154/0x428
> [ 1078.616530][T43784]        console_unlock+0x298/0x898
> [ 1078.621573][T43784]        vprintk_emit+0x2d4/0x460
> [ 1078.626442][T43784]        vprintk_default+0x48/0x58
> [ 1078.631398][T43784]        vprintk_func+0x194/0x250
> [ 1078.636267][T43784]        printk+0xbc/0xec
> [ 1078.640443][T43784]        _warn_unseeded_randomness+0xb4/0xd0
> [ 1078.646267][T43784]        get_random_u64+0x4c/0x100
> [ 1078.651224][T43784]        add_to_free_area_random+0x168/0x1a0
> [ 1078.657047][T43784]        free_one_page+0x3dc/0xd08
> [ 1078.662003][T43784]        __free_pages_ok+0x490/0xd00
> [ 1078.667132][T43784]        __free_pages+0xc4/0x118
> [ 1078.671914][T43784]        __free_pages_core+0x2e8/0x428
> [ 1078.677219][T43784]        memblock_free_pages+0xa4/0xec
> [ 1078.682522][T43784]        memblock_free_all+0x264/0x330
> [ 1078.687825][T43784]        mem_init+0x90/0x148
> [ 1078.692259][T43784]        start_kernel+0x368/0x684
> [ 1078.697126][T43784] 
> [ 1078.697126][T43784] -> #0 (&(&zone->lock)->rlock){-.-.}:
> [ 1078.704604][T43784]        check_prev_add+0x120/0x1138
> [ 1078.709733][T43784]        validate_chain+0x888/0x1270
> [ 1078.714863][T43784]        __lock_acquire+0x5c8/0xbb0
> [ 1078.719906][T43784]        lock_acquire+0x154/0x428
> [ 1078.724776][T43784]        _raw_spin_lock+0x68/0x88
> [ 1078.729645][T43784]        rmqueue_bulk.constprop.21+0xb0/0x1218
> [ 1078.735643][T43784]        get_page_from_freelist+0x898/0x24a0
> [ 1078.741467][T43784]        __alloc_pages_nodemask+0x2a8/0x1d08
> [ 1078.747291][T43784]        alloc_pages_current+0xb4/0x150
> [ 1078.752682][T43784]        allocate_slab+0xab8/0x2350
> [ 1078.757725][T43784]        new_slab+0x98/0xc0
> [ 1078.762073][T43784]        ___slab_alloc+0x66c/0xa30
> [ 1078.767029][T43784]        __slab_alloc+0x68/0xc8
> [ 1078.771725][T43784]        __kmalloc+0x3d4/0x658
> [ 1078.776333][T43784]        __tty_buffer_request_room+0xd4/0x220
> [ 1078.782244][T43784]        tty_insert_flip_string_fixed_flag+0x6c/0x128
> [ 1078.788849][T43784]        pty_write+0x98/0x100
> [ 1078.793370][T43784]        n_tty_write+0x2a0/0x668
> [ 1078.798152][T43784]        tty_write+0x2e8/0x430
> [ 1078.802760][T43784]        __vfs_write+0x5c/0xb0
> [ 1078.807368][T43784]        vfs_write+0xf0/0x230
> [ 1078.811890][T43784]        ksys_write+0xd4/0x180
> [ 1078.816498][T43784]        __arm64_sys_write+0x4c/0x60
> [ 1078.821627][T43784]        el0_svc_handler+0x11c/0x1f0
> [ 1078.826756][T43784]        el0_svc+0x8/0xc
> [ 1078.830842][T43784] 
> [ 1078.830842][T43784] other info that might help us debug this:
> [ 1078.830842][T43784] 
> [ 1078.840918][T43784] Chain exists of:
> [ 1078.840918][T43784]   &(&zone->lock)->rlock --> &port_lock_key --> &(&port->lock)->rlock
> [ 1078.840918][T43784] 
> [ 1078.854731][T43784]  Possible unsafe locking scenario:
> [ 1078.854731][T43784] 
> [ 1078.862029][T43784]        CPU0                    CPU1
> [ 1078.867243][T43784]        ----                    ----
> [ 1078.872457][T43784]   lock(&(&port->lock)->rlock);
> [ 1078.877238][T43784]                                lock(&port_lock_key);
> [ 1078.883929][T43784]                                lock(&(&port->lock)->rlock);
> [ 1078.891228][T43784]   lock(&(&zone->lock)->rlock);
> [ 1078.896010][T43784] 
> [ 1078.896010][T43784]  *** DEADLOCK ***
> [ 1078.896010][T43784] 
> [ 1078.904004][T43784] 5 locks held by vi/43784:
> [ 1078.908351][T43784]  #0: ffff000c36240890 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x44/0x50
> [ 1078.917133][T43784]  #1: ffff000c36240918 (&tty->atomic_write_lock){+.+.}, at: tty_write_lock+0x24/0x60
> [ 1078.926521][T43784]  #2: ffff000c36240aa0 (&o_tty->termios_rwsem/1){++++}, at: n_tty_write+0x108/0x668
> [ 1078.935823][T43784]  #3: ffffa0001e0b2360 (&ldata->output_lock){+.+.}, at: n_tty_write+0x1d0/0x668
> [ 1078.944777][T43784]  #4: ffff00938db47d40 (&(&port->lock)->rlock){-.-.}, at: pty_write+0x78/0x100
> [ 1078.953644][T43784] 
> [ 1078.953644][T43784] stack backtrace:
> [ 1078.959382][T43784] CPU: 97 PID: 43784 Comm: vi Not tainted 5.3.0-rc7-next-20190904 #14
> [ 1078.967376][T43784] Hardware name: HPE Apollo 70             /C01_APACHE_MB         , BIOS L50_5.13_1.11 06/18/2019
> [ 1078.977799][T43784] Call trace:
> [ 1078.980932][T43784]  dump_backtrace+0x0/0x228
> [ 1078.985279][T43784]  show_stack+0x24/0x30
> [ 1078.989282][T43784]  dump_stack+0xe8/0x13c
> [ 1078.993370][T43784]  print_circular_bug+0x334/0x3d8
> [ 1078.998240][T43784]  check_noncircular+0x268/0x310
> [ 1079.003022][T43784]  check_prev_add+0x120/0x1138
> [ 1079.007631][T43784]  validate_chain+0x888/0x1270
> [ 1079.012241][T43784]  __lock_acquire+0x5c8/0xbb0
> [ 1079.016763][T43784]  lock_acquire+0x154/0x428
> [ 1079.021111][T43784]  _raw_spin_lock+0x68/0x88
> [ 1079.025460][T43784]  rmqueue_bulk.constprop.21+0xb0/0x1218
> [ 1079.030937][T43784]  get_page_from_freelist+0x898/0x24a0
> [ 1079.036240][T43784]  __alloc_pages_nodemask+0x2a8/0x1d08
> [ 1079.041542][T43784]  alloc_pages_current+0xb4/0x150
> [ 1079.046412][T43784]  allocate_slab+0xab8/0x2350
> [ 1079.050934][T43784]  new_slab+0x98/0xc0
> [ 1079.054761][T43784]  ___slab_alloc+0x66c/0xa30
> [ 1079.059196][T43784]  __slab_alloc+0x68/0xc8
> [ 1079.063371][T43784]  __kmalloc+0x3d4/0x658
> [ 1079.067458][T43784]  __tty_buffer_request_room+0xd4/0x220
> [ 1079.072847][T43784]  tty_insert_flip_string_fixed_flag+0x6c/0x128
> [ 1079.078932][T43784]  pty_write+0x98/0x100
> [ 1079.082932][T43784]  n_tty_write+0x2a0/0x668
> [ 1079.087193][T43784]  tty_write+0x2e8/0x430
> [ 1079.091280][T43784]  __vfs_write+0x5c/0xb0
> [ 1079.095367][T43784]  vfs_write+0xf0/0x230
> [ 1079.099368][T43784]  ksys_write+0xd4/0x180
> [ 1079.103455][T43784]  __arm64_sys_write+0x4c/0x60
> [ 1079.108064][T43784]  el0_svc_handler+0x11c/0x1f0
> [ 1079.112672][T43784]  el0_svc+0x8/0xc


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang
  2019-09-10 15:22   ` Qian Cai
  2019-09-10 19:49     ` Qian Cai
  2019-09-10 20:35     ` page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang Qian Cai
@ 2019-09-11  1:10     ` Sergey Senozhatsky
  2019-09-12 12:05       ` Qian Cai
  2 siblings, 1 reply; 21+ messages in thread
From: Sergey Senozhatsky @ 2019-09-11  1:10 UTC (permalink / raw)
  To: Qian Cai
  Cc: Petr Mladek, Steven Rostedt, Catalin Marinas, Will Deacon,
	Sergey Senozhatsky, Dan Williams, linux-mm, linux-kernel,
	linux-arm-kernel, Peter Zijlstra, Waiman Long, Thomas Gleixner,
	Theodore Ts'o, Arnd Bergmann, Greg Kroah-Hartman

Cc-ing Ted, Arnd, Greg

On (09/10/19 11:22), Qian Cai wrote:
> [ 1078.283869][T43784] -> #3 (&(&port->lock)->rlock){-.-.}:
> [ 1078.291350][T43784]        __lock_acquire+0x5c8/0xbb0
> [ 1078.296394][T43784]        lock_acquire+0x154/0x428
> [ 1078.301266][T43784]        _raw_spin_lock_irqsave+0x80/0xa0
> [ 1078.306831][T43784]        tty_port_tty_get+0x28/0x68
> [ 1078.311873][T43784]        tty_port_default_wakeup+0x20/0x40
> [ 1078.317523][T43784]        tty_port_tty_wakeup+0x38/0x48
> [ 1078.322827][T43784]        uart_write_wakeup+0x2c/0x50
> [ 1078.327956][T43784]        pl011_tx_chars+0x240/0x260
> [ 1078.332999][T43784]        pl011_start_tx+0x24/0xa8
> [ 1078.337868][T43784]        __uart_start+0x90/0xa0
> [ 1078.342563][T43784]        uart_write+0x15c/0x2c8
> [ 1078.347261][T43784]        do_output_char+0x1c8/0x2b0
> [ 1078.352304][T43784]        n_tty_write+0x300/0x668
> [ 1078.357087][T43784]        tty_write+0x2e8/0x430
> [ 1078.361696][T43784]        redirected_tty_write+0xcc/0xe8
> [ 1078.367086][T43784]        do_iter_write+0x228/0x270
> [ 1078.372041][T43784]        vfs_writev+0x10c/0x1c8
> [ 1078.376735][T43784]        do_writev+0xdc/0x180
> [ 1078.381257][T43784]        __arm64_sys_writev+0x50/0x60
> [ 1078.386476][T43784]        el0_svc_handler+0x11c/0x1f0
> [ 1078.391606][T43784]        el0_svc+0x8/0xc
> [ 1078.395691][T43784] 

uart_port->lock  ->  tty_port->lock

This thing alone is already a bit suspicious. We re-enter tty
here: tty -> uart -> serial -> tty

And we re-enter tty under uart_port->lock.

> [ 1078.395691][T43784] -> #2 (&port_lock_key){-.-.}:
> [ 1078.402561][T43784]        __lock_acquire+0x5c8/0xbb0
> [ 1078.407604][T43784]        lock_acquire+0x154/0x428
> [ 1078.412474][T43784]        _raw_spin_lock+0x68/0x88
> [ 1078.417343][T43784]        pl011_console_write+0x2ac/0x318
> [ 1078.422820][T43784]        console_unlock+0x3c4/0x898
> [ 1078.427863][T43784]        vprintk_emit+0x2d4/0x460
> [ 1078.432732][T43784]        vprintk_default+0x48/0x58
> [ 1078.437688][T43784]        vprintk_func+0x194/0x250
> [ 1078.442557][T43784]        printk+0xbc/0xec
> [ 1078.446732][T43784]        register_console+0x4a8/0x580
> [ 1078.451947][T43784]        uart_add_one_port+0x748/0x878
> [ 1078.457250][T43784]        pl011_register_port+0x98/0x128
> [ 1078.462639][T43784]        sbsa_uart_probe+0x398/0x480
> [ 1078.467772][T43784]        platform_drv_probe+0x70/0x108
> [ 1078.473075][T43784]        really_probe+0x15c/0x5d8
> [ 1078.477944][T43784]        driver_probe_device+0x94/0x1d0
> [ 1078.483335][T43784]        __device_attach_driver+0x11c/0x1a8
> [ 1078.489072][T43784]        bus_for_each_drv+0xf8/0x158
> [ 1078.494201][T43784]        __device_attach+0x164/0x240
> [ 1078.499331][T43784]        device_initial_probe+0x24/0x30
> [ 1078.504721][T43784]        bus_probe_device+0xf0/0x100
> [ 1078.509850][T43784]        device_add+0x63c/0x960
> [ 1078.514546][T43784]        platform_device_add+0x1ac/0x3b8
> [ 1078.520023][T43784]        platform_device_register_full+0x1fc/0x290
> [ 1078.526373][T43784]        acpi_create_platform_device.part.0+0x264/0x3a8
> [ 1078.533152][T43784]        acpi_create_platform_device+0x68/0x80
> [ 1078.539150][T43784]        acpi_default_enumeration+0x34/0x78
> [ 1078.544887][T43784]        acpi_bus_attach+0x340/0x3b8
> [ 1078.550015][T43784]        acpi_bus_attach+0xf8/0x3b8
> [ 1078.555057][T43784]        acpi_bus_attach+0xf8/0x3b8
> [ 1078.560099][T43784]        acpi_bus_attach+0xf8/0x3b8
> [ 1078.565142][T43784]        acpi_bus_scan+0x9c/0x100
> [ 1078.570015][T43784]        acpi_scan_init+0x16c/0x320
> [ 1078.575058][T43784]        acpi_init+0x330/0x3b8
> [ 1078.579666][T43784]        do_one_initcall+0x158/0x7ec
> [ 1078.584797][T43784]        kernel_init_freeable+0x9a8/0xa70
> [ 1078.590360][T43784]        kernel_init+0x18/0x138
> [ 1078.595055][T43784]        ret_from_fork+0x10/0x1c
>
> [ 1078.599835][T43784] -> #1 (console_owner){-...}:
> [ 1078.606618][T43784]        __lock_acquire+0x5c8/0xbb0
> [ 1078.611661][T43784]        lock_acquire+0x154/0x428
> [ 1078.616530][T43784]        console_unlock+0x298/0x898
> [ 1078.621573][T43784]        vprintk_emit+0x2d4/0x460
> [ 1078.626442][T43784]        vprintk_default+0x48/0x58
> [ 1078.631398][T43784]        vprintk_func+0x194/0x250
> [ 1078.636267][T43784]        printk+0xbc/0xec
> [ 1078.640443][T43784]        _warn_unseeded_randomness+0xb4/0xd0
> [ 1078.646267][T43784]        get_random_u64+0x4c/0x100
> [ 1078.651224][T43784]        add_to_free_area_random+0x168/0x1a0
> [ 1078.657047][T43784]        free_one_page+0x3dc/0xd08
> [ 1078.662003][T43784]        __free_pages_ok+0x490/0xd00
> [ 1078.667132][T43784]        __free_pages+0xc4/0x118
> [ 1078.671914][T43784]        __free_pages_core+0x2e8/0x428
> [ 1078.677219][T43784]        memblock_free_pages+0xa4/0xec
> [ 1078.682522][T43784]        memblock_free_all+0x264/0x330
> [ 1078.687825][T43784]        mem_init+0x90/0x148
> [ 1078.692259][T43784]        start_kernel+0x368/0x684

zone->lock --> uart_port->lock

Some debugging options/warnings/error printouts/etc. introduce
deadlock patterns.

This adds zone->lock --> uart_port->lock, which then brings in
uart_port->lock --> tty_port->lock, which in turn brings
tty_port->lock --> zone->lock.

> [ 1078.697126][T43784] -> #0 (&(&zone->lock)->rlock){-.-.}:
> [ 1078.704604][T43784]        check_prev_add+0x120/0x1138
> [ 1078.709733][T43784]        validate_chain+0x888/0x1270
> [ 1078.714863][T43784]        __lock_acquire+0x5c8/0xbb0
> [ 1078.719906][T43784]        lock_acquire+0x154/0x428
> [ 1078.724776][T43784]        _raw_spin_lock+0x68/0x88
> [ 1078.729645][T43784]        rmqueue_bulk.constprop.21+0xb0/0x1218
> [ 1078.735643][T43784]        get_page_from_freelist+0x898/0x24a0
> [ 1078.741467][T43784]        __alloc_pages_nodemask+0x2a8/0x1d08
> [ 1078.747291][T43784]        alloc_pages_current+0xb4/0x150
> [ 1078.752682][T43784]        allocate_slab+0xab8/0x2350
> [ 1078.757725][T43784]        new_slab+0x98/0xc0
> [ 1078.762073][T43784]        ___slab_alloc+0x66c/0xa30
> [ 1078.767029][T43784]        __slab_alloc+0x68/0xc8
> [ 1078.771725][T43784]        __kmalloc+0x3d4/0x658
> [ 1078.776333][T43784]        __tty_buffer_request_room+0xd4/0x220
> [ 1078.782244][T43784]        tty_insert_flip_string_fixed_flag+0x6c/0x128
> [ 1078.788849][T43784]        pty_write+0x98/0x100
> [ 1078.793370][T43784]        n_tty_write+0x2a0/0x668
> [ 1078.798152][T43784]        tty_write+0x2e8/0x430
> [ 1078.802760][T43784]        __vfs_write+0x5c/0xb0
> [ 1078.807368][T43784]        vfs_write+0xf0/0x230
> [ 1078.811890][T43784]        ksys_write+0xd4/0x180
> [ 1078.816498][T43784]        __arm64_sys_write+0x4c/0x60
> [ 1078.821627][T43784]        el0_svc_handler+0x11c/0x1f0
> [ 1078.826756][T43784]        el0_svc+0x8/0xc

tty_port->lock --> zone->lock

> [ 1078.830842][T43784] other info that might help us debug this:
> [ 1078.830842][T43784] 
> [ 1078.840918][T43784] Chain exists of:
> [ 1078.840918][T43784]   &(&zone->lock)->rlock --> &port_lock_key --> &(&port->lock)->rlock
> [ 1078.840918][T43784] 
> [ 1078.854731][T43784]  Possible unsafe locking scenario:
> [ 1078.854731][T43784] 
> [ 1078.862029][T43784]        CPU0                    CPU1
> [ 1078.867243][T43784]        ----                    ----
> [ 1078.872457][T43784]   lock(&(&port->lock)->rlock);
> [ 1078.877238][T43784]                                lock(&port_lock_key);
> [ 1078.883929][T43784]                                lock(&(&port->lock)->rlock);
> [ 1078.891228][T43784]   lock(&(&zone->lock)->rlock);
> [ 1078.896010][T43784] 
> [ 1078.896010][T43784]  *** DEADLOCK ***
[..]
> [ 1078.980932][T43784]  dump_backtrace+0x0/0x228
> [ 1078.985279][T43784]  show_stack+0x24/0x30
> [ 1078.989282][T43784]  dump_stack+0xe8/0x13c
> [ 1078.993370][T43784]  print_circular_bug+0x334/0x3d8
> [ 1078.998240][T43784]  check_noncircular+0x268/0x310
> [ 1079.003022][T43784]  check_prev_add+0x120/0x1138
> [ 1079.007631][T43784]  validate_chain+0x888/0x1270
> [ 1079.012241][T43784]  __lock_acquire+0x5c8/0xbb0
> [ 1079.016763][T43784]  lock_acquire+0x154/0x428
> [ 1079.021111][T43784]  _raw_spin_lock+0x68/0x88
> [ 1079.025460][T43784]  rmqueue_bulk.constprop.21+0xb0/0x1218
> [ 1079.030937][T43784]  get_page_from_freelist+0x898/0x24a0
> [ 1079.036240][T43784]  __alloc_pages_nodemask+0x2a8/0x1d08
> [ 1079.041542][T43784]  alloc_pages_current+0xb4/0x150
> [ 1079.046412][T43784]  allocate_slab+0xab8/0x2350
> [ 1079.050934][T43784]  new_slab+0x98/0xc0
> [ 1079.054761][T43784]  ___slab_alloc+0x66c/0xa30
> [ 1079.059196][T43784]  __slab_alloc+0x68/0xc8
> [ 1079.063371][T43784]  __kmalloc+0x3d4/0x658
> [ 1079.067458][T43784]  __tty_buffer_request_room+0xd4/0x220
> [ 1079.072847][T43784]  tty_insert_flip_string_fixed_flag+0x6c/0x128
> [ 1079.078932][T43784]  pty_write+0x98/0x100
> [ 1079.082932][T43784]  n_tty_write+0x2a0/0x668
> [ 1079.087193][T43784]  tty_write+0x2e8/0x430
> [ 1079.091280][T43784]  __vfs_write+0x5c/0xb0
> [ 1079.095367][T43784]  vfs_write+0xf0/0x230
> [ 1079.099368][T43784]  ksys_write+0xd4/0x180
> [ 1079.103455][T43784]  __arm64_sys_write+0x4c/0x60
> [ 1079.108064][T43784]  el0_svc_handler+0x11c/0x1f0
> [ 1079.112672][T43784]  el0_svc+0x8/0xc

tty_port->lock --> zone->lock

For instance, I don't really like the re-entrant tty path, at least
not under uart_port->lock; removing that re-entrancy could be one
solution.

Another, quick-and-dirty, option (and many people will blame me for
this) would be to break the zone->{printk}->uart chain...

Something like this

---

 drivers/char/random.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 9b54cdb301d3..975015857200 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1687,8 +1687,9 @@ static void _warn_unseeded_randomness(const char *func_name, void *caller,
 	print_once = true;
 #endif
 	if (__ratelimit(&unseeded_warning))
-		pr_notice("random: %s called from %pS with crng_init=%d\n",
-			  func_name, caller, crng_init);
+		printk_deferred(KERN_NOTICE "random: %s called from %pS "
+				"with crng_init=%d\n", func_name, caller,
+				crng_init);
 }
 
 /*
@@ -2462,4 +2463,4 @@ void add_bootloader_randomness(const void *buf, unsigned int size)
 	else
 		add_device_randomness(buf, size);
 }
-EXPORT_SYMBOL_GPL(add_bootloader_randomness);
\ No newline at end of file
+EXPORT_SYMBOL_GPL(add_bootloader_randomness);


^ permalink raw reply	[flat|nested] 21+ messages in thread

* CONFIG_SHUFFLE_PAGE_ALLOCATOR=y lockdep splat (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)
  2019-09-10 19:49     ` Qian Cai
@ 2019-09-12  2:28       ` Qian Cai
  0 siblings, 0 replies; 21+ messages in thread
From: Qian Cai @ 2019-09-12  2:28 UTC (permalink / raw)
  To: Will Deacon, Theodore Ts'o, oleg, gkohli
  Cc: Dan Williams, Linux Memory Management List, LKML,
	linux-arm-kernel, Peter Zijlstra, Waiman Long, Thomas Gleixner

Adjusted Cc a bit as this looks more like scheduler territory.

> On Sep 10, 2019, at 3:49 PM, Qian Cai <cai@lca.pw> wrote:
> 
> Hmm, it feels like CONFIG_SHUFFLE_PAGE_ALLOCATOR=y introduces some unique
> locking patterns that lockdep does not like via,
> 
> allocate_slab
>  shuffle_freelist
>    get_random_u32
> 
> Here is another splat with while compiling/installing a kernel,
> 
> [ 1254.443119][    C2] WARNING: possible circular locking dependency detected
> [ 1254.450038][    C2] 5.3.0-rc5-next-20190822 #1 Not tainted
> [ 1254.455559][    C2] ------------------------------------------------------
> [ 1254.462988][    C2] swapper/2/0 is trying to acquire lock:
> [ 1254.468509][    C2] ffffffffa2925218 (random_write_wait.lock){..-.}, at: __wake_up_common_lock+0xc6/0x150
> [ 1254.478154][    C2] 
> [ 1254.478154][    C2] but task is already holding lock:
> [ 1254.485896][    C2] ffff88845373fda0 (batched_entropy_u32.lock){-.-.}, at: get_random_u32+0x4c/0xe0
> [ 1254.495007][    C2] 
> [ 1254.495007][    C2] which lock already depends on the new lock.
> [ 1254.495007][    C2] 
> [ 1254.505331][    C2] 
> [ 1254.505331][    C2] the existing dependency chain (in reverse order) is:
> [ 1254.514755][    C2] 
> [ 1254.514755][    C2] -> #3 (batched_entropy_u32.lock){-.-.}:
> [ 1254.522553][    C2]        __lock_acquire+0x5b3/0xb40
> [ 1254.527638][    C2]        lock_acquire+0x126/0x280
> [ 1254.533016][    C2]        _raw_spin_lock_irqsave+0x3a/0x50
> [ 1254.538624][    C2]        get_random_u32+0x4c/0xe0
> [ 1254.543539][    C2]        allocate_slab+0x6d6/0x19c0
> [ 1254.548625][    C2]        new_slab+0x46/0x70
> [ 1254.553010][    C2]        ___slab_alloc+0x58b/0x960
> [ 1254.558533][    C2]        __slab_alloc+0x43/0x70
> [ 1254.563269][    C2]        kmem_cache_alloc+0x354/0x460
> [ 1254.568534][    C2]        fill_pool+0x272/0x4b0
> [ 1254.573182][    C2]        __debug_object_init+0x86/0x7a0
> [ 1254.578615][    C2]        debug_object_init+0x16/0x20
> [ 1254.584256][    C2]        hrtimer_init+0x27/0x1e0
> [ 1254.589079][    C2]        init_dl_task_timer+0x20/0x40
> [ 1254.594342][    C2]        __sched_fork+0x10b/0x1f0
> [ 1254.599253][    C2]        init_idle+0xac/0x520
> [ 1254.603816][    C2]        fork_idle+0x18c/0x230
> [ 1254.608933][    C2]        idle_threads_init+0xf0/0x187
> [ 1254.614193][    C2]        smp_init+0x1d/0x12d
> [ 1254.618671][    C2]        kernel_init_freeable+0x37e/0x76e
> [ 1254.624282][    C2]        kernel_init+0x11/0x12f
> [ 1254.629016][    C2]        ret_from_fork+0x27/0x50
> [ 1254.634344][    C2] 
> [ 1254.634344][    C2] -> #2 (&rq->lock){-.-.}:
> [ 1254.640831][    C2]        __lock_acquire+0x5b3/0xb40
> [ 1254.645917][    C2]        lock_acquire+0x126/0x280
> [ 1254.650827][    C2]        _raw_spin_lock+0x2f/0x40
> [ 1254.655741][    C2]        task_fork_fair+0x43/0x200
> [ 1254.661213][    C2]        sched_fork+0x29b/0x420
> [ 1254.665949][    C2]        copy_process+0xf12/0x3180
> [ 1254.670947][    C2]        _do_fork+0xef/0x950
> [ 1254.675422][    C2]        kernel_thread+0xa8/0xe0
> [ 1254.680244][    C2]        rest_init+0x28/0x311
> [ 1254.685298][    C2]        arch_call_rest_init+0xe/0x1b
> [ 1254.690558][    C2]        start_kernel+0x6eb/0x724
> [ 1254.695469][    C2]        x86_64_start_reservations+0x24/0x26
> [ 1254.701339][    C2]        x86_64_start_kernel+0xf4/0xfb
> [ 1254.706689][    C2]        secondary_startup_64+0xb6/0xc0
> [ 1254.712601][    C2] 
> [ 1254.712601][    C2] -> #1 (&p->pi_lock){-.-.}:
> [ 1254.719263][    C2]        __lock_acquire+0x5b3/0xb40
> [ 1254.724349][    C2]        lock_acquire+0x126/0x280
> [ 1254.729260][    C2]        _raw_spin_lock_irqsave+0x3a/0x50
> [ 1254.735317][    C2]        try_to_wake_up+0xad/0x1050
> [ 1254.740403][    C2]        default_wake_function+0x2f/0x40
> [ 1254.745929][    C2]        pollwake+0x10d/0x160
> [ 1254.750491][    C2]        __wake_up_common+0xc4/0x2a0
> [ 1254.755663][    C2]        __wake_up_common_lock+0xea/0x150
> [ 1254.761756][    C2]        __wake_up+0x13/0x20
> [ 1254.766230][    C2]        account.constprop.9+0x217/0x340
> [ 1254.771754][    C2]        extract_entropy.constprop.7+0xcf/0x220
> [ 1254.777886][    C2]        _xfer_secondary_pool+0x19a/0x3d0
> [ 1254.783981][    C2]        push_to_pool+0x3e/0x230
> [ 1254.788805][    C2]        process_one_work+0x52a/0xb40
> [ 1254.794064][    C2]        worker_thread+0x63/0x5b0
> [ 1254.798977][    C2]        kthread+0x1df/0x200
> [ 1254.803451][    C2]        ret_from_fork+0x27/0x50
> [ 1254.808787][    C2] 
> [ 1254.808787][    C2] -> #0 (random_write_wait.lock){..-.}:
> [ 1254.816409][    C2]        check_prev_add+0x107/0xea0
> [ 1254.821494][    C2]        validate_chain+0x8fc/0x1200
> [ 1254.826667][    C2]        __lock_acquire+0x5b3/0xb40
> [ 1254.831751][    C2]        lock_acquire+0x126/0x280
> [ 1254.837189][    C2]        _raw_spin_lock_irqsave+0x3a/0x50
> [ 1254.842797][    C2]        __wake_up_common_lock+0xc6/0x150
> [ 1254.848408][    C2]        __wake_up+0x13/0x20
> [ 1254.852882][    C2]        account.constprop.9+0x217/0x340
> [ 1254.858988][    C2]        extract_entropy.constprop.7+0xcf/0x220
> [ 1254.865122][    C2]        crng_reseed+0xa1/0x3f0
> [ 1254.869859][    C2]        _extract_crng+0xc3/0xd0
> [ 1254.874682][    C2]        crng_reseed+0x21b/0x3f0
> [ 1254.879505][    C2]        _extract_crng+0xc3/0xd0
> [ 1254.884772][    C2]        extract_crng+0x40/0x60
> [ 1254.889507][    C2]        get_random_u32+0xb4/0xe0
> [ 1254.894417][    C2]        allocate_slab+0x6d6/0x19c0
> [ 1254.899501][    C2]        new_slab+0x46/0x70
> [ 1254.903886][    C2]        ___slab_alloc+0x58b/0x960
> [ 1254.909377][    C2]        __slab_alloc+0x43/0x70
> [ 1254.914112][    C2]        kmem_cache_alloc+0x354/0x460
> [ 1254.919375][    C2]        __build_skb+0x23/0x60
> [ 1254.924024][    C2]        __netdev_alloc_skb+0x127/0x1e0
> [ 1254.929470][    C2]        tg3_poll_work+0x11b2/0x1f70 [tg3]
> [ 1254.935654][    C2]        tg3_poll_msix+0x67/0x330 [tg3]
> [ 1254.941092][    C2]        net_rx_action+0x24e/0x7e0
> [ 1254.946089][    C2]        __do_softirq+0x123/0x767
> [ 1254.951000][    C2]        irq_exit+0xd6/0xf0
> [ 1254.955385][    C2]        do_IRQ+0xe2/0x1a0
> [ 1254.960155][    C2]        ret_from_intr+0x0/0x2a
> [ 1254.964896][    C2]        cpuidle_enter_state+0x156/0x8e0
> [ 1254.970418][    C2]        cpuidle_enter+0x41/0x70
> [ 1254.975242][    C2]        call_cpuidle+0x5e/0x90
> [ 1254.979975][    C2]        do_idle+0x333/0x370
> [ 1254.984972][    C2]        cpu_startup_entry+0x1d/0x1f
> [ 1254.990148][    C2]        start_secondary+0x290/0x330
> [ 1254.995319][    C2]        secondary_startup_64+0xb6/0xc0
> [ 1255.000750][    C2] 
> [ 1255.000750][    C2] other info that might help us debug this:
> [ 1255.000750][    C2] 
> [ 1255.011424][    C2] Chain exists of:
> [ 1255.011424][    C2]   random_write_wait.lock --> &rq->lock --> batched_entropy_u32.lock
> [ 1255.011424][    C2] 
> [ 1255.025245][    C2]  Possible unsafe locking scenario:
> [ 1255.025245][    C2] 
> [ 1255.033012][    C2]        CPU0                    CPU1
> [ 1255.038270][    C2]        ----                    ----
> [ 1255.043526][    C2]   lock(batched_entropy_u32.lock);
> [ 1255.048610][    C2]                                lock(&rq->lock);
> [ 1255.054918][    C2]                                lock(batched_entropy_u32.lock);
> [ 1255.063035][    C2]   lock(random_write_wait.lock);
> [ 1255.067945][    C2] 
> [ 1255.067945][    C2]  *** DEADLOCK ***
> [ 1255.067945][    C2] 
> [ 1255.076000][    C2] 1 lock held by swapper/2/0:
> [ 1255.080558][    C2]  #0: ffff88845373fda0 (batched_entropy_u32.lock){-.-.}, at: get_random_u32+0x4c/0xe0
> [ 1255.090547][    C2] 
> [ 1255.090547][    C2] stack backtrace:
> [ 1255.096333][    C2] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.3.0-rc5-next-20190822 #1
> [ 1255.104473][    C2] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 03/09/2018
> [ 1255.114276][    C2] Call Trace:
> [ 1255.117439][    C2]  <IRQ>
> [ 1255.120169][    C2]  dump_stack+0x86/0xca
> [ 1255.124205][    C2]  print_circular_bug.cold.32+0x243/0x26e
> [ 1255.129816][    C2]  check_noncircular+0x29e/0x2e0
> [ 1255.135221][    C2]  ? __bfs+0x238/0x380
> [ 1255.139172][    C2]  ? print_circular_bug+0x120/0x120
> [ 1255.144259][    C2]  ? find_usage_forwards+0x7d/0xb0
> [ 1255.149260][    C2]  check_prev_add+0x107/0xea0
> [ 1255.153823][    C2]  validate_chain+0x8fc/0x1200
> [ 1255.159007][    C2]  ? check_prev_add+0xea0/0xea0
> [ 1255.163743][    C2]  ? check_usage_backwards+0x210/0x210
> [ 1255.169091][    C2]  __lock_acquire+0x5b3/0xb40
> [ 1255.173655][    C2]  lock_acquire+0x126/0x280
> [ 1255.178041][    C2]  ? __wake_up_common_lock+0xc6/0x150
> [ 1255.183732][    C2]  _raw_spin_lock_irqsave+0x3a/0x50
> [ 1255.188817][    C2]  ? __wake_up_common_lock+0xc6/0x150
> [ 1255.194076][    C2]  __wake_up_common_lock+0xc6/0x150
> [ 1255.199163][    C2]  ? __wake_up_common+0x2a0/0x2a0
> [ 1255.204078][    C2]  ? rcu_read_lock_any_held.part.5+0x20/0x20
> [ 1255.210428][    C2]  __wake_up+0x13/0x20
> [ 1255.214379][    C2]  account.constprop.9+0x217/0x340
> [ 1255.219377][    C2]  extract_entropy.constprop.7+0xcf/0x220
> [ 1255.224987][    C2]  ? crng_reseed+0xa1/0x3f0
> [ 1255.229375][    C2]  crng_reseed+0xa1/0x3f0
> [ 1255.234122][    C2]  ? rcu_read_lock_sched_held+0xac/0xe0
> [ 1255.239556][    C2]  ? check_flags.part.16+0x86/0x220
> [ 1255.244641][    C2]  ? extract_entropy.constprop.7+0x220/0x220
> [ 1255.250511][    C2]  ? __kasan_check_read+0x11/0x20
> [ 1255.255422][    C2]  ? validate_chain+0xab/0x1200
> [ 1255.260742][    C2]  ? rcu_read_lock_any_held.part.5+0x20/0x20
> [ 1255.266616][    C2]  _extract_crng+0xc3/0xd0
> [ 1255.270915][    C2]  crng_reseed+0x21b/0x3f0
> [ 1255.275215][    C2]  ? extract_entropy.constprop.7+0x220/0x220
> [ 1255.281085][    C2]  ? __kasan_check_write+0x14/0x20
> [ 1255.286517][    C2]  ? do_raw_spin_lock+0x118/0x1d0
> [ 1255.291428][    C2]  ? rwlock_bug.part.0+0x60/0x60
> [ 1255.296251][    C2]  _extract_crng+0xc3/0xd0
> [ 1255.300550][    C2]  extract_crng+0x40/0x60
> [ 1255.304763][    C2]  get_random_u32+0xb4/0xe0
> [ 1255.309640][    C2]  allocate_slab+0x6d6/0x19c0
> [ 1255.314203][    C2]  new_slab+0x46/0x70
> [ 1255.318066][    C2]  ___slab_alloc+0x58b/0x960
> [ 1255.322539][    C2]  ? __build_skb+0x23/0x60
> [ 1255.326841][    C2]  ? fault_create_debugfs_attr+0x140/0x140
> [ 1255.333048][    C2]  ? __build_skb+0x23/0x60
> [ 1255.337348][    C2]  __slab_alloc+0x43/0x70
> [ 1255.341559][    C2]  ? __slab_alloc+0x43/0x70
> [ 1255.345944][    C2]  ? __build_skb+0x23/0x60
> [ 1255.350242][    C2]  kmem_cache_alloc+0x354/0x460
> [ 1255.354978][    C2]  ? __netdev_alloc_skb+0x1c6/0x1e0
> [ 1255.360626][    C2]  ? trace_hardirqs_on+0x3a/0x160
> [ 1255.365535][    C2]  __build_skb+0x23/0x60
> [ 1255.369660][    C2]  __netdev_alloc_skb+0x127/0x1e0
> [ 1255.374576][    C2]  tg3_poll_work+0x11b2/0x1f70 [tg3]
> [ 1255.379750][    C2]  ? find_held_lock+0x11b/0x150
> [ 1255.385027][    C2]  ? tg3_tx_recover+0xa0/0xa0 [tg3]
> [ 1255.390114][    C2]  ? _raw_spin_unlock_irqrestore+0x38/0x50
> [ 1255.395809][    C2]  ? __kasan_check_read+0x11/0x20
> [ 1255.400718][    C2]  ? validate_chain+0xab/0x1200
> [ 1255.405455][    C2]  ? __wake_up_common+0x2a0/0x2a0
> [ 1255.410761][    C2]  ? mark_held_locks+0x34/0xb0
> [ 1255.415415][    C2]  tg3_poll_msix+0x67/0x330 [tg3]
> [ 1255.420327][    C2]  net_rx_action+0x24e/0x7e0
> [ 1255.424800][    C2]  ? find_held_lock+0x11b/0x150
> [ 1255.429536][    C2]  ? napi_busy_loop+0x600/0x600
> [ 1255.434733][    C2]  ? rcu_read_lock_sched_held+0xac/0xe0
> [ 1255.440169][    C2]  ? __do_softirq+0xed/0x767
> [ 1255.444642][    C2]  ? rcu_read_lock_any_held.part.5+0x20/0x20
> [ 1255.450518][    C2]  ? lockdep_hardirqs_on+0x1b0/0x2a0
> [ 1255.455693][    C2]  ? irq_exit+0xd6/0xf0
> [ 1255.460280][    C2]  __do_softirq+0x123/0x767
> [ 1255.464668][    C2]  irq_exit+0xd6/0xf0
> [ 1255.468532][    C2]  do_IRQ+0xe2/0x1a0
> [ 1255.472308][    C2]  common_interrupt+0xf/0xf
> [ 1255.476694][    C2]  </IRQ>
> [ 1255.479509][    C2] RIP: 0010:cpuidle_enter_state+0x156/0x8e0
> [ 1255.485750][    C2] Code: bf ff 8b 05 a4 27 2d 01 85 c0 0f 8f 1d 04 00 00 31 ff e8 4d ba 92 ff 80 7d d0 00 0f 85 0b 02 00 00 e8 ae c0 a7 ff fb 45 85 ed <0f> 88 2d 02 00 00 4d 63 fd 49 83 ff 09 0f 87 91 06 00 00 4b 8d 04
> [ 1255.505335][    C2] RSP: 0018:ffff888206637cf8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffc8
> [ 1255.514154][    C2] RAX: 0000000000000000 RBX: ffff889f98b44008 RCX: ffffffffa116e980
> [ 1255.522033][    C2] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8882066287ec
> [ 1255.529913][    C2] RBP: ffff888206637d48 R08: fffffbfff4557aee R09: 0000000000000000
> [ 1255.538278][    C2] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa28e8ac0
> [ 1255.546158][    C2] R13: 0000000000000002 R14: 0000012412160253 R15: ffff889f98b4400c
> [ 1255.554040][    C2]  ? lockdep_hardirqs_on+0x1b0/0x2a0
> [ 1255.559725][    C2]  ? cpuidle_enter_state+0x152/0x8e0
> [ 1255.564898][    C2]  cpuidle_enter+0x41/0x70
> [ 1255.569196][    C2]  call_cpuidle+0x5e/0x90
> [ 1255.573408][    C2]  do_idle+0x333/0x370
> [ 1255.577358][    C2]  ? complete+0x51/0x60
> [ 1255.581394][    C2]  ? arch_cpu_idle_exit+0x40/0x40
> [ 1255.586777][    C2]  ? complete+0x51/0x60
> [ 1255.590814][    C2]  cpu_startup_entry+0x1d/0x1f
> [ 1255.595461][    C2]  start_secondary+0x290/0x330
> [ 1255.600111][    C2]  ? set_cpu_sibling_map+0x18f0/0x18f0
> [ 1255.605460][    C2]  secondary_startup_64+0xb6/0xc0

This looks like a false positive. shuffle_freelist() introduced the chain,

batched_entropy_u32.lock -> random_write_wait.lock

but I can't see how it is possible to get the reversed chain. Imagine something like this,

 __wake_up_common_lock  <— acquired random_write_wait.lock
      __sched_fork <— is that even possible?
           init_dl_task_timer
              debug_object_init
                   allocate_slab
                      shuffle_freelist
                          get_random_u32 <— acquired batched_entropy_u32.lock

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang
  2019-09-11  1:10     ` Sergey Senozhatsky
@ 2019-09-12 12:05       ` Qian Cai
  2019-09-16 14:42         ` Steven Rostedt
  0 siblings, 1 reply; 21+ messages in thread
From: Qian Cai @ 2019-09-12 12:05 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Steven Rostedt, Catalin Marinas, Will Deacon,
	Dan Williams, linux-mm, linux-kernel, linux-arm-kernel,
	Peter Zijlstra, Waiman Long, Thomas Gleixner, Theodore Ts'o,
	Arnd Bergmann, Greg Kroah-Hartman

On Wed, 2019-09-11 at 10:10 +0900, Sergey Senozhatsky wrote:
> Cc-ing Ted, Arnd, Greg
> 
> On (09/10/19 11:22), Qian Cai wrote:
> > [ 1078.283869][T43784] -> #3 (&(&port->lock)->rlock){-.-.}:
> > [ 1078.291350][T43784]        __lock_acquire+0x5c8/0xbb0
> > [ 1078.296394][T43784]        lock_acquire+0x154/0x428
> > [ 1078.301266][T43784]        _raw_spin_lock_irqsave+0x80/0xa0
> > [ 1078.306831][T43784]        tty_port_tty_get+0x28/0x68
> > [ 1078.311873][T43784]        tty_port_default_wakeup+0x20/0x40
> > [ 1078.317523][T43784]        tty_port_tty_wakeup+0x38/0x48
> > [ 1078.322827][T43784]        uart_write_wakeup+0x2c/0x50
> > [ 1078.327956][T43784]        pl011_tx_chars+0x240/0x260
> > [ 1078.332999][T43784]        pl011_start_tx+0x24/0xa8
> > [ 1078.337868][T43784]        __uart_start+0x90/0xa0
> > [ 1078.342563][T43784]        uart_write+0x15c/0x2c8
> > [ 1078.347261][T43784]        do_output_char+0x1c8/0x2b0
> > [ 1078.352304][T43784]        n_tty_write+0x300/0x668
> > [ 1078.357087][T43784]        tty_write+0x2e8/0x430
> > [ 1078.361696][T43784]        redirected_tty_write+0xcc/0xe8
> > [ 1078.367086][T43784]        do_iter_write+0x228/0x270
> > [ 1078.372041][T43784]        vfs_writev+0x10c/0x1c8
> > [ 1078.376735][T43784]        do_writev+0xdc/0x180
> > [ 1078.381257][T43784]        __arm64_sys_writev+0x50/0x60
> > [ 1078.386476][T43784]        el0_svc_handler+0x11c/0x1f0
> > [ 1078.391606][T43784]        el0_svc+0x8/0xc
> > [ 1078.395691][T43784] 
> 
> uart_port->lock  ->  tty_port->lock
> 
> This thing alone is already a bit suspicious. We re-enter tty
> here: tty -> uart -> serial -> tty
> 
> And we re-enter tty under uart_port->lock.
> 
> > [ 1078.395691][T43784] -> #2 (&port_lock_key){-.-.}:
> > [ 1078.402561][T43784]        __lock_acquire+0x5c8/0xbb0
> > [ 1078.407604][T43784]        lock_acquire+0x154/0x428
> > [ 1078.412474][T43784]        _raw_spin_lock+0x68/0x88
> > [ 1078.417343][T43784]        pl011_console_write+0x2ac/0x318
> > [ 1078.422820][T43784]        console_unlock+0x3c4/0x898
> > [ 1078.427863][T43784]        vprintk_emit+0x2d4/0x460
> > [ 1078.432732][T43784]        vprintk_default+0x48/0x58
> > [ 1078.437688][T43784]        vprintk_func+0x194/0x250
> > [ 1078.442557][T43784]        printk+0xbc/0xec
> > [ 1078.446732][T43784]        register_console+0x4a8/0x580
> > [ 1078.451947][T43784]        uart_add_one_port+0x748/0x878
> > [ 1078.457250][T43784]        pl011_register_port+0x98/0x128
> > [ 1078.462639][T43784]        sbsa_uart_probe+0x398/0x480
> > [ 1078.467772][T43784]        platform_drv_probe+0x70/0x108
> > [ 1078.473075][T43784]        really_probe+0x15c/0x5d8
> > [ 1078.477944][T43784]        driver_probe_device+0x94/0x1d0
> > [ 1078.483335][T43784]        __device_attach_driver+0x11c/0x1a8
> > [ 1078.489072][T43784]        bus_for_each_drv+0xf8/0x158
> > [ 1078.494201][T43784]        __device_attach+0x164/0x240
> > [ 1078.499331][T43784]        device_initial_probe+0x24/0x30
> > [ 1078.504721][T43784]        bus_probe_device+0xf0/0x100
> > [ 1078.509850][T43784]        device_add+0x63c/0x960
> > [ 1078.514546][T43784]        platform_device_add+0x1ac/0x3b8
> > [ 1078.520023][T43784]        platform_device_register_full+0x1fc/0x290
> > [ 1078.526373][T43784]        acpi_create_platform_device.part.0+0x264/0x3a8
> > [ 1078.533152][T43784]        acpi_create_platform_device+0x68/0x80
> > [ 1078.539150][T43784]        acpi_default_enumeration+0x34/0x78
> > [ 1078.544887][T43784]        acpi_bus_attach+0x340/0x3b8
> > [ 1078.550015][T43784]        acpi_bus_attach+0xf8/0x3b8
> > [ 1078.555057][T43784]        acpi_bus_attach+0xf8/0x3b8
> > [ 1078.560099][T43784]        acpi_bus_attach+0xf8/0x3b8
> > [ 1078.565142][T43784]        acpi_bus_scan+0x9c/0x100
> > [ 1078.570015][T43784]        acpi_scan_init+0x16c/0x320
> > [ 1078.575058][T43784]        acpi_init+0x330/0x3b8
> > [ 1078.579666][T43784]        do_one_initcall+0x158/0x7ec
> > [ 1078.584797][T43784]        kernel_init_freeable+0x9a8/0xa70
> > [ 1078.590360][T43784]        kernel_init+0x18/0x138
> > [ 1078.595055][T43784]        ret_from_fork+0x10/0x1c
> > 
> > [ 1078.599835][T43784] -> #1 (console_owner){-...}:
> > [ 1078.606618][T43784]        __lock_acquire+0x5c8/0xbb0
> > [ 1078.611661][T43784]        lock_acquire+0x154/0x428
> > [ 1078.616530][T43784]        console_unlock+0x298/0x898
> > [ 1078.621573][T43784]        vprintk_emit+0x2d4/0x460
> > [ 1078.626442][T43784]        vprintk_default+0x48/0x58
> > [ 1078.631398][T43784]        vprintk_func+0x194/0x250
> > [ 1078.636267][T43784]        printk+0xbc/0xec
> > [ 1078.640443][T43784]        _warn_unseeded_randomness+0xb4/0xd0
> > [ 1078.646267][T43784]        get_random_u64+0x4c/0x100
> > [ 1078.651224][T43784]        add_to_free_area_random+0x168/0x1a0
> > [ 1078.657047][T43784]        free_one_page+0x3dc/0xd08
> > [ 1078.662003][T43784]        __free_pages_ok+0x490/0xd00
> > [ 1078.667132][T43784]        __free_pages+0xc4/0x118
> > [ 1078.671914][T43784]        __free_pages_core+0x2e8/0x428
> > [ 1078.677219][T43784]        memblock_free_pages+0xa4/0xec
> > [ 1078.682522][T43784]        memblock_free_all+0x264/0x330
> > [ 1078.687825][T43784]        mem_init+0x90/0x148
> > [ 1078.692259][T43784]        start_kernel+0x368/0x684
> 
> zone->lock --> uart_port->lock
> 
> Some debugging options/warnings/error printouts/etc. introduce
> deadlock patterns.
> 
> This adds zone->lock --> uart_port->lock, which then brings in
> uart_port->lock --> tty_port->lock, which in turn brings
> tty_port->lock --> zone->lock.
> 
> > [ 1078.697126][T43784] -> #0 (&(&zone->lock)->rlock){-.-.}:
> > [ 1078.704604][T43784]        check_prev_add+0x120/0x1138
> > [ 1078.709733][T43784]        validate_chain+0x888/0x1270
> > [ 1078.714863][T43784]        __lock_acquire+0x5c8/0xbb0
> > [ 1078.719906][T43784]        lock_acquire+0x154/0x428
> > [ 1078.724776][T43784]        _raw_spin_lock+0x68/0x88
> > [ 1078.729645][T43784]        rmqueue_bulk.constprop.21+0xb0/0x1218
> > [ 1078.735643][T43784]        get_page_from_freelist+0x898/0x24a0
> > [ 1078.741467][T43784]        __alloc_pages_nodemask+0x2a8/0x1d08
> > [ 1078.747291][T43784]        alloc_pages_current+0xb4/0x150
> > [ 1078.752682][T43784]        allocate_slab+0xab8/0x2350
> > [ 1078.757725][T43784]        new_slab+0x98/0xc0
> > [ 1078.762073][T43784]        ___slab_alloc+0x66c/0xa30
> > [ 1078.767029][T43784]        __slab_alloc+0x68/0xc8
> > [ 1078.771725][T43784]        __kmalloc+0x3d4/0x658
> > [ 1078.776333][T43784]        __tty_buffer_request_room+0xd4/0x220
> > [ 1078.782244][T43784]        tty_insert_flip_string_fixed_flag+0x6c/0x128
> > [ 1078.788849][T43784]        pty_write+0x98/0x100
> > [ 1078.793370][T43784]        n_tty_write+0x2a0/0x668
> > [ 1078.798152][T43784]        tty_write+0x2e8/0x430
> > [ 1078.802760][T43784]        __vfs_write+0x5c/0xb0
> > [ 1078.807368][T43784]        vfs_write+0xf0/0x230
> > [ 1078.811890][T43784]        ksys_write+0xd4/0x180
> > [ 1078.816498][T43784]        __arm64_sys_write+0x4c/0x60
> > [ 1078.821627][T43784]        el0_svc_handler+0x11c/0x1f0
> > [ 1078.826756][T43784]        el0_svc+0x8/0xc
> 
> tty_port->lock --> zone->lock
> 
> > [ 1078.830842][T43784] other info that might help us debug this:
> > [ 1078.830842][T43784] 
> > [ 1078.840918][T43784] Chain exists of:
> > [ 1078.840918][T43784]   &(&zone->lock)->rlock --> &port_lock_key --> &(&port->lock)->rlock
> > [ 1078.840918][T43784] 
> > [ 1078.854731][T43784]  Possible unsafe locking scenario:
> > [ 1078.854731][T43784] 
> > [ 1078.862029][T43784]        CPU0                    CPU1
> > [ 1078.867243][T43784]        ----                    ----
> > [ 1078.872457][T43784]   lock(&(&port->lock)->rlock);
> > [ 1078.877238][T43784]                                lock(&port_lock_key);
> > [ 1078.883929][T43784]                                lock(&(&port->lock)->rlock);
> > [ 1078.891228][T43784]   lock(&(&zone->lock)->rlock);
> > [ 1078.896010][T43784] 
> > [ 1078.896010][T43784]  *** DEADLOCK ***
> 
> [..]
> > [ 1078.980932][T43784]  dump_backtrace+0x0/0x228
> > [ 1078.985279][T43784]  show_stack+0x24/0x30
> > [ 1078.989282][T43784]  dump_stack+0xe8/0x13c
> > [ 1078.993370][T43784]  print_circular_bug+0x334/0x3d8
> > [ 1078.998240][T43784]  check_noncircular+0x268/0x310
> > [ 1079.003022][T43784]  check_prev_add+0x120/0x1138
> > [ 1079.007631][T43784]  validate_chain+0x888/0x1270
> > [ 1079.012241][T43784]  __lock_acquire+0x5c8/0xbb0
> > [ 1079.016763][T43784]  lock_acquire+0x154/0x428
> > [ 1079.021111][T43784]  _raw_spin_lock+0x68/0x88
> > [ 1079.025460][T43784]  rmqueue_bulk.constprop.21+0xb0/0x1218
> > [ 1079.030937][T43784]  get_page_from_freelist+0x898/0x24a0
> > [ 1079.036240][T43784]  __alloc_pages_nodemask+0x2a8/0x1d08
> > [ 1079.041542][T43784]  alloc_pages_current+0xb4/0x150
> > [ 1079.046412][T43784]  allocate_slab+0xab8/0x2350
> > [ 1079.050934][T43784]  new_slab+0x98/0xc0
> > [ 1079.054761][T43784]  ___slab_alloc+0x66c/0xa30
> > [ 1079.059196][T43784]  __slab_alloc+0x68/0xc8
> > [ 1079.063371][T43784]  __kmalloc+0x3d4/0x658
> > [ 1079.067458][T43784]  __tty_buffer_request_room+0xd4/0x220
> > [ 1079.072847][T43784]  tty_insert_flip_string_fixed_flag+0x6c/0x128
> > [ 1079.078932][T43784]  pty_write+0x98/0x100
> > [ 1079.082932][T43784]  n_tty_write+0x2a0/0x668
> > [ 1079.087193][T43784]  tty_write+0x2e8/0x430
> > [ 1079.091280][T43784]  __vfs_write+0x5c/0xb0
> > [ 1079.095367][T43784]  vfs_write+0xf0/0x230
> > [ 1079.099368][T43784]  ksys_write+0xd4/0x180
> > [ 1079.103455][T43784]  __arm64_sys_write+0x4c/0x60
> > [ 1079.108064][T43784]  el0_svc_handler+0x11c/0x1f0
> > [ 1079.112672][T43784]  el0_svc+0x8/0xc
> 
> tty_port->lock --> zone->lock
> 
> For instance, I don't really like the re-entrant tty path, at least not
> under uart_port->lock. Breaking that re-entrancy could be one of the solutions.
> 
> Another one, a quick and dirty one (and many people will blame
> me for this), would be to break the zone->{printk}->uart chain...
> 
> Something like this
> 
> ---
> 
>  drivers/char/random.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 9b54cdb301d3..975015857200 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -1687,8 +1687,9 @@ static void _warn_unseeded_randomness(const char *func_name, void *caller,
>  	print_once = true;
>  #endif
>  	if (__ratelimit(&unseeded_warning))
> -		pr_notice("random: %s called from %pS with crng_init=%d\n",
> -			  func_name, caller, crng_init);
> +		printk_deferred(KERN_NOTICE "random: %s called from %pS "
> +				"with crng_init=%d\n", func_name, caller,
> +				crng_init);
>  }
>  
>  /*
> @@ -2462,4 +2463,4 @@ void add_bootloader_randomness(const void *buf, unsigned int size)
>  	else
>  		add_device_randomness(buf, size);
>  }
> -EXPORT_SYMBOL_GPL(add_bootloader_randomness);
> \ No newline at end of file
> +EXPORT_SYMBOL_GPL(add_bootloader_randomness);

This will also fix the hang.

Sergey, do you plan to submit this to Ted?
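For what it's worth, the reason printk_deferred() breaks the chain is a general pattern: the call site only queues the message, and the console (with its uart/tty locks) is touched later from a safe context. A minimal sketch of the pattern, in Python rather than kernel C (names illustrative; the kernel flushes via an irq_work, not a function call):

```python
from collections import deque

deferred = deque()  # messages queued instead of printed synchronously

def printk_deferred(msg):
    """Queue the message; no console/uart/tty locks are taken here."""
    deferred.append(msg)

def flush_deferred(write):
    """Flush later, from a context where console locking is safe."""
    while deferred:
        write(deferred.popleft())

out = []
printk_deferred("random: get_random_u64 called with crng_init=0")
flush_deferred(out.append)  # console locking happens only here
print(out)
```

The trade-off is that the message can be lost or delayed if the system dies before the flush runs, which is why it is only a workaround.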



* Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang
  2019-09-12 12:05       ` Qian Cai
@ 2019-09-16 14:42         ` Steven Rostedt
  2019-09-17  0:53           ` Sergey Senozhatsky
  2019-09-18 14:39           ` printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang) Qian Cai
  0 siblings, 2 replies; 21+ messages in thread
From: Steven Rostedt @ 2019-09-16 14:42 UTC (permalink / raw)
  To: Qian Cai
  Cc: Sergey Senozhatsky, Petr Mladek, Catalin Marinas, Will Deacon,
	Dan Williams, linux-mm, linux-kernel, linux-arm-kernel,
	Peter Zijlstra, Waiman Long, Thomas Gleixner, Theodore Ts'o,
	Arnd Bergmann, Greg Kroah-Hartman

On Thu, 12 Sep 2019 08:05:41 -0400
Qian Cai <cai@lca.pw> wrote:

> >  drivers/char/random.c | 7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/char/random.c b/drivers/char/random.c
> > index 9b54cdb301d3..975015857200 100644
> > --- a/drivers/char/random.c
> > +++ b/drivers/char/random.c
> > @@ -1687,8 +1687,9 @@ static void _warn_unseeded_randomness(const char *func_name, void *caller,
> >  	print_once = true;
> >  #endif
> >  	if (__ratelimit(&unseeded_warning))
> > -		pr_notice("random: %s called from %pS with crng_init=%d\n",
> > -			  func_name, caller, crng_init);
> > +		printk_deferred(KERN_NOTICE "random: %s called from %pS "
> > +				"with crng_init=%d\n", func_name, caller,
> > +				crng_init);
> >  }
> >  
> >  /*
> > @@ -2462,4 +2463,4 @@ void add_bootloader_randomness(const void *buf, unsigned int size)
> >  	else
> >  		add_device_randomness(buf, size);
> >  }
> > -EXPORT_SYMBOL_GPL(add_bootloader_randomness);
> > \ No newline at end of file
> > +EXPORT_SYMBOL_GPL(add_bootloader_randomness);  
> 
> This will also fix the hang.
> 
> Sergey, do you plan to submit this to Ted?

Perhaps for a quick fix (and a comment that says this needs to be fixed
properly). I think the changes to printk() that were discussed at
Plumbers may also solve this properly.

-- Steve



* Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang
  2019-09-16 14:42         ` Steven Rostedt
@ 2019-09-17  0:53           ` Sergey Senozhatsky
  2019-09-18 14:39           ` printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang) Qian Cai
  1 sibling, 0 replies; 21+ messages in thread
From: Sergey Senozhatsky @ 2019-09-17  0:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Qian Cai, Sergey Senozhatsky, Petr Mladek, Catalin Marinas,
	Will Deacon, Dan Williams, linux-mm, linux-kernel,
	linux-arm-kernel, Peter Zijlstra, Waiman Long, Thomas Gleixner,
	Theodore Ts'o, Arnd Bergmann, Greg Kroah-Hartman

On (09/16/19 10:42), Steven Rostedt wrote:
[..]
> > 
> > This will also fix the hang.
> > 
> > Sergey, do you plan to submit this to Ted?
> 
> Perhaps for a quick fix (and a comment that says this needs to be fixed
> properly).

I guess it would make sense, since LTS and -stable kernels won't get the new
printk().

	-ss



* printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)
  2019-09-16 14:42         ` Steven Rostedt
  2019-09-17  0:53           ` Sergey Senozhatsky
@ 2019-09-18 14:39           ` Qian Cai
  2019-09-18 15:50             ` Sergey Senozhatsky
  1 sibling, 1 reply; 21+ messages in thread
From: Qian Cai @ 2019-09-18 14:39 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Petr Mladek, Catalin Marinas, Will Deacon,
	Dan Williams, linux-mm, linux-kernel, linux-arm-kernel,
	Peter Zijlstra, Waiman Long, Thomas Gleixner, Theodore Ts'o,
	Arnd Bergmann, Greg Kroah-Hartman

On Mon, 2019-09-16 at 10:42 -0400, Steven Rostedt wrote:
> On Thu, 12 Sep 2019 08:05:41 -0400
> Qian Cai <cai@lca.pw> wrote:
> 
> > >  drivers/char/random.c | 7 ++++---
> > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/char/random.c b/drivers/char/random.c
> > > index 9b54cdb301d3..975015857200 100644
> > > --- a/drivers/char/random.c
> > > +++ b/drivers/char/random.c
> > > @@ -1687,8 +1687,9 @@ static void _warn_unseeded_randomness(const char *func_name, void *caller,
> > >  	print_once = true;
> > >  #endif
> > >  	if (__ratelimit(&unseeded_warning))
> > > -		pr_notice("random: %s called from %pS with crng_init=%d\n",
> > > -			  func_name, caller, crng_init);
> > > +		printk_deferred(KERN_NOTICE "random: %s called from %pS "
> > > +				"with crng_init=%d\n", func_name, caller,
> > > +				crng_init);
> > >  }
> > >  
> > >  /*
> > > @@ -2462,4 +2463,4 @@ void add_bootloader_randomness(const void *buf, unsigned int size)
> > >  	else
> > >  		add_device_randomness(buf, size);
> > >  }
> > > -EXPORT_SYMBOL_GPL(add_bootloader_randomness);
> > > \ No newline at end of file
> > > +EXPORT_SYMBOL_GPL(add_bootloader_randomness);  
> > 
> > This will also fix the hang.
> > 
> > Sergey, do you plan to submit this to Ted?
> 
> Perhaps for a quick fix (and a comment that says this needs to be fixed
> properly). I think the changes to printk() that were discussed at
> Plumbers may also solve this properly.

I assume that the new printk() stuff will also fix this deadlock between
printk() and memory offline.

[  317.337595] WARNING: possible circular locking dependency detected
[  317.337596] 5.3.0-next-20190917+ #9 Not tainted
[  317.337597] ------------------------------------------------------
[  317.337597] test.sh/8738 is trying to acquire lock:
[  317.337598] ffffffffb33a4978 ((console_sem).lock){-.-.}, at:
down_trylock+0x16/0x50

[  317.337602] but task is already holding lock:
[  317.337602] ffff88883fff4318 (&(&zone->lock)->rlock){-.-.}, at:
start_isolate_page_range+0x1f7/0x570

[  317.337606] which lock already depends on the new lock.


[  317.337608] the existing dependency chain (in reverse order) is:

[  317.337609] -> #3 (&(&zone->lock)->rlock){-.-.}:
[  317.337612]        __lock_acquire+0x5b3/0xb40
[  317.337613]        lock_acquire+0x126/0x280
[  317.337613]        _raw_spin_lock+0x2f/0x40
[  317.337614]        rmqueue_bulk.constprop.21+0xb6/0x1160
[  317.337615]        get_page_from_freelist+0x898/0x22c0
[  317.337616]        __alloc_pages_nodemask+0x2f3/0x1cd0
[  317.337617]        alloc_page_interleave+0x18/0x130
[  317.337618]        alloc_pages_current+0xf6/0x110
[  317.337619]        allocate_slab+0x4c6/0x19c0
[  317.337620]        new_slab+0x46/0x70
[  317.337621]        ___slab_alloc+0x58b/0x960
[  317.337621]        __slab_alloc+0x43/0x70
[  317.337622]        kmem_cache_alloc+0x354/0x460
[  317.337623]        fill_pool+0x272/0x4b0
[  317.337624]        __debug_object_init+0x86/0x790
[  317.337624]        debug_object_init+0x16/0x20
[  317.337625]        hrtimer_init+0x27/0x1e0
[  317.337626]        init_dl_task_timer+0x20/0x40
[  317.337627]        __sched_fork+0x10b/0x1f0
[  317.337627]        init_idle+0xac/0x520
[  317.337628]        idle_thread_get+0x7c/0xc0
[  317.337629]        bringup_cpu+0x1a/0x1e0
[  317.337630]        cpuhp_invoke_callback+0x197/0x1120
[  317.337630]        _cpu_up+0x171/0x280
[  317.337631]        do_cpu_up+0xb1/0x120
[  317.337632]        cpu_up+0x13/0x20
[  317.337632]        smp_init+0xa4/0x12d
[  317.337633]        kernel_init_freeable+0x37e/0x76e
[  317.337634]        kernel_init+0x11/0x12f
[  317.337635]        ret_from_fork+0x3a/0x50

[  317.337635] -> #2 (&rq->lock){-.-.}:
[  317.337638]        __lock_acquire+0x5b3/0xb40
[  317.337639]        lock_acquire+0x126/0x280
[  317.337639]        _raw_spin_lock+0x2f/0x40
[  317.337640]        task_fork_fair+0x43/0x200
[  317.337641]        sched_fork+0x29b/0x420
[  317.337642]        copy_process+0xf3c/0x2fd0
[  317.337642]        _do_fork+0xef/0x950
[  317.337643]        kernel_thread+0xa8/0xe0
[  317.337644]        rest_init+0x28/0x311
[  317.337645]        arch_call_rest_init+0xe/0x1b
[  317.337645]        start_kernel+0x6eb/0x724
[  317.337646]        x86_64_start_reservations+0x24/0x26
[  317.337647]        x86_64_start_kernel+0xf4/0xfb
[  317.337648]        secondary_startup_64+0xb6/0xc0

[  317.337649] -> #1 (&p->pi_lock){-.-.}:
[  317.337651]        __lock_acquire+0x5b3/0xb40
[  317.337652]        lock_acquire+0x126/0x280
[  317.337653]        _raw_spin_lock_irqsave+0x3a/0x50
[  317.337653]        try_to_wake_up+0xb4/0x1030
[  317.337654]        wake_up_process+0x15/0x20
[  317.337655]        __up+0xaa/0xc0
[  317.337655]        up+0x55/0x60
[  317.337656]        __up_console_sem+0x37/0x60
[  317.337657]        console_unlock+0x3a0/0x750
[  317.337658]        vprintk_emit+0x10d/0x340
[  317.337658]        vprintk_default+0x1f/0x30
[  317.337659]        vprintk_func+0x44/0xd4
[  317.337660]        printk+0x9f/0xc5
[  317.337660]        crng_reseed+0x3cc/0x440
[  317.337661]        credit_entropy_bits+0x3e8/0x4f0
[  317.337662]        random_ioctl+0x1eb/0x250
[  317.337663]        do_vfs_ioctl+0x13e/0xa70
[  317.337663]        ksys_ioctl+0x41/0x80
[  317.337664]        __x64_sys_ioctl+0x43/0x4c
[  317.337665]        do_syscall_64+0xcc/0x76c
[  317.337666]        entry_SYSCALL_64_after_hwframe+0x49/0xbe

[  317.337667] -> #0 ((console_sem).lock){-.-.}:
[  317.337669]        check_prev_add+0x107/0xea0
[  317.337670]        validate_chain+0x8fc/0x1200
[  317.337671]        __lock_acquire+0x5b3/0xb40
[  317.337671]        lock_acquire+0x126/0x280
[  317.337672]        _raw_spin_lock_irqsave+0x3a/0x50
[  317.337673]        down_trylock+0x16/0x50
[  317.337674]        __down_trylock_console_sem+0x2b/0xa0
[  317.337675]        console_trylock+0x16/0x60
[  317.337676]        vprintk_emit+0x100/0x340
[  317.337677]        vprintk_default+0x1f/0x30
[  317.337678]        vprintk_func+0x44/0xd4
[  317.337678]        printk+0x9f/0xc5
[  317.337679]        __dump_page.cold.2+0x73/0x210
[  317.337680]        dump_page+0x12/0x50
[  317.337680]        has_unmovable_pages+0x3e9/0x4b0
[  317.337681]        start_isolate_page_range+0x3b4/0x570
[  317.337682]        __offline_pages+0x1ad/0xa10
[  317.337683]        offline_pages+0x11/0x20
[  317.337683]        memory_subsys_offline+0x7e/0xc0
[  317.337684]        device_offline+0xd5/0x110
[  317.337685]        state_store+0xc6/0xe0
[  317.337686]        dev_attr_store+0x3f/0x60
[  317.337686]        sysfs_kf_write+0x89/0xb0
[  317.337687]        kernfs_fop_write+0x188/0x240
[  317.337688]        __vfs_write+0x50/0xa0
[  317.337688]        vfs_write+0x105/0x290
[  317.337689]        ksys_write+0xc6/0x160
[  317.337690]        __x64_sys_write+0x43/0x50
[  317.337691]        do_syscall_64+0xcc/0x76c
[  317.337691]        entry_SYSCALL_64_after_hwframe+0x49/0xbe

[  317.337693] other info that might help us debug this:

[  317.337694] Chain exists of:
[  317.337694]   (console_sem).lock --> &rq->lock --> &(&zone->lock)->rlock

[  317.337699]  Possible unsafe locking scenario:

[  317.337700]        CPU0                    CPU1
[  317.337701]        ----                    ----
[  317.337701]   lock(&(&zone->lock)->rlock);
[  317.337703]                                lock(&rq->lock);
[  317.337705]                                lock(&(&zone->lock)->rlock);
[  317.337706]   lock((console_sem).lock);

[  317.337708]  *** DEADLOCK ***

[  317.337710] 8 locks held by test.sh/8738:
[  317.337710]  #0: ffff8883940b5408 (sb_writers#4){.+.+}, at:
vfs_write+0x25f/0x290
[  317.337713]  #1: ffff889fce310280 (&of->mutex){+.+.}, at:
kernfs_fop_write+0x128/0x240
[  317.337716]  #2: ffff889feb6d4830 (kn->count#115){.+.+}, at:
kernfs_fop_write+0x138/0x240
[  317.337720]  #3: ffffffffb3762d40 (device_hotplug_lock){+.+.}, at:
lock_device_hotplug_sysfs+0x16/0x50
[  317.337723]  #4: ffff88981f0dc990 (&dev->mutex){....}, at:
device_offline+0x70/0x110
[  317.337726]  #5: ffffffffb3315250 (cpu_hotplug_lock.rw_sem){++++}, at:
__offline_pages+0xbf/0xa10
[  317.337729]  #6: ffffffffb35408b0 (mem_hotplug_lock.rw_sem){++++}, at:
percpu_down_write+0x87/0x2f0
[  317.337732]  #7: ffff88883fff4318 (&(&zone->lock)->rlock){-.-.}, at:
start_isolate_page_range+0x1f7/0x570
[  317.337736] stack backtrace:
[  317.337737] CPU: 58 PID: 8738 Comm: test.sh Not tainted 5.3.0-next-20190917+
#9
[  317.337738] Hardware name: HPE ProLiant DL560 Gen10/ProLiant DL560 Gen10,
BIOS U34 05/21/2019
[  317.337739] Call Trace:
[  317.337739]  dump_stack+0x86/0xca
[  317.337740]  print_circular_bug.cold.31+0x243/0x26e
[  317.337741]  check_noncircular+0x29e/0x2e0
[  317.337742]  ? debug_lockdep_rcu_enabled+0x4b/0x60
[  317.337742]  ? print_circular_bug+0x120/0x120
[  317.337743]  ? is_ftrace_trampoline+0x9/0x20
[  317.337744]  ? kernel_text_address+0x59/0xc0
[  317.337744]  ? __kernel_text_address+0x12/0x40
[  317.337745]  check_prev_add+0x107/0xea0
[  317.337746]  validate_chain+0x8fc/0x1200
[  317.337746]  ? check_prev_add+0xea0/0xea0
[  317.337747]  ? format_decode+0xd6/0x600
[  317.337748]  ? file_dentry_name+0xe0/0xe0
[  317.337749]  __lock_acquire+0x5b3/0xb40
[  317.337749]  lock_acquire+0x126/0x280
[  317.337750]  ? down_trylock+0x16/0x50
[  317.337751]  ? vprintk_emit+0x100/0x340
[  317.337752]  _raw_spin_lock_irqsave+0x3a/0x50
[  317.337753]  ? down_trylock+0x16/0x50
[  317.337753]  down_trylock+0x16/0x50
[  317.337754]  ? vprintk_emit+0x100/0x340
[  317.337755]  __down_trylock_console_sem+0x2b/0xa0
[  317.337756]  console_trylock+0x16/0x60
[  317.337756]  vprintk_emit+0x100/0x340
[  317.337757]  vprintk_default+0x1f/0x30
[  317.337758]  vprintk_func+0x44/0xd4
[  317.337758]  printk+0x9f/0xc5
[  317.337759]  ? kmsg_dump_rewind_nolock+0x64/0x64
[  317.337760]  ? __dump_page+0x1d7/0x430
[  317.337760]  __dump_page.cold.2+0x73/0x210
[  317.337761]  dump_page+0x12/0x50
[  317.337762]  has_unmovable_pages+0x3e9/0x4b0
[  317.337763]  start_isolate_page_range+0x3b4/0x570
[  317.337763]  ? unset_migratetype_isolate+0x280/0x280
[  317.337764]  ? rcu_read_lock_bh_held+0xc0/0xc0
[  317.337765]  __offline_pages+0x1ad/0xa10
[  317.337765]  ? lock_acquire+0x126/0x280
[  317.337766]  ? __add_memory+0xc0/0xc0
[  317.337767]  ? __kasan_check_write+0x14/0x20
[  317.337767]  ? __mutex_lock+0x344/0xcd0
[  317.337768]  ? _raw_spin_unlock_irqrestore+0x49/0x50
[  317.337769]  ? device_offline+0x70/0x110
[  317.337770]  ? klist_next+0x1c1/0x1e0
[  317.337770]  ? __mutex_add_waiter+0xc0/0xc0
[  317.337771]  ? klist_next+0x10b/0x1e0
[  317.337772]  ? klist_iter_exit+0x16/0x40
[  317.337772]  ? device_for_each_child+0xd0/0x110
[  317.337773]  offline_pages+0x11/0x20
[  317.337774]  memory_subsys_offline+0x7e/0xc0
[  317.337774]  device_offline+0xd5/0x110
[  317.337775]  ? auto_online_blocks_show+0x70/0x70
[  317.337776]  state_store+0xc6/0xe0
[  317.337776]  dev_attr_store+0x3f/0x60
[  317.337777]  ? device_match_name+0x40/0x40
[  317.337778]  sysfs_kf_write+0x89/0xb0
[  317.337778]  ? sysfs_file_ops+0xa0/0xa0
[  317.337779]  kernfs_fop_write+0x188/0x240
[  317.337780]  __vfs_write+0x50/0xa0
[  317.337780]  vfs_write+0x105/0x290
[  317.337781]  ksys_write+0xc6/0x160
[  317.337782]  ? __x64_sys_read+0x50/0x50
[  317.337782]  ? do_syscall_64+0x79/0x76c
[  317.337783]  ? do_syscall_64+0x79/0x76c
[  317.337784]  __x64_sys_write+0x43/0x50
[  317.337784]  do_syscall_64+0xcc/0x76c
[  317.337785]  ? trace_hardirqs_on_thunk+0x1a/0x20
[  317.337786]  ? syscall_return_slowpath+0x210/0x210
[  317.337787]  ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe
[  317.337787]  ? trace_hardirqs_off_caller+0x3a/0x150
[  317.337788]  ? trace_hardirqs_off_thunk+0x1a/0x20
[  317.337789]  entry_SYSCALL_64_after_hwframe+0x49/0xbe



* Re: printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)
  2019-09-18 14:39           ` printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang) Qian Cai
@ 2019-09-18 15:50             ` Sergey Senozhatsky
  2019-09-18 15:58               ` Sergey Senozhatsky
  2019-09-18 16:10               ` Qian Cai
  0 siblings, 2 replies; 21+ messages in thread
From: Sergey Senozhatsky @ 2019-09-18 15:50 UTC (permalink / raw)
  To: Qian Cai
  Cc: Steven Rostedt, Sergey Senozhatsky, Petr Mladek, Catalin Marinas,
	Will Deacon, Dan Williams, linux-mm, linux-kernel,
	linux-arm-kernel, Peter Zijlstra, Waiman Long, Thomas Gleixner,
	Theodore Ts'o, Arnd Bergmann, Greg Kroah-Hartman

On (09/18/19 10:39), Qian Cai wrote:
> > Perhaps for a quick fix (and a comment that says this needs to be fixed
> > properly). I think the changes to printk() that was discussed at
> > Plumbers may also solve this properly.
> 
> I assume that the new printk() stuff will also fix this deadlock between
> printk() and memory offline.

Mother chicken...

Do you actually see a deadlock? I'd rather expect a lockdep splat, but
anyway...

> [  317.337595] WARNING: possible circular locking dependency detected
> [  317.337596] 5.3.0-next-20190917+ #9 Not tainted
> [  317.337597] ------------------------------------------------------
> [  317.337597] test.sh/8738 is trying to acquire lock:
> [  317.337598] ffffffffb33a4978 ((console_sem).lock){-.-.}, at: down_trylock+0x16/0x50
> 
> [  317.337602] but task is already holding lock:
> [  317.337602] ffff88883fff4318 (&(&zone->lock)->rlock){-.-.}, at: start_isolate_page_range+0x1f7/0x570
> 
> [  317.337606] which lock already depends on the new lock.
>
> [  317.337608] the existing dependency chain (in reverse order) is:
> 
> [  317.337609] -> #3 (&(&zone->lock)->rlock){-.-.}:
> [  317.337612]        __lock_acquire+0x5b3/0xb40
> [  317.337613]        lock_acquire+0x126/0x280
> [  317.337613]        _raw_spin_lock+0x2f/0x40
> [  317.337614]        rmqueue_bulk.constprop.21+0xb6/0x1160
> [  317.337615]        get_page_from_freelist+0x898/0x22c0
> [  317.337616]        __alloc_pages_nodemask+0x2f3/0x1cd0
> [  317.337617]        alloc_page_interleave+0x18/0x130
> [  317.337618]        alloc_pages_current+0xf6/0x110
> [  317.337619]        allocate_slab+0x4c6/0x19c0
> [  317.337620]        new_slab+0x46/0x70
> [  317.337621]        ___slab_alloc+0x58b/0x960
> [  317.337621]        __slab_alloc+0x43/0x70
> [  317.337622]        kmem_cache_alloc+0x354/0x460
> [  317.337623]        fill_pool+0x272/0x4b0
> [  317.337624]        __debug_object_init+0x86/0x790
> [  317.337624]        debug_object_init+0x16/0x20
> [  317.337625]        hrtimer_init+0x27/0x1e0
> [  317.337626]        init_dl_task_timer+0x20/0x40
> [  317.337627]        __sched_fork+0x10b/0x1f0
> [  317.337627]        init_idle+0xac/0x520
> [  317.337628]        idle_thread_get+0x7c/0xc0
> [  317.337629]        bringup_cpu+0x1a/0x1e0
> [  317.337630]        cpuhp_invoke_callback+0x197/0x1120
> [  317.337630]        _cpu_up+0x171/0x280
> [  317.337631]        do_cpu_up+0xb1/0x120
> [  317.337632]        cpu_up+0x13/0x20
> [  317.337632]        smp_init+0xa4/0x12d
> [  317.337633]        kernel_init_freeable+0x37e/0x76e
> [  317.337634]        kernel_init+0x11/0x12f
> [  317.337635]        ret_from_fork+0x3a/0x50

So you have debug objects enabled, right? This thing does not behave well
when it comes to printing; debug_objects is slightly problematic.

This thing does

	rq->lock --> zone->lock

It takes rq->lock and then calls into __sched_fork()->hrtimer_init()->debug_objects()->MM

This doesn't look very right - a dive into MM under rq->lock.
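The edge arises mechanically: whenever a lock is acquired, lockdep records an ordering edge from every lock currently held, so a single allocation under rq->lock is enough to create the rq->lock --> zone->lock dependency for good. A toy model of that recording (Python, illustrative names only):

```python
held = []      # per-context stack of currently-held locks
edges = set()  # recorded "outer -> inner" ordering edges

def acquire(lock):
    """Taking a lock records an edge from every lock already held."""
    for outer in held:
        edges.add((outer, lock))
    held.append(lock)

def release(lock):
    held.remove(lock)

# The init_idle() path from the #3 trace: an allocation under rq->lock
# (hrtimer_init -> debug_object_init -> slab -> page allocator).
acquire("rq->lock")
acquire("zone->lock")
release("zone->lock")
release("rq->lock")

print(("rq->lock", "zone->lock") in edges)
```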

Peter, Thomas, am I wrong?

> [  317.337635] -> #2 (&rq->lock){-.-.}:
> [  317.337638]        __lock_acquire+0x5b3/0xb40
> [  317.337639]        lock_acquire+0x126/0x280
> [  317.337639]        _raw_spin_lock+0x2f/0x40
> [  317.337640]        task_fork_fair+0x43/0x200
> [  317.337641]        sched_fork+0x29b/0x420
> [  317.337642]        copy_process+0xf3c/0x2fd0
> [  317.337642]        _do_fork+0xef/0x950
> [  317.337643]        kernel_thread+0xa8/0xe0
> [  317.337644]        rest_init+0x28/0x311
> [  317.337645]        arch_call_rest_init+0xe/0x1b
> [  317.337645]        start_kernel+0x6eb/0x724
> [  317.337646]        x86_64_start_reservations+0x24/0x26
> [  317.337647]        x86_64_start_kernel+0xf4/0xfb
> [  317.337648]        secondary_startup_64+0xb6/0xc0

pi_lock --> rq->lock

> [  317.337649] -> #1 (&p->pi_lock){-.-.}:
> [  317.337651]        __lock_acquire+0x5b3/0xb40
> [  317.337652]        lock_acquire+0x126/0x280
> [  317.337653]        _raw_spin_lock_irqsave+0x3a/0x50
> [  317.337653]        try_to_wake_up+0xb4/0x1030
> [  317.337654]        wake_up_process+0x15/0x20
> [  317.337655]        __up+0xaa/0xc0
> [  317.337655]        up+0x55/0x60
> [  317.337656]        __up_console_sem+0x37/0x60
> [  317.337657]        console_unlock+0x3a0/0x750
> [  317.337658]        vprintk_emit+0x10d/0x340
> [  317.337658]        vprintk_default+0x1f/0x30
> [  317.337659]        vprintk_func+0x44/0xd4
> [  317.337660]        printk+0x9f/0xc5
> [  317.337660]        crng_reseed+0x3cc/0x440
> [  317.337661]        credit_entropy_bits+0x3e8/0x4f0
> [  317.337662]        random_ioctl+0x1eb/0x250
> [  317.337663]        do_vfs_ioctl+0x13e/0xa70
> [  317.337663]        ksys_ioctl+0x41/0x80
> [  317.337664]        __x64_sys_ioctl+0x43/0x4c
> [  317.337665]        do_syscall_64+0xcc/0x76c
> [  317.337666]        entry_SYSCALL_64_after_hwframe+0x49/0xbe

console_sem->lock --> pi_lock

This also covers console_sem->lock --> rq->lock, and maintains
pi_lock --> rq->lock

So we have

	console_sem->lock --> pi_lock --> rq->lock

> [  317.337667] -> #0 ((console_sem).lock){-.-.}:
> [  317.337669]        check_prev_add+0x107/0xea0
> [  317.337670]        validate_chain+0x8fc/0x1200
> [  317.337671]        __lock_acquire+0x5b3/0xb40
> [  317.337671]        lock_acquire+0x126/0x280
> [  317.337672]        _raw_spin_lock_irqsave+0x3a/0x50
> [  317.337673]        down_trylock+0x16/0x50
> [  317.337674]        __down_trylock_console_sem+0x2b/0xa0
> [  317.337675]        console_trylock+0x16/0x60
> [  317.337676]        vprintk_emit+0x100/0x340
> [  317.337677]        vprintk_default+0x1f/0x30
> [  317.337678]        vprintk_func+0x44/0xd4
> [  317.337678]        printk+0x9f/0xc5
> [  317.337679]        __dump_page.cold.2+0x73/0x210
> [  317.337680]        dump_page+0x12/0x50
> [  317.337680]        has_unmovable_pages+0x3e9/0x4b0
> [  317.337681]        start_isolate_page_range+0x3b4/0x570
> [  317.337682]        __offline_pages+0x1ad/0xa10
> [  317.337683]        offline_pages+0x11/0x20
> [  317.337683]        memory_subsys_offline+0x7e/0xc0
> [  317.337684]        device_offline+0xd5/0x110
> [  317.337685]        state_store+0xc6/0xe0
> [  317.337686]        dev_attr_store+0x3f/0x60
> [  317.337686]        sysfs_kf_write+0x89/0xb0
> [  317.337687]        kernfs_fop_write+0x188/0x240
> [  317.337688]        __vfs_write+0x50/0xa0
> [  317.337688]        vfs_write+0x105/0x290
> [  317.337689]        ksys_write+0xc6/0x160
> [  317.337690]        __x64_sys_write+0x43/0x50
> [  317.337691]        do_syscall_64+0xcc/0x76c
> [  317.337691]        entry_SYSCALL_64_after_hwframe+0x49/0xbe

zone->lock --> console_sem->lock

So then we have

	zone->lock --> console_sem->lock --> pi_lock --> rq->lock

  vs. the reverse chain

	rq->lock --> console_sem->lock

If I get this right.
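Put together, the four chains quoted in the splat form a directed graph, and the report boils down to a reachability check, conceptually what lockdep's check_noncircular() does (a simplified sketch, not the kernel algorithm; lock names as in the splat):

```python
# Ordering edges taken from the four dependency chains in the splat.
edges = {
    "rq->lock":          {"zone->lock"},        # #3: __sched_fork -> allocator
    "pi_lock":           {"rq->lock"},          # #2: task_fork_fair under pi_lock
    "console_sem->lock": {"pi_lock"},           # #1: console_unlock -> wake_up_process
    "zone->lock":        {"console_sem->lock"}, # #0: dump_page -> printk
}

def reachable(src, dst):
    # iterative depth-first search over the recorded edges
    seen, stack = set(), [src]
    while stack:
        n = stack.pop()
        if n == dst:
            return True
        if n in seen:
            continue
        seen.add(n)
        stack.extend(edges.get(n, ()))
    return False

# zone->lock -> console_sem->lock closes the loop, because
# console_sem->lock already (transitively) precedes zone->lock:
print(reachable("console_sem->lock", "zone->lock"))  # True: circular dependency
```

Dropping any one of the four edges (for instance, not calling printk() under zone->lock) breaks the cycle, which is why the discussion focuses on which link to remove.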

> [  317.337693] other info that might help us debug this:
> 
> [  317.337694] Chain exists of:
> [  317.337694]   (console_sem).lock --> &rq->lock --> &(&zone->lock)->rlock
> 
> [  317.337699]  Possible unsafe locking scenario:
> 
> [  317.337700]        CPU0                    CPU1
> [  317.337701]        ----                    ----
> [  317.337701]   lock(&(&zone->lock)->rlock);
> [  317.337703]                                lock(&rq->lock);
> [  317.337705]                                lock(&(&zone->lock)->rlock);
> [  317.337706]   lock((console_sem).lock);
>
> [  317.337708]  *** DEADLOCK ***
> 
> [  317.337710] 8 locks held by test.sh/8738:
> [  317.337710]  #0: ffff8883940b5408 (sb_writers#4){.+.+}, at: vfs_write+0x25f/0x290
> [  317.337713]  #1: ffff889fce310280 (&of->mutex){+.+.}, at: kernfs_fop_write+0x128/0x240
> [  317.337716]  #2: ffff889feb6d4830 (kn->count#115){.+.+}, at: kernfs_fop_write+0x138/0x240
> [  317.337720]  #3: ffffffffb3762d40 (device_hotplug_lock){+.+.}, at: lock_device_hotplug_sysfs+0x16/0x50
> [  317.337723]  #4: ffff88981f0dc990 (&dev->mutex){....}, at: device_offline+0x70/0x110
> [  317.337726]  #5: ffffffffb3315250 (cpu_hotplug_lock.rw_sem){++++}, at: __offline_pages+0xbf/0xa10
> [  317.337729]  #6: ffffffffb35408b0 (mem_hotplug_lock.rw_sem){++++}, at:  percpu_down_write+0x87/0x2f0
> [  317.337732]  #7: ffff88883fff4318 (&(&zone->lock)->rlock){-.-.}, at: start_isolate_page_range+0x1f7/0x570
> [  317.337736] stack backtrace:
> [  317.337737] CPU: 58 PID: 8738 Comm: test.sh Not tainted 5.3.0-next-20190917+ #9
> [  317.337738] Hardware name: HPE ProLiant DL560 Gen10/ProLiant DL560 Gen10, BIOS U34 05/21/2019
> [  317.337739] Call Trace:
> [  317.337739]  dump_stack+0x86/0xca
> [  317.337740]  print_circular_bug.cold.31+0x243/0x26e
> [  317.337741]  check_noncircular+0x29e/0x2e0
> [  317.337742]  ? debug_lockdep_rcu_enabled+0x4b/0x60
> [  317.337742]  ? print_circular_bug+0x120/0x120
> [  317.337743]  ? is_ftrace_trampoline+0x9/0x20
> [  317.337744]  ? kernel_text_address+0x59/0xc0
> [  317.337744]  ? __kernel_text_address+0x12/0x40
> [  317.337745]  check_prev_add+0x107/0xea0
> [  317.337746]  validate_chain+0x8fc/0x1200
> [  317.337746]  ? check_prev_add+0xea0/0xea0
> [  317.337747]  ? format_decode+0xd6/0x600
> [  317.337748]  ? file_dentry_name+0xe0/0xe0
> [  317.337749]  __lock_acquire+0x5b3/0xb40
> [  317.337749]  lock_acquire+0x126/0x280
> [  317.337750]  ? down_trylock+0x16/0x50
> [  317.337751]  ? vprintk_emit+0x100/0x340
> [  317.337752]  _raw_spin_lock_irqsave+0x3a/0x50
> [  317.337753]  ? down_trylock+0x16/0x50
> [  317.337753]  down_trylock+0x16/0x50
> [  317.337754]  ? vprintk_emit+0x100/0x340
> [  317.337755]  __down_trylock_console_sem+0x2b/0xa0
> [  317.337756]  console_trylock+0x16/0x60
> [  317.337756]  vprintk_emit+0x100/0x340
> [  317.337757]  vprintk_default+0x1f/0x30
> [  317.337758]  vprintk_func+0x44/0xd4
> [  317.337758]  printk+0x9f/0xc5
> [  317.337759]  ? kmsg_dump_rewind_nolock+0x64/0x64
> [  317.337760]  ? __dump_page+0x1d7/0x430
> [  317.337760]  __dump_page.cold.2+0x73/0x210
> [  317.337761]  dump_page+0x12/0x50
> [  317.337762]  has_unmovable_pages+0x3e9/0x4b0
> [  317.337763]  start_isolate_page_range+0x3b4/0x570
> [  317.337763]  ? unset_migratetype_isolate+0x280/0x280
> [  317.337764]  ? rcu_read_lock_bh_held+0xc0/0xc0
> [  317.337765]  __offline_pages+0x1ad/0xa10
> [  317.337765]  ? lock_acquire+0x126/0x280
> [  317.337766]  ? __add_memory+0xc0/0xc0
> [  317.337767]  ? __kasan_check_write+0x14/0x20
> [  317.337767]  ? __mutex_lock+0x344/0xcd0
> [  317.337768]  ? _raw_spin_unlock_irqrestore+0x49/0x50
> [  317.337769]  ? device_offline+0x70/0x110
> [  317.337770]  ? klist_next+0x1c1/0x1e0
> [  317.337770]  ? __mutex_add_waiter+0xc0/0xc0
> [  317.337771]  ? klist_next+0x10b/0x1e0
> [  317.337772]  ? klist_iter_exit+0x16/0x40
> [  317.337772]  ? device_for_each_child+0xd0/0x110
> [  317.337773]  offline_pages+0x11/0x20
> [  317.337774]  memory_subsys_offline+0x7e/0xc0
> [  317.337774]  device_offline+0xd5/0x110
> [  317.337775]  ? auto_online_blocks_show+0x70/0x70
> [  317.337776]  state_store+0xc6/0xe0
> [  317.337776]  dev_attr_store+0x3f/0x60
> [  317.337777]  ? device_match_name+0x40/0x40
> [  317.337778]  sysfs_kf_write+0x89/0xb0
> [  317.337778]  ? sysfs_file_ops+0xa0/0xa0
> [  317.337779]  kernfs_fop_write+0x188/0x240
> [  317.337780]  __vfs_write+0x50/0xa0
> [  317.337780]  vfs_write+0x105/0x290
> [  317.337781]  ksys_write+0xc6/0x160
> [  317.337782]  ? __x64_sys_read+0x50/0x50
> [  317.337782]  ? do_syscall_64+0x79/0x76c
> [  317.337783]  ? do_syscall_64+0x79/0x76c
> [  317.337784]  __x64_sys_write+0x43/0x50
> [  317.337784]  do_syscall_64+0xcc/0x76c
> [  317.337785]  ? trace_hardirqs_on_thunk+0x1a/0x20
> [  317.337786]  ? syscall_return_slowpath+0x210/0x210
> [  317.337787]  ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe
> [  317.337787]  ? trace_hardirqs_off_caller+0x3a/0x150
> [  317.337788]  ? trace_hardirqs_off_thunk+0x1a/0x20
> [  317.337789]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Lovely.

	-ss


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)
  2019-09-18 15:50             ` Sergey Senozhatsky
@ 2019-09-18 15:58               ` Sergey Senozhatsky
  2019-09-18 16:10               ` Qian Cai
  1 sibling, 0 replies; 21+ messages in thread
From: Sergey Senozhatsky @ 2019-09-18 15:58 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Qian Cai, Steven Rostedt, Petr Mladek, Catalin Marinas,
	Will Deacon, Dan Williams, linux-mm, linux-kernel,
	linux-arm-kernel, Peter Zijlstra, Waiman Long, Thomas Gleixner,
	Theodore Ts'o, Arnd Bergmann, Greg Kroah-Hartman

A correction:

On (09/19/19 00:51), Sergey Senozhatsky wrote:
[..]
>
> zone->lock --> console_sem->lock
> 
> So then we have
> 
> 	zone->lock --> console_sem->lock --> pi_lock --> rq->lock
> 
>   vs. the reverse chain
> 
> 	rq->lock --> console_sem->lock

                     ^^^ zone->lock

	-ss


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)
  2019-09-18 15:50             ` Sergey Senozhatsky
  2019-09-18 15:58               ` Sergey Senozhatsky
@ 2019-09-18 16:10               ` Qian Cai
  2019-09-23 10:21                 ` Sergey Senozhatsky
  1 sibling, 1 reply; 21+ messages in thread
From: Qian Cai @ 2019-09-18 16:10 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Sergey Senozhatsky, Petr Mladek, Catalin Marinas,
	Will Deacon, Dan Williams, linux-mm, linux-kernel,
	linux-arm-kernel, Peter Zijlstra, Waiman Long, Thomas Gleixner,
	Theodore Ts'o, Arnd Bergmann, Greg Kroah-Hartman

On Thu, 2019-09-19 at 00:50 +0900, Sergey Senozhatsky wrote:
> On (09/18/19 10:39), Qian Cai wrote:
> > > Perhaps for a quick fix (and a comment that says this needs to be fixed
> > > properly). I think the changes to printk() that was discussed at
> > > Plumbers may also solve this properly.
> > 
> > I assume that the new printk() stuff will also fix this deadlock between
> > printk() and memory offline.
> 
> Mother chicken...
> 
> Do you actually see a deadlock? I'd rather expect a lockdep splat, but
> anyway...

Not yet, just a lockdep splat so far.

> 
> > [  317.337595] WARNING: possible circular locking dependency detected
> > [  317.337596] 5.3.0-next-20190917+ #9 Not tainted
> > [  317.337597] ------------------------------------------------------
> > [  317.337597] test.sh/8738 is trying to acquire lock:
> > [  317.337598] ffffffffb33a4978 ((console_sem).lock){-.-.}, at: down_trylock+0x16/0x50
> > 
> > [  317.337602] but task is already holding lock:
> > [  317.337602] ffff88883fff4318 (&(&zone->lock)->rlock){-.-.}, at: start_isolate_page_range+0x1f7/0x570
> > 
> > [  317.337606] which lock already depends on the new lock.
> > 
> > [  317.337608] the existing dependency chain (in reverse order) is:
> > 
> > [  317.337609] -> #3 (&(&zone->lock)->rlock){-.-.}:
> > [  317.337612]        __lock_acquire+0x5b3/0xb40
> > [  317.337613]        lock_acquire+0x126/0x280
> > [  317.337613]        _raw_spin_lock+0x2f/0x40
> > [  317.337614]        rmqueue_bulk.constprop.21+0xb6/0x1160
> > [  317.337615]        get_page_from_freelist+0x898/0x22c0
> > [  317.337616]        __alloc_pages_nodemask+0x2f3/0x1cd0
> > [  317.337617]        alloc_page_interleave+0x18/0x130
> > [  317.337618]        alloc_pages_current+0xf6/0x110
> > [  317.337619]        allocate_slab+0x4c6/0x19c0
> > [  317.337620]        new_slab+0x46/0x70
> > [  317.337621]        ___slab_alloc+0x58b/0x960
> > [  317.337621]        __slab_alloc+0x43/0x70
> > [  317.337622]        kmem_cache_alloc+0x354/0x460
> > [  317.337623]        fill_pool+0x272/0x4b0
> > [  317.337624]        __debug_object_init+0x86/0x790
> > [  317.337624]        debug_object_init+0x16/0x20
> > [  317.337625]        hrtimer_init+0x27/0x1e0
> > [  317.337626]        init_dl_task_timer+0x20/0x40
> > [  317.337627]        __sched_fork+0x10b/0x1f0
> > [  317.337627]        init_idle+0xac/0x520
> > [  317.337628]        idle_thread_get+0x7c/0xc0
> > [  317.337629]        bringup_cpu+0x1a/0x1e0
> > [  317.337630]        cpuhp_invoke_callback+0x197/0x1120
> > [  317.337630]        _cpu_up+0x171/0x280
> > [  317.337631]        do_cpu_up+0xb1/0x120
> > [  317.337632]        cpu_up+0x13/0x20
> > [  317.337632]        smp_init+0xa4/0x12d
> > [  317.337633]        kernel_init_freeable+0x37e/0x76e
> > [  317.337634]        kernel_init+0x11/0x12f
> > [  317.337635]        ret_from_fork+0x3a/0x50
> 
> So you have debug objects enabled, right? This thing does not behave
> well when it comes to printing; debug_objects are slightly problematic.

Yes, but there is also a similar splat without debug_objects. It looks
like anything trying to allocate memory in that path will trigger it anyway.

[  297.425908] WARNING: possible circular locking dependency detected
[  297.425908] 5.3.0-next-20190917 #8 Not tainted
[  297.425909] ------------------------------------------------------
[  297.425910] test.sh/8653 is trying to acquire lock:
[  297.425911] ffffffff865a4460 (console_owner){-.-.}, at:
console_unlock+0x207/0x750

[  297.425914] but task is already holding lock:
[  297.425915] ffff88883fff3c58 (&(&zone->lock)->rlock){-.-.}, at:
__offline_isolated_pages+0x179/0x3e0

[  297.425919] which lock already depends on the new lock.


[  297.425920] the existing dependency chain (in reverse order) is:

[  297.425922] -> #3 (&(&zone->lock)->rlock){-.-.}:
[  297.425925]        __lock_acquire+0x5b3/0xb40
[  297.425925]        lock_acquire+0x126/0x280
[  297.425926]        _raw_spin_lock+0x2f/0x40
[  297.425927]        rmqueue_bulk.constprop.21+0xb6/0x1160
[  297.425928]        get_page_from_freelist+0x898/0x22c0
[  297.425928]        __alloc_pages_nodemask+0x2f3/0x1cd0
[  297.425929]        alloc_pages_current+0x9c/0x110
[  297.425930]        allocate_slab+0x4c6/0x19c0
[  297.425931]        new_slab+0x46/0x70
[  297.425931]        ___slab_alloc+0x58b/0x960
[  297.425932]        __slab_alloc+0x43/0x70
[  297.425933]        __kmalloc+0x3ad/0x4b0
[  297.425933]        __tty_buffer_request_room+0x100/0x250
[  297.425934]        tty_insert_flip_string_fixed_flag+0x67/0x110
[  297.425935]        pty_write+0xa2/0xf0
[  297.425936]        n_tty_write+0x36b/0x7b0
[  297.425936]        tty_write+0x284/0x4c0
[  297.425937]        __vfs_write+0x50/0xa0
[  297.425938]        vfs_write+0x105/0x290
[  297.425939]        redirected_tty_write+0x6a/0xc0
[  297.425939]        do_iter_write+0x248/0x2a0
[  297.425940]        vfs_writev+0x106/0x1e0
[  297.425941]        do_writev+0xd4/0x180
[  297.425941]        __x64_sys_writev+0x45/0x50
[  297.425942]        do_syscall_64+0xcc/0x76c
[  297.425943]        entry_SYSCALL_64_after_hwframe+0x49/0xbe

[  297.425944] -> #2 (&(&port->lock)->rlock){-.-.}:
[  297.425946]        __lock_acquire+0x5b3/0xb40
[  297.425947]        lock_acquire+0x126/0x280
[  297.425948]        _raw_spin_lock_irqsave+0x3a/0x50
[  297.425949]        tty_port_tty_get+0x20/0x60
[  297.425949]        tty_port_default_wakeup+0xf/0x30
[  297.425950]        tty_port_tty_wakeup+0x39/0x40
[  297.425951]        uart_write_wakeup+0x2a/0x40
[  297.425952]        serial8250_tx_chars+0x22e/0x440
[  297.425952]        serial8250_handle_irq.part.8+0x14a/0x170
[  297.425953]        serial8250_default_handle_irq+0x5c/0x90
[  297.425954]        serial8250_interrupt+0xa6/0x130
[  297.425955]        __handle_irq_event_percpu+0x78/0x4f0
[  297.425955]        handle_irq_event_percpu+0x70/0x100
[  297.425956]        handle_irq_event+0x5a/0x8b
[  297.425957]        handle_edge_irq+0x117/0x370
[  297.425958]        do_IRQ+0x9e/0x1e0
[  297.425958]        ret_from_intr+0x0/0x2a
[  297.425959]        cpuidle_enter_state+0x156/0x8e0
[  297.425960]        cpuidle_enter+0x41/0x70
[  297.425960]        call_cpuidle+0x5e/0x90
[  297.425961]        do_idle+0x333/0x370
[  297.425962]        cpu_startup_entry+0x1d/0x1f
[  297.425962]        start_secondary+0x290/0x330
[  297.425963]        secondary_startup_64+0xb6/0xc0

[  297.425964] -> #1 (&port_lock_key){-.-.}:
[  297.425967]        __lock_acquire+0x5b3/0xb40
[  297.425967]        lock_acquire+0x126/0x280
[  297.425968]        _raw_spin_lock_irqsave+0x3a/0x50
[  297.425969]        serial8250_console_write+0x3e4/0x450
[  297.425970]        univ8250_console_write+0x4b/0x60
[  297.425970]        console_unlock+0x501/0x750
[  297.425971]        vprintk_emit+0x10d/0x340
[  297.425972]        vprintk_default+0x1f/0x30
[  297.425972]        vprintk_func+0x44/0xd4
[  297.425973]        printk+0x9f/0xc5
[  297.425974]        register_console+0x39c/0x520
[  297.425975]        univ8250_console_init+0x23/0x2d
[  297.425975]        console_init+0x338/0x4cd
[  297.425976]        start_kernel+0x534/0x724
[  297.425977]        x86_64_start_reservations+0x24/0x26
[  297.425977]        x86_64_start_kernel+0xf4/0xfb
[  297.425978]        secondary_startup_64+0xb6/0xc0

[  297.425979] -> #0 (console_owner){-.-.}:
[  297.425982]        check_prev_add+0x107/0xea0
[  297.425982]        validate_chain+0x8fc/0x1200
[  297.425983]        __lock_acquire+0x5b3/0xb40
[  297.425984]        lock_acquire+0x126/0x280
[  297.425984]        console_unlock+0x269/0x750
[  297.425985]        vprintk_emit+0x10d/0x340
[  297.425986]        vprintk_default+0x1f/0x30
[  297.425987]        vprintk_func+0x44/0xd4
[  297.425987]        printk+0x9f/0xc5
[  297.425988]        __offline_isolated_pages.cold.52+0x2f/0x30a
[  297.425989]        offline_isolated_pages_cb+0x17/0x30
[  297.425990]        walk_system_ram_range+0xda/0x160
[  297.425990]        __offline_pages+0x79c/0xa10
[  297.425991]        offline_pages+0x11/0x20
[  297.425992]        memory_subsys_offline+0x7e/0xc0
[  297.425992]        device_offline+0xd5/0x110
[  297.425993]        state_store+0xc6/0xe0
[  297.425994]        dev_attr_store+0x3f/0x60
[  297.425995]        sysfs_kf_write+0x89/0xb0
[  297.425995]        kernfs_fop_write+0x188/0x240
[  297.425996]        __vfs_write+0x50/0xa0
[  297.425997]        vfs_write+0x105/0x290
[  297.425997]        ksys_write+0xc6/0x160
[  297.425998]        __x64_sys_write+0x43/0x50
[  297.425999]        do_syscall_64+0xcc/0x76c
[  297.426000]        entry_SYSCALL_64_after_hwframe+0x49/0xbe

[  297.426001] other info that might help us debug this:

[  297.426002] Chain exists of:
[  297.426002]   console_owner --> &(&port->lock)->rlock --> &(&zone->lock)->rlock

[  297.426007]  Possible unsafe locking scenario:

[  297.426008]        CPU0                    CPU1
[  297.426009]        ----                    ----
[  297.426009]   lock(&(&zone->lock)->rlock);
[  297.426011]                                lock(&(&port->lock)->rlock);
[  297.426013]                                lock(&(&zone->lock)->rlock);
[  297.426014]   lock(console_owner);

[  297.426016]  *** DEADLOCK ***

[  297.426017] 9 locks held by test.sh/8653:
[  297.426018]  #0: ffff88839ba7d408 (sb_writers#4){.+.+}, at:
vfs_write+0x25f/0x290
[  297.426021]  #1: ffff888277618880 (&of->mutex){+.+.}, at:
kernfs_fop_write+0x128/0x240
[  297.426024]  #2: ffff8898131fc218 (kn->count#115){.+.+}, at:
kernfs_fop_write+0x138/0x240
[  297.426028]  #3: ffffffff86962a80 (device_hotplug_lock){+.+.}, at:
lock_device_hotplug_sysfs+0x16/0x50
[  297.426031]  #4: ffff8884374f4990 (&dev->mutex){....}, at:
device_offline+0x70/0x110
[  297.426034]  #5: ffffffff86515250 (cpu_hotplug_lock.rw_sem){++++}, at:
__offline_pages+0xbf/0xa10
[  297.426037]  #6: ffffffff867405f0 (mem_hotplug_lock.rw_sem){++++}, at:
percpu_down_write+0x87/0x2f0
[  297.426040]  #7: ffff88883fff3c58 (&(&zone->lock)->rlock){-.-.}, at:
__offline_isolated_pages+0x179/0x3e0
[  297.426043]  #8: ffffffff865a4920 (console_lock){+.+.}, at:
vprintk_emit+0x100/0x340

[  297.426047] stack backtrace:
[  297.426048] CPU: 1 PID: 8653 Comm: test.sh Not tainted 5.3.0-next-20190917 #8
[  297.426049] Hardware name: HPE ProLiant DL560 Gen10/ProLiant DL560 Gen10,
BIOS U34 05/21/2019
[  297.426049] Call Trace:
[  297.426050]  dump_stack+0x86/0xca
[  297.426051]  print_circular_bug.cold.31+0x243/0x26e
[  297.426051]  check_noncircular+0x29e/0x2e0
[  297.426052]  ? stack_trace_save+0x87/0xb0
[  297.426053]  ? print_circular_bug+0x120/0x120
[  297.426053]  check_prev_add+0x107/0xea0
[  297.426054]  validate_chain+0x8fc/0x1200
[  297.426055]  ? check_prev_add+0xea0/0xea0
[  297.426055]  __lock_acquire+0x5b3/0xb40
[  297.426056]  lock_acquire+0x126/0x280
[  297.426057]  ? console_unlock+0x207/0x750
[  297.426057]  ? __kasan_check_read+0x11/0x20
[  297.426058]  console_unlock+0x269/0x750
[  297.426059]  ? console_unlock+0x207/0x750
[  297.426059]  vprintk_emit+0x10d/0x340
[  297.426060]  vprintk_default+0x1f/0x30
[  297.426061]  vprintk_func+0x44/0xd4
[  297.426061]  ? do_raw_spin_lock+0x118/0x1d0
[  297.426062]  printk+0x9f/0xc5
[  297.426063]  ? kmsg_dump_rewind_nolock+0x64/0x64
[  297.426064]  ? __offline_isolated_pages+0x179/0x3e0
[  297.426064]  __offline_isolated_pages.cold.52+0x2f/0x30a
[  297.426065]  ? online_memory_block+0x20/0x20
[  297.426066]  offline_isolated_pages_cb+0x17/0x30
[  297.426067]  walk_system_ram_range+0xda/0x160
[  297.426067]  ? walk_mem_res+0x30/0x30
[  297.426068]  ? dissolve_free_huge_page+0x1e/0x2b0
[  297.426069]  __offline_pages+0x79c/0xa10
[  297.426069]  ? __add_memory+0xc0/0xc0
[  297.426070]  ? __kasan_check_write+0x14/0x20
[  297.426071]  ? __mutex_lock+0x344/0xcd0
[  297.426071]  ? _raw_spin_unlock_irqrestore+0x49/0x50
[  297.426072]  ? device_offline+0x70/0x110
[  297.426073]  ? klist_next+0x1c1/0x1e0
[  297.426073]  ? __mutex_add_waiter+0xc0/0xc0
[  297.426074]  ? klist_next+0x10b/0x1e0
[  297.426075]  ? klist_iter_exit+0x16/0x40
[  297.426076]  ? device_for_each_child+0xd0/0x110
[  297.426076]  offline_pages+0x11/0x20
[  297.426077]  memory_subsys_offline+0x7e/0xc0
[  297.426078]  device_offline+0xd5/0x110
[  297.426078]  ? auto_online_blocks_show+0x70/0x70
[  297.426079]  state_store+0xc6/0xe0
[  297.426080]  dev_attr_store+0x3f/0x60
[  297.426080]  ? device_match_name+0x40/0x40
[  297.426081]  sysfs_kf_write+0x89/0xb0
[  297.426082]  ? sysfs_file_ops+0xa0/0xa0
[  297.426082]  kernfs_fop_write+0x188/0x240
[  297.426083]  __vfs_write+0x50/0xa0
[  297.426084]  vfs_write+0x105/0x290
[  297.426084]  ksys_write+0xc6/0x160
[  297.426085]  ? __x64_sys_read+0x50/0x50
[  297.426086]  ? do_syscall_64+0x79/0x76c
[  297.426086]  ? do_syscall_64+0x79/0x76c
[  297.426087]  __x64_sys_write+0x43/0x50
[  297.426088]  do_syscall_64+0xcc/0x76c
[  297.426088]  ? trace_hardirqs_on_thunk+0x1a/0x20
[  297.426089]  ? syscall_return_slowpath+0x210/0x210
[  297.426090]  ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe
[  297.426091]  ? trace_hardirqs_off_caller+0x3a/0x150
[  297.426092]  ? trace_hardirqs_off_thunk+0x1a/0x20
[  297.426092]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  297.426093] RIP: 0033:0x7fd7336b4e18
[  297.426095] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f
1e fa 48 8d 05 05 59 2d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0
ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
[  297.426096] RSP: 002b:00007ffc58c7b258 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[  297.426098] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fd7336b4e18
[  297.426098] RDX: 0000000000000008 RSI: 000055ad6d519c70 RDI: 0000000000000001
[  297.426099] RBP: 000055ad6d519c70 R08: 000000000000000a R09: 00007fd733746300
[  297.426100] R10: 000000000000000a R11: 0000000000000246 R12: 00007fd733986780
[  297.426101] R13: 0000000000000008 R14: 00007fd733981740 R15: 0000000000000008


[  763.659202][    C6] WARNING: possible circular locking dependency detected
[  763.659202][    C6] 5.3.0-next-20190917 #3 Not tainted
[  763.659203][    C6] ------------------------------------------------------
[  763.659203][    C6] test.sh/8352 is trying to acquire lock:
[  763.659203][    C6] ffffffffa187e5f8 ((console_sem).lock){..-.}, at:
down_trylock+0x14/0x40
[  763.659206][    C6] 
[  763.659206][    C6] but task is already holding lock:
[  763.659206][    C6] ffff9bcf7f373c58 (&(&zone->lock)->rlock){-.-.}, at:
__offline_isolated_pages+0x11e/0x2d0
[  763.659208][    C6] 
[  763.659208][    C6] which lock already depends on the new lock.
[  763.659209][    C6] 
[  763.659209][    C6] 
[  763.659209][    C6] the existing dependency chain (in reverse order) is:
[  763.659210][    C6] 
[  763.659210][    C6] -> #3 (&(&zone->lock)->rlock){-.-.}:
[  763.659211][    C6]        __lock_acquire+0x44e/0x8c0
[  763.659212][    C6]        lock_acquire+0xc0/0x1c0
[  763.659212][    C6]        _raw_spin_lock+0x2f/0x40
[  763.659212][    C6]        rmqueue_bulk.constprop.24+0x62/0xba0
[  763.659213][    C6]        get_page_from_freelist+0x581/0x1810
[  763.659213][    C6]        __alloc_pages_nodemask+0x20d/0x1750
[  763.659214][    C6]        alloc_page_interleave+0x17/0x100
[  763.659214][    C6]        alloc_pages_current+0xc0/0xe0
[  763.659214][    C6]        allocate_slab+0x4b2/0x1a20
[  763.659215][    C6]        new_slab+0x46/0x70
[  763.659215][    C6]        ___slab_alloc+0x58a/0x960
[  763.659215][    C6]        __slab_alloc+0x43/0x70
[  763.659216][    C6]        kmem_cache_alloc+0x33e/0x440
[  763.659216][    C6]        fill_pool+0x1ae/0x460
[  763.659216][    C6]        __debug_object_init+0x35/0x4a0
[  763.659217][    C6]        debug_object_init+0x16/0x20
[  763.659217][    C6]        hrtimer_init+0x25/0x130
[  763.659218][    C6]        init_dl_task_timer+0x20/0x30
[  763.659218][    C6]        __sched_fork+0x92/0x100
[  763.659218][    C6]        init_idle+0x8d/0x380
[  763.659219][    C6]        fork_idle+0xd9/0x140
[  763.659219][    C6]        idle_threads_init+0xd3/0x15e
[  763.659219][    C6]        smp_init+0x1b/0xbb
[  763.659220][    C6]        kernel_init_freeable+0x248/0x557
[  763.659220][    C6]        kernel_init+0xf/0x11e
[  763.659220][    C6]        ret_from_fork+0x27/0x50
[  763.659221][    C6] 
[  763.659221][    C6] -> #2 (&rq->lock){-.-.}:
[  763.659222][    C6]        __lock_acquire+0x44e/0x8c0
[  763.659223][    C6]        lock_acquire+0xc0/0x1c0
[  763.659223][    C6]        _raw_spin_lock+0x2f/0x40
[  763.659223][    C6]        task_fork_fair+0x37/0x150
[  763.659224][    C6]        sched_fork+0x126/0x230
[  763.659224][    C6]        copy_process+0xafc/0x1e90
[  763.659224][    C6]        _do_fork+0x89/0x720
[  763.659225][    C6]        kernel_thread+0x58/0x70
[  763.659225][    C6]        rest_init+0x28/0x302
[  763.659225][    C6]        arch_call_rest_init+0xe/0x1b
[  763.659226][    C6]        start_kernel+0x581/0x5a0
[  763.659226][    C6]        x86_64_start_reservations+0x24/0x26
[  763.659227][    C6]        x86_64_start_kernel+0xef/0xf6
[  763.659227][    C6]        secondary_startup_64+0xb6/0xc0
[  763.659227][    C6] 
[  763.659227][    C6] -> #1 (&p->pi_lock){-.-.}:
[  763.659229][    C6]        __lock_acquire+0x44e/0x8c0
[  763.659229][    C6]        lock_acquire+0xc0/0x1c0
[  763.659230][    C6]        _raw_spin_lock_irqsave+0x3a/0x50
[  763.659230][    C6]        try_to_wake_up+0x5c/0xbc0
[  763.659230][    C6]        wake_up_process+0x15/0x20
[  763.659231][    C6]        __up+0x4a/0x50
[  763.659231][    C6]        up+0x45/0x50
[  763.659231][    C6]        __up_console_sem+0x37/0x60
[  763.659232][    C6]        console_unlock+0x357/0x600
[  763.659232][    C6]        vprintk_emit+0x101/0x320
[  763.659232][    C6]        vprintk_default+0x1f/0x30
[  763.659233][    C6]        vprintk_func+0x44/0xd4
[  763.659233][    C6]        printk+0x58/0x6f
[  763.659234][    C6]        do_exit+0xd73/0xd80
[  763.659234][    C6]        do_group_exit+0x41/0xd0
[  763.659234][    C6]        __x64_sys_exit_group+0x18/0x20
[  763.659235][    C6]        do_syscall_64+0x6d/0x488
[  763.659235][    C6]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  763.659235][    C6] 
[  763.659236][    C6] -> #0 ((console_sem).lock){..-.}:
[  763.659237][    C6]        check_prev_add+0x9b/0xa10
[  763.659237][    C6]        validate_chain+0x759/0xdc0
[  763.659238][    C6]        __lock_acquire+0x44e/0x8c0
[  763.659238][    C6]        lock_acquire+0xc0/0x1c0
[  763.659239][    C6]        _raw_spin_lock_irqsave+0x3a/0x50
[  763.659239][    C6]        down_trylock+0x14/0x40
[  763.659239][    C6]        __down_trylock_console_sem+0x2b/0xa0
[  763.659240][    C6]        console_trylock+0x16/0x60
[  763.659240][    C6]        vprintk_emit+0xf4/0x320
[  763.659240][    C6]        vprintk_default+0x1f/0x30
[  763.659241][    C6]        vprintk_func+0x44/0xd4
[  763.659241][    C6]        printk+0x58/0x6f
[  763.659242][    C6]        __offline_isolated_pages.cold.55+0x38/0x28e
[  763.659242][    C6]        offline_isolated_pages_cb+0x15/0x20
[  763.659242][    C6]        walk_system_ram_range+0x7b/0xd0
[  763.659243][    C6]        __offline_pages+0x456/0xc10
[  763.659243][    C6]        offline_pages+0x11/0x20
[  763.659243][    C6]        memory_subsys_offline+0x44/0x60
[  763.659244][    C6]        device_offline+0x90/0xc0
[  763.659244][    C6]        state_store+0xbc/0xe0
[  763.659244][    C6]        dev_attr_store+0x17/0x30
[  763.659245][    C6]        sysfs_kf_write+0x4b/0x60
[  763.659245][    C6]        kernfs_fop_write+0x119/0x1c0
[  763.659245][    C6]        __vfs_write+0x1b/0x40
[  763.659246][    C6]        vfs_write+0xbd/0x1c0
[  763.659246][    C6]        ksys_write+0x64/0xe0
[  763.659247][    C6]        __x64_sys_write+0x1a/0x20
[  763.659247][    C6]        do_syscall_64+0x6d/0x488
[  763.659247][    C6]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  763.659248][    C6] 
[  763.659248][    C6] other info that might help us debug this:
[  763.659248][    C6] 
[  763.659248][    C6] Chain exists of:
[  763.659249][    C6]   (console_sem).lock --> &rq->lock --> &(&zone->lock)->rlock
[  763.659251][    C6] 
[  763.659251][    C6]  Possible unsafe locking scenario:
[  763.659251][    C6] 
[  763.659252][    C6]        CPU0                    CPU1
[  763.659252][    C6]        ----                    ----
[  763.659252][    C6]   lock(&(&zone->lock)->rlock);
[  763.659253][    C6]                                lock(&rq->lock);
[  763.659254][    C6]                                lock(&(&zone->lock)->rlock);
[  763.659255][    C6]   lock((console_sem).lock);
[  763.659256][    C6] 
[  763.659256][    C6]  *** DEADLOCK ***
[  763.659256][    C6] 
[  763.659257][    C6] 8 locks held by test.sh/8352:
[  763.659257][    C6]  #0: ffff9bdf4da39408 (sb_writers#4){.+.+}, at:
vfs_write+0x174/0x1c0
[  763.659259][    C6]  #1: ffff9be348280880 (&of->mutex){+.+.}, at:
kernfs_fop_write+0xe4/0x1c0
[  763.659260][    C6]  #2: ffff9bdb873757d0 (kn->count#125){.+.+}, at:
kernfs_fop_write+0xed/0x1c0
[  763.659262][    C6]  #3: ffffffffa194dec0 (device_hotplug_lock){+.+.}, at:
lock_device_hotplug_sysfs+0x15/0x40
[  763.659264][    C6]  #4: ffff9bcf7314c990 (&dev->mutex){....}, at:
device_offline+0x4e/0xc0
[  763.659265][    C6]  #5: ffffffffa185b9f0 (cpu_hotplug_lock.rw_sem){++++},
at: __offline_pages+0x3b/0xc10
[  763.659267][    C6]  #6: ffffffffa18e0b90 (mem_hotplug_lock.rw_sem){++++},
at: percpu_down_write+0x36/0x1c0
[  763.659268][    C6]  #7: ffff9bcf7f373c58 (&(&zone->lock)->rlock){-.-.}, at:
__offline_isolated_pages+0x11e/0x2d0
[  763.659270][    C6] 
[  763.659270][    C6] stack backtrace:
[  763.659271][    C6] CPU: 6 PID: 8352 Comm: test.sh Not tainted 5.3.0-next-
20190917 #3
[  763.659271][    C6] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 07/10/2019
[  763.659272][    C6] Call Trace:
[  763.659272][    C6]  dump_stack+0x70/0x9a
[  763.659272][    C6]  print_circular_bug.cold.31+0x1c0/0x1eb
[  763.659273][    C6]  check_noncircular+0x18c/0x1a0
[  763.659273][    C6]  check_prev_add+0x9b/0xa10
[  763.659273][    C6]  validate_chain+0x759/0xdc0
[  763.659274][    C6]  __lock_acquire+0x44e/0x8c0
[  763.659274][    C6]  lock_acquire+0xc0/0x1c0
[  763.659274][    C6]  ? down_trylock+0x14/0x40
[  763.659275][    C6]  ? vprintk_emit+0xf4/0x320
[  763.659275][    C6]  _raw_spin_lock_irqsave+0x3a/0x50
[  763.659275][    C6]  ? down_trylock+0x14/0x40
[  763.659276][    C6]  down_trylock+0x14/0x40
[  763.659276][    C6]  __down_trylock_console_sem+0x2b/0xa0
[  763.659276][    C6]  console_trylock+0x16/0x60
[  763.659277][    C6]  vprintk_emit+0xf4/0x320
[  763.659277][    C6]  vprintk_default+0x1f/0x30
[  763.659277][    C6]  vprintk_func+0x44/0xd4
[  763.659278][    C6]  printk+0x58/0x6f
[  763.659278][    C6]  __offline_isolated_pages.cold.55+0x38/0x28e
[  763.659278][    C6]  ? online_memory_block+0x20/0x20
[  763.659279][    C6]  offline_isolated_pages_cb+0x15/0x20
[  763.659279][    C6]  walk_system_ram_range+0x7b/0xd0
[  763.659279][    C6]  __offline_pages+0x456/0xc10
[  763.659280][    C6]  offline_pages+0x11/0x20
[  763.659280][    C6]  memory_subsys_offline+0x44/0x60
[  763.659280][    C6]  device_offline+0x90/0xc0
[  763.659281][    C6]  state_store+0xbc/0xe0
[  763.659281][    C6]  dev_attr_store+0x17/0x30
[  763.659281][    C6]  sysfs_kf_write+0x4b/0x60
[  763.659282][    C6]  kernfs_fop_write+0x119/0x1c0
[  763.659282][    C6]  __vfs_write+0x1b/0x40
[  763.659282][    C6]  vfs_write+0xbd/0x1c0
[  763.659283][    C6]  ksys_write+0x64/0xe0
[  763.659283][    C6]  __x64_sys_write+0x1a/0x20
[  763.659283][    C6]  do_syscall_64+0x6d/0x488
[  763.659284][    C6]  ? trace_hardirqs_off_thunk+0x1a/0x20
[  763.659284][    C6]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

> 
> This thing does
> 
> 	rq->lock --> zone->lock
> 
> It takes rq->lock and then calls into __sched_fork()->hrtimer_init()->debug_objects()->MM
> 
> This doesn't look very right - a dive into MM under rq->lock.
> 
> Peter, Thomas, am I wrong?
> 
> > [  317.337635] -> #2 (&rq->lock){-.-.}:
> > [  317.337638]        __lock_acquire+0x5b3/0xb40
> > [  317.337639]        lock_acquire+0x126/0x280
> > [  317.337639]        _raw_spin_lock+0x2f/0x40
> > [  317.337640]        task_fork_fair+0x43/0x200
> > [  317.337641]        sched_fork+0x29b/0x420
> > [  317.337642]        copy_process+0xf3c/0x2fd0
> > [  317.337642]        _do_fork+0xef/0x950
> > [  317.337643]        kernel_thread+0xa8/0xe0
> > [  317.337644]        rest_init+0x28/0x311
> > [  317.337645]        arch_call_rest_init+0xe/0x1b
> > [  317.337645]        start_kernel+0x6eb/0x724
> > [  317.337646]        x86_64_start_reservations+0x24/0x26
> > [  317.337647]        x86_64_start_kernel+0xf4/0xfb
> > [  317.337648]        secondary_startup_64+0xb6/0xc0
> 
> pi_lock --> rq->lock
> 
> > [  317.337649] -> #1 (&p->pi_lock){-.-.}:
> > [  317.337651]        __lock_acquire+0x5b3/0xb40
> > [  317.337652]        lock_acquire+0x126/0x280
> > [  317.337653]        _raw_spin_lock_irqsave+0x3a/0x50
> > [  317.337653]        try_to_wake_up+0xb4/0x1030
> > [  317.337654]        wake_up_process+0x15/0x20
> > [  317.337655]        __up+0xaa/0xc0
> > [  317.337655]        up+0x55/0x60
> > [  317.337656]        __up_console_sem+0x37/0x60
> > [  317.337657]        console_unlock+0x3a0/0x750
> > [  317.337658]        vprintk_emit+0x10d/0x340
> > [  317.337658]        vprintk_default+0x1f/0x30
> > [  317.337659]        vprintk_func+0x44/0xd4
> > [  317.337660]        printk+0x9f/0xc5
> > [  317.337660]        crng_reseed+0x3cc/0x440
> > [  317.337661]        credit_entropy_bits+0x3e8/0x4f0
> > [  317.337662]        random_ioctl+0x1eb/0x250
> > [  317.337663]        do_vfs_ioctl+0x13e/0xa70
> > [  317.337663]        ksys_ioctl+0x41/0x80
> > [  317.337664]        __x64_sys_ioctl+0x43/0x4c
> > [  317.337665]        do_syscall_64+0xcc/0x76c
> > [  317.337666]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> console_sem->lock --> pi_lock
> 
> This also covers console_sem->lock --> rq->lock, and maintains
> pi_lock --> rq->lock
> 
> So we have
> 
> 	console_sem->lock --> pi_lock --> rq->lock
> 
> > [  317.337667] -> #0 ((console_sem).lock){-.-.}:
> > [  317.337669]        check_prev_add+0x107/0xea0
> > [  317.337670]        validate_chain+0x8fc/0x1200
> > [  317.337671]        __lock_acquire+0x5b3/0xb40
> > [  317.337671]        lock_acquire+0x126/0x280
> > [  317.337672]        _raw_spin_lock_irqsave+0x3a/0x50
> > [  317.337673]        down_trylock+0x16/0x50
> > [  317.337674]        __down_trylock_console_sem+0x2b/0xa0
> > [  317.337675]        console_trylock+0x16/0x60
> > [  317.337676]        vprintk_emit+0x100/0x340
> > [  317.337677]        vprintk_default+0x1f/0x30
> > [  317.337678]        vprintk_func+0x44/0xd4
> > [  317.337678]        printk+0x9f/0xc5
> > [  317.337679]        __dump_page.cold.2+0x73/0x210
> > [  317.337680]        dump_page+0x12/0x50
> > [  317.337680]        has_unmovable_pages+0x3e9/0x4b0
> > [  317.337681]        start_isolate_page_range+0x3b4/0x570
> > [  317.337682]        __offline_pages+0x1ad/0xa10
> > [  317.337683]        offline_pages+0x11/0x20
> > [  317.337683]        memory_subsys_offline+0x7e/0xc0
> > [  317.337684]        device_offline+0xd5/0x110
> > [  317.337685]        state_store+0xc6/0xe0
> > [  317.337686]        dev_attr_store+0x3f/0x60
> > [  317.337686]        sysfs_kf_write+0x89/0xb0
> > [  317.337687]        kernfs_fop_write+0x188/0x240
> > [  317.337688]        __vfs_write+0x50/0xa0
> > [  317.337688]        vfs_write+0x105/0x290
> > [  317.337689]        ksys_write+0xc6/0x160
> > [  317.337690]        __x64_sys_write+0x43/0x50
> > [  317.337691]        do_syscall_64+0xcc/0x76c
> > [  317.337691]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> zone->lock --> console_sem->lock
> 
> So then we have
> 
> 	zone->lock --> console_sem->lock --> pi_lock --> rq->lock
> 
>   vs. the reverse chain
> 
> 	rq->lock --> console_sem->lock
> 
> If I get this right.
> 
> > [  317.337693] other info that might help us debug this:
> > 
> > [  317.337694] Chain exists of:
> > [  317.337694]   (console_sem).lock --> &rq->lock --> &(&zone->lock)->rlock
> > 
> > [  317.337699]  Possible unsafe locking scenario:
> > 
> > [  317.337700]        CPU0                    CPU1
> > [  317.337701]        ----                    ----
> > [  317.337701]   lock(&(&zone->lock)->rlock);
> > [  317.337703]                                lock(&rq->lock);
> > [  317.337705]                                lock(&(&zone->lock)->rlock);
> > [  317.337706]   lock((console_sem).lock);
> > 
> > [  317.337708]  *** DEADLOCK ***
> > 
> > [  317.337710] 8 locks held by test.sh/8738:
> > [  317.337710]  #0: ffff8883940b5408 (sb_writers#4){.+.+}, at: vfs_write+0x25f/0x290
> > [  317.337713]  #1: ffff889fce310280 (&of->mutex){+.+.}, at: kernfs_fop_write+0x128/0x240
> > [  317.337716]  #2: ffff889feb6d4830 (kn->count#115){.+.+}, at: kernfs_fop_write+0x138/0x240
> > [  317.337720]  #3: ffffffffb3762d40 (device_hotplug_lock){+.+.}, at: lock_device_hotplug_sysfs+0x16/0x50
> > [  317.337723]  #4: ffff88981f0dc990 (&dev->mutex){....}, at: device_offline+0x70/0x110
> > [  317.337726]  #5: ffffffffb3315250 (cpu_hotplug_lock.rw_sem){++++}, at: __offline_pages+0xbf/0xa10
> > [  317.337729]  #6: ffffffffb35408b0 (mem_hotplug_lock.rw_sem){++++}, at:  percpu_down_write+0x87/0x2f0
> > [  317.337732]  #7: ffff88883fff4318 (&(&zone->lock)->rlock){-.-.}, at: start_isolate_page_range+0x1f7/0x570
> > [  317.337736] stack backtrace:
> > [  317.337737] CPU: 58 PID: 8738 Comm: test.sh Not tainted 5.3.0-next-20190917+ #9
> > [  317.337738] Hardware name: HPE ProLiant DL560 Gen10/ProLiant DL560 Gen10, BIOS U34 05/21/2019
> > [  317.337739] Call Trace:
> > [  317.337739]  dump_stack+0x86/0xca
> > [  317.337740]  print_circular_bug.cold.31+0x243/0x26e
> > [  317.337741]  check_noncircular+0x29e/0x2e0
> > [  317.337742]  ? debug_lockdep_rcu_enabled+0x4b/0x60
> > [  317.337742]  ? print_circular_bug+0x120/0x120
> > [  317.337743]  ? is_ftrace_trampoline+0x9/0x20
> > [  317.337744]  ? kernel_text_address+0x59/0xc0
> > [  317.337744]  ? __kernel_text_address+0x12/0x40
> > [  317.337745]  check_prev_add+0x107/0xea0
> > [  317.337746]  validate_chain+0x8fc/0x1200
> > [  317.337746]  ? check_prev_add+0xea0/0xea0
> > [  317.337747]  ? format_decode+0xd6/0x600
> > [  317.337748]  ? file_dentry_name+0xe0/0xe0
> > [  317.337749]  __lock_acquire+0x5b3/0xb40
> > [  317.337749]  lock_acquire+0x126/0x280
> > [  317.337750]  ? down_trylock+0x16/0x50
> > [  317.337751]  ? vprintk_emit+0x100/0x340
> > [  317.337752]  _raw_spin_lock_irqsave+0x3a/0x50
> > [  317.337753]  ? down_trylock+0x16/0x50
> > [  317.337753]  down_trylock+0x16/0x50
> > [  317.337754]  ? vprintk_emit+0x100/0x340
> > [  317.337755]  __down_trylock_console_sem+0x2b/0xa0
> > [  317.337756]  console_trylock+0x16/0x60
> > [  317.337756]  vprintk_emit+0x100/0x340
> > [  317.337757]  vprintk_default+0x1f/0x30
> > [  317.337758]  vprintk_func+0x44/0xd4
> > [  317.337758]  printk+0x9f/0xc5
> > [  317.337759]  ? kmsg_dump_rewind_nolock+0x64/0x64
> > [  317.337760]  ? __dump_page+0x1d7/0x430
> > [  317.337760]  __dump_page.cold.2+0x73/0x210
> > [  317.337761]  dump_page+0x12/0x50
> > [  317.337762]  has_unmovable_pages+0x3e9/0x4b0
> > [  317.337763]  start_isolate_page_range+0x3b4/0x570
> > [  317.337763]  ? unset_migratetype_isolate+0x280/0x280
> > [  317.337764]  ? rcu_read_lock_bh_held+0xc0/0xc0
> > [  317.337765]  __offline_pages+0x1ad/0xa10
> > [  317.337765]  ? lock_acquire+0x126/0x280
> > [  317.337766]  ? __add_memory+0xc0/0xc0
> > [  317.337767]  ? __kasan_check_write+0x14/0x20
> > [  317.337767]  ? __mutex_lock+0x344/0xcd0
> > [  317.337768]  ? _raw_spin_unlock_irqrestore+0x49/0x50
> > [  317.337769]  ? device_offline+0x70/0x110
> > [  317.337770]  ? klist_next+0x1c1/0x1e0
> > [  317.337770]  ? __mutex_add_waiter+0xc0/0xc0
> > [  317.337771]  ? klist_next+0x10b/0x1e0
> > [  317.337772]  ? klist_iter_exit+0x16/0x40
> > [  317.337772]  ? device_for_each_child+0xd0/0x110
> > [  317.337773]  offline_pages+0x11/0x20
> > [  317.337774]  memory_subsys_offline+0x7e/0xc0
> > [  317.337774]  device_offline+0xd5/0x110
> > [  317.337775]  ? auto_online_blocks_show+0x70/0x70
> > [  317.337776]  state_store+0xc6/0xe0
> > [  317.337776]  dev_attr_store+0x3f/0x60
> > [  317.337777]  ? device_match_name+0x40/0x40
> > [  317.337778]  sysfs_kf_write+0x89/0xb0
> > [  317.337778]  ? sysfs_file_ops+0xa0/0xa0
> > [  317.337779]  kernfs_fop_write+0x188/0x240
> > [  317.337780]  __vfs_write+0x50/0xa0
> > [  317.337780]  vfs_write+0x105/0x290
> > [  317.337781]  ksys_write+0xc6/0x160
> > [  317.337782]  ? __x64_sys_read+0x50/0x50
> > [  317.337782]  ? do_syscall_64+0x79/0x76c
> > [  317.337783]  ? do_syscall_64+0x79/0x76c
> > [  317.337784]  __x64_sys_write+0x43/0x50
> > [  317.337784]  do_syscall_64+0xcc/0x76c
> > [  317.337785]  ? trace_hardirqs_on_thunk+0x1a/0x20
> > [  317.337786]  ? syscall_return_slowpath+0x210/0x210
> > [  317.337787]  ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe
> > [  317.337787]  ? trace_hardirqs_off_caller+0x3a/0x150
> > [  317.337788]  ? trace_hardirqs_off_thunk+0x1a/0x20
> > [  317.337789]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> Lovely.
> 
> 	-ss


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)
  2019-09-18 16:10               ` Qian Cai
@ 2019-09-23 10:21                 ` Sergey Senozhatsky
  2019-09-23 12:58                   ` Petr Mladek
  2019-09-23 13:09                   ` Qian Cai
  0 siblings, 2 replies; 21+ messages in thread
From: Sergey Senozhatsky @ 2019-09-23 10:21 UTC (permalink / raw)
  To: Qian Cai
  Cc: Sergey Senozhatsky, Steven Rostedt, Sergey Senozhatsky,
	Petr Mladek, Catalin Marinas, Will Deacon, Dan Williams,
	linux-mm, linux-kernel, linux-arm-kernel, Peter Zijlstra,
	Waiman Long, Thomas Gleixner, Theodore Ts'o, Arnd Bergmann,
	Greg Kroah-Hartman

On (09/18/19 12:10), Qian Cai wrote:
[..]
> > So you have debug objects enabled. Right? This thing does not behave
> > when it comes to printing. debug_objects are slightly problematic.
> 
> > Yes, but there is also a similar splat without the debug_objects. It looks
> > like anything that tries to allocate memory in that path will trigger it anyway.

Appears to be different, yet somehow very familiar.

> [  297.425908] WARNING: possible circular locking dependency detected
> [  297.425908] 5.3.0-next-20190917 #8 Not tainted
> [  297.425909] ------------------------------------------------------
> [  297.425910] test.sh/8653 is trying to acquire lock:
> [  297.425911] ffffffff865a4460 (console_owner){-.-.}, at: console_unlock+0x207/0x750
> 
> [  297.425914] but task is already holding lock:
> [  297.425915] ffff88883fff3c58 (&(&zone->lock)->rlock){-.-.}, at: __offline_isolated_pages+0x179/0x3e0
> 
> [  297.425919] which lock already depends on the new lock.
> 
> 
> [  297.425920] the existing dependency chain (in reverse order) is:
> 
> [  297.425922] -> #3 (&(&zone->lock)->rlock){-.-.}:
> [  297.425925]        __lock_acquire+0x5b3/0xb40
> [  297.425925]        lock_acquire+0x126/0x280
> [  297.425926]        _raw_spin_lock+0x2f/0x40
> [  297.425927]        rmqueue_bulk.constprop.21+0xb6/0x1160
> [  297.425928]        get_page_from_freelist+0x898/0x22c0
> [  297.425928]        __alloc_pages_nodemask+0x2f3/0x1cd0
> [  297.425929]        alloc_pages_current+0x9c/0x110
> [  297.425930]        allocate_slab+0x4c6/0x19c0
> [  297.425931]        new_slab+0x46/0x70
> [  297.425931]        ___slab_alloc+0x58b/0x960
> [  297.425932]        __slab_alloc+0x43/0x70
> [  297.425933]        __kmalloc+0x3ad/0x4b0
> [  297.425933]        __tty_buffer_request_room+0x100/0x250
> [  297.425934]        tty_insert_flip_string_fixed_flag+0x67/0x110
> [  297.425935]        pty_write+0xa2/0xf0
> [  297.425936]        n_tty_write+0x36b/0x7b0
> [  297.425936]        tty_write+0x284/0x4c0
> [  297.425937]        __vfs_write+0x50/0xa0
> [  297.425938]        vfs_write+0x105/0x290
> [  297.425939]        redirected_tty_write+0x6a/0xc0
> [  297.425939]        do_iter_write+0x248/0x2a0
> [  297.425940]        vfs_writev+0x106/0x1e0
> [  297.425941]        do_writev+0xd4/0x180
> [  297.425941]        __x64_sys_writev+0x45/0x50
> [  297.425942]        do_syscall_64+0xcc/0x76c
> [  297.425943]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> [  297.425944] -> #2 (&(&port->lock)->rlock){-.-.}:
> [  297.425946]        __lock_acquire+0x5b3/0xb40
> [  297.425947]        lock_acquire+0x126/0x280
> [  297.425948]        _raw_spin_lock_irqsave+0x3a/0x50
> [  297.425949]        tty_port_tty_get+0x20/0x60
> [  297.425949]        tty_port_default_wakeup+0xf/0x30
> [  297.425950]        tty_port_tty_wakeup+0x39/0x40
> [  297.425951]        uart_write_wakeup+0x2a/0x40
> [  297.425952]        serial8250_tx_chars+0x22e/0x440
> [  297.425952]        serial8250_handle_irq.part.8+0x14a/0x170
> [  297.425953]        serial8250_default_handle_irq+0x5c/0x90
> [  297.425954]        serial8250_interrupt+0xa6/0x130
> [  297.425955]        __handle_irq_event_percpu+0x78/0x4f0
> [  297.425955]        handle_irq_event_percpu+0x70/0x100
> [  297.425956]        handle_irq_event+0x5a/0x8b
> [  297.425957]        handle_edge_irq+0x117/0x370
> [  297.425958]        do_IRQ+0x9e/0x1e0
> [  297.425958]        ret_from_intr+0x0/0x2a
> [  297.425959]        cpuidle_enter_state+0x156/0x8e0
> [  297.425960]        cpuidle_enter+0x41/0x70
> [  297.425960]        call_cpuidle+0x5e/0x90
> [  297.425961]        do_idle+0x333/0x370
> [  297.425962]        cpu_startup_entry+0x1d/0x1f
> [  297.425962]        start_secondary+0x290/0x330
> [  297.425963]        secondary_startup_64+0xb6/0xc0
> 
> [  297.425964] -> #1 (&port_lock_key){-.-.}:
> [  297.425967]        __lock_acquire+0x5b3/0xb40
> [  297.425967]        lock_acquire+0x126/0x280
> [  297.425968]        _raw_spin_lock_irqsave+0x3a/0x50
> [  297.425969]        serial8250_console_write+0x3e4/0x450
> [  297.425970]        univ8250_console_write+0x4b/0x60
> [  297.425970]        console_unlock+0x501/0x750
> [  297.425971]        vprintk_emit+0x10d/0x340
> [  297.425972]        vprintk_default+0x1f/0x30
> [  297.425972]        vprintk_func+0x44/0xd4
> [  297.425973]        printk+0x9f/0xc5
> [  297.425974]        register_console+0x39c/0x520
> [  297.425975]        univ8250_console_init+0x23/0x2d
> [  297.425975]        console_init+0x338/0x4cd
> [  297.425976]        start_kernel+0x534/0x724
> [  297.425977]        x86_64_start_reservations+0x24/0x26
> [  297.425977]        x86_64_start_kernel+0xf4/0xfb
> [  297.425978]        secondary_startup_64+0xb6/0xc0
> 
> [  297.425979] -> #0 (console_owner){-.-.}:
> [  297.425982]        check_prev_add+0x107/0xea0
> [  297.425982]        validate_chain+0x8fc/0x1200
> [  297.425983]        __lock_acquire+0x5b3/0xb40
> [  297.425984]        lock_acquire+0x126/0x280
> [  297.425984]        console_unlock+0x269/0x750
> [  297.425985]        vprintk_emit+0x10d/0x340
> [  297.425986]        vprintk_default+0x1f/0x30
> [  297.425987]        vprintk_func+0x44/0xd4
> [  297.425987]        printk+0x9f/0xc5
> [  297.425988]        __offline_isolated_pages.cold.52+0x2f/0x30a
> [  297.425989]        offline_isolated_pages_cb+0x17/0x30
> [  297.425990]        walk_system_ram_range+0xda/0x160
> [  297.425990]        __offline_pages+0x79c/0xa10
> [  297.425991]        offline_pages+0x11/0x20
> [  297.425992]        memory_subsys_offline+0x7e/0xc0
> [  297.425992]        device_offline+0xd5/0x110
> [  297.425993]        state_store+0xc6/0xe0
> [  297.425994]        dev_attr_store+0x3f/0x60
> [  297.425995]        sysfs_kf_write+0x89/0xb0
> [  297.425995]        kernfs_fop_write+0x188/0x240
> [  297.425996]        __vfs_write+0x50/0xa0
> [  297.425997]        vfs_write+0x105/0x290
> [  297.425997]        ksys_write+0xc6/0x160
> [  297.425998]        __x64_sys_write+0x43/0x50
> [  297.425999]        do_syscall_64+0xcc/0x76c
> [  297.426000]        entry_SYSCALL_64_after_hwframe+0x49/0xbe

I suppose you run with CONFIG_DEBUG_VM...

So we have

port->lock -> MM -> zone->lock
	// from pty_write()->__tty_buffer_request_room()->kmalloc()

vs

zone->lock -> printk() -> port->lock
	// from __offline_pages()->__offline_isolated_pages()->printk()


A number of debugging options make the kernel less stable.
Sad but true.

	-ss



* Re: printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)
  2019-09-23 10:21                 ` Sergey Senozhatsky
@ 2019-09-23 12:58                   ` Petr Mladek
  2019-09-23 13:24                     ` Qian Cai
  2019-09-24  1:28                     ` Sergey Senozhatsky
  2019-09-23 13:09                   ` Qian Cai
  1 sibling, 2 replies; 21+ messages in thread
From: Petr Mladek @ 2019-09-23 12:58 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Qian Cai, Catalin Marinas, Arnd Bergmann, Sergey Senozhatsky,
	Steven Rostedt, Peter Zijlstra, Dan Williams, Will Deacon,
	linux-mm, Thomas Gleixner, Greg Kroah-Hartman, linux-arm-kernel,
	Theodore Ts'o, Waiman Long, linux-kernel

On Mon 2019-09-23 19:21:00, Sergey Senozhatsky wrote:
> So we have
> 
> port->lock -> MM -> zone->lock
> 	// from pty_write()->__tty_buffer_request_room()->kmalloc()
> 
> vs
> 
> zone->lock -> printk() -> port->lock
> 	// from __offline_pages()->__offline_isolated_pages()->printk()

If I understand it correctly, this is the re-appearing problem.
The only systematic solution with the current approach is to
take port->lock in printk_safe/printk_deferred context.

But this is a massive change that almost nobody wants. Instead,
we want the changes that were discussed at Plumbers.

Now, the question is what to do with existing kernels. There were
several lockdep reports, and I am a bit lost. Has anyone seen
real deadlocks, or just the lockdep reports?

To be clear. I would feel more comfortable when there are no
deadlocks. But I also do not want to invest too much time
into old kernels. All these problems were there for ages.
We could finally see them because lockdep was enabled in printk()
thanks to printk_safe. Well, it is getting worse over time with
the increasing complexity and number of debugging messages.

> A number of debugging options make the kernel less stable.
> Sad but true.

Yeah. The only good thing is that most debug options are not
enabled on production systems. It is not an excuse for ignoring
the problems. But it might be important for prioritizing.

Best Regards,
Petr



* Re: printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)
  2019-09-23 10:21                 ` Sergey Senozhatsky
  2019-09-23 12:58                   ` Petr Mladek
@ 2019-09-23 13:09                   ` Qian Cai
  1 sibling, 0 replies; 21+ messages in thread
From: Qian Cai @ 2019-09-23 13:09 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Steven Rostedt, Petr Mladek, Catalin Marinas,
	Will Deacon, Dan Williams, linux-mm, linux-kernel,
	linux-arm-kernel, Peter Zijlstra, Waiman Long, Thomas Gleixner,
	Theodore Ts'o, Arnd Bergmann, Greg Kroah-Hartman

On Mon, 2019-09-23 at 19:21 +0900, Sergey Senozhatsky wrote:
> On (09/18/19 12:10), Qian Cai wrote:
> [..]
> > > So you have debug objects enabled. Right? This thing does not behave
> > > when it comes to printing. debug_objects are slightly problematic.
> > 
> > Yes, but there is also a similar splat without the debug_objects. It looks
> > like anything that tries to allocate memory in that path will trigger it anyway.
> 
> Appears to be different, yet somehow very familiar.
> 
> > [  297.425908] WARNING: possible circular locking dependency detected
> > [  297.425908] 5.3.0-next-20190917 #8 Not tainted
> > [  297.425909] ------------------------------------------------------
> > [  297.425910] test.sh/8653 is trying to acquire lock:
> > [  297.425911] ffffffff865a4460 (console_owner){-.-.}, at: console_unlock+0x207/0x750
> > 
> > [  297.425914] but task is already holding lock:
> > [  297.425915] ffff88883fff3c58 (&(&zone->lock)->rlock){-.-.}, at: __offline_isolated_pages+0x179/0x3e0
> > 
> > [  297.425919] which lock already depends on the new lock.
> > 
> > 
> > [  297.425920] the existing dependency chain (in reverse order) is:
> > 
> > [  297.425922] -> #3 (&(&zone->lock)->rlock){-.-.}:
> > [  297.425925]        __lock_acquire+0x5b3/0xb40
> > [  297.425925]        lock_acquire+0x126/0x280
> > [  297.425926]        _raw_spin_lock+0x2f/0x40
> > [  297.425927]        rmqueue_bulk.constprop.21+0xb6/0x1160
> > [  297.425928]        get_page_from_freelist+0x898/0x22c0
> > [  297.425928]        __alloc_pages_nodemask+0x2f3/0x1cd0
> > [  297.425929]        alloc_pages_current+0x9c/0x110
> > [  297.425930]        allocate_slab+0x4c6/0x19c0
> > [  297.425931]        new_slab+0x46/0x70
> > [  297.425931]        ___slab_alloc+0x58b/0x960
> > [  297.425932]        __slab_alloc+0x43/0x70
> > [  297.425933]        __kmalloc+0x3ad/0x4b0
> > [  297.425933]        __tty_buffer_request_room+0x100/0x250
> > [  297.425934]        tty_insert_flip_string_fixed_flag+0x67/0x110
> > [  297.425935]        pty_write+0xa2/0xf0
> > [  297.425936]        n_tty_write+0x36b/0x7b0
> > [  297.425936]        tty_write+0x284/0x4c0
> > [  297.425937]        __vfs_write+0x50/0xa0
> > [  297.425938]        vfs_write+0x105/0x290
> > [  297.425939]        redirected_tty_write+0x6a/0xc0
> > [  297.425939]        do_iter_write+0x248/0x2a0
> > [  297.425940]        vfs_writev+0x106/0x1e0
> > [  297.425941]        do_writev+0xd4/0x180
> > [  297.425941]        __x64_sys_writev+0x45/0x50
> > [  297.425942]        do_syscall_64+0xcc/0x76c
> > [  297.425943]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > 
> > [  297.425944] -> #2 (&(&port->lock)->rlock){-.-.}:
> > [  297.425946]        __lock_acquire+0x5b3/0xb40
> > [  297.425947]        lock_acquire+0x126/0x280
> > [  297.425948]        _raw_spin_lock_irqsave+0x3a/0x50
> > [  297.425949]        tty_port_tty_get+0x20/0x60
> > [  297.425949]        tty_port_default_wakeup+0xf/0x30
> > [  297.425950]        tty_port_tty_wakeup+0x39/0x40
> > [  297.425951]        uart_write_wakeup+0x2a/0x40
> > [  297.425952]        serial8250_tx_chars+0x22e/0x440
> > [  297.425952]        serial8250_handle_irq.part.8+0x14a/0x170
> > [  297.425953]        serial8250_default_handle_irq+0x5c/0x90
> > [  297.425954]        serial8250_interrupt+0xa6/0x130
> > [  297.425955]        __handle_irq_event_percpu+0x78/0x4f0
> > [  297.425955]        handle_irq_event_percpu+0x70/0x100
> > [  297.425956]        handle_irq_event+0x5a/0x8b
> > [  297.425957]        handle_edge_irq+0x117/0x370
> > [  297.425958]        do_IRQ+0x9e/0x1e0
> > [  297.425958]        ret_from_intr+0x0/0x2a
> > [  297.425959]        cpuidle_enter_state+0x156/0x8e0
> > [  297.425960]        cpuidle_enter+0x41/0x70
> > [  297.425960]        call_cpuidle+0x5e/0x90
> > [  297.425961]        do_idle+0x333/0x370
> > [  297.425962]        cpu_startup_entry+0x1d/0x1f
> > [  297.425962]        start_secondary+0x290/0x330
> > [  297.425963]        secondary_startup_64+0xb6/0xc0
> > 
> > [  297.425964] -> #1 (&port_lock_key){-.-.}:
> > [  297.425967]        __lock_acquire+0x5b3/0xb40
> > [  297.425967]        lock_acquire+0x126/0x280
> > [  297.425968]        _raw_spin_lock_irqsave+0x3a/0x50
> > [  297.425969]        serial8250_console_write+0x3e4/0x450
> > [  297.425970]        univ8250_console_write+0x4b/0x60
> > [  297.425970]        console_unlock+0x501/0x750
> > [  297.425971]        vprintk_emit+0x10d/0x340
> > [  297.425972]        vprintk_default+0x1f/0x30
> > [  297.425972]        vprintk_func+0x44/0xd4
> > [  297.425973]        printk+0x9f/0xc5
> > [  297.425974]        register_console+0x39c/0x520
> > [  297.425975]        univ8250_console_init+0x23/0x2d
> > [  297.425975]        console_init+0x338/0x4cd
> > [  297.425976]        start_kernel+0x534/0x724
> > [  297.425977]        x86_64_start_reservations+0x24/0x26
> > [  297.425977]        x86_64_start_kernel+0xf4/0xfb
> > [  297.425978]        secondary_startup_64+0xb6/0xc0
> > 
> > [  297.425979] -> #0 (console_owner){-.-.}:
> > [  297.425982]        check_prev_add+0x107/0xea0
> > [  297.425982]        validate_chain+0x8fc/0x1200
> > [  297.425983]        __lock_acquire+0x5b3/0xb40
> > [  297.425984]        lock_acquire+0x126/0x280
> > [  297.425984]        console_unlock+0x269/0x750
> > [  297.425985]        vprintk_emit+0x10d/0x340
> > [  297.425986]        vprintk_default+0x1f/0x30
> > [  297.425987]        vprintk_func+0x44/0xd4
> > [  297.425987]        printk+0x9f/0xc5
> > [  297.425988]        __offline_isolated_pages.cold.52+0x2f/0x30a
> > [  297.425989]        offline_isolated_pages_cb+0x17/0x30
> > [  297.425990]        walk_system_ram_range+0xda/0x160
> > [  297.425990]        __offline_pages+0x79c/0xa10
> > [  297.425991]        offline_pages+0x11/0x20
> > [  297.425992]        memory_subsys_offline+0x7e/0xc0
> > [  297.425992]        device_offline+0xd5/0x110
> > [  297.425993]        state_store+0xc6/0xe0
> > [  297.425994]        dev_attr_store+0x3f/0x60
> > [  297.425995]        sysfs_kf_write+0x89/0xb0
> > [  297.425995]        kernfs_fop_write+0x188/0x240
> > [  297.425996]        __vfs_write+0x50/0xa0
> > [  297.425997]        vfs_write+0x105/0x290
> > [  297.425997]        ksys_write+0xc6/0x160
> > [  297.425998]        __x64_sys_write+0x43/0x50
> > [  297.425999]        do_syscall_64+0xcc/0x76c
> > [  297.426000]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> I suppose you run with CONFIG_DEBUG_VM...
> 
> So we have
> 
> port->lock -> MM -> zone->lock
> 	// from pty_write()->__tty_buffer_request_room()->kmalloc()
> 
> vs
> 
> zone->lock -> printk() -> port->lock
> 	// from __offline_pages()->__offline_isolated_pages()->printk()
> 
> 
> A number of debugging options make the kernel less stable.
> Sad but true.

I am afraid it does not matter, as it still triggers a splat without either
CONFIG_DEBUG_VM or debug objects, because offline_pages() will still call
printk() while holding zone->lock in a different path.

offline_pages()
  start_isolate_page_range()
    set_migratetype_isolate()
      has_unmovable_pages()
        dump_page()
          printk()

[  377.525562] WARNING: possible circular locking dependency detected
[  377.525563] 5.3.0-next-20190920+ #9 Not tainted
[  377.525564] ------------------------------------------------------
[  377.525565] test.sh/8876 is trying to acquire lock:
[  377.525565] ffffffffa2da1b80 (console_owner){-.-.}, at: console_unlock+0x207/0x750

[  377.525569] but task is already holding lock:
[  377.525569] ffff88883fff4318 (&(&zone->lock)->rlock){-.-.}, at: start_isolate_page_range+0x1ea/0x540

[  377.525573] which lock already depends on the new lock.


[  377.525575] the existing dependency chain (in reverse order) is:

[  377.525576] -> #3 (&(&zone->lock)->rlock){-.-.}:
[  377.525579]        __lock_acquire+0x5b3/0xb40
[  377.525579]        lock_acquire+0x126/0x280
[  377.525580]        _raw_spin_lock+0x2f/0x40
[  377.525581]        rmqueue_bulk.constprop.21+0xb6/0xf70
[  377.525582]        get_page_from_freelist+0x89a/0x20d0
[  377.525582]        __alloc_pages_nodemask+0x2b1/0x1c80
[  377.525583]        alloc_pages_current+0x9c/0x110
[  377.525584]        allocate_slab+0xa8f/0x16d0
[  377.525585]        new_slab+0x46/0x70
[  377.525586]        ___slab_alloc+0x424/0x630
[  377.525586]        __slab_alloc+0x43/0x70
[  377.525587]        __kmalloc+0x3e3/0x490
[  377.525588]        __tty_buffer_request_room+0x100/0x250
[  377.525589]        tty_insert_flip_string_fixed_flag+0x67/0x110
[  377.525590]        pty_write+0xa2/0xf0
[  377.525591]        n_tty_write+0x36b/0x7b0
[  377.525592]        tty_write+0x284/0x4c0
[  377.525593]        __vfs_write+0x50/0xa0
[  377.525593]        vfs_write+0x105/0x290
[  377.525595]        ksys_write+0xc6/0x160
[  377.525596]        __x64_sys_write+0x43/0x50
[  377.525596]        do_syscall_64+0xcc/0x76c
[  377.525597]        entry_SYSCALL_64_after_hwframe+0x49/0xbe

[  377.525598] -> #2 (&(&port->lock)->rlock){-.-.}:
[  377.525601]        __lock_acquire+0x5b3/0xb40
[  377.525601]        lock_acquire+0x126/0x280
[  377.525602]        _raw_spin_lock_irqsave+0x3a/0x50
[  377.525603]        tty_port_tty_get+0x20/0x60
[  377.525604]        tty_port_default_wakeup+0xf/0x30
[  377.525605]        tty_port_tty_wakeup+0x39/0x40
[  377.525606]        uart_write_wakeup+0x2a/0x40
[  377.525607]        serial8250_tx_chars+0x22e/0x440
[  377.525607]        serial8250_handle_irq.part.8+0x14a/0x170
[  377.525608]        serial8250_default_handle_irq+0x5c/0x90
[  377.525609]        serial8250_interrupt+0xa6/0x130
[  377.525610]        __handle_irq_event_percpu+0x78/0x4f0
[  377.525611]        handle_irq_event_percpu+0x70/0x100
[  377.525612]        handle_irq_event+0x5a/0x8b
[  377.525613]        handle_edge_irq+0x117/0x370
[  377.525614]        do_IRQ+0x9e/0x1e0
[  377.525614]        ret_from_intr+0x0/0x2a
[  377.525615]        cpuidle_enter_state+0x156/0x8e0
[  377.525616]        cpuidle_enter+0x41/0x70
[  377.525616]        call_cpuidle+0x5e/0x90
[  377.525617]        do_idle+0x333/0x370
[  377.525618]        cpu_startup_entry+0x1d/0x1f
[  377.525619]        start_secondary+0x290/0x330
[  377.525619]        secondary_startup_64+0xb6/0xc0

[  377.525620] -> #1 (&port_lock_key){-.-.}:
[  377.525623]        __lock_acquire+0x5b3/0xb40
[  377.525623]        lock_acquire+0x126/0x280
[  377.525624]        _raw_spin_lock_irqsave+0x3a/0x50
[  377.525625]        serial8250_console_write+0x3e4/0x450
[  377.525626]        univ8250_console_write+0x4b/0x60
[  377.525627]        console_unlock+0x501/0x750
[  377.525627]        vprintk_emit+0x10d/0x340
[  377.525628]        vprintk_default+0x1f/0x30
[  377.525629]        vprintk_func+0x44/0xd4
[  377.525629]        printk+0x9f/0xc5
[  377.525630]        register_console+0x39c/0x520
[  377.525631]        univ8250_console_init+0x23/0x2d
[  377.525632]        console_init+0x338/0x4cd
[  377.525632]        start_kernel+0x52a/0x71a
[  377.525633]        x86_64_start_reservations+0x24/0x26
[  377.525634]        x86_64_start_kernel+0xf4/0xfb
[  377.525635]        secondary_startup_64+0xb6/0xc0

[  377.525636] -> #0 (console_owner){-.-.}:
[  377.525638]        check_prev_add+0x107/0xea0
[  377.525639]        validate_chain+0x8fc/0x1200
[  377.525640]        __lock_acquire+0x5b3/0xb40
[  377.525641]        lock_acquire+0x126/0x280
[  377.525641]        console_unlock+0x269/0x750
[  377.525642]        vprintk_emit+0x10d/0x340
[  377.525643]        vprintk_default+0x1f/0x30
[  377.525643]        vprintk_func+0x44/0xd4
[  377.525644]        printk+0x9f/0xc5
[  377.525645]        __dump_page.cold.0+0x73/0x20a
[  377.525645]        dump_page+0x12/0x46
[  377.525646]        has_unmovable_pages+0x2ff/0x360
[  377.525647]        start_isolate_page_range+0x3bb/0x540
[  377.525648]        __offline_pages+0x258/0xfc0
[  377.525649]        offline_pages+0x11/0x20
[  377.525649]        memory_subsys_offline+0x7e/0xc0
[  377.525650]        device_offline+0xd5/0x110
[  377.525651]        state_store+0xc6/0xe0
[  377.525651]        dev_attr_store+0x3f/0x60
[  377.525652]        sysfs_kf_write+0x89/0xb0
[  377.525653]        kernfs_fop_write+0x188/0x240
[  377.525654]        __vfs_write+0x50/0xa0
[  377.525654]        vfs_write+0x105/0x290
[  377.525655]        ksys_write+0xc6/0x160
[  377.525656]        __x64_sys_write+0x43/0x50
[  377.525656]        do_syscall_64+0xcc/0x76c
[  377.525657]        entry_SYSCALL_64_after_hwframe+0x49/0xbe

[  377.525658] other info that might help us debug this:

[  377.525660] Chain exists of:
[  377.525660]   console_owner --> &(&port->lock)->rlock --> &(&zone->lock)->rlock

[  377.525664]  Possible unsafe locking scenario:

[  377.525665]        CPU0                    CPU1
[  377.525666]        ----                    ----
[  377.525667]   lock(&(&zone->lock)->rlock);
[  377.525669]                                lock(&(&port->lock)->rlock);
[  377.525671]                                lock(&(&zone->lock)->rlock);
[  377.525672]   lock(console_owner);

[  377.525675]  *** DEADLOCK ***

[  377.525676] 9 locks held by test.sh/8876:
[  377.525677]  #0: ffff8883bca59408 (sb_writers#4){.+.+}, at: vfs_write+0x25f/0x290
[  377.525681]  #1: ffff888eefb58880 (&of->mutex){+.+.}, at: kernfs_fop_write+0x128/0x240
[  377.525684]  #2: ffff889feb6dcc40 (kn->count#115){.+.+}, at: kernfs_fop_write+0x138/0x240
[  377.525687]  #3: ffffffffa30f7e60 (device_hotplug_lock){+.+.}, at: lock_device_hotplug_sysfs+0x16/0x50
[  377.525690]  #4: ffff88981f55c990 (&dev->mutex){....}, at: device_offline+0x70/0x110
[  377.525693]  #5: ffffffffa2d12d70 (cpu_hotplug_lock.rw_sem){++++}, at: __offline_pages+0xdb/0xfc0
[  377.525696]  #6: ffffffffa2f007b0 (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x87/0x2f0
[  377.525699]  #7: ffff88883fff4318 (&(&zone->lock)->rlock){-.-.}, at: start_isolate_page_range+0x1ea/0x540
[  377.525703]  #8: ffffffffa2da2040 (console_lock){+.+.}, at: vprintk_emit+0x100/0x340
[  377.525706] stack backtrace:
[  377.525707] CPU: 10 PID: 8876 Comm: test.sh Not tainted 5.3.0-next-20190920+ #9
[  377.525708] Hardware name: HPE ProLiant DL560 Gen10/ProLiant DL560 Gen10, BIOS U34 05/21/2019
[  377.525709] Call Trace:
[  377.525710]  dump_stack+0x86/0xca
[  377.525710]  print_circular_bug.cold.31+0x243/0x26e
[  377.525711]  check_noncircular+0x29e/0x2e0
[  377.525712]  ? stack_trace_save+0x87/0xb0
[  377.525712]  ? print_circular_bug+0x120/0x120
[  377.525713]  check_prev_add+0x107/0xea0
[  377.525714]  validate_chain+0x8fc/0x1200
[  377.525714]  ? check_prev_add+0xea0/0xea0
[  377.525715]  __lock_acquire+0x5b3/0xb40
[  377.525716]  lock_acquire+0x126/0x280
[  377.525717]  ? console_unlock+0x207/0x750
[  377.525717]  ? __kasan_check_read+0x11/0x20
[  377.525718]  console_unlock+0x269/0x750
[  377.525719]  ? console_unlock+0x207/0x750
[  377.525719]  vprintk_emit+0x10d/0x340
[  377.525720]  vprintk_default+0x1f/0x30
[  377.525721]  vprintk_func+0x44/0xd4
[  377.525721]  printk+0x9f/0xc5
[  377.525722]  ? kmsg_dump_rewind_nolock+0x64/0x64
[  377.525723]  ? lockdep_hardirqs_off+0x74/0x140
[  377.525723]  __dump_page.cold.0+0x73/0x20a
[  377.525724]  dump_page+0x12/0x46
[  377.525725]  has_unmovable_pages+0x2ff/0x360
[  377.525725]  start_isolate_page_range+0x3bb/0x540
[  377.525726]  ? unset_migratetype_isolate+0x260/0x260
[  377.525727]  ? rcu_read_lock_bh_held+0xc0/0xc0
[  377.525728]  __offline_pages+0x258/0xfc0
[  377.525728]  ? __lock_acquire+0x670/0xb40
[  377.525729]  ? __add_memory+0xc0/0xc0
[  377.525730]  ? lock_acquire+0x126/0x280
[  377.525730]  ? device_offline+0x70/0x110
[  377.525731]  ? __kasan_check_write+0x14/0x20
[  377.525732]  ? __mutex_lock+0x344/0xcd0
[  377.525732]  ? _raw_spin_unlock_irqrestore+0x49/0x50
[  377.525733]  ? device_offline+0x70/0x110
[  377.525734]  ? klist_next+0x1c1/0x1e0
[  377.525735]  ? __mutex_add_waiter+0xc0/0xc0
[  377.525735]  ? __device_link_free_srcu+0x80/0x80
[  377.525736]  ? klist_next+0x10b/0x1e0
[  377.525737]  ? klist_iter_exit+0x16/0x40
[  377.525737]  ? device_for_each_child+0xd0/0x110
[  377.525738]  offline_pages+0x11/0x20
[  377.525739]  memory_subsys_offline+0x7e/0xc0
[  377.525739]  device_offline+0xd5/0x110
[  377.525740]  ? auto_online_blocks_show+0x70/0x70
[  377.525741]  state_store+0xc6/0xe0
[  377.525741]  dev_attr_store+0x3f/0x60
[  377.525742]  ? device_match_name+0x40/0x40
[  377.525743]  sysfs_kf_write+0x89/0xb0
[  377.525744]  ? sysfs_file_ops+0xa0/0xa0
[  377.525745]  kernfs_fop_write+0x188/0x240
[  377.525745]  __vfs_write+0x50/0xa0
[  377.525746]  vfs_write+0x105/0x290
[  377.525747]  ksys_write+0xc6/0x160
[  377.525748]  ? __x64_sys_read+0x50/0x50
[  377.525748]  ? do_syscall_64+0x79/0x76c
[  377.525749]  ? do_syscall_64+0x79/0x76c
[  377.525750]  __x64_sys_write+0x43/0x50
[  377.525751]  do_syscall_64+0xcc/0x76c
[  377.525752]  ? trace_hardirqs_on_thunk+0x1a/0x20
[  377.525753]  ? syscall_return_slowpath+0x210/0x210
[  377.525754]  ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe
[  377.525754]  ? trace_hardirqs_off_caller+0x3a/0x150
[  377.525755]  ? trace_hardirqs_off_thunk+0x1a/0x20
[  377.525756]  entry_SYSCALL_64_after_hwframe+0x49/0xbe


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)
  2019-09-23 12:58                   ` Petr Mladek
@ 2019-09-23 13:24                     ` Qian Cai
  2019-09-24  1:28                     ` Sergey Senozhatsky
  1 sibling, 0 replies; 21+ messages in thread
From: Qian Cai @ 2019-09-23 13:24 UTC (permalink / raw)
  To: Petr Mladek, Sergey Senozhatsky
  Cc: Catalin Marinas, Arnd Bergmann, Sergey Senozhatsky,
	Steven Rostedt, Peter Zijlstra, Dan Williams, Will Deacon,
	linux-mm, Thomas Gleixner, Greg Kroah-Hartman, linux-arm-kernel,
	Theodore Ts'o, Waiman Long, linux-kernel

On Mon, 2019-09-23 at 14:58 +0200, Petr Mladek wrote:
> On Mon 2019-09-23 19:21:00, Sergey Senozhatsky wrote:
> > So we have
> > 
> > port->lock -> MM -> zone->lock
> > 	// from pty_write()->__tty_buffer_request_room()->kmalloc()
> > 
> > vs
> > 
> > zone->lock -> printk() -> port->lock
> > 	// from __offline_pages()->__offline_isolated_pages()->printk()
> 
> If I understand it correctly then this is the re-appearing problem.
> The only systematic solution with the current approach is to
> take port->lock in printk_safe/printk_deferred context.
> 
> But this is a massive change that almost nobody wants. Instead,
> we want the changes that were discussed on Plumbers.
> 
> Now, the question is what to do with existing kernels. There were
> several lockdep reports. And I am a bit lost. Did anyone see
> real deadlocks or just the lockdep reports?

Yes, there is a real deadlock that hangs an arm64 system on boot; replacing
printk() with printk_deferred() in _warn_unseeded_randomness() fixes it.

[ 1078.214683][T43784] WARNING: possible circular locking dependency detected
[ 1078.221550][T43784] 5.3.0-rc7-next-20190904 #14 Not tainted
[ 1078.227112][T43784] ------------------------------------------------------
[ 1078.233976][T43784] vi/43784 is trying to acquire lock:
[ 1078.239192][T43784] ffff008b7cff9290 (&(&zone->lock)->rlock){-.-.}, at: rmqueue_bulk.constprop.21+0xb0/0x1218
[ 1078.249111][T43784] 
[ 1078.249111][T43784] but task is already holding lock:
[ 1078.256322][T43784] ffff00938db47d40 (&(&port->lock)->rlock){-.-.}, at: pty_write+0x78/0x100
[ 1078.264760][T43784] 
[ 1078.264760][T43784] which lock already depends on the new lock.
[ 1078.264760][T43784] 
[ 1078.275008][T43784] 
[ 1078.275008][T43784] the existing dependency chain (in reverse order) is:
[ 1078.283869][T43784] 
[ 1078.283869][T43784] -> #3 (&(&port->lock)->rlock){-.-.}:
[ 1078.291350][T43784]        __lock_acquire+0x5c8/0xbb0
[ 1078.296394][T43784]        lock_acquire+0x154/0x428
[ 1078.301266][T43784]        _raw_spin_lock_irqsave+0x80/0xa0
[ 1078.306831][T43784]        tty_port_tty_get+0x28/0x68
[ 1078.311873][T43784]        tty_port_default_wakeup+0x20/0x40
[ 1078.317523][T43784]        tty_port_tty_wakeup+0x38/0x48
[ 1078.322827][T43784]        uart_write_wakeup+0x2c/0x50
[ 1078.327956][T43784]        pl011_tx_chars+0x240/0x260
[ 1078.332999][T43784]        pl011_start_tx+0x24/0xa8
[ 1078.337868][T43784]        __uart_start+0x90/0xa0
[ 1078.342563][T43784]        uart_write+0x15c/0x2c8
[ 1078.347261][T43784]        do_output_char+0x1c8/0x2b0
[ 1078.352304][T43784]        n_tty_write+0x300/0x668
[ 1078.357087][T43784]        tty_write+0x2e8/0x430
[ 1078.361696][T43784]        redirected_tty_write+0xcc/0xe8
[ 1078.367086][T43784]        do_iter_write+0x228/0x270
[ 1078.372041][T43784]        vfs_writev+0x10c/0x1c8
[ 1078.376735][T43784]        do_writev+0xdc/0x180
[ 1078.381257][T43784]        __arm64_sys_writev+0x50/0x60
[ 1078.386476][T43784]        el0_svc_handler+0x11c/0x1f0
[ 1078.391606][T43784]        el0_svc+0x8/0xc
[ 1078.395691][T43784] 
[ 1078.395691][T43784] -> #2 (&port_lock_key){-.-.}:
[ 1078.402561][T43784]        __lock_acquire+0x5c8/0xbb0
[ 1078.407604][T43784]        lock_acquire+0x154/0x428
[ 1078.412474][T43784]        _raw_spin_lock+0x68/0x88
[ 1078.417343][T43784]        pl011_console_write+0x2ac/0x318
[ 1078.422820][T43784]        console_unlock+0x3c4/0x898
[ 1078.427863][T43784]        vprintk_emit+0x2d4/0x460
[ 1078.432732][T43784]        vprintk_default+0x48/0x58
[ 1078.437688][T43784]        vprintk_func+0x194/0x250
[ 1078.442557][T43784]        printk+0xbc/0xec
[ 1078.446732][T43784]        register_console+0x4a8/0x580
[ 1078.451947][T43784]        uart_add_one_port+0x748/0x878
[ 1078.457250][T43784]        pl011_register_port+0x98/0x128
[ 1078.462639][T43784]        sbsa_uart_probe+0x398/0x480
[ 1078.467772][T43784]        platform_drv_probe+0x70/0x108
[ 1078.473075][T43784]        really_probe+0x15c/0x5d8
[ 1078.477944][T43784]        driver_probe_device+0x94/0x1d0
[ 1078.483335][T43784]        __device_attach_driver+0x11c/0x1a8
[ 1078.489072][T43784]        bus_for_each_drv+0xf8/0x158
[ 1078.494201][T43784]        __device_attach+0x164/0x240
[ 1078.499331][T43784]        device_initial_probe+0x24/0x30
[ 1078.504721][T43784]        bus_probe_device+0xf0/0x100
[ 1078.509850][T43784]        device_add+0x63c/0x960
[ 1078.514546][T43784]        platform_device_add+0x1ac/0x3b8
[ 1078.520023][T43784]        platform_device_register_full+0x1fc/0x290
[ 1078.526373][T43784]        acpi_create_platform_device.part.0+0x264/0x3a8
[ 1078.533152][T43784]        acpi_create_platform_device+0x68/0x80
[ 1078.539150][T43784]        acpi_default_enumeration+0x34/0x78
[ 1078.544887][T43784]        acpi_bus_attach+0x340/0x3b8
[ 1078.550015][T43784]        acpi_bus_attach+0xf8/0x3b8
[ 1078.555057][T43784]        acpi_bus_attach+0xf8/0x3b8
[ 1078.560099][T43784]        acpi_bus_attach+0xf8/0x3b8
[ 1078.565142][T43784]        acpi_bus_scan+0x9c/0x100
[ 1078.570015][T43784]        acpi_scan_init+0x16c/0x320
[ 1078.575058][T43784]        acpi_init+0x330/0x3b8
[ 1078.579666][T43784]        do_one_initcall+0x158/0x7ec
[ 1078.584797][T43784]        kernel_init_freeable+0x9a8/0xa70
[ 1078.590360][T43784]        kernel_init+0x18/0x138
[ 1078.595055][T43784]        ret_from_fork+0x10/0x1c
[ 1078.599835][T43784] 
[ 1078.599835][T43784] -> #1 (console_owner){-...}:
[ 1078.606618][T43784]        __lock_acquire+0x5c8/0xbb0
[ 1078.611661][T43784]        lock_acquire+0x154/0x428
[ 1078.616530][T43784]        console_unlock+0x298/0x898
[ 1078.621573][T43784]        vprintk_emit+0x2d4/0x460
[ 1078.626442][T43784]        vprintk_default+0x48/0x58
[ 1078.631398][T43784]        vprintk_func+0x194/0x250
[ 1078.636267][T43784]        printk+0xbc/0xec
[ 1078.640443][T43784]        _warn_unseeded_randomness+0xb4/0xd0
[ 1078.646267][T43784]        get_random_u64+0x4c/0x100
[ 1078.651224][T43784]        add_to_free_area_random+0x168/0x1a0
[ 1078.657047][T43784]        free_one_page+0x3dc/0xd08
[ 1078.662003][T43784]        __free_pages_ok+0x490/0xd00
[ 1078.667132][T43784]        __free_pages+0xc4/0x118
[ 1078.671914][T43784]        __free_pages_core+0x2e8/0x428
[ 1078.677219][T43784]        memblock_free_pages+0xa4/0xec
[ 1078.682522][T43784]        memblock_free_all+0x264/0x330
[ 1078.687825][T43784]        mem_init+0x90/0x148
[ 1078.692259][T43784]        start_kernel+0x368/0x684
[ 1078.697126][T43784] 
[ 1078.697126][T43784] -> #0 (&(&zone->lock)->rlock){-.-.}:
[ 1078.704604][T43784]        check_prev_add+0x120/0x1138
[ 1078.709733][T43784]        validate_chain+0x888/0x1270
[ 1078.714863][T43784]        __lock_acquire+0x5c8/0xbb0
[ 1078.719906][T43784]        lock_acquire+0x154/0x428
[ 1078.724776][T43784]        _raw_spin_lock+0x68/0x88
[ 1078.729645][T43784]        rmqueue_bulk.constprop.21+0xb0/0x1218
[ 1078.735643][T43784]        get_page_from_freelist+0x898/0x24a0
[ 1078.741467][T43784]        __alloc_pages_nodemask+0x2a8/0x1d08
[ 1078.747291][T43784]        alloc_pages_current+0xb4/0x150
[ 1078.752682][T43784]        allocate_slab+0xab8/0x2350
[ 1078.757725][T43784]        new_slab+0x98/0xc0
[ 1078.762073][T43784]        ___slab_alloc+0x66c/0xa30
[ 1078.767029][T43784]        __slab_alloc+0x68/0xc8
[ 1078.771725][T43784]        __kmalloc+0x3d4/0x658
[ 1078.776333][T43784]        __tty_buffer_request_room+0xd4/0x220
[ 1078.782244][T43784]        tty_insert_flip_string_fixed_flag+0x6c/0x128
[ 1078.788849][T43784]        pty_write+0x98/0x100
[ 1078.793370][T43784]        n_tty_write+0x2a0/0x668
[ 1078.798152][T43784]        tty_write+0x2e8/0x430
[ 1078.802760][T43784]        __vfs_write+0x5c/0xb0
[ 1078.807368][T43784]        vfs_write+0xf0/0x230
[ 1078.811890][T43784]        ksys_write+0xd4/0x180
[ 1078.816498][T43784]        __arm64_sys_write+0x4c/0x60
[ 1078.821627][T43784]        el0_svc_handler+0x11c/0x1f0
[ 1078.826756][T43784]        el0_svc+0x8/0xc
[ 1078.830842][T43784] 
[ 1078.830842][T43784] other info that might help us debug this:
[ 1078.830842][T43784] 
[ 1078.840918][T43784] Chain exists of:
[ 1078.840918][T43784]   &(&zone->lock)->rlock --> &port_lock_key --> &(&port->lock)->rlock
[ 1078.840918][T43784] 
[ 1078.854731][T43784]  Possible unsafe locking scenario:
[ 1078.854731][T43784] 
[ 1078.862029][T43784]        CPU0                    CPU1
[ 1078.867243][T43784]        ----                    ----
[ 1078.872457][T43784]   lock(&(&port->lock)->rlock);
[ 1078.877238][T43784]                                lock(&port_lock_key);
[ 1078.883929][T43784]                                lock(&(&port->lock)->rlock);
[ 1078.891228][T43784]   lock(&(&zone->lock)->rlock);
[ 1078.896010][T43784] 
[ 1078.896010][T43784]  *** DEADLOCK ***
[ 1078.896010][T43784] 
[ 1078.904004][T43784] 5 locks held by vi/43784:
[ 1078.908351][T43784]  #0: ffff000c36240890 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x44/0x50
[ 1078.917133][T43784]  #1: ffff000c36240918 (&tty->atomic_write_lock){+.+.}, at: tty_write_lock+0x24/0x60
[ 1078.926521][T43784]  #2: ffff000c36240aa0 (&o_tty->termios_rwsem/1){++++}, at: n_tty_write+0x108/0x668
[ 1078.935823][T43784]  #3: ffffa0001e0b2360 (&ldata->output_lock){+.+.}, at: n_tty_write+0x1d0/0x668
[ 1078.944777][T43784]  #4: ffff00938db47d40 (&(&port->lock)->rlock){-.-.}, at: pty_write+0x78/0x100
[ 1078.953644][T43784] 
[ 1078.953644][T43784] stack backtrace:
[ 1078.959382][T43784] CPU: 97 PID: 43784 Comm: vi Not tainted 5.3.0-rc7-next-20190904 #14
[ 1078.967376][T43784] Hardware name: HPE Apollo 70             /C01_APACHE_MB         , BIOS L50_5.13_1.11 06/18/2019
[ 1078.977799][T43784] Call trace:
[ 1078.980932][T43784]  dump_backtrace+0x0/0x228
[ 1078.985279][T43784]  show_stack+0x24/0x30
[ 1078.989282][T43784]  dump_stack+0xe8/0x13c
[ 1078.993370][T43784]  print_circular_bug+0x334/0x3d8
[ 1078.998240][T43784]  check_noncircular+0x268/0x310
[ 1079.003022][T43784]  check_prev_add+0x120/0x1138
[ 1079.007631][T43784]  validate_chain+0x888/0x1270
[ 1079.012241][T43784]  __lock_acquire+0x5c8/0xbb0
[ 1079.016763][T43784]  lock_acquire+0x154/0x428
[ 1079.021111][T43784]  _raw_spin_lock+0x68/0x88
[ 1079.025460][T43784]  rmqueue_bulk.constprop.21+0xb0/0x1218
[ 1079.030937][T43784]  get_page_from_freelist+0x898/0x24a0
[ 1079.036240][T43784]  __alloc_pages_nodemask+0x2a8/0x1d08
[ 1079.041542][T43784]  alloc_pages_current+0xb4/0x150
[ 1079.046412][T43784]  allocate_slab+0xab8/0x2350
[ 1079.050934][T43784]  new_slab+0x98/0xc0
[ 1079.054761][T43784]  ___slab_alloc+0x66c/0xa30
[ 1079.059196][T43784]  __slab_alloc+0x68/0xc8
[ 1079.063371][T43784]  __kmalloc+0x3d4/0x658
[ 1079.067458][T43784]  __tty_buffer_request_room+0xd4/0x220
[ 1079.072847][T43784]  tty_insert_flip_string_fixed_flag+0x6c/0x128
[ 1079.078932][T43784]  pty_write+0x98/0x100
[ 1079.082932][T43784]  n_tty_write+0x2a0/0x668
[ 1079.087193][T43784]  tty_write+0x2e8/0x430
[ 1079.091280][T43784]  __vfs_write+0x5c/0xb0
[ 1079.095367][T43784]  vfs_write+0xf0/0x230
[ 1079.099368][T43784]  ksys_write+0xd4/0x180
[ 1079.103455][T43784]  __arm64_sys_write+0x4c/0x60
[ 1079.108064][T43784]  el0_svc_handler+0x11c/0x1f0
[ 1079.112672][T43784]  el0_svc+0x8/0xc
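
For reference, the printk_deferred() change mentioned above would look roughly
like this (a sketch against drivers/char/random.c of that era; the exact
message text in _warn_unseeded_randomness() may differ):

```diff
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ static void _warn_unseeded_randomness(const char *func_name, void *caller,
-	pr_notice("random: %s called from %pS with crng_init=%d\n",
-		  func_name, caller, crng_init);
+	printk_deferred(KERN_NOTICE "random: %s called from %pS with crng_init=%d\n",
+			func_name, caller, crng_init);
```

printk_deferred() only logs the message and flushes it later from irq_work
context, so the console drivers (and their port->lock) are never entered
while zone->lock is held.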

> 
> To be clear. I would feel more comfortable when there are no
> deadlocks. But I also do not want to invest too much time
> into old kernels. All these problems were there for ages.
> We could finally see them because lockdep was enabled in printk()
> thanks to printk_safe. Well, it is getting worse over time with
> the increasing complexity and number of debugging messages.
> 
> > A number of debugging options make the kernel less stable.
> > Sad but true.
> 
> Yeah. The only good thing is that most debug options are not
> enabled on production systems. It is not an excuse for ignoring
> the problems. But it might be important for prioritizing.

If you look at the above splat and the offline_pages()->dump_page() one I just
sent, neither involves any debug options, so they could happen on production
systems if the timing is right (which could be tricky).



* Re: printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang)
  2019-09-23 12:58                   ` Petr Mladek
  2019-09-23 13:24                     ` Qian Cai
@ 2019-09-24  1:28                     ` Sergey Senozhatsky
  1 sibling, 0 replies; 21+ messages in thread
From: Sergey Senozhatsky @ 2019-09-24  1:28 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Qian Cai, Catalin Marinas, Arnd Bergmann,
	Sergey Senozhatsky, Steven Rostedt, Peter Zijlstra, Dan Williams,
	Will Deacon, linux-mm, Thomas Gleixner, Greg Kroah-Hartman,
	linux-arm-kernel, Theodore Ts'o, Waiman Long, linux-kernel

On (09/23/19 14:58), Petr Mladek wrote:
> 
> If I understand it correctly then this is the re-appearing problem.
> The only systematic solution with the current approach is to
> take port->lock in printk_safe/printk_deferred context.

It probably is.
We have a number of reverse paths. TTY invokes MM under port->lock,
UART invokes TTY under port->lock, MM invokes UART under zone->lock,
UART invokes sched under port->lock, sched invokes UART, UART invokes
UART under port->lock (!), and so on.

> But this is a massive change that almost nobody wants. Instead,
> we want the changes that were discussed on Plumbers.
>
> Now, the question is what to do with existing kernels. There were
> several lockdep reports. And I am a bit lost. Did anyone see
> real deadlocks or just the lockdep reports?

I think so. Qian Cai mentioned "a hang" in one of his reports
(was it the unseeded random() one?). I'll probably send a formal
patch, since there were no objections.

> To be clear. I would feel more comfortable when there are no
> deadlocks. But I also do not want to invest too much time
> into old kernels. All these problems were there for ages.
> We could finally see them because lockdep was enabled in printk()
> thanks to printk_safe.

True.
Everyone was so much happier when printk() used to do

       lockdep_off();
       call_console_drivers();
       lockdep_on();

Now we can have lockdep and RCU checks enabled, yet somehow
printk_safe is still "a terrible thing" :)

	-ss




Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-22 21:33 page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang Qian Cai
2019-08-23 11:37 ` Will Deacon
2019-08-23 11:52   ` Qian Cai
2019-09-05 21:08 ` Qian Cai
2019-09-10 15:22   ` Qian Cai
2019-09-10 19:49     ` Qian Cai
2019-09-12  2:28       ` CONFIG_SHUFFLE_PAGE_ALLOCATOR=y lockdep splat (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang) Qian Cai
2019-09-10 20:35     ` page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang Qian Cai
2019-09-11  1:10     ` Sergey Senozhatsky
2019-09-12 12:05       ` Qian Cai
2019-09-16 14:42         ` Steven Rostedt
2019-09-17  0:53           ` Sergey Senozhatsky
2019-09-18 14:39           ` printk() + memory offline deadlock (WAS Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang) Qian Cai
2019-09-18 15:50             ` Sergey Senozhatsky
2019-09-18 15:58               ` Sergey Senozhatsky
2019-09-18 16:10               ` Qian Cai
2019-09-23 10:21                 ` Sergey Senozhatsky
2019-09-23 12:58                   ` Petr Mladek
2019-09-23 13:24                     ` Qian Cai
2019-09-24  1:28                     ` Sergey Senozhatsky
2019-09-23 13:09                   ` Qian Cai
