Linux-Next Archive on lore.kernel.org
 help / color / Atom feed
* BUG: sleeping function called from atomic due to "Balance initial LPI affinity across CPUs"
@ 2020-05-19 22:09 Qian Cai
  2020-05-20  8:43 ` John Garry
  0 siblings, 1 reply; 4+ messages in thread
From: Qian Cai @ 2020-05-19 22:09 UTC (permalink / raw)
  To: Marc Zyngier, Stephen Rothwell
  Cc: John Garry, Linux ARM, Linux Kernel Mailing List,
	Linux Next Mailing List, Thomas Gleixner

Reverted the linux-next commit f068a62c548c ("irqchip/gic-v3-its:
Balance initial LPI affinity across CPUs") fixed these warnings during
boot,

its_select_cpu at drivers/irqchip/irq-gic-v3-its.c:1572

[  332.819381][ T3359] BUG: sleeping function called from invalid
context at mm/slab.h:568
[  332.827405][ T3359] in_atomic(): 1, irqs_disabled(): 128,
non_block: 0, pid: 3359, name: irqbalance
[  332.836455][ T3359] INFO: lockdep is turned off.
[  332.841076][ T3359] irq event stamp: 0
[  332.844836][ T3359] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[  332.851828][ T3359] hardirqs last disabled at (0):
[<ffff9000101ea65c>] copy_process+0x98c/0x1f34
[  332.860710][ T3359] softirqs last  enabled at (0):
[<ffff9000101ea690>] copy_process+0x9c0/0x1f34
[  332.869586][ T3359] softirqs last disabled at (0): [<0000000000000000>] 0x0
[  332.876560][ T3359] CPU: 155 PID: 3359 Comm: irqbalance Tainted: G
      W    L    5.7.0-rc6-next-20200519 #1
[  332.886563][ T3359] Hardware name: HPE Apollo 70
/C01_APACHE_MB         , BIOS L50_5.13_1.11 06/18/2019
[  332.897000][ T3359] Call trace:
[  332.900151][ T3359]  dump_backtrace+0x0/0x22c
[  332.904514][ T3359]  show_stack+0x28/0x34
[  332.908543][ T3359]  dump_stack+0x104/0x194
[  332.912738][ T3359]  ___might_sleep+0x314/0x328
[  332.917274][ T3359]  __might_sleep+0x7c/0xe0
[  332.921563][ T3359]  slab_pre_alloc_hook+0x44/0x8c
[  332.926360][ T3359]  __kmalloc_node+0xb0/0x618
[  332.930811][ T3359]  alloc_cpumask_var_node+0x48/0x94
[  332.935868][ T3359]  alloc_cpumask_var+0x10/0x1c
[  332.940496][ T3359]  its_select_cpu+0x58/0x2e4
[  332.944945][ T3359]  its_set_affinity+0xe8/0x27c
[  332.949576][ T3359]  msi_domain_set_affinity+0x78/0x114
[  332.954813][ T3359]  irq_do_set_affinity+0x84/0x198
[  332.959697][ T3359]  irq_set_affinity_locked+0x80/0x1a8
[  332.964927][ T3359]  __irq_set_affinity+0x54/0x84
[  332.969637][ T3359]  write_irq_affinity+0x16c/0x198
[  332.974520][ T3359]  irq_affinity_proc_write+0x34/0x44
[  332.979672][ T3359]  pde_write+0x5c/0x78
[  332.983602][ T3359]  proc_reg_write+0x74/0xc0
[  332.987974][ T3359]  __vfs_write+0x84/0x1d8
[  332.992163][ T3359]  vfs_write+0x13c/0x1b8
[  332.996265][ T3359]  ksys_write+0xb0/0x120
[  333.000385][ T3359]  __arm64_sys_write+0x54/0x88
[  333.005017][ T3359]  do_el0_svc+0x128/0x1dc
[  333.009213][ T3359]  el0_sync_handler+0xd0/0x268
[  333.013836][ T3359]  el0_sync+0x164/0x180
[  336.527739][ T3356] mlx5_core 0000:0b:00.1 enp11s0f1np1: Link down

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: BUG: sleeping function called from atomic due to "Balance initial LPI affinity across CPUs"
  2020-05-19 22:09 BUG: sleeping function called from atomic due to "Balance initial LPI affinity across CPUs" Qian Cai
@ 2020-05-20  8:43 ` John Garry
  2020-05-20  9:51   ` Marc Zyngier
  0 siblings, 1 reply; 4+ messages in thread
From: John Garry @ 2020-05-20  8:43 UTC (permalink / raw)
  To: Qian Cai, Marc Zyngier, Stephen Rothwell
  Cc: Linux ARM, Linux Kernel Mailing List, Linux Next Mailing List,
	Thomas Gleixner

On 19/05/2020 23:09, Qian Cai wrote:
> Reverted the linux-next commit f068a62c548c ("irqchip/gic-v3-its:
> Balance initial LPI affinity across CPUs") fixed these warnings during
> boot,

Thanks for the notice. So we need the following set to see this:
CONFIG_CPUMASK_OFFSTACK=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
CONFIG_DEBUG_PER_CPU_MAPS=y

> 
> its_select_cpu at drivers/irqchip/irq-gic-v3-its.c:1572
> 
> [  332.819381][ T3359] BUG: sleeping function called from invalid
> context at mm/slab.h:568
> [  332.827405][ T3359] in_atomic(): 1, irqs_disabled(): 128,
> non_block: 0, pid: 3359, name: irqbalance
> [  332.836455][ T3359] INFO: lockdep is turned off.
> [  332.841076][ T3359] irq event stamp: 0
> [  332.844836][ T3359] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> [  332.851828][ T3359] hardirqs last disabled at (0):
> [<ffff9000101ea65c>] copy_process+0x98c/0x1f34
> [  332.860710][ T3359] softirqs last  enabled at (0):
> [<ffff9000101ea690>] copy_process+0x9c0/0x1f34
> [  332.869586][ T3359] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [  332.876560][ T3359] CPU: 155 PID: 3359 Comm: irqbalance Tainted: G
>        W    L    5.7.0-rc6-next-20200519 #1
> [  332.886563][ T3359] Hardware name: HPE Apollo 70
> /C01_APACHE_MB         , BIOS L50_5.13_1.11 06/18/2019
> [  332.897000][ T3359] Call trace:
> [  332.900151][ T3359]  dump_backtrace+0x0/0x22c
> [  332.904514][ T3359]  show_stack+0x28/0x34
> [  332.908543][ T3359]  dump_stack+0x104/0x194
> [  332.912738][ T3359]  ___might_sleep+0x314/0x328
> [  332.917274][ T3359]  __might_sleep+0x7c/0xe0
> [  332.921563][ T3359]  slab_pre_alloc_hook+0x44/0x8c
> [  332.926360][ T3359]  __kmalloc_node+0xb0/0x618
> [  332.930811][ T3359]  alloc_cpumask_var_node+0x48/0x94

We could use GFP_ATOMIC flag at the callsite here, but maybe there is a 
better solution.

> [  332.935868][ T3359]  alloc_cpumask_var+0x10/0x1c
> [  332.940496][ T3359]  its_select_cpu+0x58/0x2e4
> [  332.944945][ T3359]  its_set_affinity+0xe8/0x27c
> [  332.949576][ T3359]  msi_domain_set_affinity+0x78/0x114
> [  332.954813][ T3359]  irq_do_set_affinity+0x84/0x198
> [  332.959697][ T3359]  irq_set_affinity_locked+0x80/0x1a8
> [  332.964927][ T3359]  __irq_set_affinity+0x54/0x84
> [  332.969637][ T3359]  write_irq_affinity+0x16c/0x198
> [  332.974520][ T3359]  irq_affinity_proc_write+0x34/0x44
> [  332.979672][ T3359]  pde_write+0x5c/0x78
> [  332.983602][ T3359]  proc_reg_write+0x74/0xc0
> [  332.987974][ T3359]  __vfs_write+0x84/0x1d8
> [  332.992163][ T3359]  vfs_write+0x13c/0x1b8
> [  332.996265][ T3359]  ksys_write+0xb0/0x120
> [  333.000385][ T3359]  __arm64_sys_write+0x54/0x88
> [  333.005017][ T3359]  do_el0_svc+0x128/0x1dc
> [  333.009213][ T3359]  el0_sync_handler+0xd0/0x268
> [  333.013836][ T3359]  el0_sync+0x164/0x180
> [  336.527739][ T3356] mlx5_core 0000:0b:00.1 enp11s0f1np1: Link down
> .
> 


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: BUG: sleeping function called from atomic due to "Balance initial LPI affinity across CPUs"
  2020-05-20  8:43 ` John Garry
@ 2020-05-20  9:51   ` Marc Zyngier
  2020-05-20 11:28     ` John Garry
  0 siblings, 1 reply; 4+ messages in thread
From: Marc Zyngier @ 2020-05-20  9:51 UTC (permalink / raw)
  To: John Garry
  Cc: Qian Cai, Stephen Rothwell, Linux ARM, Linux Kernel Mailing List,
	Linux Next Mailing List, Thomas Gleixner

Hi John,

On 2020-05-20 09:43, John Garry wrote:
> On 19/05/2020 23:09, Qian Cai wrote:
>> Reverted the linux-next commit f068a62c548c ("irqchip/gic-v3-its:
>> Balance initial LPI affinity across CPUs") fixed these warnings during
>> boot,
> 
> Thanks for the notice. So we need the following set to see this:
> CONFIG_CPUMASK_OFFSTACK=y
> CONFIG_DEBUG_ATOMIC_SLEEP=y
> CONFIG_DEBUG_PER_CPU_MAPS=y

Ah, thanks for pointing this out.

>> its_select_cpu at drivers/irqchip/irq-gic-v3-its.c:1572
>> 
>> [  332.819381][ T3359] BUG: sleeping function called from invalid
>> context at mm/slab.h:568
>> [  332.827405][ T3359] in_atomic(): 1, irqs_disabled(): 128,
>> non_block: 0, pid: 3359, name: irqbalance
>> [  332.836455][ T3359] INFO: lockdep is turned off.
>> [  332.841076][ T3359] irq event stamp: 0
>> [  332.844836][ T3359] hardirqs last  enabled at (0): 
>> [<0000000000000000>] 0x0
>> [  332.851828][ T3359] hardirqs last disabled at (0):
>> [<ffff9000101ea65c>] copy_process+0x98c/0x1f34
>> [  332.860710][ T3359] softirqs last  enabled at (0):
>> [<ffff9000101ea690>] copy_process+0x9c0/0x1f34
>> [  332.869586][ T3359] softirqs last disabled at (0): 
>> [<0000000000000000>] 0x0
>> [  332.876560][ T3359] CPU: 155 PID: 3359 Comm: irqbalance Tainted: G
>>        W    L    5.7.0-rc6-next-20200519 #1
>> [  332.886563][ T3359] Hardware name: HPE Apollo 70
>> /C01_APACHE_MB         , BIOS L50_5.13_1.11 06/18/2019
>> [  332.897000][ T3359] Call trace:
>> [  332.900151][ T3359]  dump_backtrace+0x0/0x22c
>> [  332.904514][ T3359]  show_stack+0x28/0x34
>> [  332.908543][ T3359]  dump_stack+0x104/0x194
>> [  332.912738][ T3359]  ___might_sleep+0x314/0x328
>> [  332.917274][ T3359]  __might_sleep+0x7c/0xe0
>> [  332.921563][ T3359]  slab_pre_alloc_hook+0x44/0x8c
>> [  332.926360][ T3359]  __kmalloc_node+0xb0/0x618
>> [  332.930811][ T3359]  alloc_cpumask_var_node+0x48/0x94
> 
> We could use GFP_ATOMIC flag at the callsite here, but maybe there is
> a better solution.

I don't see one, and I doubt it is worth the hassle to have anything
but GFP_ATOMIC. The default arm64 config is to have on-stack cpumasks,
and only DEBUG_PER_CPU_MAPS allows this to be changed.

I'll stash a patch on top.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: BUG: sleeping function called from atomic due to "Balance initial LPI affinity across CPUs"
  2020-05-20  9:51   ` Marc Zyngier
@ 2020-05-20 11:28     ` John Garry
  0 siblings, 0 replies; 4+ messages in thread
From: John Garry @ 2020-05-20 11:28 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Qian Cai, Stephen Rothwell, Linux ARM, Linux Kernel Mailing List,
	Linux Next Mailing List, Thomas Gleixner

On 20/05/2020 10:51, Marc Zyngier wrote:
> Hi John,
> 
> On 2020-05-20 09:43, John Garry wrote:
>> On 19/05/2020 23:09, Qian Cai wrote:
>>> Reverted the linux-next commit f068a62c548c ("irqchip/gic-v3-its:
>>> Balance initial LPI affinity across CPUs") fixed these warnings during
>>> boot,
>>
>> Thanks for the notice. So we need the following set to see this:
>> CONFIG_CPUMASK_OFFSTACK=y
>> CONFIG_DEBUG_ATOMIC_SLEEP=y
>> CONFIG_DEBUG_PER_CPU_MAPS=y
> 
> Ah, thanks for pointing this out.
> 
>>> its_select_cpu at drivers/irqchip/irq-gic-v3-its.c:1572
>>>
>>> [  332.819381][ T3359] BUG: sleeping function called from invalid
>>> context at mm/slab.h:568
>>> [  332.827405][ T3359] in_atomic(): 1, irqs_disabled(): 128,
>>> non_block: 0, pid: 3359, name: irqbalance
>>> [  332.836455][ T3359] INFO: lockdep is turned off.
>>> [  332.841076][ T3359] irq event stamp: 0
>>> [  332.844836][ T3359] hardirqs last  enabled at (0): 
>>> [<0000000000000000>] 0x0
>>> [  332.851828][ T3359] hardirqs last disabled at (0):
>>> [<ffff9000101ea65c>] copy_process+0x98c/0x1f34
>>> [  332.860710][ T3359] softirqs last  enabled at (0):
>>> [<ffff9000101ea690>] copy_process+0x9c0/0x1f34
>>> [  332.869586][ T3359] softirqs last disabled at (0): 
>>> [<0000000000000000>] 0x0
>>> [  332.876560][ T3359] CPU: 155 PID: 3359 Comm: irqbalance Tainted: G
>>>        W    L    5.7.0-rc6-next-20200519 #1
>>> [  332.886563][ T3359] Hardware name: HPE Apollo 70
>>> /C01_APACHE_MB         , BIOS L50_5.13_1.11 06/18/2019
>>> [  332.897000][ T3359] Call trace:
>>> [  332.900151][ T3359]  dump_backtrace+0x0/0x22c
>>> [  332.904514][ T3359]  show_stack+0x28/0x34
>>> [  332.908543][ T3359]  dump_stack+0x104/0x194
>>> [  332.912738][ T3359]  ___might_sleep+0x314/0x328
>>> [  332.917274][ T3359]  __might_sleep+0x7c/0xe0
>>> [  332.921563][ T3359]  slab_pre_alloc_hook+0x44/0x8c
>>> [  332.926360][ T3359]  __kmalloc_node+0xb0/0x618
>>> [  332.930811][ T3359]  alloc_cpumask_var_node+0x48/0x94
>>
>> We could use GFP_ATOMIC flag at the callsite here, but maybe there is
>> a better solution.
> 
> I don't see one, and I doubt it is worth the hassle to have anything
> but GFP_ATOMIC. The default arm64 config is to have on-stack cpumasks,
> and only DEBUG_PER_CPU_MAPS allows this to be changed.

JFYI, I was able to recreate on my D05:

[11.951739] CPU: 21 PID: 1 Comm: swapper/0 Tainted: G 
W5.7.0-rc6-next-20200519-dirty #16
[11.961262] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 
IT21 Nemo 2.0 RC0 04/18/2018
[11.970435] Call trace:
[11.972890]  dump_backtrace+0x0/0x1b0
[11.976561]  show_stack+0x14/0x1c
[11.979884]  dump_stack+0xc0/0xf0
[11.983207]  ___might_sleep+0x10c/0x12c
[11.987052]  __might_sleep+0x4c/0x80
[11.990637]  __kmalloc_node+0x1ac/0x2dc
[11.994483]  alloc_cpumask_var_node+0x28/0x60
[11.998852]  alloc_cpumask_var+0x10/0x18
[12.002787]  its_select_cpu+0x2c/0x19c
[12.006546]  its_irq_domain_activate+0x38/0xf0
[12.011004]  __irq_domain_activate_irq+0x6c/0xac
[12.011108] sd 0:0:7:0: [sdh] Attached SCSI disk
[12.015636]  __irq_domain_activate_irq+0x34/0xac
[12.015638]  __irq_domain_activate_irq+0x34/0xac
[12.015640]  irq_domain_activate_irq+0x3c/0x4c
[12.015643]  irq_activate+0x20/0x30
[12.037482]  __setup_irq+0x538/0x6ec
[12.041065]  request_threaded_irq+0xdc/0x18c
[12.045350]  usb_add_hcd+0x4b0/0x6c0
[12.048934]  ehci_platform_probe+0x1f4/0x438
[12.053218]  platform_drv_probe+0x4c/0xa0
[12.057239]  really_probe+0xf4/0x35c
[12.060823]  driver_probe_device+0x54/0xb0
[12.064931]  device_driver_attach+0x68/0x70
[12.069126]  __driver_attach+0x9c/0xf8
[12.072884]  bus_for_each_dev+0x50/0xa0
[12.076730]  driver_attach+0x20/0x28
[12.080313]  bus_add_driver+0x148/0x1fc
[12.084159]  driver_register+0x6c/0x124
[12.088005]  __platform_driver_register+0x48/0x50
[12.092727]  ehci_platform_init+0x54/0x64
[12.096747]  do_one_initcall+0x50/0x194
[12.100593]  kernel_init_freeable+0x1e4/0x250
[12.104964]  kernel_init+0x10/0x108
[12.108460]  ret_from_fork+0x10/0x18
[12.112077] ehci-platform PNP0D20:00: irq 361, io mem 0xa7020000

And setting GFP_ATOMIC removes that BUG.

Cheers,
John


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, back to index

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-19 22:09 BUG: sleeping function called from atomic due to "Balance initial LPI affinity across CPUs" Qian Cai
2020-05-20  8:43 ` John Garry
2020-05-20  9:51   ` Marc Zyngier
2020-05-20 11:28     ` John Garry

Linux-Next Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-next/0 linux-next/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-next linux-next/ https://lore.kernel.org/linux-next \
		linux-next@vger.kernel.org
	public-inbox-index linux-next

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-next


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git