All of lore.kernel.org
 help / color / mirror / Atom feed
* v5.19-rc2-rt3: nouveau might sleep splat
@ 2022-06-18  9:25 Mike Galbraith
  2022-06-24  8:42 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Galbraith @ 2022-06-18  9:25 UTC (permalink / raw)
  To: RT; +Cc: Sebastian Andrzej Siewior

I haven't done any digging to see why this now happens where it
previously did not.

[    3.867408] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
[    3.867409] in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 387, name: systemd-udevd
[    3.867409] preempt_count: 0, expected: 0
[    3.867410] RCU nest depth: 1, expected: 1
[    3.867411] CPU: 4 PID: 387 Comm: systemd-udevd Tainted: G        W   E     5.19.0.g0639b59-master-rt #3 2864531ca0be0aecd9a6687c6a7534c103a4beb1
[    3.867413] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[    3.867413] Call Trace:
[    3.867415]  <TASK>
[    3.867416]  dump_stack_lvl+0x44/0x58
[    3.867421]  __might_resched+0x141/0x160
[    3.867425]  rt_spin_lock+0x2d/0x70
[    3.867428]  crng_make_state+0x76/0x1c0
[    3.867430]  _get_random_bytes.part.16+0x4e/0x110
[    3.867432]  ? nvkm_vmm_iter.constprop.11+0x311/0x860 [nouveau d5cea19cce3ba6cfae6272043bf13df497098c8d]
[    3.867512]  ? ___slab_alloc+0x31a/0x660
[    3.867514]  ? nvkm_vmm_sparse_unref_ptes+0x80/0x80 [nouveau d5cea19cce3ba6cfae6272043bf13df497098c8d]
[    3.867579]  ? gf100_vmm_pgt_sgl+0x230/0x230 [nouveau d5cea19cce3ba6cfae6272043bf13df497098c8d]
[    3.867644]  default_pointer+0x2a0/0x2d0
[    3.867647]  vsnprintf+0x35a/0x4c0
[    3.867649]  va_format.isra.16+0x6e/0xa0
[    3.867651]  vsnprintf+0x35a/0x4c0
[    3.867653]  vprintk_store+0x141/0x520
[    3.867655]  ? vsnprintf+0x48f/0x4c0
[    3.867657]  vprintk_emit+0x44/0x170
[    3.867659]  dev_vprintk_emit+0x185/0x1ae
[    3.867662]  ? dev_printk_emit+0x4a/0x66
[    3.867663]  dev_printk_emit+0x4a/0x66
[    3.867665]  ? kfree+0x165/0x340
[    3.867667]  _dev_info+0x64/0x80
[    3.867669]  ? nvc0_fbcon_fillrect+0x132/0x170 [nouveau d5cea19cce3ba6cfae6272043bf13df497098c8d]
[    3.867738]  nouveau_fbcon_create+0x2bc/0x4e0 [nouveau d5cea19cce3ba6cfae6272043bf13df497098c8d]
[    3.867805]  __drm_fb_helper_initial_config_and_unlock+0x30b/0x570 [drm_kms_helper ecc92b0a1aa52e1818472d4bed64af1f48245922]
[    3.867820]  nouveau_fbcon_init+0x158/0x1c0 [nouveau d5cea19cce3ba6cfae6272043bf13df497098c8d]
[    3.867885]  nouveau_drm_device_init+0x1db/0x840 [nouveau d5cea19cce3ba6cfae6272043bf13df497098c8d]
[    3.867953]  ? pci_update_current_state+0x65/0xa0
[    3.867957]  nouveau_drm_probe+0x11b/0x220 [nouveau d5cea19cce3ba6cfae6272043bf13df497098c8d]
[    3.868023]  ? rt_spin_unlock+0x13/0x40
[    3.868025]  local_pci_probe+0x42/0x90
[    3.868027]  pci_device_probe+0x176/0x240
[    3.868029]  ? sysfs_do_create_link_sd.isra.2+0x68/0xd0
[    3.868031]  really_probe+0x108/0x3d0
[    3.868033]  __driver_probe_device+0x100/0x170
[    3.868035]  driver_probe_device+0x1f/0xa0
[    3.868036]  __driver_attach+0xaa/0x170
[    3.868038]  ? __device_attach_driver+0xe0/0xe0
[    3.868039]  ? __device_attach_driver+0xe0/0xe0
[    3.868040]  bus_for_each_dev+0x73/0xc0
[    3.868042]  bus_add_driver+0x153/0x240
[    3.868044]  ? 0xffffffffc083f000
[    3.868045]  driver_register+0x5b/0xf0
[    3.868046]  ? 0xffffffffc083f000
[    3.868047]  do_one_initcall+0x3e/0x260
[    3.868050]  ? kmem_cache_alloc_trace+0x116/0x260
[    3.868052]  do_init_module+0x4b/0x1f0
[    3.868054]  load_module+0x1891/0x22a0
[    3.868058]  ? __do_sys_finit_module+0xaf/0x120
[    3.868059]  __do_sys_finit_module+0xaf/0x120
[    3.868061]  do_syscall_64+0x58/0x80
[    3.868063]  ? ksys_mmap_pgoff+0xc9/0x1e0
[    3.868066]  ? syscall_exit_to_user_mode+0x17/0x30
[    3.868068]  ? do_syscall_64+0x67/0x80
[    3.868069]  ? syscall_exit_to_user_mode+0x17/0x30
[    3.868071]  ? do_syscall_64+0x67/0x80
[    3.868072]  ? do_syscall_64+0x67/0x80
[    3.868072]  ? do_syscall_64+0x67/0x80
[    3.868073]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[    3.868076] RIP: 0033:0x7fc66d483ec9
[    3.868078] Code: 2e 00 b8 ca 00 00 00 0f 05 eb a5 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 7f 2d 00 f7 d8 64 89 01 48
[    3.868079] RSP: 002b:00007fff450b9a48 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[    3.868080] RAX: ffffffffffffffda RBX: 000055d3248168a0 RCX: 00007fc66d483ec9
[    3.868081] RDX: 0000000000000000 RSI: 00007fc66ddf779d RDI: 000000000000001b
[    3.868082] RBP: 00007fc66ddf779d R08: 0000000000000000 R09: 000055d3247f69a0
[    3.868083] R10: 000000000000001b R11: 0000000000000246 R12: 0000000000020000
[    3.868083] R13: 000055d32481e460 R14: 0000000000000000 R15: 000055d32481d260
[    3.868085]  </TASK>
[    3.867406] nouveau 0000:01:00.0: DRM: allocated 1920x1080 fb: 0xa0000, bo 00000000cb6e75e2
[    3.868165] fbcon: nouveaudrmfb (fb0) is primary device
[    4.190312] Console: switching to colour frame buffer device 240x67
[    4.474555] nouveau 0000:01:00.0: [drm] fb0: nouveaudrmfb frame buffer device
[    4.502083] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: v5.19-rc2-rt3: nouveau might sleep splat
  2022-06-18  9:25 v5.19-rc2-rt3: nouveau might sleep splat Mike Galbraith
@ 2022-06-24  8:42 ` Sebastian Andrzej Siewior
  2022-06-24  9:52   ` Mike Galbraith
  0 siblings, 1 reply; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-06-24  8:42 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: RT

On 2022-06-18 11:25:39 [+0200], Mike Galbraith wrote:
> I haven't done any digging to see why this now happens where it
> previously did not.

I don't see where interrupts have been disabled but they have been. The
printk path should yell earlier but did not the crng path is kind of
late and I don't see anything obvious in between.

Sebastian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: v5.19-rc2-rt3: nouveau might sleep splat
  2022-06-24  8:42 ` Sebastian Andrzej Siewior
@ 2022-06-24  9:52   ` Mike Galbraith
  2022-06-24 10:18     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Galbraith @ 2022-06-24  9:52 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: RT

On Fri, 2022-06-24 at 10:42 +0200, Sebastian Andrzej Siewior wrote:
> On 2022-06-18 11:25:39 [+0200], Mike Galbraith wrote:
> > I haven't done any digging to see why this now happens where it
> > previously did not.
>
> I don't see where interrupts have been disabled but they have been. The
> printk path should yell earlier but did not the crng path is kind of
> late and I don't see anything obvious in between.

Allegedly vprintk_store().

(gdb) list *vprintk_store+0x3cc
0xffffffff8114283c is in vprintk_store (kernel/printk/printk.c:2280).
warning: Source file is more recent than executable.
2275            u32 caller_id;
2276            u16 text_len;
2277            int ret = 0;
2278            u64 ts_nsec;
2279
2280            if (!printk_enter_irqsave(recursion_ptr, irqflags))
2281                    return 0;
2282
2283            /*
2284             * Since the duration of printk() can vary depending on the message
(gdb)

[    6.520357] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
[    6.520358] in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 399, name: systemd-udevd
[    6.520359] preempt_count: 0, expected: 0
[    6.520359] RCU nest depth: 1, expected: 1
[    6.520360] 5 locks held by systemd-udevd/399:
[    6.520361]  #0: ffff909a815821a8 (&dev->mutex){....}-{3:3}, at: __driver_attach+0xa1/0x170
[    6.520367]  #1: ffff909a912b0a98 (&helper->lock){+.+.}-{3:3}, at: drm_fb_helper_initial_config+0x26/0x40 [drm_kms_helper]
[    6.520383]  #2: ffffffffbd8e91b8 (filling){....}-{2:2}, at: default_pointer+0x17e/0x2d0
[    6.520387]  #3: ffffffffbd7929a0 (rcu_read_lock){....}-{1:2}, at: rt_spin_trylock+0x36/0xd0
[    6.520391]  #4: ffff909d8ea335a0 ((crngs.lock)){+.+.}-{2:2}, at: crng_make_state+0x76/0x1c0
[    6.520394] irq event stamp: 843364
[    6.520394] hardirqs last  enabled at (843363): [<ffffffffbc7dfb10>] _raw_spin_unlock_irqrestore+0x30/0x77
[    6.520397] hardirqs last disabled at (843364): [<ffffffffbbd4283c>] vprintk_store+0x3cc/0x550
[    6.520399] softirqs last  enabled at (151950): [<ffffffffbbcb8223>] __local_bh_enable_ip+0xc3/0x150
[    6.520401] softirqs last disabled at (151944): [<ffffffffbbe8c135>] btf_alloc_id+0x5/0xb0
[    6.520404] CPU: 0 PID: 399 Comm: systemd-udevd Tainted: G        W   E     5.19.0.gca1fdab-master-rt #5 2262f40963ee88d2873a82a86160b299ace0d165
[    6.520406] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[    6.520407] Call Trace:
[    6.520408]  <TASK>
[    6.520409]  dump_stack_lvl+0x55/0x6d
[    6.520413]  __might_resched+0x1aa/0x1e0
[    6.520418]  rt_spin_lock+0x4d/0xe0
[    6.520419]  ? crng_make_state+0x76/0x1c0
[    6.520422]  crng_make_state+0x76/0x1c0
[    6.520426]  _get_random_bytes.part.16+0x4e/0x110
[    6.520428]  ? rt_spin_trylock+0x36/0xd0
[    6.520432]  ? preempt_count_sub+0x43/0x50
[    6.520434]  ? rt_spin_trylock+0x8b/0xd0
[    6.520435]  ? rt_spin_trylock+0x8b/0xd0
[    6.520438]  ? migrate_disable+0x119/0x120
[    6.520442]  default_pointer+0x2a0/0x2d0
[    6.520448]  vsnprintf+0x35a/0x4c0
[    6.520449]  ? kernel_text_address+0xaa/0xc0
[    6.520454]  va_format.isra.16+0x6e/0xa0
[    6.520459]  vsnprintf+0x35a/0x4c0
[    6.520464]  vprintk_store+0x150/0x550
[    6.520468]  ? string+0x40/0x50
[    6.520472]  ? vsnprintf+0x48f/0x4c0
[    6.520480]  vprintk_emit+0x44/0x170
[    6.520482]  ? __lock_acquire+0xb9d/0x18b0
[    6.520487]  dev_vprintk_emit+0x185/0x1ae
[    6.520494]  ? lock_is_held_type+0xb2/0x110
[    6.520499]  ? dev_printk_emit+0x4a/0x66
[    6.520501]  ? mark_held_locks+0x49/0x70
[    6.520502]  dev_printk_emit+0x4a/0x66
[    6.520506]  ? preempt_count_sub+0x43/0x50
[    6.520510]  _dev_info+0x64/0x80
[    6.520518]  nouveau_fbcon_create+0x2d4/0x500 [nouveau adfaa6c52e612d43880f223e1d4b1d7b83facd58]
[    6.520615]  __drm_fb_helper_initial_config_and_unlock+0x336/0x5f0 [drm_kms_helper 8e2e6f207f0cb44a547a65ef3965573df1212de6]
[    6.520635]  nouveau_fbcon_init+0x180/0x1f0 [nouveau adfaa6c52e612d43880f223e1d4b1d7b83facd58]
[    6.520706]  nouveau_drm_device_init+0x1f7/0x860 [nouveau adfaa6c52e612d43880f223e1d4b1d7b83facd58]
[    6.520778]  ? pci_update_current_state+0x65/0xa0
[    6.520785]  nouveau_drm_probe+0x11b/0x220 [nouveau adfaa6c52e612d43880f223e1d4b1d7b83facd58]
[    6.520856]  local_pci_probe+0x42/0x90
[    6.520860]  pci_device_probe+0x176/0x240
[    6.520862]  ? sysfs_do_create_link_sd.isra.2+0x68/0xd0
[    6.520869]  really_probe+0x108/0x3d0
[    6.520873]  __driver_probe_device+0x100/0x170
[    6.520876]  driver_probe_device+0x1f/0xa0
[    6.520879]  __driver_attach+0xac/0x170
[    6.520881]  ? __device_attach_driver+0xe0/0xe0
[    6.520884]  ? __device_attach_driver+0xe0/0xe0
[    6.520886]  bus_for_each_dev+0x76/0xc0
[    6.520890]  bus_add_driver+0x19f/0x240
[    6.520893]  ? 0xffffffffc06ec000
[    6.520896]  driver_register+0x5b/0xf0
[    6.520898]  ? 0xffffffffc06ec000
[    6.520899]  do_one_initcall+0x58/0x3d0
[    6.520903]  ? trace_kmalloc+0xb6/0x140
[    6.520905]  ? kmem_cache_alloc_trace+0x13c/0x1c0
[    6.520907]  ? do_init_module+0x22/0x1fc
[    6.520914]  do_init_module+0x4b/0x1fc
[    6.520918]  load_module+0x18d2/0x2380
[    6.520936]  ? __do_sys_finit_module+0xaf/0x120
[    6.520937]  __do_sys_finit_module+0xaf/0x120
[    6.520948]  do_syscall_64+0x58/0x80
[    6.520951]  ? do_syscall_64+0x67/0x80
[    6.520953]  ? lockdep_hardirqs_on_prepare+0xd8/0x170
[    6.520957]  ? do_syscall_64+0x67/0x80
[    6.520958]  ? do_syscall_64+0x67/0x80
[    6.520962]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[    6.520965] RIP: 0033:0x7f911ee69ec9
[    6.520967] Code: 2e 00 b8 ca 00 00 00 0f 05 eb a5 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 7f 2d 00 f7 d8 64 89 01 48
[    6.520968] RSP: 002b:00007ffe6427de98 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[    6.520969] RAX: ffffffffffffffda RBX: 0000558cc2ab38a0 RCX: 00007f911ee69ec9
[    6.520970] RDX: 0000000000000000 RSI: 00007f911f7dd79d RDI: 000000000000001b
[    6.520971] RBP: 00007f911f7dd79d R08: 0000000000000000 R09: 0000558cc2a939a0
[    6.520971] R10: 000000000000001b R11: 0000000000000246 R12: 0000000000020000
[    6.520972] R13: 0000558cc2abec00 R14: 0000000000000000 R15: 0000558cc2ac7f80


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: v5.19-rc2-rt3: nouveau might sleep splat
  2022-06-24  9:52   ` Mike Galbraith
@ 2022-06-24 10:18     ` Sebastian Andrzej Siewior
  2022-06-24 10:19       ` Sebastian Andrzej Siewior
  2022-06-24 14:30       ` John Ogness
  0 siblings, 2 replies; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-06-24 10:18 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: RT

+ John

On 2022-06-24 11:52:26 [+0200], Mike Galbraith wrote:
> On Fri, 2022-06-24 at 10:42 +0200, Sebastian Andrzej Siewior wrote:
> > On 2022-06-18 11:25:39 [+0200], Mike Galbraith wrote:
> > > I haven't done any digging to see why this now happens where it
> > > previously did not.
> >
> > I don't see where interrupts have been disabled but they have been. The
> > printk path should yell earlier but did not the crng path is kind of
> > late and I don't see anything obvious in between.
> 
> Allegedly vprintk_store().
> 
> (gdb) list *vprintk_store+0x3cc
> 0xffffffff8114283c is in vprintk_store (kernel/printk/printk.c:2280).
> warning: Source file is more recent than executable.
> 2275            u32 caller_id;
> 2276            u16 text_len;
> 2277            int ret = 0;
> 2278            u64 ts_nsec;
> 2279
> 2280            if (!printk_enter_irqsave(recursion_ptr, irqflags))
> 2281                    return 0;
> 2282
> 2283            /*
> 2284             * Since the duration of printk() can vary depending on the message
> (gdb)

ehm. Why do we need to keep interrupts disabled for vsnprintf()?
Wouldn't it work to move that lock a few lines below where it is
actually needed?

> [    6.520357] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
> [    6.520358] in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 399, name: systemd-udevd
> [    6.520359] preempt_count: 0, expected: 0
> [    6.520359] RCU nest depth: 1, expected: 1
> [    6.520360] 5 locks held by systemd-udevd/399:
> [    6.520361]  #0: ffff909a815821a8 (&dev->mutex){....}-{3:3}, at: __driver_attach+0xa1/0x170
> [    6.520367]  #1: ffff909a912b0a98 (&helper->lock){+.+.}-{3:3}, at: drm_fb_helper_initial_config+0x26/0x40 [drm_kms_helper]
> [    6.520383]  #2: ffffffffbd8e91b8 (filling){....}-{2:2}, at: default_pointer+0x17e/0x2d0
> [    6.520387]  #3: ffffffffbd7929a0 (rcu_read_lock){....}-{1:2}, at: rt_spin_trylock+0x36/0xd0
> [    6.520391]  #4: ffff909d8ea335a0 ((crngs.lock)){+.+.}-{2:2}, at: crng_make_state+0x76/0x1c0
> [    6.520394] irq event stamp: 843364
> [    6.520394] hardirqs last  enabled at (843363): [<ffffffffbc7dfb10>] _raw_spin_unlock_irqrestore+0x30/0x77
> [    6.520397] hardirqs last disabled at (843364): [<ffffffffbbd4283c>] vprintk_store+0x3cc/0x550
> [    6.520399] softirqs last  enabled at (151950): [<ffffffffbbcb8223>] __local_bh_enable_ip+0xc3/0x150
> [    6.520401] softirqs last disabled at (151944): [<ffffffffbbe8c135>] btf_alloc_id+0x5/0xb0
> [    6.520404] CPU: 0 PID: 399 Comm: systemd-udevd Tainted: G        W   E     5.19.0.gca1fdab-master-rt #5 2262f40963ee88d2873a82a86160b299ace0d165
> [    6.520406] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
> [    6.520407] Call Trace:
> [    6.520408]  <TASK>
> [    6.520409]  dump_stack_lvl+0x55/0x6d
> [    6.520413]  __might_resched+0x1aa/0x1e0
> [    6.520418]  rt_spin_lock+0x4d/0xe0
> [    6.520419]  ? crng_make_state+0x76/0x1c0
> [    6.520422]  crng_make_state+0x76/0x1c0
> [    6.520426]  _get_random_bytes.part.16+0x4e/0x110
> [    6.520428]  ? rt_spin_trylock+0x36/0xd0
> [    6.520432]  ? preempt_count_sub+0x43/0x50
> [    6.520434]  ? rt_spin_trylock+0x8b/0xd0
> [    6.520435]  ? rt_spin_trylock+0x8b/0xd0
> [    6.520438]  ? migrate_disable+0x119/0x120
> [    6.520442]  default_pointer+0x2a0/0x2d0
> [    6.520448]  vsnprintf+0x35a/0x4c0
> [    6.520449]  ? kernel_text_address+0xaa/0xc0
> [    6.520454]  va_format.isra.16+0x6e/0xa0
> [    6.520459]  vsnprintf+0x35a/0x4c0
> [    6.520464]  vprintk_store+0x150/0x550
> [    6.520468]  ? string+0x40/0x50
> [    6.520472]  ? vsnprintf+0x48f/0x4c0
> [    6.520480]  vprintk_emit+0x44/0x170
> [    6.520482]  ? __lock_acquire+0xb9d/0x18b0
> [    6.520487]  dev_vprintk_emit+0x185/0x1ae
> [    6.520494]  ? lock_is_held_type+0xb2/0x110
> [    6.520499]  ? dev_printk_emit+0x4a/0x66
> [    6.520501]  ? mark_held_locks+0x49/0x70
> [    6.520502]  dev_printk_emit+0x4a/0x66
> [    6.520506]  ? preempt_count_sub+0x43/0x50
> [    6.520510]  _dev_info+0x64/0x80
> [    6.520518]  nouveau_fbcon_create+0x2d4/0x500 [nouveau adfaa6c52e612d43880f223e1d4b1d7b83facd58]
> [    6.520615]  __drm_fb_helper_initial_config_and_unlock+0x336/0x5f0 [drm_kms_helper 8e2e6f207f0cb44a547a65ef3965573df1212de6]
> [    6.520635]  nouveau_fbcon_init+0x180/0x1f0 [nouveau adfaa6c52e612d43880f223e1d4b1d7b83facd58]
> [    6.520706]  nouveau_drm_device_init+0x1f7/0x860 [nouveau adfaa6c52e612d43880f223e1d4b1d7b83facd58]
> [    6.520778]  ? pci_update_current_state+0x65/0xa0
> [    6.520785]  nouveau_drm_probe+0x11b/0x220 [nouveau adfaa6c52e612d43880f223e1d4b1d7b83facd58]
> [    6.520856]  local_pci_probe+0x42/0x90
> [    6.520860]  pci_device_probe+0x176/0x240
> [    6.520862]  ? sysfs_do_create_link_sd.isra.2+0x68/0xd0
> [    6.520869]  really_probe+0x108/0x3d0
> [    6.520873]  __driver_probe_device+0x100/0x170
> [    6.520876]  driver_probe_device+0x1f/0xa0
> [    6.520879]  __driver_attach+0xac/0x170
> [    6.520881]  ? __device_attach_driver+0xe0/0xe0
> [    6.520884]  ? __device_attach_driver+0xe0/0xe0
> [    6.520886]  bus_for_each_dev+0x76/0xc0
> [    6.520890]  bus_add_driver+0x19f/0x240
> [    6.520893]  ? 0xffffffffc06ec000
> [    6.520896]  driver_register+0x5b/0xf0
> [    6.520898]  ? 0xffffffffc06ec000
> [    6.520899]  do_one_initcall+0x58/0x3d0
> [    6.520903]  ? trace_kmalloc+0xb6/0x140
> [    6.520905]  ? kmem_cache_alloc_trace+0x13c/0x1c0
> [    6.520907]  ? do_init_module+0x22/0x1fc
> [    6.520914]  do_init_module+0x4b/0x1fc
> [    6.520918]  load_module+0x18d2/0x2380
> [    6.520936]  ? __do_sys_finit_module+0xaf/0x120
> [    6.520937]  __do_sys_finit_module+0xaf/0x120
> [    6.520948]  do_syscall_64+0x58/0x80
> [    6.520951]  ? do_syscall_64+0x67/0x80
> [    6.520953]  ? lockdep_hardirqs_on_prepare+0xd8/0x170
> [    6.520957]  ? do_syscall_64+0x67/0x80
> [    6.520958]  ? do_syscall_64+0x67/0x80
> [    6.520962]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
> [    6.520965] RIP: 0033:0x7f911ee69ec9
> [    6.520967] Code: 2e 00 b8 ca 00 00 00 0f 05 eb a5 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 7f 2d 00 f7 d8 64 89 01 48
> [    6.520968] RSP: 002b:00007ffe6427de98 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> [    6.520969] RAX: ffffffffffffffda RBX: 0000558cc2ab38a0 RCX: 00007f911ee69ec9
> [    6.520970] RDX: 0000000000000000 RSI: 00007f911f7dd79d RDI: 000000000000001b
> [    6.520971] RBP: 00007f911f7dd79d R08: 0000000000000000 R09: 0000558cc2a939a0
> [    6.520971] R10: 000000000000001b R11: 0000000000000246 R12: 0000000000020000
> [    6.520972] R13: 0000558cc2abec00 R14: 0000000000000000 R15: 0000558cc2ac7f80

Sebastian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: v5.19-rc2-rt3: nouveau might sleep splat
  2022-06-24 10:18     ` Sebastian Andrzej Siewior
@ 2022-06-24 10:19       ` Sebastian Andrzej Siewior
  2022-06-24 14:30       ` John Ogness
  1 sibling, 0 replies; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-06-24 10:19 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: RT, John Ogness

now really + John

On 2022-06-24 12:18:52 [+0200], To Mike Galbraith wrote:
> + John
> 
> On 2022-06-24 11:52:26 [+0200], Mike Galbraith wrote:
> > On Fri, 2022-06-24 at 10:42 +0200, Sebastian Andrzej Siewior wrote:
> > > On 2022-06-18 11:25:39 [+0200], Mike Galbraith wrote:
> > > > I haven't done any digging to see why this now happens where it
> > > > previously did not.
> > >
> > > I don't see where interrupts have been disabled but they have been. The
> > > printk path should yell earlier but did not the crng path is kind of
> > > late and I don't see anything obvious in between.
> > 
> > Allegedly vprintk_store().
> > 
> > (gdb) list *vprintk_store+0x3cc
> > 0xffffffff8114283c is in vprintk_store (kernel/printk/printk.c:2280).
> > warning: Source file is more recent than executable.
> > 2275            u32 caller_id;
> > 2276            u16 text_len;
> > 2277            int ret = 0;
> > 2278            u64 ts_nsec;
> > 2279
> > 2280            if (!printk_enter_irqsave(recursion_ptr, irqflags))
> > 2281                    return 0;
> > 2282
> > 2283            /*
> > 2284             * Since the duration of printk() can vary depending on the message
> > (gdb)
> 
> ehm. Why do we need to keep interrupts disabled for vsnprintf()?
> Wouldn't it work to move that lock a few lines below where it is
> actually needed?
> 
> > [    6.520357] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
> > [    6.520358] in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 399, name: systemd-udevd
> > [    6.520359] preempt_count: 0, expected: 0
> > [    6.520359] RCU nest depth: 1, expected: 1
> > [    6.520360] 5 locks held by systemd-udevd/399:
> > [    6.520361]  #0: ffff909a815821a8 (&dev->mutex){....}-{3:3}, at: __driver_attach+0xa1/0x170
> > [    6.520367]  #1: ffff909a912b0a98 (&helper->lock){+.+.}-{3:3}, at: drm_fb_helper_initial_config+0x26/0x40 [drm_kms_helper]
> > [    6.520383]  #2: ffffffffbd8e91b8 (filling){....}-{2:2}, at: default_pointer+0x17e/0x2d0
> > [    6.520387]  #3: ffffffffbd7929a0 (rcu_read_lock){....}-{1:2}, at: rt_spin_trylock+0x36/0xd0
> > [    6.520391]  #4: ffff909d8ea335a0 ((crngs.lock)){+.+.}-{2:2}, at: crng_make_state+0x76/0x1c0
> > [    6.520394] irq event stamp: 843364
> > [    6.520394] hardirqs last  enabled at (843363): [<ffffffffbc7dfb10>] _raw_spin_unlock_irqrestore+0x30/0x77
> > [    6.520397] hardirqs last disabled at (843364): [<ffffffffbbd4283c>] vprintk_store+0x3cc/0x550
> > [    6.520399] softirqs last  enabled at (151950): [<ffffffffbbcb8223>] __local_bh_enable_ip+0xc3/0x150
> > [    6.520401] softirqs last disabled at (151944): [<ffffffffbbe8c135>] btf_alloc_id+0x5/0xb0
> > [    6.520404] CPU: 0 PID: 399 Comm: systemd-udevd Tainted: G        W   E     5.19.0.gca1fdab-master-rt #5 2262f40963ee88d2873a82a86160b299ace0d165
> > [    6.520406] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
> > [    6.520407] Call Trace:
> > [    6.520408]  <TASK>
> > [    6.520409]  dump_stack_lvl+0x55/0x6d
> > [    6.520413]  __might_resched+0x1aa/0x1e0
> > [    6.520418]  rt_spin_lock+0x4d/0xe0
> > [    6.520419]  ? crng_make_state+0x76/0x1c0
> > [    6.520422]  crng_make_state+0x76/0x1c0
> > [    6.520426]  _get_random_bytes.part.16+0x4e/0x110
> > [    6.520428]  ? rt_spin_trylock+0x36/0xd0
> > [    6.520432]  ? preempt_count_sub+0x43/0x50
> > [    6.520434]  ? rt_spin_trylock+0x8b/0xd0
> > [    6.520435]  ? rt_spin_trylock+0x8b/0xd0
> > [    6.520438]  ? migrate_disable+0x119/0x120
> > [    6.520442]  default_pointer+0x2a0/0x2d0
> > [    6.520448]  vsnprintf+0x35a/0x4c0
> > [    6.520449]  ? kernel_text_address+0xaa/0xc0
> > [    6.520454]  va_format.isra.16+0x6e/0xa0
> > [    6.520459]  vsnprintf+0x35a/0x4c0
> > [    6.520464]  vprintk_store+0x150/0x550
> > [    6.520468]  ? string+0x40/0x50
> > [    6.520472]  ? vsnprintf+0x48f/0x4c0
> > [    6.520480]  vprintk_emit+0x44/0x170
> > [    6.520482]  ? __lock_acquire+0xb9d/0x18b0
> > [    6.520487]  dev_vprintk_emit+0x185/0x1ae
> > [    6.520494]  ? lock_is_held_type+0xb2/0x110
> > [    6.520499]  ? dev_printk_emit+0x4a/0x66
> > [    6.520501]  ? mark_held_locks+0x49/0x70
> > [    6.520502]  dev_printk_emit+0x4a/0x66
> > [    6.520506]  ? preempt_count_sub+0x43/0x50
> > [    6.520510]  _dev_info+0x64/0x80
> > [    6.520518]  nouveau_fbcon_create+0x2d4/0x500 [nouveau adfaa6c52e612d43880f223e1d4b1d7b83facd58]
> > [    6.520615]  __drm_fb_helper_initial_config_and_unlock+0x336/0x5f0 [drm_kms_helper 8e2e6f207f0cb44a547a65ef3965573df1212de6]
> > [    6.520635]  nouveau_fbcon_init+0x180/0x1f0 [nouveau adfaa6c52e612d43880f223e1d4b1d7b83facd58]
> > [    6.520706]  nouveau_drm_device_init+0x1f7/0x860 [nouveau adfaa6c52e612d43880f223e1d4b1d7b83facd58]
> > [    6.520778]  ? pci_update_current_state+0x65/0xa0
> > [    6.520785]  nouveau_drm_probe+0x11b/0x220 [nouveau adfaa6c52e612d43880f223e1d4b1d7b83facd58]
> > [    6.520856]  local_pci_probe+0x42/0x90
> > [    6.520860]  pci_device_probe+0x176/0x240
> > [    6.520862]  ? sysfs_do_create_link_sd.isra.2+0x68/0xd0
> > [    6.520869]  really_probe+0x108/0x3d0
> > [    6.520873]  __driver_probe_device+0x100/0x170
> > [    6.520876]  driver_probe_device+0x1f/0xa0
> > [    6.520879]  __driver_attach+0xac/0x170
> > [    6.520881]  ? __device_attach_driver+0xe0/0xe0
> > [    6.520884]  ? __device_attach_driver+0xe0/0xe0
> > [    6.520886]  bus_for_each_dev+0x76/0xc0
> > [    6.520890]  bus_add_driver+0x19f/0x240
> > [    6.520893]  ? 0xffffffffc06ec000
> > [    6.520896]  driver_register+0x5b/0xf0
> > [    6.520898]  ? 0xffffffffc06ec000
> > [    6.520899]  do_one_initcall+0x58/0x3d0
> > [    6.520903]  ? trace_kmalloc+0xb6/0x140
> > [    6.520905]  ? kmem_cache_alloc_trace+0x13c/0x1c0
> > [    6.520907]  ? do_init_module+0x22/0x1fc
> > [    6.520914]  do_init_module+0x4b/0x1fc
> > [    6.520918]  load_module+0x18d2/0x2380
> > [    6.520936]  ? __do_sys_finit_module+0xaf/0x120
> > [    6.520937]  __do_sys_finit_module+0xaf/0x120
> > [    6.520948]  do_syscall_64+0x58/0x80
> > [    6.520951]  ? do_syscall_64+0x67/0x80
> > [    6.520953]  ? lockdep_hardirqs_on_prepare+0xd8/0x170
> > [    6.520957]  ? do_syscall_64+0x67/0x80
> > [    6.520958]  ? do_syscall_64+0x67/0x80
> > [    6.520962]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
> > [    6.520965] RIP: 0033:0x7f911ee69ec9
> > [    6.520967] Code: 2e 00 b8 ca 00 00 00 0f 05 eb a5 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 7f 2d 00 f7 d8 64 89 01 48
> > [    6.520968] RSP: 002b:00007ffe6427de98 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> > [    6.520969] RAX: ffffffffffffffda RBX: 0000558cc2ab38a0 RCX: 00007f911ee69ec9
> > [    6.520970] RDX: 0000000000000000 RSI: 00007f911f7dd79d RDI: 000000000000001b
> > [    6.520971] RBP: 00007f911f7dd79d R08: 0000000000000000 R09: 0000558cc2a939a0
> > [    6.520971] R10: 000000000000001b R11: 0000000000000246 R12: 0000000000020000
> > [    6.520972] R13: 0000558cc2abec00 R14: 0000000000000000 R15: 0000558cc2ac7f80
> 
> Sebastian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: v5.19-rc2-rt3: nouveau might sleep splat
  2022-06-24 10:18     ` Sebastian Andrzej Siewior
  2022-06-24 10:19       ` Sebastian Andrzej Siewior
@ 2022-06-24 14:30       ` John Ogness
  2022-06-24 14:31         ` John Ogness
  1 sibling, 1 reply; 15+ messages in thread
From: John Ogness @ 2022-06-24 14:30 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Mike Galbraith; +Cc: RT

On 2022-06-24, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
>> (gdb) list *vprintk_store+0x3cc
>> 0xffffffff8114283c is in vprintk_store (kernel/printk/printk.c:2280).
>> warning: Source file is more recent than executable.
>> 2275            u32 caller_id;
>> 2276            u16 text_len;
>> 2277            int ret = 0;
>> 2278            u64 ts_nsec;
>> 2279
>> 2280            if (!printk_enter_irqsave(recursion_ptr, irqflags))
>> 2281                    return 0;
>> 2282
>> 2283            /*
>> 2284             * Since the duration of printk() can vary depending on the message
>> (gdb)
>
> Why do we need to keep interrupts disabled for vsnprintf()?

vsnprintf is printing the current CPU local clock and the current CPU
ID. Are you recommending we split printk_enter_irqsave() into various
preempt_disable() and local_irq_save() calls?

As to this the actual problem reported in this thread, I believe this
patch solves it:

https://lore.kernel.org/lkml/20220609121709.12939-1-Jason@zx2c4.com

John Ogness

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: v5.19-rc2-rt3: nouveau might sleep splat
  2022-06-24 14:30       ` John Ogness
@ 2022-06-24 14:31         ` John Ogness
  2022-06-24 14:39           ` Sebastian Andrzej Siewior
  2022-07-21 13:06           ` Mike Galbraith
  0 siblings, 2 replies; 15+ messages in thread
From: John Ogness @ 2022-06-24 14:31 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Mike Galbraith; +Cc: RT

On 2022-06-24, John Ogness <john.ogness@linutronix.de> wrote:
> As to this the actual problem reported in this thread, I believe this
> patch solves it:
>
> https://lore.kernel.org/lkml/20220609121709.12939-1-Jason@zx2c4.com

There was a cleaner v2 later in the thread:

https://lore.kernel.org/lkml/20220609123319.17576-1-Jason@zx2c4.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: v5.19-rc2-rt3: nouveau might sleep splat
  2022-06-24 14:31         ` John Ogness
@ 2022-06-24 14:39           ` Sebastian Andrzej Siewior
  2022-06-25  3:26             ` Mike Galbraith
  2022-07-21 13:06           ` Mike Galbraith
  1 sibling, 1 reply; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-06-24 14:39 UTC (permalink / raw)
  To: John Ogness; +Cc: Mike Galbraith, RT

On 2022-06-24 16:37:25 [+0206], John Ogness wrote:
> On 2022-06-24, John Ogness <john.ogness@linutronix.de> wrote:
> > As to this the actual problem reported in this thread, I believe this
> > patch solves it:
> >
> > https://lore.kernel.org/lkml/20220609121709.12939-1-Jason@zx2c4.com
> 
> There was a cleaner v2 later in the thread:
> 
> https://lore.kernel.org/lkml/20220609123319.17576-1-Jason@zx2c4.com

In that case v5.19-rc3-rt5 should no longer show it but I am still
confused why we need to disable interrupts within vsnprintf().

Sebastian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: v5.19-rc2-rt3: nouveau might sleep splat
  2022-06-24 14:39           ` Sebastian Andrzej Siewior
@ 2022-06-25  3:26             ` Mike Galbraith
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Galbraith @ 2022-06-25  3:26 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, John Ogness; +Cc: RT

On Fri, 2022-06-24 at 16:39 +0200, Sebastian Andrzej Siewior wrote:
> On 2022-06-24 16:37:25 [+0206], John Ogness wrote:
> > On 2022-06-24, John Ogness <john.ogness@linutronix.de> wrote:
> > > As to this the actual problem reported in this thread, I believe
> > > this
> > > patch solves it:
> > >
> > > https://lore.kernel.org/lkml/20220609121709.12939-1-Jason@zx2c4.com
> >
> > There was a cleaner v2 later in the thread:
> >
> > https://lore.kernel.org/lkml/20220609123319.17576-1-Jason@zx2c4.com
>
> In that case v5.19-rc3-rt5 should no longer show it but I am still
> confused why we need to disable interrupts within vsnprintf().

It says they're disabled to simplify recursion tracking.  I was
thinking maybe replace that with a local lock, but then the below
happened, rendering that little wart kinda moot for the moment.

07a22b61946f Revert "printk: add functions to prefer direct printing"
5831788afb17 Revert "printk: add kthread console printers"
2d9ef940f89e Revert "printk: extend console_lock for per-console locking"
007eeab7e9f0 Revert "printk: remove @console_locked"
05c96b3713aa Revert "printk: Block console kthreads when direct printing will be required"
20fb0c8272bb Revert "printk: Wait for the global console lock when the system is going down"


	-Mike

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: v5.19-rc2-rt3: nouveau might sleep splat
  2022-06-24 14:31         ` John Ogness
  2022-06-24 14:39           ` Sebastian Andrzej Siewior
@ 2022-07-21 13:06           ` Mike Galbraith
  2022-07-25  8:15             ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 15+ messages in thread
From: Mike Galbraith @ 2022-07-21 13:06 UTC (permalink / raw)
  To: John Ogness, Sebastian Andrzej Siewior; +Cc: RT

On Fri, 2022-06-24 at 16:37 +0206, John Ogness wrote:
> On 2022-06-24, John Ogness <john.ogness@linutronix.de> wrote:
> > As to this the actual problem reported in this thread, I believe
> > this
> > patch solves it:
> >
> > https://lore.kernel.org/lkml/20220609121709.12939-1-Jason@zx2c4.com
>
> There was a cleaner v2 later in the thread:
>
> https://lore.kernel.org/lkml/20220609123319.17576-1-Jason@zx2c4.com

That's in tip, but pulling my bandaid off...

[    3.903338] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
[    3.903340] in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 389, name: systemd-udevd
[    3.903340] preempt_count: 0, expected: 0
[    3.903341] RCU nest depth: 1, expected: 1
[    3.903342] CPU: 0 PID: 389 Comm: systemd-udevd Tainted: G            E     5.19.0.g96ff3a1-tip-rt #8 3a0ec7519629dbce1c881d809a8600cfad8358ca
[    3.903344] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[    3.903345] Call Trace:
[    3.903346]  <TASK>
[    3.903347]  dump_stack_lvl+0x44/0x5c
[    3.903352]  ? migrate_disable+0x37/0xa0
[    3.903354]  __might_resched+0x171/0x1c0
[    3.903357]  rt_spin_lock+0x2d/0x70
[    3.903361]  crng_make_state+0x80/0x1d0
[    3.903364]  _get_random_bytes.part.15+0x4e/0x120
[    3.903366]  ? nvkm_vmm_iter.constprop.11+0x311/0x860 [nouveau 9c7d46b64ad6b15f254390d4ef3bd03268a3030d]
[    3.903449]  ? nvkm_vmm_sparse_unref_ptes+0x80/0x80 [nouveau 9c7d46b64ad6b15f254390d4ef3bd03268a3030d]
[    3.903518]  ? gf100_vmm_pgt_sgl+0x230/0x230 [nouveau 9c7d46b64ad6b15f254390d4ef3bd03268a3030d]
[    3.903588]  default_pointer+0x2a0/0x2d0
[    3.903591]  vsnprintf+0x35e/0x4c0
[    3.903593]  ? gf100_gr_object_new+0x2b/0x70 [nouveau 9c7d46b64ad6b15f254390d4ef3bd03268a3030d]
[    3.903669]  va_format.isra.16+0x6e/0xa0
[    3.903670]  vsnprintf+0x35e/0x4c0
[    3.903673]  vprintk_store+0x140/0x540

Bandaid:

printk: fix RT vprintk_store() might sleep splat

RT can't call vsnprintf() with IRQs disabled, so disable migration
and move the printk_enter_irqsave() call down to where it's needed.

Signed-off-by: Mike Galbraith <efault@gmx.de>
---
 kernel/printk/printk.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2277,8 +2277,7 @@ int vprintk_store(int facility, int leve
 	int ret = 0;
 	u64 ts_nsec;

-	if (!printk_enter_irqsave(recursion_ptr, irqflags))
-		return 0;
+	migrate_disable();

 	/*
 	 * Since the duration of printk() can vary depending on the message
@@ -2313,6 +2312,11 @@ int vprintk_store(int facility, int leve
 	if (dev_info)
 		flags |= LOG_NEWLINE;

+	if (!printk_enter_irqsave(recursion_ptr, irqflags)) {
+		migrate_enable();
+		return 0;
+	}
+
 	if (flags & LOG_CONT) {
 		prb_rec_init_wr(&r, reserve_size);
 		if (prb_reserve_in_last(&e, prb, &r, caller_id, LOG_LINE_MAX)) {
@@ -2369,6 +2373,7 @@ int vprintk_store(int facility, int leve
 	ret = text_len + trunc_msg_len;
 out:
 	printk_exit_irqrestore(recursion_ptr, irqflags);
+	migrate_enable();
 	return ret;
 }



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: v5.19-rc2-rt3: nouveau might sleep splat
  2022-07-21 13:06           ` Mike Galbraith
@ 2022-07-25  8:15             ` Sebastian Andrzej Siewior
  2022-07-25  9:58               ` John Ogness
  0 siblings, 1 reply; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-07-25  8:15 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: John Ogness, RT

On 2022-07-21 15:06:45 [+0200], Mike Galbraith wrote:
> That's in tip, but pulling my bandaid off...
> 
> [    3.903338] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
> [    3.903340] in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 389, name: systemd-udevd
> [    3.903340] preempt_count: 0, expected: 0
> [    3.903341] RCU nest depth: 1, expected: 1
> [    3.903342] CPU: 0 PID: 389 Comm: systemd-udevd Tainted: G            E     5.19.0.g96ff3a1-tip-rt #8 3a0ec7519629dbce1c881d809a8600cfad8358ca
> [    3.903344] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
> [    3.903345] Call Trace:
> [    3.903346]  <TASK>
> [    3.903347]  dump_stack_lvl+0x44/0x5c
> [    3.903352]  ? migrate_disable+0x37/0xa0
> [    3.903354]  __might_resched+0x171/0x1c0
> [    3.903357]  rt_spin_lock+0x2d/0x70
> [    3.903361]  crng_make_state+0x80/0x1d0
> [    3.903364]  _get_random_bytes.part.15+0x4e/0x120
> [    3.903366]  ? nvkm_vmm_iter.constprop.11+0x311/0x860 [nouveau 9c7d46b64ad6b15f254390d4ef3bd03268a3030d]
> [    3.903449]  ? nvkm_vmm_sparse_unref_ptes+0x80/0x80 [nouveau 9c7d46b64ad6b15f254390d4ef3bd03268a3030d]
> [    3.903518]  ? gf100_vmm_pgt_sgl+0x230/0x230 [nouveau 9c7d46b64ad6b15f254390d4ef3bd03268a3030d]
> [    3.903588]  default_pointer+0x2a0/0x2d0
> [    3.903591]  vsnprintf+0x35e/0x4c0
> [    3.903593]  ? gf100_gr_object_new+0x2b/0x70 [nouveau 9c7d46b64ad6b15f254390d4ef3bd03268a3030d]
> [    3.903669]  va_format.isra.16+0x6e/0xa0
> [    3.903670]  vsnprintf+0x35e/0x4c0
> [    3.903673]  vprintk_store+0x140/0x540

I remember asking why it is needed to disable interrupts across
vsnprintf(). John, can I take this as-it or are you sending a new batch?

> Bandaid:
> 
> printk: fix RT vprintk_store() might sleep splat
> 
> RT can't call vsnprintf() with IRQs disabled, so disable migration
> and move the printk_enter_irqsave() call down to where it's needed.
> 
> Signed-off-by: Mike Galbraith <efault@gmx.de>
> ---
>  kernel/printk/printk.c |    9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2277,8 +2277,7 @@ int vprintk_store(int facility, int leve
>  	int ret = 0;
>  	u64 ts_nsec;
> 
> -	if (!printk_enter_irqsave(recursion_ptr, irqflags))
> -		return 0;
> +	migrate_disable();
> 
>  	/*
>  	 * Since the duration of printk() can vary depending on the message
> @@ -2313,6 +2312,11 @@ int vprintk_store(int facility, int leve
>  	if (dev_info)
>  		flags |= LOG_NEWLINE;
> 
> +	if (!printk_enter_irqsave(recursion_ptr, irqflags)) {
> +		migrate_enable();
> +		return 0;
> +	}
> +
>  	if (flags & LOG_CONT) {
>  		prb_rec_init_wr(&r, reserve_size);
>  		if (prb_reserve_in_last(&e, prb, &r, caller_id, LOG_LINE_MAX)) {
> @@ -2369,6 +2373,7 @@ int vprintk_store(int facility, int leve
>  	ret = text_len + trunc_msg_len;
>  out:
>  	printk_exit_irqrestore(recursion_ptr, irqflags);
> +	migrate_enable();
>  	return ret;
>  }
> 
Sebastian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: v5.19-rc2-rt3: nouveau might sleep splat
  2022-07-25  8:15             ` Sebastian Andrzej Siewior
@ 2022-07-25  9:58               ` John Ogness
  2022-07-25 13:26                 ` Mike Galbraith
  2022-08-03 16:10                 ` Petr Mladek
  0 siblings, 2 replies; 15+ messages in thread
From: John Ogness @ 2022-07-25  9:58 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Mike Galbraith; +Cc: RT, Petr Mladek

On 2022-07-25, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> I remember asking why it is needed to disable interrupts across
> vsnprintf(). John, can I take this as-it or are you sending a new
> batch?

Mike's patch only addresses the vsnprintf() to print the prefix and get
the message length. There is also a vscnprintf() for the message itself,
which is still happening with interrupts disabled (see
printk_sprint()). I suppose this patch side-steps the splat because the
first vsnprintf() already triggered the random bytes for the pointer
hash.

I am preparing a new series that addresses this by completely removing
interrupt disabling from the printk() path. This required changes to the
printk_enter() macro, recursion handling, and the printk-ringbuffer
itself.

The focus of my new series are the various non-RT issues reported during
the early 5.19-rc cycles. I still need more time to get the series into
shape for LKML.

If Mike's patch is reliably side-stepping the issue, feel free to take
it.

>> Bandaid:
>> 
>> printk: fix RT vprintk_store() might sleep splat
>> 
>> RT can't call vsnprintf() with IRQs disabled, so disable migration
>> and move the printk_enter_irqsave() call down to where it's needed.

It might be worth mentioning that vscnprintf() is called later in
vprintk_store() when interrupts are disabled, but since it is s-printing
the same string (and probably the same values), new random bytes are not
required. (But someone should verify the reasoning here. I am not really
familiar with the crng code.)

John Ogness

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: v5.19-rc2-rt3: nouveau might sleep splat
  2022-07-25  9:58               ` John Ogness
@ 2022-07-25 13:26                 ` Mike Galbraith
  2022-08-03 16:10                 ` Petr Mladek
  1 sibling, 0 replies; 15+ messages in thread
From: Mike Galbraith @ 2022-07-25 13:26 UTC (permalink / raw)
  To: John Ogness, Sebastian Andrzej Siewior; +Cc: RT, Petr Mladek

On Mon, 2022-07-25 at 12:04 +0206, John Ogness wrote:
> I suppose this patch side-steps the splat because the first
> vsnprintf() already triggered the random bytes for the pointer
> hash.

Yeah, sounds right.

> I am preparing a new series that addresses this by completely removing
> interrupt disabling from the printk() path. This required changes to the
> printk_enter() macro, recursion handling, and the printk-ringbuffer
> itself.
>
> The focus of my new series are the various non-RT issues reported during
> the early 5.19-rc cycles. I still need more time to get the series into
> shape for LKML.
>
> If Mike's patch is reliably side-stepping the issue, feel free to take
> it.

It does, but I'm in no rush... and it is a bandaid :)

	-Mike


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: v5.19-rc2-rt3: nouveau might sleep splat
  2022-07-25  9:58               ` John Ogness
  2022-07-25 13:26                 ` Mike Galbraith
@ 2022-08-03 16:10                 ` Petr Mladek
  2022-08-05 14:23                   ` John Ogness
  1 sibling, 1 reply; 15+ messages in thread
From: Petr Mladek @ 2022-08-03 16:10 UTC (permalink / raw)
  To: John Ogness; +Cc: Sebastian Andrzej Siewior, Mike Galbraith, RT

On Mon 2022-07-25 12:04:50, John Ogness wrote:
> On 2022-07-25, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> > I remember asking why it is needed to disable interrupts across
> > vsnprintf(). John, can I take this as-it or are you sending a new
> > batch?
> 
> Mike's patch only addresses the vsnprintf() to print the prefix and get
> the message length. There is also a vscnprintf() for the message itself,
> which is still happening with interrupts disabled (see
> printk_sprint()). I suppose this patch side-steps the splat because the
> first vsnprintf() already triggered the random bytes for the pointer
> hash.
> 
> I am preparing a new series that addresses this by completely removing
> interrupt disabling from the printk() path. This required changes to the
> printk_enter() macro, recursion handling, and the printk-ringbuffer
> itself.
> 
> The focus of my new series are the various non-RT issues reported during
> the early 5.19-rc cycles. I still need more time to get the series into
> shape for LKML.

The pull request for-5.20 was a great fiasko, see
https://lore.kernel.org/r/CAHk-=wie+VC-R5=Hm=Vrg5PLrJxb1XiV67Efx-9Cr1fBKCWHTQ@mail.gmail.com
We need to be super careful with the next pull request if there will
be any.

My proposal is to start with as conservative version of printk
kthread(s) as possible. It means to start either with a single
kthread. Or use per-console kthreads where the critical section
will be guarded by per-console.

The single kthread will help to avoid any races that were originally
prevented by console_sem().

The spin lock will cause that printk kthreads will never block
the global console_lock while sleeping (at least on non-RT system).

I am not going to accept any removal of disabled interrupts in
the first round. It opens another can of worms.

I understand that enabled interrupts will be needed on RT kernel.
We could do it later when the conservative kthread(s) are accepted.
And we could do it RT-specific. But we have to keep consoles
as reliable as possible in non-RT system.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: v5.19-rc2-rt3: nouveau might sleep splat
  2022-08-03 16:10                 ` Petr Mladek
@ 2022-08-05 14:23                   ` John Ogness
  0 siblings, 0 replies; 15+ messages in thread
From: John Ogness @ 2022-08-05 14:23 UTC (permalink / raw)
  To: Petr Mladek; +Cc: Sebastian Andrzej Siewior, Mike Galbraith, RT

On 2022-08-03, Petr Mladek <pmladek@suse.com> wrote:
>> I am preparing a new series that addresses this by completely removing
>> interrupt disabling from the printk() path. This required changes to the
>> printk_enter() macro, recursion handling, and the printk-ringbuffer
>> itself.
>
> I am not going to accept any removal of disabled interrupts in
> the first round. It opens another can of worms.

To be clear, my new series only removes interrupt disabling for RT. It
uses spin_lock_irqsave() and lock_lock_irqsave() to accomplish this so
that interrupts are still disabled for all !RT preemption models.

John

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-08-05 14:23 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-18  9:25 v5.19-rc2-rt3: nouveau might sleep splat Mike Galbraith
2022-06-24  8:42 ` Sebastian Andrzej Siewior
2022-06-24  9:52   ` Mike Galbraith
2022-06-24 10:18     ` Sebastian Andrzej Siewior
2022-06-24 10:19       ` Sebastian Andrzej Siewior
2022-06-24 14:30       ` John Ogness
2022-06-24 14:31         ` John Ogness
2022-06-24 14:39           ` Sebastian Andrzej Siewior
2022-06-25  3:26             ` Mike Galbraith
2022-07-21 13:06           ` Mike Galbraith
2022-07-25  8:15             ` Sebastian Andrzej Siewior
2022-07-25  9:58               ` John Ogness
2022-07-25 13:26                 ` Mike Galbraith
2022-08-03 16:10                 ` Petr Mladek
2022-08-05 14:23                   ` John Ogness

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.