* [5.14] problem with dynamic memory allocation for object debugging
@ 2021-08-15 10:23 Tetsuo Handa
  2021-08-15 10:42 ` Matthew Wilcox
  2021-08-16  9:10 ` Peter Zijlstra
  0 siblings, 2 replies; 3+ messages in thread
From: Tetsuo Handa @ 2021-08-15 10:23 UTC (permalink / raw)
  To: Thomas Gleixner, Zqiang, Kees Cook, Stephen Boyd, Tejun Heo,
	Lai Jiangshan, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, linux-mm

Hello.

I'm frequently hitting "BUG: Invalid wait context", and it seems that
this problem has not been fixed yet.

This is a conflict between spinlock and raw_spinlock: check_wait_context()
complains if we try to take a spinlock while already holding a raw_spinlock.
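
To illustrate, the nesting that check_wait_context() rejects looks roughly
like this (the lock names are made up for illustration):

	static DEFINE_RAW_SPINLOCK(outer_raw_lock);	/* raw_spinlock_t */
	static DEFINE_SPINLOCK(inner_lock);		/* spinlock_t */

	static void bad_nesting(void)
	{
		raw_spin_lock(&outer_raw_lock);
		/*
		 * On PREEMPT_RT a spinlock_t becomes a sleeping lock while
		 * a raw_spinlock_t stays a real spinlock, so this nesting
		 * would sleep in atomic context.  With
		 * CONFIG_PROVE_RAW_LOCK_NESTING, check_wait_context()
		 * reports "BUG: Invalid wait context" here even on a
		 * non-RT kernel.
		 */
		spin_lock(&inner_lock);
		spin_unlock(&inner_lock);
		raw_spin_unlock(&outer_raw_lock);
	}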

A raw_spinlock is preferred for debugging code, while a normal spinlock is
preferred for regular code. But sometimes debugging code wants to allocate
memory.

fill_pool() (a function in the debugobjects code) uses a raw_spinlock but
calls kmalloc(GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN). This works as
long as fill_pool() is called without any raw_spinlock held.
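
Roughly, the pattern is as follows (a simplified sketch of fill_pool(),
paraphrasing lib/debugobjects.c with batching and bookkeeping omitted):

	static void fill_pool(void)
	{
		gfp_t gfp = GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN;
		struct debug_obj *obj;
		unsigned long flags;

		while (obj_pool_free < debug_objects_pool_min_level) {
			/*
			 * The allocation happens outside of pool_lock, but
			 * the allocator may take SLUB's spinlock_t.  That is
			 * fine unless the *caller* of fill_pool() already
			 * holds a raw_spinlock.
			 */
			obj = kmem_cache_zalloc(obj_cache, gfp);
			if (!obj)
				return;

			raw_spin_lock_irqsave(&pool_lock, flags);
			hlist_add_head(&obj->node, &obj_pool);
			obj_pool_free++;
			raw_spin_unlock_irqrestore(&pool_lock, flags);
		}
	}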

However, functions in the workqueue code (e.g. __queue_work()) also use
raw_spinlocks, and they trigger fill_pool() via a debug_object_activate()
call. If fill_pool() hits the kmalloc() path, lockdep complains, because
get_partial_node() uses a normal spinlock.

[   17.726419][   T41] 
[   17.727873][   T41] =============================
[   17.730788][   T41] [ BUG: Invalid wait context ]
[   17.733519][   T41] 5.14.0-rc5+ #742 Not tainted
[   17.735111][   T41] -----------------------------
[   17.736747][   T41] kworker/2:1/41 is trying to lock:
[   17.738468][   T41] ffff888005854d18 (&n->list_lock){-.-.}-{3:3}, at: get_partial_node.isra.0.part.0+0x48/0x4c0
[   17.741896][   T41] other info that might help us debug this:
[   17.743970][   T41] context-{5:5}
[   17.745132][   T41] 3 locks held by kworker/2:1/41:
[   17.746798][   T41]  #0: ffff888005871548 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x21e/0x600
[   17.750250][   T41]  #1: ffffc90000823e58 ((work_completion)(&(&krcp->monitor_work)->work)){+.+.}-{0:0}, at: process_one_work+0x21e/0x600
[   17.754416][   T41]  #2: ffff88807cfebe40 (krc.lock){....}-{2:2}, at: kfree_rcu_monitor+0x1d/0x150
[   17.757302][   T52] cfg80211: failed to load regulatory.db
[   17.757462][   T41] stack backtrace:
[   17.757464][   T41] CPU: 2 PID: 41 Comm: kworker/2:1 Not tainted 5.14.0-rc5+ #742
[   17.757466][   T41] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
[   17.757468][   T41] Workqueue: events kfree_rcu_monitor
[   17.757474][   T41] Call Trace:
[   17.757476][   T41]  dump_stack_lvl+0x57/0x7d
[   17.757481][   T41]  __lock_acquire.cold+0x213/0x2f2
[   17.757486][   T41]  lock_acquire+0xd0/0x300
[   17.757490][   T41]  ? get_partial_node.isra.0.part.0+0x48/0x4c0
[   17.757494][   T41]  ? __bfs+0x10a/0x240
[   17.757496][   T41]  _raw_spin_lock+0x2c/0x70
[   17.757500][   T41]  ? get_partial_node.isra.0.part.0+0x48/0x4c0
[   17.757503][   T41]  get_partial_node.isra.0.part.0+0x48/0x4c0
[   17.757506][   T41]  ? ret_from_fork+0x1f/0x30
[   17.757510][   T41]  ___slab_alloc+0x4a0/0x9c0
[   17.757513][   T41]  ? fill_pool+0x14d/0x220
[   17.757517][   T41]  ? check_prev_add+0x32f/0xc00
[   17.757519][   T41]  ? __slab_alloc.isra.0+0x22/0x80
[   17.757522][   T41]  __slab_alloc.isra.0+0x22/0x80
[   17.757524][   T41]  kmem_cache_alloc+0x334/0x360
[   17.757527][   T41]  ? fill_pool+0x14d/0x220
[   17.757530][   T41]  fill_pool+0x14d/0x220
[   17.757533][   T41]  ? debug_object_activate+0xb2/0x210
[   17.757536][   T41]  ? find_held_lock+0x2b/0x80
[   17.757537][   T41]  __debug_object_init+0x2e/0x3a0
[   17.757540][   T41]  debug_object_activate+0x12a/0x210
[   17.757543][   T41]  ? delayed_work_timer_fn+0x10/0x10
[   17.757547][   T41]  call_rcu+0x2c/0x330
[   17.757550][   T41]  queue_rcu_work+0x22/0x30
[   17.757553][   T41]  kfree_rcu_monitor+0xff/0x150
[   17.757555][   T41]  process_one_work+0x2aa/0x600
[   17.757559][   T41]  worker_thread+0x48/0x3d0
[   17.757562][   T41]  ? process_one_work+0x600/0x600
[   17.757564][   T41]  kthread+0x13e/0x170
[   17.757567][   T41]  ? set_kthread_struct+0x40/0x40
[   17.757570][   T41]  ret_from_fork+0x1f/0x30

The question is how to solve this problem.

Can all locks held in the atomic memory allocation path be converted to
raw_spinlock? If so, that would be nice.
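
The conversion itself would be mechanical, e.g. something like the
following sketch with a made-up lock (not a concrete patch for SLUB):

	struct freelist {
		raw_spinlock_t lock;		/* was: spinlock_t lock; */
		struct hlist_head objects;
	};

	static struct hlist_node *freelist_pop(struct freelist *fl)
	{
		struct hlist_node *obj = NULL;
		unsigned long flags;

		/* was: spin_lock_irqsave(&fl->lock, flags); */
		raw_spin_lock_irqsave(&fl->lock, flags);
		if (!hlist_empty(&fl->objects)) {
			obj = fl->objects.first;
			hlist_del(obj);
		}
		raw_spin_unlock_irqrestore(&fl->lock, flags);
		return obj;
	}

The question is whether that is acceptable for the slab allocator, since
raw_spinlock sections stay non-preemptible even on PREEMPT_RT.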

Can the debugobjects code preallocate memory asynchronously so that
debug_object_activate() can be called from any context?
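
For example (a hypothetical sketch only; none of these helpers exist in
the debugobjects code today), the pool could be refilled from process
context via a work item, so the allocation never runs under a caller's
raw_spinlock:

	/* Hypothetical: refill the debug object pool from process context. */
	static void debug_objects_refill_workfn(struct work_struct *work)
	{
		/*
		 * No raw_spinlock is held here, so fill_pool() may take the
		 * allocator's spinlock_t without any nesting problem.
		 */
		fill_pool();
	}
	static DECLARE_WORK(debug_objects_refill_work,
			    debug_objects_refill_workfn);

	static void request_pool_refill(void)
	{
		/* Safe to call from atomic / raw-spinlock context. */
		schedule_work(&debug_objects_refill_work);
	}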

Can the workqueue code stop using raw_spinlocks (e.g. by reverting
commit a9b8a985294debae ("workqueue: Convert the pool::lock
and wq_mayday_lock to raw_spinlock_t"))?



* Re: [5.14] problem with dynamic memory allocation for object debugging
  2021-08-15 10:23 [5.14] problem with dynamic memory allocation for object debugging Tetsuo Handa
@ 2021-08-15 10:42 ` Matthew Wilcox
  2021-08-16  9:10 ` Peter Zijlstra
  1 sibling, 0 replies; 3+ messages in thread
From: Matthew Wilcox @ 2021-08-15 10:42 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Thomas Gleixner, Zqiang, Kees Cook, Stephen Boyd, Tejun Heo,
	Lai Jiangshan, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, linux-mm

On Sun, Aug 15, 2021 at 07:23:03PM +0900, Tetsuo Handa wrote:
> Hello.
> 
> I'm frequently hitting "BUG: Invalid wait context", and it seems that
> this problem has not been fixed yet.
> 
> This is a conflict between spinlock and raw_spinlock: check_wait_context()
> complains if we try to take a spinlock while already holding a raw_spinlock.

Disable this option:

config PROVE_RAW_LOCK_NESTING
        bool "Enable raw_spinlock - spinlock nesting checks"
        depends on PROVE_LOCKING
        default n
        help
         Enable the raw_spinlock vs. spinlock nesting checks which ensure
         that the lock nesting rules for PREEMPT_RT enabled kernels are
         not violated.

         NOTE: There are known nesting problems. So if you enable this
         option expect lockdep splats until these problems have been fully
         addressed which is work in progress. This config switch allows to
         identify and analyze these problems. It will be removed and the
         check permanently enabled once the main issues have been fixed.

         If unsure, select N.

If you're not working on removing these splats yourself, then reporting
them has little value.



* Re: [5.14] problem with dynamic memory allocation for object debugging
  2021-08-15 10:23 [5.14] problem with dynamic memory allocation for object debugging Tetsuo Handa
  2021-08-15 10:42 ` Matthew Wilcox
@ 2021-08-16  9:10 ` Peter Zijlstra
  1 sibling, 0 replies; 3+ messages in thread
From: Peter Zijlstra @ 2021-08-16  9:10 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Thomas Gleixner, Zqiang, Kees Cook, Stephen Boyd, Tejun Heo,
	Lai Jiangshan, Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng,
	linux-mm

On Sun, Aug 15, 2021 at 07:23:03PM +0900, Tetsuo Handa wrote:
> Hello.
> 
> I'm frequently hitting "BUG: Invalid wait context", and it seems that
> this problem has not been fixed yet.

Unless you're doing PREEMPT_RT development, don't use this lockdep
option. Once we get mainline clean, I'll enable it by default and make it
go away; until then, don't use it unless you're interested.


