* [syzbot] [kernel?] possible deadlock in __mod_timer (2)
@ 2023-05-10 22:10 syzbot
  2023-05-11 13:47 ` [PATCH] debugobject: don't wake up kswapd from fill_pool() Tetsuo Handa
  0 siblings, 1 reply; 12+ messages in thread
From: syzbot @ 2023-05-10 22:10 UTC (permalink / raw)
  To: bpf, brauner, linux-kernel, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    1dc3731daf1f Merge tag 'for-6.4-rc1-tag' of git://git.kern..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=153dd834280000
kernel config:  https://syzkaller.appspot.com/x/.config?x=8bc832f563d8bf38
dashboard link: https://syzkaller.appspot.com/bug?extid=fe0c72f0ccbb93786380
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
userspace arch: i386

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7bc7510fe41f/non_bootable_disk-1dc3731d.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/9ac41c523f85/vmlinux-1dc3731d.xz
kernel image: https://storage.googleapis.com/syzbot-assets/40b82936b92f/bzImage-1dc3731d.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+fe0c72f0ccbb93786380@syzkaller.appspotmail.com

======================================================
WARNING: possible circular locking dependency detected
6.4.0-rc1-syzkaller-00011-g1dc3731daf1f #0 Not tainted
------------------------------------------------------
kworker/u16:0/11 is trying to acquire lock:
ffff88807ffdaba0 (&pgdat->kswapd_wait){....}-{2:2}, at: __wake_up_common_lock+0xb8/0x140 kernel/sched/wait.c:137

but task is already holding lock:
ffff88802c7296d8 (&base->lock){-.-.}-{2:2}, at: __mod_timer+0x69c/0xe80 kernel/time/timer.c:1112

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #3 (&base->lock){-.-.}-{2:2}:
       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
       _raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162
       lock_timer_base+0x5a/0x1f0 kernel/time/timer.c:999
       __mod_timer+0x3f9/0xe80 kernel/time/timer.c:1080
       add_timer+0x62/0x90 kernel/time/timer.c:1244
       __queue_delayed_work+0x1a7/0x270 kernel/workqueue.c:1685
       queue_delayed_work_on+0x109/0x120 kernel/workqueue.c:1710
       psi_task_change+0x1bf/0x300 kernel/sched/psi.c:920
       psi_enqueue kernel/sched/stats.h:139 [inline]
       enqueue_task kernel/sched/core.c:2078 [inline]
       activate_task kernel/sched/core.c:2112 [inline]
       wake_up_new_task+0xc13/0x1000 kernel/sched/core.c:4833
       kernel_clone+0x219/0x890 kernel/fork.c:2949
       user_mode_thread+0xb1/0xf0 kernel/fork.c:2996
       rest_init+0x27/0x2b0 init/main.c:700
       arch_call_rest_init+0x13/0x30 init/main.c:834
       start_kernel+0x3b6/0x490 init/main.c:1088
       x86_64_start_reservations+0x18/0x30 arch/x86/kernel/head64.c:556
       x86_64_start_kernel+0xb3/0xc0 arch/x86/kernel/head64.c:537
       secondary_startup_64_no_verify+0xf4/0xfb

-> #2 (&rq->__lock){-.-.}-{2:2}:
       _raw_spin_lock_nested+0x34/0x40 kernel/locking/spinlock.c:378
       raw_spin_rq_lock_nested+0x2f/0x120 kernel/sched/core.c:558
       raw_spin_rq_lock kernel/sched/sched.h:1366 [inline]
       rq_lock kernel/sched/sched.h:1653 [inline]
       task_fork_fair+0x74/0x4f0 kernel/sched/fair.c:12095
       sched_cgroup_fork+0x3d1/0x540 kernel/sched/core.c:4777
       copy_process+0x4b8a/0x7600 kernel/fork.c:2618
       kernel_clone+0xeb/0x890 kernel/fork.c:2918
       user_mode_thread+0xb1/0xf0 kernel/fork.c:2996
       rest_init+0x27/0x2b0 init/main.c:700
       arch_call_rest_init+0x13/0x30 init/main.c:834
       start_kernel+0x3b6/0x490 init/main.c:1088
       x86_64_start_reservations+0x18/0x30 arch/x86/kernel/head64.c:556
       x86_64_start_kernel+0xb3/0xc0 arch/x86/kernel/head64.c:537
       secondary_startup_64_no_verify+0xf4/0xfb

-> #1 (&p->pi_lock){-.-.}-{2:2}:
       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
       _raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162
       try_to_wake_up+0xab/0x1c40 kernel/sched/core.c:4191
       autoremove_wake_function+0x16/0x150 kernel/sched/wait.c:419
       __wake_up_common+0x147/0x650 kernel/sched/wait.c:107
       __wake_up_common_lock+0xd4/0x140 kernel/sched/wait.c:138
       wakeup_kswapd+0x3fe/0x5c0 mm/vmscan.c:7797
       rmqueue mm/page_alloc.c:3057 [inline]
       get_page_from_freelist+0x6c5/0x2c00 mm/page_alloc.c:3499
       __alloc_pages+0x1cb/0x4a0 mm/page_alloc.c:4768
       __folio_alloc+0x16/0x40 mm/page_alloc.c:4800
       vma_alloc_folio+0x155/0x890 mm/mempolicy.c:2240
       wp_page_copy mm/memory.c:3070 [inline]
       do_wp_page+0x173d/0x33c0 mm/memory.c:3432
       handle_pte_fault mm/memory.c:4964 [inline]
       __handle_mm_fault+0x1635/0x41c0 mm/memory.c:5089
       handle_mm_fault+0x2af/0x9f0 mm/memory.c:5243
       do_user_addr_fault+0x2ca/0x1210 arch/x86/mm/fault.c:1349
       handle_page_fault arch/x86/mm/fault.c:1534 [inline]
       exc_page_fault+0x98/0x170 arch/x86/mm/fault.c:1590
       asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:570

-> #0 (&pgdat->kswapd_wait){....}-{2:2}:
       check_prev_add kernel/locking/lockdep.c:3108 [inline]
       check_prevs_add kernel/locking/lockdep.c:3227 [inline]
       validate_chain kernel/locking/lockdep.c:3842 [inline]
       __lock_acquire+0x2f21/0x5df0 kernel/locking/lockdep.c:5074
       lock_acquire kernel/locking/lockdep.c:5691 [inline]
       lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5656
       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
       _raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162
       __wake_up_common_lock+0xb8/0x140 kernel/sched/wait.c:137
       wakeup_kswapd+0x3fe/0x5c0 mm/vmscan.c:7797
       wake_all_kswapds+0x182/0x2d0 mm/page_alloc.c:4028
       __alloc_pages_slowpath.constprop.0+0x1724/0x2170 mm/page_alloc.c:4296
       __alloc_pages+0x408/0x4a0 mm/page_alloc.c:4781
       alloc_pages+0x1aa/0x270 mm/mempolicy.c:2279
       alloc_slab_page mm/slub.c:1851 [inline]
       allocate_slab+0x25f/0x390 mm/slub.c:1998
       new_slab mm/slub.c:2051 [inline]
       ___slab_alloc+0xa91/0x1400 mm/slub.c:3192
       __slab_alloc.constprop.0+0x56/0xa0 mm/slub.c:3291
       __slab_alloc_node mm/slub.c:3344 [inline]
       slab_alloc_node mm/slub.c:3441 [inline]
       slab_alloc mm/slub.c:3459 [inline]
       __kmem_cache_alloc_lru mm/slub.c:3466 [inline]
       kmem_cache_alloc+0x38e/0x3b0 mm/slub.c:3475
       kmem_cache_zalloc include/linux/slab.h:670 [inline]
       fill_pool+0x264/0x5c0 lib/debugobjects.c:168
       debug_objects_fill_pool lib/debugobjects.c:597 [inline]
       debug_object_activate+0xfd/0x400 lib/debugobjects.c:693
       debug_timer_activate kernel/time/timer.c:782 [inline]
       __mod_timer+0x80d/0xe80 kernel/time/timer.c:1119
       add_timer+0x62/0x90 kernel/time/timer.c:1244
       __queue_delayed_work+0x1a7/0x270 kernel/workqueue.c:1685
       queue_delayed_work_on+0x109/0x120 kernel/workqueue.c:1710
       process_one_work+0x99a/0x15e0 kernel/workqueue.c:2405
       worker_thread+0x67d/0x10c0 kernel/workqueue.c:2552
       kthread+0x344/0x440 kernel/kthread.c:379
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308

other info that might help us debug this:

Chain exists of:
  &pgdat->kswapd_wait --> &rq->__lock --> &base->lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&base->lock);
                               lock(&rq->__lock);
                               lock(&base->lock);
  lock(&pgdat->kswapd_wait);

 *** DEADLOCK ***

3 locks held by kworker/u16:0/11:
 #0: ffff8880444ff138 ((wq_completion)bat_events){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
 #0: ffff8880444ff138 ((wq_completion)bat_events){+.+.}-{0:0}, at: arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline]
 #0: ffff8880444ff138 ((wq_completion)bat_events){+.+.}-{0:0}, at: atomic_long_set include/linux/atomic/atomic-instrumented.h:1324 [inline]
 #0: ffff8880444ff138 ((wq_completion)bat_events){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:643 [inline]
 #0: ffff8880444ff138 ((wq_completion)bat_events){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:670 [inline]
 #0: ffff8880444ff138 ((wq_completion)bat_events){+.+.}-{0:0}, at: process_one_work+0x883/0x15e0 kernel/workqueue.c:2376
 #1: ffffc900003d7db0 ((work_completion)(&(&bat_priv->nc.work)->work)){+.+.}-{0:0}, at: process_one_work+0x8b7/0x15e0 kernel/workqueue.c:2380
 #2: ffff88802c7296d8 (&base->lock){-.-.}-{2:2}, at: __mod_timer+0x69c/0xe80 kernel/time/timer.c:1112

stack backtrace:
CPU: 1 PID: 11 Comm: kworker/u16:0 Not tainted 6.4.0-rc1-syzkaller-00011-g1dc3731daf1f #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
Workqueue: bat_events batadv_nc_worker
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
 check_noncircular+0x25f/0x2e0 kernel/locking/lockdep.c:2188
 check_prev_add kernel/locking/lockdep.c:3108 [inline]
 check_prevs_add kernel/locking/lockdep.c:3227 [inline]
 validate_chain kernel/locking/lockdep.c:3842 [inline]
 __lock_acquire+0x2f21/0x5df0 kernel/locking/lockdep.c:5074
 lock_acquire kernel/locking/lockdep.c:5691 [inline]
 lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5656
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
 _raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162
 __wake_up_common_lock+0xb8/0x140 kernel/sched/wait.c:137
 wakeup_kswapd+0x3fe/0x5c0 mm/vmscan.c:7797
 wake_all_kswapds+0x182/0x2d0 mm/page_alloc.c:4028
 __alloc_pages_slowpath.constprop.0+0x1724/0x2170 mm/page_alloc.c:4296
 __alloc_pages+0x408/0x4a0 mm/page_alloc.c:4781
 alloc_pages+0x1aa/0x270 mm/mempolicy.c:2279
 alloc_slab_page mm/slub.c:1851 [inline]
 allocate_slab+0x25f/0x390 mm/slub.c:1998
 new_slab mm/slub.c:2051 [inline]
 ___slab_alloc+0xa91/0x1400 mm/slub.c:3192
 __slab_alloc.constprop.0+0x56/0xa0 mm/slub.c:3291
 __slab_alloc_node mm/slub.c:3344 [inline]
 slab_alloc_node mm/slub.c:3441 [inline]
 slab_alloc mm/slub.c:3459 [inline]
 __kmem_cache_alloc_lru mm/slub.c:3466 [inline]
 kmem_cache_alloc+0x38e/0x3b0 mm/slub.c:3475
 kmem_cache_zalloc include/linux/slab.h:670 [inline]
 fill_pool+0x264/0x5c0 lib/debugobjects.c:168
 debug_objects_fill_pool lib/debugobjects.c:597 [inline]
 debug_object_activate+0xfd/0x400 lib/debugobjects.c:693
 debug_timer_activate kernel/time/timer.c:782 [inline]
 __mod_timer+0x80d/0xe80 kernel/time/timer.c:1119
 add_timer+0x62/0x90 kernel/time/timer.c:1244
 __queue_delayed_work+0x1a7/0x270 kernel/workqueue.c:1685
 queue_delayed_work_on+0x109/0x120 kernel/workqueue.c:1710
 process_one_work+0x99a/0x15e0 kernel/workqueue.c:2405
 worker_thread+0x67d/0x10c0 kernel/workqueue.c:2552
 kthread+0x344/0x440 kernel/kthread.c:379
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the bug is already fixed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to change bug's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the bug is a duplicate of another bug, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


* [PATCH] debugobject: don't wake up kswapd from fill_pool()
  2023-05-10 22:10 [syzbot] [kernel?] possible deadlock in __mod_timer (2) syzbot
@ 2023-05-11 13:47 ` Tetsuo Handa
  2023-05-12  3:44   ` Andrew Morton
  2023-05-22 12:56   ` [tip: core/debugobjects] debugobjects: Don't " tip-bot2 for Tetsuo Handa
  0 siblings, 2 replies; 12+ messages in thread
From: Tetsuo Handa @ 2023-05-11 13:47 UTC (permalink / raw)
  To: syzbot, syzkaller-bugs, Ingo Molnar, Thomas Gleixner, Andrew Morton
  Cc: linux-kernel, linux-mm

syzbot is reporting a lockdep warning in fill_pool(), because GFP_ATOMIC is
(__GFP_HIGH | __GFP_KSWAPD_RECLAIM), which wakes up kswapd.
Since fill_pool() might be called with arbitrary locks held,
fill_pool() should not assume that holding pgdat->kswapd_wait is safe.

Also, __GFP_NORETRY is pointless for a !__GFP_DIRECT_RECLAIM allocation.

Reported-by: syzbot <syzbot+fe0c72f0ccbb93786380@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=fe0c72f0ccbb93786380
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 3ac7fe5a4aab ("infrastructure to debug (dynamic) objects")
---
 lib/debugobjects.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 003edc5ebd67..986adca357b4 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -126,7 +126,7 @@ static const char *obj_states[ODEBUG_STATE_MAX] = {
 
 static void fill_pool(void)
 {
-	gfp_t gfp = GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN;
+	gfp_t gfp = __GFP_HIGH | __GFP_NOWARN;
 	struct debug_obj *obj;
 	unsigned long flags;
 
-- 
2.18.4



* Re: [PATCH] debugobject: don't wake up kswapd from fill_pool()
  2023-05-11 13:47 ` [PATCH] debugobject: don't wake up kswapd from fill_pool() Tetsuo Handa
@ 2023-05-12  3:44   ` Andrew Morton
  2023-05-12 10:57     ` Tetsuo Handa
  2023-05-22 12:56   ` [tip: core/debugobjects] debugobjects: Don't " tip-bot2 for Tetsuo Handa
  1 sibling, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2023-05-12  3:44 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: syzbot, syzkaller-bugs, Ingo Molnar, Thomas Gleixner,
	linux-kernel, linux-mm

On Thu, 11 May 2023 22:47:32 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:

> syzbot is reporting lockdep warning in fill_pool(), for GFP_ATOMIC is
> (__GFP_HIGH | __GFP_KSWAPD_RECLAIM) which wakes up kswapd.
> Since fill_pool() might be called with arbitrary locks held,
> fill_pool() should not assume that holding pgdat->kswapd_wait is safe.

hm.  But many GFP_ATOMIC allocation attempts are made with locks held. 
Why aren't all such callers buggy, by trying to wake kswapd with locks
held?  What's special about this one?

> Also, __GFP_NORETRY is pointless for !__GFP_DIRECT_RECLAIM allocation
> 
> Reported-by: syzbot <syzbot+fe0c72f0ccbb93786380@syzkaller.appspotmail.com>
> Closes: https://syzkaller.appspot.com/bug?extid=fe0c72f0ccbb93786380
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Fixes: 3ac7fe5a4aab ("infrastructure to debug (dynamic) objects")
> ---
>  lib/debugobjects.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/debugobjects.c b/lib/debugobjects.c
> index 003edc5ebd67..986adca357b4 100644
> --- a/lib/debugobjects.c
> +++ b/lib/debugobjects.c
> @@ -126,7 +126,7 @@ static const char *obj_states[ODEBUG_STATE_MAX] = {
>  
>  static void fill_pool(void)
>  {
> -	gfp_t gfp = GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN;
> +	gfp_t gfp = __GFP_HIGH | __GFP_NOWARN;

Does this weaken fill_pool()'s allocation attempt more than necessary? 
We can still pass __GFP_HIGH?


* Re: [PATCH] debugobject: don't wake up kswapd from fill_pool()
  2023-05-12  3:44   ` Andrew Morton
@ 2023-05-12 10:57     ` Tetsuo Handa
  2023-05-12 12:54       ` Thomas Gleixner
  0 siblings, 1 reply; 12+ messages in thread
From: Tetsuo Handa @ 2023-05-12 10:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: syzbot, syzkaller-bugs, Ingo Molnar, Thomas Gleixner,
	linux-kernel, linux-mm

On 2023/05/12 12:44, Andrew Morton wrote:
> On Thu, 11 May 2023 22:47:32 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> 
>> syzbot is reporting lockdep warning in fill_pool(), for GFP_ATOMIC is
>> (__GFP_HIGH | __GFP_KSWAPD_RECLAIM) which wakes up kswapd.
>> Since fill_pool() might be called with arbitrary locks held,
>> fill_pool() should not assume that holding pgdat->kswapd_wait is safe.
> 
> hm.  But many GFP_ATOMIC allocation attempts are made with locks held. 
> Why aren't all such callers buggy, by trying to wake kswapd with locks
> held?  What's special about this one?

Because debugobjects cannot know which locks are held when fill_pool() performs
a GFP_ATOMIC allocation.

syzbot is reporting a base->lock => pgdat->kswapd_wait dependency

  add_timer() {
    __mod_timer() {
      base = lock_timer_base(timer, &flags);
      new_base = get_target_base(base, timer->flags);
      if (base != new_base) {
        raw_spin_unlock(&base->lock);
        base = new_base;
        raw_spin_lock(&base->lock);
      }
      debug_timer_activate(timer) {
        debug_object_activate(timer, &timer_debug_descr) {
          debug_objects_fill_pool() {
            fill_pool() {
              kmem_cache_zalloc(GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN) {
                // wakes kswapd
              }
            }
          }
        }
      }
      raw_spin_unlock_irqrestore(&base->lock, flags);
    }
  }

when a pgdat->kswapd_wait => p->pi_lock dependency

  __alloc_pages() {
    get_page_from_freelist() {
      rmqueue() {
        wakeup_kswapd() {
          wake_up_interruptible(&pgdat->kswapd_wait) {
            __wake_up_common_lock() {
              spin_lock_irqsave(&pgdat->kswapd_wait.lock, flags);
              __wake_up_common() {
                autoremove_wake_function() {
                  try_to_wake_up() {
                    raw_spin_lock_irqsave(&p->pi_lock, flags);
                    raw_spin_unlock_irqrestore(&p->pi_lock, flags);
                  }
                }
              }
              spin_unlock_irqrestore(&pgdat->kswapd_wait.lock, flags);
            }
          }
        }
      }
    }
  }

and a p->pi_lock => rq->__lock => base->lock dependency

  wake_up_new_task() {
    raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
    rq = __task_rq_lock(p, &rf); // acquires rq->lock
    activate_task(rq, p, ENQUEUE_NOCLOCK) {
      enqueue_task() {
        psi_enqueue() {
          psi_task_change() {
            queue_delayed_work_on() {
              __queue_delayed_work() {
                add_timer() {
                  __mod_timer() {
                    base = lock_timer_base(timer, &flags); // acquires base->lock
                    debug_timer_activate(timer); // possible base->lock => pgdat->kswapd_wait => p->pi_lock dependency
                    raw_spin_unlock_irqrestore(&base->lock, flags);
                  }
                }
              }
            }
          }
        }
      }
    }
    task_rq_unlock(rq, p, &rf);
  }

exist.

All GFP_ATOMIC allocation users are supposed to be aware of which locks
are held, and are supposed to explicitly remove __GFP_KSWAPD_RECLAIM
if waking up kswapd can cause a deadlock. But the reality is that we cannot
be careful enough to stay error-free. Who would imagine that a GFP_ATOMIC
allocation performed while base->lock is held can form a circular locking
dependency?

> 
>> Also, __GFP_NORETRY is pointless for !__GFP_DIRECT_RECLAIM allocation

__GFP_NORETRY is not checked for a !__GFP_DIRECT_RECLAIM allocation, and
GFP_ATOMIC minus __GFP_KSWAPD_RECLAIM is __GFP_HIGH.

>>
>> @@ -126,7 +126,7 @@ static const char *obj_states[ODEBUG_STATE_MAX] = {
>>  
>>  static void fill_pool(void)
>>  {
>> -	gfp_t gfp = GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN;
>> +	gfp_t gfp = __GFP_HIGH | __GFP_NOWARN;
> 
> Does this weaken fill_pool()'s allocation attempt more than necessary? 
> We can still pass __GFP_HIGH?

What do you mean? I think that killing the base->lock => pgdat->kswapd_wait
dependency by removing __GFP_KSWAPD_RECLAIM is the right fix. This weakening
is needed to avoid the base->lock => pgdat->kswapd_wait dependency in the
debugobjects code.

For locking dependency safety, I wish that GFP_ATOMIC / GFP_NOWAIT did not imply
__GFP_KSWAPD_RECLAIM. Such allocations should not be trying to allocate so many
pages that even __GFP_HIGH fails; and if they are, they have likely already
failed before background kswapd reclaim finds some reusable pages....



* Re: [PATCH] debugobject: don't wake up kswapd from fill_pool()
  2023-05-12 10:57     ` Tetsuo Handa
@ 2023-05-12 12:54       ` Thomas Gleixner
  2023-05-12 13:09         ` Tetsuo Handa
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2023-05-12 12:54 UTC (permalink / raw)
  To: Tetsuo Handa, Andrew Morton
  Cc: syzbot, syzkaller-bugs, Ingo Molnar, linux-kernel, linux-mm

On Fri, May 12 2023 at 19:57, Tetsuo Handa wrote:
> On 2023/05/12 12:44, Andrew Morton wrote:
>> On Thu, 11 May 2023 22:47:32 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>> 
>>> syzbot is reporting lockdep warning in fill_pool(), for GFP_ATOMIC is
>>> (__GFP_HIGH | __GFP_KSWAPD_RECLAIM) which wakes up kswapd.
>>> Since fill_pool() might be called with arbitrary locks held,
>>> fill_pool() should not assume that holding pgdat->kswapd_wait is safe.

https://lore.kernel.org/lkml/871qjldbes.ffs@tglx/


* Re: [PATCH] debugobject: don't wake up kswapd from fill_pool()
  2023-05-12 12:54       ` Thomas Gleixner
@ 2023-05-12 13:09         ` Tetsuo Handa
  2023-05-12 18:07           ` Thomas Gleixner
  0 siblings, 1 reply; 12+ messages in thread
From: Tetsuo Handa @ 2023-05-12 13:09 UTC (permalink / raw)
  To: Thomas Gleixner, Andrew Morton
  Cc: syzbot, syzkaller-bugs, Ingo Molnar, linux-kernel, linux-mm

On 2023/05/12 21:54, Thomas Gleixner wrote:
> On Fri, May 12 2023 at 19:57, Tetsuo Handa wrote:
>> On 2023/05/12 12:44, Andrew Morton wrote:
>>> On Thu, 11 May 2023 22:47:32 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>>>
>>>> syzbot is reporting lockdep warning in fill_pool(), for GFP_ATOMIC is
>>>> (__GFP_HIGH | __GFP_KSWAPD_RECLAIM) which wakes up kswapd.
>>>> Since fill_pool() might be called with arbitrary locks held,
>>>> fill_pool() should not assume that holding pgdat->kswapd_wait is safe.
> 
> https://lore.kernel.org/lkml/871qjldbes.ffs@tglx/

.config says IS_ENABLED(CONFIG_PREEMPT_RT) == false, and lockdep reports the
base->lock => pgdat->kswapd_wait => p->pi_lock => rq->__lock => base->lock
dependency but says nothing about db->lock.

How can your patch fix this problem?



* Re: [PATCH] debugobject: don't wake up kswapd from fill_pool()
  2023-05-12 13:09         ` Tetsuo Handa
@ 2023-05-12 18:07           ` Thomas Gleixner
  2023-05-12 23:13             ` Tetsuo Handa
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2023-05-12 18:07 UTC (permalink / raw)
  To: Tetsuo Handa, Andrew Morton
  Cc: syzbot, syzkaller-bugs, Ingo Molnar, linux-kernel, linux-mm

On Fri, May 12 2023 at 22:09, Tetsuo Handa wrote:
> On 2023/05/12 21:54, Thomas Gleixner wrote:
>> On Fri, May 12 2023 at 19:57, Tetsuo Handa wrote:
>>> On 2023/05/12 12:44, Andrew Morton wrote:
>>>> On Thu, 11 May 2023 22:47:32 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>>>>
>>>>> syzbot is reporting lockdep warning in fill_pool(), for GFP_ATOMIC is
>>>>> (__GFP_HIGH | __GFP_KSWAPD_RECLAIM) which wakes up kswapd.
>>>>> Since fill_pool() might be called with arbitrary locks held,
>>>>> fill_pool() should not assume that holding pgdat->kswapd_wait is safe.
>> 
>> https://lore.kernel.org/lkml/871qjldbes.ffs@tglx/
>
> .config says IS_ENABLED(CONFIG_PREEMPT_RT) == false, and lockdep says about
> base->lock => pgdat->kswapd_wait => p->pi_lock => rq->__lock => base->lock
> dependency but does not say about db->lock.
>
> How can your patch fix this problem?

It's described in the changelog, no?

The main change is to make the refill invocation conditional when the
lookup fails. That's how that code has been from day one.

The patch which closed the race recently wrecked those refill
opportunities, and the fix for that introduced this problem.

Thanks,

        tglx


* Re: [PATCH] debugobject: don't wake up kswapd from fill_pool()
  2023-05-12 18:07           ` Thomas Gleixner
@ 2023-05-12 23:13             ` Tetsuo Handa
  2023-05-13  8:33               ` Thomas Gleixner
  0 siblings, 1 reply; 12+ messages in thread
From: Tetsuo Handa @ 2023-05-12 23:13 UTC (permalink / raw)
  To: Thomas Gleixner, Andrew Morton
  Cc: syzbot, syzkaller-bugs, Ingo Molnar, linux-kernel, linux-mm

On 2023/05/13 3:07, Thomas Gleixner wrote:
> On Fri, May 12 2023 at 22:09, Tetsuo Handa wrote:
>> On 2023/05/12 21:54, Thomas Gleixner wrote:
>>> On Fri, May 12 2023 at 19:57, Tetsuo Handa wrote:
>>>> On 2023/05/12 12:44, Andrew Morton wrote:
>>>>> On Thu, 11 May 2023 22:47:32 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>>>>>
>>>>>> syzbot is reporting lockdep warning in fill_pool(), for GFP_ATOMIC is
>>>>>> (__GFP_HIGH | __GFP_KSWAPD_RECLAIM) which wakes up kswapd.
>>>>>> Since fill_pool() might be called with arbitrary locks held,
>>>>>> fill_pool() should not assume that holding pgdat->kswapd_wait is safe.
>>>
>>> https://lore.kernel.org/lkml/871qjldbes.ffs@tglx/
>>
>> .config says IS_ENABLED(CONFIG_PREEMPT_RT) == false, and lockdep says about
>> base->lock => pgdat->kswapd_wait => p->pi_lock => rq->__lock => base->lock
>> dependency but does not say about db->lock.
>>
>> How can your patch fix this problem?
> 
> It's described in the changelog, no?

I can't find a proof that lookup_object() never returns NULL
when debug_object_activate() is called.

> 
> The main change is to make the refill invocation conditional when the
> lookup fails. That's how that code has been from day one.

Making the refill conditional helps reduce the frequency of allocations.
I want a proof that an allocation never happens in the worst-case scenario.

Are you saying that some debugobjects function other than debug_object_activate()
guarantees that memory for that object was already allocated before
debug_object_activate() is called for the first time for that object,
and that such a function is called without locks held?

> 
> The patch which closed the race recently wreckaged those refill
> oportunities and the fix for that introduced this problem.
> 
> Thanks,
> 
>         tglx



* Re: [PATCH] debugobject: don't wake up kswapd from fill_pool()
  2023-05-12 23:13             ` Tetsuo Handa
@ 2023-05-13  8:33               ` Thomas Gleixner
  2023-05-13  9:33                 ` Tetsuo Handa
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2023-05-13  8:33 UTC (permalink / raw)
  To: Tetsuo Handa, Andrew Morton
  Cc: syzbot, syzkaller-bugs, Ingo Molnar, linux-kernel, linux-mm,
	Peter Zijlstra

On Sat, May 13 2023 at 08:13, Tetsuo Handa wrote:
> On 2023/05/13 3:07, Thomas Gleixner wrote:
>> The main change is to make the refill invocation conditional when the
>> lookup fails. That's how that code has been from day one.
>
> Making refill conditional helps reducing frequency of doing allocations.
> I want a proof that allocations never happens in the worst scenario.
>
> Are you saying that some debugobject function other than debug_object_activate()
> guarantees that memory for that object was already allocated before
> debug_object_activate() is called for the first time for that object,
> _and_ such debugobject function is called without locks held?

The point is that the allocation in activate() only happens when the
tracked entity was not initialized _before_ activate() is invoked.

That's a bug for dynamically allocated entities, but a valid scenario
for statically initialized entities, as they obviously can be activated
without a prior init().

For dynamically allocated entities the init() function takes care of the
tracking object allocation and that's where the pool is refilled. So for
those the lookup will never fail.

Now I just stared at __alloc_pages_slowpath() and looked at the
condition for wakeup_all_kswapds(). ALLOC_KSWAPD is set because
debugobjects uses GFP_ATOMIC, which contains __GFP_KSWAPD_RECLAIM.

So debug objects needs to have s/GFP_ATOMIC/__GFP_HIGH/ to prevent that
wakeup path.

Thanks,

        tglx





* Re: [PATCH] debugobject: don't wake up kswapd from fill_pool()
  2023-05-13  8:33               ` Thomas Gleixner
@ 2023-05-13  9:33                 ` Tetsuo Handa
  2023-05-13 19:42                   ` Thomas Gleixner
  0 siblings, 1 reply; 12+ messages in thread
From: Tetsuo Handa @ 2023-05-13  9:33 UTC (permalink / raw)
  To: Thomas Gleixner, Andrew Morton
  Cc: syzbot, syzkaller-bugs, Ingo Molnar, linux-kernel, linux-mm,
	Peter Zijlstra

On 2023/05/13 17:33, Thomas Gleixner wrote:
> Now I just stared at __alloc_pages_slowpath() and looked at the
> condition for wakeup_all_kswapds(). ALLOC_KSWAPD is set because
> debugobject uses GFP_ATOMIC which contains __GFP_KSWAPD_RECLAIM.
> 
> So debug objects needs to have s/GFP_ATOMIC/__GFP_HIGH/ to prevent that
> wakeup path.

Yes. That is exactly what my patch does.



* Re: [PATCH] debugobject: don't wake up kswapd from fill_pool()
  2023-05-13  9:33                 ` Tetsuo Handa
@ 2023-05-13 19:42                   ` Thomas Gleixner
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2023-05-13 19:42 UTC (permalink / raw)
  To: Tetsuo Handa, Andrew Morton
  Cc: syzbot, syzkaller-bugs, Ingo Molnar, linux-kernel, linux-mm,
	Peter Zijlstra

On Sat, May 13 2023 at 18:33, Tetsuo Handa wrote:
> On 2023/05/13 17:33, Thomas Gleixner wrote:
>> Now I just stared at __alloc_pages_slowpath() and looked at the
>> condition for wakeup_all_kswapds(). ALLOC_KSWAPD is set because
>> debugobject uses GFP_ATOMIC which contains __GFP_KSWAPD_RECLAIM.
>> 
>> So debug objects needs to have s/GFP_ATOMIC/__GFP_HIGH/ to prevent that
>> wakeup path.
>
> Yes. That is exactly what my patch does.

Indeed. For some reason your patch (though you cc'ed me) did not show up
in my inbox. I've grabbed it from lore so no need to resend.

Actually we want both changes.

  - Yours, to fix the underlying ancient problem.

  - The one I did, which restores the performance behaviour.

Thanks,

        tglx



* [tip: core/debugobjects] debugobjects: Don't wake up kswapd from fill_pool()
  2023-05-11 13:47 ` [PATCH] debugobject: don't wake up kswapd from fill_pool() Tetsuo Handa
  2023-05-12  3:44   ` Andrew Morton
@ 2023-05-22 12:56   ` tip-bot2 for Tetsuo Handa
  1 sibling, 0 replies; 12+ messages in thread
From: tip-bot2 for Tetsuo Handa @ 2023-05-22 12:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: syzbot, Tetsuo Handa, Thomas Gleixner, x86, linux-kernel

The following commit has been merged into the core/debugobjects branch of tip:

Commit-ID:     eb799279fb1f9c63c520fe8c1c41cb9154252db6
Gitweb:        https://git.kernel.org/tip/eb799279fb1f9c63c520fe8c1c41cb9154252db6
Author:        Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
AuthorDate:    Thu, 11 May 2023 22:47:32 +09:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 22 May 2023 14:52:58 +02:00

debugobjects: Don't wake up kswapd from fill_pool()

syzbot is reporting a lockdep warning in fill_pool() because the allocation
from debugobjects is using GFP_ATOMIC, which is (__GFP_HIGH | __GFP_KSWAPD_RECLAIM)
and therefore tries to wake up kswapd, which acquires kswapd_wait::lock.

Since fill_pool() might be called with arbitrary locks held, fill_pool()
should not assume that acquiring kswapd_wait::lock is safe.

Use __GFP_HIGH instead and remove __GFP_NORETRY as it is pointless for
!__GFP_DIRECT_RECLAIM allocation.

Fixes: 3ac7fe5a4aab ("infrastructure to debug (dynamic) objects")
Reported-by: syzbot <syzbot+fe0c72f0ccbb93786380@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/6577e1fa-b6ee-f2be-2414-a2b51b1c5e30@I-love.SAKURA.ne.jp
Closes: https://syzkaller.appspot.com/bug?extid=fe0c72f0ccbb93786380
---
 lib/debugobjects.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 826c617..984985c 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -126,7 +126,7 @@ static const char *obj_states[ODEBUG_STATE_MAX] = {
 
 static void fill_pool(void)
 {
-	gfp_t gfp = GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN;
+	gfp_t gfp = __GFP_HIGH | __GFP_NOWARN;
 	struct debug_obj *obj;
 	unsigned long flags;
 

