All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: syzbot <syzbot+fe0c72f0ccbb93786380@syzkaller.appspotmail.com>,
	syzkaller-bugs@googlegroups.com, Ingo Molnar <mingo@elte.hu>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org, linux-mm <linux-mm@kvack.org>
Subject: Re: [PATCH] debugobject: don't wake up kswapd from fill_pool()
Date: Fri, 12 May 2023 19:57:07 +0900	[thread overview]
Message-ID: <d642e597-cf7d-b410-16ce-22dff483fd8e@I-love.SAKURA.ne.jp> (raw)
In-Reply-To: <20230511204458.819f9009d2ef8b46cc163191@linux-foundation.org>

On 2023/05/12 12:44, Andrew Morton wrote:
> On Thu, 11 May 2023 22:47:32 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> 
>> syzbot is reporting lockdep warning in fill_pool(), for GFP_ATOMIC is
>> (__GFP_HIGH | __GFP_KSWAPD_RECLAIM) which wakes up kswapd.
>> Since fill_pool() might be called with arbitrary locks held,
>> fill_pool() should not assume that holding pgdat->kswapd_wait is safe.
> 
> hm.  But many GFP_ATOMIC allocation attempts are made with locks held. 
> Why aren't all such callers buggy, by trying to wake kswapd with locks
> held?  What's special about this one?

Because debugobject cannot know what locks are held when fill_pool() does
GFP_ATOMIC allocation.

syzbot is reporting base->lock => pgdat->kswapd_wait dependency

  add_timer() {
    __mod_timer() {
      base = lock_timer_base(timer, &flags);
      new_base = get_target_base(base, timer->flags);
      if (base != new_base) {
        raw_spin_unlock(&base->lock);
        base = new_base;
        raw_spin_lock(&base->lock);
      }
      debug_timer_activate(timer) {
        debug_object_activate(timer, &timer_debug_descr) {
          debug_objects_fill_pool() {
            fill_pool() {
              kmem_cache_zalloc(GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN) {
                // wakes kswapd
              }
            }
          }
        }
      }
      raw_spin_unlock_irqrestore(&base->lock, flags);
    }
  }

when pgdat->kswapd_wait => p->pi_lock dependency

  __alloc_pages() {
    get_page_from_freelist() {
      rmqueue() {
        wakeup_kswapd() {
          wake_up_interruptible(&pgdat->kswapd_wait) {
            __wake_up_common_lock() {
              spin_lock_irqsave(&pgdat->kswapd_wait.lock, flags);
              __wake_up_common() {
                autoremove_wake_function() {
                  try_to_wake_up() {
                    raw_spin_lock_irqsave(&p->pi_lock, flags);
                    raw_spin_unlock_irqrestore(&p->pi_lock, flags);
                  }
                }
              }
              spin_unlock_irqrestore(&pgdat->kswapd_wait.lock, flags);
            }
          }
        }
      }
    }
  }

and p->pi_lock => rq->__lock => base->lock dependency

  wake_up_new_task() {
    raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
    rq = __task_rq_lock(p, &rf); // acquires rq->lock
    activate_task(rq, p, ENQUEUE_NOCLOCK) {
      enqueue_task() {
        psi_enqueue() {
          psi_task_change() {
            queue_delayed_work_on() {
              __queue_delayed_work() {
                add_timer() {
                  __mod_timer() {
                    base = lock_timer_base(timer, &flags); // acquires base->lock
                    debug_timer_activate(timer); // possible base->lock => pgdat->kswapd_wait => p->pi_lock dependency
                    raw_spin_unlock_irqrestore(&base->lock, flags);
                  }
                }
              }
            }
          }
        }
      }
    }
    task_rq_unlock(rq, p, &rf);
  }

exists.

All GFP_ATOMIC allocation users are supposed to be aware of what locks
are held, and are supposed to explicitly remove __GFP_KSWAPD_RECLAIM
if waking up kswapd can cause deadlock. But reality is that we can't be
careful enough to error-free. Who would imagine GFP_ATOMIC allocation
while base->lock is held can form circular locking dependency?

> 
>> Also, __GFP_NORETRY is pointless for !__GFP_DIRECT_RECLAIM allocation

__GFP_NORETRY is not checked by !__GFP_DIRECT_RECLAIM allocation.
GFP_ATOMIC - __GFP_KSWAPD_RECLAIM is __GFP_HIGH.

>>
>> @@ -126,7 +126,7 @@ static const char *obj_states[ODEBUG_STATE_MAX] = {
>>  
>>  static void fill_pool(void)
>>  {
>> -	gfp_t gfp = GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN;
>> +	gfp_t gfp = __GFP_HIGH | __GFP_NOWARN;
> 
> Does this weaken fill_pool()'s allocation attempt more than necessary? 
> We can still pass __GFP_HIGH?

What do you mean? I think that killing base->lock => pgdat->kswapd_wait
by removing __GFP_KSWAPD_RECLAIM is the right fix. This weakening is
needed for avoiding base->lock => pgdat->kswapd_wait dependency from
debugobject code.

For locking dependency safety, I wish that GFP_ATOMIC / GFP_NOWAIT do not imply
__GFP_KSWAPD_RECLAIM. Such allocations should not try to allocate as many pages
as even __GFP_HIGH fails. And if such allocations try to allocate as many pages
as even __GFP_HIGH fails, they likely already failed before background kswapd
reclaim finds some reusable pages....


  reply	other threads:[~2023-05-12 10:58 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-10 22:10 [syzbot] [kernel?] possible deadlock in __mod_timer (2) syzbot
2023-05-11 13:47 ` [PATCH] debugobject: don't wake up kswapd from fill_pool() Tetsuo Handa
2023-05-12  3:44   ` Andrew Morton
2023-05-12 10:57     ` Tetsuo Handa [this message]
2023-05-12 12:54       ` Thomas Gleixner
2023-05-12 13:09         ` Tetsuo Handa
2023-05-12 18:07           ` Thomas Gleixner
2023-05-12 23:13             ` Tetsuo Handa
2023-05-13  8:33               ` Thomas Gleixner
2023-05-13  9:33                 ` Tetsuo Handa
2023-05-13 19:42                   ` Thomas Gleixner
2023-05-22 12:56   ` [tip: core/debugobjects] debugobjects: Don't " tip-bot2 for Tetsuo Handa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d642e597-cf7d-b410-16ce-22dff483fd8e@I-love.SAKURA.ne.jp \
    --to=penguin-kernel@i-love.sakura.ne.jp \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@elte.hu \
    --cc=syzbot+fe0c72f0ccbb93786380@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.