All of lore.kernel.org
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	syzbot <syzbot+7937ba6a50bdd00fffdf@syzkaller.appspotmail.com>,
	syzkaller-bugs@googlegroups.com
Cc: linux-kernel@vger.kernel.org, Stephen Boyd <swboyd@chromium.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	wuchi <wuchi.zero@gmail.com>
Subject: Re: [PATCH] debugobjects: turn off debug_objects_enabled from debug_objects_oom()
Date: Wed, 07 Jun 2023 00:35:41 +0200	[thread overview]
Message-ID: <871qiokyma.ffs@tglx> (raw)
In-Reply-To: <1af29817-4698-c5ac-cf63-0dad289e740f@I-love.SAKURA.ne.jp>

Tetsuo!

On Mon, May 29 2023 at 23:39, Tetsuo Handa wrote:
> syzbot is reporting false positive ODEBUG message immediately after
> ODEBUG was disabled due to OOM.
>
>   [ 1062.309646][T22911] ODEBUG: Out of memory. ODEBUG disabled
>   [ 1062.886755][ T5171] ------------[ cut here ]------------
>   [ 1062.892770][ T5171] ODEBUG: assert_init not available (active state 0) object: ffffc900056afb20 object type: timer_list hint: process_timeout+0x0/0x40
>
> This race happened because debug_objects_oom() emitted OOM message but did
> not turn off debug_objects_enabled, and debug_print_object() did not check
> debug_objects_enabled when calling WARN().
>
>   CPU 0 [ T5171]                CPU 1 [T22911]
>   --------------                --------------
>   debug_object_assert_init() {
>     if (!debug_objects_enabled)
>       return;
>     db = get_bucket((unsigned long) addr); // Finds a bucket, but...
>                                 debug_objects_oom() {
>                                   pr_warn("Out of memory. ODEBUG disabled\n");
>                                   // all buckets get emptied here, and...
>                                   hlist_move_list(&db->list, &freelist);
>                                 }
>     lookup_object_or_alloc(addr, db, descr, false, true) {
>       lookup_object(addr, b) {
>         return NULL; // this bucket is already empty.
>       }
>       if (!descr->is_static_object || !descr->is_static_object(addr))
>         return ERR_PTR(-ENOENT);
>     }
>     if (!obj) { // obj == ERR_PTR(-ENOENT) because non-static object.
>        debug_objects_oom();
>        return;
>     }
>     debug_print_object(&o, "assert_init") {
>       // False positive due to not checking debug_objects_enabled.
>       WARN(1, KERN_ERR "ODEBUG: %s %s (active state %u) "
>            "object: %p object type: %s hint: %pS\n", ...);
>     }
>   }

The above is undecodable gibberish.

Something like this is completely sufficient:

  CPU 0 [ T5171]			CPU 1 [T22911]
  --------------                	--------------
  debug_object_assert_init() {
    db = get_bucket(addr);
					debug_objects_oom() {
                                  	  pr_warn("Out of memory. ODEBUG disabled\n");
                                  	  // all buckets get emptied here
                                	}
    lookup_object_or_alloc(addr, db, ...)
      // Due to OOM:
      return ERR_PTR(-ENOENT);
    ...

    // Emits assert_init message and warning
    debug_print_object(&o, "assert_init");
  }

And this:

> This race happened because debug_objects_oom() emitted OOM message but did
> not turn off debug_objects_enabled

is completely wrong. Why?

The place where debug_objects_enabled is set to 0 is way before
debug_objects_oom() is invoked. That place _cannot_ invoke
debug_objects_oom() because it holds a hash bucket lock.

There are exactly three places which invoke debug_objects_oom() and for
all three places the pattern is exactly the same:

      lock_bucket();
      obj = lookup_object_or_alloc();
      unlock_bucket();
      if (!obj)
         debug_objects_oom();

The place which clears debug_objects_enabled is unsurprisingly
lookup_object_or_alloc() itself, which _cannot_ invoke
debug_objects_oom() because it is invoked with the hash bucket lock
held. There is even a comment to that effect:

	/* Out of memory. Do the cleanup outside of the locked region */
	debug_objects_enabled = 0;
	return NULL;

So at the point where debug_objects_oom() is invoked
@debug_objects_enabled is already 0.

But you claim that this is required, right?

> @@ -466,6 +466,7 @@ static void debug_objects_oom(void)
>  	unsigned long flags;
>  	int i;
>  
> +	debug_objects_enabled = 0;
>  	pr_warn("Out of memory. ODEBUG disabled\n");

Q: What is setting a variable which is already 0 to 0 solving?
A: Absolutely nothing

Now this:

> @@ -502,10 +503,10 @@ static void debug_print_object(struct de
>  		void *hint = descr->debug_hint ?
>  			descr->debug_hint(obj->object) : NULL;
>  		limit++;
> -		WARN(1, KERN_ERR "ODEBUG: %s %s (active state %u) "
> -				 "object: %p object type: %s hint: %pS\n",
> -			msg, obj_states[obj->state], obj->astate,
> -			obj->object, descr->name, hint);
> +		WARN(debug_objects_enabled, KERN_ERR
> +		     "ODEBUG: %s %s (active state %u) object: %p object type: %s hint: %pS\n",
> +		     msg, obj_states[obj->state], obj->astate,
> +		     obj->object, descr->name, hint);
>  	}

Q: Why is this related to the WARN() itself?
A: It's not related at all

The obvious fix is:

--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -498,6 +498,14 @@ static void debug_print_object(struct de
 	const struct debug_obj_descr *descr = obj->descr;
 	static int limit;
 
+	/*
+	 * OOM handling is asynchronous for performance reasons. So the
+	 * call site might have raced with a concurrent OOM which cleared
+	 * the hash buckets.
+	 */
+	if (!debug_objects_enabled)
+		return;
+
 	if (limit < 5 && descr != descr_test) {
 		void *hint = descr->debug_hint ?
 			descr->debug_hint(obj->object) : NULL;

Along with a understandable changelog, no?

Thanks,

        tglx

  reply	other threads:[~2023-06-06 22:37 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-29 14:31 [syzbot] [kernel?] WARNING: ODEBUG bug in __mod_timer syzbot
2023-05-29 14:39 ` [PATCH] debugobjects: turn off debug_objects_enabled from debug_objects_oom() Tetsuo Handa
2023-06-06 22:35   ` Thomas Gleixner [this message]
2023-06-07 10:19     ` [PATCH v2] debugobjects: recheck debug_objects_enabled before reporting Tetsuo Handa
2023-06-07 12:20       ` [tip: core/debugobjects] debugobjects: Recheck " tip-bot2 for Tetsuo Handa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=871qiokyma.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=peterz@infradead.org \
    --cc=swboyd@chromium.org \
    --cc=syzbot+7937ba6a50bdd00fffdf@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=wuchi.zero@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.