All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Kees Cook <keescook@chromium.org>,
	Vegard Nossum <vegard.nossum@oracle.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
	Robert Moore <robert.moore@intel.com>,
	Erik Kaneda <erik.kaneda@intel.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Christoph Lameter <cl@linux.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Marco Elver <elver@google.com>, Waiman Long <longman@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	ACPI Devel Maling List <linux-acpi@vger.kernel.org>,
	Len Brown <lenb@kernel.org>, Steven Rostedt <rostedt@goodmis.org>,
	Roman Gushchin <guro@fb.com>
Subject: Re: slub freelist issue / BUG: unable to handle page fault for address: 000000003ffe0018
Date: Fri, 5 Jun 2020 18:55:27 +0200	[thread overview]
Message-ID: <92d994be-e4f5-b186-4ad7-21828de44967@suse.cz> (raw)
In-Reply-To: <202006050828.F85A75D13@keescook>


On 6/5/20 5:44 PM, Kees Cook wrote:
> On Fri, Jun 05, 2020 at 04:44:51PM +0200, Vegard Nossum wrote:
>> On 2020-06-05 16:08, Vlastimil Babka wrote:
>> > On 6/5/20 3:12 PM, Rafael J. Wysocki wrote:
>> > > On Fri, Jun 5, 2020 at 2:48 PM Vegard Nossum <vegard.nossum@oracle.com> wrote:
>> > > > 
>> > > > On 2020-06-05 11:36, Vegard Nossum wrote:
>> > > > > 
>> > > > > On 2020-06-05 11:11, Vlastimil Babka wrote:
>> > > > > > So, with Kees' patch reverted, booting with slub_debug=F (or even more
>> > > > > > specific slub_debug=F,ftrace_event_field) also hits this bug below. I
>> > > > > > wanted to bisect it, but v5.7 was also bad, and also v5.6. Didn't try
>> > > > > > further in history. So it's not new at all, and likely very specific to
>> > > > > > your config+QEMU? (and related to the ACPI error messages that precede
>> > > > > > it?).
>> > [...]
>> > [    0.140408] ------------[ cut here ]------------
>> > [    0.140837] cache_from_obj: Wrong slab cache. Acpi-Namespace but object is from kmalloc-64
>> > [    0.141406] WARNING: CPU: 0 PID: 1 at mm/slab.h:524 kmem_cache_free+0x1d3/0x250
> 
> Ah yes! Good. I had improved this check recently too, and I was worried
> the freelist pointer patch was somehow blocking it, but I see now that
> the failing config didn't have CONFIG_SLAB_FREELIST_HARDENED=y. Once
> SLAB_CONSISTENCY_CHECKS was enabled ("slub_debug=F"), it started
> tripping. Whew.
> 
> I wonder if that entire test block should just be removed from
> cache_from_obj():
> 
>         if (!memcg_kmem_enabled() &&
>             !IS_ENABLED(CONFIG_SLAB_FREELIST_HARDENED) &&
>             !unlikely(s->flags & SLAB_CONSISTENCY_CHECKS))
>                 return s;
> 
> and make this test unconditional? It's mostly only called during free(),
> and shouldn't be too expensive to be made unconditional. Hmm.

Hmm I have a different idea. The whole cache_from_obj() was added because of
kmemcg (commit b9ce5ef49f00d) where per-memcg cache can be different from the
root one. And I just realized this usecase can go away with Roman's series [1].
But cache_from_obj() also kept the original SLUB consistency check case, and you
added the freelist hardening case. If kmemcg use case went away it would be nice
to avoid the virt_to_cache() and check completely again, unless in debugging or
hardened kernel.

Furthermore, the original SLUB debugging case was an unconditional pr_err() plus
WARN_ON_ONCE(1), which was kept by commit b9ce5ef49f00d.  With freelist
hardening this all changed to WARN_ONCE. So the second and later cases are not
reported at all for hardening and also not for explicitly enabled debugging like
in this case, which is IMHO not ideal.

So I propose the following - the freelist hardening case keeps the WARN_ONCE,
but also a one-line pr_err() for each case so they are not silent. The SLUB
debugging case is always a full warning, and printing the tracking info if
enabled and available. Pure kmemcg case does virt_to_cache() for now (until
hopefully removed by Roman's series) but no checking at all. Would that work for
everyone?

[1] https://lore.kernel.org/linux-mm/d7cdecbc-db24-8ced-1a86-6f4534613763@suse.cz/

----8<----
diff --git a/mm/slab.h b/mm/slab.h
index 815e4e9a94cd..1182ca2cb11a 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -275,6 +275,14 @@ static inline int cache_vmstat_idx(struct kmem_cache *s)
 		NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE;
 }
 
+#ifdef CONFIG_SLUB_DEBUG
+void slab_print_tracking(struct kmem_cache *s, void *object);
+#else
+static inline void slab_print_tracking(struct kmem_cache *s, void *object)
+{
+}
+#endif
+
 #ifdef CONFIG_MEMCG_KMEM
 
 /* List of all root caches. */
@@ -520,9 +528,18 @@ static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
 		return s;
 
 	cachep = virt_to_cache(x);
-	WARN_ONCE(cachep && !slab_equal_or_root(cachep, s),
-		  "%s: Wrong slab cache. %s but object is from %s\n",
-		  __func__, s->name, cachep->name);
+	if (unlikely(s->flags & SLAB_CONSISTENCY_CHECKS)) {
+		if (WARN(cachep && !slab_equal_or_root(cachep, s),
+			  "%s: Wrong slab cache. %s but object is from %s\n",
+			  __func__, s->name, cachep->name))
+			slab_print_tracking(cachep, x);
+	} else if (IS_ENABLED(CONFIG_SLAB_FREELIST_HARDENED)) {
+		if (unlikely(cachep && !slab_equal_or_root(cachep, s))) {
+			pr_err("%s: Wrong slab cache. %s but object is from %s\n",
+				  __func__, s->name, cachep->name);
+			WARN_ON_ONCE(1);
+		}
+	}
 	return cachep;
 }
 
diff --git a/mm/slub.c b/mm/slub.c
index d4a9a097da50..ff2d817c5a94 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -634,7 +634,7 @@ static void print_track(const char *s, struct track *t, unsigned long pr_time)
 #endif
 }
 
-static void print_tracking(struct kmem_cache *s, void *object)
+void slab_print_tracking(struct kmem_cache *s, void *object)
 {
 	unsigned long pr_time = jiffies;
 	if (!(s->flags & SLAB_STORE_USER))
@@ -698,7 +698,7 @@ static void print_trailer(struct kmem_cache *s, struct page *page, u8 *p)
 	unsigned int off;	/* Offset of last byte */
 	u8 *addr = page_address(page);
 
-	print_tracking(s, p);
+	slab_print_tracking(s, p);
 
 	print_page_info(page);
 
@@ -3858,7 +3858,7 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
 
 		if (!test_bit(slab_index(p, s, addr), map)) {
 			pr_err("INFO: Object 0x%p @offset=%tu\n", p, p - addr);
-			print_tracking(s, p);
+			slab_print_tracking(s, p);
 		}
 	}
 	slab_unlock(page);

  parent reply	other threads:[~2020-06-05 16:55 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-04 17:14 slub freelist issue / BUG: unable to handle page fault for address: 000000003ffe0018 Vegard Nossum
2020-06-04 17:18 ` Vlastimil Babka
2020-06-04 17:20   ` Vegard Nossum
2020-06-04 17:51     ` Kees Cook
2020-06-04 17:57     ` Kees Cook
2020-06-04 18:46       ` Vlastimil Babka
2020-06-05  9:11         ` Vlastimil Babka
2020-06-05  9:36           ` Vegard Nossum
2020-06-05 12:47             ` Vegard Nossum
2020-06-05 13:12               ` Rafael J. Wysocki
2020-06-05 13:12                 ` Rafael J. Wysocki
2020-06-05 14:08                 ` Vlastimil Babka
2020-06-05 14:24                   ` Rafael J. Wysocki
2020-06-05 14:24                     ` Rafael J. Wysocki
2020-06-05 14:44                   ` Vegard Nossum
2020-06-05 15:44                     ` Kees Cook
2020-06-05 16:37                       ` Vegard Nossum
2020-06-05 17:51                         ` Kees Cook
2020-06-05 16:55                       ` Vlastimil Babka [this message]
2020-06-05 18:46                         ` Kees Cook
2020-06-08 10:51                           ` Vlastimil Babka
2020-06-06  6:46                       ` Rafael J. Wysocki
2020-06-06  6:46                         ` Rafael J. Wysocki
2020-06-05 21:45                     ` Kaneda, Erik
2020-06-11  1:40                     ` Kaneda, Erik
2020-06-11 10:54                       ` Vlastimil Babka
2020-06-12 12:26                       ` Rafael J. Wysocki
2021-03-23 18:32                         ` Kirill A. Shutemov
2021-03-23 18:58                           ` Vegard Nossum
2021-03-23 19:03                           ` Rafael J. Wysocki
2021-03-23 19:03                             ` Rafael J. Wysocki
2021-03-23 21:54                             ` Kaneda, Erik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=92d994be-e4f5-b186-4ad7-21828de44967@suse.cz \
    --to=vbabka@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=elver@google.com \
    --cc=erik.kaneda@intel.com \
    --cc=guro@fb.com \
    --cc=keescook@chromium.org \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=longman@redhat.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=rafael@kernel.org \
    --cc=robert.moore@intel.com \
    --cc=rostedt@goodmis.org \
    --cc=vegard.nossum@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.