linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Kees Cook <keescook@chromium.org>,
	Vegard Nossum <vegard.nossum@oracle.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
	Robert Moore <robert.moore@intel.com>,
	Erik Kaneda <erik.kaneda@intel.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Christoph Lameter <cl@linux.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Marco Elver <elver@google.com>, Waiman Long <longman@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	ACPI Devel Maling List <linux-acpi@vger.kernel.org>,
	Len Brown <lenb@kernel.org>, Steven Rostedt <rostedt@goodmis.org>,
	Roman Gushchin <guro@fb.com>
Subject: Re: slub freelist issue / BUG: unable to handle page fault for address: 000000003ffe0018
Date: Fri, 5 Jun 2020 18:55:27 +0200	[thread overview]
Message-ID: <92d994be-e4f5-b186-4ad7-21828de44967@suse.cz> (raw)
In-Reply-To: <202006050828.F85A75D13@keescook>


On 6/5/20 5:44 PM, Kees Cook wrote:
> On Fri, Jun 05, 2020 at 04:44:51PM +0200, Vegard Nossum wrote:
>> On 2020-06-05 16:08, Vlastimil Babka wrote:
>> > On 6/5/20 3:12 PM, Rafael J. Wysocki wrote:
>> > > On Fri, Jun 5, 2020 at 2:48 PM Vegard Nossum <vegard.nossum@oracle.com> wrote:
>> > > > 
>> > > > On 2020-06-05 11:36, Vegard Nossum wrote:
>> > > > > 
>> > > > > On 2020-06-05 11:11, Vlastimil Babka wrote:
>> > > > > > So, with Kees' patch reverted, booting with slub_debug=F (or even more
>> > > > > > specific slub_debug=F,ftrace_event_field) also hits this bug below. I
>> > > > > > wanted to bisect it, but v5.7 was also bad, and also v5.6. Didn't try
>> > > > > > further in history. So it's not new at all, and likely very specific to
>> > > > > > your config+QEMU? (and related to the ACPI error messages that precede
>> > > > > > it?).
>> > [...]
>> > [    0.140408] ------------[ cut here ]------------
>> > [    0.140837] cache_from_obj: Wrong slab cache. Acpi-Namespace but object is from kmalloc-64
>> > [    0.141406] WARNING: CPU: 0 PID: 1 at mm/slab.h:524 kmem_cache_free+0x1d3/0x250
> 
> Ah yes! Good. I had improved this check recently too, and I was worried
> the freelist pointer patch was somehow blocking it, but I see now that
> the failing config didn't have CONFIG_SLAB_FREELIST_HARDENED=y. Once
> SLAB_CONSISTENCY_CHECKS was enabled ("slub_debug=F"), it started
> tripping. Whew.
> 
> I wonder if that entire test block should just be removed from
> cache_from_obj():
> 
>         if (!memcg_kmem_enabled() &&
>             !IS_ENABLED(CONFIG_SLAB_FREELIST_HARDENED) &&
>             !unlikely(s->flags & SLAB_CONSISTENCY_CHECKS))
>                 return s;
> 
> and make this test unconditional? It's mostly only called during free(),
> and shouldn't be too expensive to be made unconditional. Hmm.

Hmm I have a different idea. The whole cache_from_obj() was added because of
kmemcg (commit b9ce5ef49f00d) where per-memcg cache can be different from the
root one. And I just realized this usecase can go away with Roman's series [1].
But cache_from_obj() also kept the original SLUB consistency check case, and you
added the freelist hardening case. If kmemcg use case went away it would be nice
to avoid the virt_to_cache() and check completely again, unless in debugging or
hardened kernel.

Furthermore, the original SLUB debugging case was an unconditional pr_err() plus
WARN_ON_ONCE(1), which was kept by commit b9ce5ef49f00d.  With freelist
hardening this all changed to WARN_ONCE. So the second and later cases are not
reported at all for hardening and also not for explicitly enabled debugging like
in this case, which is IMHO not ideal.

So I propose the following - the freelist hardening case keeps the WARN_ONCE,
but also a one-line pr_err() for each case so they are not silent. The SLUB
debugging case is always a full warning, and printing the tracking info if
enabled and available. Pure kmemcg case does virt_to_cache() for now (until
hopefully removed by Roman's series) but no checking at all. Would that work for
everyone?

[1] https://lore.kernel.org/linux-mm/d7cdecbc-db24-8ced-1a86-6f4534613763@suse.cz/

----8<----
diff --git a/mm/slab.h b/mm/slab.h
index 815e4e9a94cd..1182ca2cb11a 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -275,6 +275,14 @@ static inline int cache_vmstat_idx(struct kmem_cache *s)
 		NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE;
 }
 
+#ifdef CONFIG_SLUB_DEBUG
+void slab_print_tracking(struct kmem_cache *s, void *object);
+#else
+static inline void slab_print_tracking(struct kmem_cache *s, void *object)
+{
+}
+#endif
+
 #ifdef CONFIG_MEMCG_KMEM
 
 /* List of all root caches. */
@@ -520,9 +528,18 @@ static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
 		return s;
 
 	cachep = virt_to_cache(x);
-	WARN_ONCE(cachep && !slab_equal_or_root(cachep, s),
-		  "%s: Wrong slab cache. %s but object is from %s\n",
-		  __func__, s->name, cachep->name);
+	if (unlikely(s->flags & SLAB_CONSISTENCY_CHECKS)) {
+		if (WARN(cachep && !slab_equal_or_root(cachep, s),
+			  "%s: Wrong slab cache. %s but object is from %s\n",
+			  __func__, s->name, cachep->name))
+			slab_print_tracking(cachep, x);
+	} else if (IS_ENABLED(CONFIG_SLAB_FREELIST_HARDENED)) {
+		if (unlikely(cachep && !slab_equal_or_root(cachep, s))) {
+			pr_err("%s: Wrong slab cache. %s but object is from %s\n",
+				  __func__, s->name, cachep->name);
+			WARN_ON_ONCE(1);
+		}
+	}
 	return cachep;
 }
 
diff --git a/mm/slub.c b/mm/slub.c
index d4a9a097da50..ff2d817c5a94 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -634,7 +634,7 @@ static void print_track(const char *s, struct track *t, unsigned long pr_time)
 #endif
 }
 
-static void print_tracking(struct kmem_cache *s, void *object)
+void slab_print_tracking(struct kmem_cache *s, void *object)
 {
 	unsigned long pr_time = jiffies;
 	if (!(s->flags & SLAB_STORE_USER))
@@ -698,7 +698,7 @@ static void print_trailer(struct kmem_cache *s, struct page *page, u8 *p)
 	unsigned int off;	/* Offset of last byte */
 	u8 *addr = page_address(page);
 
-	print_tracking(s, p);
+	slab_print_tracking(s, p);
 
 	print_page_info(page);
 
@@ -3858,7 +3858,7 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
 
 		if (!test_bit(slab_index(p, s, addr), map)) {
 			pr_err("INFO: Object 0x%p @offset=%tu\n", p, p - addr);
-			print_tracking(s, p);
+			slab_print_tracking(s, p);
 		}
 	}
 	slab_unlock(page);


  parent reply	other threads:[~2020-06-05 16:55 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-04 17:14 slub freelist issue / BUG: unable to handle page fault for address: 000000003ffe0018 Vegard Nossum
2020-06-04 17:18 ` Vlastimil Babka
2020-06-04 17:20   ` Vegard Nossum
2020-06-04 17:51     ` Kees Cook
2020-06-04 17:57     ` Kees Cook
2020-06-04 18:46       ` Vlastimil Babka
2020-06-05  9:11         ` Vlastimil Babka
2020-06-05  9:36           ` Vegard Nossum
2020-06-05 12:47             ` Vegard Nossum
2020-06-05 13:12               ` Rafael J. Wysocki
2020-06-05 14:08                 ` Vlastimil Babka
2020-06-05 14:24                   ` Rafael J. Wysocki
2020-06-05 14:44                   ` Vegard Nossum
2020-06-05 15:44                     ` Kees Cook
2020-06-05 16:37                       ` Vegard Nossum
2020-06-05 17:51                         ` Kees Cook
2020-06-05 16:55                       ` Vlastimil Babka [this message]
2020-06-05 18:46                         ` Kees Cook
2020-06-08 10:51                           ` Vlastimil Babka
2020-06-06  6:46                       ` Rafael J. Wysocki
2020-06-05 21:45                     ` Kaneda, Erik
2020-06-11  1:40                     ` Kaneda, Erik
2020-06-11 10:54                       ` Vlastimil Babka
2020-06-12 12:26                       ` Rafael J. Wysocki
2021-03-23 18:32                         ` Kirill A. Shutemov
2021-03-23 18:58                           ` Vegard Nossum
2021-03-23 19:03                           ` Rafael J. Wysocki
2021-03-23 21:54                             ` Kaneda, Erik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=92d994be-e4f5-b186-4ad7-21828de44967@suse.cz \
    --to=vbabka@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=elver@google.com \
    --cc=erik.kaneda@intel.com \
    --cc=guro@fb.com \
    --cc=keescook@chromium.org \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=longman@redhat.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=rafael@kernel.org \
    --cc=robert.moore@intel.com \
    --cc=rostedt@goodmis.org \
    --cc=vegard.nossum@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).