All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Kaneda, Erik" <erik.kaneda@intel.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>,
	"Kirill A. Shutemov" <kirill@shutemov.name>
Cc: "Wysocki, Rafael J" <rafael.j.wysocki@intel.com>,
	Vegard Nossum <vegard.nossum@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	"Moore, Robert" <robert.moore@intel.com>,
	Kees Cook <keescook@chromium.org>,
	"Christoph Lameter" <cl@linux.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Marco Elver" <elver@google.com>,
	Waiman Long <longman@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	"ACPI Devel Maling List" <linux-acpi@vger.kernel.org>,
	Len Brown <lenb@kernel.org>, Steven Rostedt <rostedt@goodmis.org>,
	Jan Kiszka <jan.kiszka@siemens.com>
Subject: RE: slub freelist issue / BUG: unable to handle page fault for address: 000000003ffe0018
Date: Tue, 23 Mar 2021 21:54:57 +0000	[thread overview]
Message-ID: <MWHPR11MB1599940F32F478311285F3AEF0649@MWHPR11MB1599.namprd11.prod.outlook.com> (raw)
In-Reply-To: <CAJZ5v0i1AXW0cVstfQZX=TBZNPcZChwkk+fpN2BPYTjxo_XjSQ@mail.gmail.com>



> -----Original Message-----
> From: Rafael J. Wysocki <rafael@kernel.org>
> Sent: Tuesday, March 23, 2021 12:03 PM
> To: Kirill A. Shutemov <kirill@shutemov.name>; Kaneda, Erik
> <erik.kaneda@intel.com>
> Cc: Wysocki, Rafael J <rafael.j.wysocki@intel.com>; Vegard Nossum
> <vegard.nossum@oracle.com>; Vlastimil Babka <vbabka@suse.cz>; Rafael J.
> Wysocki <rafael@kernel.org>; Moore, Robert <robert.moore@intel.com>;
> Kees Cook <keescook@chromium.org>; Christoph Lameter <cl@linux.com>;
> Andrew Morton <akpm@linux-foundation.org>; Marco Elver
> <elver@google.com>; Waiman Long <longman@redhat.com>; LKML <linux-
> kernel@vger.kernel.org>; Linux MM <linux-mm@kvack.org>; ACPI Devel
> Maling List <linux-acpi@vger.kernel.org>; Len Brown <lenb@kernel.org>;
> Steven Rostedt <rostedt@goodmis.org>; Jan Kiszka
> <jan.kiszka@siemens.com>
> Subject: Re: slub freelist issue / BUG: unable to handle page fault for
> address: 000000003ffe0018
> 
> On Tue, Mar 23, 2021 at 7:32 PM Kirill A. Shutemov <kirill@shutemov.name>
> wrote:
> >
> > On Fri, Jun 12, 2020 at 02:26:58PM +0200, Rafael J. Wysocki wrote:
> > > On 6/11/2020 3:40 AM, Kaneda, Erik wrote:
> > > >
> > > > > -----Original Message-----
> > > > > From: Vegard Nossum <vegard.nossum@oracle.com>
> > > > > Sent: Friday, June 5, 2020 7:45 AM
> > > > > To: Vlastimil Babka <vbabka@suse.cz>; Rafael J. Wysocki
> > > > > <rafael@kernel.org>; Moore, Robert <robert.moore@intel.com>;
> Kaneda,
> > > > > Erik <erik.kaneda@intel.com>
> > > > > Cc: Kees Cook <keescook@chromium.org>; Wysocki, Rafael J
> > > > > <rafael.j.wysocki@intel.com>; Christoph Lameter <cl@linux.com>;
> Andrew
> > > > > Morton <akpm@linux-foundation.org>; Marco Elver
> <elver@google.com>;
> > > > > Waiman Long <longman@redhat.com>; LKML <linux-
> > > > > kernel@vger.kernel.org>; Linux MM <linux-mm@kvack.org>; ACPI
> Devel
> > > > > Maling List <linux-acpi@vger.kernel.org>; Len Brown
> <lenb@kernel.org>;
> > > > > Steven Rostedt <rostedt@goodmis.org>
> > > > > Subject: Re: slub freelist issue / BUG: unable to handle page fault for
> > > > > address: 000000003ffe0018
> > > > >
> > > > > On 2020-06-05 16:08, Vlastimil Babka wrote:
> > > > > > On 6/5/20 3:12 PM, Rafael J. Wysocki wrote:
> > > > > > > On Fri, Jun 5, 2020 at 2:48 PM Vegard Nossum
> > > > > <vegard.nossum@oracle.com> wrote:
> > > > > > > > On 2020-06-05 11:36, Vegard Nossum wrote:
> > > > > > > > > On 2020-06-05 11:11, Vlastimil Babka wrote:
> > > > > > > > > > On 6/4/20 8:46 PM, Vlastimil Babka wrote:
> > > > > > > > > > > On 6/4/20 7:57 PM, Kees Cook wrote:
> > > > > > > > > > > > On Thu, Jun 04, 2020 at 07:20:18PM +0200, Vegard
> Nossum wrote:
> > > > > > > > > > > > > On 2020-06-04 19:18, Vlastimil Babka wrote:
> > > > > > > > > > > > > > On 6/4/20 7:14 PM, Vegard Nossum wrote:
> > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I ran into a boot problem with latest linus/master
> > > > > > > > > > > > > > > (6929f71e46bdddbf1c4d67c2728648176c67c555)
> that manifests
> > > > > like this:
> > > > > > > > > > > > > > Hi, what's the .config you use?
> > > > > > > > > > > > > Pretty much x86_64 defconfig minus a few options (PCI,
> USB,
> > > > > > > > > > > > > ...)
> > > > > > > > > > > > Oh yes indeed. I immediately crash in the same way with
> this config.
> > > > > > > > > > > > I'll
> > > > > > > > > > > > start digging...
> > > > > > > > > > > >
> > > > > > > > > > > > (defconfig finishes boot)
> > > > > > > > > > > This is funny, booting with slub_debug=F results in:
> > > > > > > > > > > I'm not sure if it's ACPI or ftrace wrong here, but looks like
> > > > > > > > > > > the changed free pointer offset merely exposes a bug in
> something
> > > > > > > > > > > else.
> > > > > > > > > > So, with Kees' patch reverted, booting with slub_debug=F
> (or even
> > > > > > > > > > more specific slub_debug=F,ftrace_event_field) also hits
> this bug
> > > > > > > > > > below. I wanted to bisect it, but v5.7 was also bad, and also
> > > > > > > > > > v5.6. Didn't try further in history. So it's not new at all, and
> > > > > > > > > > likely very specific to your config+QEMU? (and related to the
> ACPI
> > > > > > > > > > error messages that precede it?).
> > > > > > > > > I see it too, but not on v5.0. I can bisect it.
> > > > > > > > commit 67a72420a326b45514deb3f212085fb2cd1595b5
> > > > > > > > Author: Bob Moore <robert.moore@intel.com>
> > > > > > > > Date:   Fri Aug 16 14:43:21 2019 -0700
> > > > > > > >
> > > > > > > >        ACPICA: Increase total number of possible Owner IDs
> > > > > > > >
> > > > > > > >        ACPICA commit
> 1f1652dad88b9d767767bc1f7eb4f7d99e6b5324
> > > > > > > >
> > > > > > > >        From 255 to 4095 possible IDs.
> > > > > > > >
> > > > > > > >        Link: https://github.com/acpica/acpica/commit/1f1652da
> > > > > > > >        Reported-by: Hedi Berriche <hedi.berriche @hpe.com>
> > > > > > > >        Signed-off-by: Bob Moore <robert.moore@intel.com>
> > > > > > > >        Signed-off-by: Erik Schmauss <erik.schmauss@intel.com>
> > > > > > > >        Signed-off-by: Rafael J. Wysocki
> <rafael.j.wysocki@intel.com>
> > > > > > > Bob, Erik, did we miss something in that patch?
> > > > > > Maybe the patch just changes layout in a way that exposes the bug.
> > > > > >
> > > > > > Anyway the "ftrace_event_field" cache is not really involved, this is
> > > > > > just because of slab merging. After adding "slub_nomerge" to
> > > > > > "slub_debug=F", it starts making more sense, as the cache becomes
> > > > > > Acpi-Namespace
> > > > > >
> > > > > > [    0.140408] ------------[ cut here ]------------
> > > > > > [    0.140837] cache_from_obj: Wrong slab cache. Acpi-Namespace
> but
> > > > > object is from kmalloc-64
> > > > > > [    0.141406] WARNING: CPU: 0 PID: 1 at mm/slab.h:524
> > > > > kmem_cache_free+0x1d3/0x250
> > > > > > [    0.142105] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.0+ #45
> > > > > > [    0.142393] Hardware name: QEMU Standard PC (i440FX + PIIX,
> 1996),
> > > > > BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
> > > > > > [    0.142393] RIP: 0010:kmem_cache_free+0x1d3/0x250
> > > > > > [    0.142393] Code: 18 4d 85 ed 0f 84 10 ff ff ff 4c 39 ed 74 2f 49 8b 4d
> 58 48
> > > > > 8b 55 58 48 c7 c6 10 47 a1 ac 48 c7 c7 00 c2 b0 ac e8 b1 cc eb ff <0f> 0b
> 48 89 de
> > > > > 4c 89 ef e8 10 d7 ff ff 48 8b 15 59 36 9b 00 4c 89
> > > > > > [    0.142393] RSP: 0018:ffffb39cc0013dc0 EFLAGS: 00010282
> > > > > > [    0.142393] RAX: 0000000000000000 RBX: ffff937287409e00 RCX:
> > > > > 0000000000000000
> > > > > > [    0.142393] RDX: 0000000000000001 RSI: 0000000000000092 RDI:
> > > > > ffffffffacfdd32c
> > > > > > [    0.142393] RBP: ffff93728742ef00 R08: ffffb39cc0013c7d R09:
> > > > > 00000000000000fc
> > > > > > [    0.142393] R10: ffffb39cc0013c78 R11: ffffb39cc0013c7d R12:
> > > > > ffff937307409e00
> > > > > > [    0.142393] R13: ffff937287401d00 R14: 0000000000000000 R15:
> > > > > 0000000000000000
> > > > > > [    0.142393] FS:  0000000000000000(0000)
> GS:ffff937287a00000(0000)
> > > > > knlGS:0000000000000000
> > > > > > [    0.142393] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > [    0.142393] CR2: 0000000000000000 CR3: 0000000003a0a000 CR4:
> > > > > 00000000003406f0
> > > > > > [    0.142393] Call Trace:
> > > > > > [    0.142393]  acpi_os_release_object+0x5/0x10
> > > > > > [    0.142393]  acpi_ns_delete_children+0x46/0x59
> > > > > > [    0.142393]  acpi_ns_delete_namespace_subtree+0x5c/0x79
> > > > > > [    0.142393]  ? acpi_sleep_proc_init+0x1f/0x1f
> > > > > > [    0.142393]  acpi_ns_terminate+0xc/0x31
> > > > > > [    0.142393]  acpi_ut_subsystem_shutdown+0x45/0xa3
> > > > > > [    0.142393]  ? acpi_sleep_proc_init+0x1f/0x1f
> > > > > > [    0.142393]  acpi_terminate+0x5/0xf
> > > > > > [    0.142393]  acpi_init+0x27b/0x308
> > > > > > [    0.142393]  ? video_setup+0x79/0x79
> > > > > > [    0.142393]  do_one_initcall+0x7b/0x160
> > > > > > [    0.142393]  kernel_init_freeable+0x190/0x1f2
> > > > > > [    0.142393]  ? rest_init+0x9a/0x9a
> > > > > > [    0.142393]  kernel_init+0x5/0xf6
> > > > > > [    0.142393]  ret_from_fork+0x22/0x30
> > > > > > [    0.142393] ---[ end trace 3539f236ef812ba1 ]---
> > > > > > [    0.142396] ------------[ cut here ]------------
> > > > > >
> > > > > > I've also changed the warning so it's not printed just once, and also
> > > > > > prints tracking info (see the hunk at the end of my mail, I'll turn this
> to a
> > > > > proper patch later).
> > > > > > With "slub_debug=FU slub_nomerge" there are now multiple
> warnings,
> > > > > but they all look the same:
> > > > > > [    0.143815] ------------[ cut here ]------------
> > > > > > [    0.144131] cache_from_obj: Wrong slab cache. Acpi-Namespace
> but
> > > > > object is from kmalloc-64
> > > > > > [    0.144929] WARNING: CPU: 0 PID: 1 at mm/slab.h:524
> > > > > kmem_cache_free+0x1d3/0x250
> > > > > > [    0.145129] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.0+ #45
> > > > > > [    0.145129] Hardware name: QEMU Standard PC (i440FX + PIIX,
> 1996),
> > > > > BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
> > > > > > [    0.145129] RIP: 0010:kmem_cache_free+0x1d3/0x250
> > > > > > [    0.145129] Code: 18 4d 85 ed 0f 84 10 ff ff ff 4c 39 ed 74 2f 49 8b 4d
> 58 48
> > > > > 8b 55 58 48 c7 c6 10 47 c1 8d 48 c7 c7 00 c2 d0 8d e8 b1 cc eb ff <0f> 0b
> 48 89 de
> > > > > 4c 89 ef e8 10 d7 ff ff 48 8b 15 59 36 9b 00 4c 89
> > > > > > [    0.145129] RSP: 0018:ffff990b80013dc0 EFLAGS: 00010282
> > > > > > [    0.145129] RAX: 0000000000000000 RBX: ffff972d474ada80 RCX:
> > > > > 0000000000000000
> > > > > > [    0.145129] RDX: 0000000000000001 RSI: 0000000000000092 RDI:
> > > > > ffffffff8e1dd32c
> > > > > > [    0.145129] RBP: ffff972d47425680 R08: ffff990b80013c7d R09:
> > > > > 00000000000000fc
> > > > > > [    0.145129] R10: ffff990b80013c78 R11: ffff990b80013c7d R12:
> > > > > ffff972dc74ada80
> > > > > > [    0.145129] R13: ffff972d474038c0 R14: 0000000000000000 R15:
> > > > > 0000000000000000
> > > > > > [    0.145129] FS:  0000000000000000(0000)
> GS:ffff972d47a00000(0000)
> > > > > knlGS:0000000000000000
> > > > > > [    0.145129] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > [    0.145129] CR2: 0000000000000000 CR3: 000000000660a000 CR4:
> > > > > 00000000003406f0
> > > > > > [    0.145129] Call Trace:
> > > > > > [    0.145129]  acpi_os_release_object+0x5/0x10
> > > > > > [    0.145129]  acpi_ns_delete_children+0x46/0x59
> > > > > > [    0.145129]  acpi_ns_delete_namespace_subtree+0x5c/0x79
> > > > > > [    0.145129]  ? acpi_sleep_proc_init+0x1f/0x1f
> > > > > > [    0.145129]  acpi_ns_terminate+0xc/0x31
> > > > > > [    0.145129]  acpi_ut_subsystem_shutdown+0x45/0xa3
> > > > > > [    0.145129]  ? acpi_sleep_proc_init+0x1f/0x1f
> > > > > > [    0.145129]  acpi_terminate+0x5/0xf
> > > > > > [    0.145129]  acpi_init+0x27b/0x308
> > > > > > [    0.145129]  ? video_setup+0x79/0x79
> > > > > > [    0.145129]  do_one_initcall+0x7b/0x160
> > > > > > [    0.145129]  kernel_init_freeable+0x190/0x1f2
> > > > > > [    0.145129]  ? rest_init+0x9a/0x9a
> > > > > > [    0.145129]  kernel_init+0x5/0xf6
> > > > > > [    0.145129]  ret_from_fork+0x22/0x30
> > > > > > [    0.145129] ---[ end trace 574554fca7bd06bb ]---
> > > > > > [    0.145133] INFO: Allocated in acpi_ns_root_initialize+0xb6/0x2d1
> age=58
> > > > > cpu=0 pid=0
> > > > > > [    0.145881]  kmem_cache_alloc_trace+0x1a9/0x1c0
> > > > > > [    0.146132]  acpi_ns_root_initialize+0xb6/0x2d1
> > > > > > [    0.146578]  acpi_initialize_subsystem+0x65/0xa8
> > > > > > [    0.147024]  acpi_early_init+0x5d/0xd1
> > > > > > [    0.147132]  start_kernel+0x45b/0x518
> > > > > > [    0.147491]  secondary_startup_64+0xb6/0xc0
> > > > > > [    0.147897] ------------[ cut here ]------------
> > > > > >
> > > > > > And it seems ACPI is allocating an object via kmalloc() and then
> > > > > > freeing it via kmem_cache_free(<"Acpi-Namespace"
> kmem_cache>) which
> > > > > is wrong.
> > > > > > > ./scripts/faddr2line vmlinux 'acpi_ns_root_initialize+0xb6'
> > > > > > acpi_ns_root_initialize+0xb6/0x2d1:
> > > > > > kmalloc at include/linux/slab.h:555
> > > > > > (inlined by) kzalloc at include/linux/slab.h:669 (inlined by)
> > > > > > acpi_os_allocate_zeroed at include/acpi/platform/aclinuxex.h:57
> > > > > > (inlined by) acpi_ns_root_initialize at
> > > > > > drivers/acpi/acpica/nsaccess.c:102
> > > > > >
> > > > Hi Vegard,
> > > >
> > > > > That's it :-) This fixes it for me:
> > > > We'll take this patch for ACPICA and it will be in the next release.
> > > >
> > > > Rafael, do you want to take this as a part of the next rc?
> > >
> > > Yes, I do.
> >
Hi,

> > Folks, what happened to the patch? I don't see it in current upstream.
> >
> > Looks like it got reported again:
> >
> > https://lore.kernel.org/r/a1461e21-c744-767d-6dfc-
> 6641fd3e3ce2@siemens.com
> 
> Well, I'm actually not sure.
> 
> Erik?

Sorry about that. It looks like I missed this. I've submitted a pull request for the ACPICA upstream here: https://github.com/acpica/acpica/pull/680
I've also submitted a Linux-ized version of the patch to the mailing list. Rafael, please take the submitted patch for the next rc.

Thanks and sorry for the mistake!
Erik

      reply	other threads:[~2021-03-23 21:56 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-04 17:14 slub freelist issue / BUG: unable to handle page fault for address: 000000003ffe0018 Vegard Nossum
2020-06-04 17:18 ` Vlastimil Babka
2020-06-04 17:20   ` Vegard Nossum
2020-06-04 17:51     ` Kees Cook
2020-06-04 17:57     ` Kees Cook
2020-06-04 18:46       ` Vlastimil Babka
2020-06-05  9:11         ` Vlastimil Babka
2020-06-05  9:36           ` Vegard Nossum
2020-06-05 12:47             ` Vegard Nossum
2020-06-05 13:12               ` Rafael J. Wysocki
2020-06-05 13:12                 ` Rafael J. Wysocki
2020-06-05 14:08                 ` Vlastimil Babka
2020-06-05 14:24                   ` Rafael J. Wysocki
2020-06-05 14:24                     ` Rafael J. Wysocki
2020-06-05 14:44                   ` Vegard Nossum
2020-06-05 15:44                     ` Kees Cook
2020-06-05 16:37                       ` Vegard Nossum
2020-06-05 17:51                         ` Kees Cook
2020-06-05 16:55                       ` Vlastimil Babka
2020-06-05 18:46                         ` Kees Cook
2020-06-08 10:51                           ` Vlastimil Babka
2020-06-06  6:46                       ` Rafael J. Wysocki
2020-06-06  6:46                         ` Rafael J. Wysocki
2020-06-05 21:45                     ` Kaneda, Erik
2020-06-11  1:40                     ` Kaneda, Erik
2020-06-11 10:54                       ` Vlastimil Babka
2020-06-12 12:26                       ` Rafael J. Wysocki
2021-03-23 18:32                         ` Kirill A. Shutemov
2021-03-23 18:58                           ` Vegard Nossum
2021-03-23 19:03                           ` Rafael J. Wysocki
2021-03-23 19:03                             ` Rafael J. Wysocki
2021-03-23 21:54                             ` Kaneda, Erik [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=MWHPR11MB1599940F32F478311285F3AEF0649@MWHPR11MB1599.namprd11.prod.outlook.com \
    --to=erik.kaneda@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=elver@google.com \
    --cc=jan.kiszka@siemens.com \
    --cc=keescook@chromium.org \
    --cc=kirill@shutemov.name \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=longman@redhat.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=rafael@kernel.org \
    --cc=robert.moore@intel.com \
    --cc=rostedt@goodmis.org \
    --cc=vbabka@suse.cz \
    --cc=vegard.nossum@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.