From: Arnd Bergmann <arnd@kernel.org>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>,
	Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	Kees Cook <keescook@chromium.org>,
	Mark Brown <broonie@kernel.org>,
	Vincenzo Frascino <vincenzo.frascino@arm.com>,
	Geert Uytterhoeven <geert+renesas@glider.be>,
	Kristina Martsenko <kristina.martsenko@arm.com>,
	Ionela Voinescu <ionela.voinescu@arm.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Andrew Scull <ascull@google.com>,
	David Brazdil <dbrazdil@google.com>,
	Marc Zyngier <maz@kernel.org>, Ard Biesheuvel <ardb@kernel.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	clang-built-linux <clang-built-linux@googlegroups.com>,
	Nicolas Pitre <nico@fluxnic.net>
Subject: Re: [PATCH] [RFC] arm64: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION
Date: Thu, 18 Mar 2021 09:41:54 +0100
Message-ID: <CAK8P3a0FeuGLYhiPx=GLdewu2P=Hix7cpVsbF05i5WO5T2XPvQ@mail.gmail.com>
In-Reply-To: <20210317161838.GF12269@arm.com>

On Wed, Mar 17, 2021 at 5:18 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Wed, Mar 17, 2021 at 02:37:57PM +0000, Catalin Marinas wrote:
> > On Thu, Feb 25, 2021 at 12:20:56PM +0100, Arnd Bergmann wrote:
> > > diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> > > index bad2b9eaab22..926cdb597a45 100644
> > > --- a/arch/arm64/kernel/vmlinux.lds.S
> > > +++ b/arch/arm64/kernel/vmlinux.lds.S
> > > @@ -217,7 +217,7 @@ SECTIONS
> > >             INIT_CALLS
> > >             CON_INITCALL
> > >             INIT_RAM_FS
> > > -           *(.init.altinstructions .init.bss .init.bss.*)  /* from the EFI stub */
> > > +           *(.init.altinstructions .init.data.* .init.bss .init.bss.*)     /* from the EFI stub */
> >
> > INIT_DATA already covers .init.data and .init.data.*, so I don't think
> > we need this change.
>
> Ah, INIT_DATA only covers init.data.* (so no dot in front). The above
> is needed for the EFI stub.

I wonder if that is just a typo in INIT_DATA. Nico introduced it as part of
commit 266ff2a8f51f ("kbuild: Fix asm-generic/vmlinux.lds.h for
LD_DEAD_CODE_DATA_ELIMINATION"), so perhaps the pattern should have
been .init.data.* (with the leading dot) instead.
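
To illustrate why the missing dot matters, here is a minimal sketch that
compares the two patterns against a few section names. It assumes that
Python's fnmatch behaves closely enough to the linker's wildcard matching
for these simple cases; the section names are made up for illustration and
are not taken from a real object file.

```python
from fnmatch import fnmatchcase

# Illustrative input section names (hypothetical, not from a real build).
sections = [".init.data", ".init.data.foo", "init.data.foo"]

# The two patterns discussed above: without and with the leading dot.
patterns = ["init.data.*", ".init.data.*"]


def matching(pattern, names):
    """Return the names that the glob pattern matches in full."""
    return [n for n in names if fnmatchcase(n, pattern)]


for p in patterns:
    print(f"{p!r:16} -> {matching(p, sections)}")
```

Under that assumption, the pattern without the dot never picks up a
per-symbol section such as .init.data.foo, so such sections need another
rule to place them.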

> However, I gave this a quick try and under Qemu with -cpu max and -smp 2
> (or more) it fails as below. I haven't debugged but the lr points to
> just after the switch_to() call. Maybe some section got discarded and we
> patched in the wrong instructions. It is fine with -cpu host or -smp 1.

Ah, interesting.

> -------------------8<------------------------
> smp: Bringing up secondary CPUs ...
> Detected PIPT I-cache on CPU1
> CPU1: Booted secondary processor 0x0000000001 [0x000f0510]
> Unable to handle kernel paging request at virtual address eb91d81ad2971160
> Mem abort info:
>   ESR = 0x86000004
>   EC = 0x21: IABT (current EL), IL = 32 bits
>   SET = 0, FnV = 0
>   EA = 0, S1PTW = 0
> [eb91d81ad2971160] address between user and kernel address ranges
> Internal error: Oops: 86000004 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 1 PID: 16 Comm: migration/1 Not tainted 5.12.0-rc3-00002-g128e977c1322 #1
> Stopper: 0x0 <- 0x0
> pstate: 60000085 (nZCv daIf -PAN -UAO -TCO BTYPE=--)
> pc : 0xeb91d81ad2971160
> lr : __schedule+0x230/0x6b8
> sp : ffff80001009bd60
> x29: ffff80001009bd60 x28: 0000000000000000
> x27: ffff0000000a6760 x26: ffff0000000b7540
> x25: 0080000000000000 x24: ffffd81ad3969000
> x23: ffff0000000a6200 x22: 6ee0d81ad2971658
> x21: ffff0000000a6200 x20: ffff000000080000
> x19: ffff00007fbc6bc0 x18: 0000000000000030
> x17: 0000000000000000 x16: 0000000000000000
> x15: 00008952b30a9a9e x14: 0000000000000366
> x13: 0000000000000192 x12: 0000000000000000
> x11: 0000000000000003 x10: 00000000000009b0
> x9 : ffff80001009bd30 x8 : ffff0000000a6c10
> x7 : ffff00007fbc6cc0 x6 : 00000000fffedb30
> x5 : 00000000ffffffff x4 : 0000000000000000
> x3 : 0000000000000008 x2 : 0000000000000000
> x1 : ffff0000000a6200 x0 : ffff0000000a3800
> Call trace:
>  0xeb91d81ad2971160
>  schedule+0x70/0x108
>  schedule_preempt_disabled+0x24/0x40
>  __kthread_parkme+0x68/0xd0
>  kthread+0x138/0x170
>  ret_from_fork+0x10/0x30
> Code: bad PC value
> ---[ end trace af3481062ecef3e7 ]---
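
As a cross-check of the "Mem abort info" lines in the log above, the ESR
value can be split into its fields by hand. This is only a sketch of the
field layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0); the helper
name decode_esr is made up for this example.

```python
def decode_esr(esr):
    """Split an ESR_ELx value into (EC, IL, ISS)."""
    ec = (esr >> 26) & 0x3F    # exception class, bits 31:26
    il = (esr >> 25) & 0x1     # instruction length bit, bit 25
    iss = esr & 0x1FFFFFF      # instruction specific syndrome, bits 24:0
    return ec, il, iss


ec, il, iss = decode_esr(0x86000004)
print(hex(ec), il, hex(iss))
```

EC comes out as 0x21 with IL set, matching the log, and the low six bits of
the ISS are 0x04, which should correspond to a level 0 translation fault on
the instruction fetch.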

This looks like it has just returned from __schedule() to schedule()
and is trying to return from that as well, through code like this:

.L562:
// /git/arm-soc/kernel/sched/core.c:5159: }
        ldp     x19, x20, [sp, 16]      //,,
        ldp     x29, x30, [sp], 32      //,,,
        hint    29 // autiasp
        ret

It looks like pointer authentication has gone wrong, which ended up
with a jump through a broken pointer like the one in x22, and that would
explain why it only happens with -cpu max. Presumably this also only
happens on secondary CPUs, so maybe the bit that initializes PAC on
secondary CPUs got discarded?
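
One way to see why these values look like signed pointers: a rough sketch
that forces the upper bits of an address to the canonical pattern. It
assumes 48-bit virtual addresses (guessed from the ffff8000... and
ffff0000... values in the log) and uses bit 55 to pick the kernel or user
half. This is a simplification for illustration, not a model of what the
hardware does when it strips or authenticates a PAC.

```python
VA_BITS = 48  # assumption: 48-bit virtual addresses, guessed from the log


def strip_upper_bits(ptr, va_bits=VA_BITS):
    """Force the bits above va_bits to all ones or all zeros, based on bit 55."""
    low_mask = (1 << va_bits) - 1
    if ptr & (1 << 55):
        # Kernel half: set every upper bit, keep the result within 64 bits.
        return (ptr | ~low_mask) & 0xFFFFFFFFFFFFFFFF
    # User half: clear every upper bit.
    return ptr & low_mask


def is_canonical(ptr, va_bits=VA_BITS):
    """True if the upper bits already match the canonical pattern."""
    return strip_upper_bits(ptr, va_bits) == ptr


for name, value in [("pc", 0xEB91D81AD2971160),
                    ("x22", 0x6EE0D81AD2971658),
                    ("x24", 0xFFFFD81AD3969000)]:
    print(name, is_canonical(value), hex(strip_upper_bits(value)))
```

With those assumptions, both pc and x22 are non-canonical, and their
cleaned-up forms land in the same ffffd81ad... range as the ordinary kernel
pointer in x24.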

        Arnd
