All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ard Biesheuvel <ardb@kernel.org>
To: Mark Rutland <mark.rutland@arm.com>
Cc: Yinan Liu <yinan@linux.alibaba.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	"open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)" 
	<linuxppc-dev@lists.ozlabs.org>,
	Sachin Sant <sachinp@linux.ibm.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Kees Cook <keescook@chromium.org>
Subject: Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests
Date: Thu, 27 Jan 2022 14:59:31 +0100	[thread overview]
Message-ID: <CAMj1kXHnQcHmxRrTBQmj0Z2JJ6iWvNCQqSjvPqG_oedWpikfSA@mail.gmail.com> (raw)
In-Reply-To: <YfKc8MDwenS1iXqQ@FVFF77S0Q05N>

On Thu, 27 Jan 2022 at 14:24, Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Thu, Jan 27, 2022 at 02:07:03PM +0100, Ard Biesheuvel wrote:
> > On Thu, 27 Jan 2022 at 13:59, Mark Rutland <mark.rutland@arm.com> wrote:
> > >
> > > On Thu, Jan 27, 2022 at 01:22:17PM +0100, Ard Biesheuvel wrote:
> > > > On Thu, 27 Jan 2022 at 13:20, Mark Rutland <mark.rutland@arm.com> wrote:
> > > > > On Thu, Jan 27, 2022 at 01:03:34PM +0100, Ard Biesheuvel wrote:
> > > > >
> > > > > > These architectures use place-relative extables for the same reason:
> > > > > > place relative references are resolved at build time rather than at
> > > > > > runtime during relocation, making a build time sort feasible.
> > > > > >
> > > > > > arch/alpha/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > > arch/arm64/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > > arch/ia64/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > > arch/parisc/include/asm/uaccess.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > > arch/powerpc/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > > arch/riscv/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > > arch/s390/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > > arch/x86/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > >
> > > > > > Note that the swap routine becomes something like the below, given
> > > > > > that the relative references need to be fixed up after the entry
> > > > > > changes place in the sorted list.
> > > > > >
> > > > > > static void swap_ex(void *a, void *b, int size)
> > > > > > {
> > > > > >         struct exception_table_entry *x = a, *y = b, tmp;
> > > > > >         int delta = b - a;
> > > > > >
> > > > > >         tmp = *x;
> > > > > >         x->insn = y->insn + delta;
> > > > > >         y->insn = tmp.insn - delta;
> > > > > >         ...
> > > > > > }
> > > > > >
> > > > > > As a bonus, the resulting footprint of the table in the image is
> > > > > > reduced by 8x, given that every 8 byte pointer has an accompanying 24
> > > > > > byte RELA record, so we go from 32 bytes to 4 bytes for every call to
> > > > > > __gnu_mcount_mc.
> > > > >
> > > > > Absolutely -- it'd be great if we could do that for the callsite locations; the
> > > > > difficulty is that the entries are generated by the compiler itself, so we'd
> > > > > either need some build/link time processing to convert each absolute 64-bit
> > > > > value to a relative 32-bit offset, or new compiler options to generate those as
> > > > > relative offsets from the outset.
> > > >
> > > > Don't we use scripts/recordmcount.pl for that?
> > >
> > > Not quite -- we could adjust it to do so, but today it doesn't consider
> > > existing mcount_loc entries, and just generates new ones where the compiler has
> > > generated calls to mcount, which it finds by scanning the instructions in the
> > > binary. Today it is not used:
> > >
> > > * On arm64 when we default to using `-fpatchable-function-entry=N`.  That makes
> > >   the compiler insert 2 NOPs in the function prologue, and log the location of
> > >   that NOP sled to a section called.  `__patchable_function_entries`.
> > >
> > >   We need the compiler to do that since we can't reliably identify 2 NOPs in a
> > >   function prologue as being intended to be a patch site, as e.g. there could
> > >   be notrace functions where the compiler had to insert NOPs for alignment of a
> > >   subsequent brnach or similar.
> > >
> > > * On architectures with `-nop-mcount`. On these, it's necessary to use
> > >   `-mrecord-mcount` to have the compiler log the patch-site, for the same
> > >   reason as with `-fpatchable-function-entry`.
> > >
> > > * On architectures with `-mrecord-mcount` generally, since this avoids the
> > >   overhead of scanning each object.
> > >
> > > * On x86 when objtool is used.
> > >
> >
> > Right.
> >
> > I suppose that on arm64, we can work around this by passing
> > --apply-dynamic-relocs to the linker, so that all R_AARCH64_RELATIVE
> > targets are prepopulated with the link time value of the respective
> > addresses. It does cause some bloat, which is why we disable that
> > today, but we could make that dependent on ftrace being enabled.
>
> We'd also need to teach the build-time sort to update the relocations, unless
> you mean to also change the boot-time reloc code to RMW with the offset?
>

Why would that be necessary? Every RELA entry has the same effect on
its target address, as it just adds a fixed offset.

> I think for right now the best thing is to disable the build-time sort for
> arm64, but maybe something like that is the right thing to do longer term.
>

Fair enough.

> > I do wonder how much over head we accumulate, though, by having all
> > these relocations, but I suppose that is the situation today in any
> > case.
>
> Yeah; I suspect if we want to do something about that we want to do it more
> generally, and would probably need to do something like the x86 approach and
> rewrite the relocs at build-time to something more compressed. If we applied
> the dynamic relocs with the link-time address we'd only need 4 bytes to
> identify each pointer to apply an offset to.
>
> I'm not exactly sure how we could do that, nor what the trade-off look like in
> practice.
>

It would make sense for -fpic codegen to use relative offsets in
__mcount_loc, but since we don't actually use -fpic on arm64, that
doesn't really help :-)

WARNING: multiple messages have this Message-ID (diff)
From: Ard Biesheuvel <ardb@kernel.org>
To: Mark Rutland <mark.rutland@arm.com>
Cc: Kees Cook <keescook@chromium.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Sachin Sant <sachinp@linux.ibm.com>,
	Yinan Liu <yinan@linux.alibaba.com>,
	"open list:LINUX FOR POWERPC \(32-BIT AND 64-BIT\)"
	<linuxppc-dev@lists.ozlabs.org>
Subject: Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests
Date: Thu, 27 Jan 2022 14:59:31 +0100	[thread overview]
Message-ID: <CAMj1kXHnQcHmxRrTBQmj0Z2JJ6iWvNCQqSjvPqG_oedWpikfSA@mail.gmail.com> (raw)
In-Reply-To: <YfKc8MDwenS1iXqQ@FVFF77S0Q05N>

On Thu, 27 Jan 2022 at 14:24, Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Thu, Jan 27, 2022 at 02:07:03PM +0100, Ard Biesheuvel wrote:
> > On Thu, 27 Jan 2022 at 13:59, Mark Rutland <mark.rutland@arm.com> wrote:
> > >
> > > On Thu, Jan 27, 2022 at 01:22:17PM +0100, Ard Biesheuvel wrote:
> > > > On Thu, 27 Jan 2022 at 13:20, Mark Rutland <mark.rutland@arm.com> wrote:
> > > > > On Thu, Jan 27, 2022 at 01:03:34PM +0100, Ard Biesheuvel wrote:
> > > > >
> > > > > > These architectures use place-relative extables for the same reason:
> > > > > > place relative references are resolved at build time rather than at
> > > > > > runtime during relocation, making a build time sort feasible.
> > > > > >
> > > > > > arch/alpha/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > > arch/arm64/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > > arch/ia64/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > > arch/parisc/include/asm/uaccess.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > > arch/powerpc/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > > arch/riscv/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > > arch/s390/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > > arch/x86/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > >
> > > > > > Note that the swap routine becomes something like the below, given
> > > > > > that the relative references need to be fixed up after the entry
> > > > > > changes place in the sorted list.
> > > > > >
> > > > > > static void swap_ex(void *a, void *b, int size)
> > > > > > {
> > > > > >         struct exception_table_entry *x = a, *y = b, tmp;
> > > > > >         int delta = b - a;
> > > > > >
> > > > > >         tmp = *x;
> > > > > >         x->insn = y->insn + delta;
> > > > > >         y->insn = tmp.insn - delta;
> > > > > >         ...
> > > > > > }
> > > > > >
> > > > > > As a bonus, the resulting footprint of the table in the image is
> > > > > > reduced by 8x, given that every 8 byte pointer has an accompanying 24
> > > > > > byte RELA record, so we go from 32 bytes to 4 bytes for every call to
> > > > > > __gnu_mcount_mc.
> > > > >
> > > > > Absolutely -- it'd be great if we could do that for the callsite locations; the
> > > > > difficulty is that the entries are generated by the compiler itself, so we'd
> > > > > either need some build/link time processing to convert each absolute 64-bit
> > > > > value to a relative 32-bit offset, or new compiler options to generate those as
> > > > > relative offsets from the outset.
> > > >
> > > > Don't we use scripts/recordmcount.pl for that?
> > >
> > > Not quite -- we could adjust it to do so, but today it doesn't consider
> > > existing mcount_loc entries, and just generates new ones where the compiler has
> > > generated calls to mcount, which it finds by scanning the instructions in the
> > > binary. Today it is not used:
> > >
> > > * On arm64 when we default to using `-fpatchable-function-entry=N`.  That makes
> > >   the compiler insert 2 NOPs in the function prologue, and log the location of
> > >   that NOP sled to a section called.  `__patchable_function_entries`.
> > >
> > >   We need the compiler to do that since we can't reliably identify 2 NOPs in a
> > >   function prologue as being intended to be a patch site, as e.g. there could
> > >   be notrace functions where the compiler had to insert NOPs for alignment of a
> > >   subsequent brnach or similar.
> > >
> > > * On architectures with `-nop-mcount`. On these, it's necessary to use
> > >   `-mrecord-mcount` to have the compiler log the patch-site, for the same
> > >   reason as with `-fpatchable-function-entry`.
> > >
> > > * On architectures with `-mrecord-mcount` generally, since this avoids the
> > >   overhead of scanning each object.
> > >
> > > * On x86 when objtool is used.
> > >
> >
> > Right.
> >
> > I suppose that on arm64, we can work around this by passing
> > --apply-dynamic-relocs to the linker, so that all R_AARCH64_RELATIVE
> > targets are prepopulated with the link time value of the respective
> > addresses. It does cause some bloat, which is why we disable that
> > today, but we could make that dependent on ftrace being enabled.
>
> We'd also need to teach the build-time sort to update the relocations, unless
> you mean to also change the boot-time reloc code to RMW with the offset?
>

Why would that be necessary? Every RELA entry has the same effect on
its target address, as it just adds a fixed offset.

> I think for right now the best thing is to disable the build-time sort for
> arm64, but maybe something like that is the right thing to do longer term.
>

Fair enough.

> > I do wonder how much over head we accumulate, though, by having all
> > these relocations, but I suppose that is the situation today in any
> > case.
>
> Yeah; I suspect if we want to do something about that we want to do it more
> generally, and would probably need to do something like the x86 approach and
> rewrite the relocs at build-time to something more compressed. If we applied
> the dynamic relocs with the link-time address we'd only need 4 bytes to
> identify each pointer to apply an offset to.
>
> I'm not exactly sure how we could do that, nor what the trade-off look like in
> practice.
>

It would make sense for -fpic codegen to use relative offsets in
__mcount_loc, but since we don't actually use -fpic on arm64, that
doesn't really help :-)

  reply	other threads:[~2022-01-27 13:59 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-24  9:19 [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests Sachin Sant
2022-01-24  9:19 ` Sachin Sant
2022-01-24 12:15 ` Yinan Liu
2022-01-24 16:45   ` Steven Rostedt
2022-01-25  3:20     ` Yinan Liu
2022-01-26 14:37       ` Mark Rutland
2022-01-27 11:46         ` Mark Rutland
2022-01-27 11:46           ` Mark Rutland
2022-01-27 12:03           ` Ard Biesheuvel
2022-01-27 12:03             ` Ard Biesheuvel
2022-01-27 12:20             ` Mark Rutland
2022-01-27 12:20               ` Mark Rutland
2022-01-27 12:22               ` Ard Biesheuvel
2022-01-27 12:22                 ` Ard Biesheuvel
2022-01-27 12:59                 ` Mark Rutland
2022-01-27 12:59                   ` Mark Rutland
2022-01-27 13:07                   ` Ard Biesheuvel
2022-01-27 13:07                     ` Ard Biesheuvel
2022-01-27 13:24                     ` Mark Rutland
2022-01-27 13:24                       ` Mark Rutland
2022-01-27 13:59                       ` Ard Biesheuvel [this message]
2022-01-27 13:59                         ` Ard Biesheuvel
2022-01-27 14:54                         ` Mark Rutland
2022-01-27 14:54                           ` Mark Rutland
2022-01-27 15:01                           ` Ard Biesheuvel
2022-01-27 15:01                             ` Ard Biesheuvel
2022-01-27 12:04           ` Sven Schnelle
2022-01-27 12:04             ` Sven Schnelle
2022-01-27 12:27             ` Mark Rutland
2022-01-27 12:27               ` Mark Rutland
2022-01-27 12:46               ` Steven Rostedt
2022-01-27 12:46                 ` Steven Rostedt
2022-01-27 13:08                 ` Mark Rutland
2022-01-27 13:08                   ` Mark Rutland
2022-01-27 13:16                   ` Sven Schnelle
2022-01-27 13:16                     ` Sven Schnelle
2022-01-27 13:33                     ` Mark Rutland
2022-01-27 13:33                       ` Mark Rutland
2022-01-27 13:55                       ` Steven Rostedt
2022-01-27 13:55                         ` Steven Rostedt
2022-01-27 14:56                         ` Mark Rutland
2022-01-27 14:56                           ` Mark Rutland
2022-01-27 16:41           ` Kees Cook
2022-01-27 16:41             ` Kees Cook
2022-01-25  4:00     ` Sachin Sant
2022-01-25 14:28       ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMj1kXHnQcHmxRrTBQmj0Z2JJ6iWvNCQqSjvPqG_oedWpikfSA@mail.gmail.com \
    --to=ardb@kernel.org \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mark.rutland@arm.com \
    --cc=rostedt@goodmis.org \
    --cc=sachinp@linux.ibm.com \
    --cc=yinan@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.