All of lore.kernel.org
 help / color / mirror / Atom feed
From: Guo Ren <guoren@kernel.org>
To: Mark Rutland <mark.rutland@arm.com>
Cc: David Laight <david.laight@aculab.com>,
	Evgenii Shatokhin <e.shatokhin@yadro.com>,
	"suagrfillet@gmail.com" <suagrfillet@gmail.com>,
	"andy.chiu@sifive.com" <andy.chiu@sifive.com>,
	"linux-riscv@lists.infradead.org"
	<linux-riscv@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Guo Ren <guoren@linux.alibaba.com>,
	"anup@brainfault.org" <anup@brainfault.org>,
	"paul.walmsley@sifive.com" <paul.walmsley@sifive.com>,
	"palmer@dabbelt.com" <palmer@dabbelt.com>,
	"conor.dooley@microchip.com" <conor.dooley@microchip.com>,
	"heiko@sntech.de" <heiko@sntech.de>,
	"rostedt@goodmis.org" <rostedt@goodmis.org>,
	"mhiramat@kernel.org" <mhiramat@kernel.org>,
	"jolsa@redhat.com" <jolsa@redhat.com>, "bp@suse.de" <bp@suse.de>,
	"jpoimboe@kernel.org" <jpoimboe@kernel.org>,
	"linux@yadro.com" <linux@yadro.com>
Subject: Re: [PATCH -next V7 0/7] riscv: Optimize function trace
Date: Fri, 10 Feb 2023 10:21:01 +0800	[thread overview]
Message-ID: <CAJF2gTSL0c2GH0Jy+hrhFsswq3BsoqyC81harC_K=9TspS8eaQ@mail.gmail.com> (raw)
In-Reply-To: <Y+TC037Erd+bsrB7@FVFF77S0Q05N>

On Thu, Feb 9, 2023 at 5:54 PM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Thu, Feb 09, 2023 at 09:59:33AM +0800, Guo Ren wrote:
> > On Thu, Feb 9, 2023 at 9:51 AM Guo Ren <guoren@kernel.org> wrote:
> > >
> > > On Thu, Feb 9, 2023 at 6:29 AM David Laight <David.Laight@aculab.com> wrote:
> > > >
> > > > > >   # Note: aligned to 8 bytes
> > > > > >   addr-08               // Literal (first 32-bits)      // patched to ops ptr
> > > > > >   addr-04               // Literal (last 32-bits)       // patched to ops ptr
> > > > > >   addr+00       func:   mv      t0, ra
> > > > > We needn't "mv t0, ra" here because our "jalr" could work with t0 and
> > > > > won't affect ra. Let's do it in the trampoline code, and then we can
> > > > > save another word here.
> > > > > >   addr+04               auipc   t1, ftrace_caller
> > > > > >   addr+08               jalr    ftrace_caller(t1)
> > > >
> > > > Is that some kind of 'load high' and 'add offset' pair?
> > > Yes.
> > >
> > > > I guess 64bit kernels guarantee to put all module code
> > > > within +-2G of the main kernel?
> > > Yes, 32-bit is enough. So we only need one 32-bit literal size for the
> > > current rv64, just like CONFIG_32BIT.
> > We need kernel_addr_base + this 32-bit Literal.
> >
> > @Mark Rutland
> > What do you think the idea about reducing one more 32-bit in
> > call-site? (It also sould work for arm64.)
>
> The literal pointer is for a struct ftrace_ops, which is data, not code.
>
> An ftrace_ops can be allocated from anywhere (e.g. core kernel data, module
> data, linear map, vmalloc space), and so is not guaranteed to be within 2GiB of
> all code. The literal needs to be able to address the entire kernel addresss
> range, and since it can be modified concurrently (with PREEMPT and not using
> stop_machine()) it needs to be possible to read/write atomically. So
> practically speaking it needs to be the native pointer size (i.e. 64-bit on a
> 64-bit kernel).
Got it, thx. Let's use an absolute pointer as the beginning.

>
> Other schemes for compressing that (e.g. using an integer into an array of
> pointers) is possible, but uses more memory and gets more complicated for
> concurrent manipulation, so I would strongly recommend keeping this simple and
> using a native pointer size here.
>
> > > > > Here is the call-site:
> > > > >    # Note: aligned to 8 bytes
> > > > >    addr-08               // Literal (first 32-bits)      // patched to ops ptr
> > > > >    addr-04               // Literal (last 32-bits)       // patched to ops ptr
> > > > >    addr+00               auipc   t0, ftrace_caller
> > > > >    addr+04               jalr    ftrace_caller(t0)
> > > >
> > > > Could you even do something like:
> > > >         addr-n  call ftrace-function
> > > >         addr-n+x        literals
> > > >         addr+0  nop or jmp addr-n
> > > >         addr+4  function_code
> > > Yours cost one more instruction, right?
> > >          addr-12  auipc
> > >          addr-8    jalr
> > >          addr-4    // Literal (32-bits)
> > >          addr+0   nop or jmp addr-n // one more?
> > >          addr+4   function_code
>
> Placing instructions before the entry point is going to confuse symbol
> resolution and unwind code, so I would not recommend that. It also means the
> trampoline will need to re-adjust the return address back into the function,
> but that is relatively simple.
>
> I also think that this is micro-optimizing. The AUPIC *should* be cheap, so
> executing that unconditionally should be fine. I think the form that Guo
> suggested with AUIPC + {JALR || NOP} in the function (and 64-bits reserved
> immediately bfore the function) is the way to go, so long as that does the
> right thing with ra.
>
> Thanks,
> Mark.



-- 
Best Regards
 Guo Ren

WARNING: multiple messages have this Message-ID (diff)
From: Guo Ren <guoren@kernel.org>
To: Mark Rutland <mark.rutland@arm.com>
Cc: David Laight <david.laight@aculab.com>,
	Evgenii Shatokhin <e.shatokhin@yadro.com>,
	 "suagrfillet@gmail.com" <suagrfillet@gmail.com>,
	"andy.chiu@sifive.com" <andy.chiu@sifive.com>,
	 "linux-riscv@lists.infradead.org"
	<linux-riscv@lists.infradead.org>,
	 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Guo Ren <guoren@linux.alibaba.com>,
	 "anup@brainfault.org" <anup@brainfault.org>,
	"paul.walmsley@sifive.com" <paul.walmsley@sifive.com>,
	 "palmer@dabbelt.com" <palmer@dabbelt.com>,
	 "conor.dooley@microchip.com" <conor.dooley@microchip.com>,
	"heiko@sntech.de" <heiko@sntech.de>,
	 "rostedt@goodmis.org" <rostedt@goodmis.org>,
	"mhiramat@kernel.org" <mhiramat@kernel.org>,
	 "jolsa@redhat.com" <jolsa@redhat.com>, "bp@suse.de" <bp@suse.de>,
	 "jpoimboe@kernel.org" <jpoimboe@kernel.org>,
	"linux@yadro.com" <linux@yadro.com>
Subject: Re: [PATCH -next V7 0/7] riscv: Optimize function trace
Date: Fri, 10 Feb 2023 10:21:01 +0800	[thread overview]
Message-ID: <CAJF2gTSL0c2GH0Jy+hrhFsswq3BsoqyC81harC_K=9TspS8eaQ@mail.gmail.com> (raw)
In-Reply-To: <Y+TC037Erd+bsrB7@FVFF77S0Q05N>

On Thu, Feb 9, 2023 at 5:54 PM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Thu, Feb 09, 2023 at 09:59:33AM +0800, Guo Ren wrote:
> > On Thu, Feb 9, 2023 at 9:51 AM Guo Ren <guoren@kernel.org> wrote:
> > >
> > > On Thu, Feb 9, 2023 at 6:29 AM David Laight <David.Laight@aculab.com> wrote:
> > > >
> > > > > >   # Note: aligned to 8 bytes
> > > > > >   addr-08               // Literal (first 32-bits)      // patched to ops ptr
> > > > > >   addr-04               // Literal (last 32-bits)       // patched to ops ptr
> > > > > >   addr+00       func:   mv      t0, ra
> > > > > We needn't "mv t0, ra" here because our "jalr" could work with t0 and
> > > > > won't affect ra. Let's do it in the trampoline code, and then we can
> > > > > save another word here.
> > > > > >   addr+04               auipc   t1, ftrace_caller
> > > > > >   addr+08               jalr    ftrace_caller(t1)
> > > >
> > > > Is that some kind of 'load high' and 'add offset' pair?
> > > Yes.
> > >
> > > > I guess 64bit kernels guarantee to put all module code
> > > > within +-2G of the main kernel?
> > > Yes, 32-bit is enough. So we only need one 32-bit literal size for the
> > > current rv64, just like CONFIG_32BIT.
> > We need kernel_addr_base + this 32-bit Literal.
> >
> > @Mark Rutland
> > What do you think the idea about reducing one more 32-bit in
> > call-site? (It also sould work for arm64.)
>
> The literal pointer is for a struct ftrace_ops, which is data, not code.
>
> An ftrace_ops can be allocated from anywhere (e.g. core kernel data, module
> data, linear map, vmalloc space), and so is not guaranteed to be within 2GiB of
> all code. The literal needs to be able to address the entire kernel addresss
> range, and since it can be modified concurrently (with PREEMPT and not using
> stop_machine()) it needs to be possible to read/write atomically. So
> practically speaking it needs to be the native pointer size (i.e. 64-bit on a
> 64-bit kernel).
Got it, thx. Let's use an absolute pointer as the beginning.

>
> Other schemes for compressing that (e.g. using an integer into an array of
> pointers) is possible, but uses more memory and gets more complicated for
> concurrent manipulation, so I would strongly recommend keeping this simple and
> using a native pointer size here.
>
> > > > > Here is the call-site:
> > > > >    # Note: aligned to 8 bytes
> > > > >    addr-08               // Literal (first 32-bits)      // patched to ops ptr
> > > > >    addr-04               // Literal (last 32-bits)       // patched to ops ptr
> > > > >    addr+00               auipc   t0, ftrace_caller
> > > > >    addr+04               jalr    ftrace_caller(t0)
> > > >
> > > > Could you even do something like:
> > > >         addr-n  call ftrace-function
> > > >         addr-n+x        literals
> > > >         addr+0  nop or jmp addr-n
> > > >         addr+4  function_code
> > > Yours cost one more instruction, right?
> > >          addr-12  auipc
> > >          addr-8    jalr
> > >          addr-4    // Literal (32-bits)
> > >          addr+0   nop or jmp addr-n // one more?
> > >          addr+4   function_code
>
> Placing instructions before the entry point is going to confuse symbol
> resolution and unwind code, so I would not recommend that. It also means the
> trampoline will need to re-adjust the return address back into the function,
> but that is relatively simple.
>
> I also think that this is micro-optimizing. The AUPIC *should* be cheap, so
> executing that unconditionally should be fine. I think the form that Guo
> suggested with AUIPC + {JALR || NOP} in the function (and 64-bits reserved
> immediately bfore the function) is the way to go, so long as that does the
> right thing with ra.
>
> Thanks,
> Mark.



-- 
Best Regards
 Guo Ren

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

  reply	other threads:[~2023-02-10  2:21 UTC|newest]

Thread overview: 88+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-12  9:05 [PATCH -next V7 0/7] riscv: Optimize function trace guoren
2023-01-12  9:05 ` guoren
2023-01-12  9:05 ` [PATCH -next V7 1/7] riscv: ftrace: Fixup panic by disabling preemption guoren
2023-01-12  9:05   ` guoren
2023-01-12 12:16   ` Mark Rutland
2023-01-12 12:16     ` Mark Rutland
2023-01-12 12:57     ` Mark Rutland
2023-01-12 12:57       ` Mark Rutland
2023-01-28  9:45       ` Guo Ren
2023-01-28  9:45         ` Guo Ren
2023-01-28  9:37     ` Guo Ren
2023-01-28  9:37       ` Guo Ren
2023-01-30 10:54       ` Mark Rutland
2023-01-30 10:54         ` Mark Rutland
2023-02-04  1:19         ` Guo Ren
2023-02-04  1:19           ` Guo Ren
2023-01-12  9:05 ` [PATCH -next V7 2/7] riscv: ftrace: Remove wasted nops for !RISCV_ISA_C guoren
2023-01-12  9:05   ` guoren
2023-01-12  9:05 ` [PATCH -next V7 3/7] riscv: ftrace: Reduce the detour code size to half guoren
2023-01-12  9:05   ` guoren
2023-01-16 14:11   ` Evgenii Shatokhin
2023-01-16 14:11     ` Evgenii Shatokhin
2023-01-12  9:06 ` [PATCH -next V7 4/7] riscv: ftrace: Add ftrace_graph_func guoren
2023-01-12  9:06   ` guoren
2023-01-12  9:06 ` [PATCH -next V7 5/7] riscv: ftrace: Add DYNAMIC_FTRACE_WITH_DIRECT_CALLS support guoren
2023-01-12  9:06   ` guoren
2023-01-12  9:06 ` [PATCH -next V7 6/7] samples: ftrace: Add riscv support for SAMPLE_FTRACE_DIRECT[_MULTI] guoren
2023-01-12  9:06   ` guoren
2023-01-16 14:30   ` Evgenii Shatokhin
2023-01-16 14:30     ` Evgenii Shatokhin
2023-01-17  9:32     ` Song Shuai
2023-01-17  9:32       ` Song Shuai
2023-01-17 13:16       ` Evgenii Shatokhin
2023-01-17 13:16         ` Evgenii Shatokhin
2023-01-17 16:22         ` Evgenii Shatokhin
2023-01-17 16:22           ` Evgenii Shatokhin
2023-01-18  2:37           ` Song Shuai
2023-01-18  2:37             ` Song Shuai
2023-01-18 15:19             ` Evgenii Shatokhin
2023-01-18 15:19               ` Evgenii Shatokhin
2023-01-19  6:05               ` Guo Ren
2023-01-19  6:05                 ` Guo Ren
2023-02-18 21:30                 ` Palmer Dabbelt
2023-02-18 21:30                   ` Palmer Dabbelt
2023-02-20  2:46                   ` Song Shuai
2023-02-20  2:46                     ` Song Shuai
2023-02-21  3:56                     ` Guo Ren
2023-02-21  3:56                       ` Guo Ren
2023-02-21  4:02                   ` Guo Ren
2023-02-21  4:02                     ` Guo Ren
2023-01-12  9:06 ` [PATCH -next V7 7/7] riscv : select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY guoren
2023-01-12  9:06   ` guoren
2023-01-16 15:02 ` [PATCH -next V7 0/7] riscv: Optimize function trace Evgenii Shatokhin
2023-01-16 15:02   ` Evgenii Shatokhin
2023-02-04  6:40   ` Guo Ren
2023-02-04  6:40     ` Guo Ren
2023-02-06  9:56     ` Mark Rutland
2023-02-06  9:56       ` Mark Rutland
2023-02-07  3:57       ` Guo Ren
2023-02-07  3:57         ` Guo Ren
2023-02-07  9:16         ` Mark Rutland
2023-02-07  9:16           ` Mark Rutland
2023-02-08  2:30           ` Guo Ren
2023-02-08  2:30             ` Guo Ren
2023-02-08 14:46             ` Mark Rutland
2023-02-08 14:46               ` Mark Rutland
2023-02-09  1:31               ` Guo Ren
2023-02-09  1:31                 ` Guo Ren
2023-02-09 22:46                 ` David Laight
2023-02-09 22:46                   ` David Laight
2023-02-10  2:18                   ` Guo Ren
2023-02-10  2:18                     ` Guo Ren
2023-02-08 22:29             ` David Laight
2023-02-08 22:29               ` David Laight
2023-02-09  1:51               ` Guo Ren
2023-02-09  1:51                 ` Guo Ren
2023-02-09  1:59                 ` Guo Ren
2023-02-09  1:59                   ` Guo Ren
2023-02-09  9:54                   ` Mark Rutland
2023-02-09  9:54                     ` Mark Rutland
2023-02-10  2:21                     ` Guo Ren [this message]
2023-02-10  2:21                       ` Guo Ren
2023-02-09  9:00                 ` David Laight
2023-02-09  9:00                   ` David Laight
2023-02-09  9:11                   ` Guo Ren
2023-02-09  9:11                     ` Guo Ren
2023-02-18 21:42 ` patchwork-bot+linux-riscv
2023-02-18 21:42   ` patchwork-bot+linux-riscv

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJF2gTSL0c2GH0Jy+hrhFsswq3BsoqyC81harC_K=9TspS8eaQ@mail.gmail.com' \
    --to=guoren@kernel.org \
    --cc=andy.chiu@sifive.com \
    --cc=anup@brainfault.org \
    --cc=bp@suse.de \
    --cc=conor.dooley@microchip.com \
    --cc=david.laight@aculab.com \
    --cc=e.shatokhin@yadro.com \
    --cc=guoren@linux.alibaba.com \
    --cc=heiko@sntech.de \
    --cc=jolsa@redhat.com \
    --cc=jpoimboe@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=linux@yadro.com \
    --cc=mark.rutland@arm.com \
    --cc=mhiramat@kernel.org \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=rostedt@goodmis.org \
    --cc=suagrfillet@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.