linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>
To: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "liuqi (BA)" <liuqi115@huawei.com>,
	"catalin.marinas@arm.com" <catalin.marinas@arm.com>,
	"will@kernel.org" <will@kernel.org>,
	"naveen.n.rao@linux.ibm.com" <naveen.n.rao@linux.ibm.com>,
	"anil.s.keshavamurthy@intel.com" <anil.s.keshavamurthy@intel.com>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	"Zengtao (B)" <prime.zeng@hisilicon.com>,
	"robin.murphy@arm.com" <robin.murphy@arm.com>,
	Linuxarm <linuxarm@huawei.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: RE: [PATCH] arm64: kprobe: Enable OPTPROBE for arm64
Date: Fri, 23 Jul 2021 02:43:22 +0000	[thread overview]
Message-ID: <6c71c1bba0124dedac9cb09c29465e04@hisilicon.com> (raw)
In-Reply-To: <20210723000318.5594c86e7c454aed82d9465d@kernel.org>



> -----Original Message-----
> From: Masami Hiramatsu [mailto:mhiramat@kernel.org]
> Sent: Friday, July 23, 2021 3:03 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
> Cc: liuqi (BA) <liuqi115@huawei.com>; catalin.marinas@arm.com;
> will@kernel.org; naveen.n.rao@linux.ibm.com; anil.s.keshavamurthy@intel.com;
> davem@davemloft.net; linux-arm-kernel@lists.infradead.org; Zengtao (B)
> <prime.zeng@hisilicon.com>; robin.murphy@arm.com; Linuxarm
> <linuxarm@huawei.com>; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] arm64: kprobe: Enable OPTPROBE for arm64
> 
> Hi Song,
> 
> On Thu, 22 Jul 2021 10:24:54 +0000
> "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com> wrote:
> 
> >
> >
> > > -----Original Message-----
> > > From: Masami Hiramatsu [mailto:mhiramat@kernel.org]
> > > Sent: Wednesday, July 21, 2021 8:42 PM
> > > To: liuqi (BA) <liuqi115@huawei.com>
> > > Cc: catalin.marinas@arm.com; will@kernel.org; naveen.n.rao@linux.ibm.com;
> > > anil.s.keshavamurthy@intel.com; davem@davemloft.net;
> > > linux-arm-kernel@lists.infradead.org; Song Bao Hua (Barry Song)
> > > <song.bao.hua@hisilicon.com>; Zengtao (B) <prime.zeng@hisilicon.com>;
> > > robin.murphy@arm.com; Linuxarm <linuxarm@huawei.com>;
> > > linux-kernel@vger.kernel.org
> > > Subject: Re: [PATCH] arm64: kprobe: Enable OPTPROBE for arm64
> > >
> > > Hi Qi,
> > >
> > > Thanks for your effort!
> > >
> > > On Mon, 19 Jul 2021 20:24:17 +0800
> > > Qi Liu <liuqi115@huawei.com> wrote:
> > >
> > > > This patch introduce optprobe for ARM64. In optprobe, probed
> > > > instruction is replaced by a branch instruction to detour
> > > > buffer. Detour buffer contains trampoline code and a call to
> > > > optimized_callback(). optimized_callback() calls opt_pre_handler()
> > > > to execute kprobe handler.
> > >
> > > OK so this will replace only one instruction.
> > >
> > > >
> > > > Limitations:
> > > > - We only support !CONFIG_RANDOMIZE_MODULE_REGION_FULL case to
> > > > guarantee the offset between probe point and kprobe pre_handler
> > > > is not larger than 128MiB.
> > >
> > > Hmm, shouldn't we depends on !CONFIG_ARM64_MODULE_PLTS? Or,
> > > allocate an intermediate trampoline area similar to arm optprobe
> > > does.
> >
> > Depending on !CONFIG_ARM64_MODULE_PLTS will totally disable
> > RANDOMIZE_BASE according to arch/arm64/Kconfig:
> > config RANDOMIZE_BASE
> > 	bool "Randomize the address of the kernel image"
> > 	select ARM64_MODULE_PLTS if MODULES
> > 	select RELOCATABLE
> 
> Yes, but why it is required for "RANDOMIZE_BASE"?
> Does that imply the module call might need to use PLT in
> some cases?
> 
> >
> > Depending on !RANDOMIZE_MODULE_REGION_FULL seems to be still
> > allowing RANDOMIZE_BASE via avoiding long jump according to:
> > arch/arm64/Kconfig:
> >
> > config RANDOMIZE_MODULE_REGION_FULL
> > 	bool "Randomize the module region over a 4 GB range"
> > 	depends on RANDOMIZE_BASE
> > 	default y
> > 	help
> > 	  Randomizes the location of the module region inside a 4 GB window
> > 	  covering the core kernel. This way, it is less likely for modules
> > 	  to leak information about the location of core kernel data structures
> > 	  but it does imply that function calls between modules and the core
> > 	  kernel will need to be resolved via veneers in the module PLT.
> >
> > 	  When this option is not set, the module region will be randomized over
> > 	  a limited range that contains the [_stext, _etext] interval of the
> > 	  core kernel, so branch relocations are always in range.
> 
> Hmm, this dependency looks strange. If it always in range, don't we need
> PLT for modules?
> 
> Cataline, would you know why?
> Maybe it's a KASLR's Kconfig issue?

I actually didn't see any problem after making this change:

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index e07e7de9ac49..6440671b72e0 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1781,7 +1781,6 @@ config RELOCATABLE

 config RANDOMIZE_BASE
        bool "Randomize the address of the kernel image"
-       select ARM64_MODULE_PLTS if MODULES
        select RELOCATABLE
        help
          Randomizes the virtual address at which the kernel image is
@@ -1801,6 +1800,7 @@ config RANDOMIZE_BASE
 config RANDOMIZE_MODULE_REGION_FULL
        bool "Randomize the module region over a 4 GB range"
        depends on RANDOMIZE_BASE
+       select ARM64_MODULE_PLTS if MODULES
        default y
        help
          Randomizes the location of the module region inside a 4 GB window

and having this config:
# zcat /proc/config.gz | grep RANDOMIZE_BASE
CONFIG_RANDOMIZE_BASE=y

# zcat /proc/config.gz | grep RANDOMIZE_MODULE_REGION_FULL
# CONFIG_RANDOMIZE_MODULE_REGION_FULL is not set

# zcat /proc/config.gz | grep ARM64_MODULE_PLTS
# CONFIG_ARM64_MODULE_PLTS is not set

Modules work all good:
# lsmod
Module                  Size  Used by
btrfs                1355776  0
blake2b_generic        20480  0
libcrc32c              16384  1 btrfs
xor                    20480  1 btrfs
xor_neon               16384  1 xor
zstd_compress         163840  1 btrfs
raid6_pq              110592  1 btrfs
ctr                    16384  0
md5                    16384  0
ip_tunnel              32768  0
ipv6                  442368  28


I am not quite sure if there is a corner case. If no,
I would think the kconfig might be some improper.


> 
> >
> > and
> >
> > arch/arm64/kernel/kaslr.c:
> > 	if (IS_ENABLED(CONFIG_RANDOMIZE_MODULE_REGION_FULL)) {
> > 		/*
> > 		 * Randomize the module region over a 2 GB window covering the
> > 		 * kernel. This reduces the risk of modules leaking information
> > 		 * about the address of the kernel itself, but results in
> > 		 * branches between modules and the core kernel that are
> > 		 * resolved via PLTs. (Branches between modules will be
> > 		 * resolved normally.)
> > 		 */
> > 		module_range = SZ_2G - (u64)(_end - _stext);
> > 		module_alloc_base = max((u64)_end + offset - SZ_2G,
> > 					(u64)MODULES_VADDR);
> > 	} else {
> > 		/*
> > 		 * Randomize the module region by setting module_alloc_base to
> > 		 * a PAGE_SIZE multiple in the range [_etext - MODULES_VSIZE,
> > 		 * _stext) . This guarantees that the resulting region still
> > 		 * covers [_stext, _etext], and that all relative branches can
> > 		 * be resolved without veneers.
> > 		 */
> > 		module_range = MODULES_VSIZE - (u64)(_etext - _stext);
> > 		module_alloc_base = (u64)_etext + offset - MODULES_VSIZE;
> > 	}
> >
> > So depending on ! ARM64_MODULE_PLTS seems to narrow the scenarios
> > while depending on ! RANDOMIZE_MODULE_REGION_FULL  permit more
> > machines to use optprobe.
> 
> OK, I see that the code ensures the range will be in the MODULE_VSIZE (=128MB).
> 
> >
> > I am not quite sure I am 100% right but tests seem to back this.
> > hopefully Catalin and Will can correct me.
> >
> > >
> > > >
> > > > Performance of optprobe on Hip08 platform is test using kprobe
> > > > example module[1] to analyze the latency of a kernel function,
> > > > and here is the result:
> > > >
> > > > [1]
> > >
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/sa
> > > mples/kprobes/kretprobe_example.c
> > > >
> > > > kprobe before optimized:
> > > > [280709.846380] do_empty returned 0 and took 1530 ns to execute
> > > > [280709.852057] do_empty returned 0 and took 550 ns to execute
> > > > [280709.857631] do_empty returned 0 and took 440 ns to execute
> > > > [280709.863215] do_empty returned 0 and took 380 ns to execute
> > > > [280709.868787] do_empty returned 0 and took 360 ns to execute
> > > > [280709.874362] do_empty returned 0 and took 340 ns to execute
> > > > [280709.879936] do_empty returned 0 and took 320 ns to execute
> > > > [280709.885505] do_empty returned 0 and took 300 ns to execute
> > > > [280709.891075] do_empty returned 0 and took 280 ns to execute
> > > > [280709.896646] do_empty returned 0 and took 290 ns to execute
> > > > [280709.902220] do_empty returned 0 and took 290 ns to execute
> > > > [280709.907807] do_empty returned 0 and took 290 ns to execute
> > > >
> > > > optprobe:
> > > > [ 2965.964572] do_empty returned 0 and took 90 ns to execute
> > > > [ 2965.969952] do_empty returned 0 and took 80 ns to execute
> > > > [ 2965.975332] do_empty returned 0 and took 70 ns to execute
> > > > [ 2965.980714] do_empty returned 0 and took 60 ns to execute
> > > > [ 2965.986128] do_empty returned 0 and took 80 ns to execute
> > > > [ 2965.991507] do_empty returned 0 and took 70 ns to execute
> > > > [ 2965.996884] do_empty returned 0 and took 70 ns to execute
> > > > [ 2966.002262] do_empty returned 0 and took 80 ns to execute
> > > > [ 2966.007642] do_empty returned 0 and took 70 ns to execute
> > > > [ 2966.013020] do_empty returned 0 and took 70 ns to execute
> > > > [ 2966.018400] do_empty returned 0 and took 70 ns to execute
> > > > [ 2966.023779] do_empty returned 0 and took 70 ns to execute
> > > > [ 2966.029158] do_empty returned 0 and took 70 ns to execute
> > >
> > > Great result!
> > > I have other comments on the code below.
> > >
> > > [...]
> > > > diff --git a/arch/arm64/kernel/probes/kprobes.c
> > > b/arch/arm64/kernel/probes/kprobes.c
> > > > index 6dbcc89f6662..83755ad62abe 100644
> > > > --- a/arch/arm64/kernel/probes/kprobes.c
> > > > +++ b/arch/arm64/kernel/probes/kprobes.c
> > > > @@ -11,6 +11,7 @@
> > > >  #include <linux/kasan.h>
> > > >  #include <linux/kernel.h>
> > > >  #include <linux/kprobes.h>
> > > > +#include <linux/moduleloader.h>
> > > >  #include <linux/sched/debug.h>
> > > >  #include <linux/set_memory.h>
> > > >  #include <linux/slab.h>
> > > > @@ -113,9 +114,21 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p)
> > > >
> > > >  void *alloc_insn_page(void)
> > > >  {
> > > > -	return __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START,
> VMALLOC_END,
> > > > -			GFP_KERNEL, PAGE_KERNEL_ROX, VM_FLUSH_RESET_PERMS,
> > > > -			NUMA_NO_NODE, __builtin_return_address(0));
> > > > +	void *page;
> > > > +
> > > > +	page = module_alloc(PAGE_SIZE);
> > > > +	if (!page)
> > > > +		return NULL;
> > > > +
> > > > +	set_vm_flush_reset_perms(page);
> > > > +	/*
> > > > +	 * First make the page read-only, and only then make it executable
> to
> > > > +	 * prevent it from being W+X in between.
> > > > +	 */
> > > > +	set_memory_ro((unsigned long)page, 1);
> > > > +	set_memory_x((unsigned long)page, 1);
> > > > +
> > > > +	return page;
> > >
> > > Isn't this a separated change? Or any reason why you have to
> > > change this function?
> >
> > As far as I can tell, this is still related with the 128MB
> > short jump limitation.
> > VMALLOC_START, VMALLOC_END is an fixed virtual address area
> > which isn't necessarily modules will be put.
> > So this patch is moving to module_alloc() which will get
> > memory between module_alloc_base and module_alloc_end.
> 
> Ah, I missed that point. Yes, VMALLOC_START and VMALLOC_END
> are not correct range.
> 
> >
> > Together with depending on !RANDOMIZE_MODULE_REGION_FULL,
> > this makes all kernel, module and trampoline in short
> > jmp area.
> >
> > As long as we can figure out a way to support long jmp
> > for optprobe, the change in alloc_insn_page() can be
> > dropped.
> 
> No, I think above change is rather readable, so it is OK.
> 
> >
> > Masami, any reference code from any platform to support long
> > jump for optprobe? For long jmp, we need to put jmp address
> > to a memory and then somehow load the target address
> > to PC. Right now, we are able to replace an instruction
> > only. That is the problem.
> 
> Hmm, I had read a paper about 2-stage jump idea 15years ago. That
> paper allocated an intermediate trampoline (like PLT) which did a long
> jump to the real trampoline on SPARC.
> (something like, "push x0; ldr x0, [pc+8]; br x0; <immediate-addr>" for
> a slot of the intermediate trampoline.)
> 
> For the other (simpler) solution example is optprobe in powerpc
> (arch/powerpc/kernel/optprobes_head.S). That reserves a buffer page
> in the text section, and use it.
> 
> But I think your current implementation is good enough for the
> first step. If someone needs CONFIG_RANDOMIZE_MODULE_REGION_FULL
> and optprobe, we can revisit this point.
> 
> Thank you,
> 
> 

Thanks
Barry


  reply	other threads:[~2021-07-23  2:43 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-19 12:24 [PATCH] arm64: kprobe: Enable OPTPROBE for arm64 Qi Liu
2021-07-21  8:41 ` Masami Hiramatsu
2021-07-22 10:24   ` Song Bao Hua (Barry Song)
2021-07-22 15:03     ` Masami Hiramatsu
2021-07-23  2:43       ` Song Bao Hua (Barry Song) [this message]
2021-07-30 10:04       ` Song Bao Hua (Barry Song)
2021-07-31  1:15         ` Masami Hiramatsu
2021-07-31 12:21           ` Song Bao Hua (Barry Song)
2021-08-02  3:52             ` liuqi (BA)
2021-08-02  3:59               ` liuqi (BA)
2021-08-02 12:02               ` liuqi (BA)
2021-07-30  3:32   ` liuqi (BA)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6c71c1bba0124dedac9cb09c29465e04@hisilicon.com \
    --to=song.bao.hua@hisilicon.com \
    --cc=anil.s.keshavamurthy@intel.com \
    --cc=catalin.marinas@arm.com \
    --cc=davem@davemloft.net \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=liuqi115@huawei.com \
    --cc=mhiramat@kernel.org \
    --cc=naveen.n.rao@linux.ibm.com \
    --cc=prime.zeng@hisilicon.com \
    --cc=robin.murphy@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).