linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>
To: Masami Hiramatsu <mhiramat@kernel.org>,
	"liuqi (BA)" <liuqi115@huawei.com>
Cc: "catalin.marinas@arm.com" <catalin.marinas@arm.com>,
	"will@kernel.org" <will@kernel.org>,
	"naveen.n.rao@linux.ibm.com" <naveen.n.rao@linux.ibm.com>,
	"anil.s.keshavamurthy@intel.com" <anil.s.keshavamurthy@intel.com>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	"Zengtao (B)" <prime.zeng@hisilicon.com>,
	"robin.murphy@arm.com" <robin.murphy@arm.com>,
	Linuxarm <linuxarm@huawei.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: RE: [PATCH] arm64: kprobe: Enable OPTPROBE for arm64
Date: Thu, 22 Jul 2021 10:24:54 +0000	[thread overview]
Message-ID: <332df5b7d7bb4bd096b6521ffefaabe6@hisilicon.com> (raw)
In-Reply-To: <20210721174153.34c1898dc9eea135eb0b8be8@kernel.org>



> -----Original Message-----
> From: Masami Hiramatsu [mailto:mhiramat@kernel.org]
> Sent: Wednesday, July 21, 2021 8:42 PM
> To: liuqi (BA) <liuqi115@huawei.com>
> Cc: catalin.marinas@arm.com; will@kernel.org; naveen.n.rao@linux.ibm.com;
> anil.s.keshavamurthy@intel.com; davem@davemloft.net;
> linux-arm-kernel@lists.infradead.org; Song Bao Hua (Barry Song)
> <song.bao.hua@hisilicon.com>; Zengtao (B) <prime.zeng@hisilicon.com>;
> robin.murphy@arm.com; Linuxarm <linuxarm@huawei.com>;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] arm64: kprobe: Enable OPTPROBE for arm64
> 
> Hi Qi,
> 
> Thanks for your effort!
> 
> On Mon, 19 Jul 2021 20:24:17 +0800
> Qi Liu <liuqi115@huawei.com> wrote:
> 
> > This patch introduce optprobe for ARM64. In optprobe, probed
> > instruction is replaced by a branch instruction to detour
> > buffer. Detour buffer contains trampoline code and a call to
> > optimized_callback(). optimized_callback() calls opt_pre_handler()
> > to execute kprobe handler.
> 
> OK so this will replace only one instruction.
> 
> >
> > Limitations:
> > - We only support !CONFIG_RANDOMIZE_MODULE_REGION_FULL case to
> > guarantee the offset between probe point and kprobe pre_handler
> > is not larger than 128MiB.
> 
> Hmm, shouldn't we depends on !CONFIG_ARM64_MODULE_PLTS? Or,
> allocate an intermediate trampoline area similar to arm optprobe
> does.

Depending on !CONFIG_ARM64_MODULE_PLTS will totally disable
RANDOMIZE_BASE according to arch/arm64/Kconfig:
config RANDOMIZE_BASE
	bool "Randomize the address of the kernel image"
	select ARM64_MODULE_PLTS if MODULES
	select RELOCATABLE

Depending on !RANDOMIZE_MODULE_REGION_FULL seems to be still
allowing RANDOMIZE_BASE via avoiding long jump according to:
arch/arm64/Kconfig:

config RANDOMIZE_MODULE_REGION_FULL
	bool "Randomize the module region over a 4 GB range"
	depends on RANDOMIZE_BASE
	default y
	help
	  Randomizes the location of the module region inside a 4 GB window
	  covering the core kernel. This way, it is less likely for modules
	  to leak information about the location of core kernel data structures
	  but it does imply that function calls between modules and the core
	  kernel will need to be resolved via veneers in the module PLT.

	  When this option is not set, the module region will be randomized over
	  a limited range that contains the [_stext, _etext] interval of the
	  core kernel, so branch relocations are always in range.

and

arch/arm64/kernel/kaslr.c:
	if (IS_ENABLED(CONFIG_RANDOMIZE_MODULE_REGION_FULL)) {
		/*
		 * Randomize the module region over a 2 GB window covering the
		 * kernel. This reduces the risk of modules leaking information
		 * about the address of the kernel itself, but results in
		 * branches between modules and the core kernel that are
		 * resolved via PLTs. (Branches between modules will be
		 * resolved normally.)
		 */
		module_range = SZ_2G - (u64)(_end - _stext);
		module_alloc_base = max((u64)_end + offset - SZ_2G,
					(u64)MODULES_VADDR);
	} else {
		/*
		 * Randomize the module region by setting module_alloc_base to
		 * a PAGE_SIZE multiple in the range [_etext - MODULES_VSIZE,
		 * _stext) . This guarantees that the resulting region still
		 * covers [_stext, _etext], and that all relative branches can
		 * be resolved without veneers.
		 */
		module_range = MODULES_VSIZE - (u64)(_etext - _stext);
		module_alloc_base = (u64)_etext + offset - MODULES_VSIZE;
	}

So depending on ! ARM64_MODULE_PLTS seems to narrow the scenarios
while depending on ! RANDOMIZE_MODULE_REGION_FULL  permit more
machines to use optprobe.

I am not quite sure I am 100% right but tests seem to back this.
hopefully Catalin and Will can correct me.

> 
> >
> > Performance of optprobe on Hip08 platform is test using kprobe
> > example module[1] to analyze the latency of a kernel function,
> > and here is the result:
> >
> > [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/sa
> mples/kprobes/kretprobe_example.c
> >
> > kprobe before optimized:
> > [280709.846380] do_empty returned 0 and took 1530 ns to execute
> > [280709.852057] do_empty returned 0 and took 550 ns to execute
> > [280709.857631] do_empty returned 0 and took 440 ns to execute
> > [280709.863215] do_empty returned 0 and took 380 ns to execute
> > [280709.868787] do_empty returned 0 and took 360 ns to execute
> > [280709.874362] do_empty returned 0 and took 340 ns to execute
> > [280709.879936] do_empty returned 0 and took 320 ns to execute
> > [280709.885505] do_empty returned 0 and took 300 ns to execute
> > [280709.891075] do_empty returned 0 and took 280 ns to execute
> > [280709.896646] do_empty returned 0 and took 290 ns to execute
> > [280709.902220] do_empty returned 0 and took 290 ns to execute
> > [280709.907807] do_empty returned 0 and took 290 ns to execute
> >
> > optprobe:
> > [ 2965.964572] do_empty returned 0 and took 90 ns to execute
> > [ 2965.969952] do_empty returned 0 and took 80 ns to execute
> > [ 2965.975332] do_empty returned 0 and took 70 ns to execute
> > [ 2965.980714] do_empty returned 0 and took 60 ns to execute
> > [ 2965.986128] do_empty returned 0 and took 80 ns to execute
> > [ 2965.991507] do_empty returned 0 and took 70 ns to execute
> > [ 2965.996884] do_empty returned 0 and took 70 ns to execute
> > [ 2966.002262] do_empty returned 0 and took 80 ns to execute
> > [ 2966.007642] do_empty returned 0 and took 70 ns to execute
> > [ 2966.013020] do_empty returned 0 and took 70 ns to execute
> > [ 2966.018400] do_empty returned 0 and took 70 ns to execute
> > [ 2966.023779] do_empty returned 0 and took 70 ns to execute
> > [ 2966.029158] do_empty returned 0 and took 70 ns to execute
> 
> Great result!
> I have other comments on the code below.
> 
> [...]
> > diff --git a/arch/arm64/kernel/probes/kprobes.c
> b/arch/arm64/kernel/probes/kprobes.c
> > index 6dbcc89f6662..83755ad62abe 100644
> > --- a/arch/arm64/kernel/probes/kprobes.c
> > +++ b/arch/arm64/kernel/probes/kprobes.c
> > @@ -11,6 +11,7 @@
> >  #include <linux/kasan.h>
> >  #include <linux/kernel.h>
> >  #include <linux/kprobes.h>
> > +#include <linux/moduleloader.h>
> >  #include <linux/sched/debug.h>
> >  #include <linux/set_memory.h>
> >  #include <linux/slab.h>
> > @@ -113,9 +114,21 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p)
> >
> >  void *alloc_insn_page(void)
> >  {
> > -	return __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START, VMALLOC_END,
> > -			GFP_KERNEL, PAGE_KERNEL_ROX, VM_FLUSH_RESET_PERMS,
> > -			NUMA_NO_NODE, __builtin_return_address(0));
> > +	void *page;
> > +
> > +	page = module_alloc(PAGE_SIZE);
> > +	if (!page)
> > +		return NULL;
> > +
> > +	set_vm_flush_reset_perms(page);
> > +	/*
> > +	 * First make the page read-only, and only then make it executable to
> > +	 * prevent it from being W+X in between.
> > +	 */
> > +	set_memory_ro((unsigned long)page, 1);
> > +	set_memory_x((unsigned long)page, 1);
> > +
> > +	return page;
> 
> Isn't this a separated change? Or any reason why you have to
> change this function?

As far as I can tell, this is still related with the 128MB
short jump limitation.
VMALLOC_START, VMALLOC_END is an fixed virtual address area
which isn't necessarily modules will be put.
So this patch is moving to module_alloc() which will get
memory between module_alloc_base and module_alloc_end.

Together with depending on !RANDOMIZE_MODULE_REGION_FULL,
this makes all kernel, module and trampoline in short
jmp area.

As long as we can figure out a way to support long jmp
for optprobe, the change in alloc_insn_page() can be
dropped.

Masami, any reference code from any platform to support long
jump for optprobe? For long jmp, we need to put jmp address
to a memory and then somehow load the target address
to PC. Right now, we are able to replace an instruction
only. That is the problem.

Thanks
Barry


  reply	other threads:[~2021-07-22 10:26 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-19 12:24 [PATCH] arm64: kprobe: Enable OPTPROBE for arm64 Qi Liu
2021-07-21  8:41 ` Masami Hiramatsu
2021-07-22 10:24   ` Song Bao Hua (Barry Song) [this message]
2021-07-22 15:03     ` Masami Hiramatsu
2021-07-23  2:43       ` Song Bao Hua (Barry Song)
2021-07-30 10:04       ` Song Bao Hua (Barry Song)
2021-07-31  1:15         ` Masami Hiramatsu
2021-07-31 12:21           ` Song Bao Hua (Barry Song)
2021-08-02  3:52             ` liuqi (BA)
2021-08-02  3:59               ` liuqi (BA)
2021-08-02 12:02               ` liuqi (BA)
2021-07-30  3:32   ` liuqi (BA)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=332df5b7d7bb4bd096b6521ffefaabe6@hisilicon.com \
    --to=song.bao.hua@hisilicon.com \
    --cc=anil.s.keshavamurthy@intel.com \
    --cc=catalin.marinas@arm.com \
    --cc=davem@davemloft.net \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=liuqi115@huawei.com \
    --cc=mhiramat@kernel.org \
    --cc=naveen.n.rao@linux.ibm.com \
    --cc=prime.zeng@hisilicon.com \
    --cc=robin.murphy@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).