From: "liuqi (BA)" <liuqi115@huawei.com>
To: "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Linuxarm <linuxarm@huawei.com>
Cc: "catalin.marinas@arm.com" <catalin.marinas@arm.com>,
	"will@kernel.org" <will@kernel.org>,
	"naveen.n.rao@linux.ibm.com" <naveen.n.rao@linux.ibm.com>,
	"anil.s.keshavamurthy@intel.com" <anil.s.keshavamurthy@intel.com>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	"Zengtao (B)" <prime.zeng@hisilicon.com>,
	"robin.murphy@arm.com" <robin.murphy@arm.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] arm64: kprobe: Enable OPTPROBE for arm64
Date: Mon, 2 Aug 2021 11:52:00 +0800
Message-ID: <2f32fff3-6b58-583f-8e85-06ec1553d3f4@huawei.com>
In-Reply-To: <6a97dff6c33c4b84887223de2502bd3d@hisilicon.com>



On 2021/7/31 20:21, Song Bao Hua (Barry Song) wrote:
> 
> 
>> -----Original Message-----
>> From: Masami Hiramatsu [mailto:mhiramat@kernel.org]
>> Sent: Saturday, July 31, 2021 1:16 PM
>> To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
>> Cc: liuqi (BA) <liuqi115@huawei.com>; catalin.marinas@arm.com;
>> will@kernel.org; naveen.n.rao@linux.ibm.com; anil.s.keshavamurthy@intel.com;
>> davem@davemloft.net; linux-arm-kernel@lists.infradead.org; Zengtao (B)
>> <prime.zeng@hisilicon.com>; robin.murphy@arm.com; Linuxarm
>> <linuxarm@huawei.com>; linux-kernel@vger.kernel.org
>> Subject: Re: [PATCH] arm64: kprobe: Enable OPTPROBE for arm64
>>
>> On Fri, 30 Jul 2021 10:04:06 +0000
>> "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com> wrote:
>>
>>>>>>>
>>>>>>> Hi Qi,
>>>>>>>
>>>>>>> Thanks for your effort!
>>>>>>>
>>>>>>> On Mon, 19 Jul 2021 20:24:17 +0800
>>>>>>> Qi Liu <liuqi115@huawei.com> wrote:
>>>>>>>
>>>>>>>> This patch introduces optprobe for ARM64. In optprobe, the probed
>>>>>>>> instruction is replaced by a branch instruction to a detour
>>>>>>>> buffer. The detour buffer contains trampoline code and a call to
>>>>>>>> optimized_callback(). optimized_callback() calls opt_pre_handler()
>>>>>>>> to execute the kprobe handler.
>>>>>>>
>>>>>>> OK so this will replace only one instruction.
>>>>>>>
>>>>>>>>
>>>>>>>> Limitations:
>>>>>>>> - We only support !CONFIG_RANDOMIZE_MODULE_REGION_FULL case to
>>>>>>>> guarantee the offset between probe point and kprobe pre_handler
>>>>>>>> is not larger than 128MiB.
>>>>>>>
>>>>>>> Hmm, shouldn't we depend on !CONFIG_ARM64_MODULE_PLTS? Or
>>>>>>> allocate an intermediate trampoline area, similar to what the
>>>>>>> arm optprobe does.
>>>>>>
>>>>>> Depending on !CONFIG_ARM64_MODULE_PLTS will totally disable
>>>>>> RANDOMIZE_BASE according to arch/arm64/Kconfig:
>>>>>> config RANDOMIZE_BASE
>>>>>> 	bool "Randomize the address of the kernel image"
>>>>>> 	select ARM64_MODULE_PLTS if MODULES
>>>>>> 	select RELOCATABLE
>>>>>
>>>>> Yes, but why is it required for "RANDOMIZE_BASE"?
>>>>> Does that imply the module call might need to use PLT in
>>>>> some cases?
>>>>>
>>>>>>
>>>>>> Depending on !RANDOMIZE_MODULE_REGION_FULL seems to still
>>>>>> allow RANDOMIZE_BASE while avoiding long jumps, according to
>>>>>> arch/arm64/Kconfig:
>>>>>>
>>>>>> config RANDOMIZE_MODULE_REGION_FULL
>>>>>> 	bool "Randomize the module region over a 4 GB range"
>>>>>> 	depends on RANDOMIZE_BASE
>>>>>> 	default y
>>>>>> 	help
>>>>>> 	  Randomizes the location of the module region inside a 4 GB window
>>>>>> 	  covering the core kernel. This way, it is less likely for modules
>>>>>> 	  to leak information about the location of core kernel data structures
>>>>>> 	  but it does imply that function calls between modules and the core
>>>>>> 	  kernel will need to be resolved via veneers in the module PLT.
>>>>>>
>>>>>> 	  When this option is not set, the module region will be randomized over
>>>>>> 	  a limited range that contains the [_stext, _etext] interval of the
>>>>>> 	  core kernel, so branch relocations are always in range.
>>>>>
>>>>> Hmm, this dependency looks strange. If it is always in range, do we
>>>>> still need PLT for modules?
>>>>>
>>>>> Catalin, would you know why?
>>>>> Maybe it's a KASLR's Kconfig issue?
>>>>
>>>> I actually didn't see any problem after making this change:
>>>>
>>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>>> index e07e7de9ac49..6440671b72e0 100644
>>>> --- a/arch/arm64/Kconfig
>>>> +++ b/arch/arm64/Kconfig
>>>> @@ -1781,7 +1781,6 @@ config RELOCATABLE
>>>>
>>>>   config RANDOMIZE_BASE
>>>>          bool "Randomize the address of the kernel image"
>>>> -       select ARM64_MODULE_PLTS if MODULES
>>>>          select RELOCATABLE
>>>>          help
>>>>            Randomizes the virtual address at which the kernel image is
>>>> @@ -1801,6 +1800,7 @@ config RANDOMIZE_BASE
>>>>   config RANDOMIZE_MODULE_REGION_FULL
>>>>          bool "Randomize the module region over a 4 GB range"
>>>>          depends on RANDOMIZE_BASE
>>>> +       select ARM64_MODULE_PLTS if MODULES
>>>>          default y
>>>>          help
>>>>            Randomizes the location of the module region inside a 4 GB window
>>>>
>>>> and having this config:
>>>> # zcat /proc/config.gz | grep RANDOMIZE_BASE
>>>> CONFIG_RANDOMIZE_BASE=y
>>>>
>>>> # zcat /proc/config.gz | grep RANDOMIZE_MODULE_REGION_FULL
>>>> # CONFIG_RANDOMIZE_MODULE_REGION_FULL is not set
>>>>
>>>> # zcat /proc/config.gz | grep ARM64_MODULE_PLTS
>>>> # CONFIG_ARM64_MODULE_PLTS is not set
>>>>
>>>> Modules work all good:
>>>> # lsmod
>>>> Module                  Size  Used by
>>>> btrfs                1355776  0
>>>> blake2b_generic        20480  0
>>>> libcrc32c              16384  1 btrfs
>>>> xor                    20480  1 btrfs
>>>> xor_neon               16384  1 xor
>>>> zstd_compress         163840  1 btrfs
>>>> raid6_pq              110592  1 btrfs
>>>> ctr                    16384  0
>>>> md5                    16384  0
>>>> ip_tunnel              32768  0
>>>> ipv6                  442368  28
>>>>
>>>>
>>>> I am not quite sure whether there is a corner case. If not,
>>>> I would think the Kconfig is somewhat improper.
>>>
>>> The corner case is that even if CONFIG_RANDOMIZE_MODULE_REGION_FULL
>>> is not enabled, as long as CONFIG_ARM64_MODULE_PLTS is enabled and
>>> the 128MB area is exhausted, module_alloc() will fall back to a 2GB
>>> area, provided either of the two conditions below is met:
>>>
>>> 1. KASAN is not enabled
>>> 2. KASAN is enabled and CONFIG_KASAN_VMALLOC is also enabled.
>>>
>>> void *module_alloc(unsigned long size)
>>> {
>>> 	u64 module_alloc_end = module_alloc_base + MODULES_VSIZE;
>>> 	gfp_t gfp_mask = GFP_KERNEL;
>>> 	void *p;
>>>
>>> 	/* Silence the initial allocation */
>>> 	if (IS_ENABLED(CONFIG_ARM64_MODULE_PLTS))
>>> 		gfp_mask |= __GFP_NOWARN;
>>>
>>> 	if (IS_ENABLED(CONFIG_KASAN_GENERIC) ||
>>> 	    IS_ENABLED(CONFIG_KASAN_SW_TAGS))
>>> 		/* don't exceed the static module region - see below */
>>> 		module_alloc_end = MODULES_END;
>>>
>>> 	p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
>>> 				module_alloc_end, gfp_mask, PAGE_KERNEL, 0,
>>> 				NUMA_NO_NODE, __builtin_return_address(0));
>>>
>>> 	if (!p && IS_ENABLED(CONFIG_ARM64_MODULE_PLTS) &&
>>> 	    (IS_ENABLED(CONFIG_KASAN_VMALLOC) ||
>>> 	     (!IS_ENABLED(CONFIG_KASAN_GENERIC) &&
>>> 	      !IS_ENABLED(CONFIG_KASAN_SW_TAGS))))
>>> 		/*
>>> 		 * KASAN without KASAN_VMALLOC can only deal with module
>>> 		 * allocations being served from the reserved module region,
>>> 		 * since the remainder of the vmalloc region is already
>>> 		 * backed by zero shadow pages, and punching holes into it
>>> 		 * is non-trivial. Since the module region is not randomized
>>> 		 * when KASAN is enabled without KASAN_VMALLOC, it is even
>>> 		 * less likely that the module region gets exhausted, so we
>>> 		 * can simply omit this fallback in that case.
>>> 		 */
>>> 		p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
>>> 				module_alloc_base + SZ_2G, GFP_KERNEL,
>>> 				PAGE_KERNEL, 0, NUMA_NO_NODE,
>>> 				__builtin_return_address(0));
>>>
>>> 	if (p && (kasan_module_alloc(p, size) < 0)) {
>>> 		vfree(p);
>>> 		return NULL;
>>> 	}
>>>
>>> 	return p;
>>> }
>>>
>>> This should be happening quite rarely. But maybe arm64's document
>>> needs some minor fixup, otherwise, it is quite confusing.
>>
>> OK, so with CONFIG_KASAN_VMALLOC=y and CONFIG_ARM64_MODULE_PLTS=y,
>> module_alloc() basically returns memory in the 128MB region, but can
>> also return memory in the 2GB region. (This is OK because optprobe can
>> filter it out.) But with CONFIG_RANDOMIZE_MODULE_REGION_FULL=y, there is
>> almost no chance of getting memory in the 128MB region.
>>
>> Hmm, for the optprobe in kernel text, maybe we can define 'optinsn_alloc_start'
>> by 'module_alloc_base - (SZ_2G - MODULES_VADDR)' and use __vmalloc_node_range()
>> to avoid this issue. But that is only for the kernel. For the modules, we may
>> always be out of the 128MB region.
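
For reference, the 128MB limit discussed above comes from the signed
26-bit word offset in the A64 B/BL encoding. Below is a minimal
user-space sketch of the range check and the encoding. The helper names
branch_in_range() and encode_b() are made up for illustration and are
not kernel APIs; in the kernel, aarch64_insn_gen_branch_imm() does this
job.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_128M INT64_C(0x8000000)	/* 128 MiB */

/*
 * Illustrative helper (not a kernel API): true if a single B/BL placed
 * at 'pc' can reach 'target', i.e. the byte offset fits in
 * [-128MiB, +128MiB).
 */
static bool branch_in_range(uint64_t pc, uint64_t target)
{
	int64_t offset = (int64_t)(target - pc);

	return offset >= -SZ_128M && offset < SZ_128M;
}

/*
 * Illustrative helper (not a kernel API): encode an unconditional 'b'
 * from 'pc' to 'target'. The caller is expected to have checked the
 * range; the asserts only document the preconditions.
 */
static uint32_t encode_b(uint64_t pc, uint64_t target)
{
	int64_t offset = (int64_t)(target - pc);

	assert((offset & 3) == 0);		/* A64 instructions are 4-byte aligned */
	assert(branch_in_range(pc, target));

	/* opcode 0b000101 in bits [31:26], imm26 = offset / 4 in bits [25:0] */
	return 0x14000000u | ((uint32_t)((uint64_t)offset >> 2) & 0x03ffffffu);
}
```

A detour buffer for which branch_in_range() is false cannot be reached
by replacing a single instruction, which is why the allocation range
matters here.
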
> 
> If we can have some separate PLT entries in each module for optprobe,
> we should be able to short-jump to the PLT entry and then PLT entry
> will further long-jump to detour out of the range. That is exactly
> the duty of PLT.
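
The short-jump-then-long-jump idea can be sketched as a simple target
selection. This is only an illustration under assumptions:
pick_branch_target() and its 'veneer' argument are hypothetical, and a
real implementation would need a per-module PLT slot like the ftrace
one quoted below.

```c
#include <stdbool.h>
#include <stdint.h>

#define BRANCH_RANGE INT64_C(0x8000000)	/* 128 MiB reach of B/BL */

/* True if a single B/BL at 'from' can reach 'to'. */
static bool in_range(uint64_t from, uint64_t to)
{
	int64_t off = (int64_t)(to - from);

	return off >= -BRANCH_RANGE && off < BRANCH_RANGE;
}

/*
 * Hypothetical helper: choose where the patched branch at 'pc' should
 * point. 'detour' is the detour buffer; 'veneer' is the address of a
 * per-module trampoline slot that long-jumps to the detour buffer, or 0
 * if the module has none. Returns 0 when the probe cannot be optimized
 * and has to stay a normal kprobe.
 */
static uint64_t pick_branch_target(uint64_t pc, uint64_t detour,
				   uint64_t veneer)
{
	if (in_range(pc, detour))
		return detour;	/* direct short jump */
	if (veneer && in_range(pc, veneer))
		return veneer;	/* short jump to veneer, veneer long-jumps */
	return 0;		/* fall back to the normal kprobe */
}
```
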
> 
> Right now, arm64 supports dynamic ftrace this way, by adding a
> section to each module for the ftrace PLT.
> arch/arm64/include/asm/module.lds.h:
> SECTIONS {
> #ifdef CONFIG_ARM64_MODULE_PLTS
> 	.plt 0 (NOLOAD) : { BYTE(0) }
> 	.init.plt 0 (NOLOAD) : { BYTE(0) }
> 	.text.ftrace_trampoline 0 (NOLOAD) : { BYTE(0) }
> #endif
> ...
> }
> 
> arch/arm64/kernel/module.c will initialize some PLT entries
> for ftrace:
> 
> static int module_init_ftrace_plt(const Elf_Ehdr *hdr,
> 				  const Elf_Shdr *sechdrs,
> 				  struct module *mod)
> {
> #if defined(CONFIG_ARM64_MODULE_PLTS) && defined(CONFIG_DYNAMIC_FTRACE)
> 	const Elf_Shdr *s;
> 	struct plt_entry *plts;
> 
> 	s = find_section(hdr, sechdrs, ".text.ftrace_trampoline");
> 	if (!s)
> 		return -ENOEXEC;
> 
> 	plts = (void *)s->sh_addr;
> 
> 	__init_plt(&plts[FTRACE_PLT_IDX], FTRACE_ADDR);
> 
> 	if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS))
> 		__init_plt(&plts[FTRACE_REGS_PLT_IDX], FTRACE_REGS_ADDR);
> 
> 	mod->arch.ftrace_trampolines = plts;
> #endif
> 	return 0;
> }
> 
> Ftrace will then use those PLT entries in arch/arm64/kernel/ftrace.c:
> static struct plt_entry *get_ftrace_plt(struct module *mod, unsigned long addr)
> {
> #ifdef CONFIG_ARM64_MODULE_PLTS
> 	struct plt_entry *plt = mod->arch.ftrace_trampolines;
> 
> 	if (addr == FTRACE_ADDR)
> 		return &plt[FTRACE_PLT_IDX];
> 	if (addr == FTRACE_REGS_ADDR &&
> 	    IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS))
> 		return &plt[FTRACE_REGS_PLT_IDX];
> #endif
> 	return NULL;
> }
> 
> /*
>   * Turn on the call to ftrace_caller() in instrumented function
>   */
> int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
> {
> 	unsigned long pc = rec->ip;
> 	u32 old, new;
> 	long offset = (long)pc - (long)addr;
> 
> 	if (offset < -SZ_128M || offset >= SZ_128M) {
> 		struct module *mod;
> 		struct plt_entry *plt;
> 
> 		if (!IS_ENABLED(CONFIG_ARM64_MODULE_PLTS))
> 			return -EINVAL;
> 
> 		/*
> 		 * On kernels that support module PLTs, the offset between the
> 		 * branch instruction and its target may legally exceed the
> 		 * range of an ordinary relative 'bl' opcode. In this case, we
> 		 * need to branch via a trampoline in the module.
> 		 *
> 		 * NOTE: __module_text_address() must be called with preemption
> 		 * disabled, but we can rely on ftrace_lock to ensure that 'mod'
> 		 * retains its validity throughout the remainder of this code.
> 		 */
> 		preempt_disable();
> 		mod = __module_text_address(pc);
> 		preempt_enable();
> 
> 		if (WARN_ON(!mod))
> 			return -EINVAL;
> 
> 		plt = get_ftrace_plt(mod, addr);
> 		if (!plt) {
> 			pr_err("ftrace: no module PLT for %ps\n", (void *)addr);
> 			return -EINVAL;
> 		}
> 
> 		addr = (unsigned long)plt;
> 	}
> 
> 	old = aarch64_insn_gen_nop();
> 	new = aarch64_insn_gen_branch_imm(pc, addr, AARCH64_INSN_BRANCH_LINK);
> 
> 	return ftrace_modify_code(pc, old, new, true);
> }
> 
> This might be the direction to go later. Anyway, "Rome wasn't built
> in a day", for this stage, we might focus on optprobe for the case
> of non-randomized module region :-).
> 
> BTW, @liuqi, if users set "nokaslr" in bootargs, will your optprobe
> always work and not fall back to normal kprobe even if we remove the
> dependency on RANDOMIZE_MODULE_REGION_FULL?
> 
Hi Barry,

I ran some tests on the Hip08 platform with nokaslr on the boot command
line and the dependency on RANDOMIZE_MODULE_REGION_FULL removed, and
optprobe seems to work. Here is the log:

estuary:/$ uname -a
Linux (none) 5.13.0-rc4+ #37 SMP PREEMPT Mon Aug 2 08:13:37 CST 2021 aarch64 GNU/Linux
estuary:/$ zcat /proc/config.gz | grep RANDOMIZE_MODULE_REGION
CONFIG_RANDOMIZE_MODULE_REGION_FULL=y
estuary:/$ zcat /proc/config.gz | grep OPTPROBE
CONFIG_OPTPROBES=y
CONFIG_HAVE_OPTPROBES=y
estuary:/$ cat /proc/cmdline
console=ttyAMA0,115200 earlycon=pl011,0x9000000 kpti=off nokaslr
estuary:/$ cat /sys/bus/platform/devices/hello_driver/kprobe_test
[   61.304143] do_empty returned 0 and took 200 ns to execute
[   61.304662] do_empty returned 0 and took 110 ns to execute
[   61.305196] do_empty returned 0 and took 100 ns to execute
[   61.305745] do_empty returned 0 and took 90 ns to execute
[   61.306262] do_empty returned 0 and took 90 ns to execute
[   61.306781] do_empty returned 0 and took 90 ns to execute
[   61.307286] do_empty returned 0 and took 90 ns to execute
[   61.307798] do_empty returned 0 and took 90 ns to execute
[   61.308314] do_empty returned 0 and took 90 ns to execute
[   61.308828] do_empty returned 0 and took 90 ns to execute
[   61.309323] do_empty returned 0 and took 80 ns to execute
[   61.309832] do_empty returned 0 and took 80 ns to execute
[   61.310357] do_empty returned 0 and took 80 ns to execute
[   61.310871] do_empty returned 0 and took 80 ns to execute
[   61.311361] do_empty returned 0 and took 80 ns to execute
[   61.311851] do_empty returned 0 and took 90 ns to execute
[   61.312358] do_empty returned 0 and took 90 ns to execute
[   61.312879] do_empty returned 0 and took 80 ns to execute

Thanks,
Qi

>>
>> Thank you,
>>
>> --
>> Masami Hiramatsu <mhiramat@kernel.org>
> 
> Thanks
> Barry
> .
> 


