linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "liuqi (BA)" <liuqi115@huawei.com>
To: Linuxarm <linuxarm@huawei.com>,
	"Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>,
	Masami Hiramatsu <mhiramat@kernel.org>
Cc: "catalin.marinas@arm.com" <catalin.marinas@arm.com>,
	"will@kernel.org" <will@kernel.org>,
	"naveen.n.rao@linux.ibm.com" <naveen.n.rao@linux.ibm.com>,
	"anil.s.keshavamurthy@intel.com" <anil.s.keshavamurthy@intel.com>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	"Zengtao (B)" <prime.zeng@hisilicon.com>,
	"robin.murphy@arm.com" <robin.murphy@arm.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] arm64: kprobe: Enable OPTPROBE for arm64
Date: Mon, 2 Aug 2021 11:59:51 +0800	[thread overview]
Message-ID: <06bb0cfb-4773-4d36-080c-291a5c7d8bb1@huawei.com> (raw)
In-Reply-To: <2f32fff3-6b58-583f-8e85-06ec1553d3f4@huawei.com>



On 2021/8/2 11:52, liuqi (BA) wrote:
> 
> 
> On 2021/7/31 20:21, Song Bao Hua (Barry Song) wrote:
>>
>>
>>> -----Original Message-----
>>> From: Masami Hiramatsu [mailto:mhiramat@kernel.org]
>>> Sent: Saturday, July 31, 2021 1:16 PM
>>> To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
>>> Cc: liuqi (BA) <liuqi115@huawei.com>; catalin.marinas@arm.com;
>>> will@kernel.org; naveen.n.rao@linux.ibm.com; 
>>> anil.s.keshavamurthy@intel.com;
>>> davem@davemloft.net; linux-arm-kernel@lists.infradead.org; Zengtao (B)
>>> <prime.zeng@hisilicon.com>; robin.murphy@arm.com; Linuxarm
>>> <linuxarm@huawei.com>; linux-kernel@vger.kernel.org
>>> Subject: Re: [PATCH] arm64: kprobe: Enable OPTPROBE for arm64
>>>
>>> On Fri, 30 Jul 2021 10:04:06 +0000
>>> "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com> wrote:
>>>
>>>>>>>>
>>>>>>>> Hi Qi,
>>>>>>>>
>>>>>>>> Thanks for your effort!
>>>>>>>>
>>>>>>>> On Mon, 19 Jul 2021 20:24:17 +0800
>>>>>>>> Qi Liu <liuqi115@huawei.com> wrote:
>>>>>>>>
>>>>>>>>> This patch introduce optprobe for ARM64. In optprobe, probed
>>>>>>>>> instruction is replaced by a branch instruction to detour
>>>>>>>>> buffer. Detour buffer contains trampoline code and a call to
>>>>>>>>> optimized_callback(). optimized_callback() calls opt_pre_handler()
>>>>>>>>> to execute kprobe handler.
>>>>>>>>
>>>>>>>> OK so this will replace only one instruction.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Limitations:
>>>>>>>>> - We only support !CONFIG_RANDOMIZE_MODULE_REGION_FULL case to
>>>>>>>>> guarantee the offset between probe point and kprobe pre_handler
>>>>>>>>> is not larger than 128MiB.
>>>>>>>>
>>>>>>>> Hmm, shouldn't we depends on !CONFIG_ARM64_MODULE_PLTS? Or,
>>>>>>>> allocate an intermediate trampoline area similar to arm optprobe
>>>>>>>> does.
>>>>>>>
>>>>>>> Depending on !CONFIG_ARM64_MODULE_PLTS will totally disable
>>>>>>> RANDOMIZE_BASE according to arch/arm64/Kconfig:
>>>>>>> config RANDOMIZE_BASE
>>>>>>>     bool "Randomize the address of the kernel image"
>>>>>>>     select ARM64_MODULE_PLTS if MODULES
>>>>>>>     select RELOCATABLE
>>>>>>
>>>>>> Yes, but why it is required for "RANDOMIZE_BASE"?
>>>>>> Does that imply the module call might need to use PLT in
>>>>>> some cases?
>>>>>>
>>>>>>>
>>>>>>> Depending on !RANDOMIZE_MODULE_REGION_FULL seems to be still
>>>>>>> allowing RANDOMIZE_BASE via avoiding long jump according to:
>>>>>>> arch/arm64/Kconfig:
>>>>>>>
>>>>>>> config RANDOMIZE_MODULE_REGION_FULL
>>>>>>>     bool "Randomize the module region over a 4 GB range"
>>>>>>>     depends on RANDOMIZE_BASE
>>>>>>>     default y
>>>>>>>     help
>>>>>>>       Randomizes the location of the module region inside a 4 GB 
>>>>>>> window
>>>>>>>       covering the core kernel. This way, it is less likely for 
>>>>>>> modules
>>>>>>>       to leak information about the location of core kernel data 
>>>>>>> structures
>>>>>>>       but it does imply that function calls between modules and 
>>>>>>> the core
>>>>>>>       kernel will need to be resolved via veneers in the module PLT.
>>>>>>>
>>>>>>>       When this option is not set, the module region will be 
>>>>>>> randomized
>>> over
>>>>>>>       a limited range that contains the [_stext, _etext] interval 
>>>>>>> of the
>>>>>>>       core kernel, so branch relocations are always in range.
>>>>>>
>>>>>> Hmm, this dependency looks strange. If it always in range, don't 
>>>>>> we need
>>>>>> PLT for modules?
>>>>>>
>>>>>> Cataline, would you know why?
>>>>>> Maybe it's a KASLR's Kconfig issue?
>>>>>
>>>>> I actually didn't see any problem after making this change:
>>>>>
>>>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>>>> index e07e7de9ac49..6440671b72e0 100644
>>>>> --- a/arch/arm64/Kconfig
>>>>> +++ b/arch/arm64/Kconfig
>>>>> @@ -1781,7 +1781,6 @@ config RELOCATABLE
>>>>>
>>>>>   config RANDOMIZE_BASE
>>>>>          bool "Randomize the address of the kernel image"
>>>>> -       select ARM64_MODULE_PLTS if MODULES
>>>>>          select RELOCATABLE
>>>>>          help
>>>>>            Randomizes the virtual address at which the kernel image is
>>>>> @@ -1801,6 +1800,7 @@ config RANDOMIZE_BASE
>>>>>   config RANDOMIZE_MODULE_REGION_FULL
>>>>>          bool "Randomize the module region over a 4 GB range"
>>>>>          depends on RANDOMIZE_BASE
>>>>> +       select ARM64_MODULE_PLTS if MODULES
>>>>>          default y
>>>>>          help
>>>>>            Randomizes the location of the module region inside a 4 
>>>>> GB window
>>>>>
>>>>> and having this config:
>>>>> # zcat /proc/config.gz | grep RANDOMIZE_BASE
>>>>> CONFIG_RANDOMIZE_BASE=y
>>>>>
>>>>> # zcat /proc/config.gz | grep RANDOMIZE_MODULE_REGION_FULL
>>>>> # CONFIG_RANDOMIZE_MODULE_REGION_FULL is not set
>>>>>
>>>>> # zcat /proc/config.gz | grep ARM64_MODULE_PLTS
>>>>> # CONFIG_ARM64_MODULE_PLTS is not set
>>>>>
>>>>> Modules work all good:
>>>>> # lsmod
>>>>> Module                  Size  Used by
>>>>> btrfs                1355776  0
>>>>> blake2b_generic        20480  0
>>>>> libcrc32c              16384  1 btrfs
>>>>> xor                    20480  1 btrfs
>>>>> xor_neon               16384  1 xor
>>>>> zstd_compress         163840  1 btrfs
>>>>> raid6_pq              110592  1 btrfs
>>>>> ctr                    16384  0
>>>>> md5                    16384  0
>>>>> ip_tunnel              32768  0
>>>>> ipv6                  442368  28
>>>>>
>>>>>
>>>>> I am not quite sure if there is a corner case. If no,
>>>>> I would think the kconfig might be some improper.
>>>>
>>>> The corner case is that even CONFIG_RANDOMIZE_MODULE_REGION_FULL
>>>> is not enabled, but if CONFIG_ARM64_MODULE_PLTS is enabled, when
>>>> we can't get memory from the 128MB area in case the area is exhausted,
>>>> we will fall back in module_alloc() to a 2GB area as long as either
>>>> of the below two conditions is met:
>>>>
>>>> 1. KASAN is not enabled
>>>> 2. KASAN is enabled and CONFIG_KASAN_VMALLOC is also enabled.
>>>>
>>>> void *module_alloc(unsigned long size)
>>>> {
>>>>     u64 module_alloc_end = module_alloc_base + MODULES_VSIZE;
>>>>     gfp_t gfp_mask = GFP_KERNEL;
>>>>     void *p;
>>>>
>>>>     /* Silence the initial allocation */
>>>>     if (IS_ENABLED(CONFIG_ARM64_MODULE_PLTS))
>>>>         gfp_mask |= __GFP_NOWARN;
>>>>
>>>>     if (IS_ENABLED(CONFIG_KASAN_GENERIC) ||
>>>>         IS_ENABLED(CONFIG_KASAN_SW_TAGS))
>>>>         /* don't exceed the static module region - see below */
>>>>         module_alloc_end = MODULES_END;
>>>>
>>>>     p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
>>>>                 module_alloc_end, gfp_mask, PAGE_KERNEL, 0,
>>>>                 NUMA_NO_NODE, __builtin_return_address(0));
>>>>
>>>>     if (!p && IS_ENABLED(CONFIG_ARM64_MODULE_PLTS) &&
>>>>         (IS_ENABLED(CONFIG_KASAN_VMALLOC) ||
>>>>          (!IS_ENABLED(CONFIG_KASAN_GENERIC) &&
>>>>           !IS_ENABLED(CONFIG_KASAN_SW_TAGS))))
>>>>         /*
>>>>          * KASAN without KASAN_VMALLOC can only deal with module
>>>>          * allocations being served from the reserved module region,
>>>>          * since the remainder of the vmalloc region is already
>>>>          * backed by zero shadow pages, and punching holes into it
>>>>          * is non-trivial. Since the module region is not randomized
>>>>          * when KASAN is enabled without KASAN_VMALLOC, it is even
>>>>          * less likely that the module region gets exhausted, so we
>>>>          * can simply omit this fallback in that case.
>>>>          */
>>>>         p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
>>>>                 module_alloc_base + SZ_2G, GFP_KERNEL,
>>>>                 PAGE_KERNEL, 0, NUMA_NO_NODE,
>>>>                 __builtin_return_address(0));
>>>>
>>>>     if (p && (kasan_module_alloc(p, size) < 0)) {
>>>>         vfree(p);
>>>>         return NULL;
>>>>     }
>>>>
>>>>     return p;
>>>> }
>>>>
>>>> This should be happening quite rarely. But maybe arm64's document
>>>> needs some minor fixup, otherwise, it is quite confusing.
>>>
>>> OK, so CONFIG_KASAN_VLALLOC=y and CONFIG_ARM64_MODULE_PLTS=y, the
>>> module_alloc()
>>> basically returns the memory in 128MB region, but can return the 
>>> memory in 2GB
>>> region. (This is OK because optprobe can filter it out)
>>> But CONFIG_RANDOMIZE_MODULE_REGION_FULL=y, there is almost no chance 
>>> to get
>>> the memory in 128MB region.
>>>
>>> Hmm, for the optprobe in kernel text, maybe we can define 
>>> 'optinsn_alloc_start'
>>> by 'module_alloc_base - (SZ_2G - MODULES_VADDR)' and use 
>>> __vmalloc_node_range()
>>> to avoid this issue. But that is only for the kernel. For the 
>>> modules, we may
>>> always out of 128MB region.
>>
>> If we can have some separate PLT entries in each module for optprobe,
>> we should be able to short-jump to the PLT entry and then PLT entry
>> will further long-jump to detour out of the range. That is exactly
>> the duty of PLT.
>>
>> Right now, arm64 has support on dynamic_ftrace by adding a
>> section in module for ftrace PLT.
>> arch/arm64/include/asm/module.lds.h:
>> SECTIONS {
>> #ifdef CONFIG_ARM64_MODULE_PLTS
>>     .plt 0 (NOLOAD) : { BYTE(0) }
>>     .init.plt 0 (NOLOAD) : { BYTE(0) }
>>     .text.ftrace_trampoline 0 (NOLOAD) : { BYTE(0) }
>> #endif
>> ...
>> }
>>
>> arch/arm64/kernel/module.c will initialize some PLT entries
>> for ftrace:
>>
>> static int module_init_ftrace_plt(const Elf_Ehdr *hdr,
>>                   const Elf_Shdr *sechdrs,
>>                   struct module *mod)
>> {
>> #if defined(CONFIG_ARM64_MODULE_PLTS) && defined(CONFIG_DYNAMIC_FTRACE)
>>     const Elf_Shdr *s;
>>     struct plt_entry *plts;
>>
>>     s = find_section(hdr, sechdrs, ".text.ftrace_trampoline");
>>     if (!s)
>>         return -ENOEXEC;
>>
>>     plts = (void *)s->sh_addr;
>>
>>     __init_plt(&plts[FTRACE_PLT_IDX], FTRACE_ADDR);
>>
>>     if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS))
>>         __init_plt(&plts[FTRACE_REGS_PLT_IDX], FTRACE_REGS_ADDR);
>>
>>     mod->arch.ftrace_trampolines = plts;
>> #endif
>>     return 0;
>> }
>>
>> Ftrace will then use those PLT entries in arch/arm64/kernel/ftrace.c:
>> static struct plt_entry *get_ftrace_plt(struct module *mod, unsigned 
>> long addr)
>> {
>> #ifdef CONFIG_ARM64_MODULE_PLTS
>>     struct plt_entry *plt = mod->arch.ftrace_trampolines;
>>
>>     if (addr == FTRACE_ADDR)
>>         return &plt[FTRACE_PLT_IDX];
>>     if (addr == FTRACE_REGS_ADDR &&
>>         IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS))
>>         return &plt[FTRACE_REGS_PLT_IDX];
>> #endif
>>     return NULL;
>> }
>>
>> /*
>>   * Turn on the call to ftrace_caller() in instrumented function
>>   */
>> int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
>> {
>>     unsigned long pc = rec->ip;
>>     u32 old, new;
>>     long offset = (long)pc - (long)addr;
>>
>>     if (offset < -SZ_128M || offset >= SZ_128M) {
>>         struct module *mod;
>>         struct plt_entry *plt;
>>
>>         if (!IS_ENABLED(CONFIG_ARM64_MODULE_PLTS))
>>             return -EINVAL;
>>
>>         /*
>>          * On kernels that support module PLTs, the offset between the
>>          * branch instruction and its target may legally exceed the
>>          * range of an ordinary relative 'bl' opcode. In this case, we
>>          * need to branch via a trampoline in the module.
>>          *
>>          * NOTE: __module_text_address() must be called with preemption
>>          * disabled, but we can rely on ftrace_lock to ensure that 'mod'
>>          * retains its validity throughout the remainder of this code.
>>          */
>>         preempt_disable();
>>         mod = __module_text_address(pc);
>>         preempt_enable();
>>
>>         if (WARN_ON(!mod))
>>             return -EINVAL;
>>
>>         plt = get_ftrace_plt(mod, addr);
>>         if (!plt) {
>>             pr_err("ftrace: no module PLT for %ps\n", (void *)addr);
>>             return -EINVAL;
>>         }
>>
>>         addr = (unsigned long)plt;
>>     }
>>
>>     old = aarch64_insn_gen_nop();
>>     new = aarch64_insn_gen_branch_imm(pc, addr, 
>> AARCH64_INSN_BRANCH_LINK);
>>
>>     return ftrace_modify_code(pc, old, new, true);
>> }
>>
>> This might be the direction to go later. Anyway, "Rome wasn't built
>> in a day", for this stage, we might focus on optprobe for the case
>> of non-randomized module region :-).
>>
>> BTW, @liuqi, if users set "nokaslr" in bootargs, will your optprobe
>> always work and not fall back to normal kprobe even we remove the
>> dependency on RANDOMIZED_MODULE_REGION_FULL?
>>
> Hi Barry,
> 
> I do some tests on Hip08 platform, using nokaslr in booting cmdline and 
> remove dependency on RANDOMIZED_MODULE_REGION_FULL, optprobe seems work.
> Here is the log:
> 
> estuary:/$ uname -a
> Linux (none) 5.13.0-rc4+ #37 SMP PREEMPT Mon Aug 2 08:13:37 CST 2021 
> aarch64 GNU/Linux
> estuary:/$ zcat /proc/config.gz | grep RANDOMIZE_MODULE_REGION
> CONFIG_RANDOMIZE_MODULE_REGION_FULL=y
> estuary:/$ zcat /proc/config.gz | grep OPTPROBE
> CONFIG_OPTPROBES=y
> CONFIG_HAVE_OPTPROBES=y
> estuary:/$ cat /proc/cmdline
> console=ttyAMA0,115200 earlycon=pl011,0x9000000 kpti=off nokaslr
> estuary:/$ cat /sys/bus/platform/devices/hello_driver/kprobe_test
> [   61.304143] do_empty returned 0 and took 200 ns to execute
> [   61.304662] do_empty returned 0 and took 110 ns to execute
> [   61.305196] do_empty returned 0 and took 100 ns to execute
> [   61.305745] do_empty returned 0 and took 90 ns to execute
> [   61.306262] do_empty returned 0 and took 90 ns to execute
> [   61.306781] do_empty returned 0 and took 90 ns to execute
> [   61.307286] do_empty returned 0 and took 90 ns to execute
> [   61.307798] do_empty returned 0 and took 90 ns to execute
> [   61.308314] do_empty returned 0 and took 90 ns to execute
> [   61.308828] do_empty returned 0 and took 90 ns to execute
> [   61.309323] do_empty returned 0 and took 80 ns to execute
> [   61.309832] do_empty returned 0 and took 80 ns to execute
> [   61.310357] do_empty returned 0 and took 80 ns to execute
> [   61.310871] do_empty returned 0 and took 80 ns to execute
> [   61.311361] do_empty returned 0 and took 80 ns to execute
> [   61.311851] do_empty returned 0 and took 90 ns to execute
> [   61.312358] do_empty returned 0 and took 90 ns to execute
> [   61.312879] do_empty returned 0 and took 80 ns to execute
> 
> Thanks,
> Qi
> 

This situation is: function to probe is builtin and pre_handler is build 
as module. I'll try to test the other three situation latter.

Thanks,
Qi
>>>
>>> Thank you,
>>>
>>> -- 
>>> Masami Hiramatsu <mhiramat@kernel.org>
>>
>> Thanks
>> Barry
>> .
>>
> 
> .


  reply	other threads:[~2021-08-02  3:59 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-19 12:24 Qi Liu
2021-07-21  8:41 ` Masami Hiramatsu
2021-07-22 10:24   ` Song Bao Hua (Barry Song)
2021-07-22 15:03     ` Masami Hiramatsu
2021-07-23  2:43       ` Song Bao Hua (Barry Song)
2021-07-30 10:04       ` Song Bao Hua (Barry Song)
2021-07-31  1:15         ` Masami Hiramatsu
2021-07-31 12:21           ` Song Bao Hua (Barry Song)
2021-08-02  3:52             ` liuqi (BA)
2021-08-02  3:59               ` liuqi (BA) [this message]
2021-08-02 12:02               ` liuqi (BA)
2021-07-30  3:32   ` liuqi (BA)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=06bb0cfb-4773-4d36-080c-291a5c7d8bb1@huawei.com \
    --to=liuqi115@huawei.com \
    --cc=anil.s.keshavamurthy@intel.com \
    --cc=catalin.marinas@arm.com \
    --cc=davem@davemloft.net \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=mhiramat@kernel.org \
    --cc=naveen.n.rao@linux.ibm.com \
    --cc=prime.zeng@hisilicon.com \
    --cc=robin.murphy@arm.com \
    --cc=song.bao.hua@hisilicon.com \
    --cc=will@kernel.org \
    --subject='Re: [PATCH] arm64: kprobe: Enable OPTPROBE for arm64' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).