linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Masami Hiramatsu <mhiramat@kernel.org>
To: "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>
Cc: "liuqi (BA)" <liuqi115@huawei.com>,
	"catalin.marinas@arm.com" <catalin.marinas@arm.com>,
	"will@kernel.org" <will@kernel.org>,
	"naveen.n.rao@linux.ibm.com" <naveen.n.rao@linux.ibm.com>,
	"anil.s.keshavamurthy@intel.com" <anil.s.keshavamurthy@intel.com>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	"Zengtao (B)" <prime.zeng@hisilicon.com>,
	"robin.murphy@arm.com" <robin.murphy@arm.com>,
	Linuxarm <linuxarm@huawei.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] arm64: kprobe: Enable OPTPROBE for arm64
Date: Sat, 31 Jul 2021 10:15:37 +0900	[thread overview]
Message-ID: <20210731101537.a64063d84e86d7910bd58a96@kernel.org> (raw)
In-Reply-To: <e63531dc8b7040219761e72fb9b1e74a@hisilicon.com>

On Fri, 30 Jul 2021 10:04:06 +0000
"Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com> wrote:

> > > > >
> > > > > Hi Qi,
> > > > >
> > > > > Thanks for your effort!
> > > > >
> > > > > On Mon, 19 Jul 2021 20:24:17 +0800
> > > > > Qi Liu <liuqi115@huawei.com> wrote:
> > > > >
> > > > > > This patch introduce optprobe for ARM64. In optprobe, probed
> > > > > > instruction is replaced by a branch instruction to detour
> > > > > > buffer. Detour buffer contains trampoline code and a call to
> > > > > > optimized_callback(). optimized_callback() calls opt_pre_handler()
> > > > > > to execute kprobe handler.
> > > > >
> > > > > OK so this will replace only one instruction.
> > > > >
> > > > > >
> > > > > > Limitations:
> > > > > > - We only support !CONFIG_RANDOMIZE_MODULE_REGION_FULL case to
> > > > > > guarantee the offset between probe point and kprobe pre_handler
> > > > > > is not larger than 128MiB.
> > > > >
> > > > > Hmm, shouldn't we depends on !CONFIG_ARM64_MODULE_PLTS? Or,
> > > > > allocate an intermediate trampoline area similar to arm optprobe
> > > > > does.
> > > >
> > > > Depending on !CONFIG_ARM64_MODULE_PLTS will totally disable
> > > > RANDOMIZE_BASE according to arch/arm64/Kconfig:
> > > > config RANDOMIZE_BASE
> > > > 	bool "Randomize the address of the kernel image"
> > > > 	select ARM64_MODULE_PLTS if MODULES
> > > > 	select RELOCATABLE
> > >
> > > Yes, but why it is required for "RANDOMIZE_BASE"?
> > > Does that imply the module call might need to use PLT in
> > > some cases?
> > >
> > > >
> > > > Depending on !RANDOMIZE_MODULE_REGION_FULL seems to be still
> > > > allowing RANDOMIZE_BASE via avoiding long jump according to:
> > > > arch/arm64/Kconfig:
> > > >
> > > > config RANDOMIZE_MODULE_REGION_FULL
> > > > 	bool "Randomize the module region over a 4 GB range"
> > > > 	depends on RANDOMIZE_BASE
> > > > 	default y
> > > > 	help
> > > > 	  Randomizes the location of the module region inside a 4 GB window
> > > > 	  covering the core kernel. This way, it is less likely for modules
> > > > 	  to leak information about the location of core kernel data structures
> > > > 	  but it does imply that function calls between modules and the core
> > > > 	  kernel will need to be resolved via veneers in the module PLT.
> > > >
> > > > 	  When this option is not set, the module region will be randomized over
> > > > 	  a limited range that contains the [_stext, _etext] interval of the
> > > > 	  core kernel, so branch relocations are always in range.
> > >
> > > Hmm, this dependency looks strange. If it always in range, don't we need
> > > PLT for modules?
> > >
> > > Cataline, would you know why?
> > > Maybe it's a KASLR's Kconfig issue?
> > 
> > I actually didn't see any problem after making this change:
> > 
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index e07e7de9ac49..6440671b72e0 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -1781,7 +1781,6 @@ config RELOCATABLE
> > 
> >  config RANDOMIZE_BASE
> >         bool "Randomize the address of the kernel image"
> > -       select ARM64_MODULE_PLTS if MODULES
> >         select RELOCATABLE
> >         help
> >           Randomizes the virtual address at which the kernel image is
> > @@ -1801,6 +1800,7 @@ config RANDOMIZE_BASE
> >  config RANDOMIZE_MODULE_REGION_FULL
> >         bool "Randomize the module region over a 4 GB range"
> >         depends on RANDOMIZE_BASE
> > +       select ARM64_MODULE_PLTS if MODULES
> >         default y
> >         help
> >           Randomizes the location of the module region inside a 4 GB window
> > 
> > and having this config:
> > # zcat /proc/config.gz | grep RANDOMIZE_BASE
> > CONFIG_RANDOMIZE_BASE=y
> > 
> > # zcat /proc/config.gz | grep RANDOMIZE_MODULE_REGION_FULL
> > # CONFIG_RANDOMIZE_MODULE_REGION_FULL is not set
> > 
> > # zcat /proc/config.gz | grep ARM64_MODULE_PLTS
> > # CONFIG_ARM64_MODULE_PLTS is not set
> > 
> > Modules work all good:
> > # lsmod
> > Module                  Size  Used by
> > btrfs                1355776  0
> > blake2b_generic        20480  0
> > libcrc32c              16384  1 btrfs
> > xor                    20480  1 btrfs
> > xor_neon               16384  1 xor
> > zstd_compress         163840  1 btrfs
> > raid6_pq              110592  1 btrfs
> > ctr                    16384  0
> > md5                    16384  0
> > ip_tunnel              32768  0
> > ipv6                  442368  28
> > 
> > 
> > I am not quite sure if there is a corner case. If no,
> > I would think the kconfig might be some improper.
> 
> The corner case is that even CONFIG_RANDOMIZE_MODULE_REGION_FULL
> is not enabled, but if CONFIG_ARM64_MODULE_PLTS is enabled, when
> we can't get memory from the 128MB area in case the area is exhausted,
> we will fall back in module_alloc() to a 2GB area as long as either
> of the below two conditions is met:
> 
> 1. KASAN is not enabled
> 2. KASAN is enabled and CONFIG_KASAN_VMALLOC is also enabled.
> 
> void *module_alloc(unsigned long size)
> {
> 	u64 module_alloc_end = module_alloc_base + MODULES_VSIZE;
> 	gfp_t gfp_mask = GFP_KERNEL;
> 	void *p;
> 
> 	/* Silence the initial allocation */
> 	if (IS_ENABLED(CONFIG_ARM64_MODULE_PLTS))
> 		gfp_mask |= __GFP_NOWARN;
> 
> 	if (IS_ENABLED(CONFIG_KASAN_GENERIC) ||
> 	    IS_ENABLED(CONFIG_KASAN_SW_TAGS))
> 		/* don't exceed the static module region - see below */
> 		module_alloc_end = MODULES_END;
> 
> 	p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
> 				module_alloc_end, gfp_mask, PAGE_KERNEL, 0,
> 				NUMA_NO_NODE, __builtin_return_address(0));
> 
> 	if (!p && IS_ENABLED(CONFIG_ARM64_MODULE_PLTS) &&
> 	    (IS_ENABLED(CONFIG_KASAN_VMALLOC) ||
> 	     (!IS_ENABLED(CONFIG_KASAN_GENERIC) &&
> 	      !IS_ENABLED(CONFIG_KASAN_SW_TAGS))))
> 		/*
> 		 * KASAN without KASAN_VMALLOC can only deal with module
> 		 * allocations being served from the reserved module region,
> 		 * since the remainder of the vmalloc region is already
> 		 * backed by zero shadow pages, and punching holes into it
> 		 * is non-trivial. Since the module region is not randomized
> 		 * when KASAN is enabled without KASAN_VMALLOC, it is even
> 		 * less likely that the module region gets exhausted, so we
> 		 * can simply omit this fallback in that case.
> 		 */
> 		p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
> 				module_alloc_base + SZ_2G, GFP_KERNEL,
> 				PAGE_KERNEL, 0, NUMA_NO_NODE,
> 				__builtin_return_address(0));
> 
> 	if (p && (kasan_module_alloc(p, size) < 0)) {
> 		vfree(p);
> 		return NULL;
> 	}
> 
> 	return p;
> }
> 
> This should be happening quite rarely. But maybe arm64's document
> needs some minor fixup, otherwise, it is quite confusing.

OK, so CONFIG_KASAN_VLALLOC=y and CONFIG_ARM64_MODULE_PLTS=y, the module_alloc()
basically returns the memory in 128MB region, but can return the memory in 2GB
region. (This is OK because optprobe can filter it out)
But CONFIG_RANDOMIZE_MODULE_REGION_FULL=y, there is almost no chance to get
the memory in 128MB region.

Hmm, for the optprobe in kernel text, maybe we can define 'optinsn_alloc_start'
by 'module_alloc_base - (SZ_2G - MODULES_VADDR)' and use __vmalloc_node_range()
to avoid this issue. But that is only for the kernel. For the modules, we may
always out of 128MB region.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

  reply	other threads:[~2021-07-31  1:15 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-19 12:24 Qi Liu
2021-07-21  8:41 ` Masami Hiramatsu
2021-07-22 10:24   ` Song Bao Hua (Barry Song)
2021-07-22 15:03     ` Masami Hiramatsu
2021-07-23  2:43       ` Song Bao Hua (Barry Song)
2021-07-30 10:04       ` Song Bao Hua (Barry Song)
2021-07-31  1:15         ` Masami Hiramatsu [this message]
2021-07-31 12:21           ` Song Bao Hua (Barry Song)
2021-08-02  3:52             ` liuqi (BA)
2021-08-02  3:59               ` liuqi (BA)
2021-08-02 12:02               ` liuqi (BA)
2021-07-30  3:32   ` liuqi (BA)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210731101537.a64063d84e86d7910bd58a96@kernel.org \
    --to=mhiramat@kernel.org \
    --cc=anil.s.keshavamurthy@intel.com \
    --cc=catalin.marinas@arm.com \
    --cc=davem@davemloft.net \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=liuqi115@huawei.com \
    --cc=naveen.n.rao@linux.ibm.com \
    --cc=prime.zeng@hisilicon.com \
    --cc=robin.murphy@arm.com \
    --cc=song.bao.hua@hisilicon.com \
    --cc=will@kernel.org \
    --subject='Re: [PATCH] arm64: kprobe: Enable OPTPROBE for arm64' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).