linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/2] x86/kprobes: add exception opcode detector and boost more opcodes
@ 2024-01-27  4:41 Jinghao Jia
  2024-01-27  4:41 ` [RFC PATCH 1/2] x86/kprobes: Prohibit kprobing on INT and UD Jinghao Jia
  2024-01-27  4:41 ` [RFC PATCH 2/2] x86/kprobes: boost more instructions from grp2/3/4/5 Jinghao Jia
  0 siblings, 2 replies; 13+ messages in thread
From: Jinghao Jia @ 2024-01-27  4:41 UTC (permalink / raw)
  To: Masami Hiramatsu (Google),
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Peter Zijlstra
  Cc: linux-trace-kernel, linux-kernel, Jinghao Jia

Hi everyone,

This patch set makes the following 2 changes:

- It adds an exception opcode detector to prevent kprobing on INTs and UDs.
  These opcodes serve special purposes in the kernel, and kprobing them
  also causes the stack trace to be polluted by the copy buffer
  address. This was suggested by Masami.

- At the same time, this patch set also boosts more opcodes from groups
  2/3/4/5. The newly boosted opcodes are all arithmetic instructions with
  semantics that are easy to reason about, and therefore they can safely
  be boosted and executed out-of-line. These instructions were not boosted
  previously because they use opcode extensions that are not handled by the
  kernel. Now, with the instruction decoder, they can be easily handled.
  Boosting (and further jump optimizing) these instructions leads to a 10x
  performance gain for a single probe on QEMU.

Jinghao Jia (2):
  x86/kprobes: Prohibit kprobing on INT and UD
  x86/kprobes: boost more instructions from grp2/3/4/5

 arch/x86/kernel/kprobes/core.c | 54 ++++++++++++++++++++++++++--------
 1 file changed, 41 insertions(+), 13 deletions(-)

--
2.43.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH 1/2] x86/kprobes: Prohibit kprobing on INT and UD
  2024-01-27  4:41 [RFC PATCH 0/2] x86/kprobes: add exception opcode detector and boost more opcodes Jinghao Jia
@ 2024-01-27  4:41 ` Jinghao Jia
  2024-01-27 19:47   ` Xin Li
  2024-01-28  1:19   ` Masami Hiramatsu
  2024-01-27  4:41 ` [RFC PATCH 2/2] x86/kprobes: boost more instructions from grp2/3/4/5 Jinghao Jia
  1 sibling, 2 replies; 13+ messages in thread
From: Jinghao Jia @ 2024-01-27  4:41 UTC (permalink / raw)
  To: Masami Hiramatsu (Google),
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Peter Zijlstra
  Cc: linux-trace-kernel, linux-kernel, Jinghao Jia

Both INTs (INT n, INT1, INT3, INTO) and UDs (UD0, UD1, UD2) serve
special purposes in the kernel, e.g., INT3 is used by KGDB and UD2 is
involved in LLVM-KCFI instrumentation. At the same time, attaching
kprobes on these instructions (particularly UDs) will pollute the stack
trace dumped in the kernel ring buffer, since the exception is triggered
in the copy buffer rather than the original location.

Check for INTs and UDs in can_probe and reject any kprobes trying to
attach to these instructions.

Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
---
 arch/x86/kernel/kprobes/core.c | 33 ++++++++++++++++++++++++++-------
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index e8babebad7b8..792b38d22126 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -252,6 +252,22 @@ unsigned long recover_probed_instruction(kprobe_opcode_t *buf, unsigned long add
 	return __recover_probed_insn(buf, addr);
 }
 
+static inline int is_exception_insn(struct insn *insn)
+{
+	if (insn->opcode.bytes[0] == 0x0f) {
+		/* UD0 / UD1 / UD2 */
+		return insn->opcode.bytes[1] == 0xff ||
+		       insn->opcode.bytes[1] == 0xb9 ||
+		       insn->opcode.bytes[1] == 0x0b;
+	} else {
+		/* INT3 / INT n / INTO / INT1 */
+		return insn->opcode.bytes[0] == 0xcc ||
+		       insn->opcode.bytes[0] == 0xcd ||
+		       insn->opcode.bytes[0] == 0xce ||
+		       insn->opcode.bytes[0] == 0xf1;
+	}
+}
+
 /* Check if paddr is at an instruction boundary */
 static int can_probe(unsigned long paddr)
 {
@@ -294,6 +310,16 @@ static int can_probe(unsigned long paddr)
 #endif
 		addr += insn.length;
 	}
+	__addr = recover_probed_instruction(buf, addr);
+	if (!__addr)
+		return 0;
+
+	if (insn_decode_kernel(&insn, (void *)__addr) < 0)
+		return 0;
+
+	if (is_exception_insn(&insn))
+		return 0;
+
 	if (IS_ENABLED(CONFIG_CFI_CLANG)) {
 		/*
 		 * The compiler generates the following instruction sequence
@@ -308,13 +334,6 @@ static int can_probe(unsigned long paddr)
 		 * Also, these movl and addl are used for showing expected
 		 * type. So those must not be touched.
 		 */
-		__addr = recover_probed_instruction(buf, addr);
-		if (!__addr)
-			return 0;
-
-		if (insn_decode_kernel(&insn, (void *)__addr) < 0)
-			return 0;
-
 		if (insn.opcode.value == 0xBA)
 			offset = 12;
 		else if (insn.opcode.value == 0x3)
-- 
2.43.0



* [RFC PATCH 2/2] x86/kprobes: boost more instructions from grp2/3/4/5
  2024-01-27  4:41 [RFC PATCH 0/2] x86/kprobes: add exception opcode detector and boost more opcodes Jinghao Jia
  2024-01-27  4:41 ` [RFC PATCH 1/2] x86/kprobes: Prohibit kprobing on INT and UD Jinghao Jia
@ 2024-01-27  4:41 ` Jinghao Jia
  2024-01-28  2:22   ` Masami Hiramatsu
  1 sibling, 1 reply; 13+ messages in thread
From: Jinghao Jia @ 2024-01-27  4:41 UTC (permalink / raw)
  To: Masami Hiramatsu (Google),
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Peter Zijlstra
  Cc: linux-trace-kernel, linux-kernel, Jinghao Jia

With the instruction decoder, we are now able to decode and recognize
instructions with opcode extensions. There are more instructions in
these groups that can be boosted:

Group 2: ROL, ROR, RCL, RCR, SHL/SAL, SHR, SAR
Group 3: TEST, NOT, NEG, MUL, IMUL, DIV, IDIV
Group 4: INC, DEC (byte operation)
Group 5: INC, DEC (word/doubleword/quadword operation)

These instructions were not boosted previously because there are reserved
opcodes within the groups, e.g., group 2 with ModR/M.nnn == 110 is
unmapped. As a result, kprobes attached to them require two int3 traps,
since being non-boostable also prevents jump-optimization.

Some simple tests on QEMU show that after boosting and jump-optimization
a single kprobe on these instructions with an empty pre-handler runs 10x
faster (~1000 cycles vs. ~100 cycles).

Since these instructions are mostly ALU operations and do not touch
special registers like RIP, let's boost them so that we get the
performance benefit.

Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
---
 arch/x86/kernel/kprobes/core.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 792b38d22126..f847bd9cc91b 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -169,22 +169,31 @@ int can_boost(struct insn *insn, void *addr)
 	case 0x62:		/* bound */
 	case 0x70 ... 0x7f:	/* Conditional jumps */
 	case 0x9a:		/* Call far */
-	case 0xc0 ... 0xc1:	/* Grp2 */
 	case 0xcc ... 0xce:	/* software exceptions */
-	case 0xd0 ... 0xd3:	/* Grp2 */
 	case 0xd6:		/* (UD) */
 	case 0xd8 ... 0xdf:	/* ESC */
 	case 0xe0 ... 0xe3:	/* LOOP*, JCXZ */
 	case 0xe8 ... 0xe9:	/* near Call, JMP */
 	case 0xeb:		/* Short JMP */
 	case 0xf0 ... 0xf4:	/* LOCK/REP, HLT */
-	case 0xf6 ... 0xf7:	/* Grp3 */
-	case 0xfe:		/* Grp4 */
 		/* ... are not boostable */
 		return 0;
+	case 0xc0 ... 0xc1:	/* Grp2 */
+	case 0xd0 ... 0xd3:	/* Grp2 */
+		/* ModR/M nnn == 110 is reserved */
+		return X86_MODRM_REG(insn->modrm.bytes[0]) != 6;
+	case 0xf6 ... 0xf7:	/* Grp3 */
+		/* ModR/M nnn == 001 is reserved */
+		return X86_MODRM_REG(insn->modrm.bytes[0]) != 1;
+	case 0xfe:		/* Grp4 */
+		/* Only inc and dec are boostable */
+		return X86_MODRM_REG(insn->modrm.bytes[0]) == 0 ||
+		       X86_MODRM_REG(insn->modrm.bytes[0]) == 1;
 	case 0xff:		/* Grp5 */
-		/* Only indirect jmp is boostable */
-		return X86_MODRM_REG(insn->modrm.bytes[0]) == 4;
+		/* Only inc, dec, and indirect jmp are boostable */
+		return X86_MODRM_REG(insn->modrm.bytes[0]) == 0 ||
+		       X86_MODRM_REG(insn->modrm.bytes[0]) == 1 ||
+		       X86_MODRM_REG(insn->modrm.bytes[0]) == 4;
 	default:
 		return 1;
 	}
-- 
2.43.0



* Re: [RFC PATCH 1/2] x86/kprobes: Prohibit kprobing on INT and UD
  2024-01-27  4:41 ` [RFC PATCH 1/2] x86/kprobes: Prohibit kprobing on INT and UD Jinghao Jia
@ 2024-01-27 19:47   ` Xin Li
  2024-01-28 21:09     ` Jinghao Jia
  2024-01-28  1:19   ` Masami Hiramatsu
  1 sibling, 1 reply; 13+ messages in thread
From: Xin Li @ 2024-01-27 19:47 UTC (permalink / raw)
  To: Jinghao Jia, Masami Hiramatsu (Google),
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Peter Zijlstra
  Cc: linux-trace-kernel, linux-kernel

On 1/26/2024 8:41 PM, Jinghao Jia wrote:
> Both INTs (INT n, INT1, INT3, INTO) and UDs (UD0, UD1, UD2) serve
> special purposes in the kernel, e.g., INT3 is used by KGDB and UD2 is
> involved in LLVM-KCFI instrumentation. At the same time, attaching
> kprobes on these instructions (particularly UDs) will pollute the stack
> trace dumped in the kernel ring buffer, since the exception is triggered
> in the copy buffer rather than the original location.
> 
> Check for INTs and UDs in can_probe and reject any kprobes trying to
> attach to these instructions.
> 
> Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
> ---
>   arch/x86/kernel/kprobes/core.c | 33 ++++++++++++++++++++++++++-------
>   1 file changed, 26 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> index e8babebad7b8..792b38d22126 100644
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -252,6 +252,22 @@ unsigned long recover_probed_instruction(kprobe_opcode_t *buf, unsigned long add
>   	return __recover_probed_insn(buf, addr);
>   }
>   
> +static inline int is_exception_insn(struct insn *insn)

s/int/bool

> +{
> +	if (insn->opcode.bytes[0] == 0x0f) {
> +		/* UD0 / UD1 / UD2 */
> +		return insn->opcode.bytes[1] == 0xff ||
> +		       insn->opcode.bytes[1] == 0xb9 ||
> +		       insn->opcode.bytes[1] == 0x0b;
> +	} else {
> +		/* INT3 / INT n / INTO / INT1 */
> +		return insn->opcode.bytes[0] == 0xcc ||
> +		       insn->opcode.bytes[0] == 0xcd ||
> +		       insn->opcode.bytes[0] == 0xce ||
> +		       insn->opcode.bytes[0] == 0xf1;
> +	}
> +}
> +
>   /* Check if paddr is at an instruction boundary */
>   static int can_probe(unsigned long paddr)
>   {
> @@ -294,6 +310,16 @@ static int can_probe(unsigned long paddr)
>   #endif
>   		addr += insn.length;
>   	}
> +	__addr = recover_probed_instruction(buf, addr);
> +	if (!__addr)
> +		return 0;
> +
> +	if (insn_decode_kernel(&insn, (void *)__addr) < 0)
> +		return 0;
> +
> +	if (is_exception_insn(&insn))
> +		return 0;
> +
>   	if (IS_ENABLED(CONFIG_CFI_CLANG)) {
>   		/*
>   		 * The compiler generates the following instruction sequence
> @@ -308,13 +334,6 @@ static int can_probe(unsigned long paddr)
>   		 * Also, these movl and addl are used for showing expected
>   		 * type. So those must not be touched.
>   		 */
> -		__addr = recover_probed_instruction(buf, addr);
> -		if (!__addr)
> -			return 0;
> -
> -		if (insn_decode_kernel(&insn, (void *)__addr) < 0)
> -			return 0;
> -
>   		if (insn.opcode.value == 0xBA)
>   			offset = 12;
>   		else if (insn.opcode.value == 0x3)

-- 
Thanks!
     Xin



* Re: [RFC PATCH 1/2] x86/kprobes: Prohibit kprobing on INT and UD
  2024-01-27  4:41 ` [RFC PATCH 1/2] x86/kprobes: Prohibit kprobing on INT and UD Jinghao Jia
  2024-01-27 19:47   ` Xin Li
@ 2024-01-28  1:19   ` Masami Hiramatsu
  2024-01-28 21:25     ` Jinghao Jia
  1 sibling, 1 reply; 13+ messages in thread
From: Masami Hiramatsu @ 2024-01-28  1:19 UTC (permalink / raw)
  To: Jinghao Jia
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Peter Zijlstra, linux-trace-kernel, linux-kernel

On Fri, 26 Jan 2024 22:41:23 -0600
Jinghao Jia <jinghao7@illinois.edu> wrote:

> Both INTs (INT n, INT1, INT3, INTO) and UDs (UD0, UD1, UD2) serve
> special purposes in the kernel, e.g., INT3 is used by KGDB and UD2 is
> involved in LLVM-KCFI instrumentation. At the same time, attaching
> kprobes on these instructions (particularly UDs) will pollute the stack
> trace dumped in the kernel ring buffer, since the exception is triggered
> in the copy buffer rather than the original location.
> 
> Check for INTs and UDs in can_probe and reject any kprobes trying to
> attach to these instructions.
> 

Thanks for implementing this check!


> Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
> ---
>  arch/x86/kernel/kprobes/core.c | 33 ++++++++++++++++++++++++++-------
>  1 file changed, 26 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> index e8babebad7b8..792b38d22126 100644
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -252,6 +252,22 @@ unsigned long recover_probed_instruction(kprobe_opcode_t *buf, unsigned long add
>  	return __recover_probed_insn(buf, addr);
>  }
>  
> +static inline int is_exception_insn(struct insn *insn)
> +{
> +	if (insn->opcode.bytes[0] == 0x0f) {
> +		/* UD0 / UD1 / UD2 */
> +		return insn->opcode.bytes[1] == 0xff ||
> +		       insn->opcode.bytes[1] == 0xb9 ||
> +		       insn->opcode.bytes[1] == 0x0b;
> +	} else {

If the "else" block just returns, you don't need the "else".

bool func()
{
	if (cond)
		return ...

	return ...
}

is preferable because it always puts the "return val" at the end of a
non-void function.

> +		/* INT3 / INT n / INTO / INT1 */
> +		return insn->opcode.bytes[0] == 0xcc ||
> +		       insn->opcode.bytes[0] == 0xcd ||
> +		       insn->opcode.bytes[0] == 0xce ||
> +		       insn->opcode.bytes[0] == 0xf1;
> +	}
> +}
> +
>  /* Check if paddr is at an instruction boundary */
>  static int can_probe(unsigned long paddr)
>  {
> @@ -294,6 +310,16 @@ static int can_probe(unsigned long paddr)
>  #endif
>  		addr += insn.length;
>  	}
> +	__addr = recover_probed_instruction(buf, addr);
> +	if (!__addr)
> +		return 0;
> +
> +	if (insn_decode_kernel(&insn, (void *)__addr) < 0)
> +		return 0;
> +
> +	if (is_exception_insn(&insn))
> +		return 0;
> +

Please don't put this outside of the decoding loop. You should put these in
the loop which decodes the instructions from the beginning of the function.
Since x86 instructions are variable length, can_probe() needs to check
whether the address is an instruction boundary and decodable.

Thank you,

>  	if (IS_ENABLED(CONFIG_CFI_CLANG)) {
>  		/*
>  		 * The compiler generates the following instruction sequence
> @@ -308,13 +334,6 @@ static int can_probe(unsigned long paddr)
>  		 * Also, these movl and addl are used for showing expected
>  		 * type. So those must not be touched.
>  		 */
> -		__addr = recover_probed_instruction(buf, addr);
> -		if (!__addr)
> -			return 0;
> -
> -		if (insn_decode_kernel(&insn, (void *)__addr) < 0)
> -			return 0;
> -
>  		if (insn.opcode.value == 0xBA)
>  			offset = 12;
>  		else if (insn.opcode.value == 0x3)
> -- 
> 2.43.0
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [RFC PATCH 2/2] x86/kprobes: boost more instructions from grp2/3/4/5
  2024-01-27  4:41 ` [RFC PATCH 2/2] x86/kprobes: boost more instructions from grp2/3/4/5 Jinghao Jia
@ 2024-01-28  2:22   ` Masami Hiramatsu
  2024-01-28 21:30     ` Jinghao Jia
  0 siblings, 1 reply; 13+ messages in thread
From: Masami Hiramatsu @ 2024-01-28  2:22 UTC (permalink / raw)
  To: Jinghao Jia
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Peter Zijlstra, linux-trace-kernel, linux-kernel

On Fri, 26 Jan 2024 22:41:24 -0600
Jinghao Jia <jinghao7@illinois.edu> wrote:

> With the instruction decoder, we are now able to decode and recognize
> instructions with opcode extensions. There are more instructions in
> these groups that can be boosted:
> 
> Group 2: ROL, ROR, RCL, RCR, SHL/SAL, SHR, SAR
> Group 3: TEST, NOT, NEG, MUL, IMUL, DIV, IDIV
> Group 4: INC, DEC (byte operation)
> Group 5: INC, DEC (word/doubleword/quadword operation)
> 
> These instructions are not boosted previously because there are reserved
> opcodes within the groups, e.g., group 2 with ModR/M.nnn == 110 is
> unmapped. As a result, kprobes attached to them requires two int3 traps
> as being non-boostable also prevents jump-optimization.
> 
> Some simple tests on QEMU show that after boosting and jump-optimization
> a single kprobe on these instructions with an empty pre-handler runs 10x
> faster (~1000 cycles vs. ~100 cycles).
> 
> Since these instructions are mostly ALU operations and do not touch
> special registers like RIP, let's boost them so that we get the
> performance benefit.
> 

As long as we check the ModR/M byte, I think we can safely run these
instructions on the trampoline buffer without adjusting the results (this
means they can be "boosted").
I just have a minor comment, but basically this looks good to me.

Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

> Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
> ---
>  arch/x86/kernel/kprobes/core.c | 21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> index 792b38d22126..f847bd9cc91b 100644
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -169,22 +169,31 @@ int can_boost(struct insn *insn, void *addr)
>  	case 0x62:		/* bound */
>  	case 0x70 ... 0x7f:	/* Conditional jumps */
>  	case 0x9a:		/* Call far */
> -	case 0xc0 ... 0xc1:	/* Grp2 */
>  	case 0xcc ... 0xce:	/* software exceptions */
> -	case 0xd0 ... 0xd3:	/* Grp2 */
>  	case 0xd6:		/* (UD) */
>  	case 0xd8 ... 0xdf:	/* ESC */
>  	case 0xe0 ... 0xe3:	/* LOOP*, JCXZ */
>  	case 0xe8 ... 0xe9:	/* near Call, JMP */
>  	case 0xeb:		/* Short JMP */
>  	case 0xf0 ... 0xf4:	/* LOCK/REP, HLT */
> -	case 0xf6 ... 0xf7:	/* Grp3 */
> -	case 0xfe:		/* Grp4 */
>  		/* ... are not boostable */
>  		return 0;
> +	case 0xc0 ... 0xc1:	/* Grp2 */
> +	case 0xd0 ... 0xd3:	/* Grp2 */
> +		/* ModR/M nnn == 110 is reserved */
> +		return X86_MODRM_REG(insn->modrm.bytes[0]) != 6;
> +	case 0xf6 ... 0xf7:	/* Grp3 */
> +		/* ModR/M nnn == 001 is reserved */

		/* AMD uses nnn == 001 as TEST, but Intel makes it reserved. */

> +		return X86_MODRM_REG(insn->modrm.bytes[0]) != 1;
> +	case 0xfe:		/* Grp4 */
> +		/* Only inc and dec are boostable */
> +		return X86_MODRM_REG(insn->modrm.bytes[0]) == 0 ||
> +		       X86_MODRM_REG(insn->modrm.bytes[0]) == 1;
>  	case 0xff:		/* Grp5 */
> -		/* Only indirect jmp is boostable */
> -		return X86_MODRM_REG(insn->modrm.bytes[0]) == 4;
> +		/* Only inc, dec, and indirect jmp are boostable */
> +		return X86_MODRM_REG(insn->modrm.bytes[0]) == 0 ||
> +		       X86_MODRM_REG(insn->modrm.bytes[0]) == 1 ||
> +		       X86_MODRM_REG(insn->modrm.bytes[0]) == 4;
>  	default:
>  		return 1;
>  	}
> -- 
> 2.43.0
> 

Thank you,

-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [RFC PATCH 1/2] x86/kprobes: Prohibit kprobing on INT and UD
  2024-01-27 19:47   ` Xin Li
@ 2024-01-28 21:09     ` Jinghao Jia
  0 siblings, 0 replies; 13+ messages in thread
From: Jinghao Jia @ 2024-01-28 21:09 UTC (permalink / raw)
  To: Xin Li, Masami Hiramatsu (Google),
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Peter Zijlstra
  Cc: linux-trace-kernel, linux-kernel


On 1/27/24 13:47, Xin Li wrote:
> On 1/26/2024 8:41 PM, Jinghao Jia wrote:
>> Both INTs (INT n, INT1, INT3, INTO) and UDs (UD0, UD1, UD2) serve
>> special purposes in the kernel, e.g., INT3 is used by KGDB and UD2 is
>> involved in LLVM-KCFI instrumentation. At the same time, attaching
>> kprobes on these instructions (particularly UDs) will pollute the stack
>> trace dumped in the kernel ring buffer, since the exception is triggered
>> in the copy buffer rather than the original location.
>>
>> Check for INTs and UDs in can_probe and reject any kprobes trying to
>> attach to these instructions.
>>
>> Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
>> Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
>> ---
>>   arch/x86/kernel/kprobes/core.c | 33 ++++++++++++++++++++++++++-------
>>   1 file changed, 26 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
>> index e8babebad7b8..792b38d22126 100644
>> --- a/arch/x86/kernel/kprobes/core.c
>> +++ b/arch/x86/kernel/kprobes/core.c
>> @@ -252,6 +252,22 @@ unsigned long recover_probed_instruction(kprobe_opcode_t *buf, unsigned long add
>>       return __recover_probed_insn(buf, addr);
>>   }
>>   +static inline int is_exception_insn(struct insn *insn)
> 
> s/int/bool
> 

Oh yes, the return type should be bool. Thanks for pointing it out!

--Jinghao

>> +{
>> +    if (insn->opcode.bytes[0] == 0x0f) {
>> +        /* UD0 / UD1 / UD2 */
>> +        return insn->opcode.bytes[1] == 0xff ||
>> +               insn->opcode.bytes[1] == 0xb9 ||
>> +               insn->opcode.bytes[1] == 0x0b;
>> +    } else {
>> +        /* INT3 / INT n / INTO / INT1 */
>> +        return insn->opcode.bytes[0] == 0xcc ||
>> +               insn->opcode.bytes[0] == 0xcd ||
>> +               insn->opcode.bytes[0] == 0xce ||
>> +               insn->opcode.bytes[0] == 0xf1;
>> +    }
>> +}
>> +
>>   /* Check if paddr is at an instruction boundary */
>>   static int can_probe(unsigned long paddr)
>>   {
>> @@ -294,6 +310,16 @@ static int can_probe(unsigned long paddr)
>>   #endif
>>           addr += insn.length;
>>       }
>> +    __addr = recover_probed_instruction(buf, addr);
>> +    if (!__addr)
>> +        return 0;
>> +
>> +    if (insn_decode_kernel(&insn, (void *)__addr) < 0)
>> +        return 0;
>> +
>> +    if (is_exception_insn(&insn))
>> +        return 0;
>> +
>>       if (IS_ENABLED(CONFIG_CFI_CLANG)) {
>>           /*
>>            * The compiler generates the following instruction sequence
>> @@ -308,13 +334,6 @@ static int can_probe(unsigned long paddr)
>>            * Also, these movl and addl are used for showing expected
>>            * type. So those must not be touched.
>>            */
>> -        __addr = recover_probed_instruction(buf, addr);
>> -        if (!__addr)
>> -            return 0;
>> -
>> -        if (insn_decode_kernel(&insn, (void *)__addr) < 0)
>> -            return 0;
>> -
>>           if (insn.opcode.value == 0xBA)
>>               offset = 12;
>>           else if (insn.opcode.value == 0x3)
> 



* Re: [RFC PATCH 1/2] x86/kprobes: Prohibit kprobing on INT and UD
  2024-01-28  1:19   ` Masami Hiramatsu
@ 2024-01-28 21:25     ` Jinghao Jia
  2024-01-30  1:44       ` Masami Hiramatsu
  0 siblings, 1 reply; 13+ messages in thread
From: Jinghao Jia @ 2024-01-28 21:25 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Peter Zijlstra, linux-trace-kernel, linux-kernel



On 1/27/24 19:19, Masami Hiramatsu (Google) wrote:
> On Fri, 26 Jan 2024 22:41:23 -0600
> Jinghao Jia <jinghao7@illinois.edu> wrote:
> 
>> Both INTs (INT n, INT1, INT3, INTO) and UDs (UD0, UD1, UD2) serve
>> special purposes in the kernel, e.g., INT3 is used by KGDB and UD2 is
>> involved in LLVM-KCFI instrumentation. At the same time, attaching
>> kprobes on these instructions (particularly UDs) will pollute the stack
>> trace dumped in the kernel ring buffer, since the exception is triggered
>> in the copy buffer rather than the original location.
>>
>> Check for INTs and UDs in can_probe and reject any kprobes trying to
>> attach to these instructions.
>>
> 
> Thanks for implement this check!
> 

You are very welcome :)

> 
>> Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
>> Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
>> ---
>>  arch/x86/kernel/kprobes/core.c | 33 ++++++++++++++++++++++++++-------
>>  1 file changed, 26 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
>> index e8babebad7b8..792b38d22126 100644
>> --- a/arch/x86/kernel/kprobes/core.c
>> +++ b/arch/x86/kernel/kprobes/core.c
>> @@ -252,6 +252,22 @@ unsigned long recover_probed_instruction(kprobe_opcode_t *buf, unsigned long add
>>  	return __recover_probed_insn(buf, addr);
>>  }
>>  
>> +static inline int is_exception_insn(struct insn *insn)
>> +{
>> +	if (insn->opcode.bytes[0] == 0x0f) {
>> +		/* UD0 / UD1 / UD2 */
>> +		return insn->opcode.bytes[1] == 0xff ||
>> +		       insn->opcode.bytes[1] == 0xb9 ||
>> +		       insn->opcode.bytes[1] == 0x0b;
>> +	} else {
> 
> If "else" block just return, you don't need this "else".
> 
> bool func()
> {
> 	if (cond)
> 		return ...
> 
> 	return ...
> }
> 
> Is preferrable because this puts "return val" always at the end of non-void
> function.
> 

I will fix this in the v2.

>> +		/* INT3 / INT n / INTO / INT1 */
>> +		return insn->opcode.bytes[0] == 0xcc ||
>> +		       insn->opcode.bytes[0] == 0xcd ||
>> +		       insn->opcode.bytes[0] == 0xce ||
>> +		       insn->opcode.bytes[0] == 0xf1;
>> +	}
>> +}
>> +
>>  /* Check if paddr is at an instruction boundary */
>>  static int can_probe(unsigned long paddr)
>>  {
>> @@ -294,6 +310,16 @@ static int can_probe(unsigned long paddr)
>>  #endif
>>  		addr += insn.length;
>>  	}
>> +	__addr = recover_probed_instruction(buf, addr);
>> +	if (!__addr)
>> +		return 0;
>> +
>> +	if (insn_decode_kernel(&insn, (void *)__addr) < 0)
>> +		return 0;
>> +
>> +	if (is_exception_insn(&insn))
>> +		return 0;
>> +
> 
> Please don't put this outside of decoding loop. You should put these in
> the loop which decodes the instruction from the beginning of the function.
> Since the x86 instrcution is variable length, can_probe() needs to check
> whether that the address is instruction boundary and decodable.
> 
> Thank you,

If my understanding is correct, this is trying to decode the kprobe
target instruction, given that it is after the main decoding loop.  Here I
hoisted the decoding logic out of the if (IS_ENABLED(CONFIG_CFI_CLANG))
block so that we do not need to decode the same instruction twice.  I left
the main decoding loop unchanged, so it still decodes the function from
the start and should handle instruction boundaries. Are there any caveats
that I missed?

--Jinghao

> 
>>  	if (IS_ENABLED(CONFIG_CFI_CLANG)) {
>>  		/*
>>  		 * The compiler generates the following instruction sequence
>> @@ -308,13 +334,6 @@ static int can_probe(unsigned long paddr)
>>  		 * Also, these movl and addl are used for showing expected
>>  		 * type. So those must not be touched.
>>  		 */
>> -		__addr = recover_probed_instruction(buf, addr);
>> -		if (!__addr)
>> -			return 0;
>> -
>> -		if (insn_decode_kernel(&insn, (void *)__addr) < 0)
>> -			return 0;
>> -
>>  		if (insn.opcode.value == 0xBA)
>>  			offset = 12;
>>  		else if (insn.opcode.value == 0x3)
>> -- 
>> 2.43.0
>>
> 
> 



* Re: [RFC PATCH 2/2] x86/kprobes: boost more instructions from grp2/3/4/5
  2024-01-28  2:22   ` Masami Hiramatsu
@ 2024-01-28 21:30     ` Jinghao Jia
  2024-01-30  1:45       ` Masami Hiramatsu
  0 siblings, 1 reply; 13+ messages in thread
From: Jinghao Jia @ 2024-01-28 21:30 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Peter Zijlstra, linux-trace-kernel, linux-kernel



On 1/27/24 20:22, Masami Hiramatsu (Google) wrote:
> On Fri, 26 Jan 2024 22:41:24 -0600
> Jinghao Jia <jinghao7@illinois.edu> wrote:
> 
>> With the instruction decoder, we are now able to decode and recognize
>> instructions with opcode extensions. There are more instructions in
>> these groups that can be boosted:
>>
>> Group 2: ROL, ROR, RCL, RCR, SHL/SAL, SHR, SAR
>> Group 3: TEST, NOT, NEG, MUL, IMUL, DIV, IDIV
>> Group 4: INC, DEC (byte operation)
>> Group 5: INC, DEC (word/doubleword/quadword operation)
>>
>> These instructions are not boosted previously because there are reserved
>> opcodes within the groups, e.g., group 2 with ModR/M.nnn == 110 is
>> unmapped. As a result, kprobes attached to them requires two int3 traps
>> as being non-boostable also prevents jump-optimization.
>>
>> Some simple tests on QEMU show that after boosting and jump-optimization
>> a single kprobe on these instructions with an empty pre-handler runs 10x
>> faster (~1000 cycles vs. ~100 cycles).
>>
>> Since these instructions are mostly ALU operations and do not touch
>> special registers like RIP, let's boost them so that we get the
>> performance benefit.
>>
> 
> As far as we check the ModR/M byte, I think we can safely run these
> instructions on trampoline buffer without adjusting results (this
> means it can be "boosted").
> I just have a minor comment, but basically this looks good to me.
> 
> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> 
>> Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
>> ---
>>  arch/x86/kernel/kprobes/core.c | 21 +++++++++++++++------
>>  1 file changed, 15 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
>> index 792b38d22126..f847bd9cc91b 100644
>> --- a/arch/x86/kernel/kprobes/core.c
>> +++ b/arch/x86/kernel/kprobes/core.c
>> @@ -169,22 +169,31 @@ int can_boost(struct insn *insn, void *addr)
>>  	case 0x62:		/* bound */
>>  	case 0x70 ... 0x7f:	/* Conditional jumps */
>>  	case 0x9a:		/* Call far */
>> -	case 0xc0 ... 0xc1:	/* Grp2 */
>>  	case 0xcc ... 0xce:	/* software exceptions */
>> -	case 0xd0 ... 0xd3:	/* Grp2 */
>>  	case 0xd6:		/* (UD) */
>>  	case 0xd8 ... 0xdf:	/* ESC */
>>  	case 0xe0 ... 0xe3:	/* LOOP*, JCXZ */
>>  	case 0xe8 ... 0xe9:	/* near Call, JMP */
>>  	case 0xeb:		/* Short JMP */
>>  	case 0xf0 ... 0xf4:	/* LOCK/REP, HLT */
>> -	case 0xf6 ... 0xf7:	/* Grp3 */
>> -	case 0xfe:		/* Grp4 */
>>  		/* ... are not boostable */
>>  		return 0;
>> +	case 0xc0 ... 0xc1:	/* Grp2 */
>> +	case 0xd0 ... 0xd3:	/* Grp2 */
>> +		/* ModR/M nnn == 110 is reserved */
>> +		return X86_MODRM_REG(insn->modrm.bytes[0]) != 6;
>> +	case 0xf6 ... 0xf7:	/* Grp3 */
>> +		/* ModR/M nnn == 001 is reserved */
> 
> 		/* AMD uses nnn == 001 as TEST, but Intel makes it reserved. */
> 

I will incorporate this into the v2. Since nnn == 001 is still considered
reserved by Intel, we still need to prevent it from being boosted, don't
we?

--Jinghao

>> +		return X86_MODRM_REG(insn->modrm.bytes[0]) != 1;
>> +	case 0xfe:		/* Grp4 */
>> +		/* Only inc and dec are boostable */
>> +		return X86_MODRM_REG(insn->modrm.bytes[0]) == 0 ||
>> +		       X86_MODRM_REG(insn->modrm.bytes[0]) == 1;
>>  	case 0xff:		/* Grp5 */
>> -		/* Only indirect jmp is boostable */
>> -		return X86_MODRM_REG(insn->modrm.bytes[0]) == 4;
>> +		/* Only inc, dec, and indirect jmp are boostable */
>> +		return X86_MODRM_REG(insn->modrm.bytes[0]) == 0 ||
>> +		       X86_MODRM_REG(insn->modrm.bytes[0]) == 1 ||
>> +		       X86_MODRM_REG(insn->modrm.bytes[0]) == 4;
>>  	default:
>>  		return 1;
>>  	}
>> -- 
>> 2.43.0
>>
> 
> Thank you,
> 

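For readers skimming the thread, the reserved-extension checks discussed above all come down to reading the reg ("nnn") field of the ModR/M byte. Below is a minimal, hypothetical C sketch — it mirrors the shape of the kernel's X86_MODRM_REG() logic and the checks in this patch, but is not the kernel code itself:

```c
#include <assert.h>
#include <stdint.h>

/* Extract the reg/"nnn" field (bits 5:3) of a ModR/M byte,
 * analogous to the kernel's X86_MODRM_REG() macro. */
static unsigned int modrm_reg(uint8_t modrm)
{
	return (modrm >> 3) & 0x7;
}

/* Hypothetical helper: decide boostability for the Grp2/3/4/5
 * opcodes based on the opcode extension, following the checks
 * in the patch under review. Returns 1 if boostable, 0 if not. */
static int grp_can_boost(uint8_t opcode, uint8_t modrm)
{
	unsigned int reg = modrm_reg(modrm);

	switch (opcode) {
	case 0xc0: case 0xc1:				/* Grp2 */
	case 0xd0: case 0xd1: case 0xd2: case 0xd3:	/* Grp2 */
		return reg != 6;	/* nnn == 110 is reserved */
	case 0xf6: case 0xf7:				/* Grp3 */
		return reg != 1;	/* nnn == 001 reserved on Intel */
	case 0xfe:					/* Grp4 */
		return reg == 0 || reg == 1;	/* only INC/DEC */
	case 0xff:					/* Grp5 */
		/* only INC, DEC, and indirect JMP */
		return reg == 0 || reg == 1 || reg == 4;
	default:
		return 1;
	}
}
```

The switch only covers the group opcodes that consult the extension field; in can_boost() itself, opcode ranges that are never boostable are rejected before reaching these cases.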


* Re: [RFC PATCH 1/2] x86/kprobes: Prohibit kprobing on INT and UD
  2024-01-28 21:25     ` Jinghao Jia
@ 2024-01-30  1:44       ` Masami Hiramatsu
  2024-01-30  2:50         ` Jinghao Jia
  0 siblings, 1 reply; 13+ messages in thread
From: Masami Hiramatsu @ 2024-01-30  1:44 UTC (permalink / raw)
  To: Jinghao Jia
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Peter Zijlstra, linux-trace-kernel, linux-kernel

On Sun, 28 Jan 2024 15:25:59 -0600
Jinghao Jia <jinghao7@illinois.edu> wrote:

> >>  /* Check if paddr is at an instruction boundary */
> >>  static int can_probe(unsigned long paddr)
> >>  {
> >> @@ -294,6 +310,16 @@ static int can_probe(unsigned long paddr)
> >>  #endif
> >>  		addr += insn.length;
> >>  	}
> >> +	__addr = recover_probed_instruction(buf, addr);
> >> +	if (!__addr)
> >> +		return 0;
> >> +
> >> +	if (insn_decode_kernel(&insn, (void *)__addr) < 0)
> >> +		return 0;
> >> +
> >> +	if (is_exception_insn(&insn))
> >> +		return 0;
> >> +
> > 
> > Please don't put this outside of the decoding loop. You should put these in
> > the loop which decodes the instruction from the beginning of the function.
> > Since x86 instructions are variable length, can_probe() needs to check
> > that the address is on an instruction boundary and decodable.
> > 
> > Thank you,
> 
> If my understanding is correct then this is trying to decode the kprobe
> target instruction, given that it is after the main decoding loop.  Here I
> hoisted the decoding logic out of the if(IS_ENABLED(CONFIG_CFI_CLANG))
> block so that we do not need to decode the same instruction twice.  I left
> the main decoding loop unchanged so it is still decoding the function from
> the start and should handle instruction boundaries. Are there any caveats
> that I missed?

Ah, sorry I misread the patch. You're correct!
This is a good place to do that.

But hmm, I think we should add another patch to check the addr == paddr
soon after the loop so that we will avoid decoding.

Thank you,

> 
> --Jinghao
> 
> > 
> >>  	if (IS_ENABLED(CONFIG_CFI_CLANG)) {
> >>  		/*
> >>  		 * The compiler generates the following instruction sequence
> >> @@ -308,13 +334,6 @@ static int can_probe(unsigned long paddr)
> >>  		 * Also, these movl and addl are used for showing expected
> >>  		 * type. So those must not be touched.
> >>  		 */
> >> -		__addr = recover_probed_instruction(buf, addr);
> >> -		if (!__addr)
> >> -			return 0;
> >> -
> >> -		if (insn_decode_kernel(&insn, (void *)__addr) < 0)
> >> -			return 0;
> >> -
> >>  		if (insn.opcode.value == 0xBA)
> >>  			offset = 12;
> >>  		else if (insn.opcode.value == 0x3)
> >> -- 
> >> 2.43.0
> >>
> > 
> > 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>
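Masami's suggestion — checking addr == paddr right after the boundary-walk loop, before doing any further decoding — can be illustrated with a toy model. The struct and toy_can_probe() below are hypothetical stand-ins for the kernel's recover_probed_instruction()/insn_decode_kernel() machinery, kept only to show the control flow being proposed:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction record: length in bytes, plus a flag marking an
 * exception-raising opcode (INT/UD). Hypothetical, for illustration. */
struct toy_insn {
	int  length;
	bool is_exception;
};

/* can_probe()-style boundary walk over a pre-decoded instruction
 * stream: walk from the function start; the probe address is valid
 * only if it lands on an instruction boundary and the instruction
 * there is not an exception opcode. The early "addr != paddr" bail-out
 * is the optimization suggested in the thread: it avoids decoding the
 * target at all when paddr is inside an instruction. */
static bool toy_can_probe(const struct toy_insn *insns, size_t n,
			  unsigned long start, unsigned long paddr)
{
	unsigned long addr = start;
	size_t i;

	for (i = 0; i < n && addr < paddr; i++)
		addr += insns[i].length;

	if (addr != paddr)	/* paddr falls inside an instruction */
		return false;

	if (i >= n)		/* walked past the decoded stream */
		return false;

	return !insns[i].is_exception;	/* reject INT/UD targets */
}
```

Note the decode of the target instruction (here, indexing insns[i]) happens only after the cheap boundary check has passed, which is exactly the ordering being discussed.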


* Re: [RFC PATCH 2/2] x86/kprobes: boost more instructions from grp2/3/4/5
  2024-01-28 21:30     ` Jinghao Jia
@ 2024-01-30  1:45       ` Masami Hiramatsu
  0 siblings, 0 replies; 13+ messages in thread
From: Masami Hiramatsu @ 2024-01-30  1:45 UTC (permalink / raw)
  To: Jinghao Jia
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Peter Zijlstra, linux-trace-kernel, linux-kernel

On Sun, 28 Jan 2024 15:30:50 -0600
Jinghao Jia <jinghao7@illinois.edu> wrote:

> 
> 
> On 1/27/24 20:22, Masami Hiramatsu (Google) wrote:
> > On Fri, 26 Jan 2024 22:41:24 -0600
> > Jinghao Jia <jinghao7@illinois.edu> wrote:
> > 
> >> With the instruction decoder, we are now able to decode and recognize
> >> instructions with opcode extensions. There are more instructions in
> >> these groups that can be boosted:
> >>
> >> Group 2: ROL, ROR, RCL, RCR, SHL/SAL, SHR, SAR
> >> Group 3: TEST, NOT, NEG, MUL, IMUL, DIV, IDIV
> >> Group 4: INC, DEC (byte operation)
> >> Group 5: INC, DEC (word/doubleword/quadword operation)
> >>
> >> These instructions were not boosted previously because there are reserved
> >> opcodes within the groups, e.g., group 2 with ModR/M.nnn == 110 is
> >> unmapped. As a result, kprobes attached to them require two int3 traps,
> >> since being non-boostable also prevents jump-optimization.
> >>
> >> Some simple tests on QEMU show that after boosting and jump-optimization
> >> a single kprobe on these instructions with an empty pre-handler runs 10x
> >> faster (~1000 cycles vs. ~100 cycles).
> >>
> >> Since these instructions are mostly ALU operations and do not touch
> >> special registers like RIP, let's boost them so that we get the
> >> performance benefit.
> >>
> > 
> > As far as we check the ModR/M byte, I think we can safely run these
> > instructions on trampoline buffer without adjusting results (this
> > means it can be "boosted").
> > I just have a minor comment, but basically this looks good to me.
> > 
> > Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > 
> >> Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
> >> ---
> >>  arch/x86/kernel/kprobes/core.c | 21 +++++++++++++++------
> >>  1 file changed, 15 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> >> index 792b38d22126..f847bd9cc91b 100644
> >> --- a/arch/x86/kernel/kprobes/core.c
> >> +++ b/arch/x86/kernel/kprobes/core.c
> >> @@ -169,22 +169,31 @@ int can_boost(struct insn *insn, void *addr)
> >>  	case 0x62:		/* bound */
> >>  	case 0x70 ... 0x7f:	/* Conditional jumps */
> >>  	case 0x9a:		/* Call far */
> >> -	case 0xc0 ... 0xc1:	/* Grp2 */
> >>  	case 0xcc ... 0xce:	/* software exceptions */
> >> -	case 0xd0 ... 0xd3:	/* Grp2 */
> >>  	case 0xd6:		/* (UD) */
> >>  	case 0xd8 ... 0xdf:	/* ESC */
> >>  	case 0xe0 ... 0xe3:	/* LOOP*, JCXZ */
> >>  	case 0xe8 ... 0xe9:	/* near Call, JMP */
> >>  	case 0xeb:		/* Short JMP */
> >>  	case 0xf0 ... 0xf4:	/* LOCK/REP, HLT */
> >> -	case 0xf6 ... 0xf7:	/* Grp3 */
> >> -	case 0xfe:		/* Grp4 */
> >>  		/* ... are not boostable */
> >>  		return 0;
> >> +	case 0xc0 ... 0xc1:	/* Grp2 */
> >> +	case 0xd0 ... 0xd3:	/* Grp2 */
> >> +		/* ModR/M nnn == 110 is reserved */
> >> +		return X86_MODRM_REG(insn->modrm.bytes[0]) != 6;
> >> +	case 0xf6 ... 0xf7:	/* Grp3 */
> >> +		/* ModR/M nnn == 001 is reserved */
> > 
> > 		/* AMD uses nnn == 001 as TEST, but Intel makes it reserved. */
> > 
> 
> I will incorporate this into the v2. Since nnn == 001 is still considered
> reserved by Intel, we still need to prevent it from being boosted, don't
> we?
> 
> --Jinghao
> 
> >> +		return X86_MODRM_REG(insn->modrm.bytes[0]) != 1;
> >> +	case 0xfe:		/* Grp4 */
> >> +		/* Only inc and dec are boostable */
> >> +		return X86_MODRM_REG(insn->modrm.bytes[0]) == 0 ||
> >> +		       X86_MODRM_REG(insn->modrm.bytes[0]) == 1;
> >>  	case 0xff:		/* Grp5 */
> >> -		/* Only indirect jmp is boostable */
> >> -		return X86_MODRM_REG(insn->modrm.bytes[0]) == 4;
> >> +		/* Only inc, dec, and indirect jmp are boostable */
> >> +		return X86_MODRM_REG(insn->modrm.bytes[0]) == 0 ||
> >> +		       X86_MODRM_REG(insn->modrm.bytes[0]) == 1 ||
> >> +		       X86_MODRM_REG(insn->modrm.bytes[0]) == 4;
> >>  	default:
> >>  		return 1;
> >>  	}
> >> -- 
> >> 2.43.0
> >>
> > 
> > Thank you,
> > 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [RFC PATCH 1/2] x86/kprobes: Prohibit kprobing on INT and UD
  2024-01-30  1:44       ` Masami Hiramatsu
@ 2024-01-30  2:50         ` Jinghao Jia
  2024-01-30 11:30           ` Masami Hiramatsu
  0 siblings, 1 reply; 13+ messages in thread
From: Jinghao Jia @ 2024-01-30  2:50 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Peter Zijlstra, linux-trace-kernel, linux-kernel



On 1/29/24 19:44, Masami Hiramatsu (Google) wrote:
> On Sun, 28 Jan 2024 15:25:59 -0600
> Jinghao Jia <jinghao7@illinois.edu> wrote:
> 
>>>>  /* Check if paddr is at an instruction boundary */
>>>>  static int can_probe(unsigned long paddr)
>>>>  {
>>>> @@ -294,6 +310,16 @@ static int can_probe(unsigned long paddr)
>>>>  #endif
>>>>  		addr += insn.length;
>>>>  	}
>>>> +	__addr = recover_probed_instruction(buf, addr);
>>>> +	if (!__addr)
>>>> +		return 0;
>>>> +
>>>> +	if (insn_decode_kernel(&insn, (void *)__addr) < 0)
>>>> +		return 0;
>>>> +
>>>> +	if (is_exception_insn(&insn))
>>>> +		return 0;
>>>> +
>>>
>>> Please don't put this outside of the decoding loop. You should put these in
>>> the loop which decodes the instruction from the beginning of the function.
>>> Since x86 instructions are variable length, can_probe() needs to check
>>> that the address is on an instruction boundary and decodable.
>>>
>>> Thank you,
>>
>> If my understanding is correct then this is trying to decode the kprobe
>> target instruction, given that it is after the main decoding loop.  Here I
>> hoisted the decoding logic out of the if(IS_ENABLED(CONFIG_CFI_CLANG))
>> block so that we do not need to decode the same instruction twice.  I left
>> the main decoding loop unchanged so it is still decoding the function from
>> the start and should handle instruction boundaries. Are there any caveats
>> that I missed?
> 
> Ah, sorry I misread the patch. You're correct!
> This is a good place to do that.
> 
> But hmm, I think we should add another patch to check the addr == paddr
> soon after the loop so that we will avoid decoding.
> 
> Thank you,
> 

Yes, that makes sense to me. At the same time, I'm also thinking about
changing the return type of can_probe() to bool, since we are just using
int as bool in this context.

--Jinghao

>>
>> --Jinghao
>>
>>>
>>>>  	if (IS_ENABLED(CONFIG_CFI_CLANG)) {
>>>>  		/*
>>>>  		 * The compiler generates the following instruction sequence
>>>> @@ -308,13 +334,6 @@ static int can_probe(unsigned long paddr)
>>>>  		 * Also, these movl and addl are used for showing expected
>>>>  		 * type. So those must not be touched.
>>>>  		 */
>>>> -		__addr = recover_probed_instruction(buf, addr);
>>>> -		if (!__addr)
>>>> -			return 0;
>>>> -
>>>> -		if (insn_decode_kernel(&insn, (void *)__addr) < 0)
>>>> -			return 0;
>>>> -
>>>>  		if (insn.opcode.value == 0xBA)
>>>>  			offset = 12;
>>>>  		else if (insn.opcode.value == 0x3)
>>>> -- 
>>>> 2.43.0
>>>>
>>>
>>>
> 
> 



* Re: [RFC PATCH 1/2] x86/kprobes: Prohibit kprobing on INT and UD
  2024-01-30  2:50         ` Jinghao Jia
@ 2024-01-30 11:30           ` Masami Hiramatsu
  0 siblings, 0 replies; 13+ messages in thread
From: Masami Hiramatsu @ 2024-01-30 11:30 UTC (permalink / raw)
  To: Jinghao Jia
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Peter Zijlstra, linux-trace-kernel, linux-kernel

On Mon, 29 Jan 2024 20:50:39 -0600
Jinghao Jia <jinghao7@illinois.edu> wrote:

> On 1/29/24 19:44, Masami Hiramatsu (Google) wrote:
> > On Sun, 28 Jan 2024 15:25:59 -0600
> > Jinghao Jia <jinghao7@illinois.edu> wrote:
> > 
> >>>>  /* Check if paddr is at an instruction boundary */
> >>>>  static int can_probe(unsigned long paddr)
> >>>>  {
> >>>> @@ -294,6 +310,16 @@ static int can_probe(unsigned long paddr)
> >>>>  #endif
> >>>>  		addr += insn.length;
> >>>>  	}
> >>>> +	__addr = recover_probed_instruction(buf, addr);
> >>>> +	if (!__addr)
> >>>> +		return 0;
> >>>> +
> >>>> +	if (insn_decode_kernel(&insn, (void *)__addr) < 0)
> >>>> +		return 0;
> >>>> +
> >>>> +	if (is_exception_insn(&insn))
> >>>> +		return 0;
> >>>> +
> >>>
> >>> Please don't put this outside of the decoding loop. You should put these in
> >>> the loop which decodes the instruction from the beginning of the function.
> >>> Since x86 instructions are variable length, can_probe() needs to check
> >>> that the address is on an instruction boundary and decodable.
> >>>
> >>> Thank you,
> >>
> >> If my understanding is correct then this is trying to decode the kprobe
> >> target instruction, given that it is after the main decoding loop.  Here I
> >> hoisted the decoding logic out of the if(IS_ENABLED(CONFIG_CFI_CLANG))
> >> block so that we do not need to decode the same instruction twice.  I left
> >> the main decoding loop unchanged so it is still decoding the function from
> >> the start and should handle instruction boundaries. Are there any caveats
> >> that I missed?
> > 
> > Ah, sorry I misread the patch. You're correct!
> > This is a good place to do that.
> > 
> > But hmm, I think we should add another patch to check the addr == paddr
> > soon after the loop so that we will avoid decoding.
> > 
> > Thank you,
> > 
> 
> Yes, that makes sense to me. At the same time, I'm also thinking about
> changing the return type of can_probe() to bool, since we are just using
> int as bool in this context.

Yes, that is also a good change :)

Thank you,

> 
> --Jinghao
> 
> >>
> >> --Jinghao
> >>
> >>>
> >>>>  	if (IS_ENABLED(CONFIG_CFI_CLANG)) {
> >>>>  		/*
> >>>>  		 * The compiler generates the following instruction sequence
> >>>> @@ -308,13 +334,6 @@ static int can_probe(unsigned long paddr)
> >>>>  		 * Also, these movl and addl are used for showing expected
> >>>>  		 * type. So those must not be touched.
> >>>>  		 */
> >>>> -		__addr = recover_probed_instruction(buf, addr);
> >>>> -		if (!__addr)
> >>>> -			return 0;
> >>>> -
> >>>> -		if (insn_decode_kernel(&insn, (void *)__addr) < 0)
> >>>> -			return 0;
> >>>> -
> >>>>  		if (insn.opcode.value == 0xBA)
> >>>>  			offset = 12;
> >>>>  		else if (insn.opcode.value == 0x3)
> >>>> -- 
> >>>> 2.43.0
> >>>>
> >>>
> >>>
> > 
> > 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


end of thread, other threads:[~2024-01-30 11:30 UTC | newest]

Thread overview: 13+ messages
2024-01-27  4:41 [RFC PATCH 0/2] x86/kprobes: add exception opcode detector and boost more opcodes Jinghao Jia
2024-01-27  4:41 ` [RFC PATCH 1/2] x86/kprobes: Prohibit kprobing on INT and UD Jinghao Jia
2024-01-27 19:47   ` Xin Li
2024-01-28 21:09     ` Jinghao Jia
2024-01-28  1:19   ` Masami Hiramatsu
2024-01-28 21:25     ` Jinghao Jia
2024-01-30  1:44       ` Masami Hiramatsu
2024-01-30  2:50         ` Jinghao Jia
2024-01-30 11:30           ` Masami Hiramatsu
2024-01-27  4:41 ` [RFC PATCH 2/2] x86/kprobes: boost more instructions from grp2/3/4/5 Jinghao Jia
2024-01-28  2:22   ` Masami Hiramatsu
2024-01-28 21:30     ` Jinghao Jia
2024-01-30  1:45       ` Masami Hiramatsu
