linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] uprobes: add comment with insn opcodes, mnemonics and why we dont support them
@ 2014-05-05 18:24 Denys Vlasenko
  2014-05-05 18:24 ` [PATCH 2/2] uprobes: fix 1-byte opcode tables Denys Vlasenko
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Denys Vlasenko @ 2014-05-05 18:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Denys Vlasenko, Jim Keniston, Masami Hiramatsu,
	Srikar Dronamraju, Ingo Molnar, Oleg Nesterov

After adding these, it's clear we have some awkward choices there.
Some valid instructions are prohibited from uprobing while
several invalid ones are allowed.

Hopefully future edits to the good-opcode tables will fix wrong bits
or explain why those bits are not wrong.

No actual code changes.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Jim Keniston <jkenisto@us.ibm.com>
CC: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
CC: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
CC: Ingo Molnar <mingo@kernel.org>
CC: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/kernel/uprobes.c | 153 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 134 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index dbbf6cd..4c66a7c 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -63,6 +63,49 @@
  * Good-instruction tables for 32-bit apps.  This is non-const and volatile
  * to keep gcc from statically optimizing it out, as variable_test_bit makes
  * some versions of gcc to think only *(unsigned long*) is used.
+ *
+ * Prefixes. Most marked as "bad", but it doesn't matter, since insn decoder
+ * won't report *prefixes* as OPCODE1(insn).
+ * 0f - 2-byte opcode prefix
+ * 26,2e,36,3e - es:/cs:/ss:/ds:
+ * 64 - fs: (marked as "good", why?)
+ * 65 - gs: (marked as "good", why?)
+ * 66 - operand-size prefix
+ * 67 - address-size prefix
+ * f0 - lock prefix
+ * f2 - repnz    (marked as "good", why?)
+ * f3 - rep/repz (marked as "good", why?)
+ *
+ * Opcodes we'll probably never support:
+ * 6c-6f - ins,outs. SEGVs if used in userspace
+ * e4-e7 - in,out imm. SEGVs if used in userspace
+ * ec-ef - in,out acc. SEGVs if used in userspace
+ * cc - int3. SIGTRAP if used in userspace
+ * ce - into. Not used in userspace - no kernel support to make it useful. SEGVs
+ *	(why we support bound (62) then? it's similar, and similarly unused...)
+ * f1 - int1. SIGTRAP if used in userspace
+ * f4 - hlt. SEGVs if used in userspace
+ * fa - cli. SEGVs if used in userspace
+ * fb - sti. SEGVs if used in userspace
+ *
+ * Opcodes which need some work to be supported:
+ * 07,17,1f - pop es/ss/ds
+ *	Normally not used in userspace, but would execute if used.
+ *	Can cause GP or stack exception if tries to load wrong segment descriptor.
+ *	We hesitate to run them under single step since kernel's handling
+ *	of userspace single-stepping (TF flag) is fragile.
+ *	We can easily refuse to support push es/cs/ss/ds (06/0e/16/1e)
+ *	on the same grounds that they are never used.
+ * cd - int N.
+ *	Used by userspace for "int 80" syscall entry. (Other "int N"
+ *	cause GP -> SEGV since their IDT gates don't allow calls from CPL 3).
+ *	Not supported since kernel's handling of userspace single-stepping
+ *	(TF flag) is fragile.
+ * cf - iret. Normally not used in userspace. Doesn't SEGV unless arguments are bad
+ *
+ * Opcodes which can be enabled right away:
+ * 63 - arpl. This insn has no unusual exceptions (it's basically an arith op).
+ * d6 - salc. Undocumented "sign-extend carry flag to AL" insn
  */
 #if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
 static volatile u32 good_insns_32[256 / 32] = {
@@ -91,7 +134,55 @@ static volatile u32 good_insns_32[256 / 32] = {
 #define good_insns_32	NULL
 #endif
 
-/* Good-instruction tables for 64-bit apps */
+/* Good-instruction tables for 64-bit apps.
+ *
+ * Prefixes. Most marked as "bad", but it doesn't matter, since insn decoder
+ * won't report *prefixes* as OPCODE1(insn).
+ * 0f - 2-byte opcode prefix
+ * 26,2e,36,3e - es:/cs:/ss:/ds:
+ * 40-4f - rex prefixes
+ * 64 - fs: (marked as "good", why?)
+ * 65 - gs: (marked as "good", why?)
+ * 66 - operand-size prefix
+ * 67 - address-size prefix
+ * f0 - lock prefix
+ * f2 - repnz    (marked as "good", why?)
+ * f3 - rep/repz (marked as "good", why?)
+ *
+ * Genuinely invalid opcodes:
+ * 06,07 - formerly push/pop es
+ * 0e - formerly push cs
+ * 16,17 - formerly push/pop ss
+ * 1e,1f - formerly push/pop ds
+ * 27,2f,37,3f - formerly daa/das/aaa/aas
+ * 60,61 - formerly pusha/popa
+ * 62 - formerly bound. EVEX prefix for AVX512
+ * 82 - formerly redundant encoding of Group1
+ * 9a - formerly call seg:ofs (marked as "supported"???)
+ * c4,c5 - formerly les/lds. VEX prefixes for AVX
+ * ce - formerly into
+ * d4,d5 - formerly aam/aad
+ * d6 - formerly undocumented salc
+ * ea - formerly jmp seg:ofs (marked as "supported"???)
+ *
+ * Opcodes we'll probably never support:
+ * 6c-6f - ins,outs. SEGVs if used in userspace
+ * e4-e7 - in,out imm. SEGVs if used in userspace
+ * ec-ef - in,out acc. SEGVs if used in userspace
+ * cc - int3. SIGTRAP if used in userspace
+ * f1 - int1. SIGTRAP if used in userspace
+ * f4 - hlt. SEGVs if used in userspace
+ * fa - cli. SEGVs if used in userspace
+ * fb - sti. SEGVs if used in userspace
+ *
+ * Opcodes which need some work to be supported:
+ * cd - int N.
+ *	Used by userspace for "int 80" syscall entry. (Other "int N"
+ *	cause GP -> SEGV since their IDT gates don't allow calls from CPL 3).
+ *	Not supported since kernel's handling of userspace single-stepping
+ *	(TF flag) is fragile.
+ * cf - iret. Normally not used in userspace. Doesn't SEGV unless arguments are bad
+ */
 #if defined(CONFIG_X86_64)
 static volatile u32 good_insns_64[256 / 32] = {
 	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f         */
@@ -119,7 +210,48 @@ static volatile u32 good_insns_64[256 / 32] = {
 #define good_insns_64	NULL
 #endif
 
-/* Using this for both 64-bit and 32-bit apps */
+/* Using this for both 64-bit and 32-bit apps.
+ * Opcodes we don't support:
+ * 0f 00 - SLDT/STR/LLDT/LTR/VERR/VERW/-/- group. System insns
+ * 0f 01 - SGDT/SIDT/LGDT/LIDT/SMSW/-/LMSW/INVLPG group.
+ *	Also encodes tons of other system insns if mod=11.
+ *	Some are in fact non-system: xend, xtest, rdtscp, maybe more
+ * 0f 02 - lar (why? should be safe, it throws no exceptipons)
+ * 0f 03 - lsl (why? should be safe, it throws no exceptipons)
+ * 0f 04 - undefined
+ * 0f 05 - syscall
+ * 0f 06 - clts (CPL0 insn)
+ * 0f 07 - sysret
+ * 0f 08 - invd (CPL0 insn)
+ * 0f 09 - wbinvd (CPL0 insn)
+ * 0f 0a - undefined
+ * 0f 0b - ud1
+ * 0f 0c - undefined
+ * 0f 0d - prefetchFOO (amd prefetch insns)
+ * 0f 18 - prefetchBAR (intel prefetch insns)
+ * 0f 24 - mov from test regs (perhaps entire 20-27 area can be disabled (special reg ops))
+ * 0f 25 - undefined
+ * 0f 26 - mov to test regs
+ * 0f 27 - undefined
+ * 0f 30 - wrmsr (CPL0 insn)
+ * 0f 34 - sysenter
+ * 0f 35 - sysexit
+ * 0f 36 - undefined
+ * 0f 37 - getsec
+ * 0f 38-3f - 3-byte opcodes (why?? all look safe)
+ * 0f 78 - vmread
+ * 0f 79 - vmwrite
+ * 0f 7a - undefined
+ * 0f 7b - undefined
+ * 0f 7c - undefined
+ * 0f 7d - undefined
+ * 0f a6 - undefined
+ * 0f a7 - undefined
+ * 0f b8 - popcnt (why?? it's an ordinary ALU op)
+ * 0f d0 - undefined
+ * 0f f0 - lddqu (why?? it's an ordinary vector load op)
+ * 0f ff - undefined
+ */
 static volatile u32 good_2byte_insns[256 / 32] = {
 	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f         */
 	/*      ----------------------------------------------         */
@@ -145,23 +277,6 @@ static volatile u32 good_2byte_insns[256 / 32] = {
 #undef W
 
 /*
- * opcodes we'll probably never support:
- *
- *  6c-6d, e4-e5, ec-ed - in
- *  6e-6f, e6-e7, ee-ef - out
- *  cc, cd - int3, int
- *  cf - iret
- *  d6 - illegal instruction
- *  f1 - int1/icebp
- *  f4 - hlt
- *  fa, fb - cli, sti
- *  0f - lar, lsl, syscall, clts, sysret, sysenter, sysexit, invd, wbinvd, ud2
- *
- * invalid opcodes in 64-bit mode:
- *
- *  06, 0e, 16, 1e, 27, 2f, 37, 3f, 60-62, 82, c4-c5, d4-d5
- *  63 - we support this opcode in x86_64 but not in i386.
- *
  * opcodes we may need to refine support for:
  *
  *  0f - 2-byte instructions: For many of these instructions, the validity
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] uprobes: fix 1-byte opcode tables
  2014-05-05 18:24 [PATCH 1/2] uprobes: add comment with insn opcodes, mnemonics and why we dont support them Denys Vlasenko
@ 2014-05-05 18:24 ` Denys Vlasenko
  2014-05-05 19:41 ` [PATCH 1/2] uprobes: add comment with insn opcodes, mnemonics and why we dont support them Oleg Nesterov
  2014-05-05 22:32 ` Jim Keniston
  2 siblings, 0 replies; 7+ messages in thread
From: Denys Vlasenko @ 2014-05-05 18:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Denys Vlasenko, Jim Keniston, Masami Hiramatsu,
	Srikar Dronamraju, Ingo Molnar, Oleg Nesterov

This change fixes 1-byte opcode tables so that only insns
for which we have real reasons to disallow probing are marked
with unset bits.

To that end:

Set bits for all prefix bytes. Their setting is ignored anyway -
we check the bitmap against OPCODE1(insn), not against first byte.
Keeping them set to 0 only confuses code reader with
"why we don't support that opcode" question.

This includes bytes c4,c5 in 64-bit mode (VEX prefixes).
Byte 62 (EVEX prefix) is not yet enabled since insn decoder
does not support that yet.

For 32-bit mode, enable probing of opcodes 63 and d6.
They don't require any special handling.

For 64-bit mode, disable 9a and ea - these undefined opcodes
were mistakenly left enabled.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Jim Keniston <jkenisto@us.ibm.com>
CC: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
CC: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
CC: Ingo Molnar <mingo@kernel.org>
CC: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/kernel/uprobes.c | 66 +++++++++++++----------------------------------
 1 file changed, 18 insertions(+), 48 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 4c66a7c..9a824fb 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -64,18 +64,6 @@
  * to keep gcc from statically optimizing it out, as variable_test_bit makes
  * some versions of gcc to think only *(unsigned long*) is used.
  *
- * Prefixes. Most marked as "bad", but it doesn't matter, since insn decoder
- * won't report *prefixes* as OPCODE1(insn).
- * 0f - 2-byte opcode prefix
- * 26,2e,36,3e - es:/cs:/ss:/ds:
- * 64 - fs: (marked as "good", why?)
- * 65 - gs: (marked as "good", why?)
- * 66 - operand-size prefix
- * 67 - address-size prefix
- * f0 - lock prefix
- * f2 - repnz    (marked as "good", why?)
- * f3 - rep/repz (marked as "good", why?)
- *
  * Opcodes we'll probably never support:
  * 6c-6f - ins,outs. SEGVs if used in userspace
  * e4-e7 - in,out imm. SEGVs if used in userspace
@@ -102,31 +90,27 @@
  *	Not supported since kernel's handling of userspace single-stepping
  *	(TF flag) is fragile.
  * cf - iret. Normally not used in userspace. Doesn't SEGV unless arguments are bad
- *
- * Opcodes which can be enabled right away:
- * 63 - arpl. This insn has no unusual exceptions (it's basically an arith op).
- * d6 - salc. Undocumented "sign-extend carry flag to AL" insn
  */
 #if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
 static volatile u32 good_insns_32[256 / 32] = {
 	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f         */
 	/*      ----------------------------------------------         */
-	W(0x00, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0) | /* 00 */
+	W(0x00, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1) | /* 00 */
 	W(0x10, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0) , /* 10 */
-	W(0x20, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1) | /* 20 */
-	W(0x30, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1) , /* 30 */
+	W(0x20, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 20 */
+	W(0x30, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 30 */
 	W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 40 */
 	W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 50 */
-	W(0x60, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 60 */
+	W(0x60, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0) | /* 60 */
 	W(0x70, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 70 */
 	W(0x80, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */
 	W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 90 */
 	W(0xa0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* a0 */
 	W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* b0 */
 	W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0) | /* c0 */
-	W(0xd0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */
+	W(0xd0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */
 	W(0xe0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* e0 */
-	W(0xf0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1)   /* f0 */
+	W(0xf0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1)   /* f0 */
 	/*      ----------------------------------------------         */
 	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f         */
 };
@@ -136,19 +120,6 @@ static volatile u32 good_insns_32[256 / 32] = {
 
 /* Good-instruction tables for 64-bit apps.
  *
- * Prefixes. Most marked as "bad", but it doesn't matter, since insn decoder
- * won't report *prefixes* as OPCODE1(insn).
- * 0f - 2-byte opcode prefix
- * 26,2e,36,3e - es:/cs:/ss:/ds:
- * 40-4f - rex prefixes
- * 64 - fs: (marked as "good", why?)
- * 65 - gs: (marked as "good", why?)
- * 66 - operand-size prefix
- * 67 - address-size prefix
- * f0 - lock prefix
- * f2 - repnz    (marked as "good", why?)
- * f3 - rep/repz (marked as "good", why?)
- *
  * Genuinely invalid opcodes:
  * 06,07 - formerly push/pop es
  * 0e - formerly push cs
@@ -156,14 +127,13 @@ static volatile u32 good_insns_32[256 / 32] = {
  * 1e,1f - formerly push/pop ds
  * 27,2f,37,3f - formerly daa/das/aaa/aas
  * 60,61 - formerly pusha/popa
- * 62 - formerly bound. EVEX prefix for AVX512
+ * 62 - formerly bound. EVEX prefix for AVX512 (not yet supported)
  * 82 - formerly redundant encoding of Group1
- * 9a - formerly call seg:ofs (marked as "supported"???)
- * c4,c5 - formerly les/lds. VEX prefixes for AVX
+ * 9a - formerly call seg:ofs
  * ce - formerly into
  * d4,d5 - formerly aam/aad
  * d6 - formerly undocumented salc
- * ea - formerly jmp seg:ofs (marked as "supported"???)
+ * ea - formerly jmp seg:ofs
  *
  * Opcodes we'll probably never support:
  * 6c-6f - ins,outs. SEGVs if used in userspace
@@ -187,22 +157,22 @@ static volatile u32 good_insns_32[256 / 32] = {
 static volatile u32 good_insns_64[256 / 32] = {
 	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f         */
 	/*      ----------------------------------------------         */
-	W(0x00, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0) | /* 00 */
+	W(0x00, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1) | /* 00 */
 	W(0x10, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0) , /* 10 */
-	W(0x20, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0) | /* 20 */
-	W(0x30, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0) , /* 30 */
-	W(0x40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 40 */
+	W(0x20, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0) | /* 20 */
+	W(0x30, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0) , /* 30 */
+	W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 40 */
 	W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 50 */
-	W(0x60, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 60 */
+	W(0x60, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0) | /* 60 */
 	W(0x70, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 70 */
 	W(0x80, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */
-	W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 90 */
+	W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1) , /* 90 */
 	W(0xa0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* a0 */
 	W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* b0 */
-	W(0xc0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0) | /* c0 */
+	W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0) | /* c0 */
 	W(0xd0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */
-	W(0xe0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* e0 */
-	W(0xf0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1)   /* f0 */
+	W(0xe0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0) | /* e0 */
+	W(0xf0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1)   /* f0 */
 	/*      ----------------------------------------------         */
 	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f         */
 };
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] uprobes: add comment with insn opcodes, mnemonics and why we dont support them
  2014-05-05 18:24 [PATCH 1/2] uprobes: add comment with insn opcodes, mnemonics and why we dont support them Denys Vlasenko
  2014-05-05 18:24 ` [PATCH 2/2] uprobes: fix 1-byte opcode tables Denys Vlasenko
@ 2014-05-05 19:41 ` Oleg Nesterov
  2014-05-06 17:01   ` Denys Vlasenko
  2014-05-05 22:32 ` Jim Keniston
  2 siblings, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2014-05-05 19:41 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: linux-kernel, Jim Keniston, Masami Hiramatsu, Srikar Dronamraju,
	Ingo Molnar

On 05/05, Denys Vlasenko wrote:
>
> + * Opcodes we'll probably never support:
> + * 6c-6f - ins,outs. SEGVs if used in userspace
> + * e4-e7 - in,out imm. SEGVs if used in userspace
> + * ec-ef - in,out acc. SEGVs if used in userspace

Well. I have no idea why they are nacked, but this is not the reason.

SEGVs are fine. Plus we have ioperm().

> + * cc - int3. SIGTRAP if used in userspace

SIGTRAP is not the reason. We do not want to enter do_int3() in UTASK_SSTEP
state with utask->active_uprobe != NULL etc. Too many potential complications.

Oleg.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] uprobes: add comment with insn opcodes, mnemonics and why we dont support them
  2014-05-05 18:24 [PATCH 1/2] uprobes: add comment with insn opcodes, mnemonics and why we dont support them Denys Vlasenko
  2014-05-05 18:24 ` [PATCH 2/2] uprobes: fix 1-byte opcode tables Denys Vlasenko
  2014-05-05 19:41 ` [PATCH 1/2] uprobes: add comment with insn opcodes, mnemonics and why we dont support them Oleg Nesterov
@ 2014-05-05 22:32 ` Jim Keniston
  2014-05-06 16:11   ` Denys Vlasenko
  2 siblings, 1 reply; 7+ messages in thread
From: Jim Keniston @ 2014-05-05 22:32 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: linux-kernel, Masami Hiramatsu, Srikar Dronamraju, Ingo Molnar,
	Oleg Nesterov

On Mon, 2014-05-05 at 20:24 +0200, Denys Vlasenko wrote:
> After adding these, it's clear we have some awkward choices there.
> Some valid instructions are prohibited from uprobing while
> several invalid ones are allowed.
> 
> Hopefully future edits to the good-opcode tables will fix wrong bits
> or explain why those bits are not wrong.
> 
> No actual code changes.
> 
> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
> CC: Jim Keniston <jkenisto@us.ibm.com>
> CC: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> CC: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> CC: Ingo Molnar <mingo@kernel.org>
> CC: Oleg Nesterov <oleg@redhat.com>

All of the following is FYI.

The good-instruction tables date back 2006-2007.  Back then, the
philosophy was to disallow any questionable opcodes, and add them back
into the "good" tables only when a need was demonstrated (i.e., somebody
needed to probe that particular instruction) and we could verify that
probing that instruction worked.

This was before the instruction decoder, so we tended to allow/disallow
prefixes as if they were opcodes.  The fs, gs, and (I think) repz and
repnz prefixes were initially marked "bad," but were later changed and
verified as good when the need arose to probe instructions with those
prefixes.

And lacking the instruction decoder, a two-byte opcode was allowed if
any form of it was legitimate.

So (1) I acknowledge that the good/bad-opcode decisions could be much
more precise, and welcome any improvements in that regard; and (2) once
again I think somebody should erect a statue of Masami for the
painstaking work he did on the instruction decoder.  Maybe add Masami
AND Denys to Mt. Rushmore. :-)

Jim

> ---
>  arch/x86/kernel/uprobes.c | 153 ++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 134 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index dbbf6cd..4c66a7c 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -63,6 +63,49 @@
>   * Good-instruction tables for 32-bit apps.  This is non-const and volatile
>   * to keep gcc from statically optimizing it out, as variable_test_bit makes
>   * some versions of gcc to think only *(unsigned long*) is used.
> + *
> + * Prefixes. Most marked as "bad", but it doesn't matter, since insn decoder
> + * won't report *prefixes* as OPCODE1(insn).
> + * 0f - 2-byte opcode prefix
> + * 26,2e,36,3e - es:/cs:/ss:/ds:
> + * 64 - fs: (marked as "good", why?)
> + * 65 - gs: (marked as "good", why?)
> + * 66 - operand-size prefix
> + * 67 - address-size prefix
> + * f0 - lock prefix
> + * f2 - repnz    (marked as "good", why?)
> + * f3 - rep/repz (marked as "good", why?)
> + *
> + * Opcodes we'll probably never support:
> + * 6c-6f - ins,outs. SEGVs if used in userspace
> + * e4-e7 - in,out imm. SEGVs if used in userspace
> + * ec-ef - in,out acc. SEGVs if used in userspace
> + * cc - int3. SIGTRAP if used in userspace
> + * ce - into. Not used in userspace - no kernel support to make it useful. SEGVs
> + *	(why we support bound (62) then? it's similar, and similarly unused...)
> + * f1 - int1. SIGTRAP if used in userspace
> + * f4 - hlt. SEGVs if used in userspace
> + * fa - cli. SEGVs if used in userspace
> + * fb - sti. SEGVs if used in userspace
> + *
> + * Opcodes which need some work to be supported:
> + * 07,17,1f - pop es/ss/ds
> + *	Normally not used in userspace, but would execute if used.
> + *	Can cause GP or stack exception if tries to load wrong segment descriptor.
> + *	We hesitate to run them under single step since kernel's handling
> + *	of userspace single-stepping (TF flag) is fragile.
> + *	We can easily refuse to support push es/cs/ss/ds (06/0e/16/1e)
> + *	on the same grounds that they are never used.
> + * cd - int N.
> + *	Used by userspace for "int 80" syscall entry. (Other "int N"
> + *	cause GP -> SEGV since their IDT gates don't allow calls from CPL 3).
> + *	Not supported since kernel's handling of userspace single-stepping
> + *	(TF flag) is fragile.
> + * cf - iret. Normally not used in userspace. Doesn't SEGV unless arguments are bad
> + *
> + * Opcodes which can be enabled right away:
> + * 63 - arpl. This insn has no unusual exceptions (it's basically an arith op).
> + * d6 - salc. Undocumented "sign-extend carry flag to AL" insn
>   */
>  #if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
>  static volatile u32 good_insns_32[256 / 32] = {
> @@ -91,7 +134,55 @@ static volatile u32 good_insns_32[256 / 32] = {
>  #define good_insns_32	NULL
>  #endif
> 
> -/* Good-instruction tables for 64-bit apps */
> +/* Good-instruction tables for 64-bit apps.
> + *
> + * Prefixes. Most marked as "bad", but it doesn't matter, since insn decoder
> + * won't report *prefixes* as OPCODE1(insn).
> + * 0f - 2-byte opcode prefix
> + * 26,2e,36,3e - es:/cs:/ss:/ds:
> + * 40-4f - rex prefixes
> + * 64 - fs: (marked as "good", why?)
> + * 65 - gs: (marked as "good", why?)
> + * 66 - operand-size prefix
> + * 67 - address-size prefix
> + * f0 - lock prefix
> + * f2 - repnz    (marked as "good", why?)
> + * f3 - rep/repz (marked as "good", why?)
> + *
> + * Genuinely invalid opcodes:
> + * 06,07 - formerly push/pop es
> + * 0e - formerly push cs
> + * 16,17 - formerly push/pop ss
> + * 1e,1f - formerly push/pop ds
> + * 27,2f,37,3f - formerly daa/das/aaa/aas
> + * 60,61 - formerly pusha/popa
> + * 62 - formerly bound. EVEX prefix for AVX512
> + * 82 - formerly redundant encoding of Group1
> + * 9a - formerly call seg:ofs (marked as "supported"???)
> + * c4,c5 - formerly les/lds. VEX prefixes for AVX
> + * ce - formerly into
> + * d4,d5 - formerly aam/aad
> + * d6 - formerly undocumented salc
> + * ea - formerly jmp seg:ofs (marked as "supported"???)
> + *
> + * Opcodes we'll probably never support:
> + * 6c-6f - ins,outs. SEGVs if used in userspace
> + * e4-e7 - in,out imm. SEGVs if used in userspace
> + * ec-ef - in,out acc. SEGVs if used in userspace
> + * cc - int3. SIGTRAP if used in userspace
> + * f1 - int1. SIGTRAP if used in userspace
> + * f4 - hlt. SEGVs if used in userspace
> + * fa - cli. SEGVs if used in userspace
> + * fb - sti. SEGVs if used in userspace
> + *
> + * Opcodes which need some work to be supported:
> + * cd - int N.
> + *	Used by userspace for "int 80" syscall entry. (Other "int N"
> + *	cause GP -> SEGV since their IDT gates don't allow calls from CPL 3).
> + *	Not supported since kernel's handling of userspace single-stepping
> + *	(TF flag) is fragile.
> + * cf - iret. Normally not used in userspace. Doesn't SEGV unless arguments are bad
> + */
>  #if defined(CONFIG_X86_64)
>  static volatile u32 good_insns_64[256 / 32] = {
>  	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f         */
> @@ -119,7 +210,48 @@ static volatile u32 good_insns_64[256 / 32] = {
>  #define good_insns_64	NULL
>  #endif
> 
> -/* Using this for both 64-bit and 32-bit apps */
> +/* Using this for both 64-bit and 32-bit apps.
> + * Opcodes we don't support:
> + * 0f 00 - SLDT/STR/LLDT/LTR/VERR/VERW/-/- group. System insns
> + * 0f 01 - SGDT/SIDT/LGDT/LIDT/SMSW/-/LMSW/INVLPG group.
> + *	Also encodes tons of other system insns if mod=11.
> + *	Some are in fact non-system: xend, xtest, rdtscp, maybe more
> + * 0f 02 - lar (why? should be safe, it throws no exceptipons)
> + * 0f 03 - lsl (why? should be safe, it throws no exceptipons)
> + * 0f 04 - undefined
> + * 0f 05 - syscall
> + * 0f 06 - clts (CPL0 insn)
> + * 0f 07 - sysret
> + * 0f 08 - invd (CPL0 insn)
> + * 0f 09 - wbinvd (CPL0 insn)
> + * 0f 0a - undefined
> + * 0f 0b - ud1
> + * 0f 0c - undefined
> + * 0f 0d - prefetchFOO (amd prefetch insns)
> + * 0f 18 - prefetchBAR (intel prefetch insns)
> + * 0f 24 - mov from test regs (perhaps entire 20-27 area can be disabled (special reg ops))
> + * 0f 25 - undefined
> + * 0f 26 - mov to test regs
> + * 0f 27 - undefined
> + * 0f 30 - wrmsr (CPL0 insn)
> + * 0f 34 - sysenter
> + * 0f 35 - sysexit
> + * 0f 36 - undefined
> + * 0f 37 - getsec
> + * 0f 38-3f - 3-byte opcodes (why?? all look safe)
> + * 0f 78 - vmread
> + * 0f 79 - vmwrite
> + * 0f 7a - undefined
> + * 0f 7b - undefined
> + * 0f 7c - undefined
> + * 0f 7d - undefined
> + * 0f a6 - undefined
> + * 0f a7 - undefined
> + * 0f b8 - popcnt (why?? it's an ordinary ALU op)
> + * 0f d0 - undefined
> + * 0f f0 - lddqu (why?? it's an ordinary vector load op)
> + * 0f ff - undefined
> + */
>  static volatile u32 good_2byte_insns[256 / 32] = {
>  	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f         */
>  	/*      ----------------------------------------------         */
> @@ -145,23 +277,6 @@ static volatile u32 good_2byte_insns[256 / 32] = {
>  #undef W
> 
>  /*
> - * opcodes we'll probably never support:
> - *
> - *  6c-6d, e4-e5, ec-ed - in
> - *  6e-6f, e6-e7, ee-ef - out
> - *  cc, cd - int3, int
> - *  cf - iret
> - *  d6 - illegal instruction
> - *  f1 - int1/icebp
> - *  f4 - hlt
> - *  fa, fb - cli, sti
> - *  0f - lar, lsl, syscall, clts, sysret, sysenter, sysexit, invd, wbinvd, ud2
> - *
> - * invalid opcodes in 64-bit mode:
> - *
> - *  06, 0e, 16, 1e, 27, 2f, 37, 3f, 60-62, 82, c4-c5, d4-d5
> - *  63 - we support this opcode in x86_64 but not in i386.
> - *
>   * opcodes we may need to refine support for:
>   *
>   *  0f - 2-byte instructions: For many of these instructions, the validity



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] uprobes: add comment with insn opcodes, mnemonics and why we dont support them
  2014-05-05 22:32 ` Jim Keniston
@ 2014-05-06 16:11   ` Denys Vlasenko
  0 siblings, 0 replies; 7+ messages in thread
From: Denys Vlasenko @ 2014-05-06 16:11 UTC (permalink / raw)
  To: Jim Keniston
  Cc: linux-kernel, Masami Hiramatsu, Srikar Dronamraju, Ingo Molnar,
	Oleg Nesterov

On 05/06/2014 12:32 AM, Jim Keniston wrote:
> All of the following is FYI.
> 
> The good-instruction tables date back 2006-2007.  Back then, the
> philosophy was to disallow any questionable opcodes, and add them back
> into the "good" tables only when a need was demonstrated (i.e., somebody
> needed to probe that particular instruction) and we could verify that
> probing that instruction worked.

For the record, I do agree with this.
I think it's not worthwhile to expend additional work
to add support for instructions which won't ever be seen
in any sane program.

Example: iret instruction works in userspace
(with the correct stack layout, it will successfully
pop instruction pointer, statck pointer, flags, and continue
executing whereever that address points to).
But no one in userspace has any need to do that.

And since we need to expend additional work to support it
(we need to account for the fact it sets EFLAGS),
we can simply prohibit its probing.

However, if a particular insn requires no special support
whatsoever, it should be enabled even if it's not usually used
(examples: arpl, lsl). Merely because we want fewer '0' bits
in the tables - less explaining to ourselves "ok, why is that one
disallowed?".

And second, if a particular insn will be used in legitimate programs
(example: the boatload of new vector insns *will be* used by scientific
supercomputers, and will need debugging), we can't just disable probing
of it at the slightest inconvenience.

Example: 0f 78 - vmread (Intel VMX. CPL0 insn).
We now have it disabled. Presumably because it generates GP exception
when used in userspace, and we aren't sure we handle it absolutely
correctly. (The "slightest inconvenience"). So we disable it.

But there is a problem. With 66 and f2 prefixes, it's not vmread anymore.
These are valid, innocuous SSE4A vector ops. We disable them too.
With EVEX prefix on Knight's Landing CPU these are vector conversion ops.
We disable them too. But they eventually will be used.

I think we should try to support cases like this one.
Especially that in this case (we disabled an insn which
*always* throws an exception) it may even already be working
- we need to check that. If we do, many more '0' bits can become
'1' - in fact more than half of them.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] uprobes: add comment with insn opcodes, mnemonics and why we dont support them
  2014-05-05 19:41 ` [PATCH 1/2] uprobes: add comment with insn opcodes, mnemonics and why we dont support them Oleg Nesterov
@ 2014-05-06 17:01   ` Denys Vlasenko
  2014-05-06 19:26     ` Oleg Nesterov
  0 siblings, 1 reply; 7+ messages in thread
From: Denys Vlasenko @ 2014-05-06 17:01 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Jim Keniston, Masami Hiramatsu, Srikar Dronamraju,
	Ingo Molnar

On 05/05/2014 09:41 PM, Oleg Nesterov wrote:
> On 05/05, Denys Vlasenko wrote:
>>
>> + * Opcodes we'll probably never support:
>> + * 6c-6f - ins,outs. SEGVs if used in userspace
>> + * e4-e7 - in,out imm. SEGVs if used in userspace
>> + * ec-ef - in,out acc. SEGVs if used in userspace
> 
> Well. I have no idea why they are nacked, but this is not the reason.
> 
> SEGVs are fine. Plus we have ioperm().

Noted.

Oleg, can you clear for me the following -

If the probed instruction triggers an "illegal insn" or "privileged insn"
CPU exception - are we completely fine?

Or there are some problems? how bad are they?
Slightly wrong signal stack? Wrong EFLAGs on stack?
Wrong address of failing insn?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] uprobes: add comment with insn opcodes, mnemonics and why we dont support them
  2014-05-06 17:01   ` Denys Vlasenko
@ 2014-05-06 19:26     ` Oleg Nesterov
  0 siblings, 0 replies; 7+ messages in thread
From: Oleg Nesterov @ 2014-05-06 19:26 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: linux-kernel, Jim Keniston, Masami Hiramatsu, Srikar Dronamraju,
	Ingo Molnar

Firstly, let me remind that my understanding of low lovel hardware details
is very limited.

On 05/06, Denys Vlasenko wrote:
>
> Oleg, can you clear for me the following -
>
> If the probed instruction triggers an "illegal insn" or "privileged insn"
> CPU exception - are we completely fine?

Yes I think we are fine. I assume that, say, do_debug() won't be called in this
case, and do_invalid_op()->do_trap() should trigger arch_uprobe_xol_was_trapped()
logic.

Well, actually we are not 100% fine because si_addr can be wrong. But this is
not invalid_op-specific, we need to fix this anyway and the fix is simple.

I do not want to discuss this now, but I am going to make another series later
which adds something like uprobe_instruction_pointer(regs). It can (should) be
used by DO_ERROR_INFO() (perhaps by something else, not sure about math_error())
_and_ by show_unhandled_signals users (actually the main reason to me). The only
problem is that this code should be cleanuped first. In fact I was thinking about
this change from the very beginning of the recent fixes, the "wrong" ip reported
by do_general_protection() greatly complicated the investigation of that problem.
But I need to take a rest of uprobes ;)

Oleg.


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-05-06 19:27 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-05 18:24 [PATCH 1/2] uprobes: add comment with insn opcodes, mnemonics and why we dont support them Denys Vlasenko
2014-05-05 18:24 ` [PATCH 2/2] uprobes: fix 1-byte opcode tables Denys Vlasenko
2014-05-05 19:41 ` [PATCH 1/2] uprobes: add comment with insn opcodes, mnemonics and why we dont support them Oleg Nesterov
2014-05-06 17:01   ` Denys Vlasenko
2014-05-06 19:26     ` Oleg Nesterov
2014-05-05 22:32 ` Jim Keniston
2014-05-06 16:11   ` Denys Vlasenko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).