* [RFC PATCH] Drop some asm from copy_user_64.S
@ 2015-05-12 20:57 Borislav Petkov
  2015-05-12 21:13 ` Linus Torvalds
  2015-05-13  6:19 ` Ingo Molnar
  0 siblings, 2 replies; 12+ messages in thread
From: Borislav Petkov @ 2015-05-12 20:57 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Andy Lutomirski, Denys Vlasenko
  Cc: lkml

Hi guys,

this is just an RFC first to sanity-check what I'm trying to do:

I want to get rid of the asm glue in arch/x86/lib/copy_user_64.S which
prepares the copy_user* alternatives calls. And replace it with nice and
clean C.

The other intention is to switch to using copy_user_generic(), which does
CALL <copy_user_function> directly, instead of what we do now: CALL
_copy_*_user, which then JMPs to the proper <copy_user_function> - i.e.,
to save us that JMP.

I'm not 100% sure about the equivalence between the addition-carry and
segment-limit check we're doing in asm in arch/x86/lib/copy_user_64.S
now and the access_ok() I've replaced it with.

I mean, it *looks* like access_ok(), and __chk_range_not_ok() in
particular, does the proper checks - addition carry and segment limit
against user_addr_max() - but I'd like someone much more experienced
than me to double-check that.
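
For reference, this is roughly what that check boils down to (typed from
memory here, so please double-check against the real uaccess.h):

	static inline bool __chk_range_not_ok(unsigned long addr, unsigned long size, unsigned long limit)
	{
		/*
		 * For constant sizes, subtract the size from the limit
		 * instead of adding it to the address, so 'addr' can't
		 * overflow.
		 */
		if (__builtin_constant_p(size))
			return addr > limit - size;

		/* Arbitrary sizes: do the addition and catch the wraparound. */
		addr += size;
		if (addr < size)
			return true;
		return addr > limit;
	}

and access_ok() is then basically
!__chk_range_not_ok(ptr, size, user_addr_max()).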

So, without much further ado, here is the diff. It looks simple enough...

Thanks!

---
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index ace9dec050b1..098f3fd5cc75 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -642,10 +642,11 @@ extern struct movsl_mask {
 # include <asm/uaccess_64.h>
 #endif
 
-unsigned long __must_check _copy_from_user(void *to, const void __user *from,
-					   unsigned n);
-unsigned long __must_check _copy_to_user(void __user *to, const void *from,
-					 unsigned n);
+extern __always_inline __must_check
+int _copy_from_user(void *dst, const void __user *src, unsigned size);
+
+extern __always_inline __must_check
+int _copy_to_user(void __user *dst, const void *src, unsigned size);
 
 #ifdef CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
 # define copy_user_diag __compiletime_error
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index f2f9b39b274a..1aebc658acf9 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -99,6 +99,17 @@ int __copy_from_user(void *dst, const void __user *src, unsigned size)
 }
 
 static __always_inline __must_check
+int _copy_from_user(void *dst, const void __user *src, unsigned size)
+{
+	if (!access_ok(VERIFY_READ, src, size)) {
+		memset(dst, 0, size);
+		return 0;
+	}
+
+	return copy_user_generic(dst, src, size);
+}
+
+static __always_inline __must_check
 int __copy_to_user_nocheck(void __user *dst, const void *src, unsigned size)
 {
 	int ret = 0;
@@ -149,6 +160,15 @@ int __copy_to_user(void __user *dst, const void *src, unsigned size)
 }
 
 static __always_inline __must_check
+int _copy_to_user(void __user *dst, const void *src, unsigned size)
+{
+	if (!access_ok(VERIFY_WRITE, dst, size))
+		return size;
+
+	return copy_user_generic(dst, src, size);
+}
+
+static __always_inline __must_check
 int __copy_in_user(void __user *dst, const void __user *src, unsigned size)
 {
 	int ret = 0;
diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index 06ce685c3a5d..a577bdc0f5bf 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -12,60 +12,9 @@
 #include <asm/asm-offsets.h>
 #include <asm/thread_info.h>
 #include <asm/cpufeature.h>
-#include <asm/alternative-asm.h>
 #include <asm/asm.h>
 #include <asm/smap.h>
 
-/* Standard copy_to_user with segment limit checking */
-ENTRY(_copy_to_user)
-	CFI_STARTPROC
-	GET_THREAD_INFO(%rax)
-	movq %rdi,%rcx
-	addq %rdx,%rcx
-	jc bad_to_user
-	cmpq TI_addr_limit(%rax),%rcx
-	ja bad_to_user
-	ALTERNATIVE_2 "jmp copy_user_generic_unrolled",		\
-		      "jmp copy_user_generic_string",		\
-		      X86_FEATURE_REP_GOOD,			\
-		      "jmp copy_user_enhanced_fast_string",	\
-		      X86_FEATURE_ERMS
-	CFI_ENDPROC
-ENDPROC(_copy_to_user)
-
-/* Standard copy_from_user with segment limit checking */
-ENTRY(_copy_from_user)
-	CFI_STARTPROC
-	GET_THREAD_INFO(%rax)
-	movq %rsi,%rcx
-	addq %rdx,%rcx
-	jc bad_from_user
-	cmpq TI_addr_limit(%rax),%rcx
-	ja bad_from_user
-	ALTERNATIVE_2 "jmp copy_user_generic_unrolled",		\
-		      "jmp copy_user_generic_string",		\
-		      X86_FEATURE_REP_GOOD,			\
-		      "jmp copy_user_enhanced_fast_string",	\
-		      X86_FEATURE_ERMS
-	CFI_ENDPROC
-ENDPROC(_copy_from_user)
-
-	.section .fixup,"ax"
-	/* must zero dest */
-ENTRY(bad_from_user)
-bad_from_user:
-	CFI_STARTPROC
-	movl %edx,%ecx
-	xorl %eax,%eax
-	rep
-	stosb
-bad_to_user:
-	movl %edx,%eax
-	ret
-	CFI_ENDPROC
-ENDPROC(bad_from_user)
-	.previous
-
 /*
  * copy_user_generic_unrolled - memory copy with exception handling.
  * This version is for CPUs like P4 that don't have efficient micro

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [RFC PATCH] Drop some asm from copy_user_64.S
  2015-05-12 20:57 [RFC PATCH] Drop some asm from copy_user_64.S Borislav Petkov
@ 2015-05-12 21:13 ` Linus Torvalds
  2015-05-12 21:53   ` Borislav Petkov
  2015-05-13  6:19 ` Ingo Molnar
  1 sibling, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2015-05-12 21:13 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, Andy Lutomirski,
	Denys Vlasenko, lkml

On Tue, May 12, 2015 at 1:57 PM, Borislav Petkov <bp@alien8.de> wrote:
>
> I want to get rid of the asm glue in arch/x86/lib/copy_user_64.S which
> prepares the copy_user* alternatives calls. And replace it with nice and
> clean C.

Ack. I'm not a fan of the x86-64 usercopy functions.

That said, I think you should uninline those things, and move them
from a header file to a C file (arch/x86/lib/uaccess.c?).

Move all the copy_user_generic_unrolled/string garbage there too, and
keep the header file simple.

Because I think that we would actually be better off trying to inline
the copy_user_generic_string() thing into the various versions (in
that uaccess.c file) than try to inline the access_ok() check into the
caller.

                           Linus


* Re: [RFC PATCH] Drop some asm from copy_user_64.S
  2015-05-12 21:13 ` Linus Torvalds
@ 2015-05-12 21:53   ` Borislav Petkov
  2015-05-13  9:52     ` Borislav Petkov
  0 siblings, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2015-05-12 21:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, Andy Lutomirski,
	Denys Vlasenko, lkml

On Tue, May 12, 2015 at 02:13:33PM -0700, Linus Torvalds wrote:
> On Tue, May 12, 2015 at 1:57 PM, Borislav Petkov <bp@alien8.de> wrote:
> >
> > I want to get rid of the asm glue in arch/x86/lib/copy_user_64.S which
> > prepares the copy_user* alternatives calls. And replace it with nice and
> > clean C.
> 
> Ack. I'm not a fan of the x86-64 usercopy functions.
> 
> That said, I think you should uninline those things, and move them
> from a header file to a C file (arch/x86/lib/uaccess.c?).

Ok.

> Move all the copy_user_generic_unrolled/string garbage there too, and
> keep the header file simple.

Those are just forward declarations for the asm functions in
arch/x86/lib/copy_user_64.S but yeah, I'll do some experimenting.

Just to make sure - I'm not getting rid of the different asm string copy
versions in copy_user_64.S - just the _copy_from_user/_copy_to_user
stubs at the beginning of that file, as that gunk can be replaced with
calls to the inlined copy_user_generic() workhorse.

The alternatives then give us

	CALL <optimal asm version>

which is as optimal as it gets.

> Because I think that we would actually be better off trying to inline
> the copy_user_generic_string() thing into the various versions (in
> that uaccess.c file) than try to inline the access_ok() check into the
> caller.

Right.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [RFC PATCH] Drop some asm from copy_user_64.S
  2015-05-12 20:57 [RFC PATCH] Drop some asm from copy_user_64.S Borislav Petkov
  2015-05-12 21:13 ` Linus Torvalds
@ 2015-05-13  6:19 ` Ingo Molnar
  2015-05-13 10:28   ` Borislav Petkov
  1 sibling, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2015-05-13  6:19 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: H. Peter Anvin, Thomas Gleixner, Linus Torvalds, Andy Lutomirski,
	Denys Vlasenko, lkml


* Borislav Petkov <bp@alien8.de> wrote:

> Hi guys,
> 
> this is just an RFC first to sanity-check what I'm trying to do:
> 
> I want to get rid of the asm glue in arch/x86/lib/copy_user_64.S which
> prepares the copy_user* alternatives calls. And replace it with nice and
> clean C.
> 
> The other intention is to switch to using copy_user_generic(), which does
> CALL <copy_user_function> directly, instead of what we do now: CALL
> _copy_*_user, which then JMPs to the proper <copy_user_function> - i.e.,
> to save us that JMP.
> 
> I'm not 100% sure about the equivalence between the addition-carry and
> segment-limit check we're doing in asm in arch/x86/lib/copy_user_64.S
> now and the access_ok() I've replaced it with.
> 
> I mean, it *looks* like access_ok(), and __chk_range_not_ok() in
> particular, does the proper checks - addition carry and segment limit
> against user_addr_max() - but I'd like someone much more experienced
> than me to double-check that.
> 
> So, without much further ado, here is the diff. It looks simple enough...

Looks nice. Would be useful to do before/after analysis of the 
generated asm with a defconfig and document that in the changelog.

I'd keep any changes to inlining decisions a separate patch and do 
vmlinux before/after size analysis as well, so that we don't mix the 
effects of the various enhancements.

Thanks,

	Ingo


* Re: [RFC PATCH] Drop some asm from copy_user_64.S
  2015-05-12 21:53   ` Borislav Petkov
@ 2015-05-13  9:52     ` Borislav Petkov
  2015-05-13 10:31       ` Ingo Molnar
  2015-05-13 16:02       ` Linus Torvalds
  0 siblings, 2 replies; 12+ messages in thread
From: Borislav Petkov @ 2015-05-13  9:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, Andy Lutomirski,
	Denys Vlasenko, lkml

On Tue, May 12, 2015 at 11:53:20PM +0200, Borislav Petkov wrote:
> > That said, I think you should uninline those things, and move them
> > from a header file to a C file (arch/x86/lib/uaccess.c?).

It is starting to look better wrt size:

x86_64_defconfig:

		   text    data     bss     dec     hex filename
before: 	12375798        1812800 1085440 15274038         e91036 vmlinux
after:		12269658        1812800 1085440 15167898         e7719a vmlinux
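
That's 106140 bytes (~0.9%) of text shaved off.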

Now we CALL _copy_*_user, which in turn CALLs the optimal alternative
version. The advantage is that we're saving some space and alternatives
application for copy_user* is being done in fewer places, i.e.
arch/x86/lib/uaccess_64.c. If I move all copy_user_generic() callers
there, it would be the only compilation unit where the alternatives will
be done.

The disadvantage is that we have CALL after CALL and I wanted to have a
single CALL directly to the optimal copy_user function. That'll cost us
space, though, and more alternatives sites to patch during boot...

Thoughts?

Here's the first step only:

---
 arch/x86/include/asm/uaccess.h    |  7 ++-----
 arch/x86/include/asm/uaccess_64.h | 44 ++++-----------------------------------
 arch/x86/lib/Makefile             |  2 +-
 arch/x86/lib/uaccess_64.c         | 42 +++++++++++++++++++++++++++++++++++++
 4 files changed, 49 insertions(+), 46 deletions(-)
 create mode 100644 arch/x86/lib/uaccess_64.c

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 098f3fd5cc75..bdd5234fe9a3 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -642,11 +642,8 @@ extern struct movsl_mask {
 # include <asm/uaccess_64.h>
 #endif
 
-extern __always_inline __must_check
-int _copy_from_user(void *dst, const void __user *src, unsigned size);
-
-extern __always_inline __must_check
-int _copy_to_user(void __user *dst, const void *src, unsigned size);
+extern __must_check int _copy_from_user(void *dst, const void __user *src, unsigned size);
+extern __must_check int _copy_to_user(void __user *dst, const void *src, unsigned size);
 
 #ifdef CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
 # define copy_user_diag __compiletime_error
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index 1aebc658acf9..440ba6b91c86 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -23,31 +23,11 @@ copy_user_generic_string(void *to, const void *from, unsigned len);
 __must_check unsigned long
 copy_user_generic_unrolled(void *to, const void *from, unsigned len);
 
-static __always_inline __must_check unsigned long
-copy_user_generic(void *to, const void *from, unsigned len)
-{
-	unsigned ret;
-
-	/*
-	 * If CPU has ERMS feature, use copy_user_enhanced_fast_string.
-	 * Otherwise, if CPU has rep_good feature, use copy_user_generic_string.
-	 * Otherwise, use copy_user_generic_unrolled.
-	 */
-	alternative_call_2(copy_user_generic_unrolled,
-			 copy_user_generic_string,
-			 X86_FEATURE_REP_GOOD,
-			 copy_user_enhanced_fast_string,
-			 X86_FEATURE_ERMS,
-			 ASM_OUTPUT2("=a" (ret), "=D" (to), "=S" (from),
-				     "=d" (len)),
-			 "1" (to), "2" (from), "3" (len)
-			 : "memory", "rcx", "r8", "r9", "r10", "r11");
-	return ret;
-}
-
 __must_check unsigned long
 copy_in_user(void __user *to, const void __user *from, unsigned len);
 
+extern __must_check unsigned long copy_user_generic(void *to, const void *from, unsigned len);
+
 static __always_inline __must_check
 int __copy_from_user_nocheck(void *dst, const void __user *src, unsigned size)
 {
@@ -98,16 +78,7 @@ int __copy_from_user(void *dst, const void __user *src, unsigned size)
 	return __copy_from_user_nocheck(dst, src, size);
 }
 
-static __always_inline __must_check
-int _copy_from_user(void *dst, const void __user *src, unsigned size)
-{
-	if (!access_ok(VERIFY_READ, src, size)) {
-		memset(dst, 0, size);
-		return 0;
-	}
-
-	return copy_user_generic(dst, src, size);
-}
+extern __must_check int _copy_from_user(void *dst, const void __user *src, unsigned size);
 
 static __always_inline __must_check
 int __copy_to_user_nocheck(void __user *dst, const void *src, unsigned size)
@@ -159,14 +130,7 @@ int __copy_to_user(void __user *dst, const void *src, unsigned size)
 	return __copy_to_user_nocheck(dst, src, size);
 }
 
-static __always_inline __must_check
-int _copy_to_user(void __user *dst, const void *src, unsigned size)
-{
-	if (!access_ok(VERIFY_WRITE, dst, size))
-		return size;
-
-	return copy_user_generic(dst, src, size);
-}
+__must_check int _copy_to_user(void __user *dst, const void *src, unsigned size);
 
 static __always_inline __must_check
 int __copy_in_user(void __user *dst, const void __user *src, unsigned size)
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 982989d282ff..1885cc5eee32 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -40,6 +40,6 @@ else
         lib-y += csum-partial_64.o csum-copy_64.o csum-wrappers_64.o
         lib-y += clear_page_64.o copy_page_64.o
         lib-y += memmove_64.o memset_64.o
-        lib-y += copy_user_64.o
+        lib-y += copy_user_64.o uaccess_64.o
 	lib-y += cmpxchg16b_emu.o
 endif
diff --git a/arch/x86/lib/uaccess_64.c b/arch/x86/lib/uaccess_64.c
new file mode 100644
index 000000000000..6cd15d874ab4
--- /dev/null
+++ b/arch/x86/lib/uaccess_64.c
@@ -0,0 +1,42 @@
+#include <asm/uaccess.h>
+
+__always_inline __must_check unsigned long
+copy_user_generic(void *to, const void *from, unsigned len)
+{
+	unsigned ret;
+
+	/*
+	 * If CPU has ERMS feature, use copy_user_enhanced_fast_string.
+	 * Otherwise, if CPU has rep_good feature, use copy_user_generic_string.
+	 * Otherwise, use copy_user_generic_unrolled.
+	 */
+	alternative_call_2(copy_user_generic_unrolled,
+			 copy_user_generic_string,
+			 X86_FEATURE_REP_GOOD,
+			 copy_user_enhanced_fast_string,
+			 X86_FEATURE_ERMS,
+			 ASM_OUTPUT2("=a" (ret), "=D" (to), "=S" (from),
+				     "=d" (len)),
+			 "1" (to), "2" (from), "3" (len)
+			 : "memory", "rcx", "r8", "r9", "r10", "r11");
+	return ret;
+}
+EXPORT_SYMBOL_GPL(copy_user_generic);
+
+__must_check int _copy_from_user(void *dst, const void __user *src, unsigned size)
+{
+	if (!access_ok(VERIFY_READ, src, size)) {
+		memset(dst, 0, size);
+		return 0;
+	}
+
+	return copy_user_generic(dst, src, size);
+}
+
+__must_check int _copy_to_user(void __user *dst, const void *src, unsigned size)
+{
+	if (!access_ok(VERIFY_WRITE, dst, size))
+		return size;
+
+	return copy_user_generic(dst, src, size);
+}
-- 
2.3.5


-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [RFC PATCH] Drop some asm from copy_user_64.S
  2015-05-13  6:19 ` Ingo Molnar
@ 2015-05-13 10:28   ` Borislav Petkov
  0 siblings, 0 replies; 12+ messages in thread
From: Borislav Petkov @ 2015-05-13 10:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: H. Peter Anvin, Thomas Gleixner, Linus Torvalds, Andy Lutomirski,
	Denys Vlasenko, lkml

On Wed, May 13, 2015 at 08:19:55AM +0200, Ingo Molnar wrote:
> Looks nice. Would be useful to do before/after analysis of the
> generated asm with a defconfig and document that in the changelog.

Right, so I'm looking at what we have now:

/* Standard copy_to_user with segment limit checking */
ENTRY(_copy_to_user)
	CFI_STARTPROC
	GET_THREAD_INFO(%rax)
	movq %rdi,%rcx
	addq %rdx,%rcx
	jc bad_to_user
	cmpq TI_addr_limit(%rax),%rcx
	ja bad_to_user

This adds the size (in %rdx) to @to (in %rdi) and then looks at the
carry flag. __chk_range_not_ok() does the same thing, but with a single
operation, AFAICT:

static inline bool __chk_range_not_ok(unsigned long addr, unsigned long size, unsigned long limit)
{
        /*
         * If we have used "sizeof()" for the size,
         * we know it won't overflow the limit (but
         * it might overflow the 'addr', so it's
         * important to subtract the size from the
         * limit, not add it to the address).
         */
        if (__builtin_constant_p(size))
                return addr > limit - size;

and we're avoiding the addr overflow by subtracting size from limit.

So the resulting asm looks like this:

        .file 22 "./arch/x86/include/asm/uaccess.h"
        .loc 22 54 0
        movq    -16360(%r14), %rax      # _208->addr_limit.seg, tmp347		%r14 contains thread_info
        subq    $88, %rax       #, D.37904					88 is the size

        .file 23 "./arch/x86/include/asm/uaccess_64.h"
        .loc 23 165 0
        cmpq    %rax, %r12      # D.37904, ubuf					%r12 contains the user ptr
        ja      .L493   #,
        movq    %r12, %rdi      # ubuf, to					prep args for copy_user...
        movl    $88, %edx       #, len

										alternative starts here
	#APP
	# 36 "./arch/x86/include/asm/uaccess_64.h" 1
	661:
	call copy_user_generic_unrolled	#
	....


so we end up replacing

	MOV
	ADD
	JC
	CMP
	JA
	JMP (alternative)

with

	MOV
	SUB
	CMP
	JA
	MOV
	MOV
	CALL (alternative)

The only problem I see here is that we have to do two MOVs to put the args
in the proper registers before calling the copy_user* version. But we end
up with a single conditional instead of two. And the MOVs are cheaper.
Also, we get rid of the asm glue, even betterer :-)

> I'd keep any changes to inlining decisions a separate patch and do
> vmlinux before/after size analysis as well, so that we don't mix the
> effects of the various enhancements.

Yap.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [RFC PATCH] Drop some asm from copy_user_64.S
  2015-05-13  9:52     ` Borislav Petkov
@ 2015-05-13 10:31       ` Ingo Molnar
  2015-05-13 10:43         ` Borislav Petkov
  2015-05-13 16:02       ` Linus Torvalds
  1 sibling, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2015-05-13 10:31 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Linus Torvalds, H. Peter Anvin, Thomas Gleixner, Andy Lutomirski,
	Denys Vlasenko, lkml


* Borislav Petkov <bp@alien8.de> wrote:

> On Tue, May 12, 2015 at 11:53:20PM +0200, Borislav Petkov wrote:
> > > That said, I think you should uninline those things, and move them
> > > from a header file to a C file (arch/x86/lib/uaccess.c?).
> 
> It is starting to look better wrt size:
> 
> x86_64_defconfig:
> 
> 		   text    data     bss     dec     hex filename
> before: 	12375798        1812800 1085440 15274038         e91036 vmlinux
> after:		12269658        1812800 1085440 15167898         e7719a vmlinux
> 
> Now we CALL _copy_*_user, which in turn CALLs the optimal alternative 
> version. The advantage is that we're saving some space and alternatives 
> application for copy_user* is being done in fewer places, i.e. 
> arch/x86/lib/uaccess_64.c. If I move all copy_user_generic() callers 
> there, it would be the only compilation unit where the alternatives 
> will be done.
> 
> The disadvantage is that we have CALL after CALL and I wanted to 
> have a single CALL directly to the optimal copy_user function. 
> That'll cost us space, though, and more alternatives sites to patch 
> during boot...
> 
> Thoughts?

So why should an alternatives-CALL, inlined directly into call sites, 
cost more kernel space?

It should only be some more metadata, but that's outside any hot path 
and would be freed on init. The actual hot instructions should be 
exactly the same as a regular call, minus the double CALL indirection.
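
I.e., per call site, roughly (hand-waving the exact macro output and
section placement):

	# hot path, in .text: a single, normal looking CALL that gets
	# patched in place during boot:
	call	copy_user_generic_unrolled

	# plus an .altinstructions entry per feature variant and the
	# replacement CALLs in .altinstr_replacement - both live in the
	# init area and get thrown away with the rest of the init sections.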

Thanks,

	Ingo


* Re: [RFC PATCH] Drop some asm from copy_user_64.S
  2015-05-13 10:31       ` Ingo Molnar
@ 2015-05-13 10:43         ` Borislav Petkov
  2015-05-13 10:46           ` Ingo Molnar
  0 siblings, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2015-05-13 10:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, H. Peter Anvin, Thomas Gleixner, Andy Lutomirski,
	Denys Vlasenko, lkml

On Wed, May 13, 2015 at 12:31:40PM +0200, Ingo Molnar wrote:
> So why should an alternatives-CALL, inlined directly into call sites,
> cost more kernel space?

Not the alternatives CALL alone but inlining _copy_*_user with all the
preparation glue around it would. Basically what we're doing currently.

See how uninlining copy_user_generic() dropped text size from 12375798 to
12269658.

But let me do some more staring...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [RFC PATCH] Drop some asm from copy_user_64.S
  2015-05-13 10:43         ` Borislav Petkov
@ 2015-05-13 10:46           ` Ingo Molnar
  2015-05-13 11:16             ` Borislav Petkov
  0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2015-05-13 10:46 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Linus Torvalds, H. Peter Anvin, Thomas Gleixner, Andy Lutomirski,
	Denys Vlasenko, lkml


* Borislav Petkov <bp@alien8.de> wrote:

> On Wed, May 13, 2015 at 12:31:40PM +0200, Ingo Molnar wrote:
> > So why should an alternatives-CALL, inlined directly into call sites,
> > cost more kernel space?
> 
> Not the alternatives CALL alone but inlining _copy_*_user with all 
> the preparation glue around it would. Basically what we're doing 
> currently.

So I reacted to this comment of yours:

> > > The disadvantage is that we have CALL after CALL [...]

Is the CALL after CALL caused by us calling an alternatives patched 
function? If yes then we probably should not do that: alternatives 
switching should IMHO happen at the highest possible level.

Thanks,

	Ingo


* Re: [RFC PATCH] Drop some asm from copy_user_64.S
  2015-05-13 10:46           ` Ingo Molnar
@ 2015-05-13 11:16             ` Borislav Petkov
  0 siblings, 0 replies; 12+ messages in thread
From: Borislav Petkov @ 2015-05-13 11:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, H. Peter Anvin, Thomas Gleixner, Andy Lutomirski,
	Denys Vlasenko, lkml

On Wed, May 13, 2015 at 12:46:30PM +0200, Ingo Molnar wrote:
> 
> * Borislav Petkov <bp@alien8.de> wrote:
> 
> > On Wed, May 13, 2015 at 12:31:40PM +0200, Ingo Molnar wrote:
> > > So why should an alternatives-CALL, inlined directly into call sites,
> > > cost more kernel space?
> > 
> > Not the alternatives CALL alone but inlining _copy_*_user with all 
> > the preparation glue around it would. Basically what we're doing 
> > currently.
> 
> So I reacted to this comment of yours:
> 
> > > > The disadvantage is that we have CALL after CALL [...]
> 
> Is the CALL after CALL caused by us calling an alternatives patched 
> function? If yes then we probably should not do that: alternatives 
> switching should IMHO happen at the highest possible level.

Right, so I was trying to analyze Linus' suggestion to uninline stuff
and put it in uaccess_64.c. And that does save us some size and
alternatives patch sites but produces the CALL ... CALL thing.

So let me show you what we have now:

ffffffff8102a774:       0f 1f 40 00             nopl   0x0(%rax)
ffffffff8102a778:       ba 58 00 00 00          mov    $0x58,%edx
ffffffff8102a77d:       4c 89 ff                mov    %r15,%rdi
ffffffff8102a780:       49 83 c7 58             add    $0x58,%r15
ffffffff8102a784:       e8 b7 19 2f 00          callq  ffffffff8131c140 <_copy_to_user>

...

ffffffff8131c140 <_copy_to_user>:
ffffffff8131c140:       65 48 8b 04 25 88 a9    mov    %gs:0xa988,%rax
ffffffff8131c147:       00 00 
ffffffff8131c149:       48 2d 00 40 00 00       sub    $0x4000,%rax
ffffffff8131c14f:       48 89 f9                mov    %rdi,%rcx
ffffffff8131c152:       48 01 d1                add    %rdx,%rcx
ffffffff8131c155:       0f 82 bb c5 57 00       jb     ffffffff81898716 <bad_to_user>
ffffffff8131c15b:       48 3b 48 18             cmp    0x18(%rax),%rcx
ffffffff8131c15f:       0f 87 b1 c5 57 00       ja     ffffffff81898716 <bad_to_user>
ffffffff8131c165:       e9 36 00 00 00          jmpq   ffffffff8131c1a0 <copy_user_generic_unrolled>
ffffffff8131c16a:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)

so we prep args, call _copy_to_user, do checks and then JMP to the
optimal alternative function.

What I'd like to do is (hypothetically copy'pasted together):

ffffffff8102a778:       ba 58 00 00 00          mov    $0x58,%edx
ffffffff8102a77d:       4c 89 ff                mov    %r15,%rdi

        					movq    -16360(%r14), %rax      # _208->addr_limit.seg, tmp347
					        subq    $88, %rax       #, D.37904
        					cmpq    %rax, %r15      # D.37904, ubuf
					        ja      .L493   #,
        					call copy_user_generic_unrolled #

which saves us the first CALL to _copy_to_user and we do the
alternatives <copy_user_generic_unrolled> CALL directly.

This would mean that we will have to inline _copy_*_user() and switch it
to use copy_user_generic() which already does the proper alternatives.

For the price of some minor size increase and more alternatives patch
sites.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [RFC PATCH] Drop some asm from copy_user_64.S
  2015-05-13  9:52     ` Borislav Petkov
  2015-05-13 10:31       ` Ingo Molnar
@ 2015-05-13 16:02       ` Linus Torvalds
  2015-05-14  9:36         ` Borislav Petkov
  1 sibling, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2015-05-13 16:02 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, Andy Lutomirski,
	Denys Vlasenko, lkml

On Wed, May 13, 2015 at 2:52 AM, Borislav Petkov <bp@alien8.de> wrote:
>
> Now we CALL _copy_*_user, which in turn CALLs the optimal alternative
> version. The advantage is that we're saving some space and alternatives
> application for copy_user* is being done in fewer places, i.e.
> arch/x86/lib/uaccess_64.c. If I move all copy_user_generic() callers
> there, it would be the only compilation unit where the alternatives will
> be done.
>
> The disadvantage is that we have CALL after CALL and I wanted to have a
> single CALL directly to the optimal copy_user function. That'll cost us
> space, though, and more alternatives sites to patch during boot...
>
> Thoughts?

So I think we should do this first call-to-call thing, because it
makes it easier to go to the second step: replace the final call with
an asm alternative that just puts the "rep movsb" inline for the (more
and more common) case of X86_FEATURE_ERMS.

The nice thing about using "rep movsb" for the user copy is that not
only is it fairly close to optimal (for non-constant sizes) on newer
Intel CPU's, but the fixup is also trivial. So we really should inline
it. Just look at it: the copy_user_enhanced_fast_string function is
literally just three 2-byte instructions right now:

    mov    %edx,%ecx
    rep movsb
    xor    %eax,%eax

and the rest is just the exception table thing.
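
IOW, inlined it could look something like this (totally untested sketch
with a made-up helper name - needs <asm/asm.h> for _ASM_EXTABLE, and it
leaves access_ok() and the STAC/CLAC to the caller):

	static __always_inline unsigned long
	erms_copy(void *to, const void *from, unsigned long len)
	{
		asm volatile("1:	rep movsb\n"
			     "2:\n"
			     _ASM_EXTABLE(1b, 2b)
			     : "+D" (to), "+S" (from), "+c" (len)
			     : : "memory");
		/* on a fault, %rcx (i.e. len) holds the bytes not copied */
		return len;
	}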

(And yes, there's the STAC/CLAC thing around it, but I think that
should just be moved into _copy_from/to_user() too, since *all* of the
copy_user_generic() cases need it).

Yeah, yeah, we'd still do the double call thing for the more complex
cases of the unrolled copy loop or the "movsq + tail" cases, but those
are at least big enough that it makes sense. And they are presumably
getting less common anyway.

                          Linus


* Re: [RFC PATCH] Drop some asm from copy_user_64.S
  2015-05-13 16:02       ` Linus Torvalds
@ 2015-05-14  9:36         ` Borislav Petkov
  0 siblings, 0 replies; 12+ messages in thread
From: Borislav Petkov @ 2015-05-14  9:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, Andy Lutomirski,
	Denys Vlasenko, lkml

On Wed, May 13, 2015 at 09:02:41AM -0700, Linus Torvalds wrote:
> The nice thing about using "rep movsb" for the user copy is that not
> only is it fairly close to optimal (for non-constant sizes) on newer
> Intel CPU's, but the fixup is also trivial. So we really should inline
> it. Just look at it: the copy_user_enhanced_fast_string function is
> literally just three 2-byte instructions right now:
> 
>     mov    %edx,%ecx
>     rep movsb
>     xor    %eax,%eax
> 
> and the rest is just the exception table thing.

Yeah, so I thought about it for a while and yeah, those labels there
would be a problem. Because you have this:

      mov    %edx,%ecx
1:    rep movsb
      xor    %eax,%eax

and _ASM_EXTABLE adds the exception table entry in the form of relative
offsets.

So I *think* it would work if we make the REP; MOVSB case the default
one, i.e. those insns get issued during build. Then the labels will be
fine and all is good.

When they have to get patched, probably with a CALL to the other
variants, the label 1: above (or rather the exception table entry) will
point to the newly patched instruction which, if it faults, might get
fixed up erroneously.

Hmm, let me give it a try - I'll have a better idea after I've done it.

> (And yes, there's the STAC/CLAC thing around it, but I think that
> should just be moved into _copy_from/to_user() too, since *all* of the
> copy_user_generic() cases need it).

Yeah.

> Yeah, yeah, we'd still do the double call thing for the more complex
> cases of the unrolled copy loop or the "movsq + tail" cases, but those
> are at least big enough that it makes sense. And they are presumably
> getting less common anyway.

Right, so we can avoid the first CALL if I inline copy_user_generic()
which practically inlines the alternative directly.

Lemme play with it a little...

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

