* [PATCH 0/9] Optimize string operations by enhanced REP MOVSB/STOSB
@ 2011-05-17 22:29 Fenghua Yu
  2011-05-17 22:29 ` [PATCH 1/9] x86, cpu: Enable enhanced REP MOVSB/STOSB feature Fenghua Yu
                   ` (8 more replies)
  0 siblings, 9 replies; 30+ messages in thread
From: Fenghua Yu @ 2011-05-17 22:29 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Asit K Mallick,
	Linus Torvalds, Avi Kivity, Arjan van de Ven, Andrew Morton,
	Andi Kleen
  Cc: linux-kernel, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Intel processors are adding enhancements to REP MOVSB/STOSB and the use of
REP MOVSB/STOSB for optimal memcpy/memset or similar functions is recommended.
Enhancement availability is indicated by CPUID.7.0.EBX[9] (Enhanced REP MOVSB/STOSB).

The Intel 64 and IA-32 SDM and the performance optimization guide may include
more model-specific information on how the feature can be used. The documents
will be published soon.
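
For reference, detecting the enhancement boils down to reading bit 9 of EBX
from CPUID leaf 7, subleaf 0. A minimal user-space sketch (illustration only,
not part of this series; the ERMS_BIT name is local to the example):

#include <stdio.h>

#define ERMS_BIT	(1u << 9)	/* CPUID.(EAX=07H,ECX=0H):EBX[9] */

static unsigned int cpuid_7_0_ebx(void)
{
	unsigned int eax = 7, ebx, ecx = 0, edx;

	/* CPUID leaf 7, subleaf 0: structured extended feature flags */
	asm volatile("cpuid"
		     : "+a" (eax), "=b" (ebx), "+c" (ecx), "=d" (edx));
	return ebx;
}

int main(void)
{
	printf("Enhanced REP MOVSB/STOSB: %s\n",
	       (cpuid_7_0_ebx() & ERMS_BIT) ? "yes" : "no");
	return 0;
}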

Fenghua Yu (9):
  x86, cpu: Enable enhanced REP MOVSB/STOSB feature
  x86/kernel/cpu/intel.c: Initialize Enhanced REP MOVSB/STOSB
  x86/kernel/alternative.c: Add comment for applying alternatives order
  x86, alternative-asm.h: Add altinstruction_entry macro
  x86/lib/clear_page_64.S: Support clear_page() with enhanced REP
    MOVSB/STOSB
  x86/lib/copy_user_64.S: Support copy_to_user/copy_from_user by
    enhanced REP MOVSB/STOSB
  x86/lib/memcpy_64.S: Optimize memcpy by enhanced REP MOVSB/STOSB
  x86/lib/memmove_64.S: Optimize memmove by enhanced REP MOVSB/STOSB
  x86/lib/memset_64.S: Optimize memset by enhanced REP MOVSB/STOSB

 arch/x86/include/asm/alternative-asm.h |    9 ++++
 arch/x86/include/asm/cpufeature.h      |    1 +
 arch/x86/kernel/alternative.c          |    9 ++++
 arch/x86/kernel/cpu/intel.c            |   19 +++++++--
 arch/x86/lib/clear_page_64.S           |   33 ++++++++++++----
 arch/x86/lib/copy_user_64.S            |   65 +++++++++++++++++++++++++++-----
 arch/x86/lib/memcpy_64.S               |   45 ++++++++++++++++------
 arch/x86/lib/memmove_64.S              |   29 ++++++++++++++-
 arch/x86/lib/memset_64.S               |   54 ++++++++++++++++++++------
 9 files changed, 215 insertions(+), 49 deletions(-)

-- 
1.7.2


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/9] x86, cpu: Enable enhanced REP MOVSB/STOSB feature
  2011-05-17 22:29 [PATCH 0/9] Optimize string operations by enhanced REP MOVSB/STOSB Fenghua Yu
@ 2011-05-17 22:29 ` Fenghua Yu
  2011-05-17 23:13   ` [tip:x86/cpufeature] x86, cpufeature: Add CPU feature bit for enhanced REP MOVSB/STOSB tip-bot for Fenghua Yu
  2011-05-17 22:29 ` [PATCH 2/9] x86/kernel/cpu/intel.c: Initialize Enhanced REP MOVSB/STOSB Fenghua Yu
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Fenghua Yu @ 2011-05-17 22:29 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Asit K Mallick,
	Linus Torvalds, Avi Kivity, Arjan van de Ven, Andrew Morton,
	Andi Kleen
  Cc: linux-kernel, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Intel processors are adding enhancements to REP MOVSB/STOSB and the use of
REP MOVSB/STOSB for optimal memcpy/memset or similar functions is recommended.
Enhancement availability is indicated by CPUID.7.0.EBX[9] (Enhanced REP MOVSB/
STOSB).
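
Once the bit is defined, kernel C code can test it with the usual cpufeature
helpers. A trivial sketch (the helper name is made up; nothing in this patch
uses it yet):

#include <asm/cpufeature.h>

/* Sketch only: returns non-zero when the boot CPU advertises ERMS. */
static inline int cpu_supports_erms(void)
{
	return boot_cpu_has(X86_FEATURE_ERMS);
}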

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/cpufeature.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 50c0d30..30afb46 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -195,6 +195,7 @@
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */
 #define X86_FEATURE_FSGSBASE	(9*32+ 0) /* {RD/WR}{FS/GS}BASE instructions*/
+#define X86_FEATURE_ERMS	(9*32+ 9) /* Enhanced REP MOVSB/STOSB */
 
 #if defined(__KERNEL__) && !defined(__ASSEMBLY__)
 
-- 
1.7.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/9] x86/kernel/cpu/intel.c: Initialize Enhanced REP MOVSB/STOSB
  2011-05-17 22:29 [PATCH 0/9] Optimize string operations by enhanced REP MOVSB/STOSB Fenghua Yu
  2011-05-17 22:29 ` [PATCH 1/9] x86, cpu: Enable enhanced REP MOVSB/STOSB feature Fenghua Yu
@ 2011-05-17 22:29 ` Fenghua Yu
  2011-05-18  2:46   ` Andi Kleen
  2011-05-18 20:40   ` [tip:perf/core] x86, mem, intel: Initialize Enhanced REP MOVSB/STOSB tip-bot for Fenghua Yu
  2011-05-17 22:29 ` [PATCH 3/9] x86/kernel/alternative.c: Add comment for applying alternatives order Fenghua Yu
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 30+ messages in thread
From: Fenghua Yu @ 2011-05-17 22:29 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Asit K Mallick,
	Linus Torvalds, Avi Kivity, Arjan van de Ven, Andrew Morton,
	Andi Kleen
  Cc: linux-kernel, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

If kernel intends to use enhanced REP MOVSB/STOSB, it must ensure
IA32_MISC_ENABLE.Fast_String_Enable (bit 0) is set and CPUID.(EAX=07H, ECX=0H):
EBX[bit 9] also reports 1.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/kernel/cpu/intel.c |   19 +++++++++++++++----
 1 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 32e86aa..1edf5ba 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -29,10 +29,10 @@
 
 static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
 {
+	u64 misc_enable;
+
 	/* Unmask CPUID levels if masked: */
 	if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
-		u64 misc_enable;
-
 		rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
 
 		if (misc_enable & MSR_IA32_MISC_ENABLE_LIMIT_CPUID) {
@@ -118,8 +118,6 @@ static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
 	 * (model 2) with the same problem.
 	 */
 	if (c->x86 == 15) {
-		u64 misc_enable;
-
 		rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
 
 		if (misc_enable & MSR_IA32_MISC_ENABLE_FAST_STRING) {
@@ -130,6 +128,19 @@ static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
 		}
 	}
 #endif
+
+	/*
+	 * If fast string is not enabled in IA32_MISC_ENABLE for any reason,
+	 * clear the fast string and enhanced fast string CPU capabilities.
+	 */
+	if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
+		rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
+		if (!(misc_enable & MSR_IA32_MISC_ENABLE_FAST_STRING)) {
+			printk(KERN_INFO "Disabled fast string operations\n");
+			setup_clear_cpu_cap(X86_FEATURE_REP_GOOD);
+			setup_clear_cpu_cap(X86_FEATURE_ERMS);
+		}
+	}
 }
 
 #ifdef CONFIG_X86_32
-- 
1.7.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/9] x86/kernel/alternative.c: Add comment for applying alternatives order
  2011-05-17 22:29 [PATCH 0/9] Optimize string operations by enhanced REP MOVSB/STOSB Fenghua Yu
  2011-05-17 22:29 ` [PATCH 1/9] x86, cpu: Enable enhanced REP MOVSB/STOSB feature Fenghua Yu
  2011-05-17 22:29 ` [PATCH 2/9] x86/kernel/cpu/intel.c: Initialize Enhanced REP MOVSB/STOSB Fenghua Yu
@ 2011-05-17 22:29 ` Fenghua Yu
  2011-05-18 20:40   ` [tip:perf/core] x86, alternative, doc: " tip-bot for Fenghua Yu
  2011-05-17 22:29 ` [PATCH 4/9] x86, alternative-asm.h: Add altinstruction_entry macro Fenghua Yu
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Fenghua Yu @ 2011-05-17 22:29 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Asit K Mallick,
	Linus Torvalds, Avi Kivity, Arjan van de Ven, Andrew Morton,
	Andi Kleen
  Cc: linux-kernel, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Some string operation functions may be patched twice, e.g. on enhanced REP
MOVSB/STOSB processors, memcpy is patched first with the fast string
alternative function and then with the enhanced REP MOVSB/STOSB alternative
function.

Add a comment on the order in which alternatives are applied, to warn anyone
who may want to change that order for any reason.
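
To illustrate the dependency on scan order, the apply loop behaves roughly
like this (C sketch; patch_instr() is a stand-in name for the real
replacement logic):

/*
 * .altinstructions is scanned from start to end, and a later matching
 * entry overwrites the result of an earlier one.  With the ERMS entry
 * listed after the REP_GOOD entry, an ERMS-capable CPU ends up running
 * the enhanced REP MOVSB/STOSB variant.
 */
for (a = start; a < end; a++) {
	if (!boot_cpu_has(a->cpuid))
		continue;
	patch_instr(a->instr, a->replacement, a->instrlen, a->replacementlen);
}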

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/kernel/alternative.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index c0501ea..a81f2d5 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -266,6 +266,15 @@ void __init_or_module apply_alternatives(struct alt_instr *start,
 	u8 insnbuf[MAX_PATCH_LEN];
 
 	DPRINTK("%s: alt table %p -> %p\n", __func__, start, end);
+	/*
+	 * The scan order should be from start to end. A later scanned
+	 * alternative code can overwrite a previous scanned alternative code.
+	 * Some kernel functions (e.g. memcpy, memset, etc) use this order to
+	 * patch code.
+	 *
+	 * So be careful if you want to change the scan order to any other
+	 * order.
+	 */
 	for (a = start; a < end; a++) {
 		u8 *instr = a->instr;
 		BUG_ON(a->replacementlen > a->instrlen);
-- 
1.7.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 4/9] x86, alternative-asm.h: Add altinstruction_entry macro
  2011-05-17 22:29 [PATCH 0/9] Optimize string operations by enhanced REP MOVSB/STOSB Fenghua Yu
                   ` (2 preceding siblings ...)
  2011-05-17 22:29 ` [PATCH 3/9] x86/kernel/alternative.c: Add comment for applying alternatives order Fenghua Yu
@ 2011-05-17 22:29 ` Fenghua Yu
  2011-05-18 20:41   ` [tip:perf/core] x86, alternative: " tip-bot for Fenghua Yu
  2011-05-17 22:29 ` [PATCH 5/9] x86/lib/clear_page_64.S: Support clear_page() with enhanced REP MOVSB/STOSB Fenghua Yu
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Fenghua Yu @ 2011-05-17 22:29 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Asit K Mallick,
	Linus Torvalds, Avi Kivity, Arjan van de Ven, Andrew Morton,
	Andi Kleen
  Cc: linux-kernel, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Add altinstruction_entry macro which will be used in .altinstructions section.
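
Each altinstruction_entry invocation emits one record with the same layout as
struct alt_instr, roughly (sketch of the current layout, padding omitted):

struct alt_instr {
	u8 *instr;		/* .quad \orig     - original instruction(s)    */
	u8 *replacement;	/* .quad \alt      - replacement instruction(s) */
	u16 cpuid;		/* .word \feature  - required CPU feature bit   */
	u8  instrlen;		/* .byte \orig_len - length of original         */
	u8  replacementlen;	/* .byte \alt_len  - length of replacement      */
};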

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/alternative-asm.h |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/alternative-asm.h b/arch/x86/include/asm/alternative-asm.h
index a63a68b..94d420b 100644
--- a/arch/x86/include/asm/alternative-asm.h
+++ b/arch/x86/include/asm/alternative-asm.h
@@ -15,4 +15,13 @@
 	.endm
 #endif
 
+.macro altinstruction_entry orig alt feature orig_len alt_len
+	.align 8
+	.quad \orig
+	.quad \alt
+	.word \feature
+	.byte \orig_len
+	.byte \alt_len
+.endm
+
 #endif  /*  __ASSEMBLY__  */
-- 
1.7.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 5/9] x86/lib/clear_page_64.S: Support clear_page() with enhanced REP MOVSB/STOSB
  2011-05-17 22:29 [PATCH 0/9] Optimize string operations by enhanced REP MOVSB/STOSB Fenghua Yu
                   ` (3 preceding siblings ...)
  2011-05-17 22:29 ` [PATCH 4/9] x86, alternative-asm.h: Add altinstruction_entry macro Fenghua Yu
@ 2011-05-17 22:29 ` Fenghua Yu
  2011-05-18 20:41   ` [tip:perf/core] x86, mem: clear_page_64.S: " tip-bot for Fenghua Yu
  2011-05-17 22:29 ` [PATCH 6/9] x86/lib/copy_user_64.S: Support copy_to_user/copy_from_user by " Fenghua Yu
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Fenghua Yu @ 2011-05-17 22:29 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Asit K Mallick,
	Linus Torvalds, Avi Kivity, Arjan van de Ven, Andrew Morton,
	Andi Kleen
  Cc: linux-kernel, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Intel processors are adding enhancements to REP MOVSB/STOSB and the use of
REP MOVSB/STOSB for optimal memcpy/memset or similar functions is recommended.
Enhancement availability is indicated by CPUID.7.0.EBX[9] (Enhanced REP MOVSB/
STOSB).

Support clear_page() with rep stosb for processors supporting enhanced REP
MOVSB/STOSB. On such processors, the alternative clear_page_c_e function using
enhanced REP STOSB overrides both the original function and the fast string
function.
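
For reference, with %ecx = 4096 and %al = 0, "rep stosb" stores 4096 zero
bytes starting at %rdi, so the new variant behaves like this C sketch (the
helper name is made up):

#include <linux/string.h>

/* What clear_page_c_e amounts to: zero one 4096-byte page. */
static inline void clear_page_sketch(void *page)
{
	memset(page, 0, 4096);
}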

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/lib/clear_page_64.S |   33 ++++++++++++++++++++++++---------
 1 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
index aa4326b..0e109d3 100644
--- a/arch/x86/lib/clear_page_64.S
+++ b/arch/x86/lib/clear_page_64.S
@@ -1,5 +1,6 @@
 #include <linux/linkage.h>
 #include <asm/dwarf2.h>
+#include <asm/alternative-asm.h>
 
 /*
  * Zero a page. 	
@@ -14,6 +15,15 @@ ENTRY(clear_page_c)
 	CFI_ENDPROC
 ENDPROC(clear_page_c)
 
+ENTRY(clear_page_c_e)
+	CFI_STARTPROC
+	movl $4096,%ecx
+	xorl %eax,%eax
+	rep stosb
+	ret
+	CFI_ENDPROC
+ENDPROC(clear_page_c_e)
+
 ENTRY(clear_page)
 	CFI_STARTPROC
 	xorl   %eax,%eax
@@ -38,21 +48,26 @@ ENTRY(clear_page)
 .Lclear_page_end:
 ENDPROC(clear_page)
 
-	/* Some CPUs run faster using the string instructions.
-	   It is also a lot simpler. Use this when possible */
+	/*
+	 * Some CPUs support enhanced REP MOVSB/STOSB instructions.
+	 * It is recommended to use this when possible.
+	 * If enhanced REP MOVSB/STOSB is not available, try to use fast string.
+	 * Otherwise, use original function.
+	 *
+	 */
 
 #include <asm/cpufeature.h>
 
 	.section .altinstr_replacement,"ax"
 1:	.byte 0xeb					/* jmp <disp8> */
 	.byte (clear_page_c - clear_page) - (2f - 1b)	/* offset */
-2:
+2: 	.byte 0xeb					/* jmp <disp8> */
+	.byte (clear_page_c_e - clear_page) - (3f - 2b)	/* offset */
+3:
 	.previous
 	.section .altinstructions,"a"
-	.align 8
-	.quad clear_page
-	.quad 1b
-	.word X86_FEATURE_REP_GOOD
-	.byte .Lclear_page_end - clear_page
-	.byte 2b - 1b
+	altinstruction_entry clear_page,1b,X86_FEATURE_REP_GOOD,\
+			     .Lclear_page_end-clear_page, 2b-1b
+	altinstruction_entry clear_page,2b,X86_FEATURE_ERMS,   \
+			     .Lclear_page_end-clear_page,3b-2b
 	.previous
-- 
1.7.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 6/9] x86/lib/copy_user_64.S: Support copy_to_user/copy_from_user by enhanced REP MOVSB/STOSB
  2011-05-17 22:29 [PATCH 0/9] Optimize string operations by enhanced REP MOVSB/STOSB Fenghua Yu
                   ` (4 preceding siblings ...)
  2011-05-17 22:29 ` [PATCH 5/9] x86/lib/clear_page_64.S: Support clear_page() with enhanced REP MOVSB/STOSB Fenghua Yu
@ 2011-05-17 22:29 ` Fenghua Yu
  2011-05-18 20:42   ` [tip:perf/core] x86, mem: copy_user_64.S: Support copy_to/from_user " tip-bot for Fenghua Yu
  2011-05-17 22:29 ` [PATCH 7/9] x86/lib/memcpy_64.S: Optimize memcpy " Fenghua Yu
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Fenghua Yu @ 2011-05-17 22:29 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Asit K Mallick,
	Linus Torvalds, Avi Kivity, Arjan van de Ven, Andrew Morton,
	Andi Kleen
  Cc: linux-kernel, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Support copy_to_user/copy_from_user() by enhanced REP MOVSB/STOSB.
On processors supporting enhanced REP MOVSB/STOSB, the alternative
copy_user_enhanced_fast_string function using enhanced rep movsb overrides the
original function and the fast string function.
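
The patched-in jumps end up selecting between the three variants roughly as
in this C sketch (the wrapper name is made up; the three callees are the real
functions used by this patch, and the real choice is made once at patch time,
not on every call):

unsigned long copy_user_dispatch(void *to, const void *from, unsigned len)
{
	if (boot_cpu_has(X86_FEATURE_ERMS))
		return copy_user_enhanced_fast_string(to, from, len);
	if (boot_cpu_has(X86_FEATURE_REP_GOOD))
		return copy_user_generic_string(to, from, len);
	return copy_user_generic_unrolled(to, from, len);
}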

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/lib/copy_user_64.S |   65 ++++++++++++++++++++++++++++++++++++------
 1 files changed, 55 insertions(+), 10 deletions(-)

diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index 99e4826..d17a117 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -15,23 +15,30 @@
 #include <asm/asm-offsets.h>
 #include <asm/thread_info.h>
 #include <asm/cpufeature.h>
+#include <asm/alternative-asm.h>
 
-	.macro ALTERNATIVE_JUMP feature,orig,alt
+/*
+ * By placing feature2 after feature1 in altinstructions section, we logically
+ * implement:
+ * If CPU has feature2, jmp to alt2 is used
+ * else if CPU has feature1, jmp to alt1 is used
+ * else jmp to orig is used.
+ */
+	.macro ALTERNATIVE_JUMP feature1,feature2,orig,alt1,alt2
 0:
 	.byte 0xe9	/* 32bit jump */
 	.long \orig-1f	/* by default jump to orig */
 1:
 	.section .altinstr_replacement,"ax"
 2:	.byte 0xe9			/* near jump with 32bit immediate */
-	.long \alt-1b /* offset */   /* or alternatively to alt */
+	.long \alt1-1b /* offset */   /* or alternatively to alt1 */
+3:	.byte 0xe9			/* near jump with 32bit immediate */
+	.long \alt2-1b /* offset */   /* or alternatively to alt2 */
 	.previous
+
 	.section .altinstructions,"a"
-	.align 8
-	.quad  0b
-	.quad  2b
-	.word  \feature			/* when feature is set */
-	.byte  5
-	.byte  5
+	altinstruction_entry 0b,2b,\feature1,5,5
+	altinstruction_entry 0b,3b,\feature2,5,5
 	.previous
 	.endm
 
@@ -73,7 +80,9 @@ ENTRY(_copy_to_user)
 	jc bad_to_user
 	cmpq TI_addr_limit(%rax),%rcx
 	jae bad_to_user
-	ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,copy_user_generic_unrolled,copy_user_generic_string
+	ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,X86_FEATURE_ERMS,	\
+		copy_user_generic_unrolled,copy_user_generic_string,	\
+		copy_user_enhanced_fast_string
 	CFI_ENDPROC
 ENDPROC(_copy_to_user)
 
@@ -86,7 +95,9 @@ ENTRY(_copy_from_user)
 	jc bad_from_user
 	cmpq TI_addr_limit(%rax),%rcx
 	jae bad_from_user
-	ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,copy_user_generic_unrolled,copy_user_generic_string
+	ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,X86_FEATURE_ERMS,	\
+		copy_user_generic_unrolled,copy_user_generic_string,	\
+		copy_user_enhanced_fast_string
 	CFI_ENDPROC
 ENDPROC(_copy_from_user)
 
@@ -255,3 +266,37 @@ ENTRY(copy_user_generic_string)
 	.previous
 	CFI_ENDPROC
 ENDPROC(copy_user_generic_string)
+
+/*
+ * Some CPUs are adding enhanced REP MOVSB/STOSB instructions.
+ * It's recommended to use enhanced REP MOVSB/STOSB if it's enabled.
+ *
+ * Input:
+ * rdi destination
+ * rsi source
+ * rdx count
+ *
+ * Output:
+ * eax uncopied bytes or 0 if successful.
+ */
+ENTRY(copy_user_enhanced_fast_string)
+	CFI_STARTPROC
+	andl %edx,%edx
+	jz 2f
+	movl %edx,%ecx
+1:	rep
+	movsb
+2:	xorl %eax,%eax
+	ret
+
+	.section .fixup,"ax"
+12:	movl %ecx,%edx		/* ecx is zerorest also */
+	jmp copy_user_handle_tail
+	.previous
+
+	.section __ex_table,"a"
+	.align 8
+	.quad 1b,12b
+	.previous
+	CFI_ENDPROC
+ENDPROC(copy_user_enhanced_fast_string)
-- 
1.7.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 7/9] x86/lib/memcpy_64.S: Optimize memcpy by enhanced REP MOVSB/STOSB
  2011-05-17 22:29 [PATCH 0/9] Optimize string operations by enhanced REP MOVSB/STOSB Fenghua Yu
                   ` (5 preceding siblings ...)
  2011-05-17 22:29 ` [PATCH 6/9] x86/lib/copy_user_64.S: Support copy_to_user/copy_from_user by " Fenghua Yu
@ 2011-05-17 22:29 ` Fenghua Yu
  2011-05-18  6:35   ` Ingo Molnar
  2011-05-18 20:42   ` [tip:perf/core] x86, mem: memcpy_64.S: " tip-bot for Fenghua Yu
  2011-05-17 22:29 ` [PATCH 8/9] x86/lib/memmove_64.S: Optimize memmove " Fenghua Yu
  2011-05-17 22:29 ` [PATCH 9/9] x86/lib/memset_64.S: Optimize memset " Fenghua Yu
  8 siblings, 2 replies; 30+ messages in thread
From: Fenghua Yu @ 2011-05-17 22:29 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Asit K Mallick,
	Linus Torvalds, Avi Kivity, Arjan van de Ven, Andrew Morton,
	Andi Kleen
  Cc: linux-kernel, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Support memcpy() with enhanced rep movsb. On processors supporting enhanced
rep movsb, the alternative memcpy() function using enhanced rep movsb
overrides the original function and the fast string function.
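
The new memcpy_c_e variant is essentially "count in %ecx, rep movsb", i.e.
roughly this C sketch (made-up name; the 32-bit count truncation in the
assembly version is ignored here):

static void *memcpy_erms_sketch(void *dest, const void *src, size_t n)
{
	void *ret = dest;

	/* rep movsb copies %rcx bytes from (%rsi) to (%rdi) */
	asm volatile("rep movsb"
		     : "+D" (dest), "+S" (src), "+c" (n)
		     : : "memory");
	return ret;
}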

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/lib/memcpy_64.S |   45 ++++++++++++++++++++++++++++++++-------------
 1 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S
index 2a560bb..efbf2a0 100644
--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -4,6 +4,7 @@
 
 #include <asm/cpufeature.h>
 #include <asm/dwarf2.h>
+#include <asm/alternative-asm.h>
 
 /*
  * memcpy - Copy a memory block.
@@ -37,6 +38,23 @@
 .Lmemcpy_e:
 	.previous
 
+/*
+ * memcpy_c_e() - enhanced fast string memcpy. This is faster and simpler than
+ * memcpy_c. Use memcpy_c_e when possible.
+ *
+ * This gets patched over the unrolled variant (below) via the
+ * alternative instructions framework:
+ */
+	.section .altinstr_replacement, "ax", @progbits
+.Lmemcpy_c_e:
+	movq %rdi, %rax
+
+	movl %edx, %ecx
+	rep movsb
+	ret
+.Lmemcpy_e_e:
+	.previous
+
 ENTRY(__memcpy)
 ENTRY(memcpy)
 	CFI_STARTPROC
@@ -171,21 +189,22 @@ ENDPROC(memcpy)
 ENDPROC(__memcpy)
 
 	/*
-	 * Some CPUs run faster using the string copy instructions.
-	 * It is also a lot simpler. Use this when possible:
-	 */
-
-	.section .altinstructions, "a"
-	.align 8
-	.quad memcpy
-	.quad .Lmemcpy_c
-	.word X86_FEATURE_REP_GOOD
-
-	/*
+	 * Some CPUs are adding enhanced REP MOVSB/STOSB feature
+	 * If the feature is supported, memcpy_c_e() is the first choice.
+	 * If enhanced rep movsb copy is not available, use fast string copy
+	 * memcpy_c() when possible. This is faster and code is simpler than
+	 * original memcpy().
+	 * Otherwise, original memcpy() is used.
+	 * In .altinstructions section, ERMS feature is placed after REP_GOOD
+	 * feature to implement the right patch order.
+	 *
 	 * Replace only beginning, memcpy is used to apply alternatives,
 	 * so it is silly to overwrite itself with nops - reboot is the
 	 * only outcome...
 	 */
-	.byte .Lmemcpy_e - .Lmemcpy_c
-	.byte .Lmemcpy_e - .Lmemcpy_c
+	.section .altinstructions, "a"
+	altinstruction_entry memcpy,.Lmemcpy_c,X86_FEATURE_REP_GOOD,\
+			     .Lmemcpy_e-.Lmemcpy_c,.Lmemcpy_e-.Lmemcpy_c
+	altinstruction_entry memcpy,.Lmemcpy_c_e,X86_FEATURE_ERMS, \
+			     .Lmemcpy_e_e-.Lmemcpy_c_e,.Lmemcpy_e_e-.Lmemcpy_c_e
 	.previous
-- 
1.7.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 8/9] x86/lib/memmove_64.S: Optimize memmove by enhanced REP MOVSB/STOSB
  2011-05-17 22:29 [PATCH 0/9] Optimize string operations by enhanced REP MOVSB/STOSB Fenghua Yu
                   ` (6 preceding siblings ...)
  2011-05-17 22:29 ` [PATCH 7/9] x86/lib/memcpy_64.S: Optimize memcpy " Fenghua Yu
@ 2011-05-17 22:29 ` Fenghua Yu
  2011-05-18 20:43   ` [tip:perf/core] x86, mem: memmove_64.S: " tip-bot for Fenghua Yu
  2011-05-17 22:29 ` [PATCH 9/9] x86/lib/memset_64.S: Optimize memset " Fenghua Yu
  8 siblings, 1 reply; 30+ messages in thread
From: Fenghua Yu @ 2011-05-17 22:29 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Asit K Mallick,
	Linus Torvalds, Avi Kivity, Arjan van de Ven, Andrew Morton,
	Andi Kleen
  Cc: linux-kernel, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Support memmove() by enhanced rep movsb. On processors supporting enhanced
REP MOVSB/STOSB, the alternative memmove() function using enhanced rep movsb
overrides the original function.

The patch doesn't change the backward memmove case to use enhanced rep movsb;
only the forward path is patched.
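
For reference, the rewritten prologue chooses between the two paths roughly
as in this C sketch (made-up helper name); only the forward path gets the
enhanced rep movsb alternative:

/* Forward copy is safe unless the source overlaps the destination from
 * below, i.e. unless src < dest < src + n. */
static int memmove_copies_forward(const char *dest, const char *src, size_t n)
{
	return src >= dest || src + n <= dest;
}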

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/lib/memmove_64.S |   29 ++++++++++++++++++++++++++++-
 1 files changed, 28 insertions(+), 1 deletions(-)

diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
index 0ecb843..d0ec9c2 100644
--- a/arch/x86/lib/memmove_64.S
+++ b/arch/x86/lib/memmove_64.S
@@ -8,6 +8,7 @@
 #define _STRING_C
 #include <linux/linkage.h>
 #include <asm/dwarf2.h>
+#include <asm/cpufeature.h>
 
 #undef memmove
 
@@ -24,6 +25,7 @@
  */
 ENTRY(memmove)
 	CFI_STARTPROC
+
 	/* Handle more 32bytes in loop */
 	mov %rdi, %rax
 	cmp $0x20, %rdx
@@ -31,8 +33,13 @@ ENTRY(memmove)
 
 	/* Decide forward/backward copy mode */
 	cmp %rdi, %rsi
-	jb	2f
+	jge .Lmemmove_begin_forward
+	mov %rsi, %r8
+	add %rdx, %r8
+	cmp %rdi, %r8
+	jg 2f
 
+.Lmemmove_begin_forward:
 	/*
 	 * movsq instruction have many startup latency
 	 * so we handle small size by general register.
@@ -78,6 +85,8 @@ ENTRY(memmove)
 	rep movsq
 	movq %r11, (%r10)
 	jmp 13f
+.Lmemmove_end_forward:
+
 	/*
 	 * Handle data backward by movsq.
 	 */
@@ -194,4 +203,22 @@ ENTRY(memmove)
 13:
 	retq
 	CFI_ENDPROC
+
+	.section .altinstr_replacement,"ax"
+.Lmemmove_begin_forward_efs:
+	/* Forward moving data. */
+	movq %rdx, %rcx
+	rep movsb
+	retq
+.Lmemmove_end_forward_efs:
+	.previous
+
+	.section .altinstructions,"a"
+	.align 8
+	.quad .Lmemmove_begin_forward
+	.quad .Lmemmove_begin_forward_efs
+	.word X86_FEATURE_ERMS
+	.byte .Lmemmove_end_forward-.Lmemmove_begin_forward
+	.byte .Lmemmove_end_forward_efs-.Lmemmove_begin_forward_efs
+	.previous
 ENDPROC(memmove)
-- 
1.7.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 9/9] x86/lib/memset_64.S: Optimize memset by enhanced REP MOVSB/STOSB
  2011-05-17 22:29 [PATCH 0/9] Optimize string operations by enhanced REP MOVSB/STOSB Fenghua Yu
                   ` (7 preceding siblings ...)
  2011-05-17 22:29 ` [PATCH 8/9] x86/lib/memmove_64.S: Optimize memmove " Fenghua Yu
@ 2011-05-17 22:29 ` Fenghua Yu
  2011-05-18  2:57   ` Andi Kleen
  2011-05-18 20:43   ` [tip:perf/core] x86, mem: memset_64.S: " tip-bot for Fenghua Yu
  8 siblings, 2 replies; 30+ messages in thread
From: Fenghua Yu @ 2011-05-17 22:29 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Asit K Mallick,
	Linus Torvalds, Avi Kivity, Arjan van de Ven, Andrew Morton,
	Andi Kleen
  Cc: linux-kernel, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Support memset() with enhanced rep stosb. On processors supporting enhanced
REP MOVSB/STOSB, the alternative memset_c_e function using enhanced rep stosb
overrides the fast string alternative memset_c and the original function.
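
The new memset_c_e variant is essentially "value in %al, count in %ecx,
rep stosb", i.e. roughly this C sketch (made-up name; the 32-bit count detail
is ignored here):

static void *memset_erms_sketch(void *dest, int c, size_t n)
{
	void *ret = dest;

	/* rep stosb stores %al into %rcx consecutive bytes at (%rdi) */
	asm volatile("rep stosb"
		     : "+D" (dest), "+c" (n)
		     : "a" (c)
		     : "memory");
	return ret;
}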

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/lib/memset_64.S |   54 +++++++++++++++++++++++++++++++++++----------
 1 files changed, 42 insertions(+), 12 deletions(-)

diff --git a/arch/x86/lib/memset_64.S b/arch/x86/lib/memset_64.S
index 09d3442..79bd454 100644
--- a/arch/x86/lib/memset_64.S
+++ b/arch/x86/lib/memset_64.S
@@ -2,9 +2,13 @@
 
 #include <linux/linkage.h>
 #include <asm/dwarf2.h>
+#include <asm/cpufeature.h>
+#include <asm/alternative-asm.h>
 
 /*
- * ISO C memset - set a memory block to a byte value.
+ * ISO C memset - set a memory block to a byte value. This function uses fast
+ * string to get better performance than the original function. The code is
+ * simpler and shorter than the original function as well.
  *	
  * rdi   destination
  * rsi   value (char) 
@@ -31,6 +35,28 @@
 .Lmemset_e:
 	.previous
 
+/*
+ * ISO C memset - set a memory block to a byte value. This function uses
+ * enhanced rep stosb to override the fast string function.
+ * The code is simpler and shorter than the fast string function as well.
+ *
+ * rdi   destination
+ * rsi   value (char)
+ * rdx   count (bytes)
+ *
+ * rax   original destination
+ */
+	.section .altinstr_replacement, "ax", @progbits
+.Lmemset_c_e:
+	movq %rdi,%r9
+	movb %sil,%al
+	movl %edx,%ecx
+	rep stosb
+	movq %r9,%rax
+	ret
+.Lmemset_e_e:
+	.previous
+
 ENTRY(memset)
 ENTRY(__memset)
 	CFI_STARTPROC
@@ -112,16 +138,20 @@ ENTRY(__memset)
 ENDPROC(memset)
 ENDPROC(__memset)
 
-	/* Some CPUs run faster using the string instructions.
-	   It is also a lot simpler. Use this when possible */
-
-#include <asm/cpufeature.h>
-
+	/* Some CPUs support enhanced REP MOVSB/STOSB feature.
+	 * It is recommended to use this when possible.
+	 *
+	 * If enhanced REP MOVSB/STOSB feature is not available, use fast string
+	 * instructions.
+	 *
+	 * Otherwise, use original memset function.
+	 *
+	 * In .altinstructions section, ERMS feature is placed after REP_GOOD
+	 * feature to implement the right patch order.
+	 */
 	.section .altinstructions,"a"
-	.align 8
-	.quad memset
-	.quad .Lmemset_c
-	.word X86_FEATURE_REP_GOOD
-	.byte .Lfinal - memset
-	.byte .Lmemset_e - .Lmemset_c
+	altinstruction_entry memset,.Lmemset_c,X86_FEATURE_REP_GOOD,\
+			     .Lfinal-memset,.Lmemset_e-.Lmemset_c
+	altinstruction_entry memset,.Lmemset_c_e,X86_FEATURE_ERMS, \
+			     .Lfinal-memset,.Lmemset_e_e-.Lmemset_c_e
 	.previous
-- 
1.7.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [tip:x86/cpufeature] x86, cpufeature: Add CPU feature bit for enhanced REP MOVSB/STOSB
  2011-05-17 22:29 ` [PATCH 1/9] x86, cpu: Enable enhanced REP MOVSB/STOSB feature Fenghua Yu
@ 2011-05-17 23:13   ` tip-bot for Fenghua Yu
  0 siblings, 0 replies; 30+ messages in thread
From: tip-bot for Fenghua Yu @ 2011-05-17 23:13 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, fenghua.yu, tglx, hpa

Commit-ID:  724a92ee45c04cb9d82884a856b03b1e594d9de1
Gitweb:     http://git.kernel.org/tip/724a92ee45c04cb9d82884a856b03b1e594d9de1
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Tue, 17 May 2011 15:29:10 -0700
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Tue, 17 May 2011 14:56:36 -0700

x86, cpufeature: Add CPU feature bit for enhanced REP MOVSB/STOSB

Intel processors are adding enhancements to REP MOVSB/STOSB and the use of
REP MOVSB/STOSB for optimal memcpy/memset or similar functions is recommended.
Enhancement availability is indicated by CPUID.7.0.EBX[9] (Enhanced REP MOVSB/
STOSB).

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-2-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/cpufeature.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 91f3e087..7f2f7b1 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -195,6 +195,7 @@
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */
 #define X86_FEATURE_FSGSBASE	(9*32+ 0) /* {RD/WR}{FS/GS}BASE instructions*/
+#define X86_FEATURE_ERMS	(9*32+ 9) /* Enhanced REP MOVSB/STOSB */
 
 #if defined(__KERNEL__) && !defined(__ASSEMBLY__)
 

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/9] x86/kernel/cpu/intel.c: Initialize Enhanced REP MOVSB/STOSB
  2011-05-17 22:29 ` [PATCH 2/9] x86/kernel/cpu/intel.c: Initialize Enhanced REP MOVSB/STOSB Fenghua Yu
@ 2011-05-18  2:46   ` Andi Kleen
  2011-05-18  3:47     ` H. Peter Anvin
  2011-05-18 20:40   ` [tip:perf/core] x86, mem, intel: Initialize Enhanced REP MOVSB/STOSB tip-bot for Fenghua Yu
  1 sibling, 1 reply; 30+ messages in thread
From: Andi Kleen @ 2011-05-18  2:46 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Asit K Mallick,
	Linus Torvalds, Avi Kivity, Arjan van de Ven, Andrew Morton,
	Andi Kleen, linux-kernel, Fenghua Yu

> From: Fenghua Yu <fenghua.yu@intel.com>
>
> If kernel intends to use enhanced REP MOVSB/STOSB, it must ensure
> IA32_MISC_ENABLE.Fast_String_Enable (bit 0) is set and CPUID.(EAX=07H,
> ECX=0H):
> EBX[bit 9] also reports 1.

I suspect the check at this place is not too useful because it will
only work for the BSP. For all others it's too late -- the patching
has already happened.

So either this is a problem and then it should be checked on all CPUs.
Or maybe not at all.

The problem is that the alternative patching currently relies on being
run early with no other CPUs. It has no race protections, support
for cross modification etc.

While it would be possible to fix that it would be quite complicated
I bet.

So I think it's better to just remove it unless it's a real problem
in the field.

-Andi


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 9/9] x86/lib/memset_64.S: Optimize memset by enhanced REP MOVSB/STOSB
  2011-05-17 22:29 ` [PATCH 9/9] x86/lib/memset_64.S: Optimize memset " Fenghua Yu
@ 2011-05-18  2:57   ` Andi Kleen
  2011-05-18  3:09     ` Yu, Fenghua
  2011-05-18 20:43   ` [tip:perf/core] x86, mem: memset_64.S: " tip-bot for Fenghua Yu
  1 sibling, 1 reply; 30+ messages in thread
From: Andi Kleen @ 2011-05-18  2:57 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Asit K Mallick,
	Linus Torvalds, Avi Kivity, Arjan van de Ven, Andrew Morton,
	Andi Kleen, linux-kernel, Fenghua Yu

> From: Fenghua Yu <fenghua.yu@intel.com>
>
> Support memset() with enhanced rep stosb. On processors supporting
> enhanced
> REP MOVSB/STOSB, the alternative memset_c_e function using enhanced rep
> stosb
> overrides the fast string alternative memset_c and the original function.

FWIW most memsets and memcpys are generated by modern gccs as inline code,
depending on alignment etc., so will never call your new function.
Same may be true for memmove (not fully sure)

One way to work around this would be to add suitable logic
to the string.h macros and make sure the out of line code is always
called for large copies if the count is constant and large enough.

There used to be such logic, but it was removed partly later.

The only problem is that it's hard to decide if the count is variable
and where a good threshold is.
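
As a rough illustration of the idea (hypothetical sketch; the 512-byte
threshold is arbitrary, and __memcpy is the existing out-of-line symbol from
arch/x86/lib/memcpy_64.S):

#define memcpy(dst, src, len)					\
	(__builtin_constant_p(len) && (len) >= 512 ?		\
		__memcpy((dst), (src), (len)) :			\
		__builtin_memcpy((dst), (src), (len)))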

Or maybe it would be better to just fix gcc to use the new instructions,
but then it would be difficult to patch them in.

-Andi



^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH 9/9] x86/lib/memset_64.S: Optimize memset by enhanced REP MOVSB/STOSB
  2011-05-18  2:57   ` Andi Kleen
@ 2011-05-18  3:09     ` Yu, Fenghua
  2011-05-18  4:05       ` Andi Kleen
  0 siblings, 1 reply; 30+ messages in thread
From: Yu, Fenghua @ 2011-05-18  3:09 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Mallick, Asit K,
	Linus Torvalds, Avi Kivity, Arjan van de Ven, Andrew Morton,
	linux-kernel

> From: Andi Kleen [mailto:andi@firstfloor.org]
> Sent: Tuesday, May 17, 2011 7:58 PM
> To: Yu, Fenghua
> Cc: Ingo Molnar; Thomas Gleixner; H Peter Anvin; Mallick, Asit K; Linus
> Torvalds; Avi Kivity; Arjan van de Ven; Andrew Morton; Andi Kleen;
> linux-kernel; Yu, Fenghua
> Subject: Re: [PATCH 9/9] x86/lib/memset_64.S: Optimize memset by
> enhanced REP MOVSB/STOSB
> 
> > From: Fenghua Yu <fenghua.yu@intel.com>
> >
> > Support memset() with enhanced rep stosb. On processors supporting
> > enhanced
> > REP MOVSB/STOSB, the alternative memset_c_e function using enhanced
> rep
> > stosb
> > overrides the fast string alternative memset_c and the original
> function.
> 
> FWIW most memsets and memcpys are generated by modern gccs as inline
> code,
> depending on alignment etc., so will never call your new function.
> Same may be true for memmove (not fully sure)
> 
> One way to work around this would be to add suitable logic
> to the string.h macros and make sure the out of line code is always
> called for large copies if the count is constant and large enough.
> 
> There used to be such logic, but it was removed partly later.
> 
> The only problem is that it's hard to decide if the count is variable
> and where a good threshold is.
> 
> Or maybe it would be better to just fix gcc to use the new
> instructions,
> but then it would be difficult to patch them in.

Only memcpy is generated by gcc when the gcc version is >= 4.3. The other
functions are defined by the kernel lib.

I would leave the gcc optimization in place for most memcpy cases instead of
forcing memcpy to call the kernel lib memcpy. I hope gcc will catch up and
implement a good enhanced rep movsb/stosb solution soon. If it turns out gcc
cannot generate good memcpy, it's easy to switch to the patched kernel lib
memcpy.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/9] x86/kernel/cpu/intel.c: Initialize Enhanced REP MOVSB/STOSB
  2011-05-18  2:46   ` Andi Kleen
@ 2011-05-18  3:47     ` H. Peter Anvin
  0 siblings, 0 replies; 30+ messages in thread
From: H. Peter Anvin @ 2011-05-18  3:47 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Fenghua Yu, Ingo Molnar, Thomas Gleixner, Asit K Mallick,
	Linus Torvalds, Avi Kivity, Arjan van de Ven, Andrew Morton,
	linux-kernel

On 05/17/2011 07:46 PM, Andi Kleen wrote:
>> From: Fenghua Yu <fenghua.yu@intel.com>
>>
>> If kernel intends to use enhanced REP MOVSB/STOSB, it must ensure
>> IA32_MISC_ENABLE.Fast_String_Enable (bit 0) is set and CPUID.(EAX=07H,
>> ECX=0H):
>> EBX[bit 9] also reports 1.
> 
> I suspect the check at this place is not too useful because it will
> only work for the BSP. For all others it's too late -- the patching
> has already happened.
> 
> So either this is a problem and then it should be checked on all CPUs.
> Or maybe not at all.
> 
> The problem is that the alternative patching currently relies on being
> run early with no other CPUs. It has no race protections, support
> for cross modification etc.
> 
> While it would be possible to fix that it would be quite complicated
> I bet.
> 
> So I think it's better to just remove it unless it's a real problem
> in the field.
> 

The reason for having it is that if the BIOS has a chicken switch that
disables FAST_STRING (and it's the BIOS' responsibility to do it on all
CPUs) then this will make the kernel honor this.

	-hpa

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH 9/9] x86/lib/memset_64.S: Optimize memset by enhanced REP MOVSB/STOSB
  2011-05-18  3:09     ` Yu, Fenghua
@ 2011-05-18  4:05       ` Andi Kleen
  2011-05-18 18:33         ` Yu, Fenghua
  0 siblings, 1 reply; 30+ messages in thread
From: Andi Kleen @ 2011-05-18  4:05 UTC (permalink / raw)
  To: Yu, Fenghua
  Cc: Andi Kleen, Ingo Molnar, Thomas Gleixner, H Peter Anvin, Mallick,
	Asit K, Linus Torvalds, Avi Kivity, Arjan van de Ven,
	Andrew Morton, linux-kernel


>
> Only memcpy are generated by gcc when gcc version >=4.3. Other functions
> are defined by kernel lib.

Are you sure? AFAIK it supports more.

> I would leave gcc optimization for most memcpy cases instead of forcing
> memcpy to call the kernel lib memcpy. I hope gcc will catch up and
> implement a good enhanced rep movsb/stosb solution soon. If turns out gcc
> can not generate good memcpy, it's easy to switch to the patching kernel
> lib memcpy.

The problem is that gcc can only do that if you tell it to generate
code for that. But it has no mechanism to patch in/out different
variants for the same binary. So it would only work for a specially
optimized kernel for that CPU.

I suspect for smaller copies it won't make too much difference anyway
and gcc's code is probably fine. But gcc won't know that you
can do better on large copies, so using a macro would be a way
to tell it that.

-Andi


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 7/9] x86/lib/memcpy_64.S: Optimize memcpy by enhanced REP MOVSB/STOSB
  2011-05-17 22:29 ` [PATCH 7/9] x86/lib/memcpy_64.S: Optimize memcpy " Fenghua Yu
@ 2011-05-18  6:35   ` Ingo Molnar
  2011-05-18 19:04     ` Yu, Fenghua
  2011-05-18 20:42   ` [tip:perf/core] x86, mem: memcpy_64.S: " tip-bot for Fenghua Yu
  1 sibling, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2011-05-18  6:35 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, H Peter Anvin, Asit K Mallick, Linus Torvalds,
	Avi Kivity, Arjan van de Ven, Andrew Morton, Andi Kleen,
	linux-kernel


* Fenghua Yu <fenghua.yu@intel.com> wrote:

> From: Fenghua Yu <fenghua.yu@intel.com>
> 
> Support memcpy() with enhanced rep movsb. On processors supporting enhanced 
> rep movsb, the alternative memcpy() function using enhanced rep movsb 
> overrides the original function and the fast string function.
> 
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> ---
>  arch/x86/lib/memcpy_64.S |   45 ++++++++++++++++++++++++++++++++-------------
>  1 files changed, 32 insertions(+), 13 deletions(-)

>  ENDPROC(__memcpy)
>  
>  	/*
> -	 * Some CPUs run faster using the string copy instructions.
> -	 * It is also a lot simpler. Use this when possible:
> -	 */
> -
> -	.section .altinstructions, "a"
> -	.align 8
> -	.quad memcpy
> -	.quad .Lmemcpy_c
> -	.word X86_FEATURE_REP_GOOD
> -
> -	/*
> +	 * Some CPUs are adding enhanced REP MOVSB/STOSB feature
> +	 * If the feature is supported, memcpy_c_e() is the first choice.
> +	 * If enhanced rep movsb copy is not available, use fast string copy
> +	 * memcpy_c() when possible. This is faster and code is simpler than
> +	 * original memcpy().

Please use more obvious names than cryptic and meaningless _c and _c_e 
postfixes. We do not repeat these many times.

Also, did you know about the 'perf bench mem memcpy' tool prototype we have in 
the kernel tree? It is intended to check and evaluate exactly the patches you 
are offering here. The code lives in:

  tools/perf/bench/mem-memcpy-arch.h
  tools/perf/bench/mem-memcpy.c
  tools/perf/bench/mem-memcpy-x86-64-asm-def.h
  tools/perf/bench/mem-memcpy-x86-64-asm.S

Please look into testing (fixing if needed), using and extending it:

 - We want to measure the alternatives variants as well, not just the generic one

 - We want to measure memmove, memclear, etc. operations as well, not just 
   memcpy

 - We want cache-cold and cache-hot numbers as well, going along multiple sizes

This tool can also useful when developing these changes: they can be tested in 
user-space and can be iterated very quickly, without having to build and 
booting the kernel.

We can commit any enhancements/fixes you do to perf bench alongside your memcpy 
patches. All in one, such measurements will make it much easier for us to apply 
the patches.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH 9/9] x86/lib/memset_64.S: Optimize memset by enhanced REP MOVSB/STOSB
  2011-05-18  4:05       ` Andi Kleen
@ 2011-05-18 18:33         ` Yu, Fenghua
  2011-05-18 18:39           ` Andi Kleen
  0 siblings, 1 reply; 30+ messages in thread
From: Yu, Fenghua @ 2011-05-18 18:33 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Mallick, Asit K,
	Linus Torvalds, Avi Kivity, Arjan van de Ven, Andrew Morton,
	linux-kernel

> -----Original Message-----
> From: Andi Kleen [mailto:andi@firstfloor.org]
> Sent: Tuesday, May 17, 2011 9:05 PM
> To: Yu, Fenghua
> Cc: Andi Kleen; Ingo Molnar; Thomas Gleixner; H Peter Anvin; Mallick,
> Asit K; Linus Torvalds; Avi Kivity; Arjan van de Ven; Andrew Morton;
> linux-kernel
> Subject: RE: [PATCH 9/9] x86/lib/memset_64.S: Optimize memset by
> enhanced REP MOVSB/STOSB
> > Only memcpy are generated by gcc when gcc version >=4.3. Other
> functions
> > are defined by kernel lib.
> 
> Are you sure? AFAIK it supports more.

I use gcc 4.3.2 installed by FC10 to build the kernel with defconfig. Only
memcpy is built with the gcc builtin as inline memcpy. All of the others
(i.e. memset, clear_page, memmove, and copy_user) call the kernel lib.

It's easy to check this by disassembling the kernel binary.

Gcc 4.3.2 and FC10 are old but not so old. They have this capability.

> 
> > I would leave gcc optimization for most memcpy cases instead of
> forcing
> > memcpy to call the kernel lib memcpy. I hope gcc will catch up and
> > implement a good enhanced rep movsb/stosb solution soon. If turns out
> gcc
> > can not generate good memcpy, it's easy to switch to the patching
> kernel
> > lib memcpy.
> 
> The problem is that gcc can only do that if you tell it to generate
> code for that. But it has no mechanism to patch in/out different
> variants for the same binary. So it would only work for a specially
> optimized kernel for that CPU.
> 
> I suspect for smaller copies it won't make too much different anyways
> and gcc's code is probably fine. But gcc won't know that you
> can do better on large copies, so using a macro would be a way
> to tell it that.
> 
> -Andi

I absolutely agree with you on that. For example, gcc builds memcpy as
inlined rep movsb for big copies. This works fine on enhanced rep movsb/stosb
processors. But it doesn't work as well as the kernel lib memcpy on
non-enhanced rep movsb/stosb processors, which are most of the current
machines in the market.

I discussed this issue with others before. It seems people would like to wait
for an enhanced rep movsb/stosb enabled gcc to arrive and then compare the
performance data of the gcc version and the kernel lib version to decide
which way to go.

With the patch set, at least on gcc 4.3.2, the optimization works fine except
for memcpy.

If people don't want to wait for gcc to optimize the mem lib with ERMS, it's
easy to force those functions to use the lib functions. I can send a small
patch for string_64/32.h to do so.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 9/9] x86/lib/memset_64.S: Optimize memset by enhanced REP MOVSB/STOSB
  2011-05-18 18:33         ` Yu, Fenghua
@ 2011-05-18 18:39           ` Andi Kleen
  2011-05-18 18:47             ` Ingo Molnar
  2011-05-18 18:49             ` Yu, Fenghua
  0 siblings, 2 replies; 30+ messages in thread
From: Andi Kleen @ 2011-05-18 18:39 UTC (permalink / raw)
  To: Yu, Fenghua
  Cc: Andi Kleen, Ingo Molnar, Thomas Gleixner, H Peter Anvin, Mallick,
	Asit K, Linus Torvalds, Avi Kivity, Arjan van de Ven,
	Andrew Morton, linux-kernel

> I use gcc 4.3.2 installed by FC10 to build kernel with defconfig. Only memcpy is built with gcc builtin and inline memcpy. All of others (i.e. memset, clear_page, memmove, and copy_user) call the kernel lib.
> 
> It's easy to check this by disassembling kernel binary.

gcc has a complex set of heuristics. For example if it cannot decide
the length or the alignment it calls the kernel code. Otherwise
it inlines.

So just looking at a few examples won't give you the whole picture.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 9/9] x86/lib/memset_64.S: Optimize memset by enhanced REP MOVSB/STOSB
  2011-05-18 18:39           ` Andi Kleen
@ 2011-05-18 18:47             ` Ingo Molnar
  2011-05-18 18:49             ` Yu, Fenghua
  1 sibling, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2011-05-18 18:47 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Yu, Fenghua, Thomas Gleixner, H Peter Anvin, Mallick, Asit K,
	Linus Torvalds, Avi Kivity, Arjan van de Ven, Andrew Morton,
	linux-kernel

* Andi Kleen <andi@firstfloor.org> wrote:

> > I use gcc 4.3.2 installed by FC10 to build kernel with defconfig. Only memcpy is built with gcc builtin and inline memcpy. All of others (i.e. memset, clear_page, memmove, and copy_user) call the kernel lib.
> > 
> > It's easy to check this by disassembling kernel binary.
> 
> gcc has a complex set of heuristics. For example if it cannot decide the 
> length or the alignment it calls the kernel code. Otherwise it inlines.
> 
> So just looking at a few examples won't give you the whole picture.

Well, your generic, vague claims about what GCC could possibly do do not
count for much when weighed against the hard data he provided.

He used a defconfig and reported his results and has proven that your claim is 
wrong in fairly common circumstances.

If you think the result is or should be different provide a config and test it 
yourself, contradicting/countering the data he provided.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH 9/9] x86/lib/memset_64.S: Optimize memset by enhanced REP MOVSB/STOSB
  2011-05-18 18:39           ` Andi Kleen
  2011-05-18 18:47             ` Ingo Molnar
@ 2011-05-18 18:49             ` Yu, Fenghua
  1 sibling, 0 replies; 30+ messages in thread
From: Yu, Fenghua @ 2011-05-18 18:49 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Mallick, Asit K,
	Linus Torvalds, Avi Kivity, Arjan van de Ven, Andrew Morton,
	linux-kernel

> -----Original Message-----
> From: Andi Kleen [mailto:andi@firstfloor.org]
> Sent: Wednesday, May 18, 2011 11:39 AM
> To: Yu, Fenghua
> Cc: Andi Kleen; Ingo Molnar; Thomas Gleixner; H Peter Anvin; Mallick,
> Asit K; Linus Torvalds; Avi Kivity; Arjan van de Ven; Andrew Morton;
> linux-kernel
> Subject: Re: [PATCH 9/9] x86/lib/memset_64.S: Optimize memset by
> enhanced REP MOVSB/STOSB
> 
> > I use gcc 4.3.2 installed by FC10 to build kernel with defconfig.
> Only memcpy is built with gcc builtin and inline memcpy. All of others
> (i.e. memset, clear_page, memmove, and copy_user) call the kernel lib.
> >
> > It's easy to check this by disassembling kernel binary.
> 
> gcc has a complex set of heuristics. For example if it cannot decide
> the length or the alignment it calls the kernel code. Otherwise
> it inlines.
> 
> So just looking at a few examples won't give you the whole picture.

The way gcc 4.3.2 handles memcpy in the kernel is to emit inline code. Gcc
4.3.2 doesn't generate code to call the kernel memcpy. Other versions may do
this, but not 4.3.2.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH 7/9] x86/lib/memcpy_64.S: Optimize memcpy by enhanced REP MOVSB/STOSB
  2011-05-18  6:35   ` Ingo Molnar
@ 2011-05-18 19:04     ` Yu, Fenghua
  0 siblings, 0 replies; 30+ messages in thread
From: Yu, Fenghua @ 2011-05-18 19:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, H Peter Anvin, Mallick, Asit K, Linus Torvalds,
	Avi Kivity, Arjan van de Ven, Andrew Morton, Andi Kleen,
	linux-kernel

> -----Original Message-----
> From: Ingo Molnar [mailto:mingo@elte.hu]
> Sent: Tuesday, May 17, 2011 11:36 PM
> To: Yu, Fenghua
> Cc: Thomas Gleixner; H Peter Anvin; Mallick, Asit K; Linus Torvalds;
> Avi Kivity; Arjan van de Ven; Andrew Morton; Andi Kleen; linux-kernel
> Subject: Re: [PATCH 7/9] x86/lib/memcpy_64.S: Optimize memcpy by
> enhanced REP MOVSB/STOSB
> 
> 
> * Fenghua Yu <fenghua.yu@intel.com> wrote:
> 
> > From: Fenghua Yu <fenghua.yu@intel.com>
> >
> > Support memcpy() with enhanced rep movsb. On processors supporting
> enhanced
> > rep movsb, the alternative memcpy() function using enhanced rep movsb
> > overrides the original function and the fast string function.
> >
> > Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> > ---
> >  arch/x86/lib/memcpy_64.S |   45 ++++++++++++++++++++++++++++++++----
> ---------
> >  1 files changed, 32 insertions(+), 13 deletions(-)
> 
> >  ENDPROC(__memcpy)
> >
> >  	/*
> > -	 * Some CPUs run faster using the string copy instructions.
> > -	 * It is also a lot simpler. Use this when possible:
> > -	 */
> > -
> > -	.section .altinstructions, "a"
> > -	.align 8
> > -	.quad memcpy
> > -	.quad .Lmemcpy_c
> > -	.word X86_FEATURE_REP_GOOD
> > -
> > -	/*
> > +	 * Some CPUs are adding enhanced REP MOVSB/STOSB feature
> > +	 * If the feature is supported, memcpy_c_e() is the first choice.
> > +	 * If enhanced rep movsb copy is not available, use fast string
> copy
> > +	 * memcpy_c() when possible. This is faster and code is simpler
> than
> > +	 * original memcpy().
> 
> Please use more obvious names than cryptic and meaningless _c and _c_e
> postfixes. We do not repeat these many times.
> 
> Also, did you know about the 'perf bench mem memcpy' tool prototype we
> have in
> the kernel tree? It is intended to check and evaluate exactly the
> patches you
> are offering here. The code lives in:
> 
>   tools/perf/bench/mem-memcpy-arch.h
>   tools/perf/bench/mem-memcpy.c
>   tools/perf/bench/mem-memcpy-x86-64-asm-def.h
>   tools/perf/bench/mem-memcpy-x86-64-asm.S
> 
> Please look into testing (fixing if needed), using and extending it:
> 
>  - We want to measure the alternatives variants as well, not just the
> generic one
> 
>  - We want to measure memmove, memclear, etc. operations as well, not
> just
>    memcpy
> 
>  - We want cache-cold and cache-hot numbers as well, going along
> multiple sizes
> 
> This tool can also useful when developing these changes: they can be
> tested in
> user-space and can be iterated very quickly, without having to build
> and
> booting the kernel.
> 
> We can commit any enhancements/fixes you do to perf bench alongside
> your memcpy
> patches. All in one, such measurements will make it much easier for us
> to apply
> the patches.
> 
> Thanks,
> 
> 	Ingo

I'll work on the bench tool and will let you know when it's ready.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [tip:perf/core] x86, mem, intel: Initialize Enhanced REP MOVSB/STOSB
  2011-05-17 22:29 ` [PATCH 2/9] x86/kernel/cpu/intel.c: Initialize Enhanced REP MOVSB/STOSB Fenghua Yu
  2011-05-18  2:46   ` Andi Kleen
@ 2011-05-18 20:40   ` tip-bot for Fenghua Yu
  1 sibling, 0 replies; 30+ messages in thread
From: tip-bot for Fenghua Yu @ 2011-05-18 20:40 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, fenghua.yu, tglx, hpa

Commit-ID:  161ec53c702ce9df2f439804dfb9331807066daa
Gitweb:     http://git.kernel.org/tip/161ec53c702ce9df2f439804dfb9331807066daa
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Tue, 17 May 2011 15:29:11 -0700
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Tue, 17 May 2011 15:40:23 -0700

x86, mem, intel: Initialize Enhanced REP MOVSB/STOSB

If kernel intends to use enhanced REP MOVSB/STOSB, it must ensure
IA32_MISC_ENABLE.Fast_String_Enable (bit 0) is set and CPUID.(EAX=07H, ECX=0H):
EBX[bit 9] also reports 1.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-3-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/kernel/cpu/intel.c |   19 +++++++++++++++----
 1 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index df86bc8..fc73a34 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -29,10 +29,10 @@
 
 static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
 {
+	u64 misc_enable;
+
 	/* Unmask CPUID levels if masked: */
 	if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
-		u64 misc_enable;
-
 		rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
 
 		if (misc_enable & MSR_IA32_MISC_ENABLE_LIMIT_CPUID) {
@@ -118,8 +118,6 @@ static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
 	 * (model 2) with the same problem.
 	 */
 	if (c->x86 == 15) {
-		u64 misc_enable;
-
 		rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
 
 		if (misc_enable & MSR_IA32_MISC_ENABLE_FAST_STRING) {
@@ -130,6 +128,19 @@ static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
 		}
 	}
 #endif
+
+	/*
+	 * If fast string is not enabled in IA32_MISC_ENABLE for any reason,
+	 * clear the fast string and enhanced fast string CPU capabilities.
+	 */
+	if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
+		rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
+		if (!(misc_enable & MSR_IA32_MISC_ENABLE_FAST_STRING)) {
+			printk(KERN_INFO "Disabled fast string operations\n");
+			setup_clear_cpu_cap(X86_FEATURE_REP_GOOD);
+			setup_clear_cpu_cap(X86_FEATURE_ERMS);
+		}
+	}
 }
 
 #ifdef CONFIG_X86_32

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [tip:perf/core] x86, alternative, doc: Add comment for applying alternatives order
  2011-05-17 22:29 ` [PATCH 3/9] x86/kernel/alternative.c: Add comment for applying alternatives order Fenghua Yu
@ 2011-05-18 20:40   ` tip-bot for Fenghua Yu
  0 siblings, 0 replies; 30+ messages in thread
From: tip-bot for Fenghua Yu @ 2011-05-18 20:40 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, fenghua.yu, tglx, hpa

Commit-ID:  509731336313b3799cf03071d72c64fa6383895e
Gitweb:     http://git.kernel.org/tip/509731336313b3799cf03071d72c64fa6383895e
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Tue, 17 May 2011 15:29:12 -0700
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Tue, 17 May 2011 15:40:25 -0700

x86, alternative, doc: Add comment for applying alternatives order

Some string operation functions may be patched twice, e.g. on enhanced REP
MOVSB/STOSB processors, memcpy is patched first with the fast string
alternative function and then with the enhanced REP MOVSB/STOSB alternative
function.

Add a comment on the order in which alternatives are applied, to warn anyone
who may want to change that order for any reason.

[ Documentation-only patch ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-4-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/kernel/alternative.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 4a23467..f4fe15d 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -210,6 +210,15 @@ void __init_or_module apply_alternatives(struct alt_instr *start,
 	u8 insnbuf[MAX_PATCH_LEN];
 
 	DPRINTK("%s: alt table %p -> %p\n", __func__, start, end);
+	/*
+	 * The scan order should be from start to end. A later scanned
+	 * alternative code can overwrite a previous scanned alternative code.
+	 * Some kernel functions (e.g. memcpy, memset, etc) use this order to
+	 * patch code.
+	 *
+	 * So be careful if you want to change the scan order to any other
+	 * order.
+	 */
 	for (a = start; a < end; a++) {
 		u8 *instr = a->instr;
 		BUG_ON(a->replacementlen > a->instrlen);
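
For illustration, the ordering rule described in the new comment can be
modelled with a toy C loop (hypothetical types and names, nothing from the
kernel): entries are scanned from start to end, so a later entry whose
feature bit is set overwrites whatever an earlier entry installed.

#include <stdio.h>
#include <string.h>

/* Toy stand-in for one .altinstructions entry. */
struct toy_alt {
	const char *feature;		/* required CPU feature */
	const char *replacement;	/* what to install if the feature is set */
};

static int toy_cpu_has(const char *feature)
{
	/* Pretend the CPU has both fast strings and ERMS. */
	return !strcmp(feature, "REP_GOOD") || !strcmp(feature, "ERMS");
}

int main(void)
{
	const char *memcpy_impl = "unrolled";	/* default implementation */
	struct toy_alt alts[] = {
		{ "REP_GOOD", "rep movsq" },	/* patched in first ...      */
		{ "ERMS",     "rep movsb" },	/* ... then overwritten here */
	};
	unsigned int i;

	for (i = 0; i < sizeof(alts) / sizeof(alts[0]); i++)
		if (toy_cpu_has(alts[i].feature))
			memcpy_impl = alts[i].replacement;

	printf("memcpy uses: %s\n", memcpy_impl);	/* prints "rep movsb" */
	return 0;
}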

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [tip:perf/core] x86, alternative: Add altinstruction_entry macro
  2011-05-17 22:29 ` [PATCH 4/9] x86, alternative-asm.h: Add altinstruction_entry macro Fenghua Yu
@ 2011-05-18 20:41   ` tip-bot for Fenghua Yu
  0 siblings, 0 replies; 30+ messages in thread
From: tip-bot for Fenghua Yu @ 2011-05-18 20:41 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, fenghua.yu, tglx, hpa

Commit-ID:  9072d11da15a71e086eab3b5085184f2c1d06913
Gitweb:     http://git.kernel.org/tip/9072d11da15a71e086eab3b5085184f2c1d06913
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Tue, 17 May 2011 15:29:13 -0700
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Tue, 17 May 2011 15:40:25 -0700

x86, alternative: Add altinstruction_entry macro

Add altinstruction_entry macro to generate .altinstructions section
entries from assembly code.  This should be less failure-prone than
open-coding.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-5-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/include/asm/alternative-asm.h |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/alternative-asm.h b/arch/x86/include/asm/alternative-asm.h
index a63a68b..94d420b 100644
--- a/arch/x86/include/asm/alternative-asm.h
+++ b/arch/x86/include/asm/alternative-asm.h
@@ -15,4 +15,13 @@
 	.endm
 #endif
 
+.macro altinstruction_entry orig alt feature orig_len alt_len
+	.align 8
+	.quad \orig
+	.quad \alt
+	.word \feature
+	.byte \orig_len
+	.byte \alt_len
+.endm
+
 #endif  /*  __ASSEMBLY__  */
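
Each altinstruction_entry invocation emits one record into .altinstructions;
read from C, the field layout the macro produces corresponds to something
like the struct below (an illustrative mirror of the macro's output on
x86-64, not a copy of the kernel's struct definition):

#include <stdint.h>

/*
 * Mirrors ".align 8; .quad orig; .quad alt; .word feature;
 * .byte orig_len; .byte alt_len" as emitted by the macro above.
 */
struct alt_entry {
	uint64_t orig;		/* address of the original instruction(s) */
	uint64_t alt;		/* address of the replacement code */
	uint16_t feature;	/* X86_FEATURE_* bit enabling the replacement */
	uint8_t  orig_len;	/* length of the original code, >= alt_len */
	uint8_t  alt_len;	/* length of the replacement code */
} __attribute__((aligned(8)));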

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [tip:perf/core] x86, mem: clear_page_64.S: Support clear_page() with enhanced REP MOVSB/STOSB
  2011-05-17 22:29 ` [PATCH 5/9] x86/lib/clear_page_64.S: Support clear_page() with enhanced REP MOVSB/STOSB Fenghua Yu
@ 2011-05-18 20:41   ` tip-bot for Fenghua Yu
  0 siblings, 0 replies; 30+ messages in thread
From: tip-bot for Fenghua Yu @ 2011-05-18 20:41 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, fenghua.yu, tglx, hpa

Commit-ID:  e365c9df2f2f001450decf9512412d2d5bd1cdef
Gitweb:     http://git.kernel.org/tip/e365c9df2f2f001450decf9512412d2d5bd1cdef
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Tue, 17 May 2011 15:29:14 -0700
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Tue, 17 May 2011 15:40:27 -0700

x86, mem: clear_page_64.S: Support clear_page() with enhanced REP MOVSB/STOSB

Intel processors are adding enhancements to REP MOVSB/STOSB and the use of
REP MOVSB/STOSB for optimal memcpy/memset or similar functions is recommended.
Enhancement availability is indicated by CPUID.7.0.EBX[9] (Enhanced REP MOVSB/
STOSB).

Support clear_page() with rep stosb on processors supporting enhanced REP
MOVSB/STOSB. On such processors, the alternative clear_page_c_e function,
which uses enhanced REP STOSB, overrides both the original function and the
fast string function.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-6-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/lib/clear_page_64.S |   33 ++++++++++++++++++++++++---------
 1 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
index aa4326b..f2145cf 100644
--- a/arch/x86/lib/clear_page_64.S
+++ b/arch/x86/lib/clear_page_64.S
@@ -1,5 +1,6 @@
 #include <linux/linkage.h>
 #include <asm/dwarf2.h>
+#include <asm/alternative-asm.h>
 
 /*
  * Zero a page. 	
@@ -14,6 +15,15 @@ ENTRY(clear_page_c)
 	CFI_ENDPROC
 ENDPROC(clear_page_c)
 
+ENTRY(clear_page_c_e)
+	CFI_STARTPROC
+	movl $4096,%ecx
+	xorl %eax,%eax
+	rep stosb
+	ret
+	CFI_ENDPROC
+ENDPROC(clear_page_c_e)
+
 ENTRY(clear_page)
 	CFI_STARTPROC
 	xorl   %eax,%eax
@@ -38,21 +48,26 @@ ENTRY(clear_page)
 .Lclear_page_end:
 ENDPROC(clear_page)
 
-	/* Some CPUs run faster using the string instructions.
-	   It is also a lot simpler. Use this when possible */
+	/*
+	 * Some CPUs support enhanced REP MOVSB/STOSB instructions.
+	 * It is recommended to use this when possible.
+	 * If enhanced REP MOVSB/STOSB is not available, try to use fast string.
+	 * Otherwise, use original function.
+	 *
+	 */
 
 #include <asm/cpufeature.h>
 
 	.section .altinstr_replacement,"ax"
 1:	.byte 0xeb					/* jmp <disp8> */
 	.byte (clear_page_c - clear_page) - (2f - 1b)	/* offset */
-2:
+2:	.byte 0xeb					/* jmp <disp8> */
+	.byte (clear_page_c_e - clear_page) - (3f - 2b)	/* offset */
+3:
 	.previous
 	.section .altinstructions,"a"
-	.align 8
-	.quad clear_page
-	.quad 1b
-	.word X86_FEATURE_REP_GOOD
-	.byte .Lclear_page_end - clear_page
-	.byte 2b - 1b
+	altinstruction_entry clear_page,1b,X86_FEATURE_REP_GOOD,\
+			     .Lclear_page_end-clear_page, 2b-1b
+	altinstruction_entry clear_page,2b,X86_FEATURE_ERMS,   \
+			     .Lclear_page_end-clear_page,3b-2b
 	.previous
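
For illustration, the clear_page_c_e body added above amounts to "store 4096
zero bytes with REP STOSB"; a user-space C sketch of the same idea
(hypothetical helper, GCC-style inline asm, x86-64 only):

#include <stddef.h>

#define PAGE_SIZE 4096

/* Zero one page with REP STOSB, as clear_page_c_e does above. */
static void clear_page_stosb(void *page)
{
	void *dst = page;
	size_t count = PAGE_SIZE;

	asm volatile("rep stosb"
		     : "+D" (dst), "+c" (count)	/* rdi = dest, rcx = count */
		     : "a" (0)			/* al = 0, the fill byte */
		     : "memory");
}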

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [tip:perf/core] x86, mem: copy_user_64.S: Support copy_to/from_user by enhanced REP MOVSB/STOSB
  2011-05-17 22:29 ` [PATCH 6/9] x86/lib/copy_user_64.S: Support copy_to_user/copy_from_user by " Fenghua Yu
@ 2011-05-18 20:42   ` tip-bot for Fenghua Yu
  0 siblings, 0 replies; 30+ messages in thread
From: tip-bot for Fenghua Yu @ 2011-05-18 20:42 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, fenghua.yu, tglx, hpa

Commit-ID:  4307bec9344aed83f8107c3eb4285bd9d218fc10
Gitweb:     http://git.kernel.org/tip/4307bec9344aed83f8107c3eb4285bd9d218fc10
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Tue, 17 May 2011 15:29:15 -0700
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Tue, 17 May 2011 15:40:28 -0700

x86, mem: copy_user_64.S: Support copy_to/from_user by enhanced REP MOVSB/STOSB

Support copy_to_user()/copy_from_user() with enhanced REP MOVSB/STOSB.
On processors supporting enhanced REP MOVSB/STOSB, the alternative
copy_user_enhanced_fast_string function, which uses enhanced rep movsb,
overrides both the original function and the fast string function.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-7-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/lib/copy_user_64.S |   65 ++++++++++++++++++++++++++++++++++++------
 1 files changed, 55 insertions(+), 10 deletions(-)

diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index 99e4826..d17a117 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -15,23 +15,30 @@
 #include <asm/asm-offsets.h>
 #include <asm/thread_info.h>
 #include <asm/cpufeature.h>
+#include <asm/alternative-asm.h>
 
-	.macro ALTERNATIVE_JUMP feature,orig,alt
+/*
+ * By placing feature2 after feature1 in altinstructions section, we logically
+ * implement:
+ * If CPU has feature2, jmp to alt2 is used
+ * else if CPU has feature1, jmp to alt1 is used
+ * else jmp to orig is used.
+ */
+	.macro ALTERNATIVE_JUMP feature1,feature2,orig,alt1,alt2
 0:
 	.byte 0xe9	/* 32bit jump */
 	.long \orig-1f	/* by default jump to orig */
 1:
 	.section .altinstr_replacement,"ax"
 2:	.byte 0xe9			/* near jump with 32bit immediate */
-	.long \alt-1b /* offset */   /* or alternatively to alt */
+	.long \alt1-1b /* offset */   /* or alternatively to alt1 */
+3:	.byte 0xe9			/* near jump with 32bit immediate */
+	.long \alt2-1b /* offset */   /* or alternatively to alt2 */
 	.previous
+
 	.section .altinstructions,"a"
-	.align 8
-	.quad  0b
-	.quad  2b
-	.word  \feature			/* when feature is set */
-	.byte  5
-	.byte  5
+	altinstruction_entry 0b,2b,\feature1,5,5
+	altinstruction_entry 0b,3b,\feature2,5,5
 	.previous
 	.endm
 
@@ -73,7 +80,9 @@ ENTRY(_copy_to_user)
 	jc bad_to_user
 	cmpq TI_addr_limit(%rax),%rcx
 	jae bad_to_user
-	ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,copy_user_generic_unrolled,copy_user_generic_string
+	ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,X86_FEATURE_ERMS,	\
+		copy_user_generic_unrolled,copy_user_generic_string,	\
+		copy_user_enhanced_fast_string
 	CFI_ENDPROC
 ENDPROC(_copy_to_user)
 
@@ -86,7 +95,9 @@ ENTRY(_copy_from_user)
 	jc bad_from_user
 	cmpq TI_addr_limit(%rax),%rcx
 	jae bad_from_user
-	ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,copy_user_generic_unrolled,copy_user_generic_string
+	ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,X86_FEATURE_ERMS,	\
+		copy_user_generic_unrolled,copy_user_generic_string,	\
+		copy_user_enhanced_fast_string
 	CFI_ENDPROC
 ENDPROC(_copy_from_user)
 
@@ -255,3 +266,37 @@ ENTRY(copy_user_generic_string)
 	.previous
 	CFI_ENDPROC
 ENDPROC(copy_user_generic_string)
+
+/*
+ * Some CPUs are adding enhanced REP MOVSB/STOSB instructions.
+ * It's recommended to use enhanced REP MOVSB/STOSB if it's enabled.
+ *
+ * Input:
+ * rdi destination
+ * rsi source
+ * rdx count
+ *
+ * Output:
+ * eax uncopied bytes or 0 if successful.
+ */
+ENTRY(copy_user_enhanced_fast_string)
+	CFI_STARTPROC
+	andl %edx,%edx
+	jz 2f
+	movl %edx,%ecx
+1:	rep
+	movsb
+2:	xorl %eax,%eax
+	ret
+
+	.section .fixup,"ax"
+12:	movl %ecx,%edx		/* ecx is zerorest also */
+	jmp copy_user_handle_tail
+	.previous
+
+	.section __ex_table,"a"
+	.align 8
+	.quad 1b,12b
+	.previous
+	CFI_ENDPROC
+ENDPROC(copy_user_enhanced_fast_string)
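
The two-feature ALTERNATIVE_JUMP above encodes a priority order that, written
out as a plain C branch, would look roughly like the toy dispatcher below
(hypothetical names and stub implementations; in the kernel the selection is
done once by code patching, not by a runtime test):

#include <stdio.h>
#include <string.h>

/* Toy feature flags standing in for X86_FEATURE_ERMS / X86_FEATURE_REP_GOOD. */
static int has_erms = 1, has_rep_good = 1;

/* Toy copy variants; each returns the number of bytes NOT copied (0 = success). */
static unsigned long copy_unrolled(void *to, const void *from, unsigned long len)
{ memcpy(to, from, len); return 0; }
static unsigned long copy_string(void *to, const void *from, unsigned long len)
{ memcpy(to, from, len); return 0; }
static unsigned long copy_erms(void *to, const void *from, unsigned long len)
{ memcpy(to, from, len); return 0; }

/* ERMS wins over REP_GOOD, which wins over the unrolled fallback. */
static unsigned long copy_user_dispatch(void *to, const void *from, unsigned long len)
{
	if (has_erms)
		return copy_erms(to, from, len);
	if (has_rep_good)
		return copy_string(to, from, len);
	return copy_unrolled(to, from, len);
}

int main(void)
{
	char src[] = "hello", dst[8];

	printf("left uncopied: %lu\n", copy_user_dispatch(dst, src, sizeof(src)));
	return 0;
}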

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [tip:perf/core] x86, mem: memcpy_64.S: Optimize memcpy by enhanced REP MOVSB/STOSB
  2011-05-17 22:29 ` [PATCH 7/9] x86/lib/memcpy_64.S: Optimize memcpy " Fenghua Yu
  2011-05-18  6:35   ` Ingo Molnar
@ 2011-05-18 20:42   ` tip-bot for Fenghua Yu
  1 sibling, 0 replies; 30+ messages in thread
From: tip-bot for Fenghua Yu @ 2011-05-18 20:42 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, fenghua.yu, tglx, hpa

Commit-ID:  101068c1f4a947ffa08f2782c78e40097300754d
Gitweb:     http://git.kernel.org/tip/101068c1f4a947ffa08f2782c78e40097300754d
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Tue, 17 May 2011 15:29:16 -0700
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Tue, 17 May 2011 15:40:29 -0700

x86, mem: memcpy_64.S: Optimize memcpy by enhanced REP MOVSB/STOSB

Support memcpy() with enhanced rep movsb. On processors supporting enhanced
rep movsb, the alternative memcpy() function using enhanced rep movsb
overrides the original function and the fast string function.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-8-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/lib/memcpy_64.S |   45 ++++++++++++++++++++++++++++++++-------------
 1 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S
index 75ef61e..daab21d 100644
--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -4,6 +4,7 @@
 
 #include <asm/cpufeature.h>
 #include <asm/dwarf2.h>
+#include <asm/alternative-asm.h>
 
 /*
  * memcpy - Copy a memory block.
@@ -37,6 +38,23 @@
 .Lmemcpy_e:
 	.previous
 
+/*
+ * memcpy_c_e() - enhanced fast string memcpy. This is faster and simpler than
+ * memcpy_c. Use memcpy_c_e when possible.
+ *
+ * This gets patched over the unrolled variant (below) via the
+ * alternative instructions framework:
+ */
+	.section .altinstr_replacement, "ax", @progbits
+.Lmemcpy_c_e:
+	movq %rdi, %rax
+
+	movl %edx, %ecx
+	rep movsb
+	ret
+.Lmemcpy_e_e:
+	.previous
+
 ENTRY(__memcpy)
 ENTRY(memcpy)
 	CFI_STARTPROC
@@ -171,21 +189,22 @@ ENDPROC(memcpy)
 ENDPROC(__memcpy)
 
 	/*
-	 * Some CPUs run faster using the string copy instructions.
-	 * It is also a lot simpler. Use this when possible:
-	 */
-
-	.section .altinstructions, "a"
-	.align 8
-	.quad memcpy
-	.quad .Lmemcpy_c
-	.word X86_FEATURE_REP_GOOD
-
-	/*
+	 * Some CPUs are adding enhanced REP MOVSB/STOSB feature
+	 * If the feature is supported, memcpy_c_e() is the first choice.
+	 * If enhanced rep movsb copy is not available, use fast string copy
+	 * memcpy_c() when possible. This is faster and code is simpler than
+	 * original memcpy().
+	 * Otherwise, original memcpy() is used.
+	 * In the .altinstructions section, the ERMS feature is placed after the
+	 * REP_GOOD feature to implement the right patch order.
+	 *
 	 * Replace only beginning, memcpy is used to apply alternatives,
 	 * so it is silly to overwrite itself with nops - reboot is the
 	 * only outcome...
 	 */
-	.byte .Lmemcpy_e - .Lmemcpy_c
-	.byte .Lmemcpy_e - .Lmemcpy_c
+	.section .altinstructions, "a"
+	altinstruction_entry memcpy,.Lmemcpy_c,X86_FEATURE_REP_GOOD,\
+			     .Lmemcpy_e-.Lmemcpy_c,.Lmemcpy_e-.Lmemcpy_c
+	altinstruction_entry memcpy,.Lmemcpy_c_e,X86_FEATURE_ERMS, \
+			     .Lmemcpy_e_e-.Lmemcpy_c_e,.Lmemcpy_e_e-.Lmemcpy_c_e
 	.previous
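
The .Lmemcpy_c_e replacement above is just "save the destination, rep movsb,
return the saved pointer"; an equivalent user-space C sketch (hypothetical
function name, GCC-style inline asm, x86-64 only; the kernel variant moves a
32-bit count, the full size_t is used here):

#include <stddef.h>

/* memcpy via REP MOVSB, mirroring the .Lmemcpy_c_e replacement above. */
static void *memcpy_movsb(void *dst, const void *src, size_t len)
{
	void *ret = dst;	/* memcpy must return the destination */

	asm volatile("rep movsb"
		     : "+D" (dst), "+S" (src), "+c" (len)
		     :
		     : "memory");
	return ret;
}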

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [tip:perf/core] x86, mem: memmove_64.S: Optimize memmove by enhanced REP MOVSB/STOSB
  2011-05-17 22:29 ` [PATCH 8/9] x86/lib/memmove_64.S: Optimize memmove " Fenghua Yu
@ 2011-05-18 20:43   ` tip-bot for Fenghua Yu
  0 siblings, 0 replies; 30+ messages in thread
From: tip-bot for Fenghua Yu @ 2011-05-18 20:43 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, fenghua.yu, tglx, hpa

Commit-ID:  057e05c1d6440117875f455e59da8691e08f65d5
Gitweb:     http://git.kernel.org/tip/057e05c1d6440117875f455e59da8691e08f65d5
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Tue, 17 May 2011 15:29:17 -0700
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Tue, 17 May 2011 15:40:30 -0700

x86, mem: memmove_64.S: Optimize memmove by enhanced REP MOVSB/STOSB

Support memmove() with enhanced rep movsb. On processors supporting enhanced
REP MOVSB/STOSB, the alternative memmove() function using enhanced rep movsb
overrides the original function.

The patch doesn't change the backward memmove case to use enhanced rep
movsb.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-9-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/lib/memmove_64.S |   29 ++++++++++++++++++++++++++++-
 1 files changed, 28 insertions(+), 1 deletions(-)

diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
index 0ecb843..d0ec9c2 100644
--- a/arch/x86/lib/memmove_64.S
+++ b/arch/x86/lib/memmove_64.S
@@ -8,6 +8,7 @@
 #define _STRING_C
 #include <linux/linkage.h>
 #include <asm/dwarf2.h>
+#include <asm/cpufeature.h>
 
 #undef memmove
 
@@ -24,6 +25,7 @@
  */
 ENTRY(memmove)
 	CFI_STARTPROC
+
 	/* Handle more 32bytes in loop */
 	mov %rdi, %rax
 	cmp $0x20, %rdx
@@ -31,8 +33,13 @@ ENTRY(memmove)
 
 	/* Decide forward/backward copy mode */
 	cmp %rdi, %rsi
-	jb	2f
+	jge .Lmemmove_begin_forward
+	mov %rsi, %r8
+	add %rdx, %r8
+	cmp %rdi, %r8
+	jg 2f
 
+.Lmemmove_begin_forward:
 	/*
 	 * movsq instruction have many startup latency
 	 * so we handle small size by general register.
@@ -78,6 +85,8 @@ ENTRY(memmove)
 	rep movsq
 	movq %r11, (%r10)
 	jmp 13f
+.Lmemmove_end_forward:
+
 	/*
 	 * Handle data backward by movsq.
 	 */
@@ -194,4 +203,22 @@ ENTRY(memmove)
 13:
 	retq
 	CFI_ENDPROC
+
+	.section .altinstr_replacement,"ax"
+.Lmemmove_begin_forward_efs:
+	/* Forward moving data. */
+	movq %rdx, %rcx
+	rep movsb
+	retq
+.Lmemmove_end_forward_efs:
+	.previous
+
+	.section .altinstructions,"a"
+	.align 8
+	.quad .Lmemmove_begin_forward
+	.quad .Lmemmove_begin_forward_efs
+	.word X86_FEATURE_ERMS
+	.byte .Lmemmove_end_forward-.Lmemmove_begin_forward
+	.byte .Lmemmove_end_forward_efs-.Lmemmove_begin_forward_efs
+	.previous
 ENDPROC(memmove)
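
The overlap test added above, expressed in C, shows when the forward path
(the one the ERMS alternative accelerates) is safe: always, except when the
copy moves data "up" into an overlapping destination, i.e. src < dst <
src + len. A sketch with plain byte loops standing in for the movsq/movsb
paths (hypothetical name, not the kernel code):

#include <stddef.h>

static void *memmove_sketch(void *dst, const void *src, size_t len)
{
	const unsigned char *s = src;
	unsigned char *d = dst;
	size_t i;

	if (s >= d || s + len <= d) {
		/* Forward copy; this is the path rep movsb can replace. */
		for (i = 0; i < len; i++)
			d[i] = s[i];
	} else {
		/* Backward copy for the overlapping case, left as-is by the patch. */
		for (i = len; i > 0; i--)
			d[i - 1] = s[i - 1];
	}
	return dst;
}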

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [tip:perf/core] x86, mem: memset_64.S: Optimize memset by enhanced REP MOVSB/STOSB
  2011-05-17 22:29 ` [PATCH 9/9] x86/lib/memset_64.S: Optimize memset " Fenghua Yu
  2011-05-18  2:57   ` Andi Kleen
@ 2011-05-18 20:43   ` tip-bot for Fenghua Yu
  1 sibling, 0 replies; 30+ messages in thread
From: tip-bot for Fenghua Yu @ 2011-05-18 20:43 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, fenghua.yu, tglx, hpa

Commit-ID:  2f19e06ac30771c7cb96fd61d8aeacfa74dac21c
Gitweb:     http://git.kernel.org/tip/2f19e06ac30771c7cb96fd61d8aeacfa74dac21c
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Tue, 17 May 2011 15:29:18 -0700
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Tue, 17 May 2011 15:40:31 -0700

x86, mem: memset_64.S: Optimize memset by enhanced REP MOVSB/STOSB

Support memset() with enhanced rep stosb. On processors supporting enhanced
REP MOVSB/STOSB, the alternative memset_c_e function using enhanced rep stosb
overrides the fast string alternative memset_c and the original function.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-10-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/lib/memset_64.S |   54 +++++++++++++++++++++++++++++++++++----------
 1 files changed, 42 insertions(+), 12 deletions(-)

diff --git a/arch/x86/lib/memset_64.S b/arch/x86/lib/memset_64.S
index 09d3442..79bd454 100644
--- a/arch/x86/lib/memset_64.S
+++ b/arch/x86/lib/memset_64.S
@@ -2,9 +2,13 @@
 
 #include <linux/linkage.h>
 #include <asm/dwarf2.h>
+#include <asm/cpufeature.h>
+#include <asm/alternative-asm.h>
 
 /*
- * ISO C memset - set a memory block to a byte value.
+ * ISO C memset - set a memory block to a byte value. This function uses fast
+ * string to get better performance than the original function. The code is
+ * simpler and shorter than the original function as well.
  *	
  * rdi   destination
  * rsi   value (char) 
@@ -31,6 +35,28 @@
 .Lmemset_e:
 	.previous
 
+/*
+ * ISO C memset - set a memory block to a byte value. This function uses
+ * enhanced rep stosb to override the fast string function.
+ * The code is simpler and shorter than the fast string function as well.
+ *
+ * rdi   destination
+ * rsi   value (char)
+ * rdx   count (bytes)
+ *
+ * rax   original destination
+ */
+	.section .altinstr_replacement, "ax", @progbits
+.Lmemset_c_e:
+	movq %rdi,%r9
+	movb %sil,%al
+	movl %edx,%ecx
+	rep stosb
+	movq %r9,%rax
+	ret
+.Lmemset_e_e:
+	.previous
+
 ENTRY(memset)
 ENTRY(__memset)
 	CFI_STARTPROC
@@ -112,16 +138,20 @@ ENTRY(__memset)
 ENDPROC(memset)
 ENDPROC(__memset)
 
-	/* Some CPUs run faster using the string instructions.
-	   It is also a lot simpler. Use this when possible */
-
-#include <asm/cpufeature.h>
-
+	/* Some CPUs support enhanced REP MOVSB/STOSB feature.
+	 * It is recommended to use this when possible.
+	 *
+	 * If enhanced REP MOVSB/STOSB feature is not available, use fast string
+	 * instructions.
+	 *
+	 * Otherwise, use original memset function.
+	 *
+	 * In the .altinstructions section, the ERMS feature is placed after the
+	 * REP_GOOD feature to implement the right patch order.
+	 */
 	.section .altinstructions,"a"
-	.align 8
-	.quad memset
-	.quad .Lmemset_c
-	.word X86_FEATURE_REP_GOOD
-	.byte .Lfinal - memset
-	.byte .Lmemset_e - .Lmemset_c
+	altinstruction_entry memset,.Lmemset_c,X86_FEATURE_REP_GOOD,\
+			     .Lfinal-memset,.Lmemset_e-.Lmemset_c
+	altinstruction_entry memset,.Lmemset_c_e,X86_FEATURE_ERMS, \
+			     .Lfinal-memset,.Lmemset_e_e-.Lmemset_c_e
 	.previous
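
The .Lmemset_c_e replacement above differs from the clear_page case only in
taking an arbitrary fill byte and length; the same idea as a user-space C
sketch (hypothetical name, GCC-style inline asm, x86-64 only):

#include <stddef.h>

/* memset via REP STOSB, mirroring the .Lmemset_c_e replacement above. */
static void *memset_stosb(void *dst, int c, size_t len)
{
	void *ret = dst;	/* memset returns the destination */

	asm volatile("rep stosb"
		     : "+D" (dst), "+c" (len)
		     : "a" (c)		/* only %al is used as the fill byte */
		     : "memory");
	return ret;
}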

^ permalink raw reply related	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2011-05-18 20:43 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-17 22:29 [PATCH 0/9] Optimize string operations by enhanced REP MOVSB/STOSB Fenghua Yu
2011-05-17 22:29 ` [PATCH 1/9] x86, cpu: Enable enhanced REP MOVSB/STOSB feature Fenghua Yu
2011-05-17 23:13   ` [tip:x86/cpufeature] x86, cpufeature: Add CPU feature bit for enhanced REP MOVSB/STOSB tip-bot for Fenghua Yu
2011-05-17 22:29 ` [PATCH 2/9] x86/kernel/cpu/intel.c: Initialize Enhanced REP MOVSB/STOSBenhanced Fenghua Yu
2011-05-18  2:46   ` Andi Kleen
2011-05-18  3:47     ` H. Peter Anvin
2011-05-18 20:40   ` [tip:perf/core] x86, mem, intel: Initialize Enhanced REP MOVSB/STOSB tip-bot for Fenghua Yu
2011-05-17 22:29 ` [PATCH 3/9] x86/kernel/alternative.c: Add comment for applying alternatives order Fenghua Yu
2011-05-18 20:40   ` [tip:perf/core] x86, alternative, doc: " tip-bot for Fenghua Yu
2011-05-17 22:29 ` [PATCH 4/9] x86, alternative-asm.h: Add altinstruction_entry macro Fenghua Yu
2011-05-18 20:41   ` [tip:perf/core] x86, alternative: " tip-bot for Fenghua Yu
2011-05-17 22:29 ` [PATCH 5/9] x86/lib/clear_page_64.S: Support clear_page() with enhanced REP MOVSB/STOSB Fenghua Yu
2011-05-18 20:41   ` [tip:perf/core] x86, mem: clear_page_64.S: " tip-bot for Fenghua Yu
2011-05-17 22:29 ` [PATCH 6/9] x86/lib/copy_user_64.S: Support copy_to_user/copy_from_user by " Fenghua Yu
2011-05-18 20:42   ` [tip:perf/core] x86, mem: copy_user_64.S: Support copy_to/from_user " tip-bot for Fenghua Yu
2011-05-17 22:29 ` [PATCH 7/9] x86/lib/memcpy_64.S: Optimize memcpy " Fenghua Yu
2011-05-18  6:35   ` Ingo Molnar
2011-05-18 19:04     ` Yu, Fenghua
2011-05-18 20:42   ` [tip:perf/core] x86, mem: memcpy_64.S: " tip-bot for Fenghua Yu
2011-05-17 22:29 ` [PATCH 8/9] x86/lib/memmove_64.S: Optimize memmove " Fenghua Yu
2011-05-18 20:43   ` [tip:perf/core] x86, mem: memmove_64.S: " tip-bot for Fenghua Yu
2011-05-17 22:29 ` [PATCH 9/9] x86/lib/memset_64.S: Optimize memset " Fenghua Yu
2011-05-18  2:57   ` Andi Kleen
2011-05-18  3:09     ` Yu, Fenghua
2011-05-18  4:05       ` Andi Kleen
2011-05-18 18:33         ` Yu, Fenghua
2011-05-18 18:39           ` Andi Kleen
2011-05-18 18:47             ` Ingo Molnar
2011-05-18 18:49             ` Yu, Fenghua
2011-05-18 20:43   ` [tip:perf/core] x86, mem: memset_64.S: " tip-bot for Fenghua Yu
