* [PATCH] X86/Mem: Use string copy operation to optimze copy in kernel compression
@ 2010-10-08 1:47 yakui.zhao
2010-10-08 5:40 ` [tip:x86/setup] x86, setup: " tip-bot for Zhao Yakui
0 siblings, 1 reply; 5+ messages in thread
From: yakui.zhao @ 2010-10-08 1:47 UTC (permalink / raw)
To: hpa; +Cc: linux-kernel, Zhao Yakui
From: Zhao Yakui <yakui.zhao@intel.com>
After the kernel decompression is finished, the boot code parses the ELF header
and copies each segment to its destination. Currently this uses a slow
byte-by-byte copy loop. This patch uses the rep-string copy operations instead
to speed up the copy (the code originates from arch/x86/lib/memcpy_32.c).
In testing, the string copy operations improved copy performance significantly:
1. The copy time was reduced from 150ms to 20ms on one Atom machine
2. The copy time was reduced by about 80% on another machine
The time dropped from 7ms to 1.5ms with a 32-bit kernel.
The time dropped from 10ms to 2ms with a 64-bit kernel.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
---
arch/x86/boot/compressed/misc.c | 29 +++++++++++++++++++++++------
1 files changed, 23 insertions(+), 6 deletions(-)
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 8f7bef8..23f315c 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -229,18 +229,35 @@ void *memset(void *s, int c, size_t n)
ss[i] = c;
return s;
}
-
+#ifdef CONFIG_X86_32
void *memcpy(void *dest, const void *src, size_t n)
{
- int i;
- const char *s = src;
- char *d = dest;
+ int d0, d1, d2;
+ asm volatile(
+ "rep ; movsl\n\t"
+ "movl %4,%%ecx\n\t"
+ "rep ; movsb\n\t"
+ : "=&c" (d0), "=&D" (d1), "=&S" (d2)
+ : "0" (n >> 2), "g" (n & 3), "1" (dest), "2" (src)
+ : "memory");
- for (i = 0; i < n; i++)
- d[i] = s[i];
return dest;
}
+#else
+void *memcpy(void *dest, const void *src, size_t n)
+{
+ long d0, d1, d2;
+ asm volatile(
+ "rep ; movsq\n\t"
+ "movq %4,%%rcx\n\t"
+ "rep ; movsb\n\t"
+ : "=&c" (d0), "=&D" (d1), "=&S" (d2)
+ : "0" (n >> 3), "g" (n & 7), "1" (dest), "2" (src)
+ : "memory");
+ return dest;
+}
+#endif
static void error(char *x)
{
--
1.5.4.5
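The patch above splits the copy into word-sized chunks ("rep movsl"/"rep movsq" on n >> 2 or n >> 3 words) followed by a byte tail ("rep movsb" on the remaining n & 3 or n & 7 bytes). A portable C sketch of the same quotient/remainder strategy, without inline assembly, is shown below; `word_memcpy` is a hypothetical name chosen for illustration, not kernel code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the patch's strategy on a 64-bit machine: copy n >> 3
 * eight-byte words first (this models "rep movsq"), then the
 * remaining n & 7 bytes one at a time (this models "rep movsb").
 * Per-word memcpy() calls are used so the loads/stores are valid
 * for arbitrarily aligned buffers. */
static void *word_memcpy(void *dest, const void *src, size_t n)
{
	unsigned char *d = dest;
	const unsigned char *s = src;
	size_t words = n >> 3;	/* same as n / 8 */
	size_t tail  = n & 7;	/* same as n % 8 */
	size_t i;

	for (i = 0; i < words; i++) {
		uint64_t w;
		memcpy(&w, s + (i << 3), sizeof w);	/* one 8-byte load  */
		memcpy(d + (i << 3), &w, sizeof w);	/* one 8-byte store */
	}
	for (i = 0; i < tail; i++)			/* the byte tail    */
		d[(words << 3) + i] = s[(words << 3) + i];

	return dest;
}
```

The point of the split is the same as in the asm: the bulk of the data moves eight bytes per iteration, and only the final zero-to-seven bytes fall back to byte granularity.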
* [tip:x86/setup] x86, setup: Use string copy operation to optimze copy in kernel compression
2010-10-08 1:47 [PATCH] X86/Mem: Use string copy operation to optimze copy in kernel compression yakui.zhao
@ 2010-10-08 5:40 ` tip-bot for Zhao Yakui
2010-10-08 5:52 ` Yinghai Lu
0 siblings, 1 reply; 5+ messages in thread
From: tip-bot for Zhao Yakui @ 2010-10-08 5:40 UTC (permalink / raw)
To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, yakui.zhao, tglx
Commit-ID: 68f4d5a00adaab33b136fce2c72d5c377b39b0b0
Gitweb: http://git.kernel.org/tip/68f4d5a00adaab33b136fce2c72d5c377b39b0b0
Author: Zhao Yakui <yakui.zhao@intel.com>
AuthorDate: Fri, 8 Oct 2010 09:47:33 +0800
Committer: H. Peter Anvin <hpa@zytor.com>
CommitDate: Thu, 7 Oct 2010 21:23:09 -0700
x86, setup: Use string copy operation to optimze copy in kernel compression
The kernel decompression code parses the ELF header and then copies
the segment to the corresponding destination. Currently it uses slow
byte-copy code. This patch makes it use the string copy operations
instead.
In testing, the string copy operations improved copy performance significantly:
1. The copy time was reduced from 150ms to 20ms on one Atom machine
2. The copy time was reduced by about 80% on another machine
The time dropped from 7ms to 1.5ms with a 32-bit kernel.
The time dropped from 10ms to 2ms with a 64-bit kernel.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
LKML-Reference: <1286502453-7043-1-git-send-email-yakui.zhao@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
arch/x86/boot/compressed/misc.c | 29 +++++++++++++++++++++++------
1 files changed, 23 insertions(+), 6 deletions(-)
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 8f7bef8..23f315c 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -229,18 +229,35 @@ void *memset(void *s, int c, size_t n)
ss[i] = c;
return s;
}
-
+#ifdef CONFIG_X86_32
void *memcpy(void *dest, const void *src, size_t n)
{
- int i;
- const char *s = src;
- char *d = dest;
+ int d0, d1, d2;
+ asm volatile(
+ "rep ; movsl\n\t"
+ "movl %4,%%ecx\n\t"
+ "rep ; movsb\n\t"
+ : "=&c" (d0), "=&D" (d1), "=&S" (d2)
+ : "0" (n >> 2), "g" (n & 3), "1" (dest), "2" (src)
+ : "memory");
- for (i = 0; i < n; i++)
- d[i] = s[i];
return dest;
}
+#else
+void *memcpy(void *dest, const void *src, size_t n)
+{
+ long d0, d1, d2;
+ asm volatile(
+ "rep ; movsq\n\t"
+ "movq %4,%%rcx\n\t"
+ "rep ; movsb\n\t"
+ : "=&c" (d0), "=&D" (d1), "=&S" (d2)
+ : "0" (n >> 3), "g" (n & 7), "1" (dest), "2" (src)
+ : "memory");
+ return dest;
+}
+#endif
static void error(char *x)
{
* Re: [tip:x86/setup] x86, setup: Use string copy operation to optimze copy in kernel compression
2010-10-08 5:40 ` [tip:x86/setup] x86, setup: " tip-bot for Zhao Yakui
@ 2010-10-08 5:52 ` Yinghai Lu
2010-10-08 6:21 ` H. Peter Anvin
0 siblings, 1 reply; 5+ messages in thread
From: Yinghai Lu @ 2010-10-08 5:52 UTC (permalink / raw)
To: mingo, hpa, linux-kernel, yakui.zhao, tglx; +Cc: linux-tip-commits
On 10/07/2010 10:40 PM, tip-bot for Zhao Yakui wrote:
> [commit message trimmed]
> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> index 8f7bef8..23f315c 100644
> --- a/arch/x86/boot/compressed/misc.c
> +++ b/arch/x86/boot/compressed/misc.c
> @@ -229,18 +229,35 @@ void *memset(void *s, int c, size_t n)
> ss[i] = c;
> return s;
> }
> -
> +#ifdef CONFIG_X86_32
> void *memcpy(void *dest, const void *src, size_t n)
> {
> - int i;
> - const char *s = src;
> - char *d = dest;
> + int d0, d1, d2;
> + asm volatile(
> + "rep ; movsl\n\t"
> + "movl %4,%%ecx\n\t"
> + "rep ; movsb\n\t"
> + : "=&c" (d0), "=&D" (d1), "=&S" (d2)
> + : "0" (n >> 2), "g" (n & 3), "1" (dest), "2" (src)
> + : "memory");
>
> - for (i = 0; i < n; i++)
> - d[i] = s[i];
> return dest;
> }
> +#else
> +void *memcpy(void *dest, const void *src, size_t n)
> +{
> + long d0, d1, d2;
> + asm volatile(
> + "rep ; movsq\n\t"
> + "movq %4,%%rcx\n\t"
> + "rep ; movsb\n\t"
> + : "=&c" (d0), "=&D" (d1), "=&S" (d2)
> + : "0" (n >> 3), "g" (n & 7), "1" (dest), "2" (src)
> + : "memory");
>
> + return dest;
> +}
> +#endif
>
> static void error(char *x)
> {
I wonder if this would cause problems on some old AMD K8 systems.
In amd.c:
/* On C+ stepping K8 rep microcode works well for copy/memset */
if (c->x86 == 0xf) {
u32 level;
level = cpuid_eax(1);
if ((level >= 0x0f48 && level < 0x0f50) || level >= 0x0f58)
set_cpu_cap(c, X86_FEATURE_REP_GOOD);
...
}
if (c->x86 >= 0x10)
set_cpu_cap(c, X86_FEATURE_REP_GOOD);
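The quoted amd.c logic gates X86_FEATURE_REP_GOOD on the CPU family and, for K8 (family 0xf), on specific CPUID leaf-1 EAX ranges identifying C+ steppings. A small C sketch encoding just those quoted ranges (`k8_rep_good` is a hypothetical helper name, not a kernel function):

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the quoted amd.c check: on family 0xf (K8), rep
 * microcode is only considered good on C+ steppings, i.e. when
 * CPUID leaf 1 EAX falls in [0x0f48, 0x0f50) or is >= 0x0f58;
 * family 0x10 and later always qualify.  Families below 0xf are
 * left out here, matching only the quoted fragment. */
static bool k8_rep_good(unsigned int family, uint32_t level)
{
	if (family >= 0x10)
		return true;
	if (family == 0xf)
		return (level >= 0x0f48 && level < 0x0f50) ||
		       level >= 0x0f58;
	return false;
}
```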
Yinghai
* Re: [tip:x86/setup] x86, setup: Use string copy operation to optimze copy in kernel compression
2010-10-08 5:52 ` Yinghai Lu
@ 2010-10-08 6:21 ` H. Peter Anvin
0 siblings, 0 replies; 5+ messages in thread
From: H. Peter Anvin @ 2010-10-08 6:21 UTC (permalink / raw)
To: Yinghai Lu, mingo, linux-kernel, yakui.zhao, tglx; +Cc: linux-tip-commits
Almost certainly still faster than bytewise copy!
"Yinghai Lu" <yinghai@kernel.org> wrote:
>On 10/07/2010 10:40 PM, tip-bot for Zhao Yakui wrote:
>> [quoted patch trimmed]
>
>I wonder if this would cause problems on some old AMD K8 systems.
>
>In amd.c:
>
> /* On C+ stepping K8 rep microcode works well for copy/memset */
> if (c->x86 == 0xf) {
> u32 level;
>
> level = cpuid_eax(1);
> if ((level >= 0x0f48 && level < 0x0f50) || level >= 0x0f58)
> set_cpu_cap(c, X86_FEATURE_REP_GOOD);
>...
>
> }
> if (c->x86 >= 0x10)
> set_cpu_cap(c, X86_FEATURE_REP_GOOD);
>
>Yinghai
--
Sent from my mobile phone. Please pardon any lack of formatting.
* [PATCH] X86/Mem: Use string copy operation to optimze copy in kernel compression
@ 2010-09-26 9:12 yakui.zhao
0 siblings, 0 replies; 5+ messages in thread
From: yakui.zhao @ 2010-09-26 9:12 UTC (permalink / raw)
To: hpa; +Cc: linux-kernel, Zhao Yakui
From: Zhao Yakui <yakui.zhao@intel.com>
After the kernel decompression is finished, the boot code parses the ELF header
and copies each segment to its destination. Currently this uses a slow
byte-by-byte copy loop. This patch uses the rep-string copy operations instead
to speed up the copy (the code originates from arch/x86/lib/memcpy_32.c).
In testing, the string copy operations improved copy performance significantly:
1. The copy time was reduced from 150ms to 20ms on one Atom machine
2. The copy time was reduced by about 80% on another machine
The time dropped from 7ms to 1.5ms with a 32-bit kernel.
The time dropped from 10ms to 2ms with a 64-bit kernel.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
---
arch/x86/boot/compressed/misc.c | 35 +++++++++++++++++++++++++++++------
1 files changed, 29 insertions(+), 6 deletions(-)
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 8f7bef8..34793ae 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -229,18 +229,41 @@ void *memset(void *s, int c, size_t n)
ss[i] = c;
return s;
}
-
+#ifdef CONFIG_X86_32
void *memcpy(void *dest, const void *src, size_t n)
{
- int i;
- const char *s = src;
- char *d = dest;
+ int d0, d1, d2;
+ asm volatile(
+ "rep ; movsl\n\t"
+ "movl %4,%%ecx\n\t"
+ "andl $3,%%ecx\n\t"
+ "jz 1f\n\t"
+ "rep ; movsb\n\t"
+ "1:"
+ : "=&c" (d0), "=&D" (d1), "=&S" (d2)
+ : "0" (n / 4), "g" (n), "1" ((long)dest), "2" ((long)src)
+ : "memory");
- for (i = 0; i < n; i++)
- d[i] = s[i];
return dest;
}
+#else
+void *memcpy(void *dest, const void *src, size_t n)
+{
+ long d0, d1, d2;
+ asm volatile(
+ "rep ; movsq\n\t"
+ "movq %4,%%rcx\n\t"
+ "andq $7,%%rcx\n\t"
+ "jz 1f\n\t"
+ "rep ; movsb\n\t"
+ "1:"
+ : "=&c" (d0), "=&D" (d1), "=&S" (d2)
+ : "0" (n / 8), "g" (n), "1" ((long)dest), "2" ((long)src)
+ : "memory");
+ return dest;
+}
+#endif
static void error(char *x)
{
--
1.5.4.5
end of thread, other threads:[~2010-10-08 6:22 UTC | newest]
Thread overview: 5+ messages
2010-10-08 1:47 [PATCH] X86/Mem: Use string copy operation to optimze copy in kernel compression yakui.zhao
2010-10-08 5:40 ` [tip:x86/setup] x86, setup: " tip-bot for Zhao Yakui
2010-10-08 5:52 ` Yinghai Lu
2010-10-08 6:21 ` H. Peter Anvin
-- strict thread matches above, loose matches on Subject: below --
2010-09-26 9:12 [PATCH] X86/Mem: " yakui.zhao