All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy
@ 2009-09-28  9:34 Arjan van de Ven
  2009-09-28 12:21 ` Arjan van de Ven
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Arjan van de Ven @ 2009-09-28  9:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo, tglx, hpa, torvalds


>From ebb81aab0c3df19771ebc0eec1261ae314ddc0af Mon Sep 17 00:00:00 2001
From: Arjan van de Ven <arjan@linux.intel.com>
Date: Mon, 28 Sep 2009 11:21:32 +0200
Subject: [PATCH] x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy

GCC provides reasonable memset/memcpy functions itself, with __builtin_memset
and __builtin_memcpy. For the "unknown" cases, it'll fall back to our
current existing functions, but for fixed size versions it'll inline
something smart. Quite often that will be the same as we have now,
but sometimes it can do something smarter (for example, if the code
then sets the first member of a struct, it can do a shorter memset).

In addition, and this is more important, gcc knows which registers and
such are not clobbered (while for our asm version it pretty much
acts like a compiler barrier), so for various cases it can avoid reloading
values.

The effect on codesize is shown below on my typical laptop .config:

   text	   data	    bss	    dec	    hex	filename
5605675	2041100	6525148	14171923	 d83f13	vmlinux.before
5595849	2041668	6525148	14162665	 d81ae9	vmlinux.after

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/x86/include/asm/string_32.h |   11 ++---------
 1 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/string_32.h b/arch/x86/include/asm/string_32.h
index ae907e6..23a0bc2 100644
--- a/arch/x86/include/asm/string_32.h
+++ b/arch/x86/include/asm/string_32.h
@@ -177,10 +177,7 @@ static inline void *__memcpy3d(void *to, const void *from, size_t len)
  */
 
 #ifndef CONFIG_KMEMCHECK
-#define memcpy(t, f, n)				\
-	(__builtin_constant_p((n))		\
-	 ? __constant_memcpy((t), (f), (n))	\
-	 : __memcpy((t), (f), (n)))
+#define memcpy(t, f, n)	__builtin_memcpy(t, f, n)
 #else
 /*
  * kmemcheck becomes very happy if we use the REP instructions unconditionally,
@@ -316,11 +313,7 @@ void *__constant_c_and_count_memset(void *s, unsigned long pattern,
 	 : __memset_generic((s), (c), (count)))
 
 #define __HAVE_ARCH_MEMSET
-#define memset(s, c, count)						\
-	(__builtin_constant_p(c)					\
-	 ? __constant_c_x_memset((s), (0x01010101UL * (unsigned char)(c)), \
-				 (count))				\
-	 : __memset((s), (c), (count)))
+#define memset(s, c, count) __builtin_memset(s, c, count)
 
 /*
  * find the first occurrence of byte 'c', or 1 past the area if none
-- 
1.6.2.5


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy
  2009-09-28  9:34 [PATCH] x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy Arjan van de Ven
@ 2009-09-28 12:21 ` Arjan van de Ven
  2009-09-28 23:47   ` [tip:x86/asm] " tip-bot for Arjan van de Ven
  2009-09-29 12:44 ` [PATCH] " Arnd Bergmann
  2009-10-02 19:19 ` Andi Kleen
  2 siblings, 1 reply; 9+ messages in thread
From: Arjan van de Ven @ 2009-09-28 12:21 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel, mingo, tglx, hpa, torvalds

On Mon, 28 Sep 2009 11:34:33 +0200

turns out doing this unconditional is not a good idea due to gcc 3.x;
the updated patch below only does the change for gcc 4.x

>From e86cf2618a2e489cafcafb4361569c4d0853156e Mon Sep 17 00:00:00 2001
From: Arjan van de Ven <arjan@linux.intel.com>
Date: Mon, 28 Sep 2009 11:21:32 +0200
Subject: [PATCH] x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy

GCC provides reasonable memset/memcpy functions itself, with __builtin_memset
and __builtin_memcpy. For the "unknown" cases, it'll fall back to our
current existing functions, but for fixed size versions it'll inline
something smart. Quite often that will be the same as we have now,
but sometimes it can do something smarter (for example, if the code
then sets the first member of a struct, it can do a shorter memset).

In addition, and this is more important, gcc knows which registers and
such are not clobbered (while for our asm version it pretty much
acts like a compiler barrier), so for various cases it can avoid reloading
values.

The effect on codesize is shown below on my typical laptop .config:

   text	   data	    bss	    dec	    hex	filename
5605675	2041100	6525148	14171923	 d83f13	vmlinux.before
5595849	2041668	6525148	14162665	 d81ae9	vmlinux.after

Due to some not-so-good behavior in the gcc 3.x series, this change
is only done for GCC 4.x and above.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/x86/include/asm/string_32.h |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/string_32.h b/arch/x86/include/asm/string_32.h
index ae907e6..3d3e835 100644
--- a/arch/x86/include/asm/string_32.h
+++ b/arch/x86/include/asm/string_32.h
@@ -177,10 +177,15 @@ static inline void *__memcpy3d(void *to, const void *from, size_t len)
  */
 
 #ifndef CONFIG_KMEMCHECK
+
+#if (__GNUC__ >= 4)
+#define memcpy(t, f, n) __builtin_memcpy(t, f, n)
+#else
 #define memcpy(t, f, n)				\
 	(__builtin_constant_p((n))		\
 	 ? __constant_memcpy((t), (f), (n))	\
 	 : __memcpy((t), (f), (n)))
+#endif
 #else
 /*
  * kmemcheck becomes very happy if we use the REP instructions unconditionally,
@@ -316,11 +321,15 @@ void *__constant_c_and_count_memset(void *s, unsigned long pattern,
 	 : __memset_generic((s), (c), (count)))
 
 #define __HAVE_ARCH_MEMSET
+#if (__GNUC__ >= 4)
+#define memset(s, c, count) __builtin_memset(s, c, count)
+#else
 #define memset(s, c, count)						\
 	(__builtin_constant_p(c)					\
 	 ? __constant_c_x_memset((s), (0x01010101UL * (unsigned char)(c)), \
 				 (count))				\
 	 : __memset((s), (c), (count)))
+#endif
 
 /*
  * find the first occurrence of byte 'c', or 1 past the area if none
-- 
1.6.2.5



-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [tip:x86/asm] x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy
  2009-09-28 12:21 ` Arjan van de Ven
@ 2009-09-28 23:47   ` tip-bot for Arjan van de Ven
  0 siblings, 0 replies; 9+ messages in thread
From: tip-bot for Arjan van de Ven @ 2009-09-28 23:47 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, arjan, hpa, mingo, arjan, tglx

Commit-ID:  ff60fab71bb3b4fdbf8caf57ff3739ffd0887396
Gitweb:     http://git.kernel.org/tip/ff60fab71bb3b4fdbf8caf57ff3739ffd0887396
Author:     Arjan van de Ven <arjan@infradead.org>
AuthorDate: Mon, 28 Sep 2009 14:21:22 +0200
Committer:  H. Peter Anvin <hpa@zytor.com>
CommitDate: Mon, 28 Sep 2009 16:43:15 -0700

x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy

GCC provides reasonable memset/memcpy functions itself, with __builtin_memset
and __builtin_memcpy. For the "unknown" cases, it'll fall back to our
current existing functions, but for fixed size versions it'll inline
something smart. Quite often that will be the same as we have now,
but sometimes it can do something smarter (for example, if the code
then sets the first member of a struct, it can do a shorter memset).

In addition, and this is more important, gcc knows which registers and
such are not clobbered (while for our asm version it pretty much
acts like a compiler barrier), so for various cases it can avoid reloading
values.

The effect on codesize is shown below on my typical laptop .config:

   text	   data	    bss	    dec	    hex	filename
5605675	2041100	6525148	14171923	 d83f13	vmlinux.before
5595849	2041668	6525148	14162665	 d81ae9	vmlinux.after

Due to some not-so-good behavior in the gcc 3.x series, this change
is only done for GCC 4.x and above.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
LKML-Reference: <20090928142122.6fc57e9c@infradead.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


---
 arch/x86/include/asm/string_32.h |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/string_32.h b/arch/x86/include/asm/string_32.h
index ae907e6..3d3e835 100644
--- a/arch/x86/include/asm/string_32.h
+++ b/arch/x86/include/asm/string_32.h
@@ -177,10 +177,15 @@ static inline void *__memcpy3d(void *to, const void *from, size_t len)
  */
 
 #ifndef CONFIG_KMEMCHECK
+
+#if (__GNUC__ >= 4)
+#define memcpy(t, f, n) __builtin_memcpy(t, f, n)
+#else
 #define memcpy(t, f, n)				\
 	(__builtin_constant_p((n))		\
 	 ? __constant_memcpy((t), (f), (n))	\
 	 : __memcpy((t), (f), (n)))
+#endif
 #else
 /*
  * kmemcheck becomes very happy if we use the REP instructions unconditionally,
@@ -316,11 +321,15 @@ void *__constant_c_and_count_memset(void *s, unsigned long pattern,
 	 : __memset_generic((s), (c), (count)))
 
 #define __HAVE_ARCH_MEMSET
+#if (__GNUC__ >= 4)
+#define memset(s, c, count) __builtin_memset(s, c, count)
+#else
 #define memset(s, c, count)						\
 	(__builtin_constant_p(c)					\
 	 ? __constant_c_x_memset((s), (0x01010101UL * (unsigned char)(c)), \
 				 (count))				\
 	 : __memset((s), (c), (count)))
+#endif
 
 /*
  * find the first occurrence of byte 'c', or 1 past the area if none

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy
  2009-09-28  9:34 [PATCH] x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy Arjan van de Ven
  2009-09-28 12:21 ` Arjan van de Ven
@ 2009-09-29 12:44 ` Arnd Bergmann
  2009-09-29 12:53   ` Arjan van de Ven
  2009-09-29 15:32   ` H. Peter Anvin
  2009-10-02 19:19 ` Andi Kleen
  2 siblings, 2 replies; 9+ messages in thread
From: Arnd Bergmann @ 2009-09-29 12:44 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel, mingo, tglx, hpa, torvalds

On Monday 28 September 2009, Arjan van de Ven wrote:
> 
> GCC provides reasonable memset/memcpy functions itself, with __builtin_memset
> and __builtin_memcpy. For the "unknown" cases, it'll fall back to our
> current existing functions, but for fixed size versions it'll inline
> something smart. Quite often that will be the same as we have now,
> but sometimes it can do something smarter (for example, if the code
> then sets the first member of a struct, it can do a shorter memset).
> 
> In addition, and this is more important, gcc knows which registers and
> such are not clobbered (while for our asm version it pretty much
> acts like a compiler barrier), so for various cases it can avoid reloading
> values.
> 
> The effect on codesize is shown below on my typical laptop .config:
> 
>    text	   data	    bss	    dec	    hex	filename
> 5605675	2041100	6525148	14171923	 d83f13	vmlinux.before
> 5595849	2041668	6525148	14162665	 d81ae9	vmlinux.after
> 

The patch looks good, but is there a reason to keep it architecture
specific? I would guess that the same logic applies to all architectures
with gcc-4.x and could be put into include/linux/compiler-gcc4.h.

	Arnd <><

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy
  2009-09-29 12:44 ` [PATCH] " Arnd Bergmann
@ 2009-09-29 12:53   ` Arjan van de Ven
  2009-09-29 15:32   ` H. Peter Anvin
  1 sibling, 0 replies; 9+ messages in thread
From: Arjan van de Ven @ 2009-09-29 12:53 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: linux-kernel, mingo, tglx, hpa, torvalds

On Tue, 29 Sep 2009 14:44:06 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> > 
> 
> The patch looks good, but is there a reason to keep it architecture
> specific? I would guess that the same logic applies to all
> architectures with gcc-4.x and could be put into
> include/linux/compiler-gcc4.h.

there are some requirements on the architecture for this to work....
and as always with compiler things, the tradeoffs and how well it works
will vary per architecture.

If the architectures in large majority make this switch we could move it
generic... but it's a bit early for that.



-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy
  2009-09-29 12:44 ` [PATCH] " Arnd Bergmann
  2009-09-29 12:53   ` Arjan van de Ven
@ 2009-09-29 15:32   ` H. Peter Anvin
  1 sibling, 0 replies; 9+ messages in thread
From: H. Peter Anvin @ 2009-09-29 15:32 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Arjan van de Ven, linux-kernel, mingo, tglx, torvalds

On 09/29/2009 05:44 AM, Arnd Bergmann wrote:
> 
> The patch looks good, but is there a reason to keep it architecture
> specific? I would guess that the same logic applies to all architectures
> with gcc-4.x and could be put into include/linux/compiler-gcc4.h.
> 

Each architecture has its own implementation, and in some cases gcc will
even generate code which is illegal in the kernel.  It really needs to
be enabled by each architecture.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy
  2009-09-28  9:34 [PATCH] x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy Arjan van de Ven
  2009-09-28 12:21 ` Arjan van de Ven
  2009-09-29 12:44 ` [PATCH] " Arnd Bergmann
@ 2009-10-02 19:19 ` Andi Kleen
  2009-10-02 20:04   ` H. Peter Anvin
  2 siblings, 1 reply; 9+ messages in thread
From: Andi Kleen @ 2009-10-02 19:19 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel, mingo, tglx, hpa, torvalds

Arjan van de Ven <arjan@infradead.org> writes:

> From ebb81aab0c3df19771ebc0eec1261ae314ddc0af Mon Sep 17 00:00:00 2001
> From: Arjan van de Ven <arjan@linux.intel.com>
> Date: Mon, 28 Sep 2009 11:21:32 +0200
> Subject: [PATCH] x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy
>
> GCC provides reasonable memset/memcpy functions itself, with __builtin_memset
> and __builtin_memcpy. For the "unknown" cases, it'll fall back to our
> current existing functions, but for fixed size versions it'll inline
> something smart. Quite often that will be the same as we have now,
> but sometimes it can do something smarter (for example, if the code
> then sets the first member of a struct, it can do a shorter memset).
>
> In addition, and this is more important, gcc knows which registers and
> such are not clobbered (while for our asm version it pretty much
> acts like a compiler barrier), so for various cases it can avoid reloading
> values.
>
> The effect on codesize is shown below on my typical laptop .config:
>
>    text	   data	    bss	    dec	    hex	filename
> 5605675	2041100	6525148	14171923	 d83f13	vmlinux.before
> 5595849	2041668	6525148	14162665	 d81ae9	vmlinux.after

I tried this some time ago, but it it generates bad code on some 
gcc 3 versions.

You really need to test such kind of changes on a wide variety
of compilers, not assuming everyone uses the same version as you.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy
  2009-10-02 19:19 ` Andi Kleen
@ 2009-10-02 20:04   ` H. Peter Anvin
  2009-10-02 20:12     ` Andi Kleen
  0 siblings, 1 reply; 9+ messages in thread
From: H. Peter Anvin @ 2009-10-02 20:04 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Arjan van de Ven, linux-kernel, mingo, tglx, torvalds

On 10/02/2009 12:19 PM, Andi Kleen wrote:
> 
> I tried this some time ago, but it it generates bad code on some 
> gcc 3 versions.
> 
> You really need to test such kind of changes on a wide variety
> of compilers, not assuming everyone uses the same version as you.
> 

The version that went in is for gcc 4 only.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy
  2009-10-02 20:04   ` H. Peter Anvin
@ 2009-10-02 20:12     ` Andi Kleen
  0 siblings, 0 replies; 9+ messages in thread
From: Andi Kleen @ 2009-10-02 20:12 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andi Kleen, Arjan van de Ven, linux-kernel, mingo, tglx, torvalds

On Fri, Oct 02, 2009 at 01:04:25PM -0700, H. Peter Anvin wrote:
> On 10/02/2009 12:19 PM, Andi Kleen wrote:
> > 
> > I tried this some time ago, but it it generates bad code on some 
> > gcc 3 versions.
> > 
> > You really need to test such kind of changes on a wide variety
> > of compilers, not assuming everyone uses the same version as you.
> > 
> 
> The version that went in is for gcc 4 only.

Thanks.

My memory might be faulty, but iirc 4.0/4.1 was also slightly 
problematic. So better take a look at code for that too
if you haven't yet (just code size numbers are not enough)

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2009-10-02 20:12 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-28  9:34 [PATCH] x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy Arjan van de Ven
2009-09-28 12:21 ` Arjan van de Ven
2009-09-28 23:47   ` [tip:x86/asm] " tip-bot for Arjan van de Ven
2009-09-29 12:44 ` [PATCH] " Arnd Bergmann
2009-09-29 12:53   ` Arjan van de Ven
2009-09-29 15:32   ` H. Peter Anvin
2009-10-02 19:19 ` Andi Kleen
2009-10-02 20:04   ` H. Peter Anvin
2009-10-02 20:12     ` Andi Kleen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.