linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] arm: use built-in byte swap function
@ 2013-01-29  1:30 Kim Phillips
  2013-01-29  8:35 ` Borislav Petkov
                   ` (2 more replies)
  0 siblings, 3 replies; 63+ messages in thread
From: Kim Phillips @ 2013-01-29  1:30 UTC (permalink / raw)
  To: Russell King, Andrew Morton
  Cc: Daniel Santos, Borislav Petkov, David Rientjes, Rusty Russell,
	David Woodhouse, linux-arm-kernel, linux-kernel

Enable the compiler intrinsic for byte swapping on arch ARM.  This
allows the compiler to detect and be able to optimize out byte
swappings, e.g. in big endian to big endian moves.

AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6,
and for the 16-bit version in 4.8.

Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
---
akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016

based on linux-next.  Depends on commit "compiler-gcc{3,4}.h: Use
GCC_VERSION macro" by Daniel Santos <daniel.santos@pobox.com>,
currently in the akpm branch.

RFC because of unfamiliarity with arch ARM, and that at91sam9rl,
at91rm9200, and lpd270 (so far, at least) builds fail with:

include/uapi/linux/swab.h:60: undefined reference to `__bswapsi2'

I'm using eldk-5.2.1/armv7a's arm-linux-gnueabi-gcc (GCC) 4.6.4
20120303 (prerelease)

 arch/arm/Kconfig              |    1 +
 include/linux/compiler-gcc4.h |    3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index eda8711..437d11a 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -3,6 +3,7 @@ config ARM
 	default y
 	select ARCH_BINFMT_ELF_RANDOMIZE_PIE
 	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
+	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_HAVE_CUSTOM_GPIO_H
 	select ARCH_WANT_IPC_PARSE_VERSION
 	select BUILDTIME_EXTABLE_SORT if MMU
diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index 68b162d..da5f728 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -67,7 +67,8 @@
 
 
 #ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP
-#if GCC_VERSION >= 40400
+#if (!defined(__arm__) && GCC_VERSION >= 40400) || \
+    (defined(__arm__) && GCC_VERSION >= 40600)
 #define __HAVE_BUILTIN_BSWAP32__
 #define __HAVE_BUILTIN_BSWAP64__
 #endif
-- 
1.7.9.7



^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-01-29  1:30 [RFC] arm: use built-in byte swap function Kim Phillips
@ 2013-01-29  8:35 ` Borislav Petkov
  2013-01-29 16:46   ` Woodhouse, David
  2013-01-29 14:13 ` Russell King - ARM Linux
  2013-01-29 14:53 ` Rob Herring
  2 siblings, 1 reply; 63+ messages in thread
From: Borislav Petkov @ 2013-01-29  8:35 UTC (permalink / raw)
  To: Kim Phillips
  Cc: Russell King, Andrew Morton, Daniel Santos, David Rientjes,
	Rusty Russell, David Woodhouse, linux-arm-kernel, linux-kernel

On Mon, Jan 28, 2013 at 07:30:33PM -0600, Kim Phillips wrote:
> Enable the compiler intrinsic for byte swapping on arch ARM.  This
> allows the compiler to detect and be able to optimize out byte
> swappings, e.g. in big endian to big endian moves.
> 
> AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6,
> and for the 16-bit version in 4.8.
> 
> Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
> ---
> akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016
> 
> based on linux-next.  Depends on commit "compiler-gcc{3,4}.h: Use
> GCC_VERSION macro" by Daniel Santos <daniel.santos@pobox.com>,
> currently in the akpm branch.
> 
> RFC because of unfamiliarity with arch ARM, and that at91sam9rl,
> at91rm9200, and lpd270 (so far, at least) builds fail with:
> 
> include/uapi/linux/swab.h:60: undefined reference to `__bswapsi2'
> 
> I'm using eldk-5.2.1/armv7a's arm-linux-gnueabi-gcc (GCC) 4.6.4
> 20120303 (prerelease)
> 
>  arch/arm/Kconfig              |    1 +
>  include/linux/compiler-gcc4.h |    3 ++-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index eda8711..437d11a 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -3,6 +3,7 @@ config ARM
>  	default y
>  	select ARCH_BINFMT_ELF_RANDOMIZE_PIE
>  	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
> +	select ARCH_USE_BUILTIN_BSWAP
>  	select ARCH_HAVE_CUSTOM_GPIO_H
>  	select ARCH_WANT_IPC_PARSE_VERSION
>  	select BUILDTIME_EXTABLE_SORT if MMU
> diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
> index 68b162d..da5f728 100644
> --- a/include/linux/compiler-gcc4.h
> +++ b/include/linux/compiler-gcc4.h
> @@ -67,7 +67,8 @@
>  
>  
>  #ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP
> -#if GCC_VERSION >= 40400
> +#if (!defined(__arm__) && GCC_VERSION >= 40400) || \
> +    (defined(__arm__) && GCC_VERSION >= 40600)

There should be no arch-specific stuff in a generic header. I guess
you probably need to select ARCH_USE_BUILTIN_BSWAP in an arm-specific
compiler.h header after checking compiler version...

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-01-29  1:30 [RFC] arm: use built-in byte swap function Kim Phillips
  2013-01-29  8:35 ` Borislav Petkov
@ 2013-01-29 14:13 ` Russell King - ARM Linux
  2013-01-29 14:43   ` Woodhouse, David
  2013-01-29 14:53 ` Rob Herring
  2 siblings, 1 reply; 63+ messages in thread
From: Russell King - ARM Linux @ 2013-01-29 14:13 UTC (permalink / raw)
  To: Kim Phillips
  Cc: Andrew Morton, Daniel Santos, Borislav Petkov, David Rientjes,
	Rusty Russell, David Woodhouse, linux-arm-kernel, linux-kernel

On Mon, Jan 28, 2013 at 07:30:33PM -0600, Kim Phillips wrote:
> AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6,
> and for the 16-bit version in 4.8.

Hmm.

$ /usr/local/aeabi/bin/arm-linux-gcc --version
arm-linux-gcc (GCC) 4.5.4
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ for a in armv3 armv4 armv4t armv5t armv5te armv6k armv6 armv7-a; do \
   echo $a:; \
   for f in 16 32 64; do \
      echo 'unsigned foo(unsigned val) { return __builtin_bswap'$f'(val); }' | \
      /usr/local/aeabi/bin/arm-linux-gcc -x c -S -o - - -march=$a | grep 'bl'; \
   done; \
done

produces:

armv3:
        bl __builtin_bswap16
armv4:
        bl __builtin_bswap16
armv4t:
        bl __builtin_bswap16
armv5t:
        bl __builtin_bswap16
armv5te:
        bl __builtin_bswap16
armv6k:
        bl __builtin_bswap16
armv6:
        bl __builtin_bswap16
armv7-a:
        bl __builtin_bswap16

So, this compiler (4.5.4) has support for 32-bit and 64-bit bswaps
across all our architectures, but not the 16-bit ones.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-01-29 14:13 ` Russell King - ARM Linux
@ 2013-01-29 14:43   ` Woodhouse, David
  0 siblings, 0 replies; 63+ messages in thread
From: Woodhouse, David @ 2013-01-29 14:43 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Kim Phillips, Andrew Morton, Daniel Santos, Borislav Petkov,
	David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1394 bytes --]

On Tue, 2013-01-29 at 14:13 +0000, Russell King - ARM Linux wrote:
> 
> So, this compiler (4.5.4) has support for 32-bit and 64-bit bswaps
> across all our architectures, but not the 16-bit ones.

That observation is consistent with my dig through GCC history. I had
come to the conclusion that the 32-bit and 64-bit versions were added
*generically* in 4.4, and that the 16-bit version was added in 4.6 to
that PowerPC back end, and made generic in 4.8. So I *had* put that
arch-specific check into compiler-gcc4.h, for PowerPC. It's just outside
the context of Kim's patch. If it really does end up being different for
every arch, I may reconsider that.

As for the __bswapsi2() calls... if it's ever emitting an out-of-line
call for something like that, that seems like a really dubious decision;
surely it's better being inlined? So rather than adding it to your
bits-of-libgcc.a in the kernel, I'd suggest just *not* using
ARCH_USE_BUILTIN_BSWAP for the offending compilers, and filing a bug to
get them fixed.

But really, this is why I created ARCH_USE_BUILTIN_BSWAP and left it to
architecture maintainers to enable it at their leisure.... :)

-- 
                   Sent with MeeGo's ActiveSync support.

David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation




[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4370 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-01-29  1:30 [RFC] arm: use built-in byte swap function Kim Phillips
  2013-01-29  8:35 ` Borislav Petkov
  2013-01-29 14:13 ` Russell King - ARM Linux
@ 2013-01-29 14:53 ` Rob Herring
  2013-01-29 15:10   ` Woodhouse, David
  2 siblings, 1 reply; 63+ messages in thread
From: Rob Herring @ 2013-01-29 14:53 UTC (permalink / raw)
  To: Kim Phillips
  Cc: Russell King, Andrew Morton, David Woodhouse, Rusty Russell,
	linux-kernel, Daniel Santos, Borislav Petkov, David Rientjes,
	linux-arm-kernel

On 01/28/2013 07:30 PM, Kim Phillips wrote:
> Enable the compiler intrinsic for byte swapping on arch ARM.  This
> allows the compiler to detect and be able to optimize out byte
> swappings, e.g. in big endian to big endian moves.
> 
> AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6,
> and for the 16-bit version in 4.8.

I think these are the versions ARM got optimized swap support. The newer
compilers are smart enough to replace a swap written in C with a rev
instruction. Last time I checked, they did not do rev16, but that is
probably what was added in 4.8 (and backported to Linaro gcc). The
Linaro backports make it impossible to define what gcc version has a
specific feature.

If you specify to use the builtin's, do you still get inline rev
instructions emitted?

Rob

> 
> Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
> ---
> akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016
> 
> based on linux-next.  Depends on commit "compiler-gcc{3,4}.h: Use
> GCC_VERSION macro" by Daniel Santos <daniel.santos@pobox.com>,
> currently in the akpm branch.
> 
> RFC because of unfamiliarity with arch ARM, and that at91sam9rl,
> at91rm9200, and lpd270 (so far, at least) builds fail with:
> 
> include/uapi/linux/swab.h:60: undefined reference to `__bswapsi2'
> 
> I'm using eldk-5.2.1/armv7a's arm-linux-gnueabi-gcc (GCC) 4.6.4
> 20120303 (prerelease)
> 
>  arch/arm/Kconfig              |    1 +
>  include/linux/compiler-gcc4.h |    3 ++-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index eda8711..437d11a 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -3,6 +3,7 @@ config ARM
>  	default y
>  	select ARCH_BINFMT_ELF_RANDOMIZE_PIE
>  	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
> +	select ARCH_USE_BUILTIN_BSWAP
>  	select ARCH_HAVE_CUSTOM_GPIO_H
>  	select ARCH_WANT_IPC_PARSE_VERSION
>  	select BUILDTIME_EXTABLE_SORT if MMU
> diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
> index 68b162d..da5f728 100644
> --- a/include/linux/compiler-gcc4.h
> +++ b/include/linux/compiler-gcc4.h
> @@ -67,7 +67,8 @@
>  
>  
>  #ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP
> -#if GCC_VERSION >= 40400
> +#if (!defined(__arm__) && GCC_VERSION >= 40400) || \
> +    (defined(__arm__) && GCC_VERSION >= 40600)
>  #define __HAVE_BUILTIN_BSWAP32__
>  #define __HAVE_BUILTIN_BSWAP64__
>  #endif
> 


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-01-29 14:53 ` Rob Herring
@ 2013-01-29 15:10   ` Woodhouse, David
  0 siblings, 0 replies; 63+ messages in thread
From: Woodhouse, David @ 2013-01-29 15:10 UTC (permalink / raw)
  To: Rob Herring
  Cc: Kim Phillips, Russell King, Andrew Morton, Rusty Russell,
	linux-kernel, Daniel Santos, Borislav Petkov, David Rientjes,
	linux-arm-kernel

[-- Attachment #1: Type: text/plain, Size: 981 bytes --]

On Tue, 2013-01-29 at 08:53 -0600, Rob Herring wrote:
> If you specify to use the builtin's, do you still get inline rev
> instructions emitted?

You mean from the architecture's __arch_swab32() et al. macros?

No. If the architecture enables ARCH_USE_BUILTIN_BSWAP and the compiler
version checks indicate that the correspondingly-sized builtin is
available, then the builtin will be used. Only if the arch *doesn't* set
ARCH_USE_BUILTIN_BSWAP, or if the compiler is old enough not to have the
corresponding intrinsic, will the inline assembler in the __arch_swabXX
macros get used.

Note, however, that there is no such compiler intrinsic for swahb32(),
which is where rev16 gets used. That's still being left to the inline
asm in all cases.

-- 
                   Sent with MeeGo's ActiveSync support.

David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation




[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4370 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-01-29  8:35 ` Borislav Petkov
@ 2013-01-29 16:46   ` Woodhouse, David
  2013-01-29 17:42     ` Borislav Petkov
  0 siblings, 1 reply; 63+ messages in thread
From: Woodhouse, David @ 2013-01-29 16:46 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Kim Phillips, Russell King, Andrew Morton, Daniel Santos,
	David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1385 bytes --]

On Tue, 2013-01-29 at 09:35 +0100, Borislav Petkov wrote:
> 
> >  
> >  #ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP
> > -#if GCC_VERSION >= 40400
> > +#if (!defined(__arm__) && GCC_VERSION >= 40400) || \
> > +    (defined(__arm__) && GCC_VERSION >= 40600)
> 
> There should be no arch-specific stuff in a generic header. I guess
> you probably need to select ARCH_USE_BUILTIN_BSWAP in an arm-specific
> compiler.h header after checking compiler version...

If we're really going to have many different architectures depending on
different versions of GCC for this (if it wasn't sane to use it from
4.4/4.8 when it got introduced, and depends on some later arch-specific
optimisation), then perhaps we'll have the arch provide the
corresponding required GCC_VERSION for using each of 64/32/16 bit
builtins, instead of just a yes/no flag? Or just define
__HAVE_BUILTIN_BSWAPxx__ for itself, perhaps?

In fact we could start by having just the problematic architectures set
__HAVE_BUILTIN_BSWAPxx__ for themselves according to whatever criteria
they want, and then if there's scope for consolidating that in the
generic code then we can do so later.

-- 
                   Sent with MeeGo's ActiveSync support.

David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation




[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4370 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-01-29 16:46   ` Woodhouse, David
@ 2013-01-29 17:42     ` Borislav Petkov
  2013-01-29 17:55       ` Woodhouse, David
  0 siblings, 1 reply; 63+ messages in thread
From: Borislav Petkov @ 2013-01-29 17:42 UTC (permalink / raw)
  To: Woodhouse, David
  Cc: Kim Phillips, Russell King, Andrew Morton, Daniel Santos,
	David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel

On Tue, Jan 29, 2013 at 04:46:58PM +0000, Woodhouse, David wrote:
> If we're really going to have many different architectures depending
> on different versions of GCC for this (if it wasn't sane to use
> it from 4.4/4.8 when it got introduced, and depends on some later
> arch-specific optimisation), then perhaps we'll have the arch
> provide the corresponding required GCC_VERSION for using each of
> 64/32/16 bit builtins, instead of just a yes/no flag? Or just define
> __HAVE_BUILTIN_BSWAPxx__ for itself, perhaps?

Damn, there's already the __powerpc__ thing in there.

Yeah, something like defininig __HAVE_BUILTIN_BSWAPxx__ makes sense and
can keep the header arch-agnostic without growing all those different
arch defines.

But I liked your other suggestion better to get the offending compilers
fixed.

I dunno though, how generically is stuff like that getting implemented
for every arch so probably single arches doing __HAVE* defines is
probably going to be the realizable solution in the end.

Hmmm.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-01-29 17:42     ` Borislav Petkov
@ 2013-01-29 17:55       ` Woodhouse, David
  2013-01-29 18:10         ` Borislav Petkov
  0 siblings, 1 reply; 63+ messages in thread
From: Woodhouse, David @ 2013-01-29 17:55 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Kim Phillips, Russell King, Andrew Morton, Daniel Santos,
	David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1508 bytes --]

On Tue, 2013-01-29 at 18:42 +0100, Borislav Petkov wrote:
> On Tue, Jan 29, 2013 at 04:46:58PM +0000, Woodhouse, David wrote:
> > If we're really going to have many different architectures depending
> > on different versions of GCC for this (if it wasn't sane to use
> > it from 4.4/4.8 when it got introduced, and depends on some later
> > arch-specific optimisation), then perhaps we'll have the arch
> > provide the corresponding required GCC_VERSION for using each of
> > 64/32/16 bit builtins, instead of just a yes/no flag? Or just define
> > __HAVE_BUILTIN_BSWAPxx__ for itself, perhaps?
> 
> Damn, there's already the __powerpc__ thing in there.

That's different though. That's because GCC didn't have a generic
__builtin_bswap16() until 4.8, while PowerPC got it in 4.6.

That's a relatively simple and manageable one-off arch-dependency. But
once we get into a mass of "well, it wasn't actually *usable* for ARM
until 4.9", we start wanting to push it into arch code.

> But I liked your other suggestion better to get the offending
> compilers fixed.

That wasn't an *alternative*. It's required if the compiler is doing
something suboptimal, whatever happens. And then we start to *use* the
compiler from the first version that's known to be fixed.

-- 
                   Sent with MeeGo's ActiveSync support.

David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation




[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4370 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-01-29 17:55       ` Woodhouse, David
@ 2013-01-29 18:10         ` Borislav Petkov
  2013-01-30 10:22           ` Woodhouse, David
  0 siblings, 1 reply; 63+ messages in thread
From: Borislav Petkov @ 2013-01-29 18:10 UTC (permalink / raw)
  To: Woodhouse, David
  Cc: Kim Phillips, Russell King, Andrew Morton, Daniel Santos,
	David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel

On Tue, Jan 29, 2013 at 05:55:54PM +0000, Woodhouse, David wrote:
> That's different though. That's because GCC didn't have a generic
> __builtin_bswap16() until 4.8, while PowerPC got it in 4.6.
> 
> That's a relatively simple and manageable one-off arch-dependency. But
> once we get into a mass of "well, it wasn't actually *usable* for ARM
> until 4.9", we start wanting to push it into arch code.

Yeah, and once people see one arch mentioned, all of a sudden they think
it is ok to add just one more ...

> > But I liked your other suggestion better to get the offending
> > compilers fixed.
> 
> That wasn't an *alternative*. It's required if the compiler is doing
> something suboptimal, whatever happens. And then we start to *use* the
> compiler from the first version that's known to be fixed.

So, IMHO it sounds to me like we want to explicitly state for each arch
separately that it is ok to use the __builtin_bswapXX things. This also
takes care of the case where the compiler is doing something suboptimal
by excluding the affected versions.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-01-29 18:10         ` Borislav Petkov
@ 2013-01-30 10:22           ` Woodhouse, David
  2013-01-31  2:09             ` Kim Phillips
  0 siblings, 1 reply; 63+ messages in thread
From: Woodhouse, David @ 2013-01-30 10:22 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Kim Phillips, Russell King, Andrew Morton, Daniel Santos,
	David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1370 bytes --]

On Tue, 2013-01-29 at 19:10 +0100, Borislav Petkov wrote:
> So, IMHO it sounds to me like we want to explicitly state for each arch
> separately that it is ok to use the __builtin_bswapXX things. This also
> takes care of the case where the compiler is doing something suboptimal
> by excluding the affected versions.

Well, if it really does end up being different for every architecture,
then that means I probably made the wrong decision when I chose to make
it "generic", and override the __arch_swabXX() macros. I could have just
pushed all the architectures to use the builtins in their __arch_swabXX
macros instead, as appropriate.

Let's see how many special cases we actually end up with, and perhaps
we'll end up switching that round. For now, let's just make ARM set
__HAVE_BUILTIN_BSWAPxx__ for the appropriate sizes in <asm/swab.h>,
according to whatever criteria it needs.

It's not entirely clear how much of a win it is on ARM anyway; we don't
have load-and-swap or store-and-swap instructions so there are only a
few added opportunities for optimisation that we get by letting the
compiler see what's going on.

-- 
                   Sent with MeeGo's ActiveSync support.

David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation




[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4370 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-01-30 10:22           ` Woodhouse, David
@ 2013-01-31  2:09             ` Kim Phillips
  2013-01-31  6:44               ` Borislav Petkov
  2013-01-31  9:28               ` Russell King - ARM Linux
  0 siblings, 2 replies; 63+ messages in thread
From: Kim Phillips @ 2013-01-31  2:09 UTC (permalink / raw)
  To: Woodhouse, David
  Cc: Borislav Petkov, Russell King, Andrew Morton, Daniel Santos,
	David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel,
	Rob Herring

On Wed, 30 Jan 2013 10:22:15 +0000
"Woodhouse, David" <david.woodhouse@intel.com> wrote:

> On Tue, 2013-01-29 at 19:10 +0100, Borislav Petkov wrote:
> > So, IMHO it sounds to me like we want to explicitly state for each arch
> > separately that it is ok to use the __builtin_bswapXX things. This also
> > takes care of the case where the compiler is doing something suboptimal
> > by excluding the affected versions.
> 
> Well, if it really does end up being different for every architecture,
> then that means I probably made the wrong decision when I chose to make
> it "generic", and override the __arch_swabXX() macros. I could have just
> pushed all the architectures to use the builtins in their __arch_swabXX
> macros instead, as appropriate.
> 
> Let's see how many special cases we actually end up with, and perhaps
> we'll end up switching that round. For now, let's just make ARM set
> __HAVE_BUILTIN_BSWAPxx__ for the appropriate sizes in <asm/swab.h>,
> according to whatever criteria it needs.

thanks - I've attempted to do this - see v2 patch below.

> It's not entirely clear how much of a win it is on ARM anyway; we don't
> have load-and-swap or store-and-swap instructions so there are only a
> few added opportunities for optimisation that we get by letting the
> compiler see what's going on.

I've added some text size figures to the patch description.  
They are indeed very small, but it should help drivers for
big-endian devices.

>From 31df859be202f00d017b707649e7709281994d15 Mon Sep 17 00:00:00 2001
From: Kim Phillips <kim.phillips@freescale.com>
Date: Mon, 28 Jan 2013 19:30:33 -0600
Subject: [PATCH] arm: use built-in byte swap function

Enable the compiler intrinsic for byte swapping on arch ARM.  This
allows the compiler to detect and be able to optimize out byte
swappings, e.g. in big endian to big endian moves.

AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6,
and for the 16-bit version in 4.8.

This has a tiny benefit on vmlinux text size:

multi_v7_defconfig:
   text    data     bss     dec     hex filename
3135208  188396  203344 3526948  35d124 vmlinux
multi_v7_defconfig with builtin_bswap:
3135112  188396  203344 3526852  35d0c4 vmlinux

exynos_defconfig:
   text    data     bss     dec     hex filename
4286605  360564  223172 4870341  4a50c5 vmlinux
exynos_defconfig with builtin_bswap:
   text    data     bss     dec     hex filename
4286405  360564  223172 4870141  4a4ffd vmlinux

The savings come mostly from device-tree related code, and some
from drivers.

Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
---
akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016

based on linux-next-20130128.  Depends on commit "compiler-gcc{3,4}.h:
Use GCC_VERSION macro" by Daniel Santos <daniel.santos@pobox.com>,
currently in the akpm branch.

v2:
- at91 and lpd270 builds fixed by limiting to ARMv6 and above
  (i.e., ARM cores that have support for the 'rev' instruction).
  Otherwise, the compiler emits calls to libgcc's __bswapsi2 on
  these ARMv4/v5 builds (and arch ARM doesn't link with libgcc).
  All ARM defconfigs now have the same build status as they did
  without this patch (some are broken on linux-next).

- move ARM check from generic compiler.h to arch ARM's swab.h.
  - pretty sure it should be limited to __KERNEL__ builds

- add new ARCH_DEFINES_BUILTIN_BSWAP (see Kconfig help).
  - if set, generic compiler header does not set HAVE_BUILTIN_BSWAPxx
  - not too sure about this having to be a new CONFIG_, but it's hard
    to find a place for it given linux/compiler.h doesn't include any
    arch-specific files.

- move new selects to end of CONFIG_ARM's Kconfig select list,
  as is done in David Woodhouse's original patchseries for ppc/x86.

 arch/Kconfig                     |   10 ++++++++++
 arch/arm/Kconfig                 |    2 ++
 arch/arm/include/uapi/asm/swab.h |   10 ++++++++++
 include/linux/compiler-gcc4.h    |    3 ++-
 4 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 40e2b12..c8798b9 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -141,6 +141,16 @@ config ARCH_USE_BUILTIN_BSWAP
 	 instructions should set this. And it shouldn't hurt to set it
 	 on architectures that don't have such instructions.
 
+config ARCH_DEFINES_BUILTIN_BSWAP
+       depends on ARCH_USE_BUILTIN_BSWAP
+       bool
+       help
+	 ARCH selects this when it wants to control HAVE_BUILTIN_BSWAPxx
+	 definitions over those in the generic compiler headers.  It
+	 can be dependent on a combination of byte swapping instruction
+	 availability, the instruction set version, and the state
+	 of support in different compiler versions.
+
 config HAVE_SYSCALL_WRAPPERS
 	bool
 
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 73027aa..b5868c2 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -57,6 +57,8 @@ config ARM
 	select CLONE_BACKWARDS
 	select OLD_SIGSUSPEND3
 	select OLD_SIGACTION
+	select ARCH_USE_BUILTIN_BSWAP
+	select ARCH_DEFINES_BUILTIN_BSWAP
 	help
 	  The ARM series is a line of low-power-consumption RISC chip designs
 	  licensed by ARM Ltd and targeted at embedded applications and
diff --git a/arch/arm/include/uapi/asm/swab.h b/arch/arm/include/uapi/asm/swab.h
index 6fcb32a..5d86ed0 100644
--- a/arch/arm/include/uapi/asm/swab.h
+++ b/arch/arm/include/uapi/asm/swab.h
@@ -50,4 +50,14 @@ static inline __attribute_const__ __u32 __arch_swab32(__u32 x)
 
 #endif
 
+#if defined(__KERNEL__) && __LINUX_ARM_ARCH__ >= 6
+#if GCC_VERSION >= 40600
+#define __HAVE_BUILTIN_BSWAP32__
+#define __HAVE_BUILTIN_BSWAP64__
+#endif
+#if GCC_VERSION >= 40800
+#define __HAVE_BUILTIN_BSWAP16__
+#endif
+#endif
+
 #endif /* _UAPI__ASM_ARM_SWAB_H */
diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index 68b162d..fce39cb 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -66,7 +66,8 @@
 #endif
 
 
-#ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP
+#if defined(CONFIG_ARCH_USE_BUILTIN_BSWAP) && \
+    !defined(CONFIG_ARCH_DEFINES_BUILTIN_BSWAP)
 #if GCC_VERSION >= 40400
 #define __HAVE_BUILTIN_BSWAP32__
 #define __HAVE_BUILTIN_BSWAP64__
-- 
1.7.9.7



^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-01-31  2:09             ` Kim Phillips
@ 2013-01-31  6:44               ` Borislav Petkov
  2013-01-31  9:28               ` Russell King - ARM Linux
  1 sibling, 0 replies; 63+ messages in thread
From: Borislav Petkov @ 2013-01-31  6:44 UTC (permalink / raw)
  To: Kim Phillips
  Cc: Woodhouse, David, Russell King, Andrew Morton, Daniel Santos,
	David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel,
	Rob Herring

On Wed, Jan 30, 2013 at 08:09:00PM -0600, Kim Phillips wrote:
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 40e2b12..c8798b9 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -141,6 +141,16 @@ config ARCH_USE_BUILTIN_BSWAP
>  	 instructions should set this. And it shouldn't hurt to set it
>  	 on architectures that don't have such instructions.
>  
> +config ARCH_DEFINES_BUILTIN_BSWAP
> +       depends on ARCH_USE_BUILTIN_BSWAP
> +       bool
> +       help
> +	 ARCH selects this when it wants to control HAVE_BUILTIN_BSWAPxx

This is what threw me off: if ARCH selects this, I don't think we want
to have a help text and the option be user visible? Normally such
options are only bool and are selected automatically by hm.. ARCH, as
you say above and do so below. :-)

> +	 definitions over those in the generic compiler headers.  It
> +	 can be dependent on a combination of byte swapping instruction
> +	 availability, the instruction set version, and the state
> +	 of support in different compiler versions.
> +
>  config HAVE_SYSCALL_WRAPPERS
>  	bool
>  
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 73027aa..b5868c2 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -57,6 +57,8 @@ config ARM
>  	select CLONE_BACKWARDS
>  	select OLD_SIGSUSPEND3
>  	select OLD_SIGACTION
> +	select ARCH_USE_BUILTIN_BSWAP
> +	select ARCH_DEFINES_BUILTIN_BSWAP
>  	help

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-01-31  2:09             ` Kim Phillips
  2013-01-31  6:44               ` Borislav Petkov
@ 2013-01-31  9:28               ` Russell King - ARM Linux
  2013-01-31 20:59                 ` Kim Phillips
  1 sibling, 1 reply; 63+ messages in thread
From: Russell King - ARM Linux @ 2013-01-31  9:28 UTC (permalink / raw)
  To: Kim Phillips
  Cc: Woodhouse, David, Borislav Petkov, Andrew Morton, Daniel Santos,
	David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel,
	Rob Herring

On Wed, Jan 30, 2013 at 08:09:00PM -0600, Kim Phillips wrote:
> The savings come mostly from device-tree related code, and some
> from drivers.

You forget that IP networking is all big endian, so these will be using
the byte swapping too (search it for htons/ntohs/htonl/ntohl).

> v2:
> - at91 and lpd270 builds fixed by limiting to ARMv6 and above
>   (i.e., ARM cores that have support for the 'rev' instruction).
>   Otherwise, the compiler emits calls to libgcc's __bswapsi2 on
>   these ARMv4/v5 builds (and arch ARM doesn't link with libgcc).

Which compiler version?  gcc 4.5.4 doesn't do this, except for the 16-bit
swap, so I doubt that any later compiler does.

> --- a/arch/arm/include/uapi/asm/swab.h
> +++ b/arch/arm/include/uapi/asm/swab.h
> @@ -50,4 +50,14 @@ static inline __attribute_const__ __u32 __arch_swab32(__u32 x)
>  
>  #endif
>  
> +#if defined(__KERNEL__) && __LINUX_ARM_ARCH__ >= 6
> +#if GCC_VERSION >= 40600
> +#define __HAVE_BUILTIN_BSWAP32__
> +#define __HAVE_BUILTIN_BSWAP64__
> +#endif
> +#if GCC_VERSION >= 40800
> +#define __HAVE_BUILTIN_BSWAP16__
> +#endif
> +#endif

If this is __KERNEL__ only, it should not be in a uapi header.  UAPI
headers get exported to userland, this is not userland interface code.
IT should be in arch/arm/include/asm/swab.h

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-01-31  9:28               ` Russell King - ARM Linux
@ 2013-01-31 20:59                 ` Kim Phillips
  2013-01-31 21:33                   ` Borislav Petkov
                                     ` (2 more replies)
  0 siblings, 3 replies; 63+ messages in thread
From: Kim Phillips @ 2013-01-31 20:59 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Woodhouse, David, Borislav Petkov, Andrew Morton, Daniel Santos,
	David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel,
	Rob Herring

On Thu, 31 Jan 2013 09:28:01 +0000
Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:

> On Wed, Jan 30, 2013 at 08:09:00PM -0600, Kim Phillips wrote:
> > The savings come mostly from device-tree related code, and some
> > from drivers.
> 
> You forget that IP networking is all big endian, so these will be using
> the byte swapping too (search it for htons/ntohs/htonl/ntohl).

As David mentioned, there isn't much gain from net/ code.

> > v2:
> > - at91 and lpd270 builds fixed by limiting to ARMv6 and above
> >   (i.e., ARM cores that have support for the 'rev' instruction).
> >   Otherwise, the compiler emits calls to libgcc's __bswapsi2 on
> >   these ARMv4/v5 builds (and arch ARM doesn't link with libgcc).
> 
> Which compiler version?  gcc 4.5.4 doesn't do this, except for the 16-bit
> swap, so I doubt that any later compiler does.

I've tried both gcc 4.6.3 [1] and 4.6.4 [2].  If you can point me to
a 4.5.x, I'll try that, too, but as it stands now, if one moves the
code added to swab.h below outside of its armv6 protection,
gcc adds calls to __bswapsi2.

> > --- a/arch/arm/include/uapi/asm/swab.h
> > +++ b/arch/arm/include/uapi/asm/swab.h
> > @@ -50,4 +50,14 @@ static inline __attribute_const__ __u32 __arch_swab32(__u32 x)
> >  
> >  #endif
> >  
> > +#if defined(__KERNEL__) && __LINUX_ARM_ARCH__ >= 6
> > +#if GCC_VERSION >= 40600
> > +#define __HAVE_BUILTIN_BSWAP32__
> > +#define __HAVE_BUILTIN_BSWAP64__
> > +#endif
> > +#if GCC_VERSION >= 40800
> > +#define __HAVE_BUILTIN_BSWAP16__
> > +#endif
> > +#endif
> 
> If this is __KERNEL__ only, it should not be in a uapi header.  UAPI
> headers get exported to userland, this is not userland interface code.
> IT should be in arch/arm/include/asm/swab.h

right, I've fixed this and Boris' remove the help text comment, and
made a v3:

>From 18c86580efba42d2680f2947867722705292f80a Mon Sep 17 00:00:00 2001
From: Kim Phillips <kim.phillips@freescale.com>
Date: Mon, 28 Jan 2013 19:30:33 -0600
Subject: [PATCH] arm: use built-in byte swap function

Enable the compiler intrinsic for byte swapping on arch ARM.  This
allows the compiler to detect and be able to optimize out byte
swappings, e.g. in big endian to big endian moves.

A ARCH_DEFINES_BUILTIN_BSWAP is added to allow an ARCH to select
it when it wants to control HAVE_BUILTIN_BSWAPxx definitions over
those in the generic compiler headers.  It can be dependent on a
combination of byte swapping instruction availability, the
instruction set version, and the state of support in different
compiler versions.

AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6,
and for the 16-bit version in 4.8.

This has a tiny benefit on vmlinux text size (gcc 4.6.4):

multi_v7_defconfig:
   text    data     bss     dec     hex filename
3135208  188396  203344 3526948  35d124 vmlinux
multi_v7_defconfig with builtin_bswap:
   text    data     bss     dec     hex filename
3135112  188396  203344 3526852  35d0c4 vmlinux

exynos_defconfig:
   text    data     bss     dec     hex filename
4286605  360564  223172 4870341  4a50c5 vmlinux
exynos_defconfig with builtin_bswap:
   text    data     bss     dec     hex filename
4286405  360564  223172 4870141  4a4ffd vmlinux

The savings come mostly from device-tree related code, and some
from drivers.

Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
---
akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016

based on linux-next-20130128.  Depends on commit "compiler-gcc{3,4}.h:
Use GCC_VERSION macro" by Daniel Santos <daniel.santos@pobox.com>,
currently in the akpm branch.

v3:
- moved out of uapi swab.h into arch/arm/include/asm/swab.h
- moved ARCH_DEFINES_BUILTIN_BSWAP help text into commit message
- moved GCC_VERSION >= 40800 ifdef into GCC_VERSION >= 40600 block

v2:
- at91 and lpd270 builds fixed by limiting to ARMv6 and above
  (i.e., ARM cores that have support for the 'rev' instruction).
  Otherwise, the compiler emits calls to libgcc's __bswapsi2 on
  these ARMv4/v5 builds (and arch ARM doesn't link with libgcc).
  All ARM defconfigs now have the same build status as they did
  without this patch (some are broken on linux-next).

- move ARM check from generic compiler.h to arch ARM's swab.h.
  - pretty sure it should be limited to __KERNEL__ builds

- add new ARCH_DEFINES_BUILTIN_BSWAP (see Kconfig help).
  - if set, generic compiler header does not set HAVE_BUILTIN_BSWAPxx
  - not too sure about this having to be a new CONFIG_, but it's hard
    to find a place for it given linux/compiler.h doesn't include any
    arch-specific files.

- move new selects to end of CONFIG_ARM's Kconfig select list,
  as is done in David Woodhouse's original patchseries for ppc/x86.

 arch/Kconfig                  |    4 ++++
 arch/arm/Kconfig              |    2 ++
 arch/arm/include/asm/swab.h   |    8 ++++++++
 include/linux/compiler-gcc4.h |    3 ++-
 4 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 40e2b12..bc5ed77 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -141,6 +141,10 @@ config ARCH_USE_BUILTIN_BSWAP
 	 instructions should set this. And it shouldn't hurt to set it
 	 on architectures that don't have such instructions.
 
+config ARCH_DEFINES_BUILTIN_BSWAP
+       depends on ARCH_USE_BUILTIN_BSWAP
+       bool
+
 config HAVE_SYSCALL_WRAPPERS
 	bool
 
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 73027aa..b5868c2 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -57,6 +57,8 @@ config ARM
 	select CLONE_BACKWARDS
 	select OLD_SIGSUSPEND3
 	select OLD_SIGACTION
+	select ARCH_USE_BUILTIN_BSWAP
+	select ARCH_DEFINES_BUILTIN_BSWAP
 	help
 	  The ARM series is a line of low-power-consumption RISC chip designs
 	  licensed by ARM Ltd and targeted at embedded applications and
diff --git a/arch/arm/include/asm/swab.h b/arch/arm/include/asm/swab.h
index 537fc9b..e56acff 100644
--- a/arch/arm/include/asm/swab.h
+++ b/arch/arm/include/asm/swab.h
@@ -34,5 +34,13 @@ static inline __attribute_const__ __u32 __arch_swab32(__u32 x)
 }
 #define __arch_swab32 __arch_swab32
 
+#if GCC_VERSION >= 40600
+#define __HAVE_BUILTIN_BSWAP32__
+#define __HAVE_BUILTIN_BSWAP64__
+#if GCC_VERSION >= 40800
+#define __HAVE_BUILTIN_BSWAP16__
+#endif
+#endif
+
 #endif
 #endif
diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index 68b162d..fce39cb 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -66,7 +66,8 @@
 #endif
 
 
-#ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP
+#if defined(CONFIG_ARCH_USE_BUILTIN_BSWAP) && \
+    !defined(CONFIG_ARCH_DEFINES_BUILTIN_BSWAP)
 #if GCC_VERSION >= 40400
 #define __HAVE_BUILTIN_BSWAP32__
 #define __HAVE_BUILTIN_BSWAP64__
-- 
1.7.9.7

Thanks,

Kim

[1] http://kernel.org/pub/tools/crosstool/files/bin/x86_64/

[2] http://ftp.denx.de/pub/eldk/5.2.1/targets/armv7a/


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-01-31 20:59                 ` Kim Phillips
@ 2013-01-31 21:33                   ` Borislav Petkov
  2013-01-31 22:11                   ` Woodhouse, David
  2013-02-01  1:17                   ` [RFC] " Russell King - ARM Linux
  2 siblings, 0 replies; 63+ messages in thread
From: Borislav Petkov @ 2013-01-31 21:33 UTC (permalink / raw)
  To: Kim Phillips
  Cc: Russell King - ARM Linux, Woodhouse, David, Andrew Morton,
	Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel,
	linux-kernel, Rob Herring

On Thu, Jan 31, 2013 at 02:59:47PM -0600, Kim Phillips wrote:
> - add new ARCH_DEFINES_BUILTIN_BSWAP (see Kconfig help).
>   - if set, generic compiler header does not set HAVE_BUILTIN_BSWAPxx
>   - not too sure about this having to be a new CONFIG_, but it's hard
>     to find a place for it given linux/compiler.h doesn't include any
>     arch-specific files.

Yeah, me neither. It seems to me the whole deal can be simplified even
further without introducing CONFIG_ARCH_DEFINES_BUILTIN_BSWAP.

And, we don't even want to use CONFIG_ARCH_USE_BUILTIN_BSWAP on arm due
to different compiler versions supporting it (correct me if I'm wrong
here) vs the generic thing in include/linux/compiler-gcc4.h which we
want off.

If so, you'd need to simply put the following from below in
arch/arm/include/asm/swab.h

#if GCC_VERSION >= 40600
#define __HAVE_BUILTIN_BSWAP32__
#define __HAVE_BUILTIN_BSWAP64__
#if GCC_VERSION >= 40800
#define __HAVE_BUILTIN_BSWAP16__
#endif /* GCC_VERSION >= 40800 */
#endif /* GCC_VERSION >= 40600 */

and that's it.

Makes sense or am I over-simplifying this?

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-01-31 20:59                 ` Kim Phillips
  2013-01-31 21:33                   ` Borislav Petkov
@ 2013-01-31 22:11                   ` Woodhouse, David
  2013-02-01  0:37                     ` [PATCH v4] " Kim Phillips
  2013-02-01  1:17                   ` [RFC] " Russell King - ARM Linux
  2 siblings, 1 reply; 63+ messages in thread
From: Woodhouse, David @ 2013-01-31 22:11 UTC (permalink / raw)
  To: Kim Phillips
  Cc: Russell King - ARM Linux, Borislav Petkov, Andrew Morton,
	Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel,
	linux-kernel, Rob Herring

[-- Attachment #1: Type: text/plain, Size: 797 bytes --]

On Thu, 2013-01-31 at 14:59 -0600, Kim Phillips wrote:
> 
> - add new ARCH_DEFINES_BUILTIN_BSWAP (see Kconfig help).

Ick, no.

>   - if set, generic compiler header does not set HAVE_BUILTIN_BSWAPxx

It won't do that anyway if !ARCH_USE_BUILTIN_BSWAP. I don't see the
point in adding a new config option just for this.

If you want to define __HAVE_BUILTIN_BSWAPxx__ for yourself manually,
just go ahead and do so. As I said, if lots of architectures end up
doing it then we'll worry about cleaning things up when we've got a
better picture of who needs what.

-- 
                   Sent with MeeGo's ActiveSync support.

David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation




[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4370 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH v4] arm: use built-in byte swap function
  2013-01-31 22:11                   ` Woodhouse, David
@ 2013-02-01  0:37                     ` Kim Phillips
  2013-02-01 10:46                       ` Russell King - ARM Linux
  0 siblings, 1 reply; 63+ messages in thread
From: Kim Phillips @ 2013-02-01  0:37 UTC (permalink / raw)
  To: Woodhouse, David
  Cc: Russell King - ARM Linux, Borislav Petkov, Andrew Morton,
	Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel,
	linux-kernel, Rob Herring

>From 1490bd8823c05e0dda982524bb70cb6c6427ddf9 Mon Sep 17 00:00:00 2001
From: Kim Phillips <kim.phillips@freescale.com>
Date: Mon, 28 Jan 2013 19:30:33 -0600
Subject: [PATCH] arm: use built-in byte swap function

Enable the compiler intrinsic for byte swapping on arch ARM.  This
allows the compiler to detect and be able to optimize out byte
swappings, e.g. in big endian to big endian moves.

A ARCH_DEFINES_BUILTIN_BSWAP is added to allow an ARCH to select
it when it wants to control HAVE_BUILTIN_BSWAPxx definitions over
those in the generic compiler headers.  It can be dependent on a
combination of byte swapping instruction availability, the
instruction set version, and the state of support in different
compiler versions.

AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6,
and for the 16-bit version in 4.8.

This has a tiny benefit on vmlinux text size (gcc 4.6.4):

multi_v7_defconfig:
   text    data     bss     dec     hex filename
3135208  188396  203344 3526948  35d124 vmlinux
multi_v7_defconfig with builtin_bswap:
   text    data     bss     dec     hex filename
3135112  188396  203344 3526852  35d0c4 vmlinux

exynos_defconfig:
   text    data     bss     dec     hex filename
4286605  360564  223172 4870341  4a50c5 vmlinux
exynos_defconfig with builtin_bswap:
   text    data     bss     dec     hex filename
4286405  360564  223172 4870141  4a4ffd vmlinux

The savings come mostly from device-tree related code, and some
from drivers.

Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
---
akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016

based on linux-next-20130128.  Depends on commit "compiler-gcc{3,4}.h:
Use GCC_VERSION macro" by Daniel Santos <daniel.santos@pobox.com>,
currently in the akpm branch.

v4:
- undo v2-3's addition of ARCH_DEFINES_BUILTIN_BSWAP per Boris
  and David - patch is much less intrusive :)

v3:
- moved out of uapi swab.h into arch/arm/include/asm/swab.h
- moved ARCH_DEFINES_BUILTIN_BSWAP help text into commit message
- moved GCC_VERSION >= 40800 ifdef into GCC_VERSION >= 40600 block

v2:
- at91 and lpd270 builds fixed by limiting to ARMv6 and above
  (i.e., ARM cores that have support for the 'rev' instruction).
  Otherwise, the compiler emits calls to libgcc's __bswapsi2 on
  these ARMv4/v5 builds (and arch ARM doesn't link with libgcc).
  All ARM defconfigs now have the same build status as they did
  without this patch (some are broken on linux-next).

- move ARM check from generic compiler.h to arch ARM's swab.h.
  - pretty sure it should be limited to __KERNEL__ builds

- add new ARCH_DEFINES_BUILTIN_BSWAP (see Kconfig help).
  - if set, generic compiler header does not set HAVE_BUILTIN_BSWAPxx
  - not too sure about this having to be a new CONFIG_, but it's hard
    to find a place for it given linux/compiler.h doesn't include any
    arch-specific files.

- move new selects to end of CONFIG_ARM's Kconfig select list,
  as is done in David Woodhouse's original patchseries for ppc/x86.

 arch/arm/include/asm/swab.h |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/arm/include/asm/swab.h b/arch/arm/include/asm/swab.h
index 537fc9b..e56acff 100644
--- a/arch/arm/include/asm/swab.h
+++ b/arch/arm/include/asm/swab.h
@@ -34,5 +34,13 @@ static inline __attribute_const__ __u32 __arch_swab32(__u32 x)
 }
 #define __arch_swab32 __arch_swab32
 
+#if GCC_VERSION >= 40600
+#define __HAVE_BUILTIN_BSWAP32__
+#define __HAVE_BUILTIN_BSWAP64__
+#if GCC_VERSION >= 40800
+#define __HAVE_BUILTIN_BSWAP16__
+#endif
+#endif
+
 #endif
 #endif
-- 
1.7.9.7



^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-01-31 20:59                 ` Kim Phillips
  2013-01-31 21:33                   ` Borislav Petkov
  2013-01-31 22:11                   ` Woodhouse, David
@ 2013-02-01  1:17                   ` Russell King - ARM Linux
  2013-02-01  7:33                     ` Woodhouse, David
  2 siblings, 1 reply; 63+ messages in thread
From: Russell King - ARM Linux @ 2013-02-01  1:17 UTC (permalink / raw)
  To: Kim Phillips
  Cc: Woodhouse, David, Borislav Petkov, Andrew Morton, Daniel Santos,
	David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel,
	Rob Herring

On Thu, Jan 31, 2013 at 02:59:47PM -0600, Kim Phillips wrote:
> On Thu, 31 Jan 2013 09:28:01 +0000
> Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
> 
> > On Wed, Jan 30, 2013 at 08:09:00PM -0600, Kim Phillips wrote:
> > > v2:
> > > - at91 and lpd270 builds fixed by limiting to ARMv6 and above
> > >   (i.e., ARM cores that have support for the 'rev' instruction).
> > >   Otherwise, the compiler emits calls to libgcc's __bswapsi2 on
> > >   these ARMv4/v5 builds (and arch ARM doesn't link with libgcc).
> > 
> > Which compiler version?  gcc 4.5.4 doesn't do this, except for the 16-bit
> > swap, so I doubt that any later compiler does.
> 
> I've tried both gcc 4.6.3 [1] and 4.6.4 [2].  If you can point me to
> a 4.5.x, I'll try that, too, but as it stands now, if one moves the
> code added to swab.h below outside of its armv6 protection,
> gcc adds calls to __bswapsi2.

Take a look at the message I sent on the 29th towards the beginning of
this thread for details of gcc 4.5.4 behaviour.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-01  1:17                   ` [RFC] " Russell King - ARM Linux
@ 2013-02-01  7:33                     ` Woodhouse, David
  2013-02-06  3:04                       ` Kim Phillips
  0 siblings, 1 reply; 63+ messages in thread
From: Woodhouse, David @ 2013-02-01  7:33 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Kim Phillips, Borislav Petkov, Andrew Morton, Daniel Santos,
	David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel,
	Rob Herring

[-- Attachment #1: Type: text/plain, Size: 1040 bytes --]

On Fri, 2013-02-01 at 01:17 +0000, Russell King - ARM Linux wrote:
> 
> > I've tried both gcc 4.6.3 [1] and 4.6.4 [2].  If you can point me to
> > a 4.5.x, I'll try that, too, but as it stands now, if one moves the
> > code added to swab.h below outside of its armv6 protection,
> > gcc adds calls to __bswapsi2.
> 
> Take a look at the message I sent on the 29th towards the beginning of
> this thread for details of gcc 4.5.4 behaviour.

I'd like to see a comment (with PR# if appropriate) explaining clearly
*why* it isn't enabled for <ARMv6 even with a bleeding-edge compiler.

Russell's test also seemed to indicate that the 32-bit and 64-bit swap
support was present and functional in GCC 4.5.4 (as indeed it should
have been since 4.4), so I'm still not quite sure why you require 4.6
for that.

-- 
                   Sent with MeeGo's ActiveSync support.

David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation




[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4370 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v4] arm: use built-in byte swap function
  2013-02-01  0:37                     ` [PATCH v4] " Kim Phillips
@ 2013-02-01 10:46                       ` Russell King - ARM Linux
  0 siblings, 0 replies; 63+ messages in thread
From: Russell King - ARM Linux @ 2013-02-01 10:46 UTC (permalink / raw)
  To: Kim Phillips
  Cc: Woodhouse, David, Borislav Petkov, Andrew Morton, Daniel Santos,
	David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel,
	Rob Herring

On Thu, Jan 31, 2013 at 06:37:47PM -0600, Kim Phillips wrote:
> +#if GCC_VERSION >= 40600
> +#define __HAVE_BUILTIN_BSWAP32__
> +#define __HAVE_BUILTIN_BSWAP64__

You really aren't listening to anything that's been said to you.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-01  7:33                     ` Woodhouse, David
@ 2013-02-06  3:04                       ` Kim Phillips
  2013-02-06  9:02                         ` Woodhouse, David
  0 siblings, 1 reply; 63+ messages in thread
From: Kim Phillips @ 2013-02-06  3:04 UTC (permalink / raw)
  To: Woodhouse, David
  Cc: Russell King - ARM Linux, Borislav Petkov, Andrew Morton,
	Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel,
	linux-kernel, Rob Herring

On Fri, 1 Feb 2013 07:33:17 +0000
"Woodhouse, David" <david.woodhouse@intel.com> wrote:

> On Fri, 2013-02-01 at 01:17 +0000, Russell King - ARM Linux wrote:
> > 
> > > I've tried both gcc 4.6.3 [1] and 4.6.4 [2].  If you can point me to
> > > a 4.5.x, I'll try that, too, but as it stands now, if one moves the
> > > code added to swab.h below outside of its armv6 protection,
> > > gcc adds calls to __bswapsi2.
> > 
> > Take a look at the message I sent on the 29th towards the beginning of
> > this thread for details of gcc 4.5.4 behaviour.
> 
> I'd like to see a comment (with PR# if appropriate) explaining clearly
> *why* it isn't enabled for <ARMv6 even with a bleeding-edge compiler.

ok I think I've figured it out: the difference in the defconfigs that
fail (at91_dt, at91sam9g45, and lpc32xx) is that they are ARM9's
(armv4/5), have CC_OPTIMIZE_FOR_SIZE set, and have code with
multiple swaps ready for space optimization: gcc -Os emits calls
to __bswapsi2 on those platforms to save space because they don't
have the single rev byte swap instruction.

> Russell's test also seemed to indicate that the 32-bit and 64-bit swap
> support was present and functional in GCC 4.5.4 (as indeed it should
> have been since 4.4), so I'm still not quite sure why you require 4.6
> for that.

initially it was based at looking at gcc commit history for the
'rev' instruction implementation, but now I've got 4.4, 4.5, 4.6 and
4.7 compilers to perform Russell's test:

$ for cc in 4.4 4.5 4.6 4.7; do \
    arm-linux-gnueabi-gcc-$cc --version | grep gcc ; \
    for a in armv3 armv4 armv4t armv5t armv5te armv6k armv6 armv7-a; do \
      echo -n $a:; \
      for f in 16 32 64; do \
         echo 'unsigned foo(unsigned val) { return __builtin_bswap'$f'(val); }' | arm-linux-gnueabi-gcc-$cc -w -x c -S -o - - -march=$a | grep 'bl'; \
      done; \
    done; \
done

whose output is:

arm-linux-gnueabi-gcc-4.4 (Ubuntu/Linaro 4.4.7-1ubuntu2) 4.4.7
armv3:   	bl	__builtin_bswap16
	bl	__bswapsi2
	bl	__bswapdi2
armv4:   	bl	__builtin_bswap16
	bl	__bswapsi2
	bl	__bswapdi2
armv4t:   	bl	__builtin_bswap16
	bl	__bswapsi2
	bl	__bswapdi2
armv5t:   	bl	__builtin_bswap16
	bl	__bswapsi2
	bl	__bswapdi2
armv5te:   	bl	__builtin_bswap16
	bl	__bswapsi2
	bl	__bswapdi2
armv6k:   	bl	__builtin_bswap16
	bl	__bswapsi2
	bl	__bswapdi2
armv6:   	bl	__builtin_bswap16
	bl	__bswapsi2
	bl	__bswapdi2
armv7-a:   	bl	__builtin_bswap16
	bl	__bswapsi2
	bl	__bswapdi2
arm-linux-gnueabi-gcc-4.5 (Ubuntu/Linaro 4.5.3-12ubuntu2) 4.5.3
armv3:   	bl	__builtin_bswap16
armv4:   	bl	__builtin_bswap16
armv4t:   	bl	__builtin_bswap16
armv5t:   	bl	__builtin_bswap16
armv5te:   	bl	__builtin_bswap16
armv6k:   	bl	__builtin_bswap16
armv6:   	bl	__builtin_bswap16
armv7-a:   	bl	__builtin_bswap16
arm-linux-gnueabi-gcc-4.6 (Ubuntu/Linaro 4.6.3-8ubuntu1) 4.6.3 20120624 (prerelease)
armv3:   	bl	__builtin_bswap16
armv4:   	bl	__builtin_bswap16
armv4t:   	bl	__builtin_bswap16
armv5t:   	bl	__builtin_bswap16
armv5te:   	bl	__builtin_bswap16
armv6k:   	bl	__builtin_bswap16
armv6:   	bl	__builtin_bswap16
armv7-a:   	bl	__builtin_bswap16
arm-linux-gnueabi-gcc-4.7 (Ubuntu/Linaro 4.7.2-1ubuntu1) 4.7.2
armv3:   	bl	__builtin_bswap16
armv4:   	bl	__builtin_bswap16
armv4t:   	bl	__builtin_bswap16
armv5t:   	bl	__builtin_bswap16
armv5te:   	bl	__builtin_bswap16
armv6k:   	bl	__builtin_bswap16
armv6:   	bl	__builtin_bswap16
armv7-a:   	bl	__builtin_bswap16

So 4.4 should be exempt from using the built-ins because it always
emits __bswapsi2 calls: it doesn't matter whether or not -Os or -O2
are added as options in the test.

gcc 4.5, 4.6, and 4.7 all support 32 & 64-bit versions, so we
should check for gcc >= 4.5 instead of gcc >= 4.6.

I've added a new check for !CC_OPTIMIZE_FOR_SIZE and build-tested all
defconfigs with gcc 4.6.3 - here's v5:

>From 11aa942a84fe94d204424a19b6b13fdb2b359ee6 Mon Sep 17 00:00:00 2001
From: Kim Phillips <kim.phillips@freescale.com>
Date: Mon, 28 Jan 2013 19:30:33 -0600
Subject: [PATCH] arm: use built-in byte swap function

Enable the compiler intrinsic for byte swapping on arch ARM.  This
allows the compiler to detect and be able to optimize out byte
swappings.

__builtin_bswap{32,64} support was added in gcc 4.4, but until
gcc 4.5, it emitted calls to libgcc's __bswap[sd]i2 (even
with -O2).

All gcc versions tested (4.[4567]) emit calls to __bswap[sd]i2 when
optimizing for size, so we add the !CC_OPTIMIZE_FOR_SIZE check.

Support for 16-bit built-ins will be in gcc version 4.8.

This has a tiny benefit on vmlinux text size (gcc 4.6.4):

multi_v7_defconfig:
   text    data     bss     dec     hex filename
3135208  188396  203344 3526948  35d124 vmlinux
multi_v7_defconfig with builtin_bswap:
   text    data     bss     dec     hex filename
3135112  188396  203344 3526852  35d0c4 vmlinux

exynos_defconfig:
   text    data     bss     dec     hex filename
4286605  360564  223172 4870341  4a50c5 vmlinux
exynos_defconfig with builtin_bswap:
   text    data     bss     dec     hex filename
4286405  360564  223172 4870141  4a4ffd vmlinux

The savings come mostly from device-tree related code, and some
from drivers.

Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
---
akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016

based on linux-next-20130128.  Depends on commit "compiler-gcc{3,4}.h:
Use GCC_VERSION macro" by Daniel Santos <daniel.santos@pobox.com>,
currently in the akpm branch.

v5: re-work based on new gcc version test data:
  - moved outside armv6 protection
  - check for gcc 4.6+ demoted to gcc 4.5+ with:
    !defined(CONFIG_CC_OPTIMIZE_FOR_SIZE)

v4:
- undo v2-2's addition of ARCH_DEFINES_BUILTIN_BSWAP per Boris
  and David - object is to find arches that define _HAVE_BSWAP
  and clean it up in the future: patch is much less intrusive. :)

v3:
- moved out of uapi swab.h into arch/arm/include/asm/swab.h
- moved ARCH_DEFINES_BUILTIN_BSWAP help text into commit message
- moved GCC_VERSION >= 40800 ifdef into GCC_VERSION >= 40600 block

v2:
- at91 and lpd270 builds fixed by limiting to ARMv6 and above
  (i.e., ARM cores that have support for the 'rev' instruction).
  Otherwise, the compiler emits calls to libgcc's __bswapsi2 on
  these ARMv4/v5 builds (and arch ARM doesn't link with libgcc).
  All ARM defconfigs now have the same build status as they did
  without this patch (some are broken on linux-next).

- move ARM check from generic compiler.h to arch ARM's swab.h.
  - pretty sure it should be limited to __KERNEL__ builds

- add new ARCH_DEFINES_BUILTIN_BSWAP (see Kconfig help).
  - if set, generic compiler header does not set HAVE_BUILTIN_BSWAPxx
  - not too sure about this having to be a new CONFIG_, but it's hard
    to find a place for it given linux/compiler.h doesn't include any
    arch-specific files.

- move new selects to end of CONFIG_ARM's Kconfig select list,
  as is done in David Woodhouse's original patchseries for ppc/x86.

 arch/arm/include/asm/swab.h |    9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/arm/include/asm/swab.h b/arch/arm/include/asm/swab.h
index 537fc9b..159ab16 100644
--- a/arch/arm/include/asm/swab.h
+++ b/arch/arm/include/asm/swab.h
@@ -35,4 +35,13 @@ static inline __attribute_const__ __u32 __arch_swab32(__u32 x)
 #define __arch_swab32 __arch_swab32
 
 #endif
+
+#if !defined(CONFIG_CC_OPTIMIZE_FOR_SIZE) && GCC_VERSION >= 40500
+#define __HAVE_BUILTIN_BSWAP32__
+#define __HAVE_BUILTIN_BSWAP64__
+#if GCC_VERSION >= 40800
+#define __HAVE_BUILTIN_BSWAP16__
+#endif
+#endif
+
 #endif
-- 
1.7.9.7

Thanks,

Kim

[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44392
[2] http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#Other-Builtins


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-06  3:04                       ` Kim Phillips
@ 2013-02-06  9:02                         ` Woodhouse, David
  2013-02-07  1:19                           ` Kim Phillips
  0 siblings, 1 reply; 63+ messages in thread
From: Woodhouse, David @ 2013-02-06  9:02 UTC (permalink / raw)
  To: Kim Phillips
  Cc: Russell King - ARM Linux, Borislav Petkov, Andrew Morton,
	Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel,
	linux-kernel, Rob Herring

[-- Attachment #1: Type: text/plain, Size: 689 bytes --]

On Tue, 2013-02-05 at 21:04 -0600, Kim Phillips wrote:
> gcc -Os emits calls to __bswapsi2 on those platforms to save space
> because they don't have the single rev byte swap instruction.

Is that the right thing for GCC to do in that situation?

If so, perhaps we should be *providing* __bswap[sd]i2 functions for it
to use?

If not, perhaps there should be a PR filed?

Or is our use case justifiably different to the general case of '-Os'?
If so, why?

-- 
                   Sent with MeeGo's ActiveSync support.

David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation




[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4370 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-06  9:02                         ` Woodhouse, David
@ 2013-02-07  1:19                           ` Kim Phillips
  2013-02-07 10:19                             ` Will Newton
  2013-02-07 18:13                             ` Russell King - ARM Linux
  0 siblings, 2 replies; 63+ messages in thread
From: Kim Phillips @ 2013-02-07  1:19 UTC (permalink / raw)
  To: Woodhouse, David
  Cc: Russell King - ARM Linux, Borislav Petkov, Andrew Morton,
	Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel,
	linux-kernel, Rob Herring

On Wed, 6 Feb 2013 09:02:04 +0000
"Woodhouse, David" <david.woodhouse@intel.com> wrote:

> On Tue, 2013-02-05 at 21:04 -0600, Kim Phillips wrote:
> > gcc -Os emits calls to __bswapsi2 on those platforms to save space
> > because they don't have the single rev byte swap instruction.
> 
> Is that the right thing for GCC to do in that situation?

if it saves space, why wouldn't it be?

"Many of these functions are only optimized in certain cases; if they
are not optimized in a particular case, a call to the library
function is emitted." [1]

I see "(arm_arch6 || !optimize_size)" in gcc's define_expand
"bswapsi2" source, so GCC considers size optimization as a
legitimate one of those cases.

> If so, perhaps we should be *providing* __bswap[sd]i2 functions for it
> to use?

either that, or link with libgcc - why does arch/arm64 do this and
arch/arm not?  It's not obvious from git log.

> If not, perhaps there should be a PR filed?
> 
> Or is our use case justifiably different to the general case of '-Os'?
> If so, why?

shouldn't be - a patch, such as this, that claims to reduce code
size, and that only turns on the new built-in when
CC_OPTIMIZE_FOR_SIZE is off, is generally not good :)

OTOH, the target here is armv6+ performance - not armv4,5 code
density - the OPTIMIZE_FOR_SIZE protection prevents armv4,5 build
breakage.

Kim

[1] http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#Other-Builtins


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-07  1:19                           ` Kim Phillips
@ 2013-02-07 10:19                             ` Will Newton
  2013-02-07 10:43                               ` Catalin Marinas
  2013-02-07 18:13                             ` Russell King - ARM Linux
  1 sibling, 1 reply; 63+ messages in thread
From: Will Newton @ 2013-02-07 10:19 UTC (permalink / raw)
  To: Kim Phillips
  Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

On Thu, Feb 7, 2013 at 1:19 AM, Kim Phillips <kim.phillips@freescale.com> wrote:
> On Wed, 6 Feb 2013 09:02:04 +0000
> "Woodhouse, David" <david.woodhouse@intel.com> wrote:
>
>> On Tue, 2013-02-05 at 21:04 -0600, Kim Phillips wrote:
>> > gcc -Os emits calls to __bswapsi2 on those platforms to save space
>> > because they don't have the single rev byte swap instruction.
>>
>> Is that the right thing for GCC to do in that situation?
>
> if it saves space, why wouldn't it be?
>
> "Many of these functions are only optimized in certain cases; if they
> are not optimized in a particular case, a call to the library
> function is emitted." [1]
>
> I see "(arm_arch6 || !optimize_size)" in gcc's define_expand
> "bswapsi2" source, so GCC considers size optimization as a
> legitimate one of those cases.
>
>> If so, perhaps we should be *providing* __bswap[sd]i2 functions for it
>> to use?
>
> either that, or link with libgcc - why does arch/arm64 do this and
> arch/arm not?  It's not obvious from git log.

One reason I have found, I don't know if it is the canonical one, is
that linking with libgcc allows people to use all intrinsics e.g. soft
float routines in the kernel without noticing it. If you limit the
intrinsics to the ones linked into the kernel explicitly then this
cannot happen.

I have also seen cases where the libgcc intrinsics are improved over
time, having the code in the kernel allows these improvements to be
rolled into the kernel even if the user has an older toolchain.

A number of ports link against libgcc and a roughly equal number do
not, so it isn't clear that there's any consensus on the issue.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-07 10:19                             ` Will Newton
@ 2013-02-07 10:43                               ` Catalin Marinas
  0 siblings, 0 replies; 63+ messages in thread
From: Catalin Marinas @ 2013-02-07 10:43 UTC (permalink / raw)
  To: Will Newton
  Cc: Kim Phillips, Woodhouse, David, Russell King - ARM Linux,
	Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes,
	Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring

On 7 February 2013 10:19, Will Newton <will.newton@gmail.com> wrote:
> On Thu, Feb 7, 2013 at 1:19 AM, Kim Phillips <kim.phillips@freescale.com> wrote:
>> On Wed, 6 Feb 2013 09:02:04 +0000
>> "Woodhouse, David" <david.woodhouse@intel.com> wrote:
>>
>>> On Tue, 2013-02-05 at 21:04 -0600, Kim Phillips wrote:
>>> > gcc -Os emits calls to __bswapsi2 on those platforms to save space
>>> > because they don't have the single rev byte swap instruction.
>>>
>>> Is that the right thing for GCC to do in that situation?
>>
>> if it saves space, why wouldn't it be?
>>
>> "Many of these functions are only optimized in certain cases; if they
>> are not optimized in a particular case, a call to the library
>> function is emitted." [1]
>>
>> I see "(arm_arch6 || !optimize_size)" in gcc's define_expand
>> "bswapsi2" source, so GCC considers size optimization as a
>> legitimate one of those cases.
>>
>>> If so, perhaps we should be *providing* __bswap[sd]i2 functions for it
>>> to use?
>>
>> either that, or link with libgcc - why does arch/arm64 do this and
>> arch/arm not?  It's not obvious from git log.
>
> One reason I have found, I don't know if it is the canonical one, is
> that linking with libgcc allows people to use all intrinsics e.g. soft
> float routines in the kernel without noticing it. If you limit the
> intrinsics to the ones linked into the kernel explicitly then this
> cannot happen.

For arm64 we explicitly pass -mgeneral-regs-only to avoid any floating
point generation. Soft-float is excluded by the ABI automatically. But
we use other compiler intrinsics like __ffs and while they are
currently generated inline, you can't guarantee, hence the linking
with libgcc.

> I have also seen cases where the libgcc intrinsics are improved over
> time, having the code in the kernel allows these improvements to be
> rolled into the kernel even if the user has an older toolchain.

Indeed, the gcc guys do a lot benchmarking/optimisations on a wide
range of processors, so we can take advantage of that in the kernel.
But it's much easier on arm64 since the architecture is stable. On
32-bit arm we have to cope with a range of architecture versions with
variations to the instruction set.

-- 
Catalin

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-07  1:19                           ` Kim Phillips
  2013-02-07 10:19                             ` Will Newton
@ 2013-02-07 18:13                             ` Russell King - ARM Linux
  2013-02-08 17:25                               ` Woodhouse, David
  1 sibling, 1 reply; 63+ messages in thread
From: Russell King - ARM Linux @ 2013-02-07 18:13 UTC (permalink / raw)
  To: Kim Phillips
  Cc: Woodhouse, David, Borislav Petkov, Andrew Morton, Daniel Santos,
	David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel,
	Rob Herring

On Wed, Feb 06, 2013 at 07:19:05PM -0600, Kim Phillips wrote:
> either that, or link with libgcc - why does arch/arm64 do this and
> arch/arm not?  It's not obvious from git log.

We want to be in control of what code the kernel runs, and that means
not using libgcc.  libgcc can do all sorts of stuff because it makes
assumptions about the environment which it is executing in (glibc).

For example, it assumes there may be a GOT, and that it can issue SWI
calls...

However, the biggest reason not to use libgcc is that we want to control
what gets used in the kernel - for example, no floating point, and no
use of 64 x 64bit division.

So all round, using libgcc in AArch32 would bring us a number of issues
that we don't get with the current approach.

AArch64 uses it because it's maintained entirely separately and the
decision has been made by other people - and some of these concerns do
not exist on a 64-bit architecture.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-07 18:13                             ` Russell King - ARM Linux
@ 2013-02-08 17:25                               ` Woodhouse, David
  2013-02-08 20:04                                 ` Nicolas Pitre
  0 siblings, 1 reply; 63+ messages in thread
From: Woodhouse, David @ 2013-02-08 17:25 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Kim Phillips, Borislav Petkov, Andrew Morton, Daniel Santos,
	David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel,
	Rob Herring

[-- Attachment #1: Type: text/plain, Size: 754 bytes --]

On Thu, 2013-02-07 at 18:13 +0000, Russell King - ARM Linux wrote:
> 
> However, the biggest reason not to use libgcc is that we want to control
> what gets used in the kernel - for example, no floating point, and no
> use of 64 x 64bit division.

Which is all very sensible. But there's no particular reason we couldn't
add a __bswap[sd]i2 to the kernel's version of libgcc if we wanted to.

If we're compiling with -Os  on platforms without 'rev' and that's the
thing that makes most sense, then we should make it work.

-- 
                   Sent with MeeGo's ActiveSync support.

David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation




[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4370 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-08 17:25                               ` Woodhouse, David
@ 2013-02-08 20:04                                 ` Nicolas Pitre
  2013-02-08 22:40                                   ` Woodhouse, David
  0 siblings, 1 reply; 63+ messages in thread
From: Nicolas Pitre @ 2013-02-08 20:04 UTC (permalink / raw)
  To: Woodhouse, David
  Cc: Russell King - ARM Linux, Kim Phillips, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

On Fri, 8 Feb 2013, Woodhouse, David wrote:

> On Thu, 2013-02-07 at 18:13 +0000, Russell King - ARM Linux wrote:
> > 
> > However, the biggest reason not to use libgcc is that we want to control
> > what gets used in the kernel - for example, no floating point, and no
> > use of 64 x 64bit division.
> 
> Which is all very sensible. But there's no particular reason we couldn't
> add a __bswap[sd]i2 to the kernel's version of libgcc if we wanted to.

Absolutely.


Nicolas

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-08 20:04                                 ` Nicolas Pitre
@ 2013-02-08 22:40                                   ` Woodhouse, David
  2013-02-08 22:47                                     ` Nicolas Pitre
  0 siblings, 1 reply; 63+ messages in thread
From: Woodhouse, David @ 2013-02-08 22:40 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Russell King - ARM Linux, Kim Phillips, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

[-- Attachment #1: Type: text/plain, Size: 869 bytes --]

On Fri, 2013-02-08 at 15:04 -0500, Nicolas Pitre wrote:
> On Fri, 8 Feb 2013, Woodhouse, David wrote:
> 
> > On Thu, 2013-02-07 at 18:13 +0000, Russell King - ARM Linux wrote:
> > > 
> > > However, the biggest reason not to use libgcc is that we want to control
> > > what gets used in the kernel - for example, no floating point, and no
> > > use of 64 x 64bit division.
> > 
> > Which is all very sensible. But there's no particular reason we couldn't
> > add a __bswap[sd]i2 to the kernel's version of libgcc if we wanted to.
> 
> Absolutely.

And then ARM can just set ARCH_USE_BUILTIN_BSWAP like other
architectures do, right?

-- 
                   Sent with MeeGo's ActiveSync support.

David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation




[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4370 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-08 22:40                                   ` Woodhouse, David
@ 2013-02-08 22:47                                     ` Nicolas Pitre
  2013-02-09  1:12                                       ` Kim Phillips
  0 siblings, 1 reply; 63+ messages in thread
From: Nicolas Pitre @ 2013-02-08 22:47 UTC (permalink / raw)
  To: Woodhouse, David
  Cc: Russell King - ARM Linux, Kim Phillips, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

On Fri, 8 Feb 2013, Woodhouse, David wrote:

> On Fri, 2013-02-08 at 15:04 -0500, Nicolas Pitre wrote:
> > On Fri, 8 Feb 2013, Woodhouse, David wrote:
> > 
> > > On Thu, 2013-02-07 at 18:13 +0000, Russell King - ARM Linux wrote:
> > > > 
> > > > However, the biggest reason not to use libgcc is that we want to control
> > > > what gets used in the kernel - for example, no floating point, and no
> > > > use of 64 x 64bit division.
> > > 
> > > Which is all very sensible. But there's no particular reason we couldn't
> > > add a __bswap[sd]i2 to the kernel's version of libgcc if we wanted to.
> > 
> > Absolutely.
> 
> And then ARM can just set ARCH_USE_BUILTIN_BSWAP like other
> architectures do, right?

If that turns out to be beneficial over what we have now, then yes.
I didn't read back the whole thread to form an opinion though.


Nicolas

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-08 22:47                                     ` Nicolas Pitre
@ 2013-02-09  1:12                                       ` Kim Phillips
  2013-02-09  3:16                                         ` Nicolas Pitre
  0 siblings, 1 reply; 63+ messages in thread
From: Kim Phillips @ 2013-02-09  1:12 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

On Fri, 8 Feb 2013 17:47:33 -0500
Nicolas Pitre <nico@fluxnic.net> wrote:

> On Fri, 8 Feb 2013, Woodhouse, David wrote:
> 
> > On Fri, 2013-02-08 at 15:04 -0500, Nicolas Pitre wrote:
> > > On Fri, 8 Feb 2013, Woodhouse, David wrote:
> > > 
> > > > On Thu, 2013-02-07 at 18:13 +0000, Russell King - ARM Linux wrote:
> > > > > 
> > > > > However, the biggest reason not to use libgcc is that we want to control
> > > > > what gets used in the kernel - for example, no floating point, and no
> > > > > use of 64 x 64bit division.
> > > > 
> > > > Which is all very sensible. But there's no particular reason we couldn't
> > > > add a __bswap[sd]i2 to the kernel's version of libgcc if we wanted to.
> > > 
> > > Absolutely.
> > 
> > And then ARM can just set ARCH_USE_BUILTIN_BSWAP like other
> > architectures do, right?
> 
> If that turns out to be beneficial over what we have now, then yes.
> I didn't read back the whole thread to form an opinion though.

The diff below implements __bswap[sd]i2 in arch/arm/lib, and
results in the following savings in vmlinux size:

column 1: name of defconfig
column 2: text+data+bss, linux-next-20130207 vanilla, gcc 4.6.3
column 3: text+data+bss, linux-next-20130207+below diff, gcc 4.6.3
column 4: col. 3 - col. 2 (ie., -ve numbers represent savings)

acs5k_tiny_defconfig: 	2886048 	2886016 	-32
ag5evm_defconfig: 	2767596 	2767564 	-32
am200epdkit_defconfig: 	3283557 	3283557 	0
ap4evb_defconfig: 	2059540 	2059524 	-16
armadillo800eva_defconfig: 	4521440 	4521200 	-240
assabet_defconfig: 	3562923 	3562603 	-320
at91_dt_defconfig: 	4195020 	4189164 	-5856
at91rm9200_defconfig: 	5804566 	5803742 	-824
at91sam9261_defconfig: 	4726625 	4725969 	-656
at91sam9263_defconfig: 	5115407 	5114459 	-948
at91sam9g20_defconfig: 	4175306 	4175138 	-168
at91x40_defconfig: 	1344264 	1344256 	-8
badge4_defconfig: 	3359661 	3359485 	-176
bcm_defconfig: 	3575901 	3575757 	-144
bonito_defconfig: 	2221729 	2221681 	-48
cerfcube_defconfig: 	3063610 	3063370 	-240
cns3420vb_defconfig: 	2595667 	2595651 	-16
collie_defconfig: 	3015148 	3014972 	-176
da8xx_omapl_defconfig: 	4168517 	4168165 	-352
davinci_all_defconfig: 	4060434 	4060110 	-324
dove_defconfig: 	4940257 	4939245 	-1012
ebsa110_defconfig: 	3236795 	3236587 	-208
ep93xx_defconfig: 	4169466 	4168986 	-480
eseries_pxa_defconfig: 	3090656 	3090448 	-208
exynos4_defconfig: 	3008671 	3008639 	-32
ezx_defconfig: 	10042035 	10041835 	-200
footbridge_defconfig: 	3662497 	3662369 	-128
h3600_defconfig: 	3471140 	3470820 	-320
h5000_defconfig: 	2976046 	2976030 	-16
h7201_defconfig: 	2158753 	2158705 	-48
h7202_defconfig: 	3395888 	3395520 	-368
hackkit_defconfig: 	3128690 	3128578 	-112
imote2_defconfig: 	9916412 	9916180 	-232
imx_v4_v5_defconfig: 	6866952 	6866272 	-680
imx_v6_v7_defconfig: 	7672373 	7667089 	-5284
integrator_defconfig: 	4224588 	4224284 	-304
iop13xx_defconfig: 	5229604 	5228468 	-1136
iop32x_defconfig: 	5616676 	5615492 	-1184
iop33x_defconfig: 	4611219 	4610835 	-384
ixp4xx_defconfig: 	4566094 	4564238 	-1856
jornada720_defconfig: 	3140569 	3140345 	-224
kota2_defconfig: 	3971280 	3971280 	0
kzm9d_defconfig: 	3042784 	3042784 	0
kzm9g_defconfig: 	4728939 	4728907 	-32
lart_defconfig: 	2941150 	2941054 	-96
lpc32xx_defconfig: 	5009215 	5004619 	-4596
mackerel_defconfig: 	4687162 	4686922 	-240
mini2440_defconfig: 	5095413 	5095037 	-376
msm_defconfig: 	2920312 	2920224 	-88
multi_v7_defconfig: 	3511744 	3511712 	-32
mvebu_defconfig: 	3871035 	3870931 	-104
mxs_defconfig: 	11091983 	11095679 	3696
netwinder_defconfig: 	3814476 	3814124 	-352
nhk8815_defconfig: 	4277138 	4276778 	-360
nuc910_defconfig: 	2527988 	2527956 	-32
nuc950_defconfig: 	2634352 	2634312 	-40
nuc960_defconfig: 	2546592 	2546552 	-40
omap2plus_defconfig: 	13801444 	13800144 	-1300
palmz72_defconfig: 	3345726 	3345758 	32
pcm027_defconfig: 	3634230 	3634158 	-72
pleb_defconfig: 	2847643 	2847515 	-128
prima2_defconfig: 	3080439 	3080347 	-92
pxa3xx_defconfig: 	4266677 	4266457 	-220
realview_defconfig: 	3891673 	3891465 	-208
realview-smp_defconfig: 	4157253 	4157045 	-208
rpc_defconfig: 	3974677 	3974661 	-16
s3c2410_defconfig: 	5217336 	5217000 	-336
s3c6400_defconfig: 	3209178 	3209178 	0
s5p64x0_defconfig: 	2843307 	2843275 	-32
s5pc100_defconfig: 	2696641 	2696657 	16
shannon_defconfig: 	3586908 	3586636 	-272
shark_defconfig: 	3565471 	3565423 	-48
simpad_defconfig: 	3681433 	3681161 	-272
socfpga_defconfig: 	4326480 	4326408 	-72
spear13xx_defconfig: 	4062410 	4062378 	-32
spear6xx_defconfig: 	3457392 	3457360 	-32
tct_hammer_defconfig: 	2406856 	2406824 	-32
trizeps4_defconfig: 	5869063 	5868483 	-580
u300_defconfig: 	2627488 	2627488 	0
versatile_defconfig: 	3789155 	3788787 	-368
vexpress_defconfig: 	5105737 	5105489 	-248
viper_defconfig: 	3309176 	3308920 	-256
xcep_defconfig: 	2820463 	2820367 	-96
zeus_defconfig: 	3775525 	3775367 	-158

gcc 4.7.3 runs haven't been as good across the board as gcc 4.6.3,
however:

acs5k_tiny_defconfig: 	2873380 	2873330 	-50
ag5evm_defconfig: 	2753016 	2753128 	112
am200epdkit_defconfig: 	3274245 	3274295 	50
ap4evb_defconfig: 	2051952 	2051944 	-8
armadillo800eva_defconfig: 	4531864 	4531718 	-146
assabet_defconfig: 	3540515 	3540337 	-178
at91_dt_defconfig: 	4211064 	4205394 	-5670
at91rm9200_defconfig: 	5785930 	5785368 	-562
at91sam9261_defconfig: 	4705357 	4705139 	-218
at91sam9263_defconfig: 	5098831 	5098501 	-330
at91sam9g20_defconfig: 	4163686 	4163576 	-110
at91sam9g45_defconfig: 	4675215 	4669553 	-5662
at91x40_defconfig: 	1342256 	1342192 	-64
badge4_defconfig: 	3338677 	3338615 	-62
bcm_defconfig: 	3568365 	3568259 	-106
bonito_defconfig: 	2225353 	2225355 	2
cerfcube_defconfig: 	3043650 	3043556 	-94
cns3420vb_defconfig: 	2589035 	2589049 	14
collie_defconfig: 	2994100 	2993972 	-128
da8xx_omapl_defconfig: 	4155189 	4155047 	-142
davinci_all_defconfig: 	4043450 	4043304 	-146
dove_defconfig: 	4902213 	4901627 	-586
ebsa110_defconfig: 	3216131 	3215953 	-178
ep93xx_defconfig: 	4154038 	4153796 	-242
eseries_pxa_defconfig: 	3076688 	3076610 	-78
exynos4_defconfig: 	3001451 	3001465 	14
ezx_defconfig: 	10022647 	10022553 	-94
footbridge_defconfig: 	3640857 	3640763 	-94
h3600_defconfig: 	3453932 	3453722 	-210
h5000_defconfig: 	2963034 	2963036 	2
h7201_defconfig: 	2148969 	2148987 	18
h7202_defconfig: 	3376072 	3375894 	-178
hackkit_defconfig: 	3107994 	3107916 	-78
imote2_defconfig: 	9899372 	9899278 	-94
imx_v4_v5_defconfig: 	6834892 	6834338 	-554
imx_v6_v7_defconfig: 	7637605 	7636935 	-670
integrator_defconfig: 	4204944 	4204674 	-270
iop13xx_defconfig: 	5193128 	5192790 	-338
iop32x_defconfig: 	5576144 	5575758 	-386
iop33x_defconfig: 	4584683 	4584585 	-98
ixp4xx_defconfig: 	4546578 	4545176 	-1402
jornada720_defconfig: 	3121377 	3121299 	-78
kota2_defconfig: 	3951220 	3951220 	0
kzm9d_defconfig: 	3055284 	3055252 	-32
kzm9g_defconfig: 	4742195 	4742161 	-34
lart_defconfig: 	2922550 	2926600 	4050
lpc32xx_defconfig: 	5036722 	5032072 	-4650
mackerel_defconfig: 	4660786 	4660612 	-174
mini2440_defconfig: 	5074529 	5074231 	-298
msm_defconfig: 	2911200 	2911240 	40
multi_v7_defconfig: 	3502888 	3502824 	-64
mvebu_defconfig: 	3853083 	3853165 	82
mxs_defconfig: 	11071139 	11070893 	-246
netwinder_defconfig: 	3791108 	3790884 	-224
netx_defconfig: 	3951133 	3950911 	-222
nhk8815_defconfig: 	4259910 	4259696 	-214
nuc910_defconfig: 	2523448 	2523408 	-40
nuc950_defconfig: 	2633644 	2633604 	-40
nuc960_defconfig: 	2541916 	2541876 	-40
omap2plus_defconfig: 	13770376 	13770026 	-350
palmz72_defconfig: 	3335850 	3335868 	18
pcm027_defconfig: 	3615962 	3615896 	-66
pleb_defconfig: 	2828707 	2828753 	46
prima2_defconfig: 	3021195 	3021213 	18
pxa3xx_defconfig: 	4250245 	4250135 	-110
realview_defconfig: 	3874745 	3874791 	46
realview-smp_defconfig: 	4144973 	4145051 	78
rpc_defconfig: 	3952749 	3952831 	82
s3c2410_defconfig: 	5191024 	5195006 	3982
s3c6400_defconfig: 	3208530 	3208548 	18
s5p64x0_defconfig: 	2837611 	2837625 	14
s5pc100_defconfig: 	2689877 	2689895 	18
shannon_defconfig: 	3567412 	3567234 	-178
shark_defconfig: 	3548103 	3548149 	46
simpad_defconfig: 	3657889 	3657763 	-126
socfpga_defconfig: 	4287524 	4287610 	86
spear13xx_defconfig: 	4040898 	4044932 	4034
spear6xx_defconfig: 	3446968 	3446998 	30
tct_hammer_defconfig: 	2394112 	2394126 	14
trizeps4_defconfig: 	5846883 	5846505 	-378
u300_defconfig: 	2622856 	2622934 	78
versatile_defconfig: 	3766635 	3766409 	-226
vexpress_defconfig: 	5076237 	5076279 	42
viper_defconfig: 	3293388 	3293294 	-94
xcep_defconfig: 	2804423 	2808549 	4126
zeus_defconfig: 	3760613 	3760539 	-74

Haven't looked at why.

In any case, some questions I have are:

(a) are the __bswap[sd]i2 implementations acceptable written in C,
as in the diff?  I don't speak ARM asm (yet at least).  The
generated code looks pretty optimal in both armv5 and 6+.

(b) would adding __bswap[sd]i2 to the kernel build need to be
disabled on armv6+? AFAICT, gcc doesn't emit calls - even for the
8-byte swap, even with -Os, on armv6+.

(c) testing allyesconfigs is proving to be a pain - lots of
breakeage - other than defconfig testing, is there any more I can do?

Thanks, and for your patience,

Kim

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 4265a26..5e8b735 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -58,6 +58,7 @@ config ARM
 	select CLONE_BACKWARDS
 	select OLD_SIGSUSPEND3
 	select OLD_SIGACTION
+	select ARCH_USE_BUILTIN_BSWAP
 	help
 	  The ARM series is a line of low-power-consumption RISC chip designs
 	  licensed by ARM Ltd and targeted at embedded applications and
diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index c9865f6..c225624 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -111,12 +111,12 @@ endif
 
 targets       := vmlinux vmlinux.lds \
 		 piggy.$(suffix_y) piggy.$(suffix_y).o \
-		 lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \
+		 lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \
 		 font.o font.c head.o misc.o $(OBJS)
 
 # Make sure files are removed during clean
 extra-y       += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern \
-		 lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs)
+		 lib1funcs.S ashldi3.S bswapsdi2.o $(libfdt) $(libfdt_hdrs)
 
 ifeq ($(CONFIG_FUNCTION_TRACER),y)
 ORIG_CFLAGS := $(KBUILD_CFLAGS)
@@ -158,6 +158,12 @@ ashldi3 = $(obj)/ashldi3.o
 $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S
 	$(call cmd,shipped)
 
+# For __bswapsi2, __bswapdi2
+bswapsdi2 = $(obj)/bswapsdi2.o
+
+$(obj)/bswapsdi2.o: $(srctree)/arch/$(SRCARCH)/lib/bswapsdi2.o
+	$(call cmd,shipped)
+
 # We need to prevent any GOTOFF relocs being used with references
 # to symbols in the .bss section since we cannot relocate them
 # independently from the rest at run time.  This can be achieved by
@@ -179,7 +185,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \
 fi
 
 $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \
-		$(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE
+		$(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \
+		$(bswapsdi2) FORCE
 	@$(check_for_multiple_zreladdr)
 	$(call if_changed,ld)
 	@$(check_for_bad_syms)
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index 60d3b73..ba578f7 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -35,6 +35,8 @@ extern void __ucmpdi2(void);
 extern void __udivsi3(void);
 extern void __umodsi3(void);
 extern void __do_div64(void);
+extern void __bswapsi2(void);
+extern void __bswapdi2(void);
 
 extern void __aeabi_idiv(void);
 extern void __aeabi_idivmod(void);
@@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2);
 EXPORT_SYMBOL(__udivsi3);
 EXPORT_SYMBOL(__umodsi3);
 EXPORT_SYMBOL(__do_div64);
+EXPORT_SYMBOL(__bswapsi2);
+EXPORT_SYMBOL(__bswapdi2);
 
 #ifdef CONFIG_AEABI
 EXPORT_SYMBOL(__aeabi_idiv);
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index af72969..5383df7 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -13,7 +13,7 @@ lib-y		:= backtrace.o changebit.o csumipv6.o csumpartial.o   \
 		   ashldi3.o ashrdi3.o lshrdi3.o muldi3.o             \
 		   ucmpdi2.o lib1funcs.o div64.o                      \
 		   io-readsb.o io-writesb.o io-readsl.o io-writesl.o  \
-		   call_with_stack.o
+		   call_with_stack.o bswapsdi2.o
 
 mmu-y	:= clear_user.o copy_page.o getuser.o putuser.o
 
diff --git a/arch/arm/lib/bswapsdi2.c b/arch/arm/lib/bswapsdi2.c
new file mode 100644
index 0000000..1923b2f
--- /dev/null
+++ b/arch/arm/lib/bswapsdi2.c
@@ -0,0 +1,19 @@
+unsigned int __bswapsi2(unsigned int x)
+{
+	return ((x & 0x000000ffUL) << 24) |
+	       ((x & 0x0000ff00UL) <<  8) |
+	       ((x & 0x00ff0000UL) >>  8) |
+	       ((x & 0xff000000UL) >> 24);
+}
+
+unsigned long long __bswapdi2(unsigned long long x)
+{
+	return ((x & 0x00000000000000ffULL) << 56) |
+	       ((x & 0x000000000000ff00ULL) << 40) |
+	       ((x & 0x0000000000ff0000ULL) << 24) |
+	       ((x & 0x00000000ff000000ULL) <<  8) |
+	       ((x & 0x000000ff00000000ULL) >>  8) |
+	       ((x & 0x0000ff0000000000ULL) >> 24) |
+	       ((x & 0x00ff000000000000ULL) >> 40) |
+	       ((x & 0xff00000000000000ULL) >> 56);
+}


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-09  1:12                                       ` Kim Phillips
@ 2013-02-09  3:16                                         ` Nicolas Pitre
  2013-02-20  2:31                                           ` Kim Phillips
  0 siblings, 1 reply; 63+ messages in thread
From: Nicolas Pitre @ 2013-02-09  3:16 UTC (permalink / raw)
  To: Kim Phillips
  Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

On Fri, 8 Feb 2013, Kim Phillips wrote:

> On Fri, 8 Feb 2013 17:47:33 -0500
> Nicolas Pitre <nico@fluxnic.net> wrote:
> 
> > On Fri, 8 Feb 2013, Woodhouse, David wrote:
> > 
> > > On Fri, 2013-02-08 at 15:04 -0500, Nicolas Pitre wrote:
> > > > On Fri, 8 Feb 2013, Woodhouse, David wrote:
> > > > 
> > > > > On Thu, 2013-02-07 at 18:13 +0000, Russell King - ARM Linux wrote:
> > > > > > 
> > > > > > However, the biggest reason not to use libgcc is that we want to control
> > > > > > what gets used in the kernel - for example, no floating point, and no
> > > > > > use of 64 x 64bit division.
> > > > > 
> > > > > Which is all very sensible. But there's no particular reason we couldn't
> > > > > add a __bswap[sd]i2 to the kernel's version of libgcc if we wanted to.
> > > > 
> > > > Absolutely.
> > > 
> > > And then ARM can just set ARCH_USE_BUILTIN_BSWAP like other
> > > architectures do, right?
> > 
> > If that turns out to be beneficial over what we have now, then yes.
> > I didn't read back the whole thread to form an opinion though.
> 
> The diff below implements __bswap[sd]i2 in arch/arm/lib, and
> results in the following savings in vmlinux size:
> 
> column 1: name of defconfig
> column 2: text+data+bss, linux-next-20130207 vanilla, gcc 4.6.3
> column 3: text+data+bss, linux-next-20130207+below diff, gcc 4.6.3
> column 4: col. 3 - col. 2 (ie., -ve numbers represent savings)
> 
[...]
> imx_v6_v7_defconfig: 	7672373 	7667089 	-5284
> lart_defconfig:	2941150		2941054         -96
> mxs_defconfig: 	11091983 	11095679 	3696

The savings are good, with some impressive cases.  However the 
mxs_defconfig is completely the opposite and by far.  Any idea?

> gcc 4.7.3 runs haven't been as good across the board as gcc 4.6.3,
> however:

Not only that, but in many cases the results are wildly different given 
the same config:

> imx_v6_v7_defconfig:	7637605		7636935		-670
> lart_defconfig: 	2922550 	2926600 	4050
> mxs_defconfig: 	11071139 	11070893 	-246

The mxs_defconfig became much better while lart_defconfig regressed a 
lot.

> Haven't looked at why.

Would be a good idea since this is rather weird and gcc could benefit 
from your findings.

> In any case, some questions I have are:
> 
> (a) are the __bswap[sd]i2 implementations acceptable written in C,
> as in the diff?  I don't speak ARM asm (yet at least).  The
> generated code looks pretty optimal in both armv5 and 6+.

It looks pretty nice indeed:

__bswapsi2:
        eor     r2, r0, r0, ror #16
        mov     r1, r2, lsr #8
        bic     r3, r1, #65280
        eor     r0, r3, r0, ror #8
        bx      lr

There is no way to do better than that.  But that's true only if -Os is 
_not_ used.  With -Os we get the following output:

__bswapsi2:
        mov     r3, r0, asl #24
        and     r2, r0, #65280
        orr     r3, r3, r0, lsr #24
        orr     r3, r3, r2, asl #8
        and     r0, r0, #16711680
        orr     r0, r3, r0, lsr #8
        bx      lr

I really don't get why gcc thinks the above is shorter.  I'm saving you 
from pasting the __bswapdi2 result which is also way way worse.
That was with Linaro gcc v4.6.2.

With Sourcery gcc v4.5.1 we get:

__bswapsi2:
        stmfd   sp!, {r3, lr}
        bl      __bswapsi2
        ldmfd   sp!, {r3, pc}

This is indeed shorter, but much less useful.  So you better enforce -O2 
for this file.  And the nice thing with C code is that it is fully 
optimized with the rev instruction when compiling for ARMv6+ if it is 
ever used in that case.

> (b) would adding __bswap[sd]i2 to the kernel build need to be
> disabled on armv6+? AFAICT, gcc doesn't emit calls - even for the
> 8-byte swap, even with -Os, on armv6+.

I wouldn't bother.  That would save only 6 instructions total.  And who 
knows if some gcc flavor start calling them for some reason eventually.

> (c) testing allyesconfigs is proving to be a pain - lots of
> breakeage - other than defconfig testing, is there any more I can do?

The defconfigs provide wildly different results and that is a good 
thing for further investigation.  You may concentrate on a small 
interesting sample such as those I kept above.

With allyesconfig the good configs would cancel out the bad ones making 
the bad ones invisible.


Nicolas

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-09  3:16                                         ` Nicolas Pitre
@ 2013-02-20  2:31                                           ` Kim Phillips
  2013-02-20  2:38                                             ` Stephen Boyd
                                                               ` (2 more replies)
  0 siblings, 3 replies; 63+ messages in thread
From: Kim Phillips @ 2013-02-20  2:31 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

On Fri, 8 Feb 2013 22:16:47 -0500
Nicolas Pitre <nico@fluxnic.net> wrote:

> On Fri, 8 Feb 2013, Kim Phillips wrote:
> 
> > On Fri, 8 Feb 2013 17:47:33 -0500
> > Nicolas Pitre <nico@fluxnic.net> wrote:
> > 
> > > On Fri, 8 Feb 2013, Woodhouse, David wrote:
> > > 
> > > > On Fri, 2013-02-08 at 15:04 -0500, Nicolas Pitre wrote:
> > > > > On Fri, 8 Feb 2013, Woodhouse, David wrote:
> > > > > 
> > > > > > On Thu, 2013-02-07 at 18:13 +0000, Russell King - ARM Linux wrote:
> > > > > > > 
> > > > > > > However, the biggest reason not to use libgcc is that we want to control
> > > > > > > what gets used in the kernel - for example, no floating point, and no
> > > > > > > use of 64 x 64bit division.
> > > > > > 
> > > > > > Which is all very sensible. But there's no particular reason we couldn't
> > > > > > add a __bswap[sd]i2 to the kernel's version of libgcc if we wanted to.
> > > > > 
> > > > > Absolutely.
> > > > 
> > > > And then ARM can just set ARCH_USE_BUILTIN_BSWAP like other
> > > > architectures do, right?
> > > 
> > > If that turns out to be beneficial over what we have now, then yes.
> > > I didn't read back the whole thread to form an opinion though.
> > 
> > The diff below implements __bswap[sd]i2 in arch/arm/lib, and
> > results in the following savings in vmlinux size:
> > 
> > column 1: name of defconfig
> > column 2: text+data+bss, linux-next-20130207 vanilla, gcc 4.6.3
> > column 3: text+data+bss, linux-next-20130207+below diff, gcc 4.6.3
> > column 4: col. 3 - col. 2 (ie., -ve numbers represent savings)
> > 
> [...]
> > imx_v6_v7_defconfig: 	7672373 	7667089 	-5284
> > lart_defconfig:	2941150		2941054         -96
> > mxs_defconfig: 	11091983 	11095679 	3696
> 
> The savings are good, with some impressive cases.  However the 
> mxs_defconfig is completely the opposite and by far.  Any idea?

Unfortunately, I don't seem to be able to reproduce this anymore.
Same linux-next, with three different compilers always produces
smaller binaries:

   text	   data	    bss	    dec	    hex	filename
5239363	 280576	5569648	11089587	 a936b3	linux-next-mxs-orig-gcc4.7/vmlinux
5239169	 280556	5569648	11089373	 a935dd	linux-next-mxs-bswap-gcc4.7/vmlinux

5262223	 280592	5569648	11112463	 a9900f	linux-next-mxs-orig-gcc4.6.3/vmlinux
5261909	 280584	5569648	11112141	 a98ecd	linux-next-mxs-bswap-gcc4.6.3/vmlinux

5241379	 280580	5569648	11091607	 a93e97	linux-next-mxs-orig-gcc4.6ubuntu/vmlinux
5241189	 280600	5569648	11091437	 a93ded	linux-next-mxs-bswap-gcc4.6ubuntu/vmlinux

So I've since made a more consistent cross-build environment, using
cross tools from Linaro [1,2] instead of via Ubuntu [3].

> > gcc 4.7.3 runs haven't been as good across the board as gcc 4.6.3,
> > however:
> 
> Not only that, but in many cases the results are wildly different given 
> the same config:
> 
> > imx_v6_v7_defconfig:	7637605		7636935		-670
> > lart_defconfig: 	2922550 	2926600 	4050
> > mxs_defconfig: 	11071139 	11070893 	-246
> 
> The mxs_defconfig became much better while lart_defconfig regressed a 
> lot.
> 
> > Haven't looked at why.
> 
> Would be a good idea since this is rather weird and gcc could benefit 
> from your findings.

The following is next-20130207 built with Linaro gcc 4.7.1 [1], and
before and after the diff at the bottom of this email (and with
normalized linux version string sizes):

lart_defconfig:		2752106 120864 56444 2929414 2cb306 vmlinux
lart_defconfig:		2756092 120864 56444 2933400 2cc298 vmlinux	#builtin-bswap

mxs_defconfig:		5229115 280572 5569648 11079335 a90ea7 vmlinux
mxs_defconfig:		5228969 280552 5569648 11079169 a90e01 vmlinux	#builtin-bswap

imx_v6_v7_defconfig:	6935025 356172 360648 7651845 74c205 vmlinux
imx_v6_v7_defconfig:	6934091 356180 360648 7650919 74be67 vmlinux	#builtin-bswap


so builtin-bswap improved mxs and imx_v6_v7 but in lart, it _added_
3986 bytes to .text -> not good.

Getting a closer look at lart, bloat-o-meter says the code actually
shrunk:

add/remove: 7/1 grow/shrink: 11/19 up/down: 298/-356 (-58)
function                                     old     new   delta
inet_abc_len                                   -      96     +96
__bswapdi2                                     -      52     +52
__bswapsi2                                     -      32     +32
icmp_unreach                                 472     492     +20
xfrm_selector_match                          988    1000     +12
fib_table_insert                            2176    2188     +12
__kstrtab___bswapsi2                           -      11     +11
__kstrtab___bswapdi2                           -      11     +11
__ksymtab___bswapsi2                           -       8      +8
__ksymtab___bswapdi2                           -       8      +8
vermagic                                      51      57      +6
linux_banner                                 230     236      +6
xfrm_replay_check_esn                        320     324      +4
xfrm_replay_check_bmp                        200     204      +4
xfrm_replay_check                            152     156      +4
static.tcp_parse_aligned_timestamp            80      84      +4
fib_table_delete                             708     712      +4
cookie_v4_check                             1316    1320      +4
tcp_tso_segment                              728     724      -4
tcp_options_write                            724     720      -4
ip_rt_ioctl                                 1152    1148      -4
fib_trie_seq_show                            724     720      -4
crc32_be                                     448     444      -4
xfrm_stateonly_find                          640     632      -8
tcp_finish_connect                           276     268      -8
static.tcp_v4_send_ack                       480     472      -8
__xfrm_state_lookup                          356     348      -8
__xfrm_state_bump_genids                     436     428      -8
__find_acq_core                             1256    1248      -8
cookie_v4_init_sequence                      272     260     -12
__xfrm_state_insert                          616     600     -16
sys_swapon                                  2500    2480     -20
xfrm_state_find                             2420    2396     -24
xfrm_hash_resize                            1620    1596     -24
fib_route_seq_show                           560     536     -24
fib_table_dump                               704     676     -28
devinet_ioctl                               1856    1796     -60
static.inet_abc_len                           80       -     -80

Comparing System.maps, .rodata starts at the same address:

c020a000 R __start_rodata
c020a000 R __start_rodata	#builtin-bswap

however, changes including the __bswap[sd]i2 implementation pushes
the .rodata section size just over the 4KiB alignment boundary
specified in arm/kernel/vmlinux.lds:

no builtin_bswap:

c028ffc4 R __stop___modver
c0290000 R __end_rodata
c0290000 R __start___ex_table

with builtin_bswap:

c0290068 R __stop___modver
c0291000 R __end_rodata
c0291000 R __start___ex_table

So, AFAICT, that's why we see a total increase in .text for lart,
and, looking at both numbers being a little less than 4KiB, I
suspect the same with whatever happened with mxs above.

FYI, here are the same platforms with Linaro gcc 4.7.3 [2]:

lart_defconfig:		2745218 120888 56444 2922550 2c9836 vmlinux
lart_defconfig:		2749300 120888 56444 2926632 2ca828 vmlinux	#builtin-bswap

mxs_defconfig: 		5220919 280572 5569648 11071139 a8eea3 vmlinux
mxs_defconfig:		5220725 280552 5569648 11070925 a8edcd vmlinux	#builtin-bswap

imx_v6_v7_defconfig:	6920769 356188 360648 7637605 748a65 vmlinux
imx_v6_v7_defconfig:	6920091 356196 360648 7636935 7487c7 vmlinux	#builtin-bswap

so lart is still mostly affected by the .rodata alignment:

add/remove: 7/1 grow/shrink: 11/16 up/down: 330/-308 (22)
function                                     old     new   delta
inet_abc_len                                   -      96     +96
__bswapdi2                                     -      52     +52
fib_table_insert                            2152    2196     +44
__bswapsi2                                     -      32     +32
icmp_unreach                                 472     492     +20
xfrm_selector_match                          988    1000     +12
__kstrtab___bswapsi2                           -      11     +11
__kstrtab___bswapdi2                           -      11     +11
__ksymtab___bswapsi2                           -       8      +8
__ksymtab___bswapdi2                           -       8      +8
vermagic                                      51      57      +6
linux_banner                                 232     238      +6
xfrm_replay_check_esn                        320     324      +4
xfrm_replay_check_bmp                        200     204      +4
xfrm_replay_check                            152     156      +4
static.tcp_parse_aligned_timestamp            80      84      +4
fib_table_delete                             708     712      +4
cookie_v4_check                             1316    1320      +4
tcp_options_write                            724     720      -4
ip_rt_ioctl                                 1152    1148      -4
fib_trie_seq_show                            724     720      -4
crc32_be                                     448     444      -4
xfrm_stateonly_find                          640     632      -8
tcp_finish_connect                           276     268      -8
static.tcp_v4_send_ack                       480     472      -8
__xfrm_state_lookup                          356     348      -8
__xfrm_state_bump_genids                     436     428      -8
__find_acq_core                             1252    1244      -8
cookie_v4_init_sequence                      272     260     -12
__xfrm_state_insert                          616     600     -16
xfrm_state_find                             2420    2396     -24
xfrm_hash_resize                            1620    1596     -24
fib_table_dump                               704     676     -28
devinet_ioctl                               1856    1796     -60
static.inet_abc_len                           80       -     -80

> > In any case, some questions I have are:
> > 
> > (a) are the __bswap[sd]i2 implementations acceptable written in C,
> > as in the diff?  I don't speak ARM asm (yet at least).  The
> > generated code looks pretty optimal in both armv5 and 6+.
> 
> It looks pretty nice indeed:
> 
> __bswapsi2:
>         eor     r2, r0, r0, ror #16
>         mov     r1, r2, lsr #8
>         bic     r3, r1, #65280
>         eor     r0, r3, r0, ror #8
>         bx      lr
> 
> There is no way to do better than that.  But that's true only if -Os is 
> _not_ used.  With -Os we get the following output:
> 
> __bswapsi2:
>         mov     r3, r0, asl #24
>         and     r2, r0, #65280
>         orr     r3, r3, r0, lsr #24
>         orr     r3, r3, r2, asl #8
>         and     r0, r0, #16711680
>         orr     r0, r3, r0, lsr #8
>         bx      lr
> 
> I really don't get why gcc thinks the above is shorter.  I'm saving you 
> from pasting the __bswapdi2 result which is also way way worse.
> That was with Linaro gcc v4.6.2.
> 
> With Sourcery gcc v4.5.1 we get:
> 
> __bswapsi2:
>         stmfd   sp!, {r3, lr}
>         bl      __bswapsi2
>         ldmfd   sp!, {r3, pc}
> 
> This is indeed shorter, but much less useful.  So you better enforce -O2 
> for this file.  And the nice thing with C code is that it is fully 
> optimized with the rev instruction when compiling for ARMv6+ if it is 
> ever used in that case.

ok, so to avoid recursion, I've enforced a -O2 on bswapsdi2.o.

> > (c) testing allyesconfigs is proving to be a pain - lots of
> > breakeage - other than defconfig testing, is there any more I can do?
> 
> The defconfigs provide wildly different results and that is a good 
> thing for further investigation.  You may concentrate on a small 
> interesting sample such as those I kept above.
> 
> With allyesconfig the good configs would cancel out the bad ones making 
> the bad ones invisible.

ok, thanks for your comments, Nicolas.

Here's the new diff:

changes from last diff:
- enforce -O2 for bswapsdi2.o
- fix building out-of-source tree

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 4265a26..5e8b735 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -58,6 +58,7 @@ config ARM
 	select CLONE_BACKWARDS
 	select OLD_SIGSUSPEND3
 	select OLD_SIGACTION
+	select ARCH_USE_BUILTIN_BSWAP
 	help
 	  The ARM series is a line of low-power-consumption RISC chip designs
 	  licensed by ARM Ltd and targeted at embedded applications and
diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index c9865f6..8ef97c4 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -111,12 +111,12 @@ endif
 
 targets       := vmlinux vmlinux.lds \
 		 piggy.$(suffix_y) piggy.$(suffix_y).o \
-		 lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \
+		 lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \
 		 font.o font.c head.o misc.o $(OBJS)
 
 # Make sure files are removed during clean
 extra-y       += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern \
-		 lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs)
+		 lib1funcs.S ashldi3.S bswapsdi2.o $(libfdt) $(libfdt_hdrs)
 
 ifeq ($(CONFIG_FUNCTION_TRACER),y)
 ORIG_CFLAGS := $(KBUILD_CFLAGS)
@@ -158,6 +158,12 @@ ashldi3 = $(obj)/ashldi3.o
 $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S
 	$(call cmd,shipped)
 
+# For __bswapsi2, __bswapdi2
+bswapsdi2 = $(obj)/bswapsdi2.o
+
+$(obj)/bswapsdi2.o: $(obj)/../../../../arch/$(SRCARCH)/lib/bswapsdi2.o
+	$(call cmd,shipped)
+
 # We need to prevent any GOTOFF relocs being used with references
 # to symbols in the .bss section since we cannot relocate them
 # independently from the rest at run time.  This can be achieved by
@@ -179,7 +185,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \
 fi
 
 $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \
-		$(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE
+		$(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \
+		$(bswapsdi2) FORCE
 	@$(check_for_multiple_zreladdr)
 	$(call if_changed,ld)
 	@$(check_for_bad_syms)
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index 60d3b73..ba578f7 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -35,6 +35,8 @@ extern void __ucmpdi2(void);
 extern void __udivsi3(void);
 extern void __umodsi3(void);
 extern void __do_div64(void);
+extern void __bswapsi2(void);
+extern void __bswapdi2(void);
 
 extern void __aeabi_idiv(void);
 extern void __aeabi_idivmod(void);
@@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2);
 EXPORT_SYMBOL(__udivsi3);
 EXPORT_SYMBOL(__umodsi3);
 EXPORT_SYMBOL(__do_div64);
+EXPORT_SYMBOL(__bswapsi2);
+EXPORT_SYMBOL(__bswapdi2);
 
 #ifdef CONFIG_AEABI
 EXPORT_SYMBOL(__aeabi_idiv);
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index af72969..dbee639 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -13,7 +13,7 @@ lib-y		:= backtrace.o changebit.o csumipv6.o csumpartial.o   \
 		   ashldi3.o ashrdi3.o lshrdi3.o muldi3.o             \
 		   ucmpdi2.o lib1funcs.o div64.o                      \
 		   io-readsb.o io-writesb.o io-readsl.o io-writesl.o  \
-		   call_with_stack.o
+		   call_with_stack.o bswapsdi2.o
 
 mmu-y	:= clear_user.o copy_page.o getuser.o putuser.o
 
@@ -45,3 +45,5 @@ lib-$(CONFIG_ARCH_SHARK)	+= io-shark.o
 
 $(obj)/csumpartialcopy.o:	$(obj)/csumpartialcopygeneric.S
 $(obj)/csumpartialcopyuser.o:	$(obj)/csumpartialcopygeneric.S
+
+CFLAGS_bswapsdi2.o = -O2
diff --git a/arch/arm/lib/bswapsdi2.c b/arch/arm/lib/bswapsdi2.c
new file mode 100644
index 0000000..1923b2f
--- /dev/null
+++ b/arch/arm/lib/bswapsdi2.c
@@ -0,0 +1,19 @@
+unsigned int __bswapsi2(unsigned int x)
+{
+	return ((x & 0x000000ffUL) << 24) |
+	       ((x & 0x0000ff00UL) <<  8) |
+	       ((x & 0x00ff0000UL) >>  8) |
+	       ((x & 0xff000000UL) >> 24);
+}
+
+unsigned long long __bswapdi2(unsigned long long x)
+{
+	return ((x & 0x00000000000000ffULL) << 56) |
+	       ((x & 0x000000000000ff00ULL) << 40) |
+	       ((x & 0x0000000000ff0000ULL) << 24) |
+	       ((x & 0x00000000ff000000ULL) <<  8) |
+	       ((x & 0x000000ff00000000ULL) >>  8) |
+	       ((x & 0x0000ff0000000000ULL) >> 24) |
+	       ((x & 0x00ff000000000000ULL) >> 40) |
+	       ((x & 0xff00000000000000ULL) >> 56);
+}

Thanks,

Kim

[1] arm-linux-gnueabihf-gcc (crosstool-NG linaro-1.13.1-2012.06-20120625 - Linaro GCC 2012.06) 4.7.1 20120531 (prerelease)

[2] arm-linux-gnueabihf-gcc (crosstool-NG linaro-1.13.1-4.7-2013.01-20130125 - Linaro GCC 2013.01) 4.7.3 20130102 (prerelease)

[3] gcc version 4.6.3 20120624 (prerelease) (Ubuntu/Linaro 4.6.3-8ubuntu1)
 (deb http://mirrors.us.kernel.org/ubuntu/ precise main universe)


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-20  2:31                                           ` Kim Phillips
@ 2013-02-20  2:38                                             ` Stephen Boyd
  2013-02-20  3:17                                             ` Nicolas Pitre
  2013-03-13 13:35                                             ` Woodhouse, David
  2 siblings, 0 replies; 63+ messages in thread
From: Stephen Boyd @ 2013-02-20  2:38 UTC (permalink / raw)
  To: Kim Phillips
  Cc: Nicolas Pitre, Russell King - ARM Linux, Woodhouse, David,
	Rusty Russell, linux-kernel, Daniel Santos, Borislav Petkov,
	David Rientjes, Andrew Morton, linux-arm-kernel

On 2/19/2013 6:31 PM, Kim Phillips wrote:
> diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
> index af72969..dbee639 100644
> --- a/arch/arm/lib/Makefile
> +++ b/arch/arm/lib/Makefile
> @@ -45,3 +45,5 @@ lib-$(CONFIG_ARCH_SHARK)	+= io-shark.o
>  
>  $(obj)/csumpartialcopy.o:	$(obj)/csumpartialcopygeneric.S
>  $(obj)/csumpartialcopyuser.o:	$(obj)/csumpartialcopygeneric.S
> +
> +CFLAGS_bswapsdi2.o = -O2

Please put a comment here so we don't have to dig through git history
and/or emails to understand why this has -O2 forced on for this file.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-20  2:31                                           ` Kim Phillips
  2013-02-20  2:38                                             ` Stephen Boyd
@ 2013-02-20  3:17                                             ` Nicolas Pitre
  2013-02-20 10:38                                               ` Woodhouse, David
  2013-03-13 13:35                                             ` Woodhouse, David
  2 siblings, 1 reply; 63+ messages in thread
From: Nicolas Pitre @ 2013-02-20  3:17 UTC (permalink / raw)
  To: Kim Phillips
  Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

On Tue, 19 Feb 2013, Kim Phillips wrote:

> On Fri, 8 Feb 2013 22:16:47 -0500
> Nicolas Pitre <nico@fluxnic.net> wrote:
> 
> > Not only that, but in many cases the results are wildly different given 
> > the same config:
> > 
> > > imx_v6_v7_defconfig:	7637605		7636935		-670
> > > lart_defconfig: 	2922550 	2926600 	4050
> > > mxs_defconfig: 	11071139 	11070893 	-246
> > 
> > The mxs_defconfig became much better while lart_defconfig regressed a 
> > lot.
> > 
> > > Haven't looked at why.
> > 
> > Would be a good idea since this is rather weird and gcc could benefit 
> > from your findings.
> 
> The following is next-20130207 built with Linaro gcc 4.7.1 [1], and
> before and after the diff at the bottom of this email (and with
> normalized linux version string sizes):
> 
> lart_defconfig:		2752106 120864 56444 2929414 2cb306 vmlinux
> lart_defconfig:		2756092 120864 56444 2933400 2cc298 vmlinux	#builtin-bswap
> 
> mxs_defconfig:		5229115 280572 5569648 11079335 a90ea7 vmlinux
> mxs_defconfig:		5228969 280552 5569648 11079169 a90e01 vmlinux	#builtin-bswap
> 
> imx_v6_v7_defconfig:	6935025 356172 360648 7651845 74c205 vmlinux
> imx_v6_v7_defconfig:	6934091 356180 360648 7650919 74be67 vmlinux	#builtin-bswap
> 
> 
> so builtin-bswap improved mxs and imx_v6_v7 but in lart, it _added_
> 3986 bytes to .text -> not good.
> 
> Getting a closer look at lart, bloat-o-meter says the code actually
> shrunk:
> 
> add/remove: 7/1 grow/shrink: 11/19 up/down: 298/-356 (-58)
> function                                     old     new   delta
> inet_abc_len                                   -      96     +96
> __bswapdi2                                     -      52     +52
> __bswapsi2                                     -      32     +32
> icmp_unreach                                 472     492     +20
> xfrm_selector_match                          988    1000     +12
> fib_table_insert                            2176    2188     +12
> __kstrtab___bswapsi2                           -      11     +11
> __kstrtab___bswapdi2                           -      11     +11
> __ksymtab___bswapsi2                           -       8      +8
> __ksymtab___bswapdi2                           -       8      +8
> vermagic                                      51      57      +6
> linux_banner                                 230     236      +6
> xfrm_replay_check_esn                        320     324      +4
> xfrm_replay_check_bmp                        200     204      +4
> xfrm_replay_check                            152     156      +4
> static.tcp_parse_aligned_timestamp            80      84      +4
> fib_table_delete                             708     712      +4
> cookie_v4_check                             1316    1320      +4
> tcp_tso_segment                              728     724      -4
> tcp_options_write                            724     720      -4
> ip_rt_ioctl                                 1152    1148      -4
> fib_trie_seq_show                            724     720      -4
> crc32_be                                     448     444      -4
> xfrm_stateonly_find                          640     632      -8
> tcp_finish_connect                           276     268      -8
> static.tcp_v4_send_ack                       480     472      -8
> __xfrm_state_lookup                          356     348      -8
> __xfrm_state_bump_genids                     436     428      -8
> __find_acq_core                             1256    1248      -8
> cookie_v4_init_sequence                      272     260     -12
> __xfrm_state_insert                          616     600     -16
> sys_swapon                                  2500    2480     -20
> xfrm_state_find                             2420    2396     -24
> xfrm_hash_resize                            1620    1596     -24
> fib_route_seq_show                           560     536     -24
> fib_table_dump                               704     676     -28
> devinet_ioctl                               1856    1796     -60
> static.inet_abc_len                           80       -     -80
> 
> Comparing System.maps, .rodata starts at the same address:
> 
> c020a000 R __start_rodata
> c020a000 R __start_rodata	#builtin-bswap
> 
> however, changes including the __bswap[sd]i2 implementation pushes
> the .rodata section size just over the 4KiB alignment boundary
> specified in arm/kernel/vmlinux.lds:
> 
> no builtin_bswap:
> 
> c028ffc4 R __stop___modver
> c0290000 R __end_rodata
> c0290000 R __start___ex_table
> 
> with builtin_bswap:
> 
> c0290068 R __stop___modver
> c0291000 R __end_rodata
> c0291000 R __start___ex_table
> 
> So, AFAICT, that's why we see a total increase in .text for lart,
> and, looking at both numbers being a little less than 4KiB, I
> suspect the same with whatever happened with mxs above.

OK.  At least we do have a plausible explanation now.  The actual code 
being smaller should compensate for section alignment loss.

> ok, so to avoid recursion, I've enforced a -O2 on bswapsdi2.o.

Not only recursion, but the horrible assembly output from -Os.

> Here's the new diff:
> 
> changes from last diff:
> - enforce -O2 for bswapsdi2.o
> - fix building out-of-source tree
> 
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 4265a26..5e8b735 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -58,6 +58,7 @@ config ARM
>  	select CLONE_BACKWARDS
>  	select OLD_SIGSUSPEND3
>  	select OLD_SIGACTION
> +	select ARCH_USE_BUILTIN_BSWAP
>  	help
>  	  The ARM series is a line of low-power-consumption RISC chip designs
>  	  licensed by ARM Ltd and targeted at embedded applications and
> diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
> index c9865f6..8ef97c4 100644
> --- a/arch/arm/boot/compressed/Makefile
> +++ b/arch/arm/boot/compressed/Makefile
> @@ -111,12 +111,12 @@ endif
>  
>  targets       := vmlinux vmlinux.lds \
>  		 piggy.$(suffix_y) piggy.$(suffix_y).o \
> -		 lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \
> +		 lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \
>  		 font.o font.c head.o misc.o $(OBJS)
>  
>  # Make sure files are removed during clean
>  extra-y       += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern \
> -		 lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs)
> +		 lib1funcs.S ashldi3.S bswapsdi2.o $(libfdt) $(libfdt_hdrs)
>  
>  ifeq ($(CONFIG_FUNCTION_TRACER),y)
>  ORIG_CFLAGS := $(KBUILD_CFLAGS)
> @@ -158,6 +158,12 @@ ashldi3 = $(obj)/ashldi3.o
>  $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S
>  	$(call cmd,shipped)
>  
> +# For __bswapsi2, __bswapdi2
> +bswapsdi2 = $(obj)/bswapsdi2.o
> +
> +$(obj)/bswapsdi2.o: $(obj)/../../../../arch/$(SRCARCH)/lib/bswapsdi2.o
> +	$(call cmd,shipped)
> +

I don't think you can get away with this.  The decompressor code is 
compiled with -fpic and the main kernel is not.  Most toolchains do mark 
object files with some flags to prevent the link of incompatible objects 
together (normally pic and non pic objects are not compatible even if in 
this very simple case that would not matter).  Maybe you are able to 
link zImage successfully simply because no references to __bswap* needed 
to be resolved and therefore the linker didn't need to search/include 
that object?

> diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
> index af72969..dbee639 100644
> --- a/arch/arm/lib/Makefile
> +++ b/arch/arm/lib/Makefile
> @@ -13,7 +13,7 @@ lib-y		:= backtrace.o changebit.o csumipv6.o csumpartial.o   \
>  		   ashldi3.o ashrdi3.o lshrdi3.o muldi3.o             \
>  		   ucmpdi2.o lib1funcs.o div64.o                      \
>  		   io-readsb.o io-writesb.o io-readsl.o io-writesl.o  \
> -		   call_with_stack.o
> +		   call_with_stack.o bswapsdi2.o
>  
>  mmu-y	:= clear_user.o copy_page.o getuser.o putuser.o
>  
> @@ -45,3 +45,5 @@ lib-$(CONFIG_ARCH_SHARK)	+= io-shark.o
>  
>  $(obj)/csumpartialcopy.o:	$(obj)/csumpartialcopygeneric.S
>  $(obj)/csumpartialcopyuser.o:	$(obj)/csumpartialcopygeneric.S
> +
> +CFLAGS_bswapsdi2.o = -O2

Please insert a small comment to explain why this is done.  Adding some 
more elaborate explanation in the commit log would be good too.


Nicolas

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-20  3:17                                             ` Nicolas Pitre
@ 2013-02-20 10:38                                               ` Woodhouse, David
  2013-02-20 13:36                                                 ` Nicolas Pitre
  0 siblings, 1 reply; 63+ messages in thread
From: Woodhouse, David @ 2013-02-20 10:38 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Kim Phillips, Russell King - ARM Linux, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

[-- Attachment #1: Type: text/plain, Size: 1102 bytes --]

On Tue, 2013-02-19 at 22:17 -0500, Nicolas Pitre wrote:
> 
> > +$(obj)/bswapsdi2.o: $(obj)/../../../../arch/$(SRCARCH)/lib/bswapsdi2.o
> > +     $(call cmd,shipped)
> > +
> 
> I don't think you can get away with this.  The decompressor code is 
> compiled with -fpic and the main kernel is not.  Most toolchains do mark 
> object files with some flags to prevent the link of incompatible objects 
> together (normally pic and non pic objects are not compatible even if in 
> this very simple case that would not matter).  Maybe you are able to 
> link zImage successfully simply because no references to __bswap* needed 
> to be resolved and therefore the linker didn't need to search/include 
> that object?

This, and the issue with -Os vs. -O2, make me inclined to suggest that
we should just provide an asm version of these functions instead of
using the compiler.

-- 
                   Sent with MeeGo's ActiveSync support.

David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6242 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-20 10:38                                               ` Woodhouse, David
@ 2013-02-20 13:36                                                 ` Nicolas Pitre
  2013-02-20 13:44                                                   ` Woodhouse, David
  0 siblings, 1 reply; 63+ messages in thread
From: Nicolas Pitre @ 2013-02-20 13:36 UTC (permalink / raw)
  To: Woodhouse, David
  Cc: Kim Phillips, Russell King - ARM Linux, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

On Wed, 20 Feb 2013, Woodhouse, David wrote:

> On Tue, 2013-02-19 at 22:17 -0500, Nicolas Pitre wrote:
> > 
> > > +$(obj)/bswapsdi2.o: $(obj)/../../../../arch/$(SRCARCH)/lib/bswapsdi2.o
> > > +     $(call cmd,shipped)
> > > +
> > 
> > I don't think you can get away with this.  The decompressor code is 
> > compiled with -fpic and the main kernel is not.  Most toolchains do mark 
> > object files with some flags to prevent the link of incompatible objects 
> > together (normally pic and non pic objects are not compatible even if in 
> > this very simple case that would not matter).  Maybe you are able to 
> > link zImage successfully simply because no references to __bswap* needed 
> > to be resolved and therefore the linker didn't need to search/include 
> > that object?
> 
> This, and the issue with -Os vs. -O2, make me inclined to suggest that
> we should just provide an asm version of these functions instead of
> using the compiler.

You'll have the same issue wrt the above whether or not the source file 
is C or assembly.


Nicolas

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-20 13:36                                                 ` Nicolas Pitre
@ 2013-02-20 13:44                                                   ` Woodhouse, David
  2013-02-20 14:06                                                     ` Nicolas Pitre
  0 siblings, 1 reply; 63+ messages in thread
From: Woodhouse, David @ 2013-02-20 13:44 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Kim Phillips, Russell King - ARM Linux, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

[-- Attachment #1: Type: text/plain, Size: 654 bytes --]

On Wed, 2013-02-20 at 08:36 -0500, Nicolas Pitre wrote:
> You'll have the same issue wrt the above whether or not the source
> file is C or assembly.

Hm, true. I was thinking of the code itself (which is
position-independent anyway), rather than the flags in the object file.

So just ship a .S file and for the decompressor (if we need it at all)
rebuild it just the same as we do the *other* libgcc code like ashldi3.S
etc.


-- 
                   Sent with MeeGo's ActiveSync support.

David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6242 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-20 13:44                                                   ` Woodhouse, David
@ 2013-02-20 14:06                                                     ` Nicolas Pitre
  2013-02-20 14:53                                                       ` Woodhouse, David
  0 siblings, 1 reply; 63+ messages in thread
From: Nicolas Pitre @ 2013-02-20 14:06 UTC (permalink / raw)
  To: Woodhouse, David
  Cc: Kim Phillips, Russell King - ARM Linux, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

On Wed, 20 Feb 2013, Woodhouse, David wrote:

> On Wed, 2013-02-20 at 08:36 -0500, Nicolas Pitre wrote:
> > You'll have the same issue wrt the above whether or not the source
> > file is C or assembly.
> 
> Hm, true. I was thinking of the code itself (which is
> position-independent anyway), rather than the flags in the object file.
> 
> So just ship a .S file and for the decompressor (if we need it at all)
> rebuild it just the same as we do the *other* libgcc code like ashldi3.S
> etc.

... in which case there is no harm shipping a .c file and trivially 
enforcing -O2, the rest being equal.


Nicolas

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-20 14:06                                                     ` Nicolas Pitre
@ 2013-02-20 14:53                                                       ` Woodhouse, David
  2013-02-20 15:43                                                         ` Nicolas Pitre
  0 siblings, 1 reply; 63+ messages in thread
From: Woodhouse, David @ 2013-02-20 14:53 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Kim Phillips, Russell King - ARM Linux, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

[-- Attachment #1: Type: text/plain, Size: 968 bytes --]

On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote:
> On Wed, 20 Feb 2013, Woodhouse, David wrote:
> 
> > On Wed, 2013-02-20 at 08:36 -0500, Nicolas Pitre wrote:
> > > You'll have the same issue wrt the above whether or not the source
> > > file is C or assembly.
> > 
> > Hm, true. I was thinking of the code itself (which is
> > position-independent anyway), rather than the flags in the object file.
> > 
> > So just ship a .S file and for the decompressor (if we need it at all)
> > rebuild it just the same as we do the *other* libgcc code like ashldi3.S
> > etc.
> 
> ... in which case there is no harm shipping a .c file and trivially 
> enforcing -O2, the rest being equal.

For today's compilers, unless the wind changes.

-- 
                   Sent with MeeGo's ActiveSync support.

David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6242 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-20 14:53                                                       ` Woodhouse, David
@ 2013-02-20 15:43                                                         ` Nicolas Pitre
  2013-02-21  3:49                                                           ` Kim Phillips
  0 siblings, 1 reply; 63+ messages in thread
From: Nicolas Pitre @ 2013-02-20 15:43 UTC (permalink / raw)
  To: Woodhouse, David
  Cc: Kim Phillips, Russell King - ARM Linux, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

On Wed, 20 Feb 2013, Woodhouse, David wrote:

> On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote:
> > On Wed, 20 Feb 2013, Woodhouse, David wrote:
> > 
> > > On Wed, 2013-02-20 at 08:36 -0500, Nicolas Pitre wrote:
> > > > You'll have the same issue wrt the above whether or not the source
> > > > file is C or assembly.
> > > 
> > > Hm, true. I was thinking of the code itself (which is
> > > position-independent anyway), rather than the flags in the object file.
> > > 
> > > So just ship a .S file and for the decompressor (if we need it at all)
> > > rebuild it just the same as we do the *other* libgcc code like ashldi3.S
> > > etc.
> > 
> > ... in which case there is no harm shipping a .c file and trivially 
> > enforcing -O2, the rest being equal.
> 
> For today's compilers, unless the wind changes.

We'll adapt if necessary.  Going with -O2 should remain pretty safe anyway.


Nicolas

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-20 15:43                                                         ` Nicolas Pitre
@ 2013-02-21  3:49                                                           ` Kim Phillips
  2013-02-21  4:29                                                             ` Nicolas Pitre
  0 siblings, 1 reply; 63+ messages in thread
From: Kim Phillips @ 2013-02-21  3:49 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

On Wed, 20 Feb 2013 10:43:18 -0500
Nicolas Pitre <nico@fluxnic.net> wrote:

> On Wed, 20 Feb 2013, Woodhouse, David wrote:
> > On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote:
> > > ... in which case there is no harm shipping a .c file and trivially 
> > > enforcing -O2, the rest being equal.
> > 
> > For today's compilers, unless the wind changes.
> 
> We'll adapt if necessary.  Going with -O2 should remain pretty safe anyway.

Alas, not so for gcc 4.4 - I had forgotten I had tested
Ubuntu/Linaro 4.4.7-1ubuntu2 here:

https://patchwork.kernel.org/patch/2101491/

add -O2 to that test script and gcc 4.4 *always* emits calls to
__bswap[sd]i2, even with -march=armv6k+.

I'll try working on an assembly version given it probably
makes more sense, future-gcc-immunity-wise.

Otherwise we're back to the old 'if GCC_VERSION >= 40500' in
arch/arm/include/asm/swab.h...

Kim


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-21  3:49                                                           ` Kim Phillips
@ 2013-02-21  4:29                                                             ` Nicolas Pitre
  2013-02-21  6:52                                                               ` Kim Phillips
  2013-02-21 16:37                                                               ` [RFC] " Woodhouse, David
  0 siblings, 2 replies; 63+ messages in thread
From: Nicolas Pitre @ 2013-02-21  4:29 UTC (permalink / raw)
  To: Kim Phillips
  Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

On Wed, 20 Feb 2013, Kim Phillips wrote:

> On Wed, 20 Feb 2013 10:43:18 -0500
> Nicolas Pitre <nico@fluxnic.net> wrote:
> 
> > On Wed, 20 Feb 2013, Woodhouse, David wrote:
> > > On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote:
> > > > ... in which case there is no harm shipping a .c file and trivially 
> > > > enforcing -O2, the rest being equal.
> > > 
> > > For today's compilers, unless the wind changes.
> > 
> > We'll adapt if necessary.  Going with -O2 should remain pretty safe anyway.
> 
> Alas, not so for gcc 4.4 - I had forgotten I had tested
> Ubuntu/Linaro 4.4.7-1ubuntu2 here:
> 
> https://patchwork.kernel.org/patch/2101491/
> 
> add -O2 to that test script and gcc 4.4 *always* emits calls to
> __bswap[sd]i2, even with -march=armv6k+.

Crap.  OK, assembly code is the way to go then.

> I'll try working on an assembly version given it probably
> makes more sense, future-gcc-immunity-wise.

Agreed.


Nicolas

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-21  4:29                                                             ` Nicolas Pitre
@ 2013-02-21  6:52                                                               ` Kim Phillips
  2013-02-21 16:40                                                                 ` Nicolas Pitre
  2013-02-21 16:37                                                               ` [RFC] " Woodhouse, David
  1 sibling, 1 reply; 63+ messages in thread
From: Kim Phillips @ 2013-02-21  6:52 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

On Wed, 20 Feb 2013 23:29:58 -0500
Nicolas Pitre <nico@fluxnic.net> wrote:

> On Wed, 20 Feb 2013, Kim Phillips wrote:
> 
> > On Wed, 20 Feb 2013 10:43:18 -0500
> > Nicolas Pitre <nico@fluxnic.net> wrote:
> > 
> > > On Wed, 20 Feb 2013, Woodhouse, David wrote:
> > > > On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote:
> > > > > ... in which case there is no harm shipping a .c file and trivially 
> > > > > enforcing -O2, the rest being equal.
> > > > 
> > > > For today's compilers, unless the wind changes.
> > > 
> > > We'll adapt if necessary.  Going with -O2 should remain pretty safe anyway.
> > 
> > Alas, not so for gcc 4.4 - I had forgotten I had tested
> > Ubuntu/Linaro 4.4.7-1ubuntu2 here:
> > 
> > https://patchwork.kernel.org/patch/2101491/
> > 
> > add -O2 to that test script and gcc 4.4 *always* emits calls to
> > __bswap[sd]i2, even with -march=armv6k+.

argh, sorry - that script was testing support for 
__builtin_bswap{16,32,64} directly, which isn't the same as testing
code generation of a byte swap pattern in C.

> Crap.  OK, assembly code is the way to go then.
> 
> > I'll try working on an assembly version given it probably
> > makes more sense, future-gcc-immunity-wise.
> 
> Agreed.

I'll still try the assembly approach - gcc 4.4's armv6 output looks
worse than both the pre-armv6 and post-armv6 __arch_swab32
implementations currently in use:

mov     ip, sp
push    {fp, ip, lr, pc}
sub     fp, ip, #4
and     r2, r0, #65280  ; 0xff00
lsl     ip, r0, #24
orr     r1, ip, r0, lsr #24
and     r0, r0, #16711680       ; 0xff0000
orr     r3, r1, r2, lsl #8
orr     r0, r3, r0, lsr #8
ldm     sp, {fp, sp, pc}

Kim


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-21  4:29                                                             ` Nicolas Pitre
  2013-02-21  6:52                                                               ` Kim Phillips
@ 2013-02-21 16:37                                                               ` Woodhouse, David
  2013-02-21 17:27                                                                 ` Nicolas Pitre
  1 sibling, 1 reply; 63+ messages in thread
From: Woodhouse, David @ 2013-02-21 16:37 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Kim Phillips, Russell King - ARM Linux, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

[-- Attachment #1: Type: text/plain, Size: 513 bytes --]

On Wed, 2013-02-20 at 23:29 -0500, Nicolas Pitre wrote:
> > > On Wed, 20 Feb 2013, Woodhouse, David wrote:
> > > > For today's compilers, unless the wind changes.
>  … 
> Crap.  OK, assembly code is the way to go then.

How quickly the wind changes, these days. I blame global warming. :)

-- 
                   Sent with MeeGo's ActiveSync support.

David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6242 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-21  6:52                                                               ` Kim Phillips
@ 2013-02-21 16:40                                                                 ` Nicolas Pitre
  2013-02-22  2:33                                                                   ` Kim Phillips
  0 siblings, 1 reply; 63+ messages in thread
From: Nicolas Pitre @ 2013-02-21 16:40 UTC (permalink / raw)
  To: Kim Phillips
  Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

On Thu, 21 Feb 2013, Kim Phillips wrote:

> On Wed, 20 Feb 2013 23:29:58 -0500
> Nicolas Pitre <nico@fluxnic.net> wrote:
> 
> > On Wed, 20 Feb 2013, Kim Phillips wrote:
> > 
> > > On Wed, 20 Feb 2013 10:43:18 -0500
> > > Nicolas Pitre <nico@fluxnic.net> wrote:
> > > 
> > > > On Wed, 20 Feb 2013, Woodhouse, David wrote:
> > > > > On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote:
> > > > > > ... in which case there is no harm shipping a .c file and trivially 
> > > > > > enforcing -O2, the rest being equal.
> > > > > 
> > > > > For today's compilers, unless the wind changes.
> > > > 
> > > > We'll adapt if necessary.  Going with -O2 should remain pretty safe anyway.
> > > 
> > > Alas, not so for gcc 4.4 - I had forgotten I had tested
> > > Ubuntu/Linaro 4.4.7-1ubuntu2 here:
> > > 
> > > https://patchwork.kernel.org/patch/2101491/
> > > 
> > > add -O2 to that test script and gcc 4.4 *always* emits calls to
> > > __bswap[sd]i2, even with -march=armv6k+.
> 
> argh, sorry - that script was testing support for 
> __builtin_bswap{16,32,64} directly, which isn't the same as testing
> code generation of a byte swap pattern in C.

Still, I'm not as confident as I was about this.

> I'll still try the assembly approach - gcc 4.4's armv6 output looks
> worse than both the pre-armv6 and post-armv6 __arch_swab32
> implementations currently in use:
> 
> mov     ip, sp
> push    {fp, ip, lr, pc}
> sub     fp, ip, #4

You should use -fomit-frame-pointer to compile this.  We don't need a 
frame pointer here, especially for a leaf function that the compiler 
decides to call on its own.

> and     r2, r0, #65280  ; 0xff00
> lsl     ip, r0, #24
> orr     r1, ip, r0, lsr #24
> and     r0, r0, #16711680       ; 0xff0000
> orr     r3, r1, r2, lsl #8
> orr     r0, r3, r0, lsr #8

Other than that, it is true that the above is slightly suboptimal.


Nicolas

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-21 16:37                                                               ` [RFC] " Woodhouse, David
@ 2013-02-21 17:27                                                                 ` Nicolas Pitre
  0 siblings, 0 replies; 63+ messages in thread
From: Nicolas Pitre @ 2013-02-21 17:27 UTC (permalink / raw)
  To: Woodhouse, David
  Cc: Kim Phillips, Russell King - ARM Linux, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

[-- Attachment #1: Type: TEXT/PLAIN, Size: 403 bytes --]

On Thu, 21 Feb 2013, Woodhouse, David wrote:

> On Wed, 2013-02-20 at 23:29 -0500, Nicolas Pitre wrote:
> > > > On Wed, 20 Feb 2013, Woodhouse, David wrote:
> > > > > For today's compilers, unless the wind changes.
> >  … 
> > Crap.  OK, assembly code is the way to go then.
> 
> How quickly the wind changes, these days. I blame global warming. :)

Only fools don't believe global warming.


Nicolas

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-21 16:40                                                                 ` Nicolas Pitre
@ 2013-02-22  2:33                                                                   ` Kim Phillips
  2013-02-22  3:40                                                                     ` Nicolas Pitre
  0 siblings, 1 reply; 63+ messages in thread
From: Kim Phillips @ 2013-02-22  2:33 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

On Thu, 21 Feb 2013 11:40:54 -0500
Nicolas Pitre <nico@fluxnic.net> wrote:

> On Thu, 21 Feb 2013, Kim Phillips wrote:
> 
> > On Wed, 20 Feb 2013 23:29:58 -0500
> > Nicolas Pitre <nico@fluxnic.net> wrote:
> > 
> > > On Wed, 20 Feb 2013, Kim Phillips wrote:
> > > 
> > > > On Wed, 20 Feb 2013 10:43:18 -0500
> > > > Nicolas Pitre <nico@fluxnic.net> wrote:
> > > > 
> > > > > On Wed, 20 Feb 2013, Woodhouse, David wrote:
> > > > > > On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote:
> > > > > > > ... in which case there is no harm shipping a .c file and trivially 
> > > > > > > enforcing -O2, the rest being equal.
> > > > > > 
> > > > > > For today's compilers, unless the wind changes.
> > > > > 
> > > > > We'll adapt if necessary.  Going with -O2 should remain pretty safe anyway.
> > > > 
> > > > Alas, not so for gcc 4.4 - I had forgotten I had tested
> > > > Ubuntu/Linaro 4.4.7-1ubuntu2 here:
> > > > 
> > > > https://patchwork.kernel.org/patch/2101491/
> > > > 
> > > > add -O2 to that test script and gcc 4.4 *always* emits calls to
> > > > __bswap[sd]i2, even with -march=armv6k+.
> > 
> > argh, sorry - that script was testing support for 
> > __builtin_bswap{16,32,64} directly, which isn't the same as testing
> > code generation of a byte swap pattern in C.
> 
> Still, I'm not as confident as I was about this.

which part exactly?  Having -O2 as "protection"?  Yes, me neither.

> > I'll still try the assembly approach - gcc 4.4's armv6 output looks
> > worse than both the pre-armv6 and post-armv6 __arch_swab32
> > implementations currently in use:
> > 
> > mov     ip, sp
> > push    {fp, ip, lr, pc}
> > sub     fp, ip, #4
> 
> You should use -fomit-frame-pointer to compile this.  We don't need a 
> frame pointer here, especially for a leaf function that the compiler 
> decides to call on its own.
> 
> > and     r2, r0, #65280  ; 0xff00
> > lsl     ip, r0, #24
> > orr     r1, ip, r0, lsr #24
> > and     r0, r0, #16711680       ; 0xff0000
> > orr     r3, r1, r2, lsl #8
> > orr     r0, r3, r0, lsr #8
> 
> Other than that, it is true that the above is slightly suboptimal.

Here's the asm version I'm working on now, based on compiler
output of the C version.  Haven't tested beyond defconfig builds,
which pass ok.

Is there anything I have to do for thumb mode?  If so, how to test?

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index dedf02b..e8a41d0 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -59,6 +59,7 @@ config ARM
 	select CLONE_BACKWARDS
 	select OLD_SIGSUSPEND3
 	select OLD_SIGACTION
+	select ARCH_USE_BUILTIN_BSWAP
 	help
 	  The ARM series is a line of low-power-consumption RISC chip designs
 	  licensed by ARM Ltd and targeted at embedded applications and
diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index 5cad8a6..a277e97 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -108,12 +108,12 @@ endif
 
 targets       := vmlinux vmlinux.lds \
 		 piggy.$(suffix_y) piggy.$(suffix_y).o \
-		 lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \
+		 lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \
 		 font.o font.c head.o misc.o $(OBJS)
 
 # Make sure files are removed during clean
 extra-y       += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern \
-		 lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs)
+		 lib1funcs.S ashldi3.S bswapsdi2.o $(libfdt) $(libfdt_hdrs)
 
 ifeq ($(CONFIG_FUNCTION_TRACER),y)
 ORIG_CFLAGS := $(KBUILD_CFLAGS)
@@ -155,6 +155,12 @@ ashldi3 = $(obj)/ashldi3.o
 $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S
 	$(call cmd,shipped)
 
+# For __bswapsi2, __bswapdi2
+bswapsdi2 = $(obj)/bswapsdi2.o
+
+$(obj)/bswapsdi2.S: $(srctree)/arch/$(SRCARCH)/lib/bswapsdi2.S
+	$(call cmd,shipped)
+
 # We need to prevent any GOTOFF relocs being used with references
 # to symbols in the .bss section since we cannot relocate them
 # independently from the rest at run time.  This can be achieved by
@@ -176,7 +182,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \
 fi
 
 $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \
-		$(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE
+		$(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \
+		$(bswapsdi2) FORCE
 	@$(check_for_multiple_zreladdr)
 	$(call if_changed,ld)
 	@$(check_for_bad_syms)
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index 60d3b73..ba578f7 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -35,6 +35,8 @@ extern void __ucmpdi2(void);
 extern void __udivsi3(void);
 extern void __umodsi3(void);
 extern void __do_div64(void);
+extern void __bswapsi2(void);
+extern void __bswapdi2(void);
 
 extern void __aeabi_idiv(void);
 extern void __aeabi_idivmod(void);
@@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2);
 EXPORT_SYMBOL(__udivsi3);
 EXPORT_SYMBOL(__umodsi3);
 EXPORT_SYMBOL(__do_div64);
+EXPORT_SYMBOL(__bswapsi2);
+EXPORT_SYMBOL(__bswapdi2);
 
 #ifdef CONFIG_AEABI
 EXPORT_SYMBOL(__aeabi_idiv);
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index af72969..5383df7 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -13,7 +13,7 @@ lib-y		:= backtrace.o changebit.o csumipv6.o csumpartial.o   \
 		   ashldi3.o ashrdi3.o lshrdi3.o muldi3.o             \
 		   ucmpdi2.o lib1funcs.o div64.o                      \
 		   io-readsb.o io-writesb.o io-readsl.o io-writesl.o  \
-		   call_with_stack.o
+		   call_with_stack.o bswapsdi2.o
 
 mmu-y	:= clear_user.o copy_page.o getuser.o putuser.o
 
diff --git a/arch/arm/lib/bswapsdi2.S b/arch/arm/lib/bswapsdi2.S
new file mode 100644
index 0000000..e9c8ca7
--- /dev/null
+++ b/arch/arm/lib/bswapsdi2.S
@@ -0,0 +1,36 @@
+#include <linux/linkage.h>
+
+#if __LINUX_ARM_ARCH__ >= 6
+ENTRY(__bswapsi2)
+	rev	r0, r0
+	bx	lr
+ENDPROC(__bswapsi2)
+
+ENTRY(__bswapdi2)
+	rev	r3, r0
+	rev	r0, r1
+	mov	r1, r3
+	bx	lr
+ENDPROC(__bswapdi2)
+#else
+ENTRY(__bswapsi2)
+	eor     r3, r0, r0, ror #16
+	lsr     r3, r3, #8
+	bic     r3, r3, #65280  @ 0xff00
+	eor     r0, r3, r0, ror #8
+	mov     pc, lr
+ENDPROC(__bswapsi2)
+
+ENTRY(__bswapdi2)
+	mov     ip, r1
+	eor     r3, ip, ip, ror #16
+	eor     r1, r0, r0, ror #16
+	lsr     r1, r1, #8
+	lsr     r3, r3, #8
+	bic     r3, r3, #65280  @ 0xff00
+	bic     r1, r1, #65280  @ 0xff00
+	eor     r1, r1, r0, ror #8
+	eor     r0, r3, ip, ror #8
+	mov     pc, lr
+ENDPROC(__bswapdi2)
+#endif

Thanks,

Kim


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-22  2:33                                                                   ` Kim Phillips
@ 2013-02-22  3:40                                                                     ` Nicolas Pitre
  2013-02-23  1:40                                                                       ` [PATCH v6] " Kim Phillips
  0 siblings, 1 reply; 63+ messages in thread
From: Nicolas Pitre @ 2013-02-22  3:40 UTC (permalink / raw)
  To: Kim Phillips
  Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

On Thu, 21 Feb 2013, Kim Phillips wrote:

> Here's the asm version I'm working on now, based on compiler
> output of the C version.  Haven't tested beyond defconfig builds,
> which pass ok.
> 
> Is there anything I have to do for thumb mode?  If so, how to test?

You just need to pick a config that uses some ARMv7 processor, and 
enable CONFIG_THUMB2_KERNEL.  I don't see any problem with your patch 
wrt Thumb2.

Still, I have minor comments below.

> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index dedf02b..e8a41d0 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -59,6 +59,7 @@ config ARM
>  	select CLONE_BACKWARDS
>  	select OLD_SIGSUSPEND3
>  	select OLD_SIGACTION
> +	select ARCH_USE_BUILTIN_BSWAP
>  	help
>  	  The ARM series is a line of low-power-consumption RISC chip designs
>  	  licensed by ARM Ltd and targeted at embedded applications and
> diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
> index 5cad8a6..a277e97 100644
> --- a/arch/arm/boot/compressed/Makefile
> +++ b/arch/arm/boot/compressed/Makefile
> @@ -108,12 +108,12 @@ endif
>  
>  targets       := vmlinux vmlinux.lds \
>  		 piggy.$(suffix_y) piggy.$(suffix_y).o \
> -		 lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \
> +		 lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \

Should be both bswapsdi2.o bswapsdi2.S

>  		 font.o font.c head.o misc.o $(OBJS)
>  
>  # Make sure files are removed during clean
>  extra-y       += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern \
> -		 lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs)
> +		 lib1funcs.S ashldi3.S bswapsdi2.o $(libfdt) $(libfdt_hdrs)

Should be bswapsdi2.S.

>  ifeq ($(CONFIG_FUNCTION_TRACER),y)
>  ORIG_CFLAGS := $(KBUILD_CFLAGS)
> @@ -155,6 +155,12 @@ ashldi3 = $(obj)/ashldi3.o
>  $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S
>  	$(call cmd,shipped)
>  
> +# For __bswapsi2, __bswapdi2
> +bswapsdi2 = $(obj)/bswapsdi2.o
> +
> +$(obj)/bswapsdi2.S: $(srctree)/arch/$(SRCARCH)/lib/bswapsdi2.S
> +	$(call cmd,shipped)
> +
>  # We need to prevent any GOTOFF relocs being used with references
>  # to symbols in the .bss section since we cannot relocate them
>  # independently from the rest at run time.  This can be achieved by
> @@ -176,7 +182,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \
>  fi
>  
>  $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \
> -		$(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE
> +		$(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \
> +		$(bswapsdi2) FORCE
>  	@$(check_for_multiple_zreladdr)
>  	$(call if_changed,ld)
>  	@$(check_for_bad_syms)
> diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
> index 60d3b73..ba578f7 100644
> --- a/arch/arm/kernel/armksyms.c
> +++ b/arch/arm/kernel/armksyms.c
> @@ -35,6 +35,8 @@ extern void __ucmpdi2(void);
>  extern void __udivsi3(void);
>  extern void __umodsi3(void);
>  extern void __do_div64(void);
> +extern void __bswapsi2(void);
> +extern void __bswapdi2(void);
>  
>  extern void __aeabi_idiv(void);
>  extern void __aeabi_idivmod(void);
> @@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2);
>  EXPORT_SYMBOL(__udivsi3);
>  EXPORT_SYMBOL(__umodsi3);
>  EXPORT_SYMBOL(__do_div64);
> +EXPORT_SYMBOL(__bswapsi2);
> +EXPORT_SYMBOL(__bswapdi2);
>  
>  #ifdef CONFIG_AEABI
>  EXPORT_SYMBOL(__aeabi_idiv);
> diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
> index af72969..5383df7 100644
> --- a/arch/arm/lib/Makefile
> +++ b/arch/arm/lib/Makefile
> @@ -13,7 +13,7 @@ lib-y		:= backtrace.o changebit.o csumipv6.o csumpartial.o   \
>  		   ashldi3.o ashrdi3.o lshrdi3.o muldi3.o             \
>  		   ucmpdi2.o lib1funcs.o div64.o                      \
>  		   io-readsb.o io-writesb.o io-readsl.o io-writesl.o  \
> -		   call_with_stack.o
> +		   call_with_stack.o bswapsdi2.o
>  
>  mmu-y	:= clear_user.o copy_page.o getuser.o putuser.o
>  
> diff --git a/arch/arm/lib/bswapsdi2.S b/arch/arm/lib/bswapsdi2.S
> new file mode 100644
> index 0000000..e9c8ca7
> --- /dev/null
> +++ b/arch/arm/lib/bswapsdi2.S
> @@ -0,0 +1,36 @@
> +#include <linux/linkage.h>
> +
> +#if __LINUX_ARM_ARCH__ >= 6
> +ENTRY(__bswapsi2)
> +	rev	r0, r0
> +	bx	lr
> +ENDPROC(__bswapsi2)
> +
> +ENTRY(__bswapdi2)
> +	rev	r3, r0
> +	rev	r0, r1
> +	mov	r1, r3
> +	bx	lr
> +ENDPROC(__bswapdi2)
> +#else
> +ENTRY(__bswapsi2)
> +	eor     r3, r0, r0, ror #16
> +	lsr     r3, r3, #8

Some older binutils used with pre ARMv6 platforms don't understand the 
latest unified syntax.  So in this case it is better to use:

	mov	r3, r3, lsr #8

> +	bic     r3, r3, #65280  @ 0xff00

Please use #0xff00 directly rather than keeping it as a comment.

> +	eor     r0, r3, r0, ror #8
> +	mov     pc, lr
> +ENDPROC(__bswapsi2)
> +
> +ENTRY(__bswapdi2)
> +	mov     ip, r1
> +	eor     r3, ip, ip, ror #16
> +	eor     r1, r0, r0, ror #16
> +	lsr     r1, r1, #8
> +	lsr     r3, r3, #8
> +	bic     r3, r3, #65280  @ 0xff00
> +	bic     r1, r1, #65280  @ 0xff00

Same comments for the 4 instructions above.

> +	eor     r1, r1, r0, ror #8
> +	eor     r0, r3, ip, ror #8
> +	mov     pc, lr
> +ENDPROC(__bswapdi2)
> +#endif
> 
> Thanks,
> 
> Kim
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH v6] arm: use built-in byte swap function
  2013-02-22  3:40                                                                     ` Nicolas Pitre
@ 2013-02-23  1:40                                                                       ` Kim Phillips
  2013-02-23  2:40                                                                         ` Nicolas Pitre
  2013-02-23 23:20                                                                         ` Woodhouse, David
  0 siblings, 2 replies; 63+ messages in thread
From: Kim Phillips @ 2013-02-23  1:40 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

On Thu, 21 Feb 2013 22:40:08 -0500
Nicolas Pitre <nico@fluxnic.net> wrote:

> On Thu, 21 Feb 2013, Kim Phillips wrote:
> 
> > Here's the asm version I'm working on now, based on compiler
> > output of the C version.  Haven't tested beyond defconfig builds,
> > which pass ok.
> > 
> > Is there anything I have to do for thumb mode?  If so, how to test?
> 
> You just need to pick a config that uses some ARMv7 processor, and 
> enable CONFIG_THUMB2_KERNEL.  I don't see any problem with your patch 
> wrt Thumb2.

ok, I've addressed your comments and tested both pre-armv6 and armv6
+ bswapsi2s on i.mx hardware with CONFIG_CC_OPTIMIZE_FOR_SIZE and
CONFIG_THUMB2_KERNEL set:

>From c22f4050174d8da71fdddba2cf67ae40c00ca5cc Mon Sep 17 00:00:00 2001
From: Kim Phillips <kim.phillips@freescale.com>
Date: Tue, 19 Feb 2013 17:16:11 -0600
Subject: [PATCH] arm: use built-in byte swap function

Enable the compiler intrinsic for byte swapping on arch ARM.  This
allows the compiler to detect and be able to optimize out byte
swappings, and has a tiny benefit on vmlinux size (Linaro gcc 4.7.3):

   text	   data	    bss	    dec	    hex	filename
2754100	 121144	  56520	2931764	 2cbc34	vmlinux-lart #orig
2754050	 121144	  56520	2931714	 2cbc02	vmlinux-lart #builtin-bswap
6282699	 307852	5578076	12168627 b9adb3	vmlinux-mxs #orig
6282241	 307832	5578076	12168149 b9abd5	vmlinux-mxs #builtin-bswap
7200193	 364180	 361748	7926121	 78f169	vmlinux-imx_v6_v7 #orig
7199515	 364188	 361748	7925451	 78eecb	vmlinux-imx_v6_v7 #builtin-bswap

Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
---
akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016

based on linux-next-20130221.

changes from last diff:
- addressed Nicolas' comments
- updated commit text figures and reformatted as a patch

changes from diff before that:
- 1st asm version

changes from diff before that:
- enforce -O2 for bswapsdi2.o
- fix building out-of-source tree

changes from diff before that:
- implement custom __bswap[sd]i2 in arch/arm/lib/bswapsdi2.c

v5: re-work based on new gcc version test data:
  - moved outside armv6 protection
  - check for gcc 4.6+ demoted to gcc 4.5+ with:
    !defined(CONFIG_CC_OPTIMIZE_FOR_SIZE)

v4:
- undo v2-2's addition of ARCH_DEFINES_BUILTIN_BSWAP per Boris
  and David - object is to find arches that define _HAVE_BSWAP
  and clean it up in the future: patch is much less intrusive. :)

v3:
- moved out of uapi swab.h into arch/arm/include/asm/swab.h
- moved ARCH_DEFINES_BUILTIN_BSWAP help text into commit message
- moved GCC_VERSION >= 40800 ifdef into GCC_VERSION >= 40600 block

v2:
- at91 and lpd270 builds fixed by limiting to ARMv6 and above
  (i.e., ARM cores that have support for the 'rev' instruction).
  Otherwise, the compiler emits calls to libgcc's __bswapsi2 on
  these ARMv4/v5 builds (and arch ARM doesn't link with libgcc).
  All ARM defconfigs now have the same build status as they did
  without this patch (some are broken on linux-next).

- move ARM check from generic compiler.h to arch ARM's swab.h.
  - pretty sure it should be limited to __KERNEL__ builds

- add new ARCH_DEFINES_BUILTIN_BSWAP (see Kconfig help).
  - if set, generic compiler header does not set HAVE_BUILTIN_BSWAPxx
  - not too sure about this having to be a new CONFIG_, but it's hard
    to find a place for it given linux/compiler.h doesn't include any
    arch-specific files.

- move new selects to end of CONFIG_ARM's Kconfig select list,
  as is done in David Woodhouse's original patchseries for ppc/x86.

 arch/arm/Kconfig                  |  1 +
 arch/arm/boot/compressed/Makefile | 15 +++++++++++----
 arch/arm/kernel/armksyms.c        |  4 ++++
 arch/arm/lib/Makefile             |  2 +-
 arch/arm/lib/bswapsdi2.S          | 36 ++++++++++++++++++++++++++++++++++++
 5 files changed, 53 insertions(+), 5 deletions(-)
 create mode 100644 arch/arm/lib/bswapsdi2.S

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index dedf02b..e8a41d0 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -59,6 +59,7 @@ config ARM
 	select CLONE_BACKWARDS
 	select OLD_SIGSUSPEND3
 	select OLD_SIGACTION
+	select ARCH_USE_BUILTIN_BSWAP
 	help
 	  The ARM series is a line of low-power-consumption RISC chip designs
 	  licensed by ARM Ltd and targeted at embedded applications and
diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index 5cad8a6..d9b5ee5 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -108,12 +108,12 @@ endif
 
 targets       := vmlinux vmlinux.lds \
 		 piggy.$(suffix_y) piggy.$(suffix_y).o \
-		 lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \
-		 font.o font.c head.o misc.o $(OBJS)
+		 lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \
+		 bswapsdi2.S font.o font.c head.o misc.o $(OBJS)
 
 # Make sure files are removed during clean
 extra-y       += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern \
-		 lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs)
+		 lib1funcs.S ashldi3.S bswapsdi2.S $(libfdt) $(libfdt_hdrs)
 
 ifeq ($(CONFIG_FUNCTION_TRACER),y)
 ORIG_CFLAGS := $(KBUILD_CFLAGS)
@@ -155,6 +155,12 @@ ashldi3 = $(obj)/ashldi3.o
 $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S
 	$(call cmd,shipped)
 
+# For __bswapsi2, __bswapdi2
+bswapsdi2 = $(obj)/bswapsdi2.o
+
+$(obj)/bswapsdi2.S: $(srctree)/arch/$(SRCARCH)/lib/bswapsdi2.S
+	$(call cmd,shipped)
+
 # We need to prevent any GOTOFF relocs being used with references
 # to symbols in the .bss section since we cannot relocate them
 # independently from the rest at run time.  This can be achieved by
@@ -176,7 +182,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \
 fi
 
 $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \
-		$(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE
+		$(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \
+		$(bswapsdi2) FORCE
 	@$(check_for_multiple_zreladdr)
 	$(call if_changed,ld)
 	@$(check_for_bad_syms)
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index 60d3b73..ba578f7 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -35,6 +35,8 @@ extern void __ucmpdi2(void);
 extern void __udivsi3(void);
 extern void __umodsi3(void);
 extern void __do_div64(void);
+extern void __bswapsi2(void);
+extern void __bswapdi2(void);
 
 extern void __aeabi_idiv(void);
 extern void __aeabi_idivmod(void);
@@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2);
 EXPORT_SYMBOL(__udivsi3);
 EXPORT_SYMBOL(__umodsi3);
 EXPORT_SYMBOL(__do_div64);
+EXPORT_SYMBOL(__bswapsi2);
+EXPORT_SYMBOL(__bswapdi2);
 
 #ifdef CONFIG_AEABI
 EXPORT_SYMBOL(__aeabi_idiv);
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index af72969..5383df7 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -13,7 +13,7 @@ lib-y		:= backtrace.o changebit.o csumipv6.o csumpartial.o   \
 		   ashldi3.o ashrdi3.o lshrdi3.o muldi3.o             \
 		   ucmpdi2.o lib1funcs.o div64.o                      \
 		   io-readsb.o io-writesb.o io-readsl.o io-writesl.o  \
-		   call_with_stack.o
+		   call_with_stack.o bswapsdi2.o
 
 mmu-y	:= clear_user.o copy_page.o getuser.o putuser.o
 
diff --git a/arch/arm/lib/bswapsdi2.S b/arch/arm/lib/bswapsdi2.S
new file mode 100644
index 0000000..2ba43a0
--- /dev/null
+++ b/arch/arm/lib/bswapsdi2.S
@@ -0,0 +1,36 @@
+#include <linux/linkage.h>
+
+#if __LINUX_ARM_ARCH__ >= 6
+ENTRY(__bswapsi2)
+	rev	r0, r0
+	bx	lr
+ENDPROC(__bswapsi2)
+
+ENTRY(__bswapdi2)
+	rev	r3, r0
+	rev	r0, r1
+	mov	r1, r3
+	bx	lr
+ENDPROC(__bswapdi2)
+#else
+ENTRY(__bswapsi2)
+	eor     r3, r0, r0, ror #16
+	mov     r3, r3, lsr #8
+	bic     r3, r3, #0xff00
+	eor     r0, r3, r0, ror #8
+	mov     pc, lr
+ENDPROC(__bswapsi2)
+
+ENTRY(__bswapdi2)
+	mov     ip, r1
+	eor     r3, ip, ip, ror #16
+	eor     r1, r0, r0, ror #16
+	mov     r1, r1, lsr #8
+	mov     r3, r3, lsr #8
+	bic     r3, r3, #0xff00
+	bic     r1, r1, #0xff00
+	eor     r1, r1, r0, ror #8
+	eor     r0, r3, ip, ror #8
+	mov     pc, lr
+ENDPROC(__bswapdi2)
+#endif
-- 
1.8.1.4



^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH v6] arm: use built-in byte swap function
  2013-02-23  1:40                                                                       ` [PATCH v6] " Kim Phillips
@ 2013-02-23  2:40                                                                         ` Nicolas Pitre
  2013-02-23 23:20                                                                         ` Woodhouse, David
  1 sibling, 0 replies; 63+ messages in thread
From: Nicolas Pitre @ 2013-02-23  2:40 UTC (permalink / raw)
  To: Kim Phillips
  Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

On Fri, 22 Feb 2013, Kim Phillips wrote:

> On Thu, 21 Feb 2013 22:40:08 -0500
> Nicolas Pitre <nico@fluxnic.net> wrote:
> 
> > On Thu, 21 Feb 2013, Kim Phillips wrote:
> > 
> > > Here's the asm version I'm working on now, based on compiler
> > > output of the C version.  Haven't tested beyond defconfig builds,
> > > which pass ok.
> > > 
> > > Is there anything I have to do for thumb mode?  If so, how to test?
> > 
> > You just need to pick a config that uses some ARMv7 processor, and 
> > enable CONFIG_THUMB2_KERNEL.  I don't see any problem with your patch 
> > wrt Thumb2.
> 
> ok, I've addressed your comments and tested both pre-armv6 and armv6
> + bswapsi2s on i.mx hardware with CONFIG_CC_OPTIMIZE_FOR_SIZE and
> CONFIG_THUMB2_KERNEL set:
> 
> >From c22f4050174d8da71fdddba2cf67ae40c00ca5cc Mon Sep 17 00:00:00 2001
> From: Kim Phillips <kim.phillips@freescale.com>
> Date: Tue, 19 Feb 2013 17:16:11 -0600
> Subject: [PATCH] arm: use built-in byte swap function
> 
> Enable the compiler intrinsic for byte swapping on arch ARM.  This
> allows the compiler to detect and be able to optimize out byte
> swappings, and has a tiny benefit on vmlinux size (Linaro gcc 4.7.3):
> 
>    text	   data	    bss	    dec	    hex	filename
> 2754100	 121144	  56520	2931764	 2cbc34	vmlinux-lart #orig
> 2754050	 121144	  56520	2931714	 2cbc02	vmlinux-lart #builtin-bswap
> 6282699	 307852	5578076	12168627 b9adb3	vmlinux-mxs #orig
> 6282241	 307832	5578076	12168149 b9abd5	vmlinux-mxs #builtin-bswap
> 7200193	 364180	 361748	7926121	 78f169	vmlinux-imx_v6_v7 #orig
> 7199515	 364188	 361748	7925451	 78eecb	vmlinux-imx_v6_v7 #builtin-bswap
> 
> Signed-off-by: Kim Phillips <kim.phillips@freescale.com>

Reviewed-by: Nicolas Pitre <nico@linaro.org>


> ---
> akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016
> 
> based on linux-next-20130221.
> 
> changes from last diff:
> - addressed Nicolas' comments
> - updated commit text figures and reformatted as a patch
> 
> changes from diff before that:
> - 1st asm version
> 
> changes from diff before that:
> - enforce -O2 for bswapsdi2.o
> - fix building out-of-source tree
> 
> changes from diff before that:
> - implement custom __bswap[sd]i2 in arch/arm/lib/bswapsdi2.c
> 
> v5: re-work based on new gcc version test data:
>   - moved outside armv6 protection
>   - check for gcc 4.6+ demoted to gcc 4.5+ with:
>     !defined(CONFIG_CC_OPTIMIZE_FOR_SIZE)
> 
> v4:
> - undo v2-2's addition of ARCH_DEFINES_BUILTIN_BSWAP per Boris
>   and David - object is to find arches that define _HAVE_BSWAP
>   and clean it up in the future: patch is much less intrusive. :)
> 
> v3:
> - moved out of uapi swab.h into arch/arm/include/asm/swab.h
> - moved ARCH_DEFINES_BUILTIN_BSWAP help text into commit message
> - moved GCC_VERSION >= 40800 ifdef into GCC_VERSION >= 40600 block
> 
> v2:
> - at91 and lpd270 builds fixed by limiting to ARMv6 and above
>   (i.e., ARM cores that have support for the 'rev' instruction).
>   Otherwise, the compiler emits calls to libgcc's __bswapsi2 on
>   these ARMv4/v5 builds (and arch ARM doesn't link with libgcc).
>   All ARM defconfigs now have the same build status as they did
>   without this patch (some are broken on linux-next).
> 
> - move ARM check from generic compiler.h to arch ARM's swab.h.
>   - pretty sure it should be limited to __KERNEL__ builds
> 
> - add new ARCH_DEFINES_BUILTIN_BSWAP (see Kconfig help).
>   - if set, generic compiler header does not set HAVE_BUILTIN_BSWAPxx
>   - not too sure about this having to be a new CONFIG_, but it's hard
>     to find a place for it given linux/compiler.h doesn't include any
>     arch-specific files.
> 
> - move new selects to end of CONFIG_ARM's Kconfig select list,
>   as is done in David Woodhouse's original patchseries for ppc/x86.
> 
>  arch/arm/Kconfig                  |  1 +
>  arch/arm/boot/compressed/Makefile | 15 +++++++++++----
>  arch/arm/kernel/armksyms.c        |  4 ++++
>  arch/arm/lib/Makefile             |  2 +-
>  arch/arm/lib/bswapsdi2.S          | 36 ++++++++++++++++++++++++++++++++++++
>  5 files changed, 53 insertions(+), 5 deletions(-)
>  create mode 100644 arch/arm/lib/bswapsdi2.S
> 
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index dedf02b..e8a41d0 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -59,6 +59,7 @@ config ARM
>  	select CLONE_BACKWARDS
>  	select OLD_SIGSUSPEND3
>  	select OLD_SIGACTION
> +	select ARCH_USE_BUILTIN_BSWAP
>  	help
>  	  The ARM series is a line of low-power-consumption RISC chip designs
>  	  licensed by ARM Ltd and targeted at embedded applications and
> diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
> index 5cad8a6..d9b5ee5 100644
> --- a/arch/arm/boot/compressed/Makefile
> +++ b/arch/arm/boot/compressed/Makefile
> @@ -108,12 +108,12 @@ endif
>  
>  targets       := vmlinux vmlinux.lds \
>  		 piggy.$(suffix_y) piggy.$(suffix_y).o \
> -		 lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \
> -		 font.o font.c head.o misc.o $(OBJS)
> +		 lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \
> +		 bswapsdi2.S font.o font.c head.o misc.o $(OBJS)
>  
>  # Make sure files are removed during clean
>  extra-y       += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern \
> -		 lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs)
> +		 lib1funcs.S ashldi3.S bswapsdi2.S $(libfdt) $(libfdt_hdrs)
>  
>  ifeq ($(CONFIG_FUNCTION_TRACER),y)
>  ORIG_CFLAGS := $(KBUILD_CFLAGS)
> @@ -155,6 +155,12 @@ ashldi3 = $(obj)/ashldi3.o
>  $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S
>  	$(call cmd,shipped)
>  
> +# For __bswapsi2, __bswapdi2
> +bswapsdi2 = $(obj)/bswapsdi2.o
> +
> +$(obj)/bswapsdi2.S: $(srctree)/arch/$(SRCARCH)/lib/bswapsdi2.S
> +	$(call cmd,shipped)
> +
>  # We need to prevent any GOTOFF relocs being used with references
>  # to symbols in the .bss section since we cannot relocate them
>  # independently from the rest at run time.  This can be achieved by
> @@ -176,7 +182,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \
>  fi
>  
>  $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \
> -		$(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE
> +		$(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \
> +		$(bswapsdi2) FORCE
>  	@$(check_for_multiple_zreladdr)
>  	$(call if_changed,ld)
>  	@$(check_for_bad_syms)
> diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
> index 60d3b73..ba578f7 100644
> --- a/arch/arm/kernel/armksyms.c
> +++ b/arch/arm/kernel/armksyms.c
> @@ -35,6 +35,8 @@ extern void __ucmpdi2(void);
>  extern void __udivsi3(void);
>  extern void __umodsi3(void);
>  extern void __do_div64(void);
> +extern void __bswapsi2(void);
> +extern void __bswapdi2(void);
>  
>  extern void __aeabi_idiv(void);
>  extern void __aeabi_idivmod(void);
> @@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2);
>  EXPORT_SYMBOL(__udivsi3);
>  EXPORT_SYMBOL(__umodsi3);
>  EXPORT_SYMBOL(__do_div64);
> +EXPORT_SYMBOL(__bswapsi2);
> +EXPORT_SYMBOL(__bswapdi2);
>  
>  #ifdef CONFIG_AEABI
>  EXPORT_SYMBOL(__aeabi_idiv);
> diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
> index af72969..5383df7 100644
> --- a/arch/arm/lib/Makefile
> +++ b/arch/arm/lib/Makefile
> @@ -13,7 +13,7 @@ lib-y		:= backtrace.o changebit.o csumipv6.o csumpartial.o   \
>  		   ashldi3.o ashrdi3.o lshrdi3.o muldi3.o             \
>  		   ucmpdi2.o lib1funcs.o div64.o                      \
>  		   io-readsb.o io-writesb.o io-readsl.o io-writesl.o  \
> -		   call_with_stack.o
> +		   call_with_stack.o bswapsdi2.o
>  
>  mmu-y	:= clear_user.o copy_page.o getuser.o putuser.o
>  
> diff --git a/arch/arm/lib/bswapsdi2.S b/arch/arm/lib/bswapsdi2.S
> new file mode 100644
> index 0000000..2ba43a0
> --- /dev/null
> +++ b/arch/arm/lib/bswapsdi2.S
> @@ -0,0 +1,36 @@
> +#include <linux/linkage.h>
> +
> +#if __LINUX_ARM_ARCH__ >= 6
> +ENTRY(__bswapsi2)
> +	rev	r0, r0
> +	bx	lr
> +ENDPROC(__bswapsi2)
> +
> +ENTRY(__bswapdi2)
> +	rev	r3, r0
> +	rev	r0, r1
> +	mov	r1, r3
> +	bx	lr
> +ENDPROC(__bswapdi2)
> +#else
> +ENTRY(__bswapsi2)
> +	eor     r3, r0, r0, ror #16
> +	mov     r3, r3, lsr #8
> +	bic     r3, r3, #0xff00
> +	eor     r0, r3, r0, ror #8
> +	mov     pc, lr
> +ENDPROC(__bswapsi2)
> +
> +ENTRY(__bswapdi2)
> +	mov     ip, r1
> +	eor     r3, ip, ip, ror #16
> +	eor     r1, r0, r0, ror #16
> +	mov     r1, r1, lsr #8
> +	mov     r3, r3, lsr #8
> +	bic     r3, r3, #0xff00
> +	bic     r1, r1, #0xff00
> +	eor     r1, r1, r0, ror #8
> +	eor     r0, r3, ip, ror #8
> +	mov     pc, lr
> +ENDPROC(__bswapdi2)
> +#endif
> -- 
> 1.8.1.4
> 
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v6] arm: use built-in byte swap function
  2013-02-23  1:40                                                                       ` [PATCH v6] " Kim Phillips
  2013-02-23  2:40                                                                         ` Nicolas Pitre
@ 2013-02-23 23:20                                                                         ` Woodhouse, David
  2013-05-23 16:46                                                                           ` [PATCH v7] " Kim Phillips
  1 sibling, 1 reply; 63+ messages in thread
From: Woodhouse, David @ 2013-02-23 23:20 UTC (permalink / raw)
  To: Kim Phillips
  Cc: Nicolas Pitre, Russell King - ARM Linux, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

[-- Attachment #1: Type: text/plain, Size: 1095 bytes --]

On Fri, 2013-02-22 at 19:40 -0600, Kim Phillips wrote:
> Enable the compiler intrinsic for byte swapping on arch ARM.  This
> allows the compiler to detect and be able to optimize out byte
> swappings, and has a tiny benefit on vmlinux size (Linaro gcc 4.7.3):
> 
>    text    data     bss     dec     hex filename
> 2754100  121144   56520 2931764  2cbc34 vmlinux-lart #orig
> 2754050  121144   56520 2931714  2cbc02 vmlinux-lart #builtin-bswap
> 6282699  307852 5578076 12168627 b9adb3 vmlinux-mxs #orig
> 6282241  307832 5578076 12168149 b9abd5 vmlinux-mxs #builtin-bswap
> 7200193  364180  361748 7926121  78f169 vmlinux-imx_v6_v7 #orig
> 7199515  364188  361748 7925451  78eecb vmlinux-imx_v6_v7 #builtin-bswap
> 
> Signed-off-by: Kim Phillips <kim.phillips@freescale.com>

Looks good, thanks.

Acked-by: David Woodhouse <David.Woodhouse@intel.com>


-- 
                   Sent with MeeGo's ActiveSync support.

David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation




[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4370 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
  2013-02-20  2:31                                           ` Kim Phillips
  2013-02-20  2:38                                             ` Stephen Boyd
  2013-02-20  3:17                                             ` Nicolas Pitre
@ 2013-03-13 13:35                                             ` Woodhouse, David
  2 siblings, 0 replies; 63+ messages in thread
From: Woodhouse, David @ 2013-03-13 13:35 UTC (permalink / raw)
  To: Kim Phillips
  Cc: Nicolas Pitre, Russell King - ARM Linux, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

[-- Attachment #1: Type: text/plain, Size: 1838 bytes --]

On Tue, 2013-02-19 at 20:31 -0600, Kim Phillips wrote:
> 
> > > imx_v6_v7_defconfig:        7672373         7667089         -5284
> > > lart_defconfig:     2941150         2941054         -96
> > > mxs_defconfig:      11091983        11095679        3696
> > 
> > The savings are good, with some impressive cases.  However the 
> > mxs_defconfig is completely the opposite and by far.  Any idea?
> 
> Unfortunately, I don't seem to be able to reproduce this anymore.
> Same linux-next, with three different compilers always produces
> smaller binaries:
> 
>    text    data     bss     dec     hex filename
> 5239363  280576 5569648 11089587         a936b3 linux-next-mxs-orig-gcc4.7/vmlinux
> 5239169  280556 5569648 11089373         a935dd linux-next-mxs-bswap-gcc4.7/vmlinux
> 
> 5262223  280592 5569648 11112463         a9900f linux-next-mxs-orig-gcc4.6.3/vmlinux
> 5261909  280584 5569648 11112141         a98ecd linux-next-mxs-bswap-gcc4.6.3/vmlinux
> 
> 5241379  280580 5569648 11091607         a93e97 linux-next-mxs-orig-gcc4.6ubuntu/vmlinux
> 5241189  280600 5569648 11091437         a93ded linux-next-mxs-bswap-gcc4.6ubuntu/vmlinux
> 
> So I've since made a more consistent cross-build environment, using
> cross tools from Linaro [1,2] instead of via Ubuntu [3].

Andrew Pinski has done some work on GCC to support further
optimisations: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55177

If you feel like building with the branch at
http://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/pinskia/bytewiseunop and seeing how that affects the results, that could be interesting.

-- 
                   Sent with MeeGo's ActiveSync support.

David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6242 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH v7] arm: use built-in byte swap function
  2013-02-23 23:20                                                                         ` Woodhouse, David
@ 2013-05-23 16:46                                                                           ` Kim Phillips
  2013-05-23 20:09                                                                             ` Nicolas Pitre
                                                                                               ` (2 more replies)
  0 siblings, 3 replies; 63+ messages in thread
From: Kim Phillips @ 2013-05-23 16:46 UTC (permalink / raw)
  To: Woodhouse, David, Russell King - ARM Linux
  Cc: Nicolas Pitre, Borislav Petkov, Andrew Morton, Daniel Santos,
	David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel,
	Rob Herring

Enable the compiler intrinsic for byte swapping on arch ARM.  This
allows the compiler to detect and be able to optimize out byte
swappings, and has a very modest benefit on vmlinux size (Linaro gcc
4.8):

   text	   data	    bss	    dec	    hex	filename
2840310	 123932	  61960	3026202	 2e2d1a	vmlinux-lart #orig
2840152	 123932	  61960	3026044	 2e2c7c	vmlinux-lart #builtin-bswap

6473120	 314840	5616016	12403976 bd4508	vmlinux-mxs #orig
6472586	 314848	5616016	12403450 bd42fa	vmlinux-mxs #builtin-bswap

7419872	 318372	 379556	8117800	 7bde28	vmlinux-imx_v6_v7 #orig
7419170	 318364	 379556	8117090	 7bdb62	vmlinux-imx_v6_v7 #builtin-bswap

Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
---
resending as v6 appears to have fallen though the cracks.  Russell?

v7: rebased onto next-20130521, re-ran above vmlinux sizes with
    Linaro gcc 4.8, added Nicolas' Reviewed-by, and David's Acked-by.
v6 and prior version information:
    https://lkml.org/lkml/2013/2/22/475

 arch/arm/Kconfig                  |  1 +
 arch/arm/boot/compressed/Makefile | 15 +++++++++++----
 arch/arm/kernel/armksyms.c        |  4 ++++
 arch/arm/lib/Makefile             |  2 +-
 arch/arm/lib/bswapsdi2.S          | 36 ++++++++++++++++++++++++++++++++++++
 5 files changed, 53 insertions(+), 5 deletions(-)
 create mode 100644 arch/arm/lib/bswapsdi2.S

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index a7fc5ea..c2fe04d 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -63,6 +63,7 @@ config ARM
 	select OLD_SIGSUSPEND3
 	select OLD_SIGACTION
 	select HAVE_CONTEXT_TRACKING
+	select ARCH_USE_BUILTIN_BSWAP
 	help
 	  The ARM series is a line of low-power-consumption RISC chip designs
 	  licensed by ARM Ltd and targeted at embedded applications and
diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index 198a4ad..bd8a176 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -112,12 +112,12 @@ endif
 
 targets       := vmlinux vmlinux.lds \
 		 piggy.$(suffix_y) piggy.$(suffix_y).o \
-		 lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \
-		 font.o font.c head.o misc.o $(OBJS)
+		 lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \
+		 bswapsdi2.S font.o font.c head.o misc.o $(OBJS)
 
 # Make sure files are removed during clean
 extra-y       += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern piggy.lz4 \
-		 lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs)
+		 lib1funcs.S ashldi3.S bswapsdi2.S $(libfdt) $(libfdt_hdrs)
 
 ifeq ($(CONFIG_FUNCTION_TRACER),y)
 ORIG_CFLAGS := $(KBUILD_CFLAGS)
@@ -159,6 +159,12 @@ ashldi3 = $(obj)/ashldi3.o
 $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S
 	$(call cmd,shipped)
 
+# For __bswapsi2, __bswapdi2
+bswapsdi2 = $(obj)/bswapsdi2.o
+
+$(obj)/bswapsdi2.S: $(srctree)/arch/$(SRCARCH)/lib/bswapsdi2.S
+	$(call cmd,shipped)
+
 # We need to prevent any GOTOFF relocs being used with references
 # to symbols in the .bss section since we cannot relocate them
 # independently from the rest at run time.  This can be achieved by
@@ -180,7 +186,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \
 fi
 
 $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \
-		$(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE
+		$(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \
+		$(bswapsdi2) FORCE
 	@$(check_for_multiple_zreladdr)
 	$(call if_changed,ld)
 	@$(check_for_bad_syms)
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index 60d3b73..ba578f7 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -35,6 +35,8 @@ extern void __ucmpdi2(void);
 extern void __udivsi3(void);
 extern void __umodsi3(void);
 extern void __do_div64(void);
+extern void __bswapsi2(void);
+extern void __bswapdi2(void);
 
 extern void __aeabi_idiv(void);
 extern void __aeabi_idivmod(void);
@@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2);
 EXPORT_SYMBOL(__udivsi3);
 EXPORT_SYMBOL(__umodsi3);
 EXPORT_SYMBOL(__do_div64);
+EXPORT_SYMBOL(__bswapsi2);
+EXPORT_SYMBOL(__bswapdi2);
 
 #ifdef CONFIG_AEABI
 EXPORT_SYMBOL(__aeabi_idiv);
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index af72969..5383df7 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -13,7 +13,7 @@ lib-y		:= backtrace.o changebit.o csumipv6.o csumpartial.o   \
 		   ashldi3.o ashrdi3.o lshrdi3.o muldi3.o             \
 		   ucmpdi2.o lib1funcs.o div64.o                      \
 		   io-readsb.o io-writesb.o io-readsl.o io-writesl.o  \
-		   call_with_stack.o
+		   call_with_stack.o bswapsdi2.o
 
 mmu-y	:= clear_user.o copy_page.o getuser.o putuser.o
 
diff --git a/arch/arm/lib/bswapsdi2.S b/arch/arm/lib/bswapsdi2.S
new file mode 100644
index 0000000..2ba43a0
--- /dev/null
+++ b/arch/arm/lib/bswapsdi2.S
@@ -0,0 +1,36 @@
+#include <linux/linkage.h>
+
+#if __LINUX_ARM_ARCH__ >= 6
+ENTRY(__bswapsi2)
+	rev	r0, r0
+	bx	lr
+ENDPROC(__bswapsi2)
+
+ENTRY(__bswapdi2)
+	rev	r3, r0
+	rev	r0, r1
+	mov	r1, r3
+	bx	lr
+ENDPROC(__bswapdi2)
+#else
+ENTRY(__bswapsi2)
+	eor     r3, r0, r0, ror #16
+	mov     r3, r3, lsr #8
+	bic     r3, r3, #0xff00
+	eor     r0, r3, r0, ror #8
+	mov     pc, lr
+ENDPROC(__bswapsi2)
+
+ENTRY(__bswapdi2)
+	mov     ip, r1
+	eor     r3, ip, ip, ror #16
+	eor     r1, r0, r0, ror #16
+	mov     r1, r1, lsr #8
+	mov     r3, r3, lsr #8
+	bic     r3, r3, #0xff00
+	bic     r1, r1, #0xff00
+	eor     r1, r1, r0, ror #8
+	eor     r0, r3, ip, ror #8
+	mov     pc, lr
+ENDPROC(__bswapdi2)
+#endif
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH v7] arm: use built-in byte swap function
  2013-05-23 16:46                                                                           ` [PATCH v7] " Kim Phillips
@ 2013-05-23 20:09                                                                             ` Nicolas Pitre
  2013-05-23 23:13                                                                             ` Russell King - ARM Linux
  2013-10-27  2:41                                                                             ` Nicolas Pitre
  2 siblings, 0 replies; 63+ messages in thread
From: Nicolas Pitre @ 2013-05-23 20:09 UTC (permalink / raw)
  To: Kim Phillips
  Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

On Thu, 23 May 2013, Kim Phillips wrote:

> Enable the compiler intrinsic for byte swapping on arch ARM.  This
> allows the compiler to detect and be able to optimize out byte
> swappings, and has a very modest benefit on vmlinux size (Linaro gcc
> 4.8):
> 
>    text	   data	    bss	    dec	    hex	filename
> 2840310	 123932	  61960	3026202	 2e2d1a	vmlinux-lart #orig
> 2840152	 123932	  61960	3026044	 2e2c7c	vmlinux-lart #builtin-bswap
> 
> 6473120	 314840	5616016	12403976 bd4508	vmlinux-mxs #orig
> 6472586	 314848	5616016	12403450 bd42fa	vmlinux-mxs #builtin-bswap
> 
> 7419872	 318372	 379556	8117800	 7bde28	vmlinux-imx_v6_v7 #orig
> 7419170	 318364	 379556	8117090	 7bdb62	vmlinux-imx_v6_v7 #builtin-bswap
> 
> Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
> Reviewed-by: Nicolas Pitre <nico@linaro.org>
> Acked-by: David Woodhouse <David.Woodhouse@intel.com>
> ---
> resending as v6 appears to have fallen though the cracks.  Russell?

Please send your patch to Russell's patch system:

http://www.arm.linux.org.uk/developer/patches/


Nicolas

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v7] arm: use built-in byte swap function
  2013-05-23 16:46                                                                           ` [PATCH v7] " Kim Phillips
  2013-05-23 20:09                                                                             ` Nicolas Pitre
@ 2013-05-23 23:13                                                                             ` Russell King - ARM Linux
  2013-06-06 22:12                                                                               ` Russell King - ARM Linux
  2013-10-27  2:41                                                                             ` Nicolas Pitre
  2 siblings, 1 reply; 63+ messages in thread
From: Russell King - ARM Linux @ 2013-05-23 23:13 UTC (permalink / raw)
  To: Kim Phillips
  Cc: Woodhouse, David, Nicolas Pitre, Borislav Petkov, Andrew Morton,
	Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel,
	linux-kernel, Rob Herring

On Thu, May 23, 2013 at 11:46:54AM -0500, Kim Phillips wrote:
> Enable the compiler intrinsic for byte swapping on arch ARM.  This
> allows the compiler to detect and be able to optimize out byte
> swappings, and has a very modest benefit on vmlinux size (Linaro gcc
> 4.8):
> 
>    text	   data	    bss	    dec	    hex	filename
> 2840310	 123932	  61960	3026202	 2e2d1a	vmlinux-lart #orig
> 2840152	 123932	  61960	3026044	 2e2c7c	vmlinux-lart #builtin-bswap
> 
> 6473120	 314840	5616016	12403976 bd4508	vmlinux-mxs #orig
> 6472586	 314848	5616016	12403450 bd42fa	vmlinux-mxs #builtin-bswap
> 
> 7419872	 318372	 379556	8117800	 7bde28	vmlinux-imx_v6_v7 #orig
> 7419170	 318364	 379556	8117090	 7bdb62	vmlinux-imx_v6_v7 #builtin-bswap
> 
> Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
> Reviewed-by: Nicolas Pitre <nico@linaro.org>
> Acked-by: David Woodhouse <David.Woodhouse@intel.com>
> ---
> resending as v6 appears to have fallen though the cracks.  Russell?

Please put it in the patch system (otherwise I do drop patches.)

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v7] arm: use built-in byte swap function
  2013-05-23 23:13                                                                             ` Russell King - ARM Linux
@ 2013-06-06 22:12                                                                               ` Russell King - ARM Linux
  2013-06-06 22:23                                                                                 ` Borislav Petkov
  2013-06-07  0:03                                                                                 ` Stephen Rothwell
  0 siblings, 2 replies; 63+ messages in thread
From: Russell King - ARM Linux @ 2013-06-06 22:12 UTC (permalink / raw)
  To: Kim Phillips, Arnd Bergmann, Stephen Rothwell
  Cc: Woodhouse, David, Nicolas Pitre, Rusty Russell, linux-kernel,
	Daniel Santos, Borislav Petkov, David Rientjes, Andrew Morton,
	linux-arm-kernel

On Fri, May 24, 2013 at 12:13:36AM +0100, Russell King - ARM Linux wrote:
> On Thu, May 23, 2013 at 11:46:54AM -0500, Kim Phillips wrote:
> > Enable the compiler intrinsic for byte swapping on arch ARM.  This
> > allows the compiler to detect and be able to optimize out byte
> > swappings, and has a very modest benefit on vmlinux size (Linaro gcc
> > 4.8):
> > 
> >    text	   data	    bss	    dec	    hex	filename
> > 2840310	 123932	  61960	3026202	 2e2d1a	vmlinux-lart #orig
> > 2840152	 123932	  61960	3026044	 2e2c7c	vmlinux-lart #builtin-bswap
> > 
> > 6473120	 314840	5616016	12403976 bd4508	vmlinux-mxs #orig
> > 6472586	 314848	5616016	12403450 bd42fa	vmlinux-mxs #builtin-bswap
> > 
> > 7419872	 318372	 379556	8117800	 7bde28	vmlinux-imx_v6_v7 #orig
> > 7419170	 318364	 379556	8117090	 7bdb62	vmlinux-imx_v6_v7 #builtin-bswap
> > 
> > Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
> > Reviewed-by: Nicolas Pitre <nico@linaro.org>
> > Acked-by: David Woodhouse <David.Woodhouse@intel.com>
> > ---
> > resending as v6 appears to have fallen though the cracks.  Russell?
> 
> Please put it in the patch system (otherwise I do drop patches.)

(Added Arnd/SFR in case they have comments.)

So, we have a problem here - the kind which appears when people stuff
things into the -next tree which aren't destined for the next merge
window.  This is the relevant context from your patch, which is
against linux-next:

-                lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \
-                font.o font.c head.o misc.o $(OBJS)
+                lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \
+                bswapsdi2.S font.o font.c head.o misc.o $(OBJS)

 # Make sure files are removed during clean
 extra-y       += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern piggy.lz4 \
                                                               ^^^^^^^^^
-                lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs)
+                lib1funcs.S ashldi3.S bswapsdi2.S $(libfdt) $(libfdt_hdrs)

the underlined bit - piggy.lz4 for those who read mail with proportional
fonts.

That is not in any kernel I have, and if it _is_ something that is
destined for the next merge window, it should be in my tree as it's
a core ARM feature, not in some random other tree.

Short of hand-editing and manually applying the patch, a solution would
be to rebase it on a mainline kernel version, like -rc4, and resubmit
that version instead.  That will ultimately then give sfr a conflict
which should be trivial to resolve - and hopefully we'll find out who's
carrying the LZ4 patch and putting it into linux-next.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v7] arm: use built-in byte swap function
  2013-06-06 22:12                                                                               ` Russell King - ARM Linux
@ 2013-06-06 22:23                                                                                 ` Borislav Petkov
  2013-06-07  0:03                                                                                 ` Stephen Rothwell
  1 sibling, 0 replies; 63+ messages in thread
From: Borislav Petkov @ 2013-06-06 22:23 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Kim Phillips, Arnd Bergmann, Stephen Rothwell, Woodhouse, David,
	Nicolas Pitre, Rusty Russell, linux-kernel, Daniel Santos,
	David Rientjes, Andrew Morton, linux-arm-kernel

On Thu, Jun 06, 2013 at 11:12:34PM +0100, Russell King - ARM Linux wrote:
> That will ultimately then give sfr a conflict which should be trivial
> to resolve - and hopefully we'll find out who's carrying the LZ4 patch
> and putting it into linux-next.

That should be akpm:

http://ozlabs.org/~akpm/mmotm/broken-out/arm-add-support-for-lz4-compressed-kernel.patch

AFAICT.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v7] arm: use built-in byte swap function
  2013-06-06 22:12                                                                               ` Russell King - ARM Linux
  2013-06-06 22:23                                                                                 ` Borislav Petkov
@ 2013-06-07  0:03                                                                                 ` Stephen Rothwell
  1 sibling, 0 replies; 63+ messages in thread
From: Stephen Rothwell @ 2013-06-07  0:03 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Kim Phillips, Arnd Bergmann, Woodhouse, David, Nicolas Pitre,
	Rusty Russell, linux-kernel, Daniel Santos, Borislav Petkov,
	David Rientjes, Andrew Morton, linux-arm-kernel, Kyungsik Lee

[-- Attachment #1: Type: text/plain, Size: 2386 bytes --]

Hi Russell,

On Thu, 6 Jun 2013 23:12:34 +0100 Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
>
> So, we have a problem here - the kind which appears when people stuff
> things into the -next tree which aren't destined for the next merge
> window.  This is the relevant context from your patch, which is
> against linux-next:
> 
> -                lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \
> -                font.o font.c head.o misc.o $(OBJS)
> +                lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \
> +                bswapsdi2.S font.o font.c head.o misc.o $(OBJS)
> 
>  # Make sure files are removed during clean
>  extra-y       += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern piggy.lz4 \
>                                                                ^^^^^^^^^
> -                lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs)
> +                lib1funcs.S ashldi3.S bswapsdi2.S $(libfdt) $(libfdt_hdrs)
> 
> the underlined bit - piggy.lz4 for those who read mail with proportional
> fonts.
> 
> That is not in any kernel I have, and if it _is_ something that is
> destined for the next merge window, it should be in my tree as it's
> a core ARM feature, not in some random other tree.

That is commit d8a6bf1b25bd ("arm: add support for LZ4-compressed
kernel") from next-20130606 from the akpm tree.  (adding author cc)  That
patch was cc'd to you, and is part of a series that adds LZ4 compression
to the kernel, so would not work on its own.  The first patch in the
series is "decompressor: add LZ4 decompressor module".


> Short of hand-editing and manually applying the patch, a solution would
> be to rebase it on a mainline kernel version, like -rc4, and resubmit
> that version instead.  That will ultimately then give sfr a conflict
> which should be trivial to resolve - and hopefully we'll find out who's
> carrying the LZ4 patch and putting it into linux-next.

People should *never, ever* submit patches based on linux-next (unless,
of course they are to me to help fix merge conflicts in linux-next, etc).
Patches submitted to a particular maintainer should be based on (an
ancestor of) that maintainer's current tree.

Sure, test new code before and after merging linux-next, but don;t base
new code on it.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v7] arm: use built-in byte swap function
  2013-05-23 16:46                                                                           ` [PATCH v7] " Kim Phillips
  2013-05-23 20:09                                                                             ` Nicolas Pitre
  2013-05-23 23:13                                                                             ` Russell King - ARM Linux
@ 2013-10-27  2:41                                                                             ` Nicolas Pitre
  2013-11-05 21:45                                                                               ` Kim Phillips
  2 siblings, 1 reply; 63+ messages in thread
From: Nicolas Pitre @ 2013-10-27  2:41 UTC (permalink / raw)
  To: Kim Phillips
  Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

On Thu, 23 May 2013, Kim Phillips wrote:

> Enable the compiler intrinsic for byte swapping on arch ARM.  This
> allows the compiler to detect and be able to optimize out byte
> swappings, and has a very modest benefit on vmlinux size (Linaro gcc
> 4.8):
> 
>    text	   data	    bss	    dec	    hex	filename
> 2840310	 123932	  61960	3026202	 2e2d1a	vmlinux-lart #orig
> 2840152	 123932	  61960	3026044	 2e2c7c	vmlinux-lart #builtin-bswap
> 
> 6473120	 314840	5616016	12403976 bd4508	vmlinux-mxs #orig
> 6472586	 314848	5616016	12403450 bd42fa	vmlinux-mxs #builtin-bswap
> 
> 7419872	 318372	 379556	8117800	 7bde28	vmlinux-imx_v6_v7 #orig
> 7419170	 318364	 379556	8117090	 7bdb62	vmlinux-imx_v6_v7 #builtin-bswap
> 
> Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
> Reviewed-by: Nicolas Pitre <nico@linaro.org>
> Acked-by: David Woodhouse <David.Woodhouse@intel.com>

Did this ever go somewhere?

Russell suggested at the time to base it against a mainline kernel 
(since it was patching files which apparently were already patched with 
out-of-tree lz4 patches) and send it to his patch system.


> ---
> resending as v6 appears to have fallen though the cracks.  Russell?
> 
> v7: rebased onto next-20130521, re-ran above vmlinux sizes with
>     Linaro gcc 4.8, added Nicolas' Reviewed-by, and David's Acked-by.
> v6 and prior version information:
>     https://lkml.org/lkml/2013/2/22/475
> 
>  arch/arm/Kconfig                  |  1 +
>  arch/arm/boot/compressed/Makefile | 15 +++++++++++----
>  arch/arm/kernel/armksyms.c        |  4 ++++
>  arch/arm/lib/Makefile             |  2 +-
>  arch/arm/lib/bswapsdi2.S          | 36 ++++++++++++++++++++++++++++++++++++
>  5 files changed, 53 insertions(+), 5 deletions(-)
>  create mode 100644 arch/arm/lib/bswapsdi2.S
> 
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index a7fc5ea..c2fe04d 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -63,6 +63,7 @@ config ARM
>  	select OLD_SIGSUSPEND3
>  	select OLD_SIGACTION
>  	select HAVE_CONTEXT_TRACKING
> +	select ARCH_USE_BUILTIN_BSWAP
>  	help
>  	  The ARM series is a line of low-power-consumption RISC chip designs
>  	  licensed by ARM Ltd and targeted at embedded applications and
> diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
> index 198a4ad..bd8a176 100644
> --- a/arch/arm/boot/compressed/Makefile
> +++ b/arch/arm/boot/compressed/Makefile
> @@ -112,12 +112,12 @@ endif
>  
>  targets       := vmlinux vmlinux.lds \
>  		 piggy.$(suffix_y) piggy.$(suffix_y).o \
> -		 lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \
> -		 font.o font.c head.o misc.o $(OBJS)
> +		 lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \
> +		 bswapsdi2.S font.o font.c head.o misc.o $(OBJS)
>  
>  # Make sure files are removed during clean
>  extra-y       += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern piggy.lz4 \
> -		 lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs)
> +		 lib1funcs.S ashldi3.S bswapsdi2.S $(libfdt) $(libfdt_hdrs)
>  
>  ifeq ($(CONFIG_FUNCTION_TRACER),y)
>  ORIG_CFLAGS := $(KBUILD_CFLAGS)
> @@ -159,6 +159,12 @@ ashldi3 = $(obj)/ashldi3.o
>  $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S
>  	$(call cmd,shipped)
>  
> +# For __bswapsi2, __bswapdi2
> +bswapsdi2 = $(obj)/bswapsdi2.o
> +
> +$(obj)/bswapsdi2.S: $(srctree)/arch/$(SRCARCH)/lib/bswapsdi2.S
> +	$(call cmd,shipped)
> +
>  # We need to prevent any GOTOFF relocs being used with references
>  # to symbols in the .bss section since we cannot relocate them
>  # independently from the rest at run time.  This can be achieved by
> @@ -180,7 +186,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \
>  fi
>  
>  $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \
> -		$(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE
> +		$(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \
> +		$(bswapsdi2) FORCE
>  	@$(check_for_multiple_zreladdr)
>  	$(call if_changed,ld)
>  	@$(check_for_bad_syms)
> diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
> index 60d3b73..ba578f7 100644
> --- a/arch/arm/kernel/armksyms.c
> +++ b/arch/arm/kernel/armksyms.c
> @@ -35,6 +35,8 @@ extern void __ucmpdi2(void);
>  extern void __udivsi3(void);
>  extern void __umodsi3(void);
>  extern void __do_div64(void);
> +extern void __bswapsi2(void);
> +extern void __bswapdi2(void);
>  
>  extern void __aeabi_idiv(void);
>  extern void __aeabi_idivmod(void);
> @@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2);
>  EXPORT_SYMBOL(__udivsi3);
>  EXPORT_SYMBOL(__umodsi3);
>  EXPORT_SYMBOL(__do_div64);
> +EXPORT_SYMBOL(__bswapsi2);
> +EXPORT_SYMBOL(__bswapdi2);
>  
>  #ifdef CONFIG_AEABI
>  EXPORT_SYMBOL(__aeabi_idiv);
> diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
> index af72969..5383df7 100644
> --- a/arch/arm/lib/Makefile
> +++ b/arch/arm/lib/Makefile
> @@ -13,7 +13,7 @@ lib-y		:= backtrace.o changebit.o csumipv6.o csumpartial.o   \
>  		   ashldi3.o ashrdi3.o lshrdi3.o muldi3.o             \
>  		   ucmpdi2.o lib1funcs.o div64.o                      \
>  		   io-readsb.o io-writesb.o io-readsl.o io-writesl.o  \
> -		   call_with_stack.o
> +		   call_with_stack.o bswapsdi2.o
>  
>  mmu-y	:= clear_user.o copy_page.o getuser.o putuser.o
>  
> diff --git a/arch/arm/lib/bswapsdi2.S b/arch/arm/lib/bswapsdi2.S
> new file mode 100644
> index 0000000..2ba43a0
> --- /dev/null
> +++ b/arch/arm/lib/bswapsdi2.S
> @@ -0,0 +1,36 @@
> +#include <linux/linkage.h>
> +
> +#if __LINUX_ARM_ARCH__ >= 6
> +ENTRY(__bswapsi2)
> +	rev	r0, r0
> +	bx	lr
> +ENDPROC(__bswapsi2)
> +
> +ENTRY(__bswapdi2)
> +	rev	r3, r0
> +	rev	r0, r1
> +	mov	r1, r3
> +	bx	lr
> +ENDPROC(__bswapdi2)
> +#else
> +ENTRY(__bswapsi2)
> +	eor     r3, r0, r0, ror #16
> +	mov     r3, r3, lsr #8
> +	bic     r3, r3, #0xff00
> +	eor     r0, r3, r0, ror #8
> +	mov     pc, lr
> +ENDPROC(__bswapsi2)
> +
> +ENTRY(__bswapdi2)
> +	mov     ip, r1
> +	eor     r3, ip, ip, ror #16
> +	eor     r1, r0, r0, ror #16
> +	mov     r1, r1, lsr #8
> +	mov     r3, r3, lsr #8
> +	bic     r3, r3, #0xff00
> +	bic     r1, r1, #0xff00
> +	eor     r1, r1, r0, ror #8
> +	eor     r0, r3, ip, ror #8
> +	mov     pc, lr
> +ENDPROC(__bswapdi2)
> +#endif
> -- 
> 1.8.1.5
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH v7] arm: use built-in byte swap function
  2013-10-27  2:41                                                                             ` Nicolas Pitre
@ 2013-11-05 21:45                                                                               ` Kim Phillips
  0 siblings, 0 replies; 63+ messages in thread
From: Kim Phillips @ 2013-11-05 21:45 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov,
	Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell,
	linux-arm-kernel, linux-kernel, Rob Herring

On Sat, 26 Oct 2013 22:41:34 -0400
Nicolas Pitre <nico@fluxnic.net> wrote:

> On Thu, 23 May 2013, Kim Phillips wrote:
> 
> > Enable the compiler intrinsic for byte swapping on arch ARM.  This
> > allows the compiler to detect and be able to optimize out byte
> > swappings, and has a very modest benefit on vmlinux size (Linaro gcc
> > 4.8):
> > 
> >    text	   data	    bss	    dec	    hex	filename
> > 2840310	 123932	  61960	3026202	 2e2d1a	vmlinux-lart #orig
> > 2840152	 123932	  61960	3026044	 2e2c7c	vmlinux-lart #builtin-bswap
> > 
> > 6473120	 314840	5616016	12403976 bd4508	vmlinux-mxs #orig
> > 6472586	 314848	5616016	12403450 bd42fa	vmlinux-mxs #builtin-bswap
> > 
> > 7419872	 318372	 379556	8117800	 7bde28	vmlinux-imx_v6_v7 #orig
> > 7419170	 318364	 379556	8117090	 7bdb62	vmlinux-imx_v6_v7 #builtin-bswap
> > 
> > Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
> > Reviewed-by: Nicolas Pitre <nico@linaro.org>
> > Acked-by: David Woodhouse <David.Woodhouse@intel.com>
> 
> Did this ever go somewhere?
> 
> Russell suggested at the time to base it against a mainline kernel 
> (since it was patching files which apparently were already patched with 
> out-of-tree lz4 patches) and send it to his patch system.

I'll re-base and send it to his patch system.

Thanks,

Kim


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC] arm: use built-in byte swap function
@ 2013-01-31 11:44 Woodhouse, David
  0 siblings, 0 replies; 63+ messages in thread
From: Woodhouse, David @ 2013-01-31 11:44 UTC (permalink / raw)
  To: linux, kim.phillips
  Cc: Woodhouse, David, bp, akpm, daniel.santos, rientjes, rusty,
	linux-arm-kernel, linux-kernel, robherring2

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 355 bytes --]

A quick test showed I only dropped 20 bytes from net/ipv4/built-in.o but that's with no 16-bit swap support (gcc 4.7).


--
dwmw2

(Apologies for HTML and top-posting; Android mailer is broken.)
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 63+ messages in thread

end of thread, other threads:[~2013-11-05 21:50 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-29  1:30 [RFC] arm: use built-in byte swap function Kim Phillips
2013-01-29  8:35 ` Borislav Petkov
2013-01-29 16:46   ` Woodhouse, David
2013-01-29 17:42     ` Borislav Petkov
2013-01-29 17:55       ` Woodhouse, David
2013-01-29 18:10         ` Borislav Petkov
2013-01-30 10:22           ` Woodhouse, David
2013-01-31  2:09             ` Kim Phillips
2013-01-31  6:44               ` Borislav Petkov
2013-01-31  9:28               ` Russell King - ARM Linux
2013-01-31 20:59                 ` Kim Phillips
2013-01-31 21:33                   ` Borislav Petkov
2013-01-31 22:11                   ` Woodhouse, David
2013-02-01  0:37                     ` [PATCH v4] " Kim Phillips
2013-02-01 10:46                       ` Russell King - ARM Linux
2013-02-01  1:17                   ` [RFC] " Russell King - ARM Linux
2013-02-01  7:33                     ` Woodhouse, David
2013-02-06  3:04                       ` Kim Phillips
2013-02-06  9:02                         ` Woodhouse, David
2013-02-07  1:19                           ` Kim Phillips
2013-02-07 10:19                             ` Will Newton
2013-02-07 10:43                               ` Catalin Marinas
2013-02-07 18:13                             ` Russell King - ARM Linux
2013-02-08 17:25                               ` Woodhouse, David
2013-02-08 20:04                                 ` Nicolas Pitre
2013-02-08 22:40                                   ` Woodhouse, David
2013-02-08 22:47                                     ` Nicolas Pitre
2013-02-09  1:12                                       ` Kim Phillips
2013-02-09  3:16                                         ` Nicolas Pitre
2013-02-20  2:31                                           ` Kim Phillips
2013-02-20  2:38                                             ` Stephen Boyd
2013-02-20  3:17                                             ` Nicolas Pitre
2013-02-20 10:38                                               ` Woodhouse, David
2013-02-20 13:36                                                 ` Nicolas Pitre
2013-02-20 13:44                                                   ` Woodhouse, David
2013-02-20 14:06                                                     ` Nicolas Pitre
2013-02-20 14:53                                                       ` Woodhouse, David
2013-02-20 15:43                                                         ` Nicolas Pitre
2013-02-21  3:49                                                           ` Kim Phillips
2013-02-21  4:29                                                             ` Nicolas Pitre
2013-02-21  6:52                                                               ` Kim Phillips
2013-02-21 16:40                                                                 ` Nicolas Pitre
2013-02-22  2:33                                                                   ` Kim Phillips
2013-02-22  3:40                                                                     ` Nicolas Pitre
2013-02-23  1:40                                                                       ` [PATCH v6] " Kim Phillips
2013-02-23  2:40                                                                         ` Nicolas Pitre
2013-02-23 23:20                                                                         ` Woodhouse, David
2013-05-23 16:46                                                                           ` [PATCH v7] " Kim Phillips
2013-05-23 20:09                                                                             ` Nicolas Pitre
2013-05-23 23:13                                                                             ` Russell King - ARM Linux
2013-06-06 22:12                                                                               ` Russell King - ARM Linux
2013-06-06 22:23                                                                                 ` Borislav Petkov
2013-06-07  0:03                                                                                 ` Stephen Rothwell
2013-10-27  2:41                                                                             ` Nicolas Pitre
2013-11-05 21:45                                                                               ` Kim Phillips
2013-02-21 16:37                                                               ` [RFC] " Woodhouse, David
2013-02-21 17:27                                                                 ` Nicolas Pitre
2013-03-13 13:35                                             ` Woodhouse, David
2013-01-29 14:13 ` Russell King - ARM Linux
2013-01-29 14:43   ` Woodhouse, David
2013-01-29 14:53 ` Rob Herring
2013-01-29 15:10   ` Woodhouse, David
2013-01-31 11:44 Woodhouse, David

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).