* [RFC] arm: use built-in byte swap function @ 2013-01-29 1:30 Kim Phillips 2013-01-29 8:35 ` Borislav Petkov ` (2 more replies) 0 siblings, 3 replies; 62+ messages in thread From: Kim Phillips @ 2013-01-29 1:30 UTC (permalink / raw) To: Russell King, Andrew Morton Cc: Daniel Santos, Borislav Petkov, David Rientjes, Rusty Russell, David Woodhouse, linux-arm-kernel, linux-kernel Enable the compiler intrinsic for byte swapping on arch ARM. This allows the compiler to detect and be able to optimize out byte swappings, e.g. in big endian to big endian moves. AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6, and for the 16-bit version in 4.8. Signed-off-by: Kim Phillips <kim.phillips@freescale.com> --- akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016 based on linux-next. Depends on commit "compiler-gcc{3,4}.h: Use GCC_VERSION macro" by Daniel Santos <daniel.santos@pobox.com>, currently in the akpm branch. RFC because of unfamiliarity with arch ARM, and that at91sam9rl, at91rm9200, and lpd270 (so far, at least) builds fail with: include/uapi/linux/swab.h:60: undefined reference to `__bswapsi2' I'm using eldk-5.2.1/armv7a's arm-linux-gnueabi-gcc (GCC) 4.6.4 20120303 (prerelease) arch/arm/Kconfig | 1 + include/linux/compiler-gcc4.h | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index eda8711..437d11a 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -3,6 +3,7 @@ config ARM default y select ARCH_BINFMT_ELF_RANDOMIZE_PIE select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE + select ARCH_USE_BUILTIN_BSWAP select ARCH_HAVE_CUSTOM_GPIO_H select ARCH_WANT_IPC_PARSE_VERSION select BUILDTIME_EXTABLE_SORT if MMU diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h index 68b162d..da5f728 100644 --- a/include/linux/compiler-gcc4.h +++ b/include/linux/compiler-gcc4.h @@ -67,7 +67,8 @@ #ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP -#if GCC_VERSION >= 40400 +#if (!defined(__arm__) && GCC_VERSION >= 40400) || \ + (defined(__arm__) && GCC_VERSION >= 40600) #define __HAVE_BUILTIN_BSWAP32__ #define __HAVE_BUILTIN_BSWAP64__ #endif -- 1.7.9.7 ^ permalink raw reply related [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-01-29 1:30 [RFC] arm: use built-in byte swap function Kim Phillips @ 2013-01-29 8:35 ` Borislav Petkov 2013-01-29 16:46 ` Woodhouse, David 2013-01-29 14:13 ` Russell King - ARM Linux 2013-01-29 14:53 ` Rob Herring 2 siblings, 1 reply; 62+ messages in thread From: Borislav Petkov @ 2013-01-29 8:35 UTC (permalink / raw) To: Kim Phillips Cc: Russell King, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, David Woodhouse, linux-arm-kernel, linux-kernel On Mon, Jan 28, 2013 at 07:30:33PM -0600, Kim Phillips wrote: > Enable the compiler intrinsic for byte swapping on arch ARM. This > allows the compiler to detect and be able to optimize out byte > swappings, e.g. in big endian to big endian moves. > > AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6, > and for the 16-bit version in 4.8. > > Signed-off-by: Kim Phillips <kim.phillips@freescale.com> > --- > akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016 > > based on linux-next. Depends on commit "compiler-gcc{3,4}.h: Use > GCC_VERSION macro" by Daniel Santos <daniel.santos@pobox.com>, > currently in the akpm branch. > > RFC because of unfamiliarity with arch ARM, and that at91sam9rl, > at91rm9200, and lpd270 (so far, at least) builds fail with: > > include/uapi/linux/swab.h:60: undefined reference to `__bswapsi2' > > I'm using eldk-5.2.1/armv7a's arm-linux-gnueabi-gcc (GCC) 4.6.4 > 20120303 (prerelease) > > arch/arm/Kconfig | 1 + > include/linux/compiler-gcc4.h | 3 ++- > 2 files changed, 3 insertions(+), 1 deletion(-) > > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig > index eda8711..437d11a 100644 > --- a/arch/arm/Kconfig > +++ b/arch/arm/Kconfig > @@ -3,6 +3,7 @@ config ARM > default y > select ARCH_BINFMT_ELF_RANDOMIZE_PIE > select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE > + select ARCH_USE_BUILTIN_BSWAP > select ARCH_HAVE_CUSTOM_GPIO_H > select ARCH_WANT_IPC_PARSE_VERSION > select BUILDTIME_EXTABLE_SORT if MMU > diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h > index 68b162d..da5f728 100644 > --- a/include/linux/compiler-gcc4.h > +++ b/include/linux/compiler-gcc4.h > @@ -67,7 +67,8 @@ > > > #ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP > -#if GCC_VERSION >= 40400 > +#if (!defined(__arm__) && GCC_VERSION >= 40400) || \ > + (defined(__arm__) && GCC_VERSION >= 40600) There should be no arch-specific stuff in a generic header. I guess you probably need to select ARCH_USE_BUILTIN_BSWAP in an arm-specific compiler.h header after checking compiler version... Thanks. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-01-29 8:35 ` Borislav Petkov @ 2013-01-29 16:46 ` Woodhouse, David 2013-01-29 17:42 ` Borislav Petkov 0 siblings, 1 reply; 62+ messages in thread From: Woodhouse, David @ 2013-01-29 16:46 UTC (permalink / raw) To: Borislav Petkov Cc: Kim Phillips, Russell King, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1385 bytes --] On Tue, 2013-01-29 at 09:35 +0100, Borislav Petkov wrote: > > > > > #ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP > > -#if GCC_VERSION >= 40400 > > +#if (!defined(__arm__) && GCC_VERSION >= 40400) || \ > > + (defined(__arm__) && GCC_VERSION >= 40600) > > There should be no arch-specific stuff in a generic header. I guess > you probably need to select ARCH_USE_BUILTIN_BSWAP in an arm-specific > compiler.h header after checking compiler version... If we're really going to have many different architectures depending on different versions of GCC for this (if it wasn't sane to use it from 4.4/4.8 when it got introduced, and depends on some later arch-specific optimisation), then perhaps we'll have the arch provide the corresponding required GCC_VERSION for using each of 64/32/16 bit builtins, instead of just a yes/no flag? Or just define __HAVE_BUILTIN_BSWAPxx__ for itself, perhaps? In fact we could start by having just the problematic architectures set __HAVE_BUILTIN_BSWAPxx__ for themselves according to whatever criteria they want, and then if there's scope for consolidating that in the generic code then we can do so later. -- Sent with MeeGo's ActiveSync support. David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 4370 bytes --] ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-01-29 16:46 ` Woodhouse, David @ 2013-01-29 17:42 ` Borislav Petkov 2013-01-29 17:55 ` Woodhouse, David 0 siblings, 1 reply; 62+ messages in thread From: Borislav Petkov @ 2013-01-29 17:42 UTC (permalink / raw) To: Woodhouse, David Cc: Kim Phillips, Russell King, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel On Tue, Jan 29, 2013 at 04:46:58PM +0000, Woodhouse, David wrote: > If we're really going to have many different architectures depending > on different versions of GCC for this (if it wasn't sane to use > it from 4.4/4.8 when it got introduced, and depends on some later > arch-specific optimisation), then perhaps we'll have the arch > provide the corresponding required GCC_VERSION for using each of > 64/32/16 bit builtins, instead of just a yes/no flag? Or just define > __HAVE_BUILTIN_BSWAPxx__ for itself, perhaps? Damn, there's already the __powerpc__ thing in there. Yeah, something like defininig __HAVE_BUILTIN_BSWAPxx__ makes sense and can keep the header arch-agnostic without growing all those different arch defines. But I liked your other suggestion better to get the offending compilers fixed. I dunno though, how generically is stuff like that getting implemented for every arch so probably single arches doing __HAVE* defines is probably going to be the realizable solution in the end. Hmmm. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-01-29 17:42 ` Borislav Petkov @ 2013-01-29 17:55 ` Woodhouse, David 2013-01-29 18:10 ` Borislav Petkov 0 siblings, 1 reply; 62+ messages in thread From: Woodhouse, David @ 2013-01-29 17:55 UTC (permalink / raw) To: Borislav Petkov Cc: Kim Phillips, Russell King, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1508 bytes --] On Tue, 2013-01-29 at 18:42 +0100, Borislav Petkov wrote: > On Tue, Jan 29, 2013 at 04:46:58PM +0000, Woodhouse, David wrote: > > If we're really going to have many different architectures depending > > on different versions of GCC for this (if it wasn't sane to use > > it from 4.4/4.8 when it got introduced, and depends on some later > > arch-specific optimisation), then perhaps we'll have the arch > > provide the corresponding required GCC_VERSION for using each of > > 64/32/16 bit builtins, instead of just a yes/no flag? Or just define > > __HAVE_BUILTIN_BSWAPxx__ for itself, perhaps? > > Damn, there's already the __powerpc__ thing in there. That's different though. That's because GCC didn't have a generic __builtin_bswap16() until 4.8, while PowerPC got it in 4.6. That's a relatively simple and manageable one-off arch-dependency. But once we get into a mass of "well, it wasn't actually *usable* for ARM until 4.9", we start wanting to push it into arch code. > But I liked your other suggestion better to get the offending > compilers fixed. That wasn't an *alternative*. It's required if the compiler is doing something suboptimal, whatever happens. And then we start to *use* the compiler from the first version that's known to be fixed. -- Sent with MeeGo's ActiveSync support. David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 4370 bytes --] ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-01-29 17:55 ` Woodhouse, David @ 2013-01-29 18:10 ` Borislav Petkov 2013-01-30 10:22 ` Woodhouse, David 0 siblings, 1 reply; 62+ messages in thread From: Borislav Petkov @ 2013-01-29 18:10 UTC (permalink / raw) To: Woodhouse, David Cc: Kim Phillips, Russell King, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel On Tue, Jan 29, 2013 at 05:55:54PM +0000, Woodhouse, David wrote: > That's different though. That's because GCC didn't have a generic > __builtin_bswap16() until 4.8, while PowerPC got it in 4.6. > > That's a relatively simple and manageable one-off arch-dependency. But > once we get into a mass of "well, it wasn't actually *usable* for ARM > until 4.9", we start wanting to push it into arch code. Yeah, and once people see one arch mentioned, all of a sudden they think it is ok to add just one more ... > > But I liked your other suggestion better to get the offending > > compilers fixed. > > That wasn't an *alternative*. It's required if the compiler is doing > something suboptimal, whatever happens. And then we start to *use* the > compiler from the first version that's known to be fixed. So, IMHO it sounds to me like we want to explicitly state for each arch separately that it is ok to use the __builtin_bswapXX things. This also takes care of the case where the compiler is doing something suboptimal by excluding the affected versions. Thanks. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-01-29 18:10 ` Borislav Petkov @ 2013-01-30 10:22 ` Woodhouse, David 2013-01-31 2:09 ` Kim Phillips 0 siblings, 1 reply; 62+ messages in thread From: Woodhouse, David @ 2013-01-30 10:22 UTC (permalink / raw) To: Borislav Petkov Cc: Kim Phillips, Russell King, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1370 bytes --] On Tue, 2013-01-29 at 19:10 +0100, Borislav Petkov wrote: > So, IMHO it sounds to me like we want to explicitly state for each arch > separately that it is ok to use the __builtin_bswapXX things. This also > takes care of the case where the compiler is doing something suboptimal > by excluding the affected versions. Well, if it really does end up being different for every architecture, then that means I probably made the wrong decision when I chose to make it "generic", and override the __arch_swabXX() macros. I could have just pushed all the architectures to use the builtins in their __arch_swabXX macros instead, as appropriate. Let's see how many special cases we actually end up with, and perhaps we'll end up switching that round. For now, let's just make ARM set __HAVE_BUILTIN_BSWAPxx__ for the appropriate sizes in <asm/swab.h>, according to whatever criteria it needs. It's not entirely clear how much of a win it is on ARM anyway; we don't have load-and-swap or store-and-swap instructions so there are only a few added opportunities for optimisation that we get by letting the compiler see what's going on. -- Sent with MeeGo's ActiveSync support. David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 4370 bytes --] ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-01-30 10:22 ` Woodhouse, David @ 2013-01-31 2:09 ` Kim Phillips 2013-01-31 6:44 ` Borislav Petkov 2013-01-31 9:28 ` Russell King - ARM Linux 0 siblings, 2 replies; 62+ messages in thread From: Kim Phillips @ 2013-01-31 2:09 UTC (permalink / raw) To: Woodhouse, David Cc: Borislav Petkov, Russell King, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Wed, 30 Jan 2013 10:22:15 +0000 "Woodhouse, David" <david.woodhouse@intel.com> wrote: > On Tue, 2013-01-29 at 19:10 +0100, Borislav Petkov wrote: > > So, IMHO it sounds to me like we want to explicitly state for each arch > > separately that it is ok to use the __builtin_bswapXX things. This also > > takes care of the case where the compiler is doing something suboptimal > > by excluding the affected versions. > > Well, if it really does end up being different for every architecture, > then that means I probably made the wrong decision when I chose to make > it "generic", and override the __arch_swabXX() macros. I could have just > pushed all the architectures to use the builtins in their __arch_swabXX > macros instead, as appropriate. > > Let's see how many special cases we actually end up with, and perhaps > we'll end up switching that round. For now, let's just make ARM set > __HAVE_BUILTIN_BSWAPxx__ for the appropriate sizes in <asm/swab.h>, > according to whatever criteria it needs. thanks - I've attempted to do this - see v2 patch below. > It's not entirely clear how much of a win it is on ARM anyway; we don't > have load-and-swap or store-and-swap instructions so there are only a > few added opportunities for optimisation that we get by letting the > compiler see what's going on. I've added some text size figures to the patch description. They are indeed very small, but it should help drivers for big-endian devices. >From 31df859be202f00d017b707649e7709281994d15 Mon Sep 17 00:00:00 2001 From: Kim Phillips <kim.phillips@freescale.com> Date: Mon, 28 Jan 2013 19:30:33 -0600 Subject: [PATCH] arm: use built-in byte swap function Enable the compiler intrinsic for byte swapping on arch ARM. This allows the compiler to detect and be able to optimize out byte swappings, e.g. in big endian to big endian moves. AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6, and for the 16-bit version in 4.8. This has a tiny benefit on vmlinux text size: multi_v7_defconfig: text data bss dec hex filename 3135208 188396 203344 3526948 35d124 vmlinux multi_v7_defconfig with builtin_bswap: 3135112 188396 203344 3526852 35d0c4 vmlinux exynos_defconfig: text data bss dec hex filename 4286605 360564 223172 4870341 4a50c5 vmlinux exynos_defconfig with builtin_bswap: text data bss dec hex filename 4286405 360564 223172 4870141 4a4ffd vmlinux The savings come mostly from device-tree related code, and some from drivers. Signed-off-by: Kim Phillips <kim.phillips@freescale.com> --- akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016 based on linux-next-20130128. Depends on commit "compiler-gcc{3,4}.h: Use GCC_VERSION macro" by Daniel Santos <daniel.santos@pobox.com>, currently in the akpm branch. v2: - at91 and lpd270 builds fixed by limiting to ARMv6 and above (i.e., ARM cores that have support for the 'rev' instruction). Otherwise, the compiler emits calls to libgcc's __bswapsi2 on these ARMv4/v5 builds (and arch ARM doesn't link with libgcc). All ARM defconfigs now have the same build status as they did without this patch (some are broken on linux-next). - move ARM check from generic compiler.h to arch ARM's swab.h. - pretty sure it should be limited to __KERNEL__ builds - add new ARCH_DEFINES_BUILTIN_BSWAP (see Kconfig help). - if set, generic compiler header does not set HAVE_BUILTIN_BSWAPxx - not too sure about this having to be a new CONFIG_, but it's hard to find a place for it given linux/compiler.h doesn't include any arch-specific files. - move new selects to end of CONFIG_ARM's Kconfig select list, as is done in David Woodhouse's original patchseries for ppc/x86. arch/Kconfig | 10 ++++++++++ arch/arm/Kconfig | 2 ++ arch/arm/include/uapi/asm/swab.h | 10 ++++++++++ include/linux/compiler-gcc4.h | 3 ++- 4 files changed, 24 insertions(+), 1 deletion(-) diff --git a/arch/Kconfig b/arch/Kconfig index 40e2b12..c8798b9 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -141,6 +141,16 @@ config ARCH_USE_BUILTIN_BSWAP instructions should set this. And it shouldn't hurt to set it on architectures that don't have such instructions. +config ARCH_DEFINES_BUILTIN_BSWAP + depends on ARCH_USE_BUILTIN_BSWAP + bool + help + ARCH selects this when it wants to control HAVE_BUILTIN_BSWAPxx + definitions over those in the generic compiler headers. It + can be dependent on a combination of byte swapping instruction + availability, the instruction set version, and the state + of support in different compiler versions. + config HAVE_SYSCALL_WRAPPERS bool diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 73027aa..b5868c2 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -57,6 +57,8 @@ config ARM select CLONE_BACKWARDS select OLD_SIGSUSPEND3 select OLD_SIGACTION + select ARCH_USE_BUILTIN_BSWAP + select ARCH_DEFINES_BUILTIN_BSWAP help The ARM series is a line of low-power-consumption RISC chip designs licensed by ARM Ltd and targeted at embedded applications and diff --git a/arch/arm/include/uapi/asm/swab.h b/arch/arm/include/uapi/asm/swab.h index 6fcb32a..5d86ed0 100644 --- a/arch/arm/include/uapi/asm/swab.h +++ b/arch/arm/include/uapi/asm/swab.h @@ -50,4 +50,14 @@ static inline __attribute_const__ __u32 __arch_swab32(__u32 x) #endif +#if defined(__KERNEL__) && __LINUX_ARM_ARCH__ >= 6 +#if GCC_VERSION >= 40600 +#define __HAVE_BUILTIN_BSWAP32__ +#define __HAVE_BUILTIN_BSWAP64__ +#endif +#if GCC_VERSION >= 40800 +#define __HAVE_BUILTIN_BSWAP16__ +#endif +#endif + #endif /* _UAPI__ASM_ARM_SWAB_H */ diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h index 68b162d..fce39cb 100644 --- a/include/linux/compiler-gcc4.h +++ b/include/linux/compiler-gcc4.h @@ -66,7 +66,8 @@ #endif -#ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP +#if defined(CONFIG_ARCH_USE_BUILTIN_BSWAP) && \ + !defined(CONFIG_ARCH_DEFINES_BUILTIN_BSWAP) #if GCC_VERSION >= 40400 #define __HAVE_BUILTIN_BSWAP32__ #define __HAVE_BUILTIN_BSWAP64__ -- 1.7.9.7 ^ permalink raw reply related [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-01-31 2:09 ` Kim Phillips @ 2013-01-31 6:44 ` Borislav Petkov 2013-01-31 9:28 ` Russell King - ARM Linux 1 sibling, 0 replies; 62+ messages in thread From: Borislav Petkov @ 2013-01-31 6:44 UTC (permalink / raw) To: Kim Phillips Cc: Woodhouse, David, Russell King, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Wed, Jan 30, 2013 at 08:09:00PM -0600, Kim Phillips wrote: > diff --git a/arch/Kconfig b/arch/Kconfig > index 40e2b12..c8798b9 100644 > --- a/arch/Kconfig > +++ b/arch/Kconfig > @@ -141,6 +141,16 @@ config ARCH_USE_BUILTIN_BSWAP > instructions should set this. And it shouldn't hurt to set it > on architectures that don't have such instructions. > > +config ARCH_DEFINES_BUILTIN_BSWAP > + depends on ARCH_USE_BUILTIN_BSWAP > + bool > + help > + ARCH selects this when it wants to control HAVE_BUILTIN_BSWAPxx This is what threw me off: if ARCH selects this, I don't think we want to have a help text and the option be user visible? Normally such options are only bool and are selected automatically by hm.. ARCH, as you say above and do so below. :-) > + definitions over those in the generic compiler headers. It > + can be dependent on a combination of byte swapping instruction > + availability, the instruction set version, and the state > + of support in different compiler versions. > + > config HAVE_SYSCALL_WRAPPERS > bool > > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig > index 73027aa..b5868c2 100644 > --- a/arch/arm/Kconfig > +++ b/arch/arm/Kconfig > @@ -57,6 +57,8 @@ config ARM > select CLONE_BACKWARDS > select OLD_SIGSUSPEND3 > select OLD_SIGACTION > + select ARCH_USE_BUILTIN_BSWAP > + select ARCH_DEFINES_BUILTIN_BSWAP > help Thanks. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-01-31 2:09 ` Kim Phillips 2013-01-31 6:44 ` Borislav Petkov @ 2013-01-31 9:28 ` Russell King - ARM Linux 2013-01-31 20:59 ` Kim Phillips 1 sibling, 1 reply; 62+ messages in thread From: Russell King - ARM Linux @ 2013-01-31 9:28 UTC (permalink / raw) To: Kim Phillips Cc: Woodhouse, David, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Wed, Jan 30, 2013 at 08:09:00PM -0600, Kim Phillips wrote: > The savings come mostly from device-tree related code, and some > from drivers. You forget that IP networking is all big endian, so these will be using the byte swapping too (search it for htons/ntohs/htonl/ntohl). > v2: > - at91 and lpd270 builds fixed by limiting to ARMv6 and above > (i.e., ARM cores that have support for the 'rev' instruction). > Otherwise, the compiler emits calls to libgcc's __bswapsi2 on > these ARMv4/v5 builds (and arch ARM doesn't link with libgcc). Which compiler version? gcc 4.5.4 doesn't do this, except for the 16-bit swap, so I doubt that any later compiler does. > --- a/arch/arm/include/uapi/asm/swab.h > +++ b/arch/arm/include/uapi/asm/swab.h > @@ -50,4 +50,14 @@ static inline __attribute_const__ __u32 __arch_swab32(__u32 x) > > #endif > > +#if defined(__KERNEL__) && __LINUX_ARM_ARCH__ >= 6 > +#if GCC_VERSION >= 40600 > +#define __HAVE_BUILTIN_BSWAP32__ > +#define __HAVE_BUILTIN_BSWAP64__ > +#endif > +#if GCC_VERSION >= 40800 > +#define __HAVE_BUILTIN_BSWAP16__ > +#endif > +#endif If this is __KERNEL__ only, it should not be in a uapi header. UAPI headers get exported to userland, this is not userland interface code. IT should be in arch/arm/include/asm/swab.h ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-01-31 9:28 ` Russell King - ARM Linux @ 2013-01-31 20:59 ` Kim Phillips 2013-01-31 21:33 ` Borislav Petkov ` (2 more replies) 0 siblings, 3 replies; 62+ messages in thread From: Kim Phillips @ 2013-01-31 20:59 UTC (permalink / raw) To: Russell King - ARM Linux Cc: Woodhouse, David, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Thu, 31 Jan 2013 09:28:01 +0000 Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > On Wed, Jan 30, 2013 at 08:09:00PM -0600, Kim Phillips wrote: > > The savings come mostly from device-tree related code, and some > > from drivers. > > You forget that IP networking is all big endian, so these will be using > the byte swapping too (search it for htons/ntohs/htonl/ntohl). As David mentioned, there isn't much gain from net/ code. > > v2: > > - at91 and lpd270 builds fixed by limiting to ARMv6 and above > > (i.e., ARM cores that have support for the 'rev' instruction). > > Otherwise, the compiler emits calls to libgcc's __bswapsi2 on > > these ARMv4/v5 builds (and arch ARM doesn't link with libgcc). > > Which compiler version? gcc 4.5.4 doesn't do this, except for the 16-bit > swap, so I doubt that any later compiler does. I've tried both gcc 4.6.3 [1] and 4.6.4 [2]. If you can point me to a 4.5.x, I'll try that, too, but as it stands now, if one moves the code added to swab.h below outside of its armv6 protection, gcc adds calls to __bswapsi2. > > --- a/arch/arm/include/uapi/asm/swab.h > > +++ b/arch/arm/include/uapi/asm/swab.h > > @@ -50,4 +50,14 @@ static inline __attribute_const__ __u32 __arch_swab32(__u32 x) > > > > #endif > > > > +#if defined(__KERNEL__) && __LINUX_ARM_ARCH__ >= 6 > > +#if GCC_VERSION >= 40600 > > +#define __HAVE_BUILTIN_BSWAP32__ > > +#define __HAVE_BUILTIN_BSWAP64__ > > +#endif > > +#if GCC_VERSION >= 40800 > > +#define __HAVE_BUILTIN_BSWAP16__ > > +#endif > > +#endif > > If this is __KERNEL__ only, it should not be in a uapi header. UAPI > headers get exported to userland, this is not userland interface code. > IT should be in arch/arm/include/asm/swab.h right, I've fixed this and Boris' remove the help text comment, and made a v3: >From 18c86580efba42d2680f2947867722705292f80a Mon Sep 17 00:00:00 2001 From: Kim Phillips <kim.phillips@freescale.com> Date: Mon, 28 Jan 2013 19:30:33 -0600 Subject: [PATCH] arm: use built-in byte swap function Enable the compiler intrinsic for byte swapping on arch ARM. This allows the compiler to detect and be able to optimize out byte swappings, e.g. in big endian to big endian moves. A ARCH_DEFINES_BUILTIN_BSWAP is added to allow an ARCH to select it when it wants to control HAVE_BUILTIN_BSWAPxx definitions over those in the generic compiler headers. It can be dependent on a combination of byte swapping instruction availability, the instruction set version, and the state of support in different compiler versions. AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6, and for the 16-bit version in 4.8. This has a tiny benefit on vmlinux text size (gcc 4.6.4): multi_v7_defconfig: text data bss dec hex filename 3135208 188396 203344 3526948 35d124 vmlinux multi_v7_defconfig with builtin_bswap: text data bss dec hex filename 3135112 188396 203344 3526852 35d0c4 vmlinux exynos_defconfig: text data bss dec hex filename 4286605 360564 223172 4870341 4a50c5 vmlinux exynos_defconfig with builtin_bswap: text data bss dec hex filename 4286405 360564 223172 4870141 4a4ffd vmlinux The savings come mostly from device-tree related code, and some from drivers. Signed-off-by: Kim Phillips <kim.phillips@freescale.com> --- akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016 based on linux-next-20130128. Depends on commit "compiler-gcc{3,4}.h: Use GCC_VERSION macro" by Daniel Santos <daniel.santos@pobox.com>, currently in the akpm branch. v3: - moved out of uapi swab.h into arch/arm/include/asm/swab.h - moved ARCH_DEFINES_BUILTIN_BSWAP help text into commit message - moved GCC_VERSION >= 40800 ifdef into GCC_VERSION >= 40600 block v2: - at91 and lpd270 builds fixed by limiting to ARMv6 and above (i.e., ARM cores that have support for the 'rev' instruction). Otherwise, the compiler emits calls to libgcc's __bswapsi2 on these ARMv4/v5 builds (and arch ARM doesn't link with libgcc). All ARM defconfigs now have the same build status as they did without this patch (some are broken on linux-next). - move ARM check from generic compiler.h to arch ARM's swab.h. - pretty sure it should be limited to __KERNEL__ builds - add new ARCH_DEFINES_BUILTIN_BSWAP (see Kconfig help). - if set, generic compiler header does not set HAVE_BUILTIN_BSWAPxx - not too sure about this having to be a new CONFIG_, but it's hard to find a place for it given linux/compiler.h doesn't include any arch-specific files. - move new selects to end of CONFIG_ARM's Kconfig select list, as is done in David Woodhouse's original patchseries for ppc/x86. arch/Kconfig | 4 ++++ arch/arm/Kconfig | 2 ++ arch/arm/include/asm/swab.h | 8 ++++++++ include/linux/compiler-gcc4.h | 3 ++- 4 files changed, 16 insertions(+), 1 deletion(-) diff --git a/arch/Kconfig b/arch/Kconfig index 40e2b12..bc5ed77 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -141,6 +141,10 @@ config ARCH_USE_BUILTIN_BSWAP instructions should set this. And it shouldn't hurt to set it on architectures that don't have such instructions. +config ARCH_DEFINES_BUILTIN_BSWAP + depends on ARCH_USE_BUILTIN_BSWAP + bool + config HAVE_SYSCALL_WRAPPERS bool diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 73027aa..b5868c2 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -57,6 +57,8 @@ config ARM select CLONE_BACKWARDS select OLD_SIGSUSPEND3 select OLD_SIGACTION + select ARCH_USE_BUILTIN_BSWAP + select ARCH_DEFINES_BUILTIN_BSWAP help The ARM series is a line of low-power-consumption RISC chip designs licensed by ARM Ltd and targeted at embedded applications and diff --git a/arch/arm/include/asm/swab.h b/arch/arm/include/asm/swab.h index 537fc9b..e56acff 100644 --- a/arch/arm/include/asm/swab.h +++ b/arch/arm/include/asm/swab.h @@ -34,5 +34,13 @@ static inline __attribute_const__ __u32 __arch_swab32(__u32 x) } #define __arch_swab32 __arch_swab32 +#if GCC_VERSION >= 40600 +#define __HAVE_BUILTIN_BSWAP32__ +#define __HAVE_BUILTIN_BSWAP64__ +#if GCC_VERSION >= 40800 +#define __HAVE_BUILTIN_BSWAP16__ +#endif +#endif + #endif #endif diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h index 68b162d..fce39cb 100644 --- a/include/linux/compiler-gcc4.h +++ b/include/linux/compiler-gcc4.h @@ -66,7 +66,8 @@ #endif -#ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP +#if defined(CONFIG_ARCH_USE_BUILTIN_BSWAP) && \ + !defined(CONFIG_ARCH_DEFINES_BUILTIN_BSWAP) #if GCC_VERSION >= 40400 #define __HAVE_BUILTIN_BSWAP32__ #define __HAVE_BUILTIN_BSWAP64__ -- 1.7.9.7 Thanks, Kim [1] http://kernel.org/pub/tools/crosstool/files/bin/x86_64/ [2] http://ftp.denx.de/pub/eldk/5.2.1/targets/armv7a/ ^ permalink raw reply related [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-01-31 20:59 ` Kim Phillips @ 2013-01-31 21:33 ` Borislav Petkov 2013-01-31 22:11 ` Woodhouse, David 2013-02-01 1:17 ` [RFC] " Russell King - ARM Linux 2 siblings, 0 replies; 62+ messages in thread From: Borislav Petkov @ 2013-01-31 21:33 UTC (permalink / raw) To: Kim Phillips Cc: Russell King - ARM Linux, Woodhouse, David, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Thu, Jan 31, 2013 at 02:59:47PM -0600, Kim Phillips wrote: > - add new ARCH_DEFINES_BUILTIN_BSWAP (see Kconfig help). > - if set, generic compiler header does not set HAVE_BUILTIN_BSWAPxx > - not too sure about this having to be a new CONFIG_, but it's hard > to find a place for it given linux/compiler.h doesn't include any > arch-specific files. Yeah, me neither. It seems to me the whole deal can be simplified even further without introducing CONFIG_ARCH_DEFINES_BUILTIN_BSWAP. And, we don't even want to use CONFIG_ARCH_USE_BUILTIN_BSWAP on arm due to different compiler versions supporting it (correct me if I'm wrong here) vs the generic thing in include/linux/compiler-gcc4.h which we want off. If so, you'd need to simply put the following from below in arch/arm/include/asm/swab.h #if GCC_VERSION >= 40600 #define __HAVE_BUILTIN_BSWAP32__ #define __HAVE_BUILTIN_BSWAP64__ #if GCC_VERSION >= 40800 #define __HAVE_BUILTIN_BSWAP16__ #endif /* GCC_VERSION >= 40800 */ #endif /* GCC_VERSION >= 40600 */ and that's it. Makes sense or am I over-simplifying this? Thanks. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-01-31 20:59 ` Kim Phillips 2013-01-31 21:33 ` Borislav Petkov @ 2013-01-31 22:11 ` Woodhouse, David 2013-02-01 0:37 ` [PATCH v4] " Kim Phillips 2013-02-01 1:17 ` [RFC] " Russell King - ARM Linux 2 siblings, 1 reply; 62+ messages in thread From: Woodhouse, David @ 2013-01-31 22:11 UTC (permalink / raw) To: Kim Phillips Cc: Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring [-- Attachment #1: Type: text/plain, Size: 797 bytes --] On Thu, 2013-01-31 at 14:59 -0600, Kim Phillips wrote: > > - add new ARCH_DEFINES_BUILTIN_BSWAP (see Kconfig help). Ick, no. > - if set, generic compiler header does not set HAVE_BUILTIN_BSWAPxx It won't do that anyway if !ARCH_USE_BUILTIN_BSWAP. I don't see the point in adding a new config option just for this. If you want to define __HAVE_BUILTIN_BSWAPxx__ for yourself manually, just go ahead and do so. As I said, if lots of architectures end up doing it then we'll worry about cleaning things up when we've got a better picture of who needs what. -- Sent with MeeGo's ActiveSync support. David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 4370 bytes --] ^ permalink raw reply [flat|nested] 62+ messages in thread
* [PATCH v4] arm: use built-in byte swap function 2013-01-31 22:11 ` Woodhouse, David @ 2013-02-01 0:37 ` Kim Phillips 2013-02-01 10:46 ` Russell King - ARM Linux 0 siblings, 1 reply; 62+ messages in thread From: Kim Phillips @ 2013-02-01 0:37 UTC (permalink / raw) To: Woodhouse, David Cc: Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring >From 1490bd8823c05e0dda982524bb70cb6c6427ddf9 Mon Sep 17 00:00:00 2001 From: Kim Phillips <kim.phillips@freescale.com> Date: Mon, 28 Jan 2013 19:30:33 -0600 Subject: [PATCH] arm: use built-in byte swap function Enable the compiler intrinsic for byte swapping on arch ARM. This allows the compiler to detect and be able to optimize out byte swappings, e.g. in big endian to big endian moves. A ARCH_DEFINES_BUILTIN_BSWAP is added to allow an ARCH to select it when it wants to control HAVE_BUILTIN_BSWAPxx definitions over those in the generic compiler headers. It can be dependent on a combination of byte swapping instruction availability, the instruction set version, and the state of support in different compiler versions. AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6, and for the 16-bit version in 4.8. This has a tiny benefit on vmlinux text size (gcc 4.6.4): multi_v7_defconfig: text data bss dec hex filename 3135208 188396 203344 3526948 35d124 vmlinux multi_v7_defconfig with builtin_bswap: text data bss dec hex filename 3135112 188396 203344 3526852 35d0c4 vmlinux exynos_defconfig: text data bss dec hex filename 4286605 360564 223172 4870341 4a50c5 vmlinux exynos_defconfig with builtin_bswap: text data bss dec hex filename 4286405 360564 223172 4870141 4a4ffd vmlinux The savings come mostly from device-tree related code, and some from drivers. Signed-off-by: Kim Phillips <kim.phillips@freescale.com> --- akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016 based on linux-next-20130128. Depends on commit "compiler-gcc{3,4}.h: Use GCC_VERSION macro" by Daniel Santos <daniel.santos@pobox.com>, currently in the akpm branch. v4: - undo v2-3's addition of ARCH_DEFINES_BUILTIN_BSWAP per Boris and David - patch is much less intrusive :) v3: - moved out of uapi swab.h into arch/arm/include/asm/swab.h - moved ARCH_DEFINES_BUILTIN_BSWAP help text into commit message - moved GCC_VERSION >= 40800 ifdef into GCC_VERSION >= 40600 block v2: - at91 and lpd270 builds fixed by limiting to ARMv6 and above (i.e., ARM cores that have support for the 'rev' instruction). Otherwise, the compiler emits calls to libgcc's __bswapsi2 on these ARMv4/v5 builds (and arch ARM doesn't link with libgcc). All ARM defconfigs now have the same build status as they did without this patch (some are broken on linux-next). - move ARM check from generic compiler.h to arch ARM's swab.h. - pretty sure it should be limited to __KERNEL__ builds - add new ARCH_DEFINES_BUILTIN_BSWAP (see Kconfig help). - if set, generic compiler header does not set HAVE_BUILTIN_BSWAPxx - not too sure about this having to be a new CONFIG_, but it's hard to find a place for it given linux/compiler.h doesn't include any arch-specific files. - move new selects to end of CONFIG_ARM's Kconfig select list, as is done in David Woodhouse's original patchseries for ppc/x86. arch/arm/include/asm/swab.h | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/arch/arm/include/asm/swab.h b/arch/arm/include/asm/swab.h index 537fc9b..e56acff 100644 --- a/arch/arm/include/asm/swab.h +++ b/arch/arm/include/asm/swab.h @@ -34,5 +34,13 @@ static inline __attribute_const__ __u32 __arch_swab32(__u32 x) } #define __arch_swab32 __arch_swab32 +#if GCC_VERSION >= 40600 +#define __HAVE_BUILTIN_BSWAP32__ +#define __HAVE_BUILTIN_BSWAP64__ +#if GCC_VERSION >= 40800 +#define __HAVE_BUILTIN_BSWAP16__ +#endif +#endif + #endif #endif -- 1.7.9.7 ^ permalink raw reply related [flat|nested] 62+ messages in thread
* Re: [PATCH v4] arm: use built-in byte swap function 2013-02-01 0:37 ` [PATCH v4] " Kim Phillips @ 2013-02-01 10:46 ` Russell King - ARM Linux 0 siblings, 0 replies; 62+ messages in thread From: Russell King - ARM Linux @ 2013-02-01 10:46 UTC (permalink / raw) To: Kim Phillips Cc: Woodhouse, David, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Thu, Jan 31, 2013 at 06:37:47PM -0600, Kim Phillips wrote: > +#if GCC_VERSION >= 40600 > +#define __HAVE_BUILTIN_BSWAP32__ > +#define __HAVE_BUILTIN_BSWAP64__ You really aren't listening to anything that's been said to you. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-01-31 20:59 ` Kim Phillips 2013-01-31 21:33 ` Borislav Petkov 2013-01-31 22:11 ` Woodhouse, David @ 2013-02-01 1:17 ` Russell King - ARM Linux 2013-02-01 7:33 ` Woodhouse, David 2 siblings, 1 reply; 62+ messages in thread From: Russell King - ARM Linux @ 2013-02-01 1:17 UTC (permalink / raw) To: Kim Phillips Cc: Woodhouse, David, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Thu, Jan 31, 2013 at 02:59:47PM -0600, Kim Phillips wrote: > On Thu, 31 Jan 2013 09:28:01 +0000 > Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > > > On Wed, Jan 30, 2013 at 08:09:00PM -0600, Kim Phillips wrote: > > > v2: > > > - at91 and lpd270 builds fixed by limiting to ARMv6 and above > > > (i.e., ARM cores that have support for the 'rev' instruction). > > > Otherwise, the compiler emits calls to libgcc's __bswapsi2 on > > > these ARMv4/v5 builds (and arch ARM doesn't link with libgcc). > > > > Which compiler version? gcc 4.5.4 doesn't do this, except for the 16-bit > > swap, so I doubt that any later compiler does. > > I've tried both gcc 4.6.3 [1] and 4.6.4 [2]. If you can point me to > a 4.5.x, I'll try that, too, but as it stands now, if one moves the > code added to swab.h below outside of its armv6 protection, > gcc adds calls to __bswapsi2. Take a look at the message I sent on the 29th towards the beginning of this thread for details of gcc 4.5.4 behaviour. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-01 1:17 ` [RFC] " Russell King - ARM Linux @ 2013-02-01 7:33 ` Woodhouse, David 2013-02-06 3:04 ` Kim Phillips 0 siblings, 1 reply; 62+ messages in thread From: Woodhouse, David @ 2013-02-01 7:33 UTC (permalink / raw) To: Russell King - ARM Linux Cc: Kim Phillips, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring [-- Attachment #1: Type: text/plain, Size: 1040 bytes --] On Fri, 2013-02-01 at 01:17 +0000, Russell King - ARM Linux wrote: > > > I've tried both gcc 4.6.3 [1] and 4.6.4 [2]. If you can point me to > > a 4.5.x, I'll try that, too, but as it stands now, if one moves the > > code added to swab.h below outside of its armv6 protection, > > gcc adds calls to __bswapsi2. > > Take a look at the message I sent on the 29th towards the beginning of > this thread for details of gcc 4.5.4 behaviour. I'd like to see a comment (with PR# if appropriate) explaining clearly *why* it isn't enabled for <ARMv6 even with a bleeding-edge compiler. Russell's test also seemed to indicate that the 32-bit and 64-bit swap support was present and functional in GCC 4.5.4 (as indeed it should have been since 4.4), so I'm still not quite sure why you require 4.6 for that. -- Sent with MeeGo's ActiveSync support. David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 4370 bytes --] ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-01 7:33 ` Woodhouse, David @ 2013-02-06 3:04 ` Kim Phillips 2013-02-06 9:02 ` Woodhouse, David 0 siblings, 1 reply; 62+ messages in thread From: Kim Phillips @ 2013-02-06 3:04 UTC (permalink / raw) To: Woodhouse, David Cc: Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Fri, 1 Feb 2013 07:33:17 +0000 "Woodhouse, David" <david.woodhouse@intel.com> wrote: > On Fri, 2013-02-01 at 01:17 +0000, Russell King - ARM Linux wrote: > > > > > I've tried both gcc 4.6.3 [1] and 4.6.4 [2]. If you can point me to > > > a 4.5.x, I'll try that, too, but as it stands now, if one moves the > > > code added to swab.h below outside of its armv6 protection, > > > gcc adds calls to __bswapsi2. > > > > Take a look at the message I sent on the 29th towards the beginning of > > this thread for details of gcc 4.5.4 behaviour. > > I'd like to see a comment (with PR# if appropriate) explaining clearly > *why* it isn't enabled for <ARMv6 even with a bleeding-edge compiler. ok I think I've figured it out: the difference in the defconfigs that fail (at91_dt, at91sam9g45, and lpc32xx) is that they are ARM9's (armv4/5), have CC_OPTIMIZE_FOR_SIZE set, and have code with multiple swaps ready for space optimization: gcc -Os emits calls to __bswapsi2 on those platforms to save space because they don't have the single rev byte swap instruction. > Russell's test also seemed to indicate that the 32-bit and 64-bit swap > support was present and functional in GCC 4.5.4 (as indeed it should > have been since 4.4), so I'm still not quite sure why you require 4.6 > for that. initially it was based at looking at gcc commit history for the 'rev' instruction implementation, but now I've got 4.4, 4.5, 4.6 and 4.7 compilers to perform Russell's test: $ for cc in 4.4 4.5 4.6 4.7; do \ arm-linux-gnueabi-gcc-$cc --version | grep gcc ; \ for a in armv3 armv4 armv4t armv5t armv5te armv6k armv6 armv7-a; do \ echo -n $a:; \ for f in 16 32 64; do \ echo 'unsigned foo(unsigned val) { return __builtin_bswap'$f'(val); }' | arm-linux-gnueabi-gcc-$cc -w -x c -S -o - - -march=$a | grep 'bl'; \ done; \ done; \ done whose output is: arm-linux-gnueabi-gcc-4.4 (Ubuntu/Linaro 4.4.7-1ubuntu2) 4.4.7 armv3: bl __builtin_bswap16 bl __bswapsi2 bl __bswapdi2 armv4: bl __builtin_bswap16 bl __bswapsi2 bl __bswapdi2 armv4t: bl __builtin_bswap16 bl __bswapsi2 bl __bswapdi2 armv5t: bl __builtin_bswap16 bl __bswapsi2 bl __bswapdi2 armv5te: bl __builtin_bswap16 bl __bswapsi2 bl __bswapdi2 armv6k: bl __builtin_bswap16 bl __bswapsi2 bl __bswapdi2 armv6: bl __builtin_bswap16 bl __bswapsi2 bl __bswapdi2 armv7-a: bl __builtin_bswap16 bl __bswapsi2 bl __bswapdi2 arm-linux-gnueabi-gcc-4.5 (Ubuntu/Linaro 4.5.3-12ubuntu2) 4.5.3 armv3: bl __builtin_bswap16 armv4: bl __builtin_bswap16 armv4t: bl __builtin_bswap16 armv5t: bl __builtin_bswap16 armv5te: bl __builtin_bswap16 armv6k: bl __builtin_bswap16 armv6: bl __builtin_bswap16 armv7-a: bl __builtin_bswap16 arm-linux-gnueabi-gcc-4.6 (Ubuntu/Linaro 4.6.3-8ubuntu1) 4.6.3 20120624 (prerelease) armv3: bl __builtin_bswap16 armv4: bl __builtin_bswap16 armv4t: bl __builtin_bswap16 armv5t: bl __builtin_bswap16 armv5te: bl __builtin_bswap16 armv6k: bl __builtin_bswap16 armv6: bl __builtin_bswap16 armv7-a: bl __builtin_bswap16 arm-linux-gnueabi-gcc-4.7 (Ubuntu/Linaro 4.7.2-1ubuntu1) 4.7.2 armv3: bl __builtin_bswap16 armv4: bl __builtin_bswap16 armv4t: bl __builtin_bswap16 armv5t: bl __builtin_bswap16 armv5te: bl __builtin_bswap16 armv6k: bl __builtin_bswap16 armv6: bl __builtin_bswap16 armv7-a: bl __builtin_bswap16 So 4.4 should be exempt from using the built-ins because it always emits __bswapsi2 calls: it doesn't matter whether or not -Os or -O2 are added as options in the test. gcc 4.5, 4.6, and 4.7 all support 32 & 64-bit versions, so we should check for gcc >= 4.5 instead of gcc >= 4.6. I've added a new check for !CC_OPTIMIZE_FOR_SIZE and build-tested all defconfigs with gcc 4.6.3 - here's v5: >From 11aa942a84fe94d204424a19b6b13fdb2b359ee6 Mon Sep 17 00:00:00 2001 From: Kim Phillips <kim.phillips@freescale.com> Date: Mon, 28 Jan 2013 19:30:33 -0600 Subject: [PATCH] arm: use built-in byte swap function Enable the compiler intrinsic for byte swapping on arch ARM. This allows the compiler to detect and be able to optimize out byte swappings. __builtin_bswap{32,64} support was added in gcc 4.4, but until gcc 4.5, it emitted calls to libgcc's __bswap[sd]i2 (even with -O2). All gcc versions tested (4.[4567]) emit calls to __bswap[sd]i2 when optimizing for size, so we add the !CC_OPTIMIZE_FOR_SIZE check. Support for 16-bit built-ins will be in gcc version 4.8. This has a tiny benefit on vmlinux text size (gcc 4.6.4): multi_v7_defconfig: text data bss dec hex filename 3135208 188396 203344 3526948 35d124 vmlinux multi_v7_defconfig with builtin_bswap: text data bss dec hex filename 3135112 188396 203344 3526852 35d0c4 vmlinux exynos_defconfig: text data bss dec hex filename 4286605 360564 223172 4870341 4a50c5 vmlinux exynos_defconfig with builtin_bswap: text data bss dec hex filename 4286405 360564 223172 4870141 4a4ffd vmlinux The savings come mostly from device-tree related code, and some from drivers. Signed-off-by: Kim Phillips <kim.phillips@freescale.com> --- akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016 based on linux-next-20130128. Depends on commit "compiler-gcc{3,4}.h: Use GCC_VERSION macro" by Daniel Santos <daniel.santos@pobox.com>, currently in the akpm branch. v5: re-work based on new gcc version test data: - moved outside armv6 protection - check for gcc 4.6+ demoted to gcc 4.5+ with: !defined(CONFIG_CC_OPTIMIZE_FOR_SIZE) v4: - undo v2-2's addition of ARCH_DEFINES_BUILTIN_BSWAP per Boris and David - object is to find arches that define _HAVE_BSWAP and clean it up in the future: patch is much less intrusive. :) v3: - moved out of uapi swab.h into arch/arm/include/asm/swab.h - moved ARCH_DEFINES_BUILTIN_BSWAP help text into commit message - moved GCC_VERSION >= 40800 ifdef into GCC_VERSION >= 40600 block v2: - at91 and lpd270 builds fixed by limiting to ARMv6 and above (i.e., ARM cores that have support for the 'rev' instruction). Otherwise, the compiler emits calls to libgcc's __bswapsi2 on these ARMv4/v5 builds (and arch ARM doesn't link with libgcc). All ARM defconfigs now have the same build status as they did without this patch (some are broken on linux-next). - move ARM check from generic compiler.h to arch ARM's swab.h. - pretty sure it should be limited to __KERNEL__ builds - add new ARCH_DEFINES_BUILTIN_BSWAP (see Kconfig help). - if set, generic compiler header does not set HAVE_BUILTIN_BSWAPxx - not too sure about this having to be a new CONFIG_, but it's hard to find a place for it given linux/compiler.h doesn't include any arch-specific files. - move new selects to end of CONFIG_ARM's Kconfig select list, as is done in David Woodhouse's original patchseries for ppc/x86. arch/arm/include/asm/swab.h | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/arch/arm/include/asm/swab.h b/arch/arm/include/asm/swab.h index 537fc9b..159ab16 100644 --- a/arch/arm/include/asm/swab.h +++ b/arch/arm/include/asm/swab.h @@ -35,4 +35,13 @@ static inline __attribute_const__ __u32 __arch_swab32(__u32 x) #define __arch_swab32 __arch_swab32 #endif + +#if !defined(CONFIG_CC_OPTIMIZE_FOR_SIZE) && GCC_VERSION >= 40500 +#define __HAVE_BUILTIN_BSWAP32__ +#define __HAVE_BUILTIN_BSWAP64__ +#if GCC_VERSION >= 40800 +#define __HAVE_BUILTIN_BSWAP16__ +#endif +#endif + #endif -- 1.7.9.7 Thanks, Kim [1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44392 [2] http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#Other-Builtins ^ permalink raw reply related [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-06 3:04 ` Kim Phillips @ 2013-02-06 9:02 ` Woodhouse, David 2013-02-07 1:19 ` Kim Phillips 0 siblings, 1 reply; 62+ messages in thread From: Woodhouse, David @ 2013-02-06 9:02 UTC (permalink / raw) To: Kim Phillips Cc: Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring [-- Attachment #1: Type: text/plain, Size: 689 bytes --] On Tue, 2013-02-05 at 21:04 -0600, Kim Phillips wrote: > gcc -Os emits calls to __bswapsi2 on those platforms to save space > because they don't have the single rev byte swap instruction. Is that the right thing for GCC to do in that situation? If so, perhaps we should be *providing* __bswap[sd]i2 functions for it to use? If not, perhaps there should be a PR filed? Or is our use case justifiably different to the general case of '-Os'? If so, why? -- Sent with MeeGo's ActiveSync support. David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 4370 bytes --] ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-06 9:02 ` Woodhouse, David @ 2013-02-07 1:19 ` Kim Phillips 2013-02-07 10:19 ` Will Newton 2013-02-07 18:13 ` Russell King - ARM Linux 0 siblings, 2 replies; 62+ messages in thread From: Kim Phillips @ 2013-02-07 1:19 UTC (permalink / raw) To: Woodhouse, David Cc: Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Wed, 6 Feb 2013 09:02:04 +0000 "Woodhouse, David" <david.woodhouse@intel.com> wrote: > On Tue, 2013-02-05 at 21:04 -0600, Kim Phillips wrote: > > gcc -Os emits calls to __bswapsi2 on those platforms to save space > > because they don't have the single rev byte swap instruction. > > Is that the right thing for GCC to do in that situation? if it saves space, why wouldn't it be? "Many of these functions are only optimized in certain cases; if they are not optimized in a particular case, a call to the library function is emitted." [1] I see "(arm_arch6 || !optimize_size)" in gcc's define_expand "bswapsi2" source, so GCC considers size optimization as a legitimate one of those cases. > If so, perhaps we should be *providing* __bswap[sd]i2 functions for it > to use? either that, or link with libgcc - why does arch/arm64 do this and arch/arm not? It's not obvious from git log. > If not, perhaps there should be a PR filed? > > Or is our use case justifiably different to the general case of '-Os'? > If so, why? shouldn't be - a patch, such as this, that claims to reduce code size, and that only turns on the new built-in when CC_OPTIMIZE_FOR_SIZE is off, is generally not good :) OTOH, the target here is armv6+ performance - not armv4,5 code density - the OPTIMIZE_FOR_SIZE protection prevents armv4,5 build breakage. Kim [1] http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#Other-Builtins ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-07 1:19 ` Kim Phillips @ 2013-02-07 10:19 ` Will Newton 2013-02-07 10:43 ` Catalin Marinas 2013-02-07 18:13 ` Russell King - ARM Linux 1 sibling, 1 reply; 62+ messages in thread From: Will Newton @ 2013-02-07 10:19 UTC (permalink / raw) To: Kim Phillips Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Thu, Feb 7, 2013 at 1:19 AM, Kim Phillips <kim.phillips@freescale.com> wrote: > On Wed, 6 Feb 2013 09:02:04 +0000 > "Woodhouse, David" <david.woodhouse@intel.com> wrote: > >> On Tue, 2013-02-05 at 21:04 -0600, Kim Phillips wrote: >> > gcc -Os emits calls to __bswapsi2 on those platforms to save space >> > because they don't have the single rev byte swap instruction. >> >> Is that the right thing for GCC to do in that situation? > > if it saves space, why wouldn't it be? > > "Many of these functions are only optimized in certain cases; if they > are not optimized in a particular case, a call to the library > function is emitted." [1] > > I see "(arm_arch6 || !optimize_size)" in gcc's define_expand > "bswapsi2" source, so GCC considers size optimization as a > legitimate one of those cases. > >> If so, perhaps we should be *providing* __bswap[sd]i2 functions for it >> to use? > > either that, or link with libgcc - why does arch/arm64 do this and > arch/arm not? It's not obvious from git log. One reason I have found, I don't know if it is the canonical one, is that linking with libgcc allows people to use all intrinsics e.g. soft float routines in the kernel without noticing it. If you limit the intrinsics to the ones linked into the kernel explicitly then this cannot happen. I have also seen cases where the libgcc intrinsics are improved over time, having the code in the kernel allows these improvements to be rolled into the kernel even if the user has an older toolchain. A number of ports link against libgcc and a roughly equal number do not, so it isn't clear that there's any consensus on the issue. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-07 10:19 ` Will Newton @ 2013-02-07 10:43 ` Catalin Marinas 0 siblings, 0 replies; 62+ messages in thread From: Catalin Marinas @ 2013-02-07 10:43 UTC (permalink / raw) To: Will Newton Cc: Kim Phillips, Woodhouse, David, Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On 7 February 2013 10:19, Will Newton <will.newton@gmail.com> wrote: > On Thu, Feb 7, 2013 at 1:19 AM, Kim Phillips <kim.phillips@freescale.com> wrote: >> On Wed, 6 Feb 2013 09:02:04 +0000 >> "Woodhouse, David" <david.woodhouse@intel.com> wrote: >> >>> On Tue, 2013-02-05 at 21:04 -0600, Kim Phillips wrote: >>> > gcc -Os emits calls to __bswapsi2 on those platforms to save space >>> > because they don't have the single rev byte swap instruction. >>> >>> Is that the right thing for GCC to do in that situation? >> >> if it saves space, why wouldn't it be? >> >> "Many of these functions are only optimized in certain cases; if they >> are not optimized in a particular case, a call to the library >> function is emitted." [1] >> >> I see "(arm_arch6 || !optimize_size)" in gcc's define_expand >> "bswapsi2" source, so GCC considers size optimization as a >> legitimate one of those cases. >> >>> If so, perhaps we should be *providing* __bswap[sd]i2 functions for it >>> to use? >> >> either that, or link with libgcc - why does arch/arm64 do this and >> arch/arm not? It's not obvious from git log. > > One reason I have found, I don't know if it is the canonical one, is > that linking with libgcc allows people to use all intrinsics e.g. soft > float routines in the kernel without noticing it. If you limit the > intrinsics to the ones linked into the kernel explicitly then this > cannot happen. For arm64 we explicitly pass -mgeneral-regs-only to avoid any floating point generation. Soft-float is excluded by the ABI automatically. But we use other compiler intrinsics like __ffs and while they are currently generated inline, you can't guarantee, hence the linking with libgcc. > I have also seen cases where the libgcc intrinsics are improved over > time, having the code in the kernel allows these improvements to be > rolled into the kernel even if the user has an older toolchain. Indeed, the gcc guys do a lot benchmarking/optimisations on a wide range of processors, so we can take advantage of that in the kernel. But it's much easier on arm64 since the architecture is stable. On 32-bit arm we have to cope with a range of architecture versions with variations to the instruction set. -- Catalin ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-07 1:19 ` Kim Phillips 2013-02-07 10:19 ` Will Newton @ 2013-02-07 18:13 ` Russell King - ARM Linux 2013-02-08 17:25 ` Woodhouse, David 1 sibling, 1 reply; 62+ messages in thread From: Russell King - ARM Linux @ 2013-02-07 18:13 UTC (permalink / raw) To: Kim Phillips Cc: Woodhouse, David, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Wed, Feb 06, 2013 at 07:19:05PM -0600, Kim Phillips wrote: > either that, or link with libgcc - why does arch/arm64 do this and > arch/arm not? It's not obvious from git log. We want to be in control of what code the kernel runs, and that means not using libgcc. libgcc can do all sorts of stuff because it makes assumptions about the environment which it is executing in (glibc). For example, it assumes there may be a GOT, and that it can issue SWI calls... However, the biggest reason not to use libgcc is that we want to control what gets used in the kernel - for example, no floating point, and no use of 64 x 64bit division. So all round, using libgcc in AArch32 would bring us a number of issues that we don't get with the current approach. AArch64 uses it because it's maintained entirely separately and the decision has been made by other people - and some of these concerns do not exist on a 64-bit architecture. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-07 18:13 ` Russell King - ARM Linux @ 2013-02-08 17:25 ` Woodhouse, David 2013-02-08 20:04 ` Nicolas Pitre 0 siblings, 1 reply; 62+ messages in thread From: Woodhouse, David @ 2013-02-08 17:25 UTC (permalink / raw) To: Russell King - ARM Linux Cc: Kim Phillips, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring [-- Attachment #1: Type: text/plain, Size: 754 bytes --] On Thu, 2013-02-07 at 18:13 +0000, Russell King - ARM Linux wrote: > > However, the biggest reason not to use libgcc is that we want to control > what gets used in the kernel - for example, no floating point, and no > use of 64 x 64bit division. Which is all very sensible. But there's no particular reason we couldn't add a __bswap[sd]i2 to the kernel's version of libgcc if we wanted to. If we're compiling with -Os on platforms without 'rev' and that's the thing that makes most sense, then we should make it work. -- Sent with MeeGo's ActiveSync support. David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 4370 bytes --] ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-08 17:25 ` Woodhouse, David @ 2013-02-08 20:04 ` Nicolas Pitre 2013-02-08 22:40 ` Woodhouse, David 0 siblings, 1 reply; 62+ messages in thread From: Nicolas Pitre @ 2013-02-08 20:04 UTC (permalink / raw) To: Woodhouse, David Cc: Russell King - ARM Linux, Kim Phillips, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Fri, 8 Feb 2013, Woodhouse, David wrote: > On Thu, 2013-02-07 at 18:13 +0000, Russell King - ARM Linux wrote: > > > > However, the biggest reason not to use libgcc is that we want to control > > what gets used in the kernel - for example, no floating point, and no > > use of 64 x 64bit division. > > Which is all very sensible. But there's no particular reason we couldn't > add a __bswap[sd]i2 to the kernel's version of libgcc if we wanted to. Absolutely. Nicolas ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-08 20:04 ` Nicolas Pitre @ 2013-02-08 22:40 ` Woodhouse, David 2013-02-08 22:47 ` Nicolas Pitre 0 siblings, 1 reply; 62+ messages in thread From: Woodhouse, David @ 2013-02-08 22:40 UTC (permalink / raw) To: Nicolas Pitre Cc: Russell King - ARM Linux, Kim Phillips, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring [-- Attachment #1: Type: text/plain, Size: 869 bytes --] On Fri, 2013-02-08 at 15:04 -0500, Nicolas Pitre wrote: > On Fri, 8 Feb 2013, Woodhouse, David wrote: > > > On Thu, 2013-02-07 at 18:13 +0000, Russell King - ARM Linux wrote: > > > > > > However, the biggest reason not to use libgcc is that we want to control > > > what gets used in the kernel - for example, no floating point, and no > > > use of 64 x 64bit division. > > > > Which is all very sensible. But there's no particular reason we couldn't > > add a __bswap[sd]i2 to the kernel's version of libgcc if we wanted to. > > Absolutely. And then ARM can just set ARCH_USE_BUILTIN_BSWAP like other architectures do, right? -- Sent with MeeGo's ActiveSync support. David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 4370 bytes --] ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-08 22:40 ` Woodhouse, David @ 2013-02-08 22:47 ` Nicolas Pitre 2013-02-09 1:12 ` Kim Phillips 0 siblings, 1 reply; 62+ messages in thread From: Nicolas Pitre @ 2013-02-08 22:47 UTC (permalink / raw) To: Woodhouse, David Cc: Russell King - ARM Linux, Kim Phillips, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Fri, 8 Feb 2013, Woodhouse, David wrote: > On Fri, 2013-02-08 at 15:04 -0500, Nicolas Pitre wrote: > > On Fri, 8 Feb 2013, Woodhouse, David wrote: > > > > > On Thu, 2013-02-07 at 18:13 +0000, Russell King - ARM Linux wrote: > > > > > > > > However, the biggest reason not to use libgcc is that we want to control > > > > what gets used in the kernel - for example, no floating point, and no > > > > use of 64 x 64bit division. > > > > > > Which is all very sensible. But there's no particular reason we couldn't > > > add a __bswap[sd]i2 to the kernel's version of libgcc if we wanted to. > > > > Absolutely. > > And then ARM can just set ARCH_USE_BUILTIN_BSWAP like other > architectures do, right? If that turns out to be beneficial over what we have now, then yes. I didn't read back the whole thread to form an opinion though. Nicolas ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-08 22:47 ` Nicolas Pitre @ 2013-02-09 1:12 ` Kim Phillips 2013-02-09 3:16 ` Nicolas Pitre 0 siblings, 1 reply; 62+ messages in thread From: Kim Phillips @ 2013-02-09 1:12 UTC (permalink / raw) To: Nicolas Pitre Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Fri, 8 Feb 2013 17:47:33 -0500 Nicolas Pitre <nico@fluxnic.net> wrote: > On Fri, 8 Feb 2013, Woodhouse, David wrote: > > > On Fri, 2013-02-08 at 15:04 -0500, Nicolas Pitre wrote: > > > On Fri, 8 Feb 2013, Woodhouse, David wrote: > > > > > > > On Thu, 2013-02-07 at 18:13 +0000, Russell King - ARM Linux wrote: > > > > > > > > > > However, the biggest reason not to use libgcc is that we want to control > > > > > what gets used in the kernel - for example, no floating point, and no > > > > > use of 64 x 64bit division. > > > > > > > > Which is all very sensible. But there's no particular reason we couldn't > > > > add a __bswap[sd]i2 to the kernel's version of libgcc if we wanted to. > > > > > > Absolutely. > > > > And then ARM can just set ARCH_USE_BUILTIN_BSWAP like other > > architectures do, right? > > If that turns out to be beneficial over what we have now, then yes. > I didn't read back the whole thread to form an opinion though. The diff below implements __bswap[sd]i2 in arch/arm/lib, and results in the following savings in vmlinux size: column 1: name of defconfig column 2: text+data+bss, linux-next-20130207 vanilla, gcc 4.6.3 column 3: text+data+bss, linux-next-20130207+below diff, gcc 4.6.3 column 4: col. 3 - col. 2 (ie., -ve numbers represent savings) acs5k_tiny_defconfig: 2886048 2886016 -32 ag5evm_defconfig: 2767596 2767564 -32 am200epdkit_defconfig: 3283557 3283557 0 ap4evb_defconfig: 2059540 2059524 -16 armadillo800eva_defconfig: 4521440 4521200 -240 assabet_defconfig: 3562923 3562603 -320 at91_dt_defconfig: 4195020 4189164 -5856 at91rm9200_defconfig: 5804566 5803742 -824 at91sam9261_defconfig: 4726625 4725969 -656 at91sam9263_defconfig: 5115407 5114459 -948 at91sam9g20_defconfig: 4175306 4175138 -168 at91x40_defconfig: 1344264 1344256 -8 badge4_defconfig: 3359661 3359485 -176 bcm_defconfig: 3575901 3575757 -144 bonito_defconfig: 2221729 2221681 -48 cerfcube_defconfig: 3063610 3063370 -240 cns3420vb_defconfig: 2595667 2595651 -16 collie_defconfig: 3015148 3014972 -176 da8xx_omapl_defconfig: 4168517 4168165 -352 davinci_all_defconfig: 4060434 4060110 -324 dove_defconfig: 4940257 4939245 -1012 ebsa110_defconfig: 3236795 3236587 -208 ep93xx_defconfig: 4169466 4168986 -480 eseries_pxa_defconfig: 3090656 3090448 -208 exynos4_defconfig: 3008671 3008639 -32 ezx_defconfig: 10042035 10041835 -200 footbridge_defconfig: 3662497 3662369 -128 h3600_defconfig: 3471140 3470820 -320 h5000_defconfig: 2976046 2976030 -16 h7201_defconfig: 2158753 2158705 -48 h7202_defconfig: 3395888 3395520 -368 hackkit_defconfig: 3128690 3128578 -112 imote2_defconfig: 9916412 9916180 -232 imx_v4_v5_defconfig: 6866952 6866272 -680 imx_v6_v7_defconfig: 7672373 7667089 -5284 integrator_defconfig: 4224588 4224284 -304 iop13xx_defconfig: 5229604 5228468 -1136 iop32x_defconfig: 5616676 5615492 -1184 iop33x_defconfig: 4611219 4610835 -384 ixp4xx_defconfig: 4566094 4564238 -1856 jornada720_defconfig: 3140569 3140345 -224 kota2_defconfig: 3971280 3971280 0 kzm9d_defconfig: 3042784 3042784 0 kzm9g_defconfig: 4728939 4728907 -32 lart_defconfig: 2941150 2941054 -96 lpc32xx_defconfig: 5009215 5004619 -4596 mackerel_defconfig: 4687162 4686922 -240 mini2440_defconfig: 5095413 5095037 -376 msm_defconfig: 2920312 2920224 -88 multi_v7_defconfig: 3511744 3511712 -32 mvebu_defconfig: 3871035 3870931 -104 mxs_defconfig: 11091983 11095679 3696 netwinder_defconfig: 3814476 3814124 -352 nhk8815_defconfig: 4277138 4276778 -360 nuc910_defconfig: 2527988 2527956 -32 nuc950_defconfig: 2634352 2634312 -40 nuc960_defconfig: 2546592 2546552 -40 omap2plus_defconfig: 13801444 13800144 -1300 palmz72_defconfig: 3345726 3345758 32 pcm027_defconfig: 3634230 3634158 -72 pleb_defconfig: 2847643 2847515 -128 prima2_defconfig: 3080439 3080347 -92 pxa3xx_defconfig: 4266677 4266457 -220 realview_defconfig: 3891673 3891465 -208 realview-smp_defconfig: 4157253 4157045 -208 rpc_defconfig: 3974677 3974661 -16 s3c2410_defconfig: 5217336 5217000 -336 s3c6400_defconfig: 3209178 3209178 0 s5p64x0_defconfig: 2843307 2843275 -32 s5pc100_defconfig: 2696641 2696657 16 shannon_defconfig: 3586908 3586636 -272 shark_defconfig: 3565471 3565423 -48 simpad_defconfig: 3681433 3681161 -272 socfpga_defconfig: 4326480 4326408 -72 spear13xx_defconfig: 4062410 4062378 -32 spear6xx_defconfig: 3457392 3457360 -32 tct_hammer_defconfig: 2406856 2406824 -32 trizeps4_defconfig: 5869063 5868483 -580 u300_defconfig: 2627488 2627488 0 versatile_defconfig: 3789155 3788787 -368 vexpress_defconfig: 5105737 5105489 -248 viper_defconfig: 3309176 3308920 -256 xcep_defconfig: 2820463 2820367 -96 zeus_defconfig: 3775525 3775367 -158 gcc 4.7.3 runs haven't been as good across the board as gcc 4.6.3, however: acs5k_tiny_defconfig: 2873380 2873330 -50 ag5evm_defconfig: 2753016 2753128 112 am200epdkit_defconfig: 3274245 3274295 50 ap4evb_defconfig: 2051952 2051944 -8 armadillo800eva_defconfig: 4531864 4531718 -146 assabet_defconfig: 3540515 3540337 -178 at91_dt_defconfig: 4211064 4205394 -5670 at91rm9200_defconfig: 5785930 5785368 -562 at91sam9261_defconfig: 4705357 4705139 -218 at91sam9263_defconfig: 5098831 5098501 -330 at91sam9g20_defconfig: 4163686 4163576 -110 at91sam9g45_defconfig: 4675215 4669553 -5662 at91x40_defconfig: 1342256 1342192 -64 badge4_defconfig: 3338677 3338615 -62 bcm_defconfig: 3568365 3568259 -106 bonito_defconfig: 2225353 2225355 2 cerfcube_defconfig: 3043650 3043556 -94 cns3420vb_defconfig: 2589035 2589049 14 collie_defconfig: 2994100 2993972 -128 da8xx_omapl_defconfig: 4155189 4155047 -142 davinci_all_defconfig: 4043450 4043304 -146 dove_defconfig: 4902213 4901627 -586 ebsa110_defconfig: 3216131 3215953 -178 ep93xx_defconfig: 4154038 4153796 -242 eseries_pxa_defconfig: 3076688 3076610 -78 exynos4_defconfig: 3001451 3001465 14 ezx_defconfig: 10022647 10022553 -94 footbridge_defconfig: 3640857 3640763 -94 h3600_defconfig: 3453932 3453722 -210 h5000_defconfig: 2963034 2963036 2 h7201_defconfig: 2148969 2148987 18 h7202_defconfig: 3376072 3375894 -178 hackkit_defconfig: 3107994 3107916 -78 imote2_defconfig: 9899372 9899278 -94 imx_v4_v5_defconfig: 6834892 6834338 -554 imx_v6_v7_defconfig: 7637605 7636935 -670 integrator_defconfig: 4204944 4204674 -270 iop13xx_defconfig: 5193128 5192790 -338 iop32x_defconfig: 5576144 5575758 -386 iop33x_defconfig: 4584683 4584585 -98 ixp4xx_defconfig: 4546578 4545176 -1402 jornada720_defconfig: 3121377 3121299 -78 kota2_defconfig: 3951220 3951220 0 kzm9d_defconfig: 3055284 3055252 -32 kzm9g_defconfig: 4742195 4742161 -34 lart_defconfig: 2922550 2926600 4050 lpc32xx_defconfig: 5036722 5032072 -4650 mackerel_defconfig: 4660786 4660612 -174 mini2440_defconfig: 5074529 5074231 -298 msm_defconfig: 2911200 2911240 40 multi_v7_defconfig: 3502888 3502824 -64 mvebu_defconfig: 3853083 3853165 82 mxs_defconfig: 11071139 11070893 -246 netwinder_defconfig: 3791108 3790884 -224 netx_defconfig: 3951133 3950911 -222 nhk8815_defconfig: 4259910 4259696 -214 nuc910_defconfig: 2523448 2523408 -40 nuc950_defconfig: 2633644 2633604 -40 nuc960_defconfig: 2541916 2541876 -40 omap2plus_defconfig: 13770376 13770026 -350 palmz72_defconfig: 3335850 3335868 18 pcm027_defconfig: 3615962 3615896 -66 pleb_defconfig: 2828707 2828753 46 prima2_defconfig: 3021195 3021213 18 pxa3xx_defconfig: 4250245 4250135 -110 realview_defconfig: 3874745 3874791 46 realview-smp_defconfig: 4144973 4145051 78 rpc_defconfig: 3952749 3952831 82 s3c2410_defconfig: 5191024 5195006 3982 s3c6400_defconfig: 3208530 3208548 18 s5p64x0_defconfig: 2837611 2837625 14 s5pc100_defconfig: 2689877 2689895 18 shannon_defconfig: 3567412 3567234 -178 shark_defconfig: 3548103 3548149 46 simpad_defconfig: 3657889 3657763 -126 socfpga_defconfig: 4287524 4287610 86 spear13xx_defconfig: 4040898 4044932 4034 spear6xx_defconfig: 3446968 3446998 30 tct_hammer_defconfig: 2394112 2394126 14 trizeps4_defconfig: 5846883 5846505 -378 u300_defconfig: 2622856 2622934 78 versatile_defconfig: 3766635 3766409 -226 vexpress_defconfig: 5076237 5076279 42 viper_defconfig: 3293388 3293294 -94 xcep_defconfig: 2804423 2808549 4126 zeus_defconfig: 3760613 3760539 -74 Haven't looked at why. In any case, some questions I have are: (a) are the __bswap[sd]i2 implementations acceptable written in C, as in the diff? I don't speak ARM asm (yet at least). The generated code looks pretty optimal in both armv5 and 6+. (b) would adding __bswap[sd]i2 to the kernel build need to be disabled on armv6+? AFAICT, gcc doesn't emit calls - even for the 8-byte swap, even with -Os, on armv6+. (c) testing allyesconfigs is proving to be a pain - lots of breakeage - other than defconfig testing, is there any more I can do? Thanks, and for your patience, Kim diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 4265a26..5e8b735 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -58,6 +58,7 @@ config ARM select CLONE_BACKWARDS select OLD_SIGSUSPEND3 select OLD_SIGACTION + select ARCH_USE_BUILTIN_BSWAP help The ARM series is a line of low-power-consumption RISC chip designs licensed by ARM Ltd and targeted at embedded applications and diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile index c9865f6..c225624 100644 --- a/arch/arm/boot/compressed/Makefile +++ b/arch/arm/boot/compressed/Makefile @@ -111,12 +111,12 @@ endif targets := vmlinux vmlinux.lds \ piggy.$(suffix_y) piggy.$(suffix_y).o \ - lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \ + lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \ font.o font.c head.o misc.o $(OBJS) # Make sure files are removed during clean extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern \ - lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs) + lib1funcs.S ashldi3.S bswapsdi2.o $(libfdt) $(libfdt_hdrs) ifeq ($(CONFIG_FUNCTION_TRACER),y) ORIG_CFLAGS := $(KBUILD_CFLAGS) @@ -158,6 +158,12 @@ ashldi3 = $(obj)/ashldi3.o $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S $(call cmd,shipped) +# For __bswapsi2, __bswapdi2 +bswapsdi2 = $(obj)/bswapsdi2.o + +$(obj)/bswapsdi2.o: $(srctree)/arch/$(SRCARCH)/lib/bswapsdi2.o + $(call cmd,shipped) + # We need to prevent any GOTOFF relocs being used with references # to symbols in the .bss section since we cannot relocate them # independently from the rest at run time. This can be achieved by @@ -179,7 +185,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \ fi $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \ - $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE + $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \ + $(bswapsdi2) FORCE @$(check_for_multiple_zreladdr) $(call if_changed,ld) @$(check_for_bad_syms) diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c index 60d3b73..ba578f7 100644 --- a/arch/arm/kernel/armksyms.c +++ b/arch/arm/kernel/armksyms.c @@ -35,6 +35,8 @@ extern void __ucmpdi2(void); extern void __udivsi3(void); extern void __umodsi3(void); extern void __do_div64(void); +extern void __bswapsi2(void); +extern void __bswapdi2(void); extern void __aeabi_idiv(void); extern void __aeabi_idivmod(void); @@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2); EXPORT_SYMBOL(__udivsi3); EXPORT_SYMBOL(__umodsi3); EXPORT_SYMBOL(__do_div64); +EXPORT_SYMBOL(__bswapsi2); +EXPORT_SYMBOL(__bswapdi2); #ifdef CONFIG_AEABI EXPORT_SYMBOL(__aeabi_idiv); diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile index af72969..5383df7 100644 --- a/arch/arm/lib/Makefile +++ b/arch/arm/lib/Makefile @@ -13,7 +13,7 @@ lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \ ashldi3.o ashrdi3.o lshrdi3.o muldi3.o \ ucmpdi2.o lib1funcs.o div64.o \ io-readsb.o io-writesb.o io-readsl.o io-writesl.o \ - call_with_stack.o + call_with_stack.o bswapsdi2.o mmu-y := clear_user.o copy_page.o getuser.o putuser.o diff --git a/arch/arm/lib/bswapsdi2.c b/arch/arm/lib/bswapsdi2.c new file mode 100644 index 0000000..1923b2f --- /dev/null +++ b/arch/arm/lib/bswapsdi2.c @@ -0,0 +1,19 @@ +unsigned int __bswapsi2(unsigned int x) +{ + return ((x & 0x000000ffUL) << 24) | + ((x & 0x0000ff00UL) << 8) | + ((x & 0x00ff0000UL) >> 8) | + ((x & 0xff000000UL) >> 24); +} + +unsigned long long __bswapdi2(unsigned long long x) +{ + return ((x & 0x00000000000000ffULL) << 56) | + ((x & 0x000000000000ff00ULL) << 40) | + ((x & 0x0000000000ff0000ULL) << 24) | + ((x & 0x00000000ff000000ULL) << 8) | + ((x & 0x000000ff00000000ULL) >> 8) | + ((x & 0x0000ff0000000000ULL) >> 24) | + ((x & 0x00ff000000000000ULL) >> 40) | + ((x & 0xff00000000000000ULL) >> 56); +} ^ permalink raw reply related [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-09 1:12 ` Kim Phillips @ 2013-02-09 3:16 ` Nicolas Pitre 2013-02-20 2:31 ` Kim Phillips 0 siblings, 1 reply; 62+ messages in thread From: Nicolas Pitre @ 2013-02-09 3:16 UTC (permalink / raw) To: Kim Phillips Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Fri, 8 Feb 2013, Kim Phillips wrote: > On Fri, 8 Feb 2013 17:47:33 -0500 > Nicolas Pitre <nico@fluxnic.net> wrote: > > > On Fri, 8 Feb 2013, Woodhouse, David wrote: > > > > > On Fri, 2013-02-08 at 15:04 -0500, Nicolas Pitre wrote: > > > > On Fri, 8 Feb 2013, Woodhouse, David wrote: > > > > > > > > > On Thu, 2013-02-07 at 18:13 +0000, Russell King - ARM Linux wrote: > > > > > > > > > > > > However, the biggest reason not to use libgcc is that we want to control > > > > > > what gets used in the kernel - for example, no floating point, and no > > > > > > use of 64 x 64bit division. > > > > > > > > > > Which is all very sensible. But there's no particular reason we couldn't > > > > > add a __bswap[sd]i2 to the kernel's version of libgcc if we wanted to. > > > > > > > > Absolutely. > > > > > > And then ARM can just set ARCH_USE_BUILTIN_BSWAP like other > > > architectures do, right? > > > > If that turns out to be beneficial over what we have now, then yes. > > I didn't read back the whole thread to form an opinion though. > > The diff below implements __bswap[sd]i2 in arch/arm/lib, and > results in the following savings in vmlinux size: > > column 1: name of defconfig > column 2: text+data+bss, linux-next-20130207 vanilla, gcc 4.6.3 > column 3: text+data+bss, linux-next-20130207+below diff, gcc 4.6.3 > column 4: col. 3 - col. 2 (ie., -ve numbers represent savings) > [...] > imx_v6_v7_defconfig: 7672373 7667089 -5284 > lart_defconfig: 2941150 2941054 -96 > mxs_defconfig: 11091983 11095679 3696 The savings are good, with some impressive cases. However the mxs_defconfig is completely the opposite and by far. Any idea? > gcc 4.7.3 runs haven't been as good across the board as gcc 4.6.3, > however: Not only that, but in many cases the results are wildly different given the same config: > imx_v6_v7_defconfig: 7637605 7636935 -670 > lart_defconfig: 2922550 2926600 4050 > mxs_defconfig: 11071139 11070893 -246 The mxs_defconfig became much better while lart_defconfig regressed a lot. > Haven't looked at why. Would be a good idea since this is rather weird and gcc could benefit from your findings. > In any case, some questions I have are: > > (a) are the __bswap[sd]i2 implementations acceptable written in C, > as in the diff? I don't speak ARM asm (yet at least). The > generated code looks pretty optimal in both armv5 and 6+. It looks pretty nice indeed: __bswapsi2: eor r2, r0, r0, ror #16 mov r1, r2, lsr #8 bic r3, r1, #65280 eor r0, r3, r0, ror #8 bx lr There is no way to do better than that. But that's true only if -Os is _not_ used. With -Os we get the following output: __bswapsi2: mov r3, r0, asl #24 and r2, r0, #65280 orr r3, r3, r0, lsr #24 orr r3, r3, r2, asl #8 and r0, r0, #16711680 orr r0, r3, r0, lsr #8 bx lr I really don't get why gcc thinks the above is shorter. I'm saving you from pasting the __bswapdi2 result which is also way way worse. That was with Linaro gcc v4.6.2. With Sourcery gcc v4.5.1 we get: __bswapsi2: stmfd sp!, {r3, lr} bl __bswapsi2 ldmfd sp!, {r3, pc} This is indeed shorter, but much less useful. So you better enforce -O2 for this file. And the nice thing with C code is that it is fully optimized with the rev instruction when compiling for ARMv6+ if it is ever used in that case. > (b) would adding __bswap[sd]i2 to the kernel build need to be > disabled on armv6+? AFAICT, gcc doesn't emit calls - even for the > 8-byte swap, even with -Os, on armv6+. I wouldn't bother. That would save only 6 instructions total. And who knows if some gcc flavor start calling them for some reason eventually. > (c) testing allyesconfigs is proving to be a pain - lots of > breakeage - other than defconfig testing, is there any more I can do? The defconfigs provide wildly different results and that is a good thing for further investigation. You may concentrate on a small interesting sample such as those I kept above. With allyesconfig the good configs would cancel out the bad ones making the bad ones invisible. Nicolas ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-09 3:16 ` Nicolas Pitre @ 2013-02-20 2:31 ` Kim Phillips 2013-02-20 2:38 ` Stephen Boyd ` (2 more replies) 0 siblings, 3 replies; 62+ messages in thread From: Kim Phillips @ 2013-02-20 2:31 UTC (permalink / raw) To: Nicolas Pitre Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Fri, 8 Feb 2013 22:16:47 -0500 Nicolas Pitre <nico@fluxnic.net> wrote: > On Fri, 8 Feb 2013, Kim Phillips wrote: > > > On Fri, 8 Feb 2013 17:47:33 -0500 > > Nicolas Pitre <nico@fluxnic.net> wrote: > > > > > On Fri, 8 Feb 2013, Woodhouse, David wrote: > > > > > > > On Fri, 2013-02-08 at 15:04 -0500, Nicolas Pitre wrote: > > > > > On Fri, 8 Feb 2013, Woodhouse, David wrote: > > > > > > > > > > > On Thu, 2013-02-07 at 18:13 +0000, Russell King - ARM Linux wrote: > > > > > > > > > > > > > > However, the biggest reason not to use libgcc is that we want to control > > > > > > > what gets used in the kernel - for example, no floating point, and no > > > > > > > use of 64 x 64bit division. > > > > > > > > > > > > Which is all very sensible. But there's no particular reason we couldn't > > > > > > add a __bswap[sd]i2 to the kernel's version of libgcc if we wanted to. > > > > > > > > > > Absolutely. > > > > > > > > And then ARM can just set ARCH_USE_BUILTIN_BSWAP like other > > > > architectures do, right? > > > > > > If that turns out to be beneficial over what we have now, then yes. > > > I didn't read back the whole thread to form an opinion though. > > > > The diff below implements __bswap[sd]i2 in arch/arm/lib, and > > results in the following savings in vmlinux size: > > > > column 1: name of defconfig > > column 2: text+data+bss, linux-next-20130207 vanilla, gcc 4.6.3 > > column 3: text+data+bss, linux-next-20130207+below diff, gcc 4.6.3 > > column 4: col. 3 - col. 2 (ie., -ve numbers represent savings) > > > [...] > > imx_v6_v7_defconfig: 7672373 7667089 -5284 > > lart_defconfig: 2941150 2941054 -96 > > mxs_defconfig: 11091983 11095679 3696 > > The savings are good, with some impressive cases. However the > mxs_defconfig is completely the opposite and by far. Any idea? Unfortunately, I don't seem to be able to reproduce this anymore. Same linux-next, with three different compilers always produces smaller binaries: text data bss dec hex filename 5239363 280576 5569648 11089587 a936b3 linux-next-mxs-orig-gcc4.7/vmlinux 5239169 280556 5569648 11089373 a935dd linux-next-mxs-bswap-gcc4.7/vmlinux 5262223 280592 5569648 11112463 a9900f linux-next-mxs-orig-gcc4.6.3/vmlinux 5261909 280584 5569648 11112141 a98ecd linux-next-mxs-bswap-gcc4.6.3/vmlinux 5241379 280580 5569648 11091607 a93e97 linux-next-mxs-orig-gcc4.6ubuntu/vmlinux 5241189 280600 5569648 11091437 a93ded linux-next-mxs-bswap-gcc4.6ubuntu/vmlinux So I've since made a more consistent cross-build environment, using cross tools from Linaro [1,2] instead of via Ubuntu [3]. > > gcc 4.7.3 runs haven't been as good across the board as gcc 4.6.3, > > however: > > Not only that, but in many cases the results are wildly different given > the same config: > > > imx_v6_v7_defconfig: 7637605 7636935 -670 > > lart_defconfig: 2922550 2926600 4050 > > mxs_defconfig: 11071139 11070893 -246 > > The mxs_defconfig became much better while lart_defconfig regressed a > lot. > > > Haven't looked at why. > > Would be a good idea since this is rather weird and gcc could benefit > from your findings. The following is next-20130207 built with Linaro gcc 4.7.1 [1], and before and after the diff at the bottom of this email (and with normalized linux version string sizes): lart_defconfig: 2752106 120864 56444 2929414 2cb306 vmlinux lart_defconfig: 2756092 120864 56444 2933400 2cc298 vmlinux #builtin-bswap mxs_defconfig: 5229115 280572 5569648 11079335 a90ea7 vmlinux mxs_defconfig: 5228969 280552 5569648 11079169 a90e01 vmlinux #builtin-bswap imx_v6_v7_defconfig: 6935025 356172 360648 7651845 74c205 vmlinux imx_v6_v7_defconfig: 6934091 356180 360648 7650919 74be67 vmlinux #builtin-bswap so builtin-bswap improved mxs and imx_v6_v7 but in lart, it _added_ 3986 bytes to .text -> not good. Getting a closer look at lart, bloat-o-meter says the code actually shrunk: add/remove: 7/1 grow/shrink: 11/19 up/down: 298/-356 (-58) function old new delta inet_abc_len - 96 +96 __bswapdi2 - 52 +52 __bswapsi2 - 32 +32 icmp_unreach 472 492 +20 xfrm_selector_match 988 1000 +12 fib_table_insert 2176 2188 +12 __kstrtab___bswapsi2 - 11 +11 __kstrtab___bswapdi2 - 11 +11 __ksymtab___bswapsi2 - 8 +8 __ksymtab___bswapdi2 - 8 +8 vermagic 51 57 +6 linux_banner 230 236 +6 xfrm_replay_check_esn 320 324 +4 xfrm_replay_check_bmp 200 204 +4 xfrm_replay_check 152 156 +4 static.tcp_parse_aligned_timestamp 80 84 +4 fib_table_delete 708 712 +4 cookie_v4_check 1316 1320 +4 tcp_tso_segment 728 724 -4 tcp_options_write 724 720 -4 ip_rt_ioctl 1152 1148 -4 fib_trie_seq_show 724 720 -4 crc32_be 448 444 -4 xfrm_stateonly_find 640 632 -8 tcp_finish_connect 276 268 -8 static.tcp_v4_send_ack 480 472 -8 __xfrm_state_lookup 356 348 -8 __xfrm_state_bump_genids 436 428 -8 __find_acq_core 1256 1248 -8 cookie_v4_init_sequence 272 260 -12 __xfrm_state_insert 616 600 -16 sys_swapon 2500 2480 -20 xfrm_state_find 2420 2396 -24 xfrm_hash_resize 1620 1596 -24 fib_route_seq_show 560 536 -24 fib_table_dump 704 676 -28 devinet_ioctl 1856 1796 -60 static.inet_abc_len 80 - -80 Comparing System.maps, .rodata starts at the same address: c020a000 R __start_rodata c020a000 R __start_rodata #builtin-bswap however, changes including the __bswap[sd]i2 implementation pushes the .rodata section size just over the 4KiB alignment boundary specified in arm/kernel/vmlinux.lds: no builtin_bswap: c028ffc4 R __stop___modver c0290000 R __end_rodata c0290000 R __start___ex_table with builtin_bswap: c0290068 R __stop___modver c0291000 R __end_rodata c0291000 R __start___ex_table So, AFAICT, that's why we see a total increase in .text for lart, and, looking at both numbers being a little less than 4KiB, I suspect the same with whatever happened with mxs above. FYI, here are the same platforms with Linaro gcc 4.7.3 [2]: lart_defconfig: 2745218 120888 56444 2922550 2c9836 vmlinux lart_defconfig: 2749300 120888 56444 2926632 2ca828 vmlinux #builtin-bswap mxs_defconfig: 5220919 280572 5569648 11071139 a8eea3 vmlinux mxs_defconfig: 5220725 280552 5569648 11070925 a8edcd vmlinux #builtin-bswap imx_v6_v7_defconfig: 6920769 356188 360648 7637605 748a65 vmlinux imx_v6_v7_defconfig: 6920091 356196 360648 7636935 7487c7 vmlinux #builtin-bswap so lart is still mostly affected by the .rodata alignment: add/remove: 7/1 grow/shrink: 11/16 up/down: 330/-308 (22) function old new delta inet_abc_len - 96 +96 __bswapdi2 - 52 +52 fib_table_insert 2152 2196 +44 __bswapsi2 - 32 +32 icmp_unreach 472 492 +20 xfrm_selector_match 988 1000 +12 __kstrtab___bswapsi2 - 11 +11 __kstrtab___bswapdi2 - 11 +11 __ksymtab___bswapsi2 - 8 +8 __ksymtab___bswapdi2 - 8 +8 vermagic 51 57 +6 linux_banner 232 238 +6 xfrm_replay_check_esn 320 324 +4 xfrm_replay_check_bmp 200 204 +4 xfrm_replay_check 152 156 +4 static.tcp_parse_aligned_timestamp 80 84 +4 fib_table_delete 708 712 +4 cookie_v4_check 1316 1320 +4 tcp_options_write 724 720 -4 ip_rt_ioctl 1152 1148 -4 fib_trie_seq_show 724 720 -4 crc32_be 448 444 -4 xfrm_stateonly_find 640 632 -8 tcp_finish_connect 276 268 -8 static.tcp_v4_send_ack 480 472 -8 __xfrm_state_lookup 356 348 -8 __xfrm_state_bump_genids 436 428 -8 __find_acq_core 1252 1244 -8 cookie_v4_init_sequence 272 260 -12 __xfrm_state_insert 616 600 -16 xfrm_state_find 2420 2396 -24 xfrm_hash_resize 1620 1596 -24 fib_table_dump 704 676 -28 devinet_ioctl 1856 1796 -60 static.inet_abc_len 80 - -80 > > In any case, some questions I have are: > > > > (a) are the __bswap[sd]i2 implementations acceptable written in C, > > as in the diff? I don't speak ARM asm (yet at least). The > > generated code looks pretty optimal in both armv5 and 6+. > > It looks pretty nice indeed: > > __bswapsi2: > eor r2, r0, r0, ror #16 > mov r1, r2, lsr #8 > bic r3, r1, #65280 > eor r0, r3, r0, ror #8 > bx lr > > There is no way to do better than that. But that's true only if -Os is > _not_ used. With -Os we get the following output: > > __bswapsi2: > mov r3, r0, asl #24 > and r2, r0, #65280 > orr r3, r3, r0, lsr #24 > orr r3, r3, r2, asl #8 > and r0, r0, #16711680 > orr r0, r3, r0, lsr #8 > bx lr > > I really don't get why gcc thinks the above is shorter. I'm saving you > from pasting the __bswapdi2 result which is also way way worse. > That was with Linaro gcc v4.6.2. > > With Sourcery gcc v4.5.1 we get: > > __bswapsi2: > stmfd sp!, {r3, lr} > bl __bswapsi2 > ldmfd sp!, {r3, pc} > > This is indeed shorter, but much less useful. So you better enforce -O2 > for this file. And the nice thing with C code is that it is fully > optimized with the rev instruction when compiling for ARMv6+ if it is > ever used in that case. ok, so to avoid recursion, I've enforced a -O2 on bswapsdi2.o. > > (c) testing allyesconfigs is proving to be a pain - lots of > > breakeage - other than defconfig testing, is there any more I can do? > > The defconfigs provide wildly different results and that is a good > thing for further investigation. You may concentrate on a small > interesting sample such as those I kept above. > > With allyesconfig the good configs would cancel out the bad ones making > the bad ones invisible. ok, thanks for your comments, Nicolas. Here's the new diff: changes from last diff: - enforce -O2 for bswapsdi2.o - fix building out-of-source tree diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 4265a26..5e8b735 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -58,6 +58,7 @@ config ARM select CLONE_BACKWARDS select OLD_SIGSUSPEND3 select OLD_SIGACTION + select ARCH_USE_BUILTIN_BSWAP help The ARM series is a line of low-power-consumption RISC chip designs licensed by ARM Ltd and targeted at embedded applications and diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile index c9865f6..8ef97c4 100644 --- a/arch/arm/boot/compressed/Makefile +++ b/arch/arm/boot/compressed/Makefile @@ -111,12 +111,12 @@ endif targets := vmlinux vmlinux.lds \ piggy.$(suffix_y) piggy.$(suffix_y).o \ - lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \ + lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \ font.o font.c head.o misc.o $(OBJS) # Make sure files are removed during clean extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern \ - lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs) + lib1funcs.S ashldi3.S bswapsdi2.o $(libfdt) $(libfdt_hdrs) ifeq ($(CONFIG_FUNCTION_TRACER),y) ORIG_CFLAGS := $(KBUILD_CFLAGS) @@ -158,6 +158,12 @@ ashldi3 = $(obj)/ashldi3.o $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S $(call cmd,shipped) +# For __bswapsi2, __bswapdi2 +bswapsdi2 = $(obj)/bswapsdi2.o + +$(obj)/bswapsdi2.o: $(obj)/../../../../arch/$(SRCARCH)/lib/bswapsdi2.o + $(call cmd,shipped) + # We need to prevent any GOTOFF relocs being used with references # to symbols in the .bss section since we cannot relocate them # independently from the rest at run time. This can be achieved by @@ -179,7 +185,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \ fi $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \ - $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE + $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \ + $(bswapsdi2) FORCE @$(check_for_multiple_zreladdr) $(call if_changed,ld) @$(check_for_bad_syms) diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c index 60d3b73..ba578f7 100644 --- a/arch/arm/kernel/armksyms.c +++ b/arch/arm/kernel/armksyms.c @@ -35,6 +35,8 @@ extern void __ucmpdi2(void); extern void __udivsi3(void); extern void __umodsi3(void); extern void __do_div64(void); +extern void __bswapsi2(void); +extern void __bswapdi2(void); extern void __aeabi_idiv(void); extern void __aeabi_idivmod(void); @@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2); EXPORT_SYMBOL(__udivsi3); EXPORT_SYMBOL(__umodsi3); EXPORT_SYMBOL(__do_div64); +EXPORT_SYMBOL(__bswapsi2); +EXPORT_SYMBOL(__bswapdi2); #ifdef CONFIG_AEABI EXPORT_SYMBOL(__aeabi_idiv); diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile index af72969..dbee639 100644 --- a/arch/arm/lib/Makefile +++ b/arch/arm/lib/Makefile @@ -13,7 +13,7 @@ lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \ ashldi3.o ashrdi3.o lshrdi3.o muldi3.o \ ucmpdi2.o lib1funcs.o div64.o \ io-readsb.o io-writesb.o io-readsl.o io-writesl.o \ - call_with_stack.o + call_with_stack.o bswapsdi2.o mmu-y := clear_user.o copy_page.o getuser.o putuser.o @@ -45,3 +45,5 @@ lib-$(CONFIG_ARCH_SHARK) += io-shark.o $(obj)/csumpartialcopy.o: $(obj)/csumpartialcopygeneric.S $(obj)/csumpartialcopyuser.o: $(obj)/csumpartialcopygeneric.S + +CFLAGS_bswapsdi2.o = -O2 diff --git a/arch/arm/lib/bswapsdi2.c b/arch/arm/lib/bswapsdi2.c new file mode 100644 index 0000000..1923b2f --- /dev/null +++ b/arch/arm/lib/bswapsdi2.c @@ -0,0 +1,19 @@ +unsigned int __bswapsi2(unsigned int x) +{ + return ((x & 0x000000ffUL) << 24) | + ((x & 0x0000ff00UL) << 8) | + ((x & 0x00ff0000UL) >> 8) | + ((x & 0xff000000UL) >> 24); +} + +unsigned long long __bswapdi2(unsigned long long x) +{ + return ((x & 0x00000000000000ffULL) << 56) | + ((x & 0x000000000000ff00ULL) << 40) | + ((x & 0x0000000000ff0000ULL) << 24) | + ((x & 0x00000000ff000000ULL) << 8) | + ((x & 0x000000ff00000000ULL) >> 8) | + ((x & 0x0000ff0000000000ULL) >> 24) | + ((x & 0x00ff000000000000ULL) >> 40) | + ((x & 0xff00000000000000ULL) >> 56); +} Thanks, Kim [1] arm-linux-gnueabihf-gcc (crosstool-NG linaro-1.13.1-2012.06-20120625 - Linaro GCC 2012.06) 4.7.1 20120531 (prerelease) [2] arm-linux-gnueabihf-gcc (crosstool-NG linaro-1.13.1-4.7-2013.01-20130125 - Linaro GCC 2013.01) 4.7.3 20130102 (prerelease) [3] gcc version 4.6.3 20120624 (prerelease) (Ubuntu/Linaro 4.6.3-8ubuntu1) (deb http://mirrors.us.kernel.org/ubuntu/ precise main universe) ^ permalink raw reply related [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-20 2:31 ` Kim Phillips @ 2013-02-20 2:38 ` Stephen Boyd 2013-02-20 3:17 ` Nicolas Pitre 2013-03-13 13:35 ` Woodhouse, David 2 siblings, 0 replies; 62+ messages in thread From: Stephen Boyd @ 2013-02-20 2:38 UTC (permalink / raw) To: Kim Phillips Cc: Nicolas Pitre, Russell King - ARM Linux, Woodhouse, David, Rusty Russell, linux-kernel, Daniel Santos, Borislav Petkov, David Rientjes, Andrew Morton, linux-arm-kernel On 2/19/2013 6:31 PM, Kim Phillips wrote: > diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile > index af72969..dbee639 100644 > --- a/arch/arm/lib/Makefile > +++ b/arch/arm/lib/Makefile > @@ -45,3 +45,5 @@ lib-$(CONFIG_ARCH_SHARK) += io-shark.o > > $(obj)/csumpartialcopy.o: $(obj)/csumpartialcopygeneric.S > $(obj)/csumpartialcopyuser.o: $(obj)/csumpartialcopygeneric.S > + > +CFLAGS_bswapsdi2.o = -O2 Please put a comment here so we don't have to dig through git history and/or emails to understand why this has -O2 forced on for this file. -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-20 2:31 ` Kim Phillips 2013-02-20 2:38 ` Stephen Boyd @ 2013-02-20 3:17 ` Nicolas Pitre 2013-02-20 10:38 ` Woodhouse, David 2013-03-13 13:35 ` Woodhouse, David 2 siblings, 1 reply; 62+ messages in thread From: Nicolas Pitre @ 2013-02-20 3:17 UTC (permalink / raw) To: Kim Phillips Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Tue, 19 Feb 2013, Kim Phillips wrote: > On Fri, 8 Feb 2013 22:16:47 -0500 > Nicolas Pitre <nico@fluxnic.net> wrote: > > > Not only that, but in many cases the results are wildly different given > > the same config: > > > > > imx_v6_v7_defconfig: 7637605 7636935 -670 > > > lart_defconfig: 2922550 2926600 4050 > > > mxs_defconfig: 11071139 11070893 -246 > > > > The mxs_defconfig became much better while lart_defconfig regressed a > > lot. > > > > > Haven't looked at why. > > > > Would be a good idea since this is rather weird and gcc could benefit > > from your findings. > > The following is next-20130207 built with Linaro gcc 4.7.1 [1], and > before and after the diff at the bottom of this email (and with > normalized linux version string sizes): > > lart_defconfig: 2752106 120864 56444 2929414 2cb306 vmlinux > lart_defconfig: 2756092 120864 56444 2933400 2cc298 vmlinux #builtin-bswap > > mxs_defconfig: 5229115 280572 5569648 11079335 a90ea7 vmlinux > mxs_defconfig: 5228969 280552 5569648 11079169 a90e01 vmlinux #builtin-bswap > > imx_v6_v7_defconfig: 6935025 356172 360648 7651845 74c205 vmlinux > imx_v6_v7_defconfig: 6934091 356180 360648 7650919 74be67 vmlinux #builtin-bswap > > > so builtin-bswap improved mxs and imx_v6_v7 but in lart, it _added_ > 3986 bytes to .text -> not good. > > Getting a closer look at lart, bloat-o-meter says the code actually > shrunk: > > add/remove: 7/1 grow/shrink: 11/19 up/down: 298/-356 (-58) > function old new delta > inet_abc_len - 96 +96 > __bswapdi2 - 52 +52 > __bswapsi2 - 32 +32 > icmp_unreach 472 492 +20 > xfrm_selector_match 988 1000 +12 > fib_table_insert 2176 2188 +12 > __kstrtab___bswapsi2 - 11 +11 > __kstrtab___bswapdi2 - 11 +11 > __ksymtab___bswapsi2 - 8 +8 > __ksymtab___bswapdi2 - 8 +8 > vermagic 51 57 +6 > linux_banner 230 236 +6 > xfrm_replay_check_esn 320 324 +4 > xfrm_replay_check_bmp 200 204 +4 > xfrm_replay_check 152 156 +4 > static.tcp_parse_aligned_timestamp 80 84 +4 > fib_table_delete 708 712 +4 > cookie_v4_check 1316 1320 +4 > tcp_tso_segment 728 724 -4 > tcp_options_write 724 720 -4 > ip_rt_ioctl 1152 1148 -4 > fib_trie_seq_show 724 720 -4 > crc32_be 448 444 -4 > xfrm_stateonly_find 640 632 -8 > tcp_finish_connect 276 268 -8 > static.tcp_v4_send_ack 480 472 -8 > __xfrm_state_lookup 356 348 -8 > __xfrm_state_bump_genids 436 428 -8 > __find_acq_core 1256 1248 -8 > cookie_v4_init_sequence 272 260 -12 > __xfrm_state_insert 616 600 -16 > sys_swapon 2500 2480 -20 > xfrm_state_find 2420 2396 -24 > xfrm_hash_resize 1620 1596 -24 > fib_route_seq_show 560 536 -24 > fib_table_dump 704 676 -28 > devinet_ioctl 1856 1796 -60 > static.inet_abc_len 80 - -80 > > Comparing System.maps, .rodata starts at the same address: > > c020a000 R __start_rodata > c020a000 R __start_rodata #builtin-bswap > > however, changes including the __bswap[sd]i2 implementation pushes > the .rodata section size just over the 4KiB alignment boundary > specified in arm/kernel/vmlinux.lds: > > no builtin_bswap: > > c028ffc4 R __stop___modver > c0290000 R __end_rodata > c0290000 R __start___ex_table > > with builtin_bswap: > > c0290068 R __stop___modver > c0291000 R __end_rodata > c0291000 R __start___ex_table > > So, AFAICT, that's why we see a total increase in .text for lart, > and, looking at both numbers being a little less than 4KiB, I > suspect the same with whatever happened with mxs above. OK. At least we do have a plausible explanation now. The actual code being smaller should compensate for section alignment loss. > ok, so to avoid recursion, I've enforced a -O2 on bswapsdi2.o. Not only recursion, but the horrible assembly output from -Os. > Here's the new diff: > > changes from last diff: > - enforce -O2 for bswapsdi2.o > - fix building out-of-source tree > > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig > index 4265a26..5e8b735 100644 > --- a/arch/arm/Kconfig > +++ b/arch/arm/Kconfig > @@ -58,6 +58,7 @@ config ARM > select CLONE_BACKWARDS > select OLD_SIGSUSPEND3 > select OLD_SIGACTION > + select ARCH_USE_BUILTIN_BSWAP > help > The ARM series is a line of low-power-consumption RISC chip designs > licensed by ARM Ltd and targeted at embedded applications and > diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile > index c9865f6..8ef97c4 100644 > --- a/arch/arm/boot/compressed/Makefile > +++ b/arch/arm/boot/compressed/Makefile > @@ -111,12 +111,12 @@ endif > > targets := vmlinux vmlinux.lds \ > piggy.$(suffix_y) piggy.$(suffix_y).o \ > - lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \ > + lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \ > font.o font.c head.o misc.o $(OBJS) > > # Make sure files are removed during clean > extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern \ > - lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs) > + lib1funcs.S ashldi3.S bswapsdi2.o $(libfdt) $(libfdt_hdrs) > > ifeq ($(CONFIG_FUNCTION_TRACER),y) > ORIG_CFLAGS := $(KBUILD_CFLAGS) > @@ -158,6 +158,12 @@ ashldi3 = $(obj)/ashldi3.o > $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S > $(call cmd,shipped) > > +# For __bswapsi2, __bswapdi2 > +bswapsdi2 = $(obj)/bswapsdi2.o > + > +$(obj)/bswapsdi2.o: $(obj)/../../../../arch/$(SRCARCH)/lib/bswapsdi2.o > + $(call cmd,shipped) > + I don't think you can get away with this. The decompressor code is compiled with -fpic and the main kernel is not. Most toolchains do mark object files with some flags to prevent the link of incompatible objects together (normally pic and non pic objects are not compatible even if in this very simple case that would not matter). Maybe you are able to link zImage successfully simply because no references to __bswap* needed to be resolved and therefore the linker didn't need to search/include that object? > diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile > index af72969..dbee639 100644 > --- a/arch/arm/lib/Makefile > +++ b/arch/arm/lib/Makefile > @@ -13,7 +13,7 @@ lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \ > ashldi3.o ashrdi3.o lshrdi3.o muldi3.o \ > ucmpdi2.o lib1funcs.o div64.o \ > io-readsb.o io-writesb.o io-readsl.o io-writesl.o \ > - call_with_stack.o > + call_with_stack.o bswapsdi2.o > > mmu-y := clear_user.o copy_page.o getuser.o putuser.o > > @@ -45,3 +45,5 @@ lib-$(CONFIG_ARCH_SHARK) += io-shark.o > > $(obj)/csumpartialcopy.o: $(obj)/csumpartialcopygeneric.S > $(obj)/csumpartialcopyuser.o: $(obj)/csumpartialcopygeneric.S > + > +CFLAGS_bswapsdi2.o = -O2 Please insert a small comment to explain why this is done. Adding some more elaborate explanation in the commit log would be good too. Nicolas ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-20 3:17 ` Nicolas Pitre @ 2013-02-20 10:38 ` Woodhouse, David 2013-02-20 13:36 ` Nicolas Pitre 0 siblings, 1 reply; 62+ messages in thread From: Woodhouse, David @ 2013-02-20 10:38 UTC (permalink / raw) To: Nicolas Pitre Cc: Kim Phillips, Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring [-- Attachment #1: Type: text/plain, Size: 1102 bytes --] On Tue, 2013-02-19 at 22:17 -0500, Nicolas Pitre wrote: > > > +$(obj)/bswapsdi2.o: $(obj)/../../../../arch/$(SRCARCH)/lib/bswapsdi2.o > > + $(call cmd,shipped) > > + > > I don't think you can get away with this. The decompressor code is > compiled with -fpic and the main kernel is not. Most toolchains do mark > object files with some flags to prevent the link of incompatible objects > together (normally pic and non pic objects are not compatible even if in > this very simple case that would not matter). Maybe you are able to > link zImage successfully simply because no references to __bswap* needed > to be resolved and therefore the linker didn't need to search/include > that object? This, and the issue with -Os vs. -O2, make me inclined to suggest that we should just provide an asm version of these functions instead of using the compiler. -- Sent with MeeGo's ActiveSync support. David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 6242 bytes --] ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-20 10:38 ` Woodhouse, David @ 2013-02-20 13:36 ` Nicolas Pitre 2013-02-20 13:44 ` Woodhouse, David 0 siblings, 1 reply; 62+ messages in thread From: Nicolas Pitre @ 2013-02-20 13:36 UTC (permalink / raw) To: Woodhouse, David Cc: Kim Phillips, Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Wed, 20 Feb 2013, Woodhouse, David wrote: > On Tue, 2013-02-19 at 22:17 -0500, Nicolas Pitre wrote: > > > > > +$(obj)/bswapsdi2.o: $(obj)/../../../../arch/$(SRCARCH)/lib/bswapsdi2.o > > > + $(call cmd,shipped) > > > + > > > > I don't think you can get away with this. The decompressor code is > > compiled with -fpic and the main kernel is not. Most toolchains do mark > > object files with some flags to prevent the link of incompatible objects > > together (normally pic and non pic objects are not compatible even if in > > this very simple case that would not matter). Maybe you are able to > > link zImage successfully simply because no references to __bswap* needed > > to be resolved and therefore the linker didn't need to search/include > > that object? > > This, and the issue with -Os vs. -O2, make me inclined to suggest that > we should just provide an asm version of these functions instead of > using the compiler. You'll have the same issue wrt the above whether or not the source file is C or assembly. Nicolas ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-20 13:36 ` Nicolas Pitre @ 2013-02-20 13:44 ` Woodhouse, David 2013-02-20 14:06 ` Nicolas Pitre 0 siblings, 1 reply; 62+ messages in thread From: Woodhouse, David @ 2013-02-20 13:44 UTC (permalink / raw) To: Nicolas Pitre Cc: Kim Phillips, Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring [-- Attachment #1: Type: text/plain, Size: 654 bytes --] On Wed, 2013-02-20 at 08:36 -0500, Nicolas Pitre wrote: > You'll have the same issue wrt the above whether or not the source > file is C or assembly. Hm, true. I was thinking of the code itself (which is position-independent anyway), rather than the flags in the object file. So just ship a .S file and for the decompressor (if we need it at all) rebuild it just the same as we do the *other* libgcc code like ashldi3.S etc. -- Sent with MeeGo's ActiveSync support. David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 6242 bytes --] ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-20 13:44 ` Woodhouse, David @ 2013-02-20 14:06 ` Nicolas Pitre 2013-02-20 14:53 ` Woodhouse, David 0 siblings, 1 reply; 62+ messages in thread From: Nicolas Pitre @ 2013-02-20 14:06 UTC (permalink / raw) To: Woodhouse, David Cc: Kim Phillips, Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Wed, 20 Feb 2013, Woodhouse, David wrote: > On Wed, 2013-02-20 at 08:36 -0500, Nicolas Pitre wrote: > > You'll have the same issue wrt the above whether or not the source > > file is C or assembly. > > Hm, true. I was thinking of the code itself (which is > position-independent anyway), rather than the flags in the object file. > > So just ship a .S file and for the decompressor (if we need it at all) > rebuild it just the same as we do the *other* libgcc code like ashldi3.S > etc. ... in which case there is no harm shipping a .c file and trivially enforcing -O2, the rest being equal. Nicolas ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-20 14:06 ` Nicolas Pitre @ 2013-02-20 14:53 ` Woodhouse, David 2013-02-20 15:43 ` Nicolas Pitre 0 siblings, 1 reply; 62+ messages in thread From: Woodhouse, David @ 2013-02-20 14:53 UTC (permalink / raw) To: Nicolas Pitre Cc: Kim Phillips, Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring [-- Attachment #1: Type: text/plain, Size: 968 bytes --] On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote: > On Wed, 20 Feb 2013, Woodhouse, David wrote: > > > On Wed, 2013-02-20 at 08:36 -0500, Nicolas Pitre wrote: > > > You'll have the same issue wrt the above whether or not the source > > > file is C or assembly. > > > > Hm, true. I was thinking of the code itself (which is > > position-independent anyway), rather than the flags in the object file. > > > > So just ship a .S file and for the decompressor (if we need it at all) > > rebuild it just the same as we do the *other* libgcc code like ashldi3.S > > etc. > > ... in which case there is no harm shipping a .c file and trivially > enforcing -O2, the rest being equal. For today's compilers, unless the wind changes. -- Sent with MeeGo's ActiveSync support. David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 6242 bytes --] ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-20 14:53 ` Woodhouse, David @ 2013-02-20 15:43 ` Nicolas Pitre 2013-02-21 3:49 ` Kim Phillips 0 siblings, 1 reply; 62+ messages in thread From: Nicolas Pitre @ 2013-02-20 15:43 UTC (permalink / raw) To: Woodhouse, David Cc: Kim Phillips, Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Wed, 20 Feb 2013, Woodhouse, David wrote: > On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote: > > On Wed, 20 Feb 2013, Woodhouse, David wrote: > > > > > On Wed, 2013-02-20 at 08:36 -0500, Nicolas Pitre wrote: > > > > You'll have the same issue wrt the above whether or not the source > > > > file is C or assembly. > > > > > > Hm, true. I was thinking of the code itself (which is > > > position-independent anyway), rather than the flags in the object file. > > > > > > So just ship a .S file and for the decompressor (if we need it at all) > > > rebuild it just the same as we do the *other* libgcc code like ashldi3.S > > > etc. > > > > ... in which case there is no harm shipping a .c file and trivially > > enforcing -O2, the rest being equal. > > For today's compilers, unless the wind changes. We'll adapt if necessary. Going with -O2 should remain pretty safe anyway. Nicolas ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-20 15:43 ` Nicolas Pitre @ 2013-02-21 3:49 ` Kim Phillips 2013-02-21 4:29 ` Nicolas Pitre 0 siblings, 1 reply; 62+ messages in thread From: Kim Phillips @ 2013-02-21 3:49 UTC (permalink / raw) To: Nicolas Pitre Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Wed, 20 Feb 2013 10:43:18 -0500 Nicolas Pitre <nico@fluxnic.net> wrote: > On Wed, 20 Feb 2013, Woodhouse, David wrote: > > On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote: > > > ... in which case there is no harm shipping a .c file and trivially > > > enforcing -O2, the rest being equal. > > > > For today's compilers, unless the wind changes. > > We'll adapt if necessary. Going with -O2 should remain pretty safe anyway. Alas, not so for gcc 4.4 - I had forgotten I had tested Ubuntu/Linaro 4.4.7-1ubuntu2 here: https://patchwork.kernel.org/patch/2101491/ add -O2 to that test script and gcc 4.4 *always* emits calls to __bswap[sd]i2, even with -march=armv6k+. I'll try working on an assembly version given it probably makes more sense, future-gcc-immunity-wise. Otherwise we're back to the old 'if GCC_VERSION >= 40500' in arch/arm/include/asm/swab.h... Kim ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-21 3:49 ` Kim Phillips @ 2013-02-21 4:29 ` Nicolas Pitre 2013-02-21 6:52 ` Kim Phillips 2013-02-21 16:37 ` [RFC] " Woodhouse, David 0 siblings, 2 replies; 62+ messages in thread From: Nicolas Pitre @ 2013-02-21 4:29 UTC (permalink / raw) To: Kim Phillips Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Wed, 20 Feb 2013, Kim Phillips wrote: > On Wed, 20 Feb 2013 10:43:18 -0500 > Nicolas Pitre <nico@fluxnic.net> wrote: > > > On Wed, 20 Feb 2013, Woodhouse, David wrote: > > > On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote: > > > > ... in which case there is no harm shipping a .c file and trivially > > > > enforcing -O2, the rest being equal. > > > > > > For today's compilers, unless the wind changes. > > > > We'll adapt if necessary. Going with -O2 should remain pretty safe anyway. > > Alas, not so for gcc 4.4 - I had forgotten I had tested > Ubuntu/Linaro 4.4.7-1ubuntu2 here: > > https://patchwork.kernel.org/patch/2101491/ > > add -O2 to that test script and gcc 4.4 *always* emits calls to > __bswap[sd]i2, even with -march=armv6k+. Crap. OK, assembly code is the way to go then. > I'll try working on an assembly version given it probably > makes more sense, future-gcc-immunity-wise. Agreed. Nicolas ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-21 4:29 ` Nicolas Pitre @ 2013-02-21 6:52 ` Kim Phillips 2013-02-21 16:40 ` Nicolas Pitre 2013-02-21 16:37 ` [RFC] " Woodhouse, David 1 sibling, 1 reply; 62+ messages in thread From: Kim Phillips @ 2013-02-21 6:52 UTC (permalink / raw) To: Nicolas Pitre Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Wed, 20 Feb 2013 23:29:58 -0500 Nicolas Pitre <nico@fluxnic.net> wrote: > On Wed, 20 Feb 2013, Kim Phillips wrote: > > > On Wed, 20 Feb 2013 10:43:18 -0500 > > Nicolas Pitre <nico@fluxnic.net> wrote: > > > > > On Wed, 20 Feb 2013, Woodhouse, David wrote: > > > > On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote: > > > > > ... in which case there is no harm shipping a .c file and trivially > > > > > enforcing -O2, the rest being equal. > > > > > > > > For today's compilers, unless the wind changes. > > > > > > We'll adapt if necessary. Going with -O2 should remain pretty safe anyway. > > > > Alas, not so for gcc 4.4 - I had forgotten I had tested > > Ubuntu/Linaro 4.4.7-1ubuntu2 here: > > > > https://patchwork.kernel.org/patch/2101491/ > > > > add -O2 to that test script and gcc 4.4 *always* emits calls to > > __bswap[sd]i2, even with -march=armv6k+. argh, sorry - that script was testing support for __builtin_bswap{16,32,64} directly, which isn't the same as testing code generation of a byte swap pattern in C. > Crap. OK, assembly code is the way to go then. > > > I'll try working on an assembly version given it probably > > makes more sense, future-gcc-immunity-wise. > > Agreed. I'll still try the assembly approach - gcc 4.4's armv6 output looks worse than both the pre-armv6 and post-armv6 __arch_swab32 implementations currently in use: mov ip, sp push {fp, ip, lr, pc} sub fp, ip, #4 and r2, r0, #65280 ; 0xff00 lsl ip, r0, #24 orr r1, ip, r0, lsr #24 and r0, r0, #16711680 ; 0xff0000 orr r3, r1, r2, lsl #8 orr r0, r3, r0, lsr #8 ldm sp, {fp, sp, pc} Kim ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-21 6:52 ` Kim Phillips @ 2013-02-21 16:40 ` Nicolas Pitre 2013-02-22 2:33 ` Kim Phillips 0 siblings, 1 reply; 62+ messages in thread From: Nicolas Pitre @ 2013-02-21 16:40 UTC (permalink / raw) To: Kim Phillips Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Thu, 21 Feb 2013, Kim Phillips wrote: > On Wed, 20 Feb 2013 23:29:58 -0500 > Nicolas Pitre <nico@fluxnic.net> wrote: > > > On Wed, 20 Feb 2013, Kim Phillips wrote: > > > > > On Wed, 20 Feb 2013 10:43:18 -0500 > > > Nicolas Pitre <nico@fluxnic.net> wrote: > > > > > > > On Wed, 20 Feb 2013, Woodhouse, David wrote: > > > > > On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote: > > > > > > ... in which case there is no harm shipping a .c file and trivially > > > > > > enforcing -O2, the rest being equal. > > > > > > > > > > For today's compilers, unless the wind changes. > > > > > > > > We'll adapt if necessary. Going with -O2 should remain pretty safe anyway. > > > > > > Alas, not so for gcc 4.4 - I had forgotten I had tested > > > Ubuntu/Linaro 4.4.7-1ubuntu2 here: > > > > > > https://patchwork.kernel.org/patch/2101491/ > > > > > > add -O2 to that test script and gcc 4.4 *always* emits calls to > > > __bswap[sd]i2, even with -march=armv6k+. > > argh, sorry - that script was testing support for > __builtin_bswap{16,32,64} directly, which isn't the same as testing > code generation of a byte swap pattern in C. Still, I'm not as confident as I was about this. > I'll still try the assembly approach - gcc 4.4's armv6 output looks > worse than both the pre-armv6 and post-armv6 __arch_swab32 > implementations currently in use: > > mov ip, sp > push {fp, ip, lr, pc} > sub fp, ip, #4 You should use -fomit-frame-pointer to compile this. We don't need a frame pointer here, especially for a leaf function that the compiler decides to call on its own. > and r2, r0, #65280 ; 0xff00 > lsl ip, r0, #24 > orr r1, ip, r0, lsr #24 > and r0, r0, #16711680 ; 0xff0000 > orr r3, r1, r2, lsl #8 > orr r0, r3, r0, lsr #8 Other than that, it is true that the above is slightly suboptimal. Nicolas ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-21 16:40 ` Nicolas Pitre @ 2013-02-22 2:33 ` Kim Phillips 2013-02-22 3:40 ` Nicolas Pitre 0 siblings, 1 reply; 62+ messages in thread From: Kim Phillips @ 2013-02-22 2:33 UTC (permalink / raw) To: Nicolas Pitre Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Thu, 21 Feb 2013 11:40:54 -0500 Nicolas Pitre <nico@fluxnic.net> wrote: > On Thu, 21 Feb 2013, Kim Phillips wrote: > > > On Wed, 20 Feb 2013 23:29:58 -0500 > > Nicolas Pitre <nico@fluxnic.net> wrote: > > > > > On Wed, 20 Feb 2013, Kim Phillips wrote: > > > > > > > On Wed, 20 Feb 2013 10:43:18 -0500 > > > > Nicolas Pitre <nico@fluxnic.net> wrote: > > > > > > > > > On Wed, 20 Feb 2013, Woodhouse, David wrote: > > > > > > On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote: > > > > > > > ... in which case there is no harm shipping a .c file and trivially > > > > > > > enforcing -O2, the rest being equal. > > > > > > > > > > > > For today's compilers, unless the wind changes. > > > > > > > > > > We'll adapt if necessary. Going with -O2 should remain pretty safe anyway. > > > > > > > > Alas, not so for gcc 4.4 - I had forgotten I had tested > > > > Ubuntu/Linaro 4.4.7-1ubuntu2 here: > > > > > > > > https://patchwork.kernel.org/patch/2101491/ > > > > > > > > add -O2 to that test script and gcc 4.4 *always* emits calls to > > > > __bswap[sd]i2, even with -march=armv6k+. > > > > argh, sorry - that script was testing support for > > __builtin_bswap{16,32,64} directly, which isn't the same as testing > > code generation of a byte swap pattern in C. > > Still, I'm not as confident as I was about this. which part exactly? Having -O2 as "protection"? Yes, me neither. > > I'll still try the assembly approach - gcc 4.4's armv6 output looks > > worse than both the pre-armv6 and post-armv6 __arch_swab32 > > implementations currently in use: > > > > mov ip, sp > > push {fp, ip, lr, pc} > > sub fp, ip, #4 > > You should use -fomit-frame-pointer to compile this. We don't need a > frame pointer here, especially for a leaf function that the compiler > decides to call on its own. > > > and r2, r0, #65280 ; 0xff00 > > lsl ip, r0, #24 > > orr r1, ip, r0, lsr #24 > > and r0, r0, #16711680 ; 0xff0000 > > orr r3, r1, r2, lsl #8 > > orr r0, r3, r0, lsr #8 > > Other than that, it is true that the above is slightly suboptimal. Here's the asm version I'm working on now, based on compiler output of the C version. Haven't tested beyond defconfig builds, which pass ok. Is there anything I have to do for thumb mode? If so, how to test? diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index dedf02b..e8a41d0 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -59,6 +59,7 @@ config ARM select CLONE_BACKWARDS select OLD_SIGSUSPEND3 select OLD_SIGACTION + select ARCH_USE_BUILTIN_BSWAP help The ARM series is a line of low-power-consumption RISC chip designs licensed by ARM Ltd and targeted at embedded applications and diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile index 5cad8a6..a277e97 100644 --- a/arch/arm/boot/compressed/Makefile +++ b/arch/arm/boot/compressed/Makefile @@ -108,12 +108,12 @@ endif targets := vmlinux vmlinux.lds \ piggy.$(suffix_y) piggy.$(suffix_y).o \ - lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \ + lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \ font.o font.c head.o misc.o $(OBJS) # Make sure files are removed during clean extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern \ - lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs) + lib1funcs.S ashldi3.S bswapsdi2.o $(libfdt) $(libfdt_hdrs) ifeq ($(CONFIG_FUNCTION_TRACER),y) ORIG_CFLAGS := $(KBUILD_CFLAGS) @@ -155,6 +155,12 @@ ashldi3 = $(obj)/ashldi3.o $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S $(call cmd,shipped) +# For __bswapsi2, __bswapdi2 +bswapsdi2 = $(obj)/bswapsdi2.o + +$(obj)/bswapsdi2.S: $(srctree)/arch/$(SRCARCH)/lib/bswapsdi2.S + $(call cmd,shipped) + # We need to prevent any GOTOFF relocs being used with references # to symbols in the .bss section since we cannot relocate them # independently from the rest at run time. This can be achieved by @@ -176,7 +182,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \ fi $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \ - $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE + $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \ + $(bswapsdi2) FORCE @$(check_for_multiple_zreladdr) $(call if_changed,ld) @$(check_for_bad_syms) diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c index 60d3b73..ba578f7 100644 --- a/arch/arm/kernel/armksyms.c +++ b/arch/arm/kernel/armksyms.c @@ -35,6 +35,8 @@ extern void __ucmpdi2(void); extern void __udivsi3(void); extern void __umodsi3(void); extern void __do_div64(void); +extern void __bswapsi2(void); +extern void __bswapdi2(void); extern void __aeabi_idiv(void); extern void __aeabi_idivmod(void); @@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2); EXPORT_SYMBOL(__udivsi3); EXPORT_SYMBOL(__umodsi3); EXPORT_SYMBOL(__do_div64); +EXPORT_SYMBOL(__bswapsi2); +EXPORT_SYMBOL(__bswapdi2); #ifdef CONFIG_AEABI EXPORT_SYMBOL(__aeabi_idiv); diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile index af72969..5383df7 100644 --- a/arch/arm/lib/Makefile +++ b/arch/arm/lib/Makefile @@ -13,7 +13,7 @@ lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \ ashldi3.o ashrdi3.o lshrdi3.o muldi3.o \ ucmpdi2.o lib1funcs.o div64.o \ io-readsb.o io-writesb.o io-readsl.o io-writesl.o \ - call_with_stack.o + call_with_stack.o bswapsdi2.o mmu-y := clear_user.o copy_page.o getuser.o putuser.o diff --git a/arch/arm/lib/bswapsdi2.S b/arch/arm/lib/bswapsdi2.S new file mode 100644 index 0000000..e9c8ca7 --- /dev/null +++ b/arch/arm/lib/bswapsdi2.S @@ -0,0 +1,36 @@ +#include <linux/linkage.h> + +#if __LINUX_ARM_ARCH__ >= 6 +ENTRY(__bswapsi2) + rev r0, r0 + bx lr +ENDPROC(__bswapsi2) + +ENTRY(__bswapdi2) + rev r3, r0 + rev r0, r1 + mov r1, r3 + bx lr +ENDPROC(__bswapdi2) +#else +ENTRY(__bswapsi2) + eor r3, r0, r0, ror #16 + lsr r3, r3, #8 + bic r3, r3, #65280 @ 0xff00 + eor r0, r3, r0, ror #8 + mov pc, lr +ENDPROC(__bswapsi2) + +ENTRY(__bswapdi2) + mov ip, r1 + eor r3, ip, ip, ror #16 + eor r1, r0, r0, ror #16 + lsr r1, r1, #8 + lsr r3, r3, #8 + bic r3, r3, #65280 @ 0xff00 + bic r1, r1, #65280 @ 0xff00 + eor r1, r1, r0, ror #8 + eor r0, r3, ip, ror #8 + mov pc, lr +ENDPROC(__bswapdi2) +#endif Thanks, Kim ^ permalink raw reply related [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-22 2:33 ` Kim Phillips @ 2013-02-22 3:40 ` Nicolas Pitre 2013-02-23 1:40 ` [PATCH v6] " Kim Phillips 0 siblings, 1 reply; 62+ messages in thread From: Nicolas Pitre @ 2013-02-22 3:40 UTC (permalink / raw) To: Kim Phillips Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Thu, 21 Feb 2013, Kim Phillips wrote: > Here's the asm version I'm working on now, based on compiler > output of the C version. Haven't tested beyond defconfig builds, > which pass ok. > > Is there anything I have to do for thumb mode? If so, how to test? You just need to pick a config that uses some ARMv7 processor, and enable CONFIG_THUMB2_KERNEL. I don't see any problem with your patch wrt Thumb2. Still, I have minor comments below. > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig > index dedf02b..e8a41d0 100644 > --- a/arch/arm/Kconfig > +++ b/arch/arm/Kconfig > @@ -59,6 +59,7 @@ config ARM > select CLONE_BACKWARDS > select OLD_SIGSUSPEND3 > select OLD_SIGACTION > + select ARCH_USE_BUILTIN_BSWAP > help > The ARM series is a line of low-power-consumption RISC chip designs > licensed by ARM Ltd and targeted at embedded applications and > diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile > index 5cad8a6..a277e97 100644 > --- a/arch/arm/boot/compressed/Makefile > +++ b/arch/arm/boot/compressed/Makefile > @@ -108,12 +108,12 @@ endif > > targets := vmlinux vmlinux.lds \ > piggy.$(suffix_y) piggy.$(suffix_y).o \ > - lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \ > + lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \ Should be both bswapsdi2.o bswapsdi2.S > font.o font.c head.o misc.o $(OBJS) > > # Make sure files are removed during clean > extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern \ > - lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs) > + lib1funcs.S ashldi3.S bswapsdi2.o $(libfdt) $(libfdt_hdrs) Should be bswapsdi2.S. > ifeq ($(CONFIG_FUNCTION_TRACER),y) > ORIG_CFLAGS := $(KBUILD_CFLAGS) > @@ -155,6 +155,12 @@ ashldi3 = $(obj)/ashldi3.o > $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S > $(call cmd,shipped) > > +# For __bswapsi2, __bswapdi2 > +bswapsdi2 = $(obj)/bswapsdi2.o > + > +$(obj)/bswapsdi2.S: $(srctree)/arch/$(SRCARCH)/lib/bswapsdi2.S > + $(call cmd,shipped) > + > # We need to prevent any GOTOFF relocs being used with references > # to symbols in the .bss section since we cannot relocate them > # independently from the rest at run time. This can be achieved by > @@ -176,7 +182,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \ > fi > > $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \ > - $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE > + $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \ > + $(bswapsdi2) FORCE > @$(check_for_multiple_zreladdr) > $(call if_changed,ld) > @$(check_for_bad_syms) > diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c > index 60d3b73..ba578f7 100644 > --- a/arch/arm/kernel/armksyms.c > +++ b/arch/arm/kernel/armksyms.c > @@ -35,6 +35,8 @@ extern void __ucmpdi2(void); > extern void __udivsi3(void); > extern void __umodsi3(void); > extern void __do_div64(void); > +extern void __bswapsi2(void); > +extern void __bswapdi2(void); > > extern void __aeabi_idiv(void); > extern void __aeabi_idivmod(void); > @@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2); > EXPORT_SYMBOL(__udivsi3); > EXPORT_SYMBOL(__umodsi3); > EXPORT_SYMBOL(__do_div64); > +EXPORT_SYMBOL(__bswapsi2); > +EXPORT_SYMBOL(__bswapdi2); > > #ifdef CONFIG_AEABI > EXPORT_SYMBOL(__aeabi_idiv); > diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile > index af72969..5383df7 100644 > --- a/arch/arm/lib/Makefile > +++ b/arch/arm/lib/Makefile > @@ -13,7 +13,7 @@ lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \ > ashldi3.o ashrdi3.o lshrdi3.o muldi3.o \ > ucmpdi2.o lib1funcs.o div64.o \ > io-readsb.o io-writesb.o io-readsl.o io-writesl.o \ > - call_with_stack.o > + call_with_stack.o bswapsdi2.o > > mmu-y := clear_user.o copy_page.o getuser.o putuser.o > > diff --git a/arch/arm/lib/bswapsdi2.S b/arch/arm/lib/bswapsdi2.S > new file mode 100644 > index 0000000..e9c8ca7 > --- /dev/null > +++ b/arch/arm/lib/bswapsdi2.S > @@ -0,0 +1,36 @@ > +#include <linux/linkage.h> > + > +#if __LINUX_ARM_ARCH__ >= 6 > +ENTRY(__bswapsi2) > + rev r0, r0 > + bx lr > +ENDPROC(__bswapsi2) > + > +ENTRY(__bswapdi2) > + rev r3, r0 > + rev r0, r1 > + mov r1, r3 > + bx lr > +ENDPROC(__bswapdi2) > +#else > +ENTRY(__bswapsi2) > + eor r3, r0, r0, ror #16 > + lsr r3, r3, #8 Some older binutils used with pre ARMv6 platforms don't understand the latest unified syntax. So in this case it is better to use: mov r3, r3, lsr #8 > + bic r3, r3, #65280 @ 0xff00 Please use #0xff00 directly rather than keeping it as a comment. > + eor r0, r3, r0, ror #8 > + mov pc, lr > +ENDPROC(__bswapsi2) > + > +ENTRY(__bswapdi2) > + mov ip, r1 > + eor r3, ip, ip, ror #16 > + eor r1, r0, r0, ror #16 > + lsr r1, r1, #8 > + lsr r3, r3, #8 > + bic r3, r3, #65280 @ 0xff00 > + bic r1, r1, #65280 @ 0xff00 Same comments for the 4 instructions above. > + eor r1, r1, r0, ror #8 > + eor r0, r3, ip, ror #8 > + mov pc, lr > +ENDPROC(__bswapdi2) > +#endif > > Thanks, > > Kim > ^ permalink raw reply [flat|nested] 62+ messages in thread
* [PATCH v6] arm: use built-in byte swap function 2013-02-22 3:40 ` Nicolas Pitre @ 2013-02-23 1:40 ` Kim Phillips 2013-02-23 2:40 ` Nicolas Pitre 2013-02-23 23:20 ` Woodhouse, David 0 siblings, 2 replies; 62+ messages in thread From: Kim Phillips @ 2013-02-23 1:40 UTC (permalink / raw) To: Nicolas Pitre Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Thu, 21 Feb 2013 22:40:08 -0500 Nicolas Pitre <nico@fluxnic.net> wrote: > On Thu, 21 Feb 2013, Kim Phillips wrote: > > > Here's the asm version I'm working on now, based on compiler > > output of the C version. Haven't tested beyond defconfig builds, > > which pass ok. > > > > Is there anything I have to do for thumb mode? If so, how to test? > > You just need to pick a config that uses some ARMv7 processor, and > enable CONFIG_THUMB2_KERNEL. I don't see any problem with your patch > wrt Thumb2. ok, I've addressed your comments and tested both pre-armv6 and armv6 + bswapsi2s on i.mx hardware with CONFIG_CC_OPTIMIZE_FOR_SIZE and CONFIG_THUMB2_KERNEL set: >From c22f4050174d8da71fdddba2cf67ae40c00ca5cc Mon Sep 17 00:00:00 2001 From: Kim Phillips <kim.phillips@freescale.com> Date: Tue, 19 Feb 2013 17:16:11 -0600 Subject: [PATCH] arm: use built-in byte swap function Enable the compiler intrinsic for byte swapping on arch ARM. This allows the compiler to detect and be able to optimize out byte swappings, and has a tiny benefit on vmlinux size (Linaro gcc 4.7.3): text data bss dec hex filename 2754100 121144 56520 2931764 2cbc34 vmlinux-lart #orig 2754050 121144 56520 2931714 2cbc02 vmlinux-lart #builtin-bswap 6282699 307852 5578076 12168627 b9adb3 vmlinux-mxs #orig 6282241 307832 5578076 12168149 b9abd5 vmlinux-mxs #builtin-bswap 7200193 364180 361748 7926121 78f169 vmlinux-imx_v6_v7 #orig 7199515 364188 361748 7925451 78eecb vmlinux-imx_v6_v7 #builtin-bswap Signed-off-by: Kim Phillips <kim.phillips@freescale.com> --- akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016 based on linux-next-20130221. changes from last diff: - addressed Nicolas' comments - updated commit text figures and reformatted as a patch changes from diff before that: - 1st asm version changes from diff before that: - enforce -O2 for bswapsdi2.o - fix building out-of-source tree changes from diff before that: - implement custom __bswap[sd]i2 in arch/arm/lib/bswapsdi2.c v5: re-work based on new gcc version test data: - moved outside armv6 protection - check for gcc 4.6+ demoted to gcc 4.5+ with: !defined(CONFIG_CC_OPTIMIZE_FOR_SIZE) v4: - undo v2-2's addition of ARCH_DEFINES_BUILTIN_BSWAP per Boris and David - object is to find arches that define _HAVE_BSWAP and clean it up in the future: patch is much less intrusive. :) v3: - moved out of uapi swab.h into arch/arm/include/asm/swab.h - moved ARCH_DEFINES_BUILTIN_BSWAP help text into commit message - moved GCC_VERSION >= 40800 ifdef into GCC_VERSION >= 40600 block v2: - at91 and lpd270 builds fixed by limiting to ARMv6 and above (i.e., ARM cores that have support for the 'rev' instruction). Otherwise, the compiler emits calls to libgcc's __bswapsi2 on these ARMv4/v5 builds (and arch ARM doesn't link with libgcc). All ARM defconfigs now have the same build status as they did without this patch (some are broken on linux-next). - move ARM check from generic compiler.h to arch ARM's swab.h. - pretty sure it should be limited to __KERNEL__ builds - add new ARCH_DEFINES_BUILTIN_BSWAP (see Kconfig help). - if set, generic compiler header does not set HAVE_BUILTIN_BSWAPxx - not too sure about this having to be a new CONFIG_, but it's hard to find a place for it given linux/compiler.h doesn't include any arch-specific files. - move new selects to end of CONFIG_ARM's Kconfig select list, as is done in David Woodhouse's original patchseries for ppc/x86. arch/arm/Kconfig | 1 + arch/arm/boot/compressed/Makefile | 15 +++++++++++---- arch/arm/kernel/armksyms.c | 4 ++++ arch/arm/lib/Makefile | 2 +- arch/arm/lib/bswapsdi2.S | 36 ++++++++++++++++++++++++++++++++++++ 5 files changed, 53 insertions(+), 5 deletions(-) create mode 100644 arch/arm/lib/bswapsdi2.S diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index dedf02b..e8a41d0 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -59,6 +59,7 @@ config ARM select CLONE_BACKWARDS select OLD_SIGSUSPEND3 select OLD_SIGACTION + select ARCH_USE_BUILTIN_BSWAP help The ARM series is a line of low-power-consumption RISC chip designs licensed by ARM Ltd and targeted at embedded applications and diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile index 5cad8a6..d9b5ee5 100644 --- a/arch/arm/boot/compressed/Makefile +++ b/arch/arm/boot/compressed/Makefile @@ -108,12 +108,12 @@ endif targets := vmlinux vmlinux.lds \ piggy.$(suffix_y) piggy.$(suffix_y).o \ - lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \ - font.o font.c head.o misc.o $(OBJS) + lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \ + bswapsdi2.S font.o font.c head.o misc.o $(OBJS) # Make sure files are removed during clean extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern \ - lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs) + lib1funcs.S ashldi3.S bswapsdi2.S $(libfdt) $(libfdt_hdrs) ifeq ($(CONFIG_FUNCTION_TRACER),y) ORIG_CFLAGS := $(KBUILD_CFLAGS) @@ -155,6 +155,12 @@ ashldi3 = $(obj)/ashldi3.o $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S $(call cmd,shipped) +# For __bswapsi2, __bswapdi2 +bswapsdi2 = $(obj)/bswapsdi2.o + +$(obj)/bswapsdi2.S: $(srctree)/arch/$(SRCARCH)/lib/bswapsdi2.S + $(call cmd,shipped) + # We need to prevent any GOTOFF relocs being used with references # to symbols in the .bss section since we cannot relocate them # independently from the rest at run time. This can be achieved by @@ -176,7 +182,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \ fi $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \ - $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE + $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \ + $(bswapsdi2) FORCE @$(check_for_multiple_zreladdr) $(call if_changed,ld) @$(check_for_bad_syms) diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c index 60d3b73..ba578f7 100644 --- a/arch/arm/kernel/armksyms.c +++ b/arch/arm/kernel/armksyms.c @@ -35,6 +35,8 @@ extern void __ucmpdi2(void); extern void __udivsi3(void); extern void __umodsi3(void); extern void __do_div64(void); +extern void __bswapsi2(void); +extern void __bswapdi2(void); extern void __aeabi_idiv(void); extern void __aeabi_idivmod(void); @@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2); EXPORT_SYMBOL(__udivsi3); EXPORT_SYMBOL(__umodsi3); EXPORT_SYMBOL(__do_div64); +EXPORT_SYMBOL(__bswapsi2); +EXPORT_SYMBOL(__bswapdi2); #ifdef CONFIG_AEABI EXPORT_SYMBOL(__aeabi_idiv); diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile index af72969..5383df7 100644 --- a/arch/arm/lib/Makefile +++ b/arch/arm/lib/Makefile @@ -13,7 +13,7 @@ lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \ ashldi3.o ashrdi3.o lshrdi3.o muldi3.o \ ucmpdi2.o lib1funcs.o div64.o \ io-readsb.o io-writesb.o io-readsl.o io-writesl.o \ - call_with_stack.o + call_with_stack.o bswapsdi2.o mmu-y := clear_user.o copy_page.o getuser.o putuser.o diff --git a/arch/arm/lib/bswapsdi2.S b/arch/arm/lib/bswapsdi2.S new file mode 100644 index 0000000..2ba43a0 --- /dev/null +++ b/arch/arm/lib/bswapsdi2.S @@ -0,0 +1,36 @@ +#include <linux/linkage.h> + +#if __LINUX_ARM_ARCH__ >= 6 +ENTRY(__bswapsi2) + rev r0, r0 + bx lr +ENDPROC(__bswapsi2) + +ENTRY(__bswapdi2) + rev r3, r0 + rev r0, r1 + mov r1, r3 + bx lr +ENDPROC(__bswapdi2) +#else +ENTRY(__bswapsi2) + eor r3, r0, r0, ror #16 + mov r3, r3, lsr #8 + bic r3, r3, #0xff00 + eor r0, r3, r0, ror #8 + mov pc, lr +ENDPROC(__bswapsi2) + +ENTRY(__bswapdi2) + mov ip, r1 + eor r3, ip, ip, ror #16 + eor r1, r0, r0, ror #16 + mov r1, r1, lsr #8 + mov r3, r3, lsr #8 + bic r3, r3, #0xff00 + bic r1, r1, #0xff00 + eor r1, r1, r0, ror #8 + eor r0, r3, ip, ror #8 + mov pc, lr +ENDPROC(__bswapdi2) +#endif -- 1.8.1.4 ^ permalink raw reply related [flat|nested] 62+ messages in thread
* Re: [PATCH v6] arm: use built-in byte swap function 2013-02-23 1:40 ` [PATCH v6] " Kim Phillips @ 2013-02-23 2:40 ` Nicolas Pitre 2013-02-23 23:20 ` Woodhouse, David 1 sibling, 0 replies; 62+ messages in thread From: Nicolas Pitre @ 2013-02-23 2:40 UTC (permalink / raw) To: Kim Phillips Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Fri, 22 Feb 2013, Kim Phillips wrote: > On Thu, 21 Feb 2013 22:40:08 -0500 > Nicolas Pitre <nico@fluxnic.net> wrote: > > > On Thu, 21 Feb 2013, Kim Phillips wrote: > > > > > Here's the asm version I'm working on now, based on compiler > > > output of the C version. Haven't tested beyond defconfig builds, > > > which pass ok. > > > > > > Is there anything I have to do for thumb mode? If so, how to test? > > > > You just need to pick a config that uses some ARMv7 processor, and > > enable CONFIG_THUMB2_KERNEL. I don't see any problem with your patch > > wrt Thumb2. > > ok, I've addressed your comments and tested both pre-armv6 and armv6 > + bswapsi2s on i.mx hardware with CONFIG_CC_OPTIMIZE_FOR_SIZE and > CONFIG_THUMB2_KERNEL set: > > >From c22f4050174d8da71fdddba2cf67ae40c00ca5cc Mon Sep 17 00:00:00 2001 > From: Kim Phillips <kim.phillips@freescale.com> > Date: Tue, 19 Feb 2013 17:16:11 -0600 > Subject: [PATCH] arm: use built-in byte swap function > > Enable the compiler intrinsic for byte swapping on arch ARM. This > allows the compiler to detect and be able to optimize out byte > swappings, and has a tiny benefit on vmlinux size (Linaro gcc 4.7.3): > > text data bss dec hex filename > 2754100 121144 56520 2931764 2cbc34 vmlinux-lart #orig > 2754050 121144 56520 2931714 2cbc02 vmlinux-lart #builtin-bswap > 6282699 307852 5578076 12168627 b9adb3 vmlinux-mxs #orig > 6282241 307832 5578076 12168149 b9abd5 vmlinux-mxs #builtin-bswap > 7200193 364180 361748 7926121 78f169 vmlinux-imx_v6_v7 #orig > 7199515 364188 361748 7925451 78eecb vmlinux-imx_v6_v7 #builtin-bswap > > Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Reviewed-by: Nicolas Pitre <nico@linaro.org> > --- > akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016 > > based on linux-next-20130221. > > changes from last diff: > - addressed Nicolas' comments > - updated commit text figures and reformatted as a patch > > changes from diff before that: > - 1st asm version > > changes from diff before that: > - enforce -O2 for bswapsdi2.o > - fix building out-of-source tree > > changes from diff before that: > - implement custom __bswap[sd]i2 in arch/arm/lib/bswapsdi2.c > > v5: re-work based on new gcc version test data: > - moved outside armv6 protection > - check for gcc 4.6+ demoted to gcc 4.5+ with: > !defined(CONFIG_CC_OPTIMIZE_FOR_SIZE) > > v4: > - undo v2-2's addition of ARCH_DEFINES_BUILTIN_BSWAP per Boris > and David - object is to find arches that define _HAVE_BSWAP > and clean it up in the future: patch is much less intrusive. :) > > v3: > - moved out of uapi swab.h into arch/arm/include/asm/swab.h > - moved ARCH_DEFINES_BUILTIN_BSWAP help text into commit message > - moved GCC_VERSION >= 40800 ifdef into GCC_VERSION >= 40600 block > > v2: > - at91 and lpd270 builds fixed by limiting to ARMv6 and above > (i.e., ARM cores that have support for the 'rev' instruction). > Otherwise, the compiler emits calls to libgcc's __bswapsi2 on > these ARMv4/v5 builds (and arch ARM doesn't link with libgcc). > All ARM defconfigs now have the same build status as they did > without this patch (some are broken on linux-next). > > - move ARM check from generic compiler.h to arch ARM's swab.h. > - pretty sure it should be limited to __KERNEL__ builds > > - add new ARCH_DEFINES_BUILTIN_BSWAP (see Kconfig help). > - if set, generic compiler header does not set HAVE_BUILTIN_BSWAPxx > - not too sure about this having to be a new CONFIG_, but it's hard > to find a place for it given linux/compiler.h doesn't include any > arch-specific files. > > - move new selects to end of CONFIG_ARM's Kconfig select list, > as is done in David Woodhouse's original patchseries for ppc/x86. > > arch/arm/Kconfig | 1 + > arch/arm/boot/compressed/Makefile | 15 +++++++++++---- > arch/arm/kernel/armksyms.c | 4 ++++ > arch/arm/lib/Makefile | 2 +- > arch/arm/lib/bswapsdi2.S | 36 ++++++++++++++++++++++++++++++++++++ > 5 files changed, 53 insertions(+), 5 deletions(-) > create mode 100644 arch/arm/lib/bswapsdi2.S > > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig > index dedf02b..e8a41d0 100644 > --- a/arch/arm/Kconfig > +++ b/arch/arm/Kconfig > @@ -59,6 +59,7 @@ config ARM > select CLONE_BACKWARDS > select OLD_SIGSUSPEND3 > select OLD_SIGACTION > + select ARCH_USE_BUILTIN_BSWAP > help > The ARM series is a line of low-power-consumption RISC chip designs > licensed by ARM Ltd and targeted at embedded applications and > diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile > index 5cad8a6..d9b5ee5 100644 > --- a/arch/arm/boot/compressed/Makefile > +++ b/arch/arm/boot/compressed/Makefile > @@ -108,12 +108,12 @@ endif > > targets := vmlinux vmlinux.lds \ > piggy.$(suffix_y) piggy.$(suffix_y).o \ > - lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \ > - font.o font.c head.o misc.o $(OBJS) > + lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \ > + bswapsdi2.S font.o font.c head.o misc.o $(OBJS) > > # Make sure files are removed during clean > extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern \ > - lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs) > + lib1funcs.S ashldi3.S bswapsdi2.S $(libfdt) $(libfdt_hdrs) > > ifeq ($(CONFIG_FUNCTION_TRACER),y) > ORIG_CFLAGS := $(KBUILD_CFLAGS) > @@ -155,6 +155,12 @@ ashldi3 = $(obj)/ashldi3.o > $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S > $(call cmd,shipped) > > +# For __bswapsi2, __bswapdi2 > +bswapsdi2 = $(obj)/bswapsdi2.o > + > +$(obj)/bswapsdi2.S: $(srctree)/arch/$(SRCARCH)/lib/bswapsdi2.S > + $(call cmd,shipped) > + > # We need to prevent any GOTOFF relocs being used with references > # to symbols in the .bss section since we cannot relocate them > # independently from the rest at run time. This can be achieved by > @@ -176,7 +182,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \ > fi > > $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \ > - $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE > + $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \ > + $(bswapsdi2) FORCE > @$(check_for_multiple_zreladdr) > $(call if_changed,ld) > @$(check_for_bad_syms) > diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c > index 60d3b73..ba578f7 100644 > --- a/arch/arm/kernel/armksyms.c > +++ b/arch/arm/kernel/armksyms.c > @@ -35,6 +35,8 @@ extern void __ucmpdi2(void); > extern void __udivsi3(void); > extern void __umodsi3(void); > extern void __do_div64(void); > +extern void __bswapsi2(void); > +extern void __bswapdi2(void); > > extern void __aeabi_idiv(void); > extern void __aeabi_idivmod(void); > @@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2); > EXPORT_SYMBOL(__udivsi3); > EXPORT_SYMBOL(__umodsi3); > EXPORT_SYMBOL(__do_div64); > +EXPORT_SYMBOL(__bswapsi2); > +EXPORT_SYMBOL(__bswapdi2); > > #ifdef CONFIG_AEABI > EXPORT_SYMBOL(__aeabi_idiv); > diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile > index af72969..5383df7 100644 > --- a/arch/arm/lib/Makefile > +++ b/arch/arm/lib/Makefile > @@ -13,7 +13,7 @@ lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \ > ashldi3.o ashrdi3.o lshrdi3.o muldi3.o \ > ucmpdi2.o lib1funcs.o div64.o \ > io-readsb.o io-writesb.o io-readsl.o io-writesl.o \ > - call_with_stack.o > + call_with_stack.o bswapsdi2.o > > mmu-y := clear_user.o copy_page.o getuser.o putuser.o > > diff --git a/arch/arm/lib/bswapsdi2.S b/arch/arm/lib/bswapsdi2.S > new file mode 100644 > index 0000000..2ba43a0 > --- /dev/null > +++ b/arch/arm/lib/bswapsdi2.S > @@ -0,0 +1,36 @@ > +#include <linux/linkage.h> > + > +#if __LINUX_ARM_ARCH__ >= 6 > +ENTRY(__bswapsi2) > + rev r0, r0 > + bx lr > +ENDPROC(__bswapsi2) > + > +ENTRY(__bswapdi2) > + rev r3, r0 > + rev r0, r1 > + mov r1, r3 > + bx lr > +ENDPROC(__bswapdi2) > +#else > +ENTRY(__bswapsi2) > + eor r3, r0, r0, ror #16 > + mov r3, r3, lsr #8 > + bic r3, r3, #0xff00 > + eor r0, r3, r0, ror #8 > + mov pc, lr > +ENDPROC(__bswapsi2) > + > +ENTRY(__bswapdi2) > + mov ip, r1 > + eor r3, ip, ip, ror #16 > + eor r1, r0, r0, ror #16 > + mov r1, r1, lsr #8 > + mov r3, r3, lsr #8 > + bic r3, r3, #0xff00 > + bic r1, r1, #0xff00 > + eor r1, r1, r0, ror #8 > + eor r0, r3, ip, ror #8 > + mov pc, lr > +ENDPROC(__bswapdi2) > +#endif > -- > 1.8.1.4 > > ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v6] arm: use built-in byte swap function 2013-02-23 1:40 ` [PATCH v6] " Kim Phillips 2013-02-23 2:40 ` Nicolas Pitre @ 2013-02-23 23:20 ` Woodhouse, David 2013-05-23 16:46 ` [PATCH v7] " Kim Phillips 1 sibling, 1 reply; 62+ messages in thread From: Woodhouse, David @ 2013-02-23 23:20 UTC (permalink / raw) To: Kim Phillips Cc: Nicolas Pitre, Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring [-- Attachment #1: Type: text/plain, Size: 1095 bytes --] On Fri, 2013-02-22 at 19:40 -0600, Kim Phillips wrote: > Enable the compiler intrinsic for byte swapping on arch ARM. This > allows the compiler to detect and be able to optimize out byte > swappings, and has a tiny benefit on vmlinux size (Linaro gcc 4.7.3): > > text data bss dec hex filename > 2754100 121144 56520 2931764 2cbc34 vmlinux-lart #orig > 2754050 121144 56520 2931714 2cbc02 vmlinux-lart #builtin-bswap > 6282699 307852 5578076 12168627 b9adb3 vmlinux-mxs #orig > 6282241 307832 5578076 12168149 b9abd5 vmlinux-mxs #builtin-bswap > 7200193 364180 361748 7926121 78f169 vmlinux-imx_v6_v7 #orig > 7199515 364188 361748 7925451 78eecb vmlinux-imx_v6_v7 #builtin-bswap > > Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Looks good, thanks. Acked-by: David Woodhouse <David.Woodhouse@intel.com> -- Sent with MeeGo's ActiveSync support. David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 4370 bytes --] ^ permalink raw reply [flat|nested] 62+ messages in thread
* [PATCH v7] arm: use built-in byte swap function 2013-02-23 23:20 ` Woodhouse, David @ 2013-05-23 16:46 ` Kim Phillips 2013-05-23 20:09 ` Nicolas Pitre ` (2 more replies) 0 siblings, 3 replies; 62+ messages in thread From: Kim Phillips @ 2013-05-23 16:46 UTC (permalink / raw) To: Woodhouse, David, Russell King - ARM Linux Cc: Nicolas Pitre, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring Enable the compiler intrinsic for byte swapping on arch ARM. This allows the compiler to detect and be able to optimize out byte swappings, and has a very modest benefit on vmlinux size (Linaro gcc 4.8): text data bss dec hex filename 2840310 123932 61960 3026202 2e2d1a vmlinux-lart #orig 2840152 123932 61960 3026044 2e2c7c vmlinux-lart #builtin-bswap 6473120 314840 5616016 12403976 bd4508 vmlinux-mxs #orig 6472586 314848 5616016 12403450 bd42fa vmlinux-mxs #builtin-bswap 7419872 318372 379556 8117800 7bde28 vmlinux-imx_v6_v7 #orig 7419170 318364 379556 8117090 7bdb62 vmlinux-imx_v6_v7 #builtin-bswap Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Reviewed-by: Nicolas Pitre <nico@linaro.org> Acked-by: David Woodhouse <David.Woodhouse@intel.com> --- resending as v6 appears to have fallen though the cracks. Russell? v7: rebased onto next-20130521, re-ran above vmlinux sizes with Linaro gcc 4.8, added Nicolas' Reviewed-by, and David's Acked-by. v6 and prior version information: https://lkml.org/lkml/2013/2/22/475 arch/arm/Kconfig | 1 + arch/arm/boot/compressed/Makefile | 15 +++++++++++---- arch/arm/kernel/armksyms.c | 4 ++++ arch/arm/lib/Makefile | 2 +- arch/arm/lib/bswapsdi2.S | 36 ++++++++++++++++++++++++++++++++++++ 5 files changed, 53 insertions(+), 5 deletions(-) create mode 100644 arch/arm/lib/bswapsdi2.S diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index a7fc5ea..c2fe04d 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -63,6 +63,7 @@ config ARM select OLD_SIGSUSPEND3 select OLD_SIGACTION select HAVE_CONTEXT_TRACKING + select ARCH_USE_BUILTIN_BSWAP help The ARM series is a line of low-power-consumption RISC chip designs licensed by ARM Ltd and targeted at embedded applications and diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile index 198a4ad..bd8a176 100644 --- a/arch/arm/boot/compressed/Makefile +++ b/arch/arm/boot/compressed/Makefile @@ -112,12 +112,12 @@ endif targets := vmlinux vmlinux.lds \ piggy.$(suffix_y) piggy.$(suffix_y).o \ - lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \ - font.o font.c head.o misc.o $(OBJS) + lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \ + bswapsdi2.S font.o font.c head.o misc.o $(OBJS) # Make sure files are removed during clean extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern piggy.lz4 \ - lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs) + lib1funcs.S ashldi3.S bswapsdi2.S $(libfdt) $(libfdt_hdrs) ifeq ($(CONFIG_FUNCTION_TRACER),y) ORIG_CFLAGS := $(KBUILD_CFLAGS) @@ -159,6 +159,12 @@ ashldi3 = $(obj)/ashldi3.o $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S $(call cmd,shipped) +# For __bswapsi2, __bswapdi2 +bswapsdi2 = $(obj)/bswapsdi2.o + +$(obj)/bswapsdi2.S: $(srctree)/arch/$(SRCARCH)/lib/bswapsdi2.S + $(call cmd,shipped) + # We need to prevent any GOTOFF relocs being used with references # to symbols in the .bss section since we cannot relocate them # independently from the rest at run time. This can be achieved by @@ -180,7 +186,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \ fi $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \ - $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE + $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \ + $(bswapsdi2) FORCE @$(check_for_multiple_zreladdr) $(call if_changed,ld) @$(check_for_bad_syms) diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c index 60d3b73..ba578f7 100644 --- a/arch/arm/kernel/armksyms.c +++ b/arch/arm/kernel/armksyms.c @@ -35,6 +35,8 @@ extern void __ucmpdi2(void); extern void __udivsi3(void); extern void __umodsi3(void); extern void __do_div64(void); +extern void __bswapsi2(void); +extern void __bswapdi2(void); extern void __aeabi_idiv(void); extern void __aeabi_idivmod(void); @@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2); EXPORT_SYMBOL(__udivsi3); EXPORT_SYMBOL(__umodsi3); EXPORT_SYMBOL(__do_div64); +EXPORT_SYMBOL(__bswapsi2); +EXPORT_SYMBOL(__bswapdi2); #ifdef CONFIG_AEABI EXPORT_SYMBOL(__aeabi_idiv); diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile index af72969..5383df7 100644 --- a/arch/arm/lib/Makefile +++ b/arch/arm/lib/Makefile @@ -13,7 +13,7 @@ lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \ ashldi3.o ashrdi3.o lshrdi3.o muldi3.o \ ucmpdi2.o lib1funcs.o div64.o \ io-readsb.o io-writesb.o io-readsl.o io-writesl.o \ - call_with_stack.o + call_with_stack.o bswapsdi2.o mmu-y := clear_user.o copy_page.o getuser.o putuser.o diff --git a/arch/arm/lib/bswapsdi2.S b/arch/arm/lib/bswapsdi2.S new file mode 100644 index 0000000..2ba43a0 --- /dev/null +++ b/arch/arm/lib/bswapsdi2.S @@ -0,0 +1,36 @@ +#include <linux/linkage.h> + +#if __LINUX_ARM_ARCH__ >= 6 +ENTRY(__bswapsi2) + rev r0, r0 + bx lr +ENDPROC(__bswapsi2) + +ENTRY(__bswapdi2) + rev r3, r0 + rev r0, r1 + mov r1, r3 + bx lr +ENDPROC(__bswapdi2) +#else +ENTRY(__bswapsi2) + eor r3, r0, r0, ror #16 + mov r3, r3, lsr #8 + bic r3, r3, #0xff00 + eor r0, r3, r0, ror #8 + mov pc, lr +ENDPROC(__bswapsi2) + +ENTRY(__bswapdi2) + mov ip, r1 + eor r3, ip, ip, ror #16 + eor r1, r0, r0, ror #16 + mov r1, r1, lsr #8 + mov r3, r3, lsr #8 + bic r3, r3, #0xff00 + bic r1, r1, #0xff00 + eor r1, r1, r0, ror #8 + eor r0, r3, ip, ror #8 + mov pc, lr +ENDPROC(__bswapdi2) +#endif -- 1.8.1.5 ^ permalink raw reply related [flat|nested] 62+ messages in thread
* Re: [PATCH v7] arm: use built-in byte swap function 2013-05-23 16:46 ` [PATCH v7] " Kim Phillips @ 2013-05-23 20:09 ` Nicolas Pitre 2013-05-23 23:13 ` Russell King - ARM Linux 2013-10-27 2:41 ` Nicolas Pitre 2 siblings, 0 replies; 62+ messages in thread From: Nicolas Pitre @ 2013-05-23 20:09 UTC (permalink / raw) To: Kim Phillips Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Thu, 23 May 2013, Kim Phillips wrote: > Enable the compiler intrinsic for byte swapping on arch ARM. This > allows the compiler to detect and be able to optimize out byte > swappings, and has a very modest benefit on vmlinux size (Linaro gcc > 4.8): > > text data bss dec hex filename > 2840310 123932 61960 3026202 2e2d1a vmlinux-lart #orig > 2840152 123932 61960 3026044 2e2c7c vmlinux-lart #builtin-bswap > > 6473120 314840 5616016 12403976 bd4508 vmlinux-mxs #orig > 6472586 314848 5616016 12403450 bd42fa vmlinux-mxs #builtin-bswap > > 7419872 318372 379556 8117800 7bde28 vmlinux-imx_v6_v7 #orig > 7419170 318364 379556 8117090 7bdb62 vmlinux-imx_v6_v7 #builtin-bswap > > Signed-off-by: Kim Phillips <kim.phillips@freescale.com> > Reviewed-by: Nicolas Pitre <nico@linaro.org> > Acked-by: David Woodhouse <David.Woodhouse@intel.com> > --- > resending as v6 appears to have fallen though the cracks. Russell? Please send your patch to Russell's patch system: http://www.arm.linux.org.uk/developer/patches/ Nicolas ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v7] arm: use built-in byte swap function 2013-05-23 16:46 ` [PATCH v7] " Kim Phillips 2013-05-23 20:09 ` Nicolas Pitre @ 2013-05-23 23:13 ` Russell King - ARM Linux 2013-06-06 22:12 ` Russell King - ARM Linux 2013-10-27 2:41 ` Nicolas Pitre 2 siblings, 1 reply; 62+ messages in thread From: Russell King - ARM Linux @ 2013-05-23 23:13 UTC (permalink / raw) To: Kim Phillips Cc: Woodhouse, David, Nicolas Pitre, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Thu, May 23, 2013 at 11:46:54AM -0500, Kim Phillips wrote: > Enable the compiler intrinsic for byte swapping on arch ARM. This > allows the compiler to detect and be able to optimize out byte > swappings, and has a very modest benefit on vmlinux size (Linaro gcc > 4.8): > > text data bss dec hex filename > 2840310 123932 61960 3026202 2e2d1a vmlinux-lart #orig > 2840152 123932 61960 3026044 2e2c7c vmlinux-lart #builtin-bswap > > 6473120 314840 5616016 12403976 bd4508 vmlinux-mxs #orig > 6472586 314848 5616016 12403450 bd42fa vmlinux-mxs #builtin-bswap > > 7419872 318372 379556 8117800 7bde28 vmlinux-imx_v6_v7 #orig > 7419170 318364 379556 8117090 7bdb62 vmlinux-imx_v6_v7 #builtin-bswap > > Signed-off-by: Kim Phillips <kim.phillips@freescale.com> > Reviewed-by: Nicolas Pitre <nico@linaro.org> > Acked-by: David Woodhouse <David.Woodhouse@intel.com> > --- > resending as v6 appears to have fallen though the cracks. Russell? Please put it in the patch system (otherwise I do drop patches.) ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v7] arm: use built-in byte swap function 2013-05-23 23:13 ` Russell King - ARM Linux @ 2013-06-06 22:12 ` Russell King - ARM Linux 2013-06-06 22:23 ` Borislav Petkov 2013-06-07 0:03 ` Stephen Rothwell 0 siblings, 2 replies; 62+ messages in thread From: Russell King - ARM Linux @ 2013-06-06 22:12 UTC (permalink / raw) To: Kim Phillips, Arnd Bergmann, Stephen Rothwell Cc: Woodhouse, David, Nicolas Pitre, Rusty Russell, linux-kernel, Daniel Santos, Borislav Petkov, David Rientjes, Andrew Morton, linux-arm-kernel On Fri, May 24, 2013 at 12:13:36AM +0100, Russell King - ARM Linux wrote: > On Thu, May 23, 2013 at 11:46:54AM -0500, Kim Phillips wrote: > > Enable the compiler intrinsic for byte swapping on arch ARM. This > > allows the compiler to detect and be able to optimize out byte > > swappings, and has a very modest benefit on vmlinux size (Linaro gcc > > 4.8): > > > > text data bss dec hex filename > > 2840310 123932 61960 3026202 2e2d1a vmlinux-lart #orig > > 2840152 123932 61960 3026044 2e2c7c vmlinux-lart #builtin-bswap > > > > 6473120 314840 5616016 12403976 bd4508 vmlinux-mxs #orig > > 6472586 314848 5616016 12403450 bd42fa vmlinux-mxs #builtin-bswap > > > > 7419872 318372 379556 8117800 7bde28 vmlinux-imx_v6_v7 #orig > > 7419170 318364 379556 8117090 7bdb62 vmlinux-imx_v6_v7 #builtin-bswap > > > > Signed-off-by: Kim Phillips <kim.phillips@freescale.com> > > Reviewed-by: Nicolas Pitre <nico@linaro.org> > > Acked-by: David Woodhouse <David.Woodhouse@intel.com> > > --- > > resending as v6 appears to have fallen though the cracks. Russell? > > Please put it in the patch system (otherwise I do drop patches.) (Added Arnd/SFR in case they have comments.) So, we have a problem here - the kind which appears when people stuff things into the -next tree which aren't destined for the next merge window. This is the relevant context from your patch, which is against linux-next: - lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \ - font.o font.c head.o misc.o $(OBJS) + lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \ + bswapsdi2.S font.o font.c head.o misc.o $(OBJS) # Make sure files are removed during clean extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern piggy.lz4 \ ^^^^^^^^^ - lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs) + lib1funcs.S ashldi3.S bswapsdi2.S $(libfdt) $(libfdt_hdrs) the underlined bit - piggy.lz4 for those who read mail with proportional fonts. That is not in any kernel I have, and if it _is_ something that is destined for the next merge window, it should be in my tree as it's a core ARM feature, not in some random other tree. Short of hand-editing and manually applying the patch, a solution would be to rebase it on a mainline kernel version, like -rc4, and resubmit that version instead. That will ultimately then give sfr a conflict which should be trivial to resolve - and hopefully we'll find out who's carrying the LZ4 patch and putting it into linux-next. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v7] arm: use built-in byte swap function 2013-06-06 22:12 ` Russell King - ARM Linux @ 2013-06-06 22:23 ` Borislav Petkov 2013-06-07 0:03 ` Stephen Rothwell 1 sibling, 0 replies; 62+ messages in thread From: Borislav Petkov @ 2013-06-06 22:23 UTC (permalink / raw) To: Russell King - ARM Linux Cc: Kim Phillips, Arnd Bergmann, Stephen Rothwell, Woodhouse, David, Nicolas Pitre, Rusty Russell, linux-kernel, Daniel Santos, David Rientjes, Andrew Morton, linux-arm-kernel On Thu, Jun 06, 2013 at 11:12:34PM +0100, Russell King - ARM Linux wrote: > That will ultimately then give sfr a conflict which should be trivial > to resolve - and hopefully we'll find out who's carrying the LZ4 patch > and putting it into linux-next. That should be akpm: http://ozlabs.org/~akpm/mmotm/broken-out/arm-add-support-for-lz4-compressed-kernel.patch AFAICT. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v7] arm: use built-in byte swap function 2013-06-06 22:12 ` Russell King - ARM Linux 2013-06-06 22:23 ` Borislav Petkov @ 2013-06-07 0:03 ` Stephen Rothwell 1 sibling, 0 replies; 62+ messages in thread From: Stephen Rothwell @ 2013-06-07 0:03 UTC (permalink / raw) To: Russell King - ARM Linux Cc: Kim Phillips, Arnd Bergmann, Woodhouse, David, Nicolas Pitre, Rusty Russell, linux-kernel, Daniel Santos, Borislav Petkov, David Rientjes, Andrew Morton, linux-arm-kernel, Kyungsik Lee [-- Attachment #1: Type: text/plain, Size: 2386 bytes --] Hi Russell, On Thu, 6 Jun 2013 23:12:34 +0100 Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > > So, we have a problem here - the kind which appears when people stuff > things into the -next tree which aren't destined for the next merge > window. This is the relevant context from your patch, which is > against linux-next: > > - lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \ > - font.o font.c head.o misc.o $(OBJS) > + lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \ > + bswapsdi2.S font.o font.c head.o misc.o $(OBJS) > > # Make sure files are removed during clean > extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern piggy.lz4 \ > ^^^^^^^^^ > - lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs) > + lib1funcs.S ashldi3.S bswapsdi2.S $(libfdt) $(libfdt_hdrs) > > the underlined bit - piggy.lz4 for those who read mail with proportional > fonts. > > That is not in any kernel I have, and if it _is_ something that is > destined for the next merge window, it should be in my tree as it's > a core ARM feature, not in some random other tree. That is commit d8a6bf1b25bd ("arm: add support for LZ4-compressed kernel") from next-20130606 from the akpm tree. (adding author cc) That patch was cc'd to you, and is part of a series that adds LZ4 compression to the kernel, so would not work on its own. The first patch in the series is "decompressor: add LZ4 decompressor module". > Short of hand-editing and manually applying the patch, a solution would > be to rebase it on a mainline kernel version, like -rc4, and resubmit > that version instead. That will ultimately then give sfr a conflict > which should be trivial to resolve - and hopefully we'll find out who's > carrying the LZ4 patch and putting it into linux-next. People should *never, ever* submit patches based on linux-next (unless, of course they are to me to help fix merge conflicts in linux-next, etc). Patches submitted to a particular maintainer should be based on (an ancestor of) that maintainer's current tree. Sure, test new code before and after merging linux-next, but don;t base new code on it. -- Cheers, Stephen Rothwell sfr@canb.auug.org.au [-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v7] arm: use built-in byte swap function 2013-05-23 16:46 ` [PATCH v7] " Kim Phillips 2013-05-23 20:09 ` Nicolas Pitre 2013-05-23 23:13 ` Russell King - ARM Linux @ 2013-10-27 2:41 ` Nicolas Pitre 2013-11-05 21:45 ` Kim Phillips 2 siblings, 1 reply; 62+ messages in thread From: Nicolas Pitre @ 2013-10-27 2:41 UTC (permalink / raw) To: Kim Phillips Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Thu, 23 May 2013, Kim Phillips wrote: > Enable the compiler intrinsic for byte swapping on arch ARM. This > allows the compiler to detect and be able to optimize out byte > swappings, and has a very modest benefit on vmlinux size (Linaro gcc > 4.8): > > text data bss dec hex filename > 2840310 123932 61960 3026202 2e2d1a vmlinux-lart #orig > 2840152 123932 61960 3026044 2e2c7c vmlinux-lart #builtin-bswap > > 6473120 314840 5616016 12403976 bd4508 vmlinux-mxs #orig > 6472586 314848 5616016 12403450 bd42fa vmlinux-mxs #builtin-bswap > > 7419872 318372 379556 8117800 7bde28 vmlinux-imx_v6_v7 #orig > 7419170 318364 379556 8117090 7bdb62 vmlinux-imx_v6_v7 #builtin-bswap > > Signed-off-by: Kim Phillips <kim.phillips@freescale.com> > Reviewed-by: Nicolas Pitre <nico@linaro.org> > Acked-by: David Woodhouse <David.Woodhouse@intel.com> Did this ever go somewhere? Russell suggested at the time to base it against a mainline kernel (since it was patching files which apparently were already patched with out-of-tree lz4 patches) and send it to his patch system. > --- > resending as v6 appears to have fallen though the cracks. Russell? > > v7: rebased onto next-20130521, re-ran above vmlinux sizes with > Linaro gcc 4.8, added Nicolas' Reviewed-by, and David's Acked-by. > v6 and prior version information: > https://lkml.org/lkml/2013/2/22/475 > > arch/arm/Kconfig | 1 + > arch/arm/boot/compressed/Makefile | 15 +++++++++++---- > arch/arm/kernel/armksyms.c | 4 ++++ > arch/arm/lib/Makefile | 2 +- > arch/arm/lib/bswapsdi2.S | 36 ++++++++++++++++++++++++++++++++++++ > 5 files changed, 53 insertions(+), 5 deletions(-) > create mode 100644 arch/arm/lib/bswapsdi2.S > > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig > index a7fc5ea..c2fe04d 100644 > --- a/arch/arm/Kconfig > +++ b/arch/arm/Kconfig > @@ -63,6 +63,7 @@ config ARM > select OLD_SIGSUSPEND3 > select OLD_SIGACTION > select HAVE_CONTEXT_TRACKING > + select ARCH_USE_BUILTIN_BSWAP > help > The ARM series is a line of low-power-consumption RISC chip designs > licensed by ARM Ltd and targeted at embedded applications and > diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile > index 198a4ad..bd8a176 100644 > --- a/arch/arm/boot/compressed/Makefile > +++ b/arch/arm/boot/compressed/Makefile > @@ -112,12 +112,12 @@ endif > > targets := vmlinux vmlinux.lds \ > piggy.$(suffix_y) piggy.$(suffix_y).o \ > - lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \ > - font.o font.c head.o misc.o $(OBJS) > + lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \ > + bswapsdi2.S font.o font.c head.o misc.o $(OBJS) > > # Make sure files are removed during clean > extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern piggy.lz4 \ > - lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs) > + lib1funcs.S ashldi3.S bswapsdi2.S $(libfdt) $(libfdt_hdrs) > > ifeq ($(CONFIG_FUNCTION_TRACER),y) > ORIG_CFLAGS := $(KBUILD_CFLAGS) > @@ -159,6 +159,12 @@ ashldi3 = $(obj)/ashldi3.o > $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S > $(call cmd,shipped) > > +# For __bswapsi2, __bswapdi2 > +bswapsdi2 = $(obj)/bswapsdi2.o > + > +$(obj)/bswapsdi2.S: $(srctree)/arch/$(SRCARCH)/lib/bswapsdi2.S > + $(call cmd,shipped) > + > # We need to prevent any GOTOFF relocs being used with references > # to symbols in the .bss section since we cannot relocate them > # independently from the rest at run time. This can be achieved by > @@ -180,7 +186,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \ > fi > > $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \ > - $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE > + $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \ > + $(bswapsdi2) FORCE > @$(check_for_multiple_zreladdr) > $(call if_changed,ld) > @$(check_for_bad_syms) > diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c > index 60d3b73..ba578f7 100644 > --- a/arch/arm/kernel/armksyms.c > +++ b/arch/arm/kernel/armksyms.c > @@ -35,6 +35,8 @@ extern void __ucmpdi2(void); > extern void __udivsi3(void); > extern void __umodsi3(void); > extern void __do_div64(void); > +extern void __bswapsi2(void); > +extern void __bswapdi2(void); > > extern void __aeabi_idiv(void); > extern void __aeabi_idivmod(void); > @@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2); > EXPORT_SYMBOL(__udivsi3); > EXPORT_SYMBOL(__umodsi3); > EXPORT_SYMBOL(__do_div64); > +EXPORT_SYMBOL(__bswapsi2); > +EXPORT_SYMBOL(__bswapdi2); > > #ifdef CONFIG_AEABI > EXPORT_SYMBOL(__aeabi_idiv); > diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile > index af72969..5383df7 100644 > --- a/arch/arm/lib/Makefile > +++ b/arch/arm/lib/Makefile > @@ -13,7 +13,7 @@ lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \ > ashldi3.o ashrdi3.o lshrdi3.o muldi3.o \ > ucmpdi2.o lib1funcs.o div64.o \ > io-readsb.o io-writesb.o io-readsl.o io-writesl.o \ > - call_with_stack.o > + call_with_stack.o bswapsdi2.o > > mmu-y := clear_user.o copy_page.o getuser.o putuser.o > > diff --git a/arch/arm/lib/bswapsdi2.S b/arch/arm/lib/bswapsdi2.S > new file mode 100644 > index 0000000..2ba43a0 > --- /dev/null > +++ b/arch/arm/lib/bswapsdi2.S > @@ -0,0 +1,36 @@ > +#include <linux/linkage.h> > + > +#if __LINUX_ARM_ARCH__ >= 6 > +ENTRY(__bswapsi2) > + rev r0, r0 > + bx lr > +ENDPROC(__bswapsi2) > + > +ENTRY(__bswapdi2) > + rev r3, r0 > + rev r0, r1 > + mov r1, r3 > + bx lr > +ENDPROC(__bswapdi2) > +#else > +ENTRY(__bswapsi2) > + eor r3, r0, r0, ror #16 > + mov r3, r3, lsr #8 > + bic r3, r3, #0xff00 > + eor r0, r3, r0, ror #8 > + mov pc, lr > +ENDPROC(__bswapsi2) > + > +ENTRY(__bswapdi2) > + mov ip, r1 > + eor r3, ip, ip, ror #16 > + eor r1, r0, r0, ror #16 > + mov r1, r1, lsr #8 > + mov r3, r3, lsr #8 > + bic r3, r3, #0xff00 > + bic r1, r1, #0xff00 > + eor r1, r1, r0, ror #8 > + eor r0, r3, ip, ror #8 > + mov pc, lr > +ENDPROC(__bswapdi2) > +#endif > -- > 1.8.1.5 > ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH v7] arm: use built-in byte swap function 2013-10-27 2:41 ` Nicolas Pitre @ 2013-11-05 21:45 ` Kim Phillips 0 siblings, 0 replies; 62+ messages in thread From: Kim Phillips @ 2013-11-05 21:45 UTC (permalink / raw) To: Nicolas Pitre Cc: Woodhouse, David, Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring On Sat, 26 Oct 2013 22:41:34 -0400 Nicolas Pitre <nico@fluxnic.net> wrote: > On Thu, 23 May 2013, Kim Phillips wrote: > > > Enable the compiler intrinsic for byte swapping on arch ARM. This > > allows the compiler to detect and be able to optimize out byte > > swappings, and has a very modest benefit on vmlinux size (Linaro gcc > > 4.8): > > > > text data bss dec hex filename > > 2840310 123932 61960 3026202 2e2d1a vmlinux-lart #orig > > 2840152 123932 61960 3026044 2e2c7c vmlinux-lart #builtin-bswap > > > > 6473120 314840 5616016 12403976 bd4508 vmlinux-mxs #orig > > 6472586 314848 5616016 12403450 bd42fa vmlinux-mxs #builtin-bswap > > > > 7419872 318372 379556 8117800 7bde28 vmlinux-imx_v6_v7 #orig > > 7419170 318364 379556 8117090 7bdb62 vmlinux-imx_v6_v7 #builtin-bswap > > > > Signed-off-by: Kim Phillips <kim.phillips@freescale.com> > > Reviewed-by: Nicolas Pitre <nico@linaro.org> > > Acked-by: David Woodhouse <David.Woodhouse@intel.com> > > Did this ever go somewhere? > > Russell suggested at the time to base it against a mainline kernel > (since it was patching files which apparently were already patched with > out-of-tree lz4 patches) and send it to his patch system. I'll re-base and send it to his patch system. Thanks, Kim ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-21 4:29 ` Nicolas Pitre 2013-02-21 6:52 ` Kim Phillips @ 2013-02-21 16:37 ` Woodhouse, David 2013-02-21 17:27 ` Nicolas Pitre 1 sibling, 1 reply; 62+ messages in thread From: Woodhouse, David @ 2013-02-21 16:37 UTC (permalink / raw) To: Nicolas Pitre Cc: Kim Phillips, Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring [-- Attachment #1: Type: text/plain, Size: 513 bytes --] On Wed, 2013-02-20 at 23:29 -0500, Nicolas Pitre wrote: > > > On Wed, 20 Feb 2013, Woodhouse, David wrote: > > > > For today's compilers, unless the wind changes. > … > Crap. OK, assembly code is the way to go then. How quickly the wind changes, these days. I blame global warming. :) -- Sent with MeeGo's ActiveSync support. David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 6242 bytes --] ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-21 16:37 ` [RFC] " Woodhouse, David @ 2013-02-21 17:27 ` Nicolas Pitre 0 siblings, 0 replies; 62+ messages in thread From: Nicolas Pitre @ 2013-02-21 17:27 UTC (permalink / raw) To: Woodhouse, David Cc: Kim Phillips, Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring [-- Attachment #1: Type: TEXT/PLAIN, Size: 403 bytes --] On Thu, 21 Feb 2013, Woodhouse, David wrote: > On Wed, 2013-02-20 at 23:29 -0500, Nicolas Pitre wrote: > > > > On Wed, 20 Feb 2013, Woodhouse, David wrote: > > > > > For today's compilers, unless the wind changes. > > … > > Crap. OK, assembly code is the way to go then. > > How quickly the wind changes, these days. I blame global warming. :) Only fools don't believe global warming. Nicolas ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-02-20 2:31 ` Kim Phillips 2013-02-20 2:38 ` Stephen Boyd 2013-02-20 3:17 ` Nicolas Pitre @ 2013-03-13 13:35 ` Woodhouse, David 2 siblings, 0 replies; 62+ messages in thread From: Woodhouse, David @ 2013-03-13 13:35 UTC (permalink / raw) To: Kim Phillips Cc: Nicolas Pitre, Russell King - ARM Linux, Borislav Petkov, Andrew Morton, Daniel Santos, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel, Rob Herring [-- Attachment #1: Type: text/plain, Size: 1838 bytes --] On Tue, 2013-02-19 at 20:31 -0600, Kim Phillips wrote: > > > > imx_v6_v7_defconfig: 7672373 7667089 -5284 > > > lart_defconfig: 2941150 2941054 -96 > > > mxs_defconfig: 11091983 11095679 3696 > > > > The savings are good, with some impressive cases. However the > > mxs_defconfig is completely the opposite and by far. Any idea? > > Unfortunately, I don't seem to be able to reproduce this anymore. > Same linux-next, with three different compilers always produces > smaller binaries: > > text data bss dec hex filename > 5239363 280576 5569648 11089587 a936b3 linux-next-mxs-orig-gcc4.7/vmlinux > 5239169 280556 5569648 11089373 a935dd linux-next-mxs-bswap-gcc4.7/vmlinux > > 5262223 280592 5569648 11112463 a9900f linux-next-mxs-orig-gcc4.6.3/vmlinux > 5261909 280584 5569648 11112141 a98ecd linux-next-mxs-bswap-gcc4.6.3/vmlinux > > 5241379 280580 5569648 11091607 a93e97 linux-next-mxs-orig-gcc4.6ubuntu/vmlinux > 5241189 280600 5569648 11091437 a93ded linux-next-mxs-bswap-gcc4.6ubuntu/vmlinux > > So I've since made a more consistent cross-build environment, using > cross tools from Linaro [1,2] instead of via Ubuntu [3]. Andrew Pinski has done some work on GCC to support further optimisations: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55177 If you feel like building with the branch at http://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/pinskia/bytewiseunop and seeing how that affects the results, that could be interesting. -- Sent with MeeGo's ActiveSync support. David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 6242 bytes --] ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-01-29 1:30 [RFC] arm: use built-in byte swap function Kim Phillips 2013-01-29 8:35 ` Borislav Petkov @ 2013-01-29 14:13 ` Russell King - ARM Linux 2013-01-29 14:43 ` Woodhouse, David 2013-01-29 14:53 ` Rob Herring 2 siblings, 1 reply; 62+ messages in thread From: Russell King - ARM Linux @ 2013-01-29 14:13 UTC (permalink / raw) To: Kim Phillips Cc: Andrew Morton, Daniel Santos, Borislav Petkov, David Rientjes, Rusty Russell, David Woodhouse, linux-arm-kernel, linux-kernel On Mon, Jan 28, 2013 at 07:30:33PM -0600, Kim Phillips wrote: > AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6, > and for the 16-bit version in 4.8. Hmm. $ /usr/local/aeabi/bin/arm-linux-gcc --version arm-linux-gcc (GCC) 4.5.4 Copyright (C) 2010 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ for a in armv3 armv4 armv4t armv5t armv5te armv6k armv6 armv7-a; do \ echo $a:; \ for f in 16 32 64; do \ echo 'unsigned foo(unsigned val) { return __builtin_bswap'$f'(val); }' | \ /usr/local/aeabi/bin/arm-linux-gcc -x c -S -o - - -march=$a | grep 'bl'; \ done; \ done produces: armv3: bl __builtin_bswap16 armv4: bl __builtin_bswap16 armv4t: bl __builtin_bswap16 armv5t: bl __builtin_bswap16 armv5te: bl __builtin_bswap16 armv6k: bl __builtin_bswap16 armv6: bl __builtin_bswap16 armv7-a: bl __builtin_bswap16 So, this compiler (4.5.4) has support for 32-bit and 64-bit bswaps across all our architectures, but not the 16-bit ones. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-01-29 14:13 ` Russell King - ARM Linux @ 2013-01-29 14:43 ` Woodhouse, David 0 siblings, 0 replies; 62+ messages in thread From: Woodhouse, David @ 2013-01-29 14:43 UTC (permalink / raw) To: Russell King - ARM Linux Cc: Kim Phillips, Andrew Morton, Daniel Santos, Borislav Petkov, David Rientjes, Rusty Russell, linux-arm-kernel, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1394 bytes --] On Tue, 2013-01-29 at 14:13 +0000, Russell King - ARM Linux wrote: > > So, this compiler (4.5.4) has support for 32-bit and 64-bit bswaps > across all our architectures, but not the 16-bit ones. That observation is consistent with my dig through GCC history. I had come to the conclusion that the 32-bit and 64-bit versions were added *generically* in 4.4, and that the 16-bit version was added in 4.6 to that PowerPC back end, and made generic in 4.8. So I *had* put that arch-specific check into compiler-gcc4.h, for PowerPC. It's just outside the context of Kim's patch. If it really does end up being different for every arch, I may reconsider that. As for the __bswapsi2() calls... if it's ever emitting an out-of-line call for something like that, that seems like a really dubious decision; surely it's better being inlined? So rather than adding it to your bits-of-libgcc.a in the kernel, I'd suggest just *not* using ARCH_USE_BUILTIN_BSWAP for the offending compilers, and filing a bug to get them fixed. But really, this is why I created ARCH_USE_BUILTIN_BSWAP and left it to architecture maintainers to enable it at their leisure.... :) -- Sent with MeeGo's ActiveSync support. David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 4370 bytes --] ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-01-29 1:30 [RFC] arm: use built-in byte swap function Kim Phillips 2013-01-29 8:35 ` Borislav Petkov 2013-01-29 14:13 ` Russell King - ARM Linux @ 2013-01-29 14:53 ` Rob Herring 2013-01-29 15:10 ` Woodhouse, David 2 siblings, 1 reply; 62+ messages in thread From: Rob Herring @ 2013-01-29 14:53 UTC (permalink / raw) To: Kim Phillips Cc: Russell King, Andrew Morton, David Woodhouse, Rusty Russell, linux-kernel, Daniel Santos, Borislav Petkov, David Rientjes, linux-arm-kernel On 01/28/2013 07:30 PM, Kim Phillips wrote: > Enable the compiler intrinsic for byte swapping on arch ARM. This > allows the compiler to detect and be able to optimize out byte > swappings, e.g. in big endian to big endian moves. > > AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6, > and for the 16-bit version in 4.8. I think these are the versions ARM got optimized swap support. The newer compilers are smart enough to replace a swap written in C with a rev instruction. Last time I checked, they did not do rev16, but that is probably what was added in 4.8 (and backported to Linaro gcc). The Linaro backports make it impossible to define what gcc version has a specific feature. If you specify to use the builtin's, do you still get inline rev instructions emitted? Rob > > Signed-off-by: Kim Phillips <kim.phillips@freescale.com> > --- > akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016 > > based on linux-next. Depends on commit "compiler-gcc{3,4}.h: Use > GCC_VERSION macro" by Daniel Santos <daniel.santos@pobox.com>, > currently in the akpm branch. > > RFC because of unfamiliarity with arch ARM, and that at91sam9rl, > at91rm9200, and lpd270 (so far, at least) builds fail with: > > include/uapi/linux/swab.h:60: undefined reference to `__bswapsi2' > > I'm using eldk-5.2.1/armv7a's arm-linux-gnueabi-gcc (GCC) 4.6.4 > 20120303 (prerelease) > > arch/arm/Kconfig | 1 + > include/linux/compiler-gcc4.h | 3 ++- > 2 files changed, 3 insertions(+), 1 deletion(-) > > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig > index eda8711..437d11a 100644 > --- a/arch/arm/Kconfig > +++ b/arch/arm/Kconfig > @@ -3,6 +3,7 @@ config ARM > default y > select ARCH_BINFMT_ELF_RANDOMIZE_PIE > select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE > + select ARCH_USE_BUILTIN_BSWAP > select ARCH_HAVE_CUSTOM_GPIO_H > select ARCH_WANT_IPC_PARSE_VERSION > select BUILDTIME_EXTABLE_SORT if MMU > diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h > index 68b162d..da5f728 100644 > --- a/include/linux/compiler-gcc4.h > +++ b/include/linux/compiler-gcc4.h > @@ -67,7 +67,8 @@ > > > #ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP > -#if GCC_VERSION >= 40400 > +#if (!defined(__arm__) && GCC_VERSION >= 40400) || \ > + (defined(__arm__) && GCC_VERSION >= 40600) > #define __HAVE_BUILTIN_BSWAP32__ > #define __HAVE_BUILTIN_BSWAP64__ > #endif > ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] arm: use built-in byte swap function 2013-01-29 14:53 ` Rob Herring @ 2013-01-29 15:10 ` Woodhouse, David 0 siblings, 0 replies; 62+ messages in thread From: Woodhouse, David @ 2013-01-29 15:10 UTC (permalink / raw) To: Rob Herring Cc: Kim Phillips, Russell King, Andrew Morton, Rusty Russell, linux-kernel, Daniel Santos, Borislav Petkov, David Rientjes, linux-arm-kernel [-- Attachment #1: Type: text/plain, Size: 981 bytes --] On Tue, 2013-01-29 at 08:53 -0600, Rob Herring wrote: > If you specify to use the builtin's, do you still get inline rev > instructions emitted? You mean from the architecture's __arch_swab32() et al. macros? No. If the architecture enables ARCH_USE_BUILTIN_BSWAP and the compiler version checks indicate that the correspondingly-sized builtin is available, then the builtin will be used. Only if the arch *doesn't* set ARCH_USE_BUILTIN_BSWAP, or if the compiler is old enough not to have the corresponding intrinsic, will the inline assembler in the __arch_swabXX macros get used. Note, however, that there is no such compiler intrinsic for swahb32(), which is where rev16 gets used. That's still being left to the inline asm in all cases. -- Sent with MeeGo's ActiveSync support. David Woodhouse Open Source Technology Centre David.Woodhouse@intel.com Intel Corporation [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 4370 bytes --] ^ permalink raw reply [flat|nested] 62+ messages in thread
end of thread, other threads:[~2013-11-05 21:50 UTC | newest] Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2013-01-29 1:30 [RFC] arm: use built-in byte swap function Kim Phillips 2013-01-29 8:35 ` Borislav Petkov 2013-01-29 16:46 ` Woodhouse, David 2013-01-29 17:42 ` Borislav Petkov 2013-01-29 17:55 ` Woodhouse, David 2013-01-29 18:10 ` Borislav Petkov 2013-01-30 10:22 ` Woodhouse, David 2013-01-31 2:09 ` Kim Phillips 2013-01-31 6:44 ` Borislav Petkov 2013-01-31 9:28 ` Russell King - ARM Linux 2013-01-31 20:59 ` Kim Phillips 2013-01-31 21:33 ` Borislav Petkov 2013-01-31 22:11 ` Woodhouse, David 2013-02-01 0:37 ` [PATCH v4] " Kim Phillips 2013-02-01 10:46 ` Russell King - ARM Linux 2013-02-01 1:17 ` [RFC] " Russell King - ARM Linux 2013-02-01 7:33 ` Woodhouse, David 2013-02-06 3:04 ` Kim Phillips 2013-02-06 9:02 ` Woodhouse, David 2013-02-07 1:19 ` Kim Phillips 2013-02-07 10:19 ` Will Newton 2013-02-07 10:43 ` Catalin Marinas 2013-02-07 18:13 ` Russell King - ARM Linux 2013-02-08 17:25 ` Woodhouse, David 2013-02-08 20:04 ` Nicolas Pitre 2013-02-08 22:40 ` Woodhouse, David 2013-02-08 22:47 ` Nicolas Pitre 2013-02-09 1:12 ` Kim Phillips 2013-02-09 3:16 ` Nicolas Pitre 2013-02-20 2:31 ` Kim Phillips 2013-02-20 2:38 ` Stephen Boyd 2013-02-20 3:17 ` Nicolas Pitre 2013-02-20 10:38 ` Woodhouse, David 2013-02-20 13:36 ` Nicolas Pitre 2013-02-20 13:44 ` Woodhouse, David 2013-02-20 14:06 ` Nicolas Pitre 2013-02-20 14:53 ` Woodhouse, David 2013-02-20 15:43 ` Nicolas Pitre 2013-02-21 3:49 ` Kim Phillips 2013-02-21 4:29 ` Nicolas Pitre 2013-02-21 6:52 ` Kim Phillips 2013-02-21 16:40 ` Nicolas Pitre 2013-02-22 2:33 ` Kim Phillips 2013-02-22 3:40 ` Nicolas Pitre 2013-02-23 1:40 ` [PATCH v6] " Kim Phillips 2013-02-23 2:40 ` Nicolas Pitre 2013-02-23 23:20 ` Woodhouse, David 2013-05-23 16:46 ` [PATCH v7] " Kim Phillips 2013-05-23 20:09 ` Nicolas Pitre 2013-05-23 23:13 ` Russell King - ARM Linux 2013-06-06 22:12 ` Russell King - ARM Linux 2013-06-06 22:23 ` Borislav Petkov 2013-06-07 0:03 ` Stephen Rothwell 2013-10-27 2:41 ` Nicolas Pitre 2013-11-05 21:45 ` Kim Phillips 2013-02-21 16:37 ` [RFC] " Woodhouse, David 2013-02-21 17:27 ` Nicolas Pitre 2013-03-13 13:35 ` Woodhouse, David 2013-01-29 14:13 ` Russell King - ARM Linux 2013-01-29 14:43 ` Woodhouse, David 2013-01-29 14:53 ` Rob Herring 2013-01-29 15:10 ` Woodhouse, David
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).