* [PATCH 0 of 3] MMIO 32-bit copy routine, the final frontier
From: Bryan O'Sullivan @ 2006-01-11 22:39 UTC
To: akpm; +Cc: linux-kernel, hch, ak

After yet more review comments from several people, here is a reworked
set of 32-bit MMIO copy patches.  This may even be the final set.

These define the generic __raw_memcpy_toio32 as a weak symbol, which
arches are free to override.  We provide a specialised implementation
for x86_64.

These patches should apply cleanly against current -git, and have been
tested on i386 and x86_64.

The patch series is as follows:

raw_memcpy_io.patch
  Introduce the generic MMIO 32-bit copy routine.

x86_64-memcpy32.patch
  Add memcpy32 routine to x86_64.

arch-specific-raw_memcpy_io.patch
  Get each arch to use generic memcpy_io code, except x86_64, which
  uses memcpy32.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
* [PATCH 1 of 3] Introduce __raw_memcpy_toio32 2006-01-11 22:39 [PATCH 0 of 3] MMIO 32-bit copy routine, the final frontier Bryan O'Sullivan @ 2006-01-11 22:39 ` Bryan O'Sullivan 2006-01-11 23:43 ` Andrew Morton 2006-01-11 22:39 ` [PATCH 2 of 3] memcpy32 for x86_64 Bryan O'Sullivan 2006-01-11 22:39 ` [PATCH 3 of 3] Add __raw_memcpy_toio32 to each arch Bryan O'Sullivan 2 siblings, 1 reply; 24+ messages in thread From: Bryan O'Sullivan @ 2006-01-11 22:39 UTC (permalink / raw) To: akpm; +Cc: linux-kernel, hch, ak This arch-independent routine copies data to a memory-mapped I/O region, using 32-bit accesses. It does not guarantee access ordering, nor does it perform a memory barrier afterwards. This style of access is required by some devices. Signed-off-by: Bryan O'Sullivan <bos@pathscale.com> diff -r c90267e4a29b -r 05b3d1af27eb lib/Makefile --- a/lib/Makefile Wed Jan 11 13:31:24 2006 +0800 +++ b/lib/Makefile Wed Jan 11 14:35:45 2006 -0800 @@ -5,7 +5,7 @@ lib-y := errno.o ctype.o string.o vsprintf.o cmdline.o \ bust_spinlocks.o rbtree.o radix-tree.o dump_stack.o \ idr.o div64.o int_sqrt.o bitmap.o extable.o prio_tree.o \ - sha1.o + sha1.o raw_memcpy_io.o lib-y += kobject.o kref.o kobject_uevent.o klist.o diff -r c90267e4a29b -r 05b3d1af27eb lib/raw_memcpy_io.c --- /dev/null Thu Jan 1 00:00:00 1970 +0000 +++ b/lib/raw_memcpy_io.c Wed Jan 11 14:35:45 2006 -0800 @@ -0,0 +1,41 @@ +/* + * Copyright 2006 PathScale, Inc. All Rights Reserved. + * + * This file is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software Foundation, + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA. + */ + +#include <linux/module.h> +#include <asm/io.h> + +/** + * __raw_memcpy_toio32 - copy data to MMIO space, in 32-bit units + * @to: destination, in MMIO space (must be 32-bit aligned) + * @from: source (must be 32-bit aligned) + * @count: number of 32-bit quantities to copy + * + * Copy data from kernel space to MMIO space, in units of 32 bits at a + * time. Order of access is not guaranteed, nor is a memory barrier + * performed afterwards. + */ +void __attribute__((weak)) __raw_memcpy_toio32(void __iomem *to, + const void *from, size_t count) +{ + u32 __iomem *dst = to; + const u32 *src = from; + const u32 *end = src + count; + + while (src < end) + __raw_writel(*src++, dst++); +} +EXPORT_SYMBOL_GPL(__raw_memcpy_toio32); ^ permalink raw reply [flat|nested] 24+ messages in thread
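For anyone reading along without a kernel tree to hand, the generic routine in this patch can be approximated in user space. The following is only a sketch under stated assumptions: sketch_raw_writel() and sketch_memcpy_toio32() are hypothetical names, the volatile store merely stands in for the kernel's __raw_writel(), and an ordinary array plays the part of the MMIO region.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for __raw_writel(): a single 32-bit store through
 * a volatile pointer, with no barrier and no byte swapping.
 */
static inline void sketch_raw_writel(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
}

/*
 * Weak definition, mirroring the patch: a non-weak definition of the same
 * symbol in another object file would replace this one at link time.
 * count is a number of 32-bit quantities, not bytes.
 */
__attribute__((weak))
void sketch_memcpy_toio32(volatile void *to, const void *from, size_t count)
{
	volatile uint32_t *dst = to;
	const uint32_t *src = from;
	const uint32_t *end = src + count;

	while (src < end)
		sketch_raw_writel(*src++, dst++);
}
```

Calling sketch_memcpy_toio32(dst, src, 3) on two uint32_t arrays copies the first three words and leaves the rest of dst untouched. Seeing an architecture override take effect would need a second translation unit that defines the same symbol without the weak attribute.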
* Re: [PATCH 1 of 3] Introduce __raw_memcpy_toio32
From: Andrew Morton @ 2006-01-11 23:43 UTC
To: Bryan O'Sullivan; +Cc: linux-kernel, hch, ak

"Bryan O'Sullivan" <bos@pathscale.com> wrote:
>
>  lib-y := errno.o ctype.o string.o vsprintf.o cmdline.o \
>  	 bust_spinlocks.o rbtree.o radix-tree.o dump_stack.o \
>  	 idr.o div64.o int_sqrt.o bitmap.o extable.o prio_tree.o \
> -	 sha1.o
> +	 sha1.o raw_memcpy_io.o

You'll find that if nothing in vmlinux references __raw_memcpy_toio32 then
this file won't be included in vmlinux and __raw_memcpy_toio32 won't be
available to modules.

I'll move this to obj-y, which does the right thing.
* [PATCH 2 of 3] memcpy32 for x86_64 2006-01-11 22:39 [PATCH 0 of 3] MMIO 32-bit copy routine, the final frontier Bryan O'Sullivan 2006-01-11 22:39 ` [PATCH 1 of 3] Introduce __raw_memcpy_toio32 Bryan O'Sullivan @ 2006-01-11 22:39 ` Bryan O'Sullivan 2006-01-11 23:45 ` Roland Dreier 2006-01-11 22:39 ` [PATCH 3 of 3] Add __raw_memcpy_toio32 to each arch Bryan O'Sullivan 2 siblings, 1 reply; 24+ messages in thread From: Bryan O'Sullivan @ 2006-01-11 22:39 UTC (permalink / raw) To: akpm; +Cc: linux-kernel, hch, ak Introduce an x86_64-specific memcpy32 routine. The routine is similar to memcpy, but is guaranteed to work in units of 32 bits at a time. Signed-off-by: Bryan O'Sullivan <bos@pathscale.com> diff -r 05b3d1af27eb -r 1052904816d7 arch/x86_64/kernel/x8664_ksyms.c --- a/arch/x86_64/kernel/x8664_ksyms.c Wed Jan 11 14:35:45 2006 -0800 +++ b/arch/x86_64/kernel/x8664_ksyms.c Wed Jan 11 14:35:45 2006 -0800 @@ -163,6 +163,8 @@ EXPORT_SYMBOL(memcpy); EXPORT_SYMBOL(__memcpy); +EXPORT_SYMBOL_GPL(memcpy32); + #ifdef CONFIG_RWSEM_XCHGADD_ALGORITHM /* prototypes are wrong, these are assembly with custom calling functions */ extern void rwsem_down_read_failed_thunk(void); diff -r 05b3d1af27eb -r 1052904816d7 arch/x86_64/lib/Makefile --- a/arch/x86_64/lib/Makefile Wed Jan 11 14:35:45 2006 -0800 +++ b/arch/x86_64/lib/Makefile Wed Jan 11 14:35:45 2006 -0800 @@ -9,4 +9,4 @@ lib-y := csum-partial.o csum-copy.o csum-wrappers.o delay.o \ usercopy.o getuser.o putuser.o \ thunk.o clear_page.o copy_page.o bitstr.o bitops.o -lib-y += memcpy.o memmove.o memset.o copy_user.o +lib-y += memcpy.o memcpy32.o memmove.o memset.o copy_user.o diff -r 05b3d1af27eb -r 1052904816d7 include/asm-x86_64/string.h --- a/include/asm-x86_64/string.h Wed Jan 11 14:35:45 2006 -0800 +++ b/include/asm-x86_64/string.h Wed Jan 11 14:35:45 2006 -0800 @@ -45,6 +45,9 @@ #define __HAVE_ARCH_MEMMOVE void * memmove(void * dest,const void *src,size_t count); +/* copy data, 32 bits at a time */ +void memcpy32(void *dst, 
const void *src, size_t count); + /* Use C out of line version for memcmp */ #define memcmp __builtin_memcmp int memcmp(const void * cs,const void * ct,size_t count); diff -r 05b3d1af27eb -r 1052904816d7 arch/x86_64/lib/memcpy32.S --- /dev/null Thu Jan 1 00:00:00 1970 +0000 +++ b/arch/x86_64/lib/memcpy32.S Wed Jan 11 14:35:45 2006 -0800 @@ -0,0 +1,32 @@ +/* + * Copyright 2006 PathScale, Inc. All Rights Reserved. + * + * This file is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software Foundation, + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA. + */ + +/** + * memcpy32 - copy data, in units of 32 bits at a time + * @dst: destination (must be 32-bit aligned) + * @src: source (must be 32-bit aligned) + * @count: number of 32-bit quantities to copy + */ + .globl memcpy32 +memcpy32: + movl %edx,%ecx + shrl $1,%ecx + andl $1,%edx + rep movsq + movl %edx,%ecx + rep movsd + ret ^ permalink raw reply [flat|nested] 24+ messages in thread
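The arithmetic in the assembly above can be modelled in portable C to make the access widths explicit. This is a rough sketch, not the patch's code: sketch_memcpy32_model() and struct sketch_access_counts are hypothetical names, and the 8-byte and 4-byte memcpy() calls only stand in for the moves issued by "rep movsq" and "rep movsd"; nothing here controls what a real bus would see.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of moves of each width performed by the model. */
struct sketch_access_counts {
	size_t quad;	/* 64-bit moves, as issued by "rep movsq" */
	size_t dword;	/* 32-bit moves, as issued by "rep movsd" */
};

/* count is a number of 32-bit quantities, as in the patch. */
static struct sketch_access_counts
sketch_memcpy32_model(void *dst, const void *src, size_t count)
{
	struct sketch_access_counts n = { 0, 0 };
	unsigned char *d = dst;
	const unsigned char *s = src;
	uint32_t c = (uint32_t)count;	/* movl %edx,%ecx keeps the low 32 bits */
	size_t quads = c >> 1;		/* shrl $1,%ecx */
	size_t tail = c & 1;		/* andl $1,%edx */

	for (; n.quad < quads; n.quad++) {	/* rep movsq */
		memcpy(d, s, 8);
		d += 8;
		s += 8;
	}
	for (; n.dword < tail; n.dword++) {	/* rep movsd */
		memcpy(d, s, 4);
		d += 4;
		s += 4;
	}
	return n;
}
```

For count = 5 the model performs two 64-bit moves followed by one 32-bit move, so whenever count is at least 2 most of the data is moved in 64-bit units and only an odd trailing word goes out as a single 32-bit move.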
* Re: [PATCH 2 of 3] memcpy32 for x86_64
From: Roland Dreier @ 2006-01-11 23:45 UTC
To: Bryan O'Sullivan; +Cc: akpm, linux-kernel, hch, ak

 > +/**
 > + * memcpy32 - copy data, in units of 32 bits at a time
 > + * @dst: destination (must be 32-bit aligned)
 > + * @src: source (must be 32-bit aligned)
 > + * @count: number of 32-bit quantities to copy
 > + */
 > + 	.globl memcpy32
 > +memcpy32:
 > +	movl %edx,%ecx
 > +	shrl $1,%ecx
 > +	andl $1,%edx
 > +	rep movsq
 > +	movl %edx,%ecx
 > +	rep movsd
 > +	ret

Sorry to keep this going still further, but I'm still confused.  Why
can't this assembly just define __raw_memcpy_toio32() directly?  In
other words, why do we need to introduce the indirection of having a
stub in C that calls the memcpy32 assembly routine?

Is there some reason having to do with linker magic and weak symbols?
Could it be solved by using gcc inline assembly rather than putting
the assembly in a .S file?

Also, why does memcpy32() need to be exported?  There are no users
other than the x86_64 version of __raw_memcpy_toio32(), and memcpy32()
doesn't seem like an API we want to add to every arch anyway.

 - R.
* Re: [PATCH 2 of 3] memcpy32 for x86_64
From: Bryan O'Sullivan @ 2006-01-12 0:03 UTC
To: Roland Dreier; +Cc: akpm, linux-kernel, hch, ak

On Wed, 2006-01-11 at 15:45 -0800, Roland Dreier wrote:

> Sorry to keep this going still further, but I'm still confused.  Why
> can't this assembly just define __raw_memcpy_toio32() directly?

It certainly can.  I've just been buried in this bloody thing for a
little too long.

	<b
* [PATCH 3 of 3] Add __raw_memcpy_toio32 to each arch 2006-01-11 22:39 [PATCH 0 of 3] MMIO 32-bit copy routine, the final frontier Bryan O'Sullivan 2006-01-11 22:39 ` [PATCH 1 of 3] Introduce __raw_memcpy_toio32 Bryan O'Sullivan 2006-01-11 22:39 ` [PATCH 2 of 3] memcpy32 for x86_64 Bryan O'Sullivan @ 2006-01-11 22:39 ` Bryan O'Sullivan 2006-01-11 23:46 ` Andrew Morton 2 siblings, 1 reply; 24+ messages in thread From: Bryan O'Sullivan @ 2006-01-11 22:39 UTC (permalink / raw) To: akpm; +Cc: linux-kernel, hch, ak Most arches use the generic routine. x86_64 uses memcpy32 instead; this is substantially faster, even over a bus that is much slower than the CPU. Signed-off-by: Bryan O'Sullivan <bos@pathscale.com> diff -r 1052904816d7 -r ee6ce7e55dc7 arch/x86_64/lib/io.c --- a/arch/x86_64/lib/io.c Wed Jan 11 14:35:45 2006 -0800 +++ b/arch/x86_64/lib/io.c Wed Jan 11 14:35:45 2006 -0800 @@ -21,3 +21,9 @@ memset((void *)a,b,c); } EXPORT_SYMBOL(memset_io); + +/* override generic definition in lib/raw_memcpy_io.c */ +void __raw_memcpy_toio32(void __iomem *to, const void *from, size_t count) +{ + memcpy32((void __force *) to, from, count); +} diff -r 1052904816d7 -r ee6ce7e55dc7 include/asm-alpha/io.h --- a/include/asm-alpha/io.h Wed Jan 11 14:35:45 2006 -0800 +++ b/include/asm-alpha/io.h Wed Jan 11 14:35:45 2006 -0800 @@ -504,6 +504,8 @@ extern void memcpy_toio(volatile void __iomem *, const void *, long); extern void _memset_c_io(volatile void __iomem *, unsigned long, long); +void __raw_memcpy_toio32(void __iomem *to, const void *from, size_t count); + static inline void memset_io(volatile void __iomem *addr, u8 c, long len) { _memset_c_io(addr, 0x0101010101010101UL * c, len); diff -r 1052904816d7 -r ee6ce7e55dc7 include/asm-arm/io.h --- a/include/asm-arm/io.h Wed Jan 11 14:35:45 2006 -0800 +++ b/include/asm-arm/io.h Wed Jan 11 14:35:45 2006 -0800 @@ -189,6 +189,8 @@ #define memset_io(c,v,l) _memset_io(__mem_pci(c),(v),(l)) #define memcpy_fromio(a,c,l) 
_memcpy_fromio((a),__mem_pci(c),(l)) #define memcpy_toio(c,a,l) _memcpy_toio(__mem_pci(c),(a),(l)) + +void __raw_memcpy_toio32(void __iomem *to, const void *from, size_t count); #define eth_io_copy_and_sum(s,c,l,b) \ eth_copy_and_sum((s),__mem_pci(c),(l),(b)) diff -r 1052904816d7 -r ee6ce7e55dc7 include/asm-cris/io.h --- a/include/asm-cris/io.h Wed Jan 11 14:35:45 2006 -0800 +++ b/include/asm-cris/io.h Wed Jan 11 14:35:45 2006 -0800 @@ -121,6 +121,8 @@ #define memcpy_fromio(a,b,c) memcpy((a),(void *)(b),(c)) #define memcpy_toio(a,b,c) memcpy((void *)(a),(b),(c)) +void __raw_memcpy_toio32(void __iomem *to, const void *from, size_t count); + /* * Again, CRIS does not require mem IO specific function. */ diff -r 1052904816d7 -r ee6ce7e55dc7 include/asm-frv/io.h --- a/include/asm-frv/io.h Wed Jan 11 14:35:45 2006 -0800 +++ b/include/asm-frv/io.h Wed Jan 11 14:35:45 2006 -0800 @@ -127,6 +127,8 @@ memcpy((void __force *) dst, src, count); } +void __raw_memcpy_toio32(void __iomem *to, const void *from, size_t count); + static inline uint8_t inb(unsigned long addr) { return __builtin_read8((void *)addr); diff -r 1052904816d7 -r ee6ce7e55dc7 include/asm-h8300/io.h --- a/include/asm-h8300/io.h Wed Jan 11 14:35:45 2006 -0800 +++ b/include/asm-h8300/io.h Wed Jan 11 14:35:45 2006 -0800 @@ -209,6 +209,8 @@ #define memcpy_fromio(a,b,c) memcpy((a),(void *)(b),(c)) #define memcpy_toio(a,b,c) memcpy((void *)(a),(b),(c)) +void __raw_memcpy_toio32(void __iomem *to, const void *from, size_t count); + #define mmiowb() #define inb(addr) ((h8300_buswidth(addr))?readw((addr) & ~1) & 0xff:readb(addr)) diff -r 1052904816d7 -r ee6ce7e55dc7 include/asm-i386/io.h --- a/include/asm-i386/io.h Wed Jan 11 14:35:45 2006 -0800 +++ b/include/asm-i386/io.h Wed Jan 11 14:35:45 2006 -0800 @@ -203,6 +203,8 @@ { __memcpy((void __force *) dst, src, count); } + +void __raw_memcpy_toio32(void __iomem *to, const void *from, size_t count); /* * ISA space is 'always mapped' on a typical x86 system, no need to 
diff -r 1052904816d7 -r ee6ce7e55dc7 include/asm-ia64/io.h --- a/include/asm-ia64/io.h Wed Jan 11 14:35:45 2006 -0800 +++ b/include/asm-ia64/io.h Wed Jan 11 14:35:45 2006 -0800 @@ -444,6 +444,8 @@ extern void memcpy_toio(volatile void __iomem *dst, const void *src, long n); extern void memset_io(volatile void __iomem *s, int c, long n); +void __raw_memcpy_toio32(void __iomem *to, const void *from, size_t count); + #define dma_cache_inv(_start,_size) do { } while (0) #define dma_cache_wback(_start,_size) do { } while (0) #define dma_cache_wback_inv(_start,_size) do { } while (0) diff -r 1052904816d7 -r ee6ce7e55dc7 include/asm-m32r/io.h --- a/include/asm-m32r/io.h Wed Jan 11 14:35:45 2006 -0800 +++ b/include/asm-m32r/io.h Wed Jan 11 14:35:45 2006 -0800 @@ -216,6 +216,8 @@ memcpy((void __force *) dst, src, count); } +void __raw_memcpy_toio32(void __iomem *to, const void *from, size_t count); + /* * Convert a physical pointer to a virtual kernel pointer for /dev/mem * access diff -r 1052904816d7 -r ee6ce7e55dc7 include/asm-m68knommu/io.h --- a/include/asm-m68knommu/io.h Wed Jan 11 14:35:45 2006 -0800 +++ b/include/asm-m68knommu/io.h Wed Jan 11 14:35:45 2006 -0800 @@ -113,6 +113,8 @@ #define memcpy_fromio(a,b,c) memcpy((a),(void *)(b),(c)) #define memcpy_toio(a,b,c) memcpy((void *)(a),(b),(c)) +void __raw_memcpy_toio32(void __iomem *to, const void *from, size_t count); + #define inb(addr) readb(addr) #define inw(addr) readw(addr) #define inl(addr) readl(addr) diff -r 1052904816d7 -r ee6ce7e55dc7 include/asm-mips/io.h --- a/include/asm-mips/io.h Wed Jan 11 14:35:45 2006 -0800 +++ b/include/asm-mips/io.h Wed Jan 11 14:35:45 2006 -0800 @@ -534,6 +534,8 @@ memcpy((void __force *) dst, src, count); } +void __raw_memcpy_toio32(void __iomem *to, const void *from, size_t count); + /* * Memory Mapped I/O */ diff -r 1052904816d7 -r ee6ce7e55dc7 include/asm-parisc/io.h --- a/include/asm-parisc/io.h Wed Jan 11 14:35:45 2006 -0800 +++ b/include/asm-parisc/io.h Wed Jan 11 14:35:45 
2006 -0800 @@ -294,6 +294,8 @@ void memcpy_fromio(void *dst, const volatile void __iomem *src, int count); void memcpy_toio(volatile void __iomem *dst, const void *src, int count); +void __raw_memcpy_toio32(void __iomem *to, const void *from, size_t count); + /* Support old drivers which don't ioremap. * NB this interface is scheduled to disappear in 2.5 */ diff -r 1052904816d7 -r ee6ce7e55dc7 include/asm-powerpc/io.h --- a/include/asm-powerpc/io.h Wed Jan 11 14:35:45 2006 -0800 +++ b/include/asm-powerpc/io.h Wed Jan 11 14:35:45 2006 -0800 @@ -64,6 +64,8 @@ #define memcpy_fromio(a,b,c) iSeries_memcpy_fromio((a), (b), (c)) #define memcpy_toio(a,b,c) iSeries_memcpy_toio((a), (b), (c)) +void __raw_memcpy_toio32(void __iomem *to, const void *from, size_t count); + #define inb(addr) readb(((void __iomem *)(long)(addr))) #define inw(addr) readw(((void __iomem *)(long)(addr))) #define inl(addr) readl(((void __iomem *)(long)(addr))) diff -r 1052904816d7 -r ee6ce7e55dc7 include/asm-ppc/io.h --- a/include/asm-ppc/io.h Wed Jan 11 14:35:45 2006 -0800 +++ b/include/asm-ppc/io.h Wed Jan 11 14:35:45 2006 -0800 @@ -369,6 +369,8 @@ } #endif +void __raw_memcpy_toio32(void __iomem *to, const void *from, size_t count); + #define eth_io_copy_and_sum(a,b,c,d) eth_copy_and_sum((a),(void __force *)(void __iomem *)(b),(c),(d)) /* diff -r 1052904816d7 -r ee6ce7e55dc7 include/asm-s390/io.h --- a/include/asm-s390/io.h Wed Jan 11 14:35:45 2006 -0800 +++ b/include/asm-s390/io.h Wed Jan 11 14:35:45 2006 -0800 @@ -99,6 +99,8 @@ #define memcpy_fromio(a,b,c) memcpy((a),__io_virt(b),(c)) #define memcpy_toio(a,b,c) memcpy(__io_virt(a),(b),(c)) +void __raw_memcpy_toio32(void __iomem *to, const void *from, size_t count); + #define inb_p(addr) readb(addr) #define inb(addr) readb(addr) diff -r 1052904816d7 -r ee6ce7e55dc7 include/asm-sh/io.h --- a/include/asm-sh/io.h Wed Jan 11 14:35:45 2006 -0800 +++ b/include/asm-sh/io.h Wed Jan 11 14:35:45 2006 -0800 @@ -177,6 +177,8 @@ extern void 
memcpy_toio(unsigned long, const void *, unsigned long); extern void memset_io(unsigned long, int, unsigned long); +void __raw_memcpy_toio32(void __iomem *to, const void *from, size_t count); + /* SuperH on-chip I/O functions */ static __inline__ unsigned char ctrl_inb(unsigned long addr) { diff -r 1052904816d7 -r ee6ce7e55dc7 include/asm-sh64/io.h --- a/include/asm-sh64/io.h Wed Jan 11 14:35:45 2006 -0800 +++ b/include/asm-sh64/io.h Wed Jan 11 14:35:45 2006 -0800 @@ -125,6 +125,8 @@ void memcpy_toio(void __iomem *to, const void *from, long count); void memcpy_fromio(void *to, void __iomem *from, long count); + +void __raw_memcpy_toio32(void __iomem *to, const void *from, size_t count); #define mmiowb() diff -r 1052904816d7 -r ee6ce7e55dc7 include/asm-sparc/io.h --- a/include/asm-sparc/io.h Wed Jan 11 14:35:45 2006 -0800 +++ b/include/asm-sparc/io.h Wed Jan 11 14:35:45 2006 -0800 @@ -239,6 +239,8 @@ #define memcpy_toio(d,s,sz) _memcpy_toio(d,s,sz) +void __raw_memcpy_toio32(void __iomem *to, const void *from, size_t count); + #ifdef __KERNEL__ /* diff -r 1052904816d7 -r ee6ce7e55dc7 include/asm-sparc64/io.h --- a/include/asm-sparc64/io.h Wed Jan 11 14:35:45 2006 -0800 +++ b/include/asm-sparc64/io.h Wed Jan 11 14:35:45 2006 -0800 @@ -440,6 +440,8 @@ #define memcpy_toio(d,s,sz) _memcpy_toio(d,s,sz) +void __raw_memcpy_toio32(void __iomem *to, const void *from, size_t count); + static inline int check_signature(void __iomem *io_addr, const unsigned char *signature, int length) diff -r 1052904816d7 -r ee6ce7e55dc7 include/asm-v850/io.h --- a/include/asm-v850/io.h Wed Jan 11 14:35:45 2006 -0800 +++ b/include/asm-v850/io.h Wed Jan 11 14:35:45 2006 -0800 @@ -130,6 +130,8 @@ #define memcpy_fromio(dst, src, len) memcpy (dst, (void *)src, len) #define memcpy_toio(dst, src, len) memcpy ((void *)dst, src, len) +void __raw_memcpy_toio32(void __iomem *to, const void *from, size_t count); + /* * Convert a physical pointer to a virtual kernel pointer for /dev/mem * access diff -r 
1052904816d7 -r ee6ce7e55dc7 include/asm-x86_64/io.h --- a/include/asm-x86_64/io.h Wed Jan 11 14:35:45 2006 -0800 +++ b/include/asm-x86_64/io.h Wed Jan 11 14:35:45 2006 -0800 @@ -252,6 +252,8 @@ __memcpy_toio((unsigned long)to,from,len); } +void __raw_memcpy_toio32(void __iomem *dst, const void *src, size_t count); + void memset_io(volatile void __iomem *a, int b, size_t c); /* diff -r 1052904816d7 -r ee6ce7e55dc7 include/asm-xtensa/io.h --- a/include/asm-xtensa/io.h Wed Jan 11 14:35:45 2006 -0800 +++ b/include/asm-xtensa/io.h Wed Jan 11 14:35:45 2006 -0800 @@ -159,6 +159,8 @@ #define memcpy_fromio(a,b,c) memcpy((a),(void *)(b),(c)) #define memcpy_toio(a,b,c) memcpy((void *)(a),(b),(c)) +void __raw_memcpy_toio32(void __iomem *to, const void *from, size_t count); + /* At this point the Xtensa doesn't provide byte swap instructions */ #ifdef __XTENSA_EB__ ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 3 of 3] Add __raw_memcpy_toio32 to each arch
From: Andrew Morton @ 2006-01-11 23:46 UTC
To: Bryan O'Sullivan; +Cc: linux-kernel, hch, ak

"Bryan O'Sullivan" <bos@pathscale.com> wrote:
>
> Most arches use the generic routine.  x86_64 uses memcpy32 instead;
> this is substantially faster, even over a bus that is much slower than
> the CPU.
>
> Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
>
> diff -r 1052904816d7 -r ee6ce7e55dc7 arch/x86_64/lib/io.c
> --- a/arch/x86_64/lib/io.c	Wed Jan 11 14:35:45 2006 -0800
> +++ b/arch/x86_64/lib/io.c	Wed Jan 11 14:35:45 2006 -0800
> @@ -21,3 +21,9 @@
>  	memset((void *)a,b,c);
>  }
>  EXPORT_SYMBOL(memset_io);
> +
> +/* override generic definition in lib/raw_memcpy_io.c */
> +void __raw_memcpy_toio32(void __iomem *to, const void *from, size_t count)
> +{
> +	memcpy32((void __force *) to, from, count);
> +}
> diff -r 1052904816d7 -r ee6ce7e55dc7 include/asm-alpha/io.h
> --- a/include/asm-alpha/io.h	Wed Jan 11 14:35:45 2006 -0800
> +++ b/include/asm-alpha/io.h	Wed Jan 11 14:35:45 2006 -0800
> @@ -504,6 +504,8 @@
>  extern void memcpy_toio(volatile void __iomem *, const void *, long);
>  extern void _memset_c_io(volatile void __iomem *, unsigned long, long);
>
> +void __raw_memcpy_toio32(void __iomem *to, const void *from, size_t count);
> +

<etc>

How's about we add a new linux/io.h which does:

#include <asm/io.h>
void __raw_memcpy_toio32(void __iomem *to, const void *from, size_t count);

?
* Re: [PATCH 3 of 3] Add __raw_memcpy_toio32 to each arch
From: Bryan O'Sullivan @ 2006-01-12 0:05 UTC
To: Andrew Morton; +Cc: rdreier, linux-kernel, hch, ak

On Wed, 2006-01-11 at 15:46 -0800, Andrew Morton wrote:

> How's about we add a new linux/io.h which does:
>
> #include <asm/io.h>
> void __raw_memcpy_toio32(void __iomem *to, const void *from, size_t count);

I thought about this, and about moving other duplicated definitions from
asm-*/io.h in here, but I couldn't find any other obvious candidates, so
I wasn't anxious to introduce a new file.

If you think that's OK, though, it obviously makes the patch a lot
smaller, and gives a common place to put future cross-arch definitions.

I'll run another spin of the patch with your and Roland's suggestions.

	<b
* Re: [PATCH 3 of 3] Add __raw_memcpy_toio32 to each arch
From: Andrew Morton @ 2006-01-12 0:13 UTC
To: Bryan O'Sullivan; +Cc: rdreier, linux-kernel, hch, ak

"Bryan O'Sullivan" <bos@pathscale.com> wrote:
>
> On Wed, 2006-01-11 at 15:46 -0800, Andrew Morton wrote:
>
> > How's about we add a new linux/io.h which does:
> >
> > #include <asm/io.h>
> > void __raw_memcpy_toio32(void __iomem *to, const void *from, size_t count);
>
> I thought about this, and about moving other duplicated definitions from
> asm-*/io.h in here, but I couldn't find any other obvious candidates, so
> I wasn't anxious to introduce a new file.
>

Well it's obviously better than duplicating the thing.

There are other common things which can be hoisted to linux/io.h, but if we
do that then zillions of .c files need to be changed to include linux/io.h
rather than asm/io.h.  That's a good janitorial thing to do, but I doubt if
you want to do it ;)
* Re: [PATCH 3 of 3] Add __raw_memcpy_toio32 to each arch
From: Bryan O'Sullivan @ 2006-01-12 0:21 UTC
To: Andrew Morton; +Cc: rdreier, linux-kernel, hch, ak

On Wed, 2006-01-11 at 16:13 -0800, Andrew Morton wrote:

> There are other common things which can be hoisted to linux/io.h, but if we
> do that then zillions of .c files need to be changed to include linux/io.h
> rather than asm/io.h.

Right.

> That's a good janitorial thing to do, but I doubt if
> you want to do it ;)

Not as part of these patches, anyway.  They've left me a dried-up husk.

	<b
* [PATCH 0 of 3] 32-bit MMIO copy routines, reworked
From: Bryan O'Sullivan @ 2006-01-10 19:53 UTC
To: akpm; +Cc: linux-kernel, hch, ak, rdreier

After some more review comments from Roland, Andrew and Chris Hellwig,
here is a reworked set of 32-bit MMIO copy patches.

These use CONFIG_RAW_MEMCPY_IO to determine whether an arch should use
the generic __raw_memcpy_toio32 routine or its own specialised version.
We provide a specialised implementation for x86_64.

These patches should apply cleanly against current -git, and have been
tested on i386 and x86_64.

The patch series is as follows:

raw_memcpy_io.patch
  Introduce the generic MMIO 32-bit copy routine.

x86_64-memcpy32.patch
  Add memcpy32 routine to x86_64.

arch-specific-raw_memcpy_io.patch
  Get each arch to use generic memcpy_io code, except x86_64, which
  uses memcpy32.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
* [PATCH 2 of 3] memcpy32 for x86_64 2006-01-10 19:53 [PATCH 0 of 3] 32-bit MMIO copy routines, reworked Bryan O'Sullivan @ 2006-01-10 19:53 ` Bryan O'Sullivan 2006-01-12 8:38 ` Denis Vlasenko 0 siblings, 1 reply; 24+ messages in thread From: Bryan O'Sullivan @ 2006-01-10 19:53 UTC (permalink / raw) To: akpm; +Cc: linux-kernel, hch, ak, rdreier Introduce an x86_64-specific memcpy32 routine. The routine is similar to memcpy, but is guaranteed to work in units of 32 bits at a time. Signed-off-by: Bryan O'Sullivan <bos@pathscale.com> diff -r 2d4af213d9c5 -r b4863171295f arch/x86_64/kernel/x8664_ksyms.c --- a/arch/x86_64/kernel/x8664_ksyms.c Tue Jan 10 11:52:46 2006 -0800 +++ b/arch/x86_64/kernel/x8664_ksyms.c Tue Jan 10 11:52:48 2006 -0800 @@ -164,6 +164,8 @@ EXPORT_SYMBOL(memcpy); EXPORT_SYMBOL(__memcpy); +EXPORT_SYMBOL_GPL(memcpy32); + #ifdef CONFIG_RWSEM_XCHGADD_ALGORITHM /* prototypes are wrong, these are assembly with custom calling functions */ extern void rwsem_down_read_failed_thunk(void); diff -r 2d4af213d9c5 -r b4863171295f arch/x86_64/lib/Makefile --- a/arch/x86_64/lib/Makefile Tue Jan 10 11:52:46 2006 -0800 +++ b/arch/x86_64/lib/Makefile Tue Jan 10 11:52:48 2006 -0800 @@ -9,4 +9,4 @@ lib-y := csum-partial.o csum-copy.o csum-wrappers.o delay.o \ usercopy.o getuser.o putuser.o \ thunk.o clear_page.o copy_page.o bitstr.o bitops.o -lib-y += memcpy.o memmove.o memset.o copy_user.o +lib-y += memcpy.o memcpy32.o memmove.o memset.o copy_user.o diff -r 2d4af213d9c5 -r b4863171295f include/asm-x86_64/string.h --- a/include/asm-x86_64/string.h Tue Jan 10 11:52:46 2006 -0800 +++ b/include/asm-x86_64/string.h Tue Jan 10 11:52:48 2006 -0800 @@ -45,6 +45,9 @@ #define __HAVE_ARCH_MEMMOVE void * memmove(void * dest,const void *src,size_t count); +/* copy data, 32 bits at a time */ +void memcpy32(void *dst, const void *src, size_t count); + /* Use C out of line version for memcmp */ #define memcmp __builtin_memcmp int memcmp(const void * cs,const void * ct,size_t count); 
diff -r 2d4af213d9c5 -r b4863171295f arch/x86_64/lib/memcpy32.S --- /dev/null Thu Jan 1 00:00:00 1970 +0000 +++ b/arch/x86_64/lib/memcpy32.S Tue Jan 10 11:52:48 2006 -0800 @@ -0,0 +1,39 @@ +/* + * Copyright 2006 PathScale, Inc. All Rights Reserved. + * + * This file is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software Foundation, + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA. + */ + +/* + * Registers used below: + * dst - rdi + * src - rsi + * count - rdx + */ + +/** + * memcpy32 - copy data, in units of 32 bits at a time + * @dst: destination (must be 32-bit aligned) + * @src: source (must be 32-bit aligned) + * @count: number of 32-bit quantities to copy + */ + .globl memcpy32 +memcpy32: + movl %edx,%ecx + shrl $1,%ecx + andl $1,%edx + rep movsq + movl %edx,%ecx + rep movsd + ret ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2 of 3] memcpy32 for x86_64
From: Denis Vlasenko @ 2006-01-12 8:38 UTC
To: Bryan O'Sullivan; +Cc: akpm, linux-kernel, hch, ak, rdreier

On Tuesday 10 January 2006 21:53, Bryan O'Sullivan wrote:
> Introduce an x86_64-specific memcpy32 routine. The routine is similar
> to memcpy, but is guaranteed to work in units of 32 bits at a time.
>
> Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
>
> diff -r 2d4af213d9c5 -r b4863171295f arch/x86_64/kernel/x8664_ksyms.c
> --- a/arch/x86_64/kernel/x8664_ksyms.c Tue Jan 10 11:52:46 2006 -0800
> +++ b/arch/x86_64/kernel/x8664_ksyms.c Tue Jan 10 11:52:48 2006 -0800
> @@ -164,6 +164,8 @@
>  EXPORT_SYMBOL(memcpy);
>  EXPORT_SYMBOL(__memcpy);
>
> +EXPORT_SYMBOL_GPL(memcpy32);
> +
>  #ifdef CONFIG_RWSEM_XCHGADD_ALGORITHM
>  /* prototypes are wrong, these are assembly with custom calling functions */
>  extern void rwsem_down_read_failed_thunk(void);
> diff -r 2d4af213d9c5 -r b4863171295f arch/x86_64/lib/Makefile
> --- a/arch/x86_64/lib/Makefile Tue Jan 10 11:52:46 2006 -0800
> +++ b/arch/x86_64/lib/Makefile Tue Jan 10 11:52:48 2006 -0800
> @@ -9,4 +9,4 @@
>  lib-y := csum-partial.o csum-copy.o csum-wrappers.o delay.o \
>          usercopy.o getuser.o putuser.o \
>          thunk.o clear_page.o copy_page.o bitstr.o bitops.o
> -lib-y += memcpy.o memmove.o memset.o copy_user.o
> +lib-y += memcpy.o memcpy32.o memmove.o memset.o copy_user.o
> diff -r 2d4af213d9c5 -r b4863171295f include/asm-x86_64/string.h
> --- a/include/asm-x86_64/string.h Tue Jan 10 11:52:46 2006 -0800
> +++ b/include/asm-x86_64/string.h Tue Jan 10 11:52:48 2006 -0800
> @@ -45,6 +45,9 @@
>  #define __HAVE_ARCH_MEMMOVE
>  void * memmove(void * dest,const void *src,size_t count);
>
> +/* copy data, 32 bits at a time */
> +void memcpy32(void *dst, const void *src, size_t count);
> +
>  /* Use C out of line version for memcmp */
>  #define memcmp __builtin_memcmp
>  int memcmp(const void * cs,const void * ct,size_t count);
> diff -r 2d4af213d9c5 -r b4863171295f arch/x86_64/lib/memcpy32.S
> --- /dev/null Thu Jan 1 00:00:00 1970 +0000
> +++ b/arch/x86_64/lib/memcpy32.S Tue Jan 10 11:52:48 2006 -0800
> @@ -0,0 +1,39 @@
> +/*
> + * Copyright 2006 PathScale, Inc. All Rights Reserved.
> + *
> + * This file is free software; you can redistribute it and/or modify
> + * it under the terms of version 2 of the GNU General Public License
> + * as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software Foundation,
> + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
> + */
> +
> +/*
> + * Registers used below:
> + * dst - rdi
> + * src - rsi
> + * count - rdx
> + */
> +
> +/**
> + * memcpy32 - copy data, in units of 32 bits at a time
> + * @dst: destination (must be 32-bit aligned)
> + * @src: source (must be 32-bit aligned)
> + * @count: number of 32-bit quantities to copy
> + */
> +        .globl memcpy32
> +memcpy32:
> +        movl %edx,%ecx
> +        shrl $1,%ecx
> +        andl $1,%edx
> +        rep movsq
> +        movl %edx,%ecx
> +        rep movsd
> +        ret

movsq is not a 32bit move, it's a 64 bit one.

There are three possibilities here:

1) I misunderstand what memcpy32 means (I understand it as
   "it guarantees that all accesses will be strictly 32bit")
2) On all current x86_64 hardware each 64bit access from/to
   IO mapped addresses is always converted to two 32bit accesses.
3) code is buggy

If it is (1) or (2), consider adding a comment to clear up
future readers' confusion.
--
vda
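
Taking reading (1) literally, a copy that only ever issues 32-bit stores could be sketched in C as follows. This is a userspace illustration, not code from the patch: the name `copy32_strict` is made up, and a `volatile` pointer stands in for real MMIO accessors, on the assumption that the compiler emits one 32-bit store per volatile access (which is what mainstream compilers do in practice).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in, not the kernel routine.
 * Copies `count` 32-bit quantities, one 32-bit store per element.
 */
static void copy32_strict(volatile uint32_t *dst, const uint32_t *src,
                          size_t count)
{
    /* Both pointers must be 32-bit aligned, as the patch's comment says. */
    assert(((uintptr_t)dst & 3) == 0);
    assert(((uintptr_t)src & 3) == 0);

    while (count--)
        *dst++ = *src++;
}
```

Unlike the quoted assembly, nothing here ever widens two adjacent dwords into a single 64-bit store.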
* Re: [PATCH 2 of 3] memcpy32 for x86_64
From: Bryan O'Sullivan @ 2006-01-12 16:04 UTC
To: Denis Vlasenko; +Cc: akpm, linux-kernel, hch, ak, rdreier

On Thu, 2006-01-12 at 10:38 +0200, Denis Vlasenko wrote:

> 2) On all current x86_64 hardware each 64bit access from/to
> IO mapped addresses is always converted to two 32bit accesses.

This is true for 64-bit writes over Hypertransport (reads don't get
split up this way), but not for PCI-Express memory writes, which remain
atomic 64-bit.

I'll be converting the 64-bit accesses to 32-bit, as you and Andi
suggested.

        <b
* Re: [PATCH 2 of 3] memcpy32 for x86_64
From: Chris Wedgwood @ 2006-01-13 9:56 UTC
To: Bryan O'Sullivan; +Cc: Denis Vlasenko, akpm, linux-kernel, hch, ak, rdreier

On Thu, Jan 12, 2006 at 08:04:41AM -0800, Bryan O'Sullivan wrote:

> This is true for 64-bit writes over Hypertransport

is this something that will always be or just something current
hardware does?
* Re: [PATCH 2 of 3] memcpy32 for x86_64
From: Denis Vlasenko @ 2006-01-13 10:24 UTC
To: Chris Wedgwood; +Cc: Bryan O'Sullivan, akpm, linux-kernel, hch, ak, rdreier

On Friday 13 January 2006 11:56, Chris Wedgwood wrote:
> On Thu, Jan 12, 2006 at 08:04:41AM -0800, Bryan O'Sullivan wrote:
>
> > This is true for 64-bit writes over Hypertransport
>
> is this something that will always be or just something current
> hardware does?

Yes, why risk things going wrong? Also, you'll get shorter code.
Instead of

> +        .globl memcpy32
> +memcpy32:
> +        movl %edx,%ecx
> +        shrl $1,%ecx
> +        andl $1,%edx
> +        rep movsq
> +        movl %edx,%ecx
> +        rep movsd
> +        ret

you need just

        .globl memcpy32
memcpy32:
        movl %edx,%ecx
        rep movsd
        ret

With properly written inline asm, the code will be reduced to just
"rep movsd".
--
vda
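
The inline-asm remark could look roughly like this with GCC-style extended asm. It is a sketch under assumptions, not something posted in the thread: the name `memcpy32_inline` is hypothetical, `rep movsl` is the usual AT&T spelling of the dword string move, the direction flag is assumed clear as the x86_64 ABI requires, and the plain C loop is only there so the example builds on other architectures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy `count` 32-bit quantities from src to dst. */
static inline void memcpy32_inline(void *dst, const void *src, size_t count)
{
#if defined(__x86_64__) && defined(__GNUC__)
    /*
     * rep movsl copies RCX dwords from [RSI] to [RDI]. The "+D", "+S"
     * and "+c" constraints tell the compiler that all three registers
     * are both read and modified; "memory" covers the copied data.
     */
    __asm__ volatile("rep movsl"
                     : "+D"(dst), "+S"(src), "+c"(count)
                     :
                     : "memory");
#else
    /* Portable fallback: one 32-bit store per element. */
    volatile uint32_t *d = dst;
    const uint32_t *s = src;

    while (count--)
        *d++ = *s++;
#endif
}
```

Because the constraints place the arguments directly in the registers the instruction uses, the compiler needs no separate `movl %edx,%ecx` when the function is inlined.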
* Re: [PATCH 2 of 3] memcpy32 for x86_64
From: Bryan O'Sullivan @ 2006-01-13 16:21 UTC
To: Denis Vlasenko; +Cc: Chris Wedgwood, akpm, linux-kernel, hch, ak, rdreier

On Fri, 2006-01-13 at 12:24 +0200, Denis Vlasenko wrote:

> you need just
>
>         .globl memcpy32
> memcpy32:
>         movl %edx,%ecx
>         rep movsd
>         ret

This is what the current version of the patches in -mm does.

        <b
* [PATCH 0 of 3] 32-bit MMIO copy routine
From: Bryan O'Sullivan @ 2006-01-06 20:26 UTC
To: linux-kernel

Following some discussion with Roland, and patterned after the style
anointed by Linus last week, here is a new version of the 32-bit MMIO
copy routine needed by our InfiniPath device.

The name of the routine has changed from memcpy_toio32 to
__raw_memcpy_toio32. This reflects the basic nature of the routine; it
does not guarantee the order in which writes are performed, nor does it
perform a memory barrier after it is done.

The reason for this is that our chip treats the first and last writes to
some MMIO regions specially; our driver performs those directly using
writel, and uses __raw_memcpy_toio32 for the bits in between.

Regarding the specialised x86_64 implementation, Andi Kleen asked me to
perform some measurements of its performance impact. It makes a
difference of about 5% in performance on moderately large copies over
the HyperTransport bus, compared to the generic implementation.

        <b
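
The split described above (first and last words written individually, everything in between through the bulk routine) can be sketched in userspace C. All names here are stand-ins chosen for illustration: `fake_writel` and `fake_raw_copy32` only mimic the roles of writel and __raw_memcpy_toio32, `send_words` is hypothetical, and any barriers a real device would need between the steps are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for writel(): a single 32-bit store to "device" memory. */
static void fake_writel(uint32_t value, volatile uint32_t *addr)
{
    *addr = value;
}

/* Stand-in for __raw_memcpy_toio32(): 32-bit stores, no ordering promise. */
static void fake_raw_copy32(volatile uint32_t *dst, const uint32_t *src,
                            size_t count)
{
    while (count--)
        *dst++ = *src++;
}

/* Hypothetical driver helper: write n words, treating first and last specially. */
static void send_words(volatile uint32_t *mmio, const uint32_t *buf, size_t n)
{
    if (n == 0)
        return;

    fake_writel(buf[0], &mmio[0]);              /* first word, on its own */
    if (n == 1)
        return;

    fake_raw_copy32(&mmio[1], &buf[1], n - 2);  /* the bits in between */
    fake_writel(buf[n - 1], &mmio[n - 1]);      /* last word, on its own */
}
```

Since the caller already controls when the first and last words hit the device, the bulk routine in the middle has no need to promise ordering or a trailing barrier of its own.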
* [PATCH 2 of 3] memcpy32 for x86_64
From: Bryan O'Sullivan @ 2006-01-06 20:26 UTC
To: linux-kernel

Introduce an x86_64-specific memcpy32 routine. The routine is similar
to memcpy, but is guaranteed to work in units of 32 bits at a time.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>

diff -r d286502c3b3c -r 33790477a163 arch/x86_64/kernel/x8664_ksyms.c
--- a/arch/x86_64/kernel/x8664_ksyms.c Fri Jan 6 12:25:00 2006 -0800
+++ b/arch/x86_64/kernel/x8664_ksyms.c Fri Jan 6 12:25:02 2006 -0800
@@ -164,6 +164,8 @@
 EXPORT_SYMBOL(memcpy);
 EXPORT_SYMBOL(__memcpy);
 
+EXPORT_SYMBOL_GPL(memcpy32);
+
 #ifdef CONFIG_RWSEM_XCHGADD_ALGORITHM
 /* prototypes are wrong, these are assembly with custom calling functions */
 extern void rwsem_down_read_failed_thunk(void);
diff -r d286502c3b3c -r 33790477a163 arch/x86_64/lib/Makefile
--- a/arch/x86_64/lib/Makefile Fri Jan 6 12:25:00 2006 -0800
+++ b/arch/x86_64/lib/Makefile Fri Jan 6 12:25:02 2006 -0800
@@ -9,4 +9,4 @@
 lib-y := csum-partial.o csum-copy.o csum-wrappers.o delay.o \
         usercopy.o getuser.o putuser.o \
         thunk.o clear_page.o copy_page.o bitstr.o bitops.o
-lib-y += memcpy.o memmove.o memset.o copy_user.o
+lib-y += memcpy.o memcpy32.o memmove.o memset.o copy_user.o
diff -r d286502c3b3c -r 33790477a163 include/asm-x86_64/string.h
--- a/include/asm-x86_64/string.h Fri Jan 6 12:25:00 2006 -0800
+++ b/include/asm-x86_64/string.h Fri Jan 6 12:25:02 2006 -0800
@@ -45,6 +45,15 @@
 #define __HAVE_ARCH_MEMMOVE
 void * memmove(void * dest,const void *src,size_t count);
 
+/*
+ * memcpy32 - copy data, 32 bits at a time
+ *
+ * @dst: destination (must be 32-bit aligned)
+ * @src: source (must be 32-bit aligned)
+ * @count: number of 32-bit quantities to copy
+ */
+void memcpy32(void *dst, const void *src, size_t count);
+
 /* Use C out of line version for memcmp */
 #define memcmp __builtin_memcmp
 int memcmp(const void * cs,const void * ct,size_t count);
diff -r d286502c3b3c -r 33790477a163 arch/x86_64/lib/memcpy32.S
--- /dev/null Thu Jan 1 00:00:00 1970 +0000
+++ b/arch/x86_64/lib/memcpy32.S Fri Jan 6 12:25:02 2006 -0800
@@ -0,0 +1,36 @@
+/*
+ * Copyright 2006 PathScale, Inc. All Rights Reserved.
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+/*
+ * memcpy32 - Copy a memory block, 32 bits at a time.
+ *
+ * This routine does not return anything.
+ * Input:
+ * rdi destination
+ * rsi source
+ * rdx count (32-bit quantities to copy)
+ */
+
+        .globl memcpy32
+memcpy32:
+        movl %edx,%ecx
+        shrl $1,%ecx
+        andl $1,%edx
+        rep movsq
+        movl %edx,%ecx
+        rep movsd
+        ret
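
The count arithmetic in this version of the routine may be easier to follow in C. The sketch below only mirrors the two instructions that split the dword count; `struct copy_plan` and `split_count` are names invented for the illustration and do not appear in the patch.

```c
#include <stdint.h>

/* How many iterations each string instruction runs for a given count. */
struct copy_plan {
    uint32_t qwords;  /* iterations of "rep movsq", 8 bytes each */
    uint32_t dwords;  /* iterations of "rep movsd", 4 bytes each */
};

static struct copy_plan split_count(uint32_t count)
{
    struct copy_plan plan;

    plan.qwords = count >> 1;  /* shrl $1,%ecx: pairs of dwords */
    plan.dwords = count & 1;   /* andl $1,%edx: the odd dword, if any */
    return plan;
}
```

For a count of 5 this gives two 64-bit moves and one 32-bit move, 20 bytes in total, so almost all of the data travels in 64-bit units despite the routine's name.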
* [PATCH 0 of 3] Add memcpy_toio32, a 32-bit MMIO copy routine
From: Bryan O'Sullivan @ 2005-12-27 23:41 UTC
To: linux-kernel; +Cc: mpm, akpm, hch

Following some discussion with Matt, Andrew and Chris, here is a recast
of the 32-bit MMIO patch I posted the other day.

The routine is now named memcpy_toio32, and is provided in generic and
x86_64-optimised forms.

I haven't added a memcpy_fromio32, or routines for other access sizes,
because our hardware doesn't need them. If someone wants them for
reasons of symmetry, I can introduce them.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
* [PATCH 2 of 3] memcpy32 for x86_64
From: Bryan O'Sullivan @ 2005-12-27 23:41 UTC
To: linux-kernel; +Cc: mpm, akpm, hch

Introduce an x86_64-specific memcpy32 routine. The routine is similar
to memcpy, but is guaranteed to work in units of 32 bits at a time.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>

diff -r 7b7b442a4d63 -r 042b7d9004ac arch/x86_64/kernel/x8664_ksyms.c
--- a/arch/x86_64/kernel/x8664_ksyms.c Tue Dec 27 15:41:48 2005 -0800
+++ b/arch/x86_64/kernel/x8664_ksyms.c Tue Dec 27 15:41:48 2005 -0800
@@ -150,6 +150,8 @@
 extern void * memcpy(void *,const void *,__kernel_size_t);
 extern void * __memcpy(void *,const void *,__kernel_size_t);
 
+extern void memcpy32(void *,const void *,__kernel_size_t);
+
 EXPORT_SYMBOL(memset);
 EXPORT_SYMBOL(strlen);
 EXPORT_SYMBOL(memmove);
@@ -164,6 +166,8 @@
 EXPORT_SYMBOL(memcpy);
 EXPORT_SYMBOL(__memcpy);
 
+EXPORT_SYMBOL_GPL(memcpy32);
+
 #ifdef CONFIG_RWSEM_XCHGADD_ALGORITHM
 /* prototypes are wrong, these are assembly with custom calling functions */
 extern void rwsem_down_read_failed_thunk(void);
diff -r 7b7b442a4d63 -r 042b7d9004ac arch/x86_64/lib/Makefile
--- a/arch/x86_64/lib/Makefile Tue Dec 27 15:41:48 2005 -0800
+++ b/arch/x86_64/lib/Makefile Tue Dec 27 15:41:48 2005 -0800
@@ -9,4 +9,4 @@
 lib-y := csum-partial.o csum-copy.o csum-wrappers.o delay.o \
         usercopy.o getuser.o putuser.o \
         thunk.o clear_page.o copy_page.o bitstr.o bitops.o
-lib-y += memcpy.o memmove.o memset.o copy_user.o
+lib-y += memcpy.o memcpy32.o memmove.o memset.o copy_user.o
diff -r 7b7b442a4d63 -r 042b7d9004ac include/asm-x86_64/string.h
--- a/include/asm-x86_64/string.h Tue Dec 27 15:41:48 2005 -0800
+++ b/include/asm-x86_64/string.h Tue Dec 27 15:41:48 2005 -0800
@@ -45,6 +45,8 @@
 #define __HAVE_ARCH_MEMMOVE
 void * memmove(void * dest,const void *src,size_t count);
 
+void memcpy32(void *dst, const void *src, size_t count);
+
 /* Use C out of line version for memcmp */
 #define memcmp __builtin_memcmp
 int memcmp(const void * cs,const void * ct,size_t count);
diff -r 7b7b442a4d63 -r 042b7d9004ac arch/x86_64/lib/memcpy32.S
--- /dev/null Thu Jan 1 00:00:00 1970 +0000
+++ b/arch/x86_64/lib/memcpy32.S Tue Dec 27 15:41:48 2005 -0800
@@ -0,0 +1,25 @@
+/*
+ * Copyright (c) 2003, 2004, 2005 PathScale, Inc.
+ */
+
+/*
+ * memcpy32 - Copy a memory block, 32 bits at a time.
+ *
+ * Count is number of dwords; it need not be a qword multiple.
+ * Input:
+ * rdi destination
+ * rsi source
+ * rdx count
+ */
+
+        .globl memcpy32
+memcpy32:
+        movl %edx,%ecx
+        shrl $1,%ecx
+        andl $1,%edx
+        rep
+        movsq
+        movl %edx,%ecx
+        rep
+        movsd
+        ret
* Re: [PATCH 2 of 3] memcpy32 for x86_64
From: Matt Mackall @ 2005-12-28 4:22 UTC
To: Bryan O'Sullivan; +Cc: linux-kernel, akpm, hch

On Tue, Dec 27, 2005 at 03:41:56PM -0800, Bryan O'Sullivan wrote:
> Introduce an x86_64-specific memcpy32 routine. The routine is similar
> to memcpy, but is guaranteed to work in units of 32 bits at a time.
>
> Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
>
> diff -r 7b7b442a4d63 -r 042b7d9004ac arch/x86_64/kernel/x8664_ksyms.c
> --- a/arch/x86_64/kernel/x8664_ksyms.c Tue Dec 27 15:41:48 2005 -0800
> +++ b/arch/x86_64/kernel/x8664_ksyms.c Tue Dec 27 15:41:48 2005 -0800
> @@ -150,6 +150,8 @@
>  extern void * memcpy(void *,const void *,__kernel_size_t);
>  extern void * __memcpy(void *,const void *,__kernel_size_t);
>
> +extern void memcpy32(void *,const void *,__kernel_size_t);

It's better to do an include here. Duplicating prototypes in .c files
is frowned upon (despite the fact that it's already done here).

> +
>  EXPORT_SYMBOL(memset);
>  EXPORT_SYMBOL(strlen);
>  EXPORT_SYMBOL(memmove);
> @@ -164,6 +166,8 @@
>  EXPORT_SYMBOL(memcpy);
>  EXPORT_SYMBOL(__memcpy);
>
> +EXPORT_SYMBOL_GPL(memcpy32);
> +

We've been steadily moving towards grouping EXPORTs with function
definitions. Do *_ksyms.c exist solely to provide exports for
functions defined in assembly at this point? If so, perhaps we ought
to come up with a suitable export macro for asm files.

> diff -r 7b7b442a4d63 -r 042b7d9004ac arch/x86_64/lib/memcpy32.S
> --- /dev/null Thu Jan 1 00:00:00 1970 +0000
> +++ b/arch/x86_64/lib/memcpy32.S Tue Dec 27 15:41:48 2005 -0800
> @@ -0,0 +1,25 @@
> +/*
> + * Copyright (c) 2003, 2004, 2005 PathScale, Inc.
> + */
> +
> +/*
> + * memcpy32 - Copy a memory block, 32 bits at a time.
> + *
> + * Count is number of dwords; it need not be a qword multiple.
> + * Input:
> + * rdi destination
> + * rsi source
> + * rdx count
> + */
> +
> +        .globl memcpy32
> +memcpy32:
> +        movl %edx,%ecx
> +        shrl $1,%ecx
> +        andl $1,%edx
> +        rep
> +        movsq
> +        movl %edx,%ecx
> +        rep
> +        movsd
> +        ret

Any reason this needs its own .S file? One wonders if the

        .p2align 4

in memcpy.S is appropriate here too. Splitting rep movsq across two
lines is a little weird to me too, but I see Andi did it too.

--
Mathematics is the supreme nostalgia of our time.
* Re: [PATCH 2 of 3] memcpy32 for x86_64
From: Denis Vlasenko @ 2005-12-28 7:54 UTC
To: Matt Mackall; +Cc: Bryan O'Sullivan, linux-kernel, akpm, hch

> > +
> > +        .globl memcpy32
> > +memcpy32:
> > +        movl %edx,%ecx
> > +        shrl $1,%ecx
> > +        andl $1,%edx
> > +        rep
> > +        movsq

Does this one really do 32-bit stores?! I doubt it...

> > +        movl %edx,%ecx
> > +        rep
> > +        movsd
> > +        ret
>
> Any reason this needs its own .S file? One wonders if the
>
>         .p2align 4
>
> in memcpy.S is appropriate here too. Splitting rep movsq across two
> lines is a little weird to me too, but I see Andi did it too.

--
vda
* Re: [PATCH 2 of 3] memcpy32 for x86_64
From: Bryan O'Sullivan @ 2005-12-28 14:52 UTC
To: Matt Mackall; +Cc: Andi Kleen, linux-kernel, akpm, hch

On Tue, 2005-12-27 at 22:22 -0600, Matt Mackall wrote:

> It's better to do an include here. Duplicating prototypes in .c files
> is frowned upon (despite the fact that it's already done here).

Yeah. I'm not thrilled about the existing style of that file, but I
don't want to weed-whack it as I go. That turns a small patch into a
case of mission creep.

> We've been steadily moving towards grouping EXPORTs with function
> definitions. Do *_ksyms.c exist solely to provide exports for
> functions defined in assembly at this point? If so, perhaps we ought
> to come up with a suitable export macro for asm files.

That might make sense, but it's also beyond the scope of what I'm trying
to do.

> Any reason this needs its own .S file?

Not really.

> One wonders if the
>
>         .p2align 4
>
> in memcpy.S is appropriate here too.

It's not clear to me that it makes any difference either way. Both
routines obviously work :-)

Perhaps Andi can indicate his opinion.

        <b
* Re: [PATCH 2 of 3] memcpy32 for x86_64
From: Pavel Machek @ 2006-01-06 9:12 UTC
To: Bryan O'Sullivan; +Cc: linux-kernel, mpm, akpm, hch

On Tue 27-12-05 15:41:56, Bryan O'Sullivan wrote:
> Introduce an x86_64-specific memcpy32 routine. The routine is similar
> to memcpy, but is guaranteed to work in units of 32 bits at a time.
>
> Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>

> diff -r 7b7b442a4d63 -r 042b7d9004ac arch/x86_64/lib/memcpy32.S
> --- /dev/null Thu Jan 1 00:00:00 1970 +0000
> +++ b/arch/x86_64/lib/memcpy32.S Tue Dec 27 15:41:48 2005 -0800
> @@ -0,0 +1,25 @@
> +/*
> + * Copyright (c) 2003, 2004, 2005 PathScale, Inc.
> + */

Did it really take 3 years to develop this? Anyway this contains
copyright but not GPL, not allowing us to distribute it.

                                                        Pavel
--
Thanks, Sharp!
* Re: [PATCH 2 of 3] memcpy32 for x86_64
From: Bryan O'Sullivan @ 2006-01-06 16:02 UTC
To: Pavel Machek; +Cc: linux-kernel, mpm, akpm, hch

On Fri, 2006-01-06 at 10:12 +0100, Pavel Machek wrote:

> Did it really take 3 years to develop this?

Each instruction is carefully aged in an oak barrel, in a
climate-controlled cave.

> Anyway this contains
> copyright but not GPL, not allowing us to distribute it.

I'll fix that, next round.

        <b