* Call for testing/opinions: Optimized memset/memcpy
@ 2013-07-13 15:51 Harm Hanemaaijer
  2013-07-13 16:48 ` Dr. David Alan Gilbert
  2013-07-13 17:24 ` Willy Tarreau
  0 siblings, 2 replies; 18+ messages in thread
From: Harm Hanemaaijer @ 2013-07-13 15:51 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,

I've been doing some work on optimizing the memset/memcpy family of functions for modern ARM platforms, including copy_page, memset, memzero, memcpy, copy_from_user and copy_to_user. It appears that there is room for improvement, especially with regard to using an optimal preload strategy for armv6/v7 architectures as well as aligning the write target. For example, on an armv6-based platform (RPi) I am seeing an 80% speed-up in copy_page and large sized memcpy. Gains in the range 10-25% are seen on a Cortex A8 device. These optimizations use the regular register file, like the previous implementation, and do not use any NEON or vfp registers.

To properly benchmark and test these new implementations, I've created a userspace testing utility that can be used to compare and validate exact copies of the original and optimized kernel versions of the functions in userspace. The repository is available at https://github.com/hglm/test-arm-kernel-memcpy.git. It would be useful to compare the results on different platforms and to check whether changes in the prefetch distance or write alignment result in optimized performance.

I've created a preliminary patch set that replaces the copy_page, memset and memzero functions for all ARM platforms. Features include use of a configurable prefetch distance in copy_page, translation to 16-bit Thumb2 instructions whenever possible, optimization for the common word-aligned case in memset/memzero, and application of a predefined write alignment in memset/memzero. In order to safely use unified ARM assembler syntax, which appears to be desirable going forward, the first patch in the set renames all references to the "push" macro so that it no longer conflicts with the "push" instruction defined in unified syntax. The new memset/memzero functions use the unified syntax. The patch set is available at https://github.com/hglm/patches/tree/master/arm-mem-funcs.

Optimization of memcpy/copy_from_user/copy_to_user is more complicated, and although I've created optimized versions that provide better results in benchmarks, we have to be careful that increased code size and branch prediction burden do not result in lower performance in real-world use, especially on older platforms. Therefore it might be desirable to only enable them on newer platforms like armv6/v7.

So in short, I am looking for opinions, and test results especially from the userspace benchmark, to see the relative merit of these optimizations on different platforms.

^ permalink raw reply [flat|nested] 18+ messages in thread
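As an illustration of the "configurable prefetch distance" idea mentioned in the message above, here is a minimal C sketch. It is not the code from the patch set (which is ARM assembly); the names PAGE_BYTES, CHUNK_BYTES, PREFETCH_DISTANCE and copy_page_prefetch are hypothetical, the values are arbitrary, and the GCC/Clang builtin __builtin_prefetch stands in for the ARM preload instruction.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES        4096u  /* page size assumed for this sketch */
#define CHUNK_BYTES       32u    /* bytes copied per loop iteration */
#define PREFETCH_DISTANCE 96u    /* hypothetical tuning knob, in bytes */

/*
 * Copy one page in fixed-size chunks. Before each chunk is copied, a read
 * prefetch is issued PREFETCH_DISTANCE bytes ahead of the current source
 * position, as long as that address still lies inside the page.
 */
static void copy_page_prefetch(void *dst, const void *src)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t off;

    for (off = 0; off < PAGE_BYTES; off += CHUNK_BYTES) {
        if (off + PREFETCH_DISTANCE < PAGE_BYTES)
            __builtin_prefetch(s + off + PREFETCH_DISTANCE, 0, 3);
        memcpy(d + off, s + off, CHUNK_BYTES);
    }
}
```

Changing PREFETCH_DISTANCE and re-running a benchmark is the kind of experiment the message asks testers to try; the best value is expected to differ between cores and memory systems.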
* Call for testing/opinions: Optimized memset/memcpy
  2013-07-13 15:51 Call for testing/opinions: Optimized memset/memcpy Harm Hanemaaijer
@ 2013-07-13 16:48 ` Dr. David Alan Gilbert
  2013-07-13 21:13   ` Harm Hanemaaijer
  2013-07-14 11:19   ` Harm Hanemaaijer
  2013-07-13 17:24 ` Willy Tarreau
  1 sibling, 2 replies; 18+ messages in thread
From: Dr. David Alan Gilbert @ 2013-07-13 16:48 UTC (permalink / raw)
  To: linux-arm-kernel

* Harm Hanemaaijer (fgenfb at yahoo.com) wrote:
> Hello,
>
> I've been doing some work on optimizing the memset/memcpy family of
> functions for modern ARM platforms, including copy_page, memset,
> memzero, memcpy, copy_from_user and copy_to_user. It appears that
> there is room for improvement, especially with regard to using an
> optimal preload strategy for armv6/v7 architectures as well as
> aligning the write target. For example, on an armv6-based platform
> (RPi) I am seeing an 80% speed-up in copy_page and large sized
> memcpy. Gains in the range 10-25% are seen on a Cortex A8 device.
> These optimizations use the regular register file, like the
> previous implementation, and do not use any NEON or vfp registers.

You might like to compare with some of the routines at:
https://launchpad.net/cortex-strings
and some of the numbers at:
https://wiki.linaro.org/WorkingGroups/ToolChain/Benchmarks/

(I'm sure Michael Hope who owns that set of stuff would be interested in seeing your stuff as well).

> To properly benchmark and test these new implementations, I've
> created a userspace testing utility that can be used to compare
> and validate exact copies of the original and optimized kernel
> versions of the functions in userspace. The repository is
> available at https://github.com/hglm/test-arm-kernel-memcpy.git.
> It would be useful to compare the results on different
> platforms and to check whether changes in the prefetch distance
> or write alignment result in optimized performance.

It's quite tricky to figure that out across different machines, and even on the same machine in different setups;

http://ssvb.github.io/2013/06/27/fullhd-x11-desktop-performance-of-the-allwinner-a10.html

is an interesting article on one machine being screwed over by video bandwidth.

I've only had a brief scan through your code, one thing I remember from a couple of years ago was a theory that ldrd/strd was supposed to be faster on A15's (but I never had a chance to try it out).

<snip>

> So in short, I am looking for opinions, and test results especially
> from the userspace benchmark, to see the relative merit of these
> optimizations on different platforms.

Maybe neon is worth a try these days (although be careful of platforms like Tegra 2 that doesn't have it); there was a recent patch that enabled use in the kernel (I think for some RAID use). The downside is it's supposed to be quite power hungry.

Dave
--
 -----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \
\ gro.gilbert @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/

^ permalink raw reply [flat|nested] 18+ messages in thread
* Call for testing/opinions: Optimized memset/memcpy
  2013-07-13 16:48 ` Dr. David Alan Gilbert
@ 2013-07-13 21:13   ` Harm Hanemaaijer
  2013-07-15 13:15     ` Catalin Marinas
  2013-07-14 11:19   ` Harm Hanemaaijer
  1 sibling, 1 reply; 18+ messages in thread
From: Harm Hanemaaijer @ 2013-07-13 21:13 UTC (permalink / raw)
  To: linux-arm-kernel

Dr. David Alan Gilbert <gilbertd <at> treblig.org> writes:
>
> You might like to compare with some of the routines at:
> https://launchpad.net/cortex-strings
> and some of the numbers at:
> https://wiki.linaro.org/WorkingGroups/ToolChain/Benchmarks/

That's interesting. I had looked at cortex-strings before but didn't dig into it, also because its benchmark program seemed to be limited in scope. From the Linaro numbers it seems NEON isn't always a win especially on newer Cortex platforms, with large variability across different platforms/cores.

>
> http://ssvb.github.io/2013/06/27/fullhd-x11-desktop-performance-of-the-allwinner-a10.html
>
> is an interesting article on one machine being screwed over by
> video bandwidth.

I have the same type of device (the Cortex A8 which I've tested on). Running a 1920x1080 screen at 32bpp does indeed cost a lot of bandwidth (it's 500MB/s of scanout bandwidth); I think this applies to most devices except higher-end ones with a 64-bit DRAM interface.

> I've only had a brief scan through your code, one thing I remember
> from a couple of years ago was a theory that ldrd/strd was supposed
> to be faster on A15's (but I never had a chance to try it out).

I briefly experimented with ldrd/strd; it seemed to be fast but highly dependent on the proper (64-bit) alignment. In my current code it is only used in Thumb2 mode in one spot.

> Maybe neon is worth a try these days (although be careful of platforms
> like Tegra 2 that doesn't have it); there was a recent patch that enabled
> use in the kernel (I think for some RAID use). The downside is it's
> supposed to be quite power hungry.

Although I don't have experience with NEON, there seems to be a lot of variability across platforms/cores when using it for memcpy, and it may have extra overhead when used in the kernel. I will look at it in more detail, but not using NEON does make things easier (not having to detect NEON, being compatible with older platforms etc).

Thanks for the comments.

^ permalink raw reply [flat|nested] 18+ messages in thread
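The scanout figure quoted in the message above can be checked with a few lines of C. This is only a back-of-the-envelope sketch: the function name is hypothetical, and the 60 Hz refresh rate is an assumption, since the message does not state one.

```c
#include <stdint.h>

/*
 * Bytes per second the display controller must read from memory to scan
 * out one framebuffer: width * height * bytes per pixel * refresh rate.
 */
static uint64_t scanout_bytes_per_sec(uint32_t width, uint32_t height,
                                      uint32_t bits_per_pixel,
                                      uint32_t refresh_hz)
{
    return (uint64_t)width * height * (bits_per_pixel / 8u) * refresh_hz;
}
```

With the assumed 60 Hz, scanout_bytes_per_sec(1920, 1080, 32, 60) gives 497,664,000 bytes per second, which matches the roughly 500MB/s mentioned in the message.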
* Call for testing/opinions: Optimized memset/memcpy
  2013-07-13 21:13   ` Harm Hanemaaijer
@ 2013-07-15 13:15     ` Catalin Marinas
  0 siblings, 0 replies; 18+ messages in thread
From: Catalin Marinas @ 2013-07-15 13:15 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, Jul 13, 2013 at 10:13:12PM +0100, Harm Hanemaaijer wrote:
> Dr. David Alan Gilbert <gilbertd <at> treblig.org> writes:
>
> >
> > You might like to compare with some of the routines at:
> > https://launchpad.net/cortex-strings
> > and some of the numbers at:
> > https://wiki.linaro.org/WorkingGroups/ToolChain/Benchmarks/
>
> That's interesting. I had looked at cortex-strings before but didn't
> dig into it, also because its benchmark program seemed to be limited in
> scope. From the Linaro numbers it seems NEON isn't always a win
> especially on newer Cortex platforms, with large variability across
> different platforms/cores.

As has been stated in this thread, we shouldn't use Neon for memcpy. There is significant overhead from saving/restoring the Neon registers, as well as an impact on preemptibility. But Cortex Strings is a good starting point and Linaro is going to port some of these functions to the Linux kernel for ARMv8 (AArch64).

--
Catalin

^ permalink raw reply [flat|nested] 18+ messages in thread
* Call for testing/opinions: Optimized memset/memcpy
  2013-07-13 16:48 ` Dr. David Alan Gilbert
  2013-07-13 21:13   ` Harm Hanemaaijer
@ 2013-07-14 11:19   ` Harm Hanemaaijer
  2013-07-14 11:32     ` Dr. David Alan Gilbert
  2013-07-14 11:37     ` Ard Biesheuvel
  1 sibling, 2 replies; 18+ messages in thread
From: Harm Hanemaaijer @ 2013-07-14 11:19 UTC (permalink / raw)
  To: linux-arm-kernel

Dr. David Alan Gilbert <gilbertd <at> treblig.org> writes:
>
> Maybe neon is worth a try these days (although be careful of platforms
> like Tegra 2 that doesn't have it); there was a recent patch that enabled
> use in the kernel (I think for some RAID use). The downside is it's
> supposed to be quite power hungry.
>

As it turns out, NEON isn't too hard to implement. I have added NEON support to copy_page, memset, memzero, and memcpy (both for the aligned and unaligned case) in my userspace testing environment. It gives a nice boost (ranging from 10% for copy_page to >30% for unaligned memcpy on a Cortex A8), which can potentially be more on other cores. Although I have not tested a live kernel yet, it looks like NEON can be used fairly transparently #ifdefed on the CONFIG_NEON kernel definition as long as only the lower end of the NEON/vfp register file is clobbered (although this needs verification).

^ permalink raw reply [flat|nested] 18+ messages in thread
* Call for testing/opinions: Optimized memset/memcpy
  2013-07-14 11:19   ` Harm Hanemaaijer
@ 2013-07-14 11:32     ` Dr. David Alan Gilbert
  2013-07-14 11:37     ` Ard Biesheuvel
  1 sibling, 0 replies; 18+ messages in thread
From: Dr. David Alan Gilbert @ 2013-07-14 11:32 UTC (permalink / raw)
  To: linux-arm-kernel

* Harm Hanemaaijer (fgenfb at yahoo.com) wrote:
> Dr. David Alan Gilbert <gilbertd <at> treblig.org> writes:
> >
> > Maybe neon is worth a try these days (although be careful of platforms
> > like Tegra 2 that doesn't have it); there was a recent patch that enabled
> > use in the kernel (I think for some RAID use). The downside is it's
> > supposed to be quite power hungry.
> >
>
> As it turns out, NEON isn't too hard to implement. I have added NEON support
> to copy_page, memset, memzero, and memcpy (both for the aligned and unaligned
> case) in my userspace testing environment. It gives a nice boost (ranging
> from 10% for copy_page to >30% for unaligned memcpy on a Cortex A8), which
> can potentially be more on other cores.

What size memcpy's is that on? If I remember correctly A8 happens to be able to do very fast Neon to its cache but it doesn't help outside of the cache, and it doesn't give any benefit on A9.

> Although I have not tested a live
> kernel yet, it looks like NEON can be used fairly transparently #ifdefed on
> the CONFIG_NEON kernel definition as long as only the lower end of the
> NEON/vfp register file is clobbered (although this needs verification).

Hmm, I'd assumed there would be some save/restore stuff needed, and given that copy_to_ etc. get used everywhere, I'd be careful.

Dave
--
 -----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \
\ gro.gilbert @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/

^ permalink raw reply [flat|nested] 18+ messages in thread
* Call for testing/opinions: Optimized memset/memcpy
  2013-07-14 11:19   ` Harm Hanemaaijer
  2013-07-14 11:32     ` Dr. David Alan Gilbert
@ 2013-07-14 11:37     ` Ard Biesheuvel
  2013-07-14 13:13       ` Russell King - ARM Linux
  2013-07-14 13:33       ` Harm Hanemaaijer
  1 sibling, 2 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2013-07-14 11:37 UTC (permalink / raw)
  To: linux-arm-kernel

On 14 July 2013 13:19, Harm Hanemaaijer <fgenfb@yahoo.com> wrote:
> Dr. David Alan Gilbert <gilbertd <at> treblig.org> writes:
>>
>> Maybe neon is worth a try these days (although be careful of platforms
>> like Tegra 2 that doesn't have it); there was a recent patch that enabled
>> use in the kernel (I think for some RAID use). The downside is it's
>> supposed to be quite power hungry.
>>
>
> As it turns out, NEON isn't too hard to implement. I have added NEON support
> to copy_page, memset, memzero, and memcpy (both for the aligned and unaligned
> case) in my userspace testing environment. It gives a nice boost (ranging
> from 10% for copy_page to >30% for unaligned memcpy on a Cortex A8), which
> can potentially be more on other cores. Although I have not tested a live
> kernel yet, it looks like NEON can be used fairly transparently #ifdefed on
> the CONFIG_NEON kernel definition as long as only the lower end of the
> NEON/vfp register file is clobbered (although this needs verification).
>

You will clobber the userland NEON contents of the register file if you don't preserve them properly. Also, kernel preemption (if enabled) may put your task to sleep at any time, and the context switching machinery is totally oblivious of NEON being used in the kernel, so the kernel side will get corrupted as well in this case.

I have a patch series pending (i.e., accepted but not pulled yet by Russell) which addresses these issues.

--
Ard.

^ permalink raw reply [flat|nested] 18+ messages in thread
* Call for testing/opinions: Optimized memset/memcpy
  2013-07-14 11:37     ` Ard Biesheuvel
@ 2013-07-14 13:13       ` Russell King - ARM Linux
  2013-07-14 13:33       ` Harm Hanemaaijer
  1 sibling, 0 replies; 18+ messages in thread
From: Russell King - ARM Linux @ 2013-07-14 13:13 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, Jul 14, 2013 at 01:37:44PM +0200, Ard Biesheuvel wrote:
> On 14 July 2013 13:19, Harm Hanemaaijer <fgenfb@yahoo.com> wrote:
> > Dr. David Alan Gilbert <gilbertd <at> treblig.org> writes:
> >>
> >> Maybe neon is worth a try these days (although be careful of platforms
> >> like Tegra 2 that doesn't have it); there was a recent patch that enabled
> >> use in the kernel (I think for some RAID use). The downside is it's
> >> supposed to be quite power hungry.
> >>
> >
> > As it turns out, NEON isn't too hard to implement. I have added NEON support
> > to copy_page, memset, memzero, and memcpy (both for the aligned and unaligned
> > case) in my userspace testing environment. It gives a nice boost (ranging
> > from 10% for copy_page to >30% for unaligned memcpy on a Cortex A8), which
> > can potentially be more on other cores. Although I have not tested a live
> > kernel yet, it looks like NEON can be used fairly transparently #ifdefed on
> > the CONFIG_NEON kernel definition as long as only the lower end of the
> > NEON/vfp register file is clobbered (although this needs verification).
> >
>
>
> You will clobber the userland NEON contents of the register file if
> you don't preserve them properly. Also, kernel preemption (if enabled)
> may put your task to sleep at any time, and the context switching
> machinery is totally oblivious of NEON being used in the kernel, so
> the kernel side will get corrupted as well in this case.

The other issue is - not every ARMv7 core has Neon, so this is going to have to be something that is selected at runtime - which means indirecting every memcpy/memset through a function pointer.

The final point is, don't forget that gcc will generate implicit calls to memset/memcpy, and neon won't be available early in the kernel boot, so you can't optimize those function pointers away.

^ permalink raw reply [flat|nested] 18+ messages in thread
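The runtime selection described in the message above can be pictured with the following userspace C sketch. It is only an illustration under stated assumptions: all names (memcpy_fn, memcpy_generic, memcpy_neon_placeholder, select_memcpy, dispatch_memcpy) are hypothetical, the "NEON" routine is a placeholder that does not use NEON at all, and none of this is taken from the kernel or the patch set.

```c
#include <stddef.h>

typedef void *(*memcpy_fn)(void *dst, const void *src, size_t n);

/* Baseline routine using only core registers; usable from the very start. */
static void *memcpy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Counts calls so the effect of the selection can be observed. */
static unsigned long neon_path_calls;

/* Placeholder standing in for a NEON routine; it just reuses the baseline. */
static void *memcpy_neon_placeholder(void *dst, const void *src, size_t n)
{
    neon_path_calls++;
    return memcpy_generic(dst, src, n);
}

/* The pointer starts at the baseline so that early callers always work. */
static memcpy_fn active_memcpy = memcpy_generic;

/* Called once the CPU features are known (hypothetical hook). */
static void select_memcpy(int cpu_has_neon)
{
    if (cpu_has_neon)
        active_memcpy = memcpy_neon_placeholder;
}

/* Every copy pays for one indirect call through the pointer. */
static void *dispatch_memcpy(void *dst, const void *src, size_t n)
{
    return active_memcpy(dst, src, n);
}
```

The pointer cannot be resolved at build time because the feature check happens at runtime and some calls arrive before it has run, which is the cost the message points out.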
* Call for testing/opinions: Optimized memset/memcpy
  2013-07-14 11:37     ` Ard Biesheuvel
  2013-07-14 13:13       ` Russell King - ARM Linux
@ 2013-07-14 13:33       ` Harm Hanemaaijer
  2013-07-14 14:09         ` Ard Biesheuvel
  1 sibling, 1 reply; 18+ messages in thread
From: Harm Hanemaaijer @ 2013-07-14 13:33 UTC (permalink / raw)
  To: linux-arm-kernel

Ard Biesheuvel <ard.biesheuvel <at> linaro.org> writes:
>
> You will clobber the userland NEON contents of the register file if
> you don't preserve them properly. Also, kernel preemption (if enabled)
> may put your task to sleep at any time, and the context switching
> machinery is totally oblivious of NEON being used in the kernel, so
> the kernel side will get corrupted as well in this case.
>
> I have a patch series pending (i.e., accepted but not pulled yet by
> Russell) which addresses these issues.
>

That was what I was afraid of concerning NEON. It must be tricky to solve without sacrificing performance, since saving/restoring the entire NEON register file would obviously seriously impact context switch performance. For memcpy-like applications, basically only four dword registers are required (d0-d3) which could possibly be optimized for.

^ permalink raw reply [flat|nested] 18+ messages in thread
* Call for testing/opinions: Optimized memset/memcpy
  2013-07-14 13:33       ` Harm Hanemaaijer
@ 2013-07-14 14:09         ` Ard Biesheuvel
  2013-07-14 14:32           ` Russell King - ARM Linux
  0 siblings, 1 reply; 18+ messages in thread
From: Ard Biesheuvel @ 2013-07-14 14:09 UTC (permalink / raw)
  To: linux-arm-kernel

On 14 July 2013 15:33, Harm Hanemaaijer <fgenfb@yahoo.com> wrote:
> Ard Biesheuvel <ard.biesheuvel <at> linaro.org> writes:
>
>>
>> You will clobber the userland NEON contents of the register file if
>> you don't preserve them properly. Also, kernel preemption (if enabled)
>> may put your task to sleep at any time, and the context switching
>> machinery is totally oblivious of NEON being used in the kernel, so
>> the kernel side will get corrupted as well in this case.
>>
>> I have a patch series pending (i.e., accepted but not pulled yet by
>> Russell) which addresses these issues.
>>
>
> That was what I was afraid of concerning NEON. It must be tricky to solve
> without sacrificing performance, since saving/restoring the entire NEON
> register file would obviously seriously impact context switch performance.
> For memcpy-like applications, basically only four dword registers are
> required (d0-d3) which could possibly be optimized for.
>

Well, the whole lazy preserve/restore mechanism is based on the premise that preserve/restore is only required when multiple users are contending for the NEON (or in the SMP case, when a task gets migrated to another CPU). As we will not be allowing NEON in interrupt context nor in a preemptible section, the burden of the more costly context switches should not grow disproportionately, even if tasks may be contending for the NEON with themselves in a way (userland vs kernel).

However, it also means that a NEON based memcpy() is going to be problematic, not only for the reasons pointed out by Russell, but also because you will need a fallback to use from interrupt context. Perhaps for sufficiently large sizes, it makes sense to take the hit of testing whether NEON is allowable at that particular moment, and doing the preserve in that case.

In the end, the numbers should speak for themselves: if you manage a considerable speedup in a real-world case, and no deterioration in others, people are usually quite receptive.

--
Ard.

^ permalink raw reply [flat|nested] 18+ messages in thread
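The "only for sufficiently large sizes, and only when allowed right now" idea from the message above could look roughly like the C sketch below. Everything in it is a hypothetical stand-in for illustration: the threshold value, simd_usable(), simd_begin() and simd_end() are stubs invented here and are not the kernel's interfaces, and both copy paths simply call the C library memcpy.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical threshold; a real value would have to come from measurement. */
#define SIMD_MIN_BYTES 512u

/* Stub state: false models interrupt context or another forbidden situation. */
static bool simd_allowed_now = true;

/* Stub for "may the SIMD unit be used at this moment?". */
static bool simd_usable(void) { return simd_allowed_now; }

/* Stubs marking where register preserve and restore would take place. */
static void simd_begin(void) { }
static void simd_end(void) { }

/* Counters that make the chosen path visible. */
static unsigned long simd_copies, plain_copies;

static void *copy_maybe_simd(void *dst, const void *src, size_t n)
{
    if (n >= SIMD_MIN_BYTES && simd_usable()) {
        simd_begin();
        memcpy(dst, src, n);    /* placeholder for a SIMD copy loop */
        simd_end();
        simd_copies++;
        return dst;
    }

    /* Fallback: small copies, or contexts where SIMD is not allowed. */
    plain_copies++;
    return memcpy(dst, src, n);
}
```

Small copies never pay for the check beyond one comparison, while large copies pay for the check and the preserve step only when the SIMD path is actually taken; whether that trade is worthwhile is exactly what the benchmark numbers would have to show.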
* Call for testing/opinions: Optimized memset/memcpy
  2013-07-14 14:09         ` Ard Biesheuvel
@ 2013-07-14 14:32           ` Russell King - ARM Linux
  0 siblings, 0 replies; 18+ messages in thread
From: Russell King - ARM Linux @ 2013-07-14 14:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, Jul 14, 2013 at 04:09:20PM +0200, Ard Biesheuvel wrote:
> Well, the whole lazy preserve/restore mechanism is based on the
> premise that preserve/restore is only required when multiple users are
> contending for the NEON (or in the SMP case, when a task gets migrated
> to another CPU). As we will not be allowing NEON in interrupt context
> nor in a preemptible section, the burden of the more costly context
> switches should not grow disproportionately, even if tasks may be
> contending for the NEON with themselves in a way (userland vs kernel).
> However, it also means that a NEON based memcpy() is going to be
> problematic, not only for the reasons pointed out by Russell, but also
> because you will need a fallback to use from interrupt context.

There's another reason too: it would make memcpy() et al. non-preemptible as well - that's probably fine for very small copies, but not for larger ones. The acceptability threshold depends on how RT orientated you are and what your application demands in terms of accuracy from the RT implementation.

^ permalink raw reply [flat|nested] 18+ messages in thread
* Call for testing/opinions: Optimized memset/memcpy
  2013-07-13 15:51 Call for testing/opinions: Optimized memset/memcpy Harm Hanemaaijer
  2013-07-13 16:48 ` Dr. David Alan Gilbert
@ 2013-07-13 17:24 ` Willy Tarreau
  2013-07-13 21:51   ` Harm Hanemaaijer
  1 sibling, 1 reply; 18+ messages in thread
From: Willy Tarreau @ 2013-07-13 17:24 UTC (permalink / raw)
  To: linux-arm-kernel

Hello Harm,

On Sat, Jul 13, 2013 at 03:51:07PM +0000, Harm Hanemaaijer wrote:
> Hello,
>
> I've been doing some work on optimizing the memset/memcpy family of
> functions for modern ARM platforms, including copy_page, memset,
> memzero, memcpy, copy_from_user and copy_to_user. It appears that
> there is room for improvement, especially with regard to using an
> optimal preload strategy for armv6/v7 architectures as well as
> aligning the write target. For example, on an armv6-based platform
> (RPi) I am seeing an 80% speed-up in copy_page and large sized
> memcpy. Gains in the range 10-25% are seen on a Cortex A8 device.

Interesting, especially for devices that have a narrow DDR bus where we want to shave every possible bus cycle!

(...)
> So in short, I am looking for opinions, and test results especially
> from the userspace benchmark, to see the relative merit of these
> optimizations on different platforms.

OK I've run bench.script on the following platforms :

  - Snowball board : it is a dual-core 1GHz cortex-a9 from STE (A9500). It has some 32-bit LPDDR2 soldered on the CPU (package on package). The test ran only in ARMv7 mode.

    root@snowball:tmp# cat /proc/cpuinfo
    processor       : 0
    model name      : ARMv7 Processor rev 1 (v7l)
    BogoMIPS        : 4.80
    Features        : swp half thumb fastmult vfp edsp neon vfpv3 tls
    CPU implementer : 0x41
    CPU architecture: 7
    CPU variant     : 0x2
    CPU part        : 0xc09
    CPU revision    : 1

  - Armada XP-GP board : it's a quad-core 1.6 GHz Marvell Armada-XP (PJ4Bv2) CPU. It has 64-bit DDR3-1600 RAM on a DIMM. The tests were run in ARMv7 and Thumb2 modes.
    The difference was not impressive between the two modes.

    root@xpgp:tmp# cat /proc/cpuinfo
    processor       : 0
    model name      : ARMv7 Processor rev 2 (v7l)
    BogoMIPS        : 1594.16
    Features        : swp half thumb fastmult vfp edsp vfpv3 tls idiva idivt
    CPU implementer : 0x56
    CPU architecture: 7
    CPU variant     : 0x2
    CPU part        : 0x584
    CPU revision    : 2

  - Mirabox : single-core 1.2 GHz Marvell Armada370 (PJ4B) CPU. It uses 16-bit DDR3-1200 soldered onboard. The tests were run in ARMv7 and Thumb2 modes. It can be useful to compare with the xp-gp above because its CPU can be seen as a scaled down version of the previous one, with 1/4 of the DRAM bus width, and both have the DRAM at half CPU frequency.

    root@mirabox:tmp# cat /proc/cpuinfo
    processor       : 0
    model name      : ARMv7 Processor rev 1 (v7l)
    BogoMIPS        : 597.60
    Features        : swp half thumb fastmult vfp edsp vfpv3 vfpv3d16 tls idivt
    CPU implementer : 0x56
    CPU architecture: 7
    CPU variant     : 0x1
    CPU part        : 0x581
    CPU revision    : 1

I'm attaching all the results.

Hoping this helps,
Willy

-------------- next part --------------
libc memcpy: Mixed powers of 2 from 4 to 4096 (power law), word aligned: 599.89 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 600.57 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 597.81 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 598.70 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 595.39 MB/s kernel memcpy (original): Mixed powers of 2 from 4 to 4096 (power law), word aligned: 618.28 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 615.10 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 618.15 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 615.02 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 621.19 MB/s kernel memcpy (optimized): Mixed powers of 2 from 4 to 4096 (power law), word aligned: 618.03 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 612.97 MB/s Mixed powers of 2
from 4 to 4096 (power law), word aligned: 614.82 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 611.68 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 616.50 MB/s libc memcpy: Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 363.92 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 365.71 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 363.92 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 365.73 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 365.63 MB/s kernel memcpy (original): Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 381.35 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 383.49 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 381.49 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 383.32 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 381.47 MB/s kernel memcpy (optimized): Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 426.75 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 426.75 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 426.75 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 426.69 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 424.72 MB/s libc memcpy: Mixed multiples of 4 from 4 to 130, word aligned: 311.75 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 310.30 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 311.74 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 310.22 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 311.76 MB/s kernel memcpy (original): Mixed multiples of 4 from 4 to 130, word aligned: 327.84 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 327.89 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 327.87 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 326.25 MB/s Mixed 
multiples of 4 from 4 to 130, word aligned: 327.87 MB/s kernel memcpy (optimized): Mixed multiples of 4 from 4 to 130, word aligned: 364.50 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 366.29 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 364.51 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 366.24 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 366.31 MB/s kernel copy_from_user (optimized): Mixed multiples of 4 from 4 to 130, word aligned: 361.11 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 362.86 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 361.10 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 362.86 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 361.13 MB/s kernel copy_to_user (optimized): Mixed multiples of 4 from 4 to 130, word aligned: 366.61 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 364.79 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 366.56 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 366.60 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 364.84 MB/s libc memcpy: 4096 bytes page aligned: 356.71 MB/s 4096 bytes page aligned: 355.04 MB/s 4096 bytes page aligned: 356.67 MB/s 4096 bytes page aligned: 354.98 MB/s 4096 bytes page aligned: 356.68 MB/s kernel memcpy (original): 4096 bytes page aligned: 355.32 MB/s 4096 bytes page aligned: 356.96 MB/s 4096 bytes page aligned: 355.31 MB/s 4096 bytes page aligned: 357.01 MB/s 4096 bytes page aligned: 355.30 MB/s kernel memcpy (optimized): 4096 bytes page aligned: 341.05 MB/s 4096 bytes page aligned: 339.37 MB/s 4096 bytes page aligned: 341.04 MB/s 4096 bytes page aligned: 339.37 MB/s 4096 bytes page aligned: 341.03 MB/s kernel copy_page (original): 4096 bytes page aligned: 382.31 MB/s 4096 bytes page aligned: 384.19 MB/s 4096 bytes page aligned: 382.29 MB/s 4096 bytes page aligned: 384.25 MB/s 4096 bytes page aligned: 382.30 MB/s kernel copy_page (optimized): 4096 bytes page aligned: 340.55 MB/s 4096 bytes page 
aligned: 338.96 MB/s
4096 bytes page aligned: 340.60 MB/s
4096 bytes page aligned: 338.96 MB/s
4096 bytes page aligned: 340.56 MB/s
libc memcpy:
Mixed from 1 to 1023 (power law), unaligned: 513.06 MB/s
Mixed from 1 to 1023 (power law), unaligned: 513.02 MB/s
Mixed from 1 to 1023 (power law), unaligned: 512.94 MB/s
Mixed from 1 to 1023 (power law), unaligned: 510.37 MB/s
Mixed from 1 to 1023 (power law), unaligned: 513.35 MB/s
kernel memcpy (original):
Mixed from 1 to 1023 (power law), unaligned: 532.66 MB/s
Mixed from 1 to 1023 (power law), unaligned: 535.20 MB/s
Mixed from 1 to 1023 (power law), unaligned: 532.29 MB/s
Mixed from 1 to 1023 (power law), unaligned: 535.41 MB/s
Mixed from 1 to 1023 (power law), unaligned: 535.59 MB/s
kernel memcpy (optimized):
Mixed from 1 to 1023 (power law), unaligned: 528.33 MB/s
Mixed from 1 to 1023 (power law), unaligned: 531.12 MB/s
Mixed from 1 to 1023 (power law), unaligned: 527.64 MB/s
Mixed from 1 to 1023 (power law), unaligned: 530.72 MB/s
Mixed from 1 to 1023 (power law), unaligned: 528.05 MB/s
libc memset:
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 888.47 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 884.25 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 888.42 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 888.49 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 884.05 MB/s
kernel memset (original):
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 962.84 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 958.71 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 963.20 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 958.83 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 962.86 MB/s
kernel memset (optimized):
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1004.37 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 999.61 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1004.49 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 999.43 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1004.46 MB/s
kernel memzero (original):
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 922.59 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 926.98 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 926.99 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 922.46 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 927.07 MB/s
kernel memzero (optimized):
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 930.00 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 934.53 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 930.89 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 935.60 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 935.32 MB/s
libc memset:
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 520.37 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 520.42 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 517.93 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 520.36 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 517.84 MB/s
kernel memset (original):
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 594.94 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 591.54 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 594.39 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 594.45 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 591.58 MB/s
kernel memset (optimized):
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 658.84 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 655.68 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 658.78 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 655.58 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 658.85 MB/s
kernel memzero (original):
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 567.21 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 569.94 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 569.92 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 567.08 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 569.93 MB/s
kernel memzero (optimized):
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 586.06 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 588.64 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 585.75 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 588.86 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 588.66 MB/s
libc memset:
4096 bytes page aligned: 2052.77 MB/s
4096 bytes page aligned: 2052.69 MB/s
4096 bytes page aligned: 2042.84 MB/s
4096 bytes page aligned: 2052.72 MB/s
4096 bytes page aligned: 2042.30 MB/s
kernel memset (original):
4096 bytes page aligned: 1920.98 MB/s
4096 bytes page aligned: 1911.66 MB/s
4096 bytes page aligned: 1921.13 MB/s
4096 bytes page aligned: 1921.17 MB/s
4096 bytes page aligned: 1911.92 MB/s
kernel memset (optimized):
4096 bytes page aligned: 1900.46 MB/s
4096 bytes page aligned: 1891.21 MB/s
4096 bytes page aligned: 1900.52 MB/s
4096 bytes page aligned: 1891.16 MB/s
4096 bytes page aligned: 1900.64 MB/s
kernel memzero (original):
4096 bytes page aligned: 1910.57 MB/s
4096 bytes page aligned: 1920.05 MB/s
4096 bytes page aligned: 1920.02 MB/s
4096 bytes page aligned: 1910.87 MB/s
4096 bytes page aligned: 1920.06 MB/s
kernel memzero (optimized):
4096 bytes page aligned: 1917.74 MB/s
4096 bytes page aligned: 1927.05 MB/s
4096 bytes page aligned: 1917.28 MB/s
4096 bytes page aligned: 1927.11 MB/s
4096 bytes page aligned: 1926.87 MB/s
libc memset:
Mixed from 1 to 1023 (power law), unaligned: 759.37 MB/s
Mixed from 1 to 1023 (power law), unaligned: 759.42 MB/s
Mixed from 1 to 1023 (power law), unaligned: 755.88 MB/s
Mixed from 1 to 1023 (power law), unaligned: 759.32 MB/s
Mixed from 1 to 1023 (power law), unaligned: 756.04 MB/s
kernel memset (original):
Mixed from 1 to 1023 (power law), unaligned: 802.77 MB/s
Mixed from 1 to 1023 (power law), unaligned: 798.89 MB/s
Mixed from 1 to 1023 (power law), unaligned: 801.62 MB/s
Mixed from 1 to 1023 (power law), unaligned: 802.67 MB/s
Mixed from 1 to 1023 (power law), unaligned: 798.07 MB/s
kernel memset (optimized):
Mixed from 1 to 1023 (power law), unaligned: 862.50 MB/s
Mixed from 1 to 1023 (power law), unaligned: 857.72 MB/s
Mixed from 1 to 1023 (power law), unaligned: 862.52 MB/s
Mixed from 1 to 1023 (power law), unaligned: 857.00 MB/s
Mixed from 1 to 1023 (power law), unaligned: 860.71 MB/s
kernel memzero (original):
Mixed from 1 to 1023 (power law), unaligned: 784.48 MB/s
Mixed from 1 to 1023 (power law), unaligned: 780.41 MB/s
Mixed from 1 to 1023 (power law), unaligned: 784.97 MB/s
Mixed from 1 to 1023 (power law), unaligned: 781.14 MB/s
Mixed from 1 to 1023 (power law), unaligned: 783.99 MB/s
kernel memzero (optimized):
Mixed from 1 to 1023 (power law), unaligned: 793.48 MB/s
Mixed from 1 to 1023 (power law), unaligned: 796.39 MB/s
Mixed from 1 to 1023 (power law), unaligned: 792.86 MB/s
Mixed from 1 to 1023 (power law), unaligned: 796.20 MB/s
Mixed from 1 to 1023 (power law), unaligned: 796.68 MB/s

-------------- next part --------------

libc memcpy:
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 614.78 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 618.39 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 614.90 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 618.16 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 614.83 MB/s
kernel memcpy (original):
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 654.11 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 650.60 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 653.49 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 653.81 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 649.56 MB/s
kernel memcpy (optimized):
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 653.09 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 650.86 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 653.72 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 650.74 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 653.71 MB/s
libc memcpy:
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 332.22 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 333.86 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 332.22 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 333.86 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 333.77 MB/s
kernel memcpy (original):
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 365.63 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 365.65 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 363.96 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 365.63 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 363.95 MB/s
kernel memcpy (optimized):
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 403.08 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 401.21 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 403.06 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 401.23 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 403.02 MB/s
libc memcpy:
Mixed multiples of 4 from 4 to 130, word aligned: 293.84 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 293.87 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 293.79 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 292.46 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 293.78 MB/s
kernel memcpy (original):
Mixed multiples of 4 from 4 to 130, word aligned: 312.63 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 314.11 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 312.64 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 314.05 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 312.63 MB/s
kernel memcpy (optimized):
Mixed multiples of 4 from 4 to 130, word aligned: 347.08 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 345.40 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 347.01 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 347.06 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 347.05 MB/s
kernel copy_from_user (optimized):
Mixed multiples of 4 from 4 to 130, word aligned: 338.99 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 337.42 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 338.96 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 337.42 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 339.07 MB/s
kernel copy_to_user (optimized):
Mixed multiples of 4 from 4 to 130, word aligned: 336.61 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 338.16 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 336.61 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 338.21 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 336.58 MB/s
libc memcpy:
4096 bytes page aligned: 358.08 MB/s
4096 bytes page aligned: 356.32 MB/s
4096 bytes page aligned: 358.07 MB/s
4096 bytes page aligned: 356.39 MB/s
4096 bytes page aligned: 358.08 MB/s
kernel memcpy (original):
4096 bytes page aligned: 356.76 MB/s
4096 bytes page aligned: 358.47 MB/s
4096 bytes page aligned: 356.76 MB/s
4096 bytes page aligned: 358.47 MB/s
4096 bytes page aligned: 356.86 MB/s
kernel memcpy (optimized):
4096 bytes page aligned: 342.33 MB/s
4096 bytes page aligned: 340.66 MB/s
4096 bytes page aligned: 342.32 MB/s
4096 bytes page aligned: 340.70 MB/s
4096 bytes page aligned: 342.31 MB/s
kernel copy_page (original):
4096 bytes page aligned: 381.93 MB/s
4096 bytes page aligned: 383.87 MB/s
4096 bytes page aligned: 381.97 MB/s
4096 bytes page aligned: 383.86 MB/s
4096 bytes page aligned: 381.98 MB/s
kernel copy_page (optimized):
4096 bytes page aligned: 341.86 MB/s
4096 bytes page aligned: 341.83 MB/s
4096 bytes page aligned: 341.86 MB/s
4096 bytes page aligned: 341.80 MB/s
4096 bytes page aligned: 341.85 MB/s
libc memcpy:
Mixed from 1 to 1023 (power law), unaligned: 484.57 MB/s
Mixed from 1 to 1023 (power law), unaligned: 482.42 MB/s
Mixed from 1 to 1023 (power law), unaligned: 484.45 MB/s
Mixed from 1 to 1023 (power law), unaligned: 482.49 MB/s
Mixed from 1 to 1023 (power law), unaligned: 484.27 MB/s
kernel memcpy (original):
Mixed from 1 to 1023 (power law), unaligned: 503.45 MB/s
Mixed from 1 to 1023 (power law), unaligned: 505.11 MB/s
Mixed from 1 to 1023 (power law), unaligned: 502.65 MB/s
Mixed from 1 to 1023 (power law), unaligned: 505.09 MB/s
Mixed from 1 to 1023 (power law), unaligned: 502.69 MB/s
kernel memcpy (optimized):
Mixed from 1 to 1023 (power law), unaligned: 490.07 MB/s
Mixed from 1 to 1023 (power law), unaligned: 490.26 MB/s
Mixed from 1 to 1023 (power law), unaligned: 486.98 MB/s
Mixed from 1 to 1023 (power law), unaligned: 489.95 MB/s
Mixed from 1 to 1023 (power law), unaligned: 487.95 MB/s
libc memset:
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 844.51 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 840.39 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 844.37 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 840.68 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 844.55 MB/s
kernel memset (original):
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 886.05 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 890.19 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 890.11 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 885.76 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 889.84 MB/s
kernel memset (optimized):
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 930.57 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 934.93 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 930.50 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 934.75 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 930.35 MB/s
kernel memzero (original):
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 860.46 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 860.40 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 860.34 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 860.40 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 856.31 MB/s
kernel memzero (optimized):
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 881.67 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 877.42 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 881.60 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 877.48 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 881.70 MB/s
libc memset:
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 496.66 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 499.04 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 498.98 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 496.62 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 498.96 MB/s
kernel memset (original):
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 551.78 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 554.33 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 551.63 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 554.13 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 551.60 MB/s
kernel memset (optimized):
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 601.07 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 597.87 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 601.06 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 601.08 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 598.38 MB/s
kernel memzero (original):
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 525.40 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 522.99 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 525.42 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 522.74 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 525.28 MB/s
kernel memzero (optimized):
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 556.46 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 559.02 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 559.16 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 559.00 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 559.13 MB/s
libc memset:
4096 bytes page aligned: 2029.13 MB/s
4096 bytes page aligned: 2038.87 MB/s
4096 bytes page aligned: 2029.11 MB/s
4096 bytes page aligned: 2038.82 MB/s
4096 bytes page aligned: 2028.82 MB/s
kernel memset (original):
4096 bytes page aligned: 1918.99 MB/s
4096 bytes page aligned: 1909.79 MB/s
4096 bytes page aligned: 1919.03 MB/s
4096 bytes page aligned: 1918.82 MB/s
4096 bytes page aligned: 1918.96 MB/s
kernel memset (optimized):
4096 bytes page aligned: 1920.02 MB/s
4096 bytes page aligned: 1910.71 MB/s
4096 bytes page aligned: 1920.03 MB/s
4096 bytes page aligned: 1910.58 MB/s
4096 bytes page aligned: 1919.89 MB/s
kernel memzero (original):
4096 bytes page aligned: 1885.37 MB/s
4096 bytes page aligned: 1894.53 MB/s
4096 bytes page aligned: 1885.11 MB/s
4096 bytes page aligned: 1894.52 MB/s
4096 bytes page aligned: 1894.52 MB/s
kernel memzero (optimized):
4096 bytes page aligned: 1895.10 MB/s
4096 bytes page aligned: 1894.72 MB/s
4096 bytes page aligned: 1885.82 MB/s
4096 bytes page aligned: 1895.08 MB/s
4096 bytes page aligned: 1885.86 MB/s
libc memset:
Mixed from 1 to 1023 (power law), unaligned: 737.90 MB/s
Mixed from 1 to 1023 (power law), unaligned: 734.13 MB/s
Mixed from 1 to 1023 (power law), unaligned: 737.61 MB/s
Mixed from 1 to 1023 (power law), unaligned: 734.18 MB/s
Mixed from 1 to 1023 (power law), unaligned: 737.53 MB/s
kernel memset (original):
Mixed from 1 to 1023 (power law), unaligned: 786.00 MB/s
Mixed from 1 to 1023 (power law), unaligned: 786.00 MB/s
Mixed from 1 to 1023 (power law), unaligned: 785.98 MB/s
Mixed from 1 to 1023 (power law), unaligned: 782.09 MB/s
Mixed from 1 to 1023 (power law), unaligned: 785.96 MB/s
kernel memset (optimized):
Mixed from 1 to 1023 (power law), unaligned: 813.68 MB/s
Mixed from 1 to 1023 (power law), unaligned: 817.65 MB/s
Mixed from 1 to 1023 (power law), unaligned: 813.22 MB/s
Mixed from 1 to 1023 (power law), unaligned: 817.10 MB/s
Mixed from 1 to 1023 (power law), unaligned: 813.94 MB/s
kernel memzero (original):
Mixed from 1 to 1023 (power law), unaligned: 746.57 MB/s
Mixed from 1 to 1023 (power law), unaligned: 746.77 MB/s
Mixed from 1 to 1023 (power law), unaligned: 742.82 MB/s
Mixed from 1 to 1023 (power law), unaligned: 746.56 MB/s
Mixed from 1 to 1023 (power law), unaligned: 743.25 MB/s
kernel memzero (optimized):
Mixed from 1 to 1023 (power law), unaligned: 785.01 MB/s
Mixed from 1 to 1023 (power law), unaligned: 781.21 MB/s
Mixed from 1 to 1023 (power law), unaligned: 785.10 MB/s
Mixed from 1 to 1023 (power law), unaligned: 781.19 MB/s
Mixed from 1 to 1023 (power law), unaligned: 784.99 MB/s

-------------- next part --------------

libc memcpy:
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 944.06 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 939.55 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 936.32 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 938.91 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 935.52 MB/s
kernel memcpy (original):
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 921.58 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 918.61 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 915.82 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 915.27 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 911.62 MB/s
kernel memcpy (optimized):
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 908.06 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 905.13 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 907.52 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 906.64 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 907.89 MB/s
libc memcpy:
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 547.23 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 547.29 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 546.17 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 547.24 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 547.50 MB/s
kernel memcpy (original):
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 541.90 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 541.91 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 541.93 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 542.91 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 541.95 MB/s
kernel memcpy (optimized):
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 615.08 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 614.48 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 615.11 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 615.07 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 614.90 MB/s
libc memcpy:
Mixed multiples of 4 from 4 to 130, word aligned: 459.28 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 459.87 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 459.40 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 459.62 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 459.40 MB/s
kernel memcpy (original):
Mixed multiples of 4 from 4 to 130, word aligned: 457.91 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 458.35 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 457.98 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 458.22 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 457.85 MB/s
kernel memcpy (optimized):
Mixed multiples of 4 from 4 to 130, word aligned: 545.62 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 544.90 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 545.52 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 545.42 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 545.54 MB/s
kernel copy_from_user (optimized):
Mixed multiples of 4 from 4 to 130, word aligned: 485.72 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 484.69 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 484.78 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 485.02 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 485.64 MB/s
kernel copy_to_user (optimized):
Mixed multiples of 4 from 4 to 130, word aligned: 489.08 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 491.05 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 492.40 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 493.27 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 491.08 MB/s
libc memcpy:
4096 bytes page aligned: 1027.53 MB/s
4096 bytes page aligned: 1020.33 MB/s
4096 bytes page aligned: 1026.20 MB/s
4096 bytes page aligned: 1025.76 MB/s
4096 bytes page aligned: 1024.70 MB/s
kernel memcpy (original):
4096 bytes page aligned: 1026.80 MB/s
4096 bytes page aligned: 1027.25 MB/s
4096 bytes page aligned: 1026.46 MB/s
4096 bytes page aligned: 1020.09 MB/s
4096 bytes page aligned: 1027.83 MB/s
kernel memcpy (optimized):
4096 bytes page aligned: 841.49 MB/s
4096 bytes page aligned: 847.07 MB/s
4096 bytes page aligned: 840.32 MB/s
4096 bytes page aligned: 847.07 MB/s
4096 bytes page aligned: 841.32 MB/s
kernel copy_page (original):
4096 bytes page aligned: 948.27 MB/s
4096 bytes page aligned: 940.34 MB/s
4096 bytes page aligned: 946.30 MB/s
4096 bytes page aligned: 942.02 MB/s
4096 bytes page aligned: 948.32 MB/s
kernel copy_page (optimized):
4096 bytes page aligned: 850.59 MB/s
4096 bytes page aligned: 857.73 MB/s
4096 bytes page aligned: 851.24 MB/s
4096 bytes page aligned: 858.75 MB/s
4096 bytes page aligned: 851.73 MB/s
libc memcpy:
Mixed from 1 to 1023 (power law), unaligned: 715.47 MB/s
Mixed from 1 to 1023 (power law), unaligned: 714.09 MB/s
Mixed from 1 to 1023 (power law), unaligned: 715.65 MB/s
Mixed from 1 to 1023 (power law), unaligned: 714.83 MB/s
Mixed from 1 to 1023 (power law), unaligned: 712.47 MB/s
kernel memcpy (original):
Mixed from 1 to 1023 (power law), unaligned: 721.70 MB/s
Mixed from 1 to 1023 (power law), unaligned: 719.15 MB/s
Mixed from 1 to 1023 (power law), unaligned: 721.34 MB/s
Mixed from 1 to 1023 (power law), unaligned: 718.81 MB/s
Mixed from 1 to 1023 (power law), unaligned: 721.02 MB/s
kernel memcpy (optimized):
Mixed from 1 to 1023 (power law), unaligned: 635.79 MB/s
Mixed from 1 to 1023 (power law), unaligned: 636.97 MB/s
Mixed from 1 to 1023 (power law), unaligned: 635.52 MB/s
Mixed from 1 to 1023 (power law), unaligned: 636.23 MB/s
Mixed from 1 to 1023 (power law), unaligned: 636.05 MB/s
libc memset:
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1323.49 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1326.82 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1348.12 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1328.57 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1324.56 MB/s
kernel memset (original):
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1786.48 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1782.46 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1776.21 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1745.68 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1771.53 MB/s
kernel memset (optimized):
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1770.77 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1759.21 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1721.21 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1782.98 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1762.74 MB/s
kernel memzero (original):
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1745.20 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1763.23 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1743.48 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1766.37 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1728.34 MB/s
kernel memzero (optimized):
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1682.73 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1660.62 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1695.76 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1703.42 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1766.86 MB/s
libc memset:
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 901.11 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 901.81 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 889.89 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 886.94 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 899.02 MB/s
kernel memset (original):
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1142.87 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1145.74 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1141.91 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1142.41 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1143.23 MB/s
kernel memset (optimized):
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1129.60 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1132.20 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1131.63 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1131.37 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1128.10 MB/s
kernel memzero (original):
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1110.96 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1105.10 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1106.56 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1107.89 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1105.29 MB/s
kernel memzero (optimized):
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1081.12 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1086.37 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1086.06 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1086.13 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1085.48 MB/s
libc memset:
4096 bytes page aligned: 1371.96 MB/s
4096 bytes page aligned: 1362.53 MB/s
4096 bytes page aligned: 1383.10 MB/s
4096 bytes page aligned: 1356.89 MB/s
4096 bytes page aligned: 1367.61 MB/s
kernel memset (original):
4096 bytes page aligned: 1321.56 MB/s
4096 bytes page aligned: 1337.12 MB/s
4096 bytes page aligned: 1318.98 MB/s
4096 bytes page aligned: 1330.80 MB/s
4096 bytes page aligned: 1324.66 MB/s
kernel memset (optimized):
4096 bytes page aligned: 1317.07 MB/s
4096 bytes page aligned: 1305.07 MB/s
4096 bytes page aligned: 1311.78 MB/s
4096 bytes page aligned: 1301.32 MB/s
4096 bytes page aligned: 1305.47 MB/s
kernel memzero (original):
4096 bytes page aligned: 1320.70 MB/s
4096 bytes page aligned: 1317.15 MB/s
4096 bytes page aligned: 1380.78 MB/s
4096 bytes page aligned: 1316.34 MB/s
4096 bytes page aligned: 1363.25 MB/s
kernel memzero (optimized):
4096 bytes page aligned: 1302.89 MB/s
4096 bytes page aligned: 1349.68 MB/s
4096 bytes page aligned: 1305.33 MB/s
4096 bytes page aligned: 1338.91 MB/s
4096 bytes page aligned: 1304.71 MB/s
libc memset:
Mixed from 1 to 1023 (power law), unaligned: 1296.85 MB/s
Mixed from 1 to 1023 (power law), unaligned: 1281.93 MB/s
Mixed from 1 to 1023 (power law), unaligned: 1284.15 MB/s
Mixed from 1 to 1023 (power law), unaligned: 1303.82 MB/s
Mixed from 1 to 1023 (power law), unaligned: 1289.72 MB/s
kernel memset (original):
Mixed from 1 to 1023 (power law), unaligned: 1635.98 MB/s
Mixed from 1 to 1023 (power law), unaligned: 1631.05 MB/s
Mixed from 1 to 1023 (power law), unaligned: 1630.50 MB/s
Mixed from 1 to 1023 (power law), unaligned: 1629.33 MB/s
Mixed from 1 to 1023 (power law), unaligned: 1640.34 MB/s
kernel memset (optimized):
Mixed from 1 to 1023 (power law), unaligned: 1674.27 MB/s
Mixed from 1 to 1023 (power law), unaligned: 1661.84 MB/s
Mixed from 1 to 1023 (power law), unaligned: 1670.77 MB/s
Mixed from 1 to 1023 (power law), unaligned: 1656.26 MB/s
Mixed from 1 to 1023 (power law), unaligned: 1664.30 MB/s
kernel memzero (original):
Mixed from 1 to 1023 (power law), unaligned: 1583.12 MB/s
Mixed from 1 to 1023 (power law), unaligned: 1576.78 MB/s
Mixed from 1 to 1023 (power law), unaligned: 1579.13 MB/s
Mixed from 1 to 1023 (power law), unaligned: 1571.27 MB/s
Mixed from 1 to 1023 (power law), unaligned: 1554.87 MB/s
kernel memzero (optimized):
Mixed from 1 to 1023 (power law), unaligned: 1613.16 MB/s
Mixed from 1 to 1023 (power law), unaligned: 1624.66 MB/s
Mixed from 1 to 1023 (power law), unaligned: 1613.26 MB/s
Mixed from 1 to 1023 (power law), unaligned: 1624.16 MB/s
Mixed from 1 to 1023 (power law), unaligned: 1611.64 MB/s

-------------- next part --------------

libc memcpy:
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 938.28 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 938.13 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 938.22 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 937.87 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 938.26 MB/s
kernel memcpy (original):
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 992.48 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 992.77 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 992.53 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 992.82 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 992.45 MB/s
kernel memcpy (optimized):
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 869.57 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 870.32 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 869.57 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 870.32 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 869.65 MB/s
libc memcpy:
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 506.25 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 506.18 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 506.17 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 506.16 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 506.19 MB/s
kernel memcpy (original):
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 542.36 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 542.08 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 541.74 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 542.09 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 542.71 MB/s
kernel memcpy (optimized):
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 568.31 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 567.96 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 567.96 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 567.81 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 567.88 MB/s
libc memcpy:
Mixed multiples of 4 from 4 to 130, word aligned: 425.27 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 425.41 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 425.29 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 426.54 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 425.58 MB/s
kernel memcpy (original):
Mixed multiples of 4 from 4 to 130, word aligned: 458.17 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 458.13 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 458.73 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 458.32 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 458.95 MB/s
kernel memcpy (optimized):
Mixed multiples of 4 from 4 to 130, word aligned: 503.75 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 503.23 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 503.38 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 502.87 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 503.40 MB/s
kernel copy_from_user (optimized):
Mixed multiples of 4 from 4 to 130, word aligned: 486.47 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 485.02 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 485.65 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 485.20 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 485.11 MB/s
kernel copy_to_user (optimized):
Mixed multiples of 4 from 4 to 130, word aligned: 456.43 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 455.72 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 455.60 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 455.58 MB/s
Mixed multiples of 4 from 4 to 130, word aligned: 456.06 MB/s
libc memcpy:
4096 bytes page aligned: 2733.85 MB/s
4096 bytes page aligned: 2734.82 MB/s
4096 bytes page aligned: 2735.47 MB/s
4096 bytes page aligned: 2733.74 MB/s
4096 bytes page aligned: 2735.10 MB/s
kernel memcpy (original):
4096 bytes page aligned: 2763.15 MB/s
4096 bytes page aligned: 2764.57 MB/s
4096 bytes page aligned: 2762.87 MB/s
4096 bytes page aligned: 2764.31 MB/s
4096 bytes page aligned: 2763.97 MB/s
kernel memcpy (optimized):
4096 bytes page aligned: 2021.61 MB/s
4096 bytes page aligned: 2022.85 MB/s
4096 bytes page aligned: 2021.30 MB/s
4096 bytes page aligned: 2022.75 MB/s
4096 bytes page aligned: 2021.18 MB/s
kernel copy_page (original):
4096 bytes page aligned: 1536.64 MB/s
4096 bytes page aligned: 1536.07 MB/s
4096 bytes page aligned: 1536.62 MB/s
4096 bytes page aligned: 1536.44 MB/s
4096 bytes page aligned: 1536.04 MB/s
kernel copy_page (optimized):
4096 bytes page aligned: 2029.46 MB/s
4096 bytes page aligned: 2028.46 MB/s
4096 bytes page aligned: 2029.26 MB/s
4096 bytes page aligned: 2028.49 MB/s
4096 bytes page aligned: 2029.51 MB/s
libc memcpy:
Mixed from 1 to 1023 (power law), unaligned: 677.42 MB/s
Mixed from 1 to 1023 (power law), unaligned: 677.45 MB/s
Mixed from 1 to 1023 (power law), unaligned: 677.43 MB/s
Mixed from 1 to 1023 (power law), unaligned: 677.49 MB/s
Mixed from 1 to 1023 (power law), unaligned: 677.55 MB/s
kernel memcpy (original):
Mixed from 1 to 1023 (power law), unaligned: 705.91 MB/s
Mixed from 1 to 1023 (power law), unaligned: 705.96 MB/s
Mixed from 1 to 1023 (power law), unaligned: 706.14 MB/s
Mixed from 1 to 1023 (power law), unaligned: 706.18 MB/s
Mixed from 1 to 1023 (power law), unaligned: 706.32 MB/s
kernel memcpy (optimized):
Mixed from 1 to 1023 (power law), unaligned: 671.04 MB/s
Mixed from 1 to 1023 (power law), unaligned: 671.49 MB/s
Mixed from 1 to 1023 (power law), unaligned: 671.19 MB/s
Mixed from 1 to 1023 (power law), unaligned: 671.87 MB/s
Mixed from 1 to 1023 (power law), unaligned: 671.50 MB/s
libc memset:
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1288.97 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1288.99 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1288.74 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1288.95 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1288.51 MB/s
kernel memset (original):
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1698.82 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1695.12 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1695.28 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1699.55 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1698.91 MB/s
kernel memset (optimized):
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1826.35 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1826.33 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1833.66 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1833.25 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1834.97 MB/s
kernel memzero (original):
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1608.61 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1603.63 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1606.36 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1608.51 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1607.49 MB/s
kernel memzero (optimized):
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1654.00 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1653.34 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1653.09 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1647.16 MB/s
Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1653.98 MB/s
libc memset:
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 779.98 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 780.05 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 779.98 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 780.09 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 779.82 MB/s
kernel memset (original):
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 971.07 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 969.65 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 969.63 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 969.63 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 969.45 MB/s
kernel memset (optimized):
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1166.68 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1166.31 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1166.68 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1166.41 MB/s
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1166.45 MB/s kernel memzero (original): Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 915.94 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 915.88 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 916.08 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 915.77 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 915.94 MB/s kernel memzero (optimized): Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 980.79 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 981.17 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 981.46 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 981.44 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 981.17 MB/s libc memset: 4096 bytes page aligned: 2808.48 MB/s 4096 bytes page aligned: 2809.23 MB/s 4096 bytes page aligned: 2809.10 MB/s 4096 bytes page aligned: 2808.32 MB/s 4096 bytes page aligned: 2808.85 MB/s kernel memset (original): 4096 bytes page aligned: 4285.77 MB/s 4096 bytes page aligned: 4286.95 MB/s 4096 bytes page aligned: 4285.80 MB/s 4096 bytes page aligned: 4287.03 MB/s 4096 bytes page aligned: 4286.30 MB/s kernel memset (optimized): 4096 bytes page aligned: 4332.88 MB/s 4096 bytes page aligned: 4333.13 MB/s 4096 bytes page aligned: 4332.22 MB/s 4096 bytes page aligned: 4333.00 MB/s 4096 bytes page aligned: 4331.64 MB/s kernel memzero (original): 4096 bytes page aligned: 4286.68 MB/s 4096 bytes page aligned: 4286.68 MB/s 4096 bytes page aligned: 4286.96 MB/s 4096 bytes page aligned: 4286.31 MB/s 4096 bytes page aligned: 4285.41 MB/s kernel memzero (optimized): 4096 bytes page aligned: 4307.47 MB/s 4096 bytes page aligned: 4306.33 MB/s 4096 bytes page aligned: 4307.97 MB/s 4096 bytes page aligned: 4305.94 MB/s 4096 bytes page aligned: 4307.61 MB/s libc memset: Mixed from 1 to 1023 (power law), 
unaligned: 1150.12 MB/s Mixed from 1 to 1023 (power law), unaligned: 1149.80 MB/s Mixed from 1 to 1023 (power law), unaligned: 1150.06 MB/s Mixed from 1 to 1023 (power law), unaligned: 1149.76 MB/s Mixed from 1 to 1023 (power law), unaligned: 1149.91 MB/s kernel memset (original): Mixed from 1 to 1023 (power law), unaligned: 1482.23 MB/s Mixed from 1 to 1023 (power law), unaligned: 1483.26 MB/s Mixed from 1 to 1023 (power law), unaligned: 1483.42 MB/s Mixed from 1 to 1023 (power law), unaligned: 1482.48 MB/s Mixed from 1 to 1023 (power law), unaligned: 1483.19 MB/s kernel memset (optimized): Mixed from 1 to 1023 (power law), unaligned: 1683.39 MB/s Mixed from 1 to 1023 (power law), unaligned: 1680.19 MB/s Mixed from 1 to 1023 (power law), unaligned: 1681.58 MB/s Mixed from 1 to 1023 (power law), unaligned: 1680.15 MB/s Mixed from 1 to 1023 (power law), unaligned: 1680.06 MB/s kernel memzero (original): Mixed from 1 to 1023 (power law), unaligned: 1357.13 MB/s Mixed from 1 to 1023 (power law), unaligned: 1357.31 MB/s Mixed from 1 to 1023 (power law), unaligned: 1356.41 MB/s Mixed from 1 to 1023 (power law), unaligned: 1357.16 MB/s Mixed from 1 to 1023 (power law), unaligned: 1356.60 MB/s kernel memzero (optimized): Mixed from 1 to 1023 (power law), unaligned: 1469.08 MB/s Mixed from 1 to 1023 (power law), unaligned: 1470.31 MB/s Mixed from 1 to 1023 (power law), unaligned: 1469.47 MB/s Mixed from 1 to 1023 (power law), unaligned: 1468.80 MB/s Mixed from 1 to 1023 (power law), unaligned: 1469.37 MB/s -------------- next part -------------- libc memcpy: Mixed powers of 2 from 4 to 4096 (power law), word aligned: 869.54 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 869.27 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 869.78 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 869.52 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 869.50 MB/s kernel memcpy (original): Mixed powers of 2 from 4 to 4096 
(power law), word aligned: 954.22 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 954.17 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 954.16 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 954.08 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 954.19 MB/s kernel memcpy (optimized): Mixed powers of 2 from 4 to 4096 (power law), word aligned: 852.17 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 852.53 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 852.37 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 852.44 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 852.45 MB/s libc memcpy: Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 455.51 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 457.69 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 455.01 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 455.30 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 455.68 MB/s kernel memcpy (original): Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 512.36 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 512.02 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 512.47 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 512.47 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 512.66 MB/s kernel memcpy (optimized): Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 538.32 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 537.83 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 538.36 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 538.29 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 539.25 MB/s libc memcpy: Mixed multiples of 4 from 4 to 130, word aligned: 392.90 MB/s Mixed multiples of 
4 from 4 to 130, word aligned: 388.25 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 388.67 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 392.51 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 392.09 MB/s kernel memcpy (original): Mixed multiples of 4 from 4 to 130, word aligned: 433.21 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 433.73 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 433.34 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 433.91 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 433.43 MB/s kernel memcpy (optimized): Mixed multiples of 4 from 4 to 130, word aligned: 474.10 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 474.06 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 474.29 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 474.10 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 473.95 MB/s kernel copy_from_user (optimized): Mixed multiples of 4 from 4 to 130, word aligned: 455.22 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 455.10 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 454.55 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 454.71 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 454.86 MB/s kernel copy_to_user (optimized): Mixed multiples of 4 from 4 to 130, word aligned: 429.08 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 429.08 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 429.42 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 429.12 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 429.59 MB/s libc memcpy: 4096 bytes page aligned: 2698.97 MB/s 4096 bytes page aligned: 2703.85 MB/s 4096 bytes page aligned: 2706.42 MB/s 4096 bytes page aligned: 2701.26 MB/s 4096 bytes page aligned: 2699.65 MB/s kernel memcpy (original): 4096 bytes page aligned: 2735.92 MB/s 4096 bytes page aligned: 2735.76 MB/s 4096 bytes page aligned: 2739.53 MB/s 4096 bytes page aligned: 2737.95 MB/s 4096 bytes page aligned: 
2735.23 MB/s kernel memcpy (optimized): 4096 bytes page aligned: 2016.76 MB/s 4096 bytes page aligned: 2015.85 MB/s 4096 bytes page aligned: 2016.87 MB/s 4096 bytes page aligned: 2015.99 MB/s 4096 bytes page aligned: 2018.49 MB/s kernel copy_page (original): 4096 bytes page aligned: 1533.05 MB/s 4096 bytes page aligned: 1533.36 MB/s 4096 bytes page aligned: 1533.81 MB/s 4096 bytes page aligned: 1533.62 MB/s 4096 bytes page aligned: 1533.05 MB/s kernel copy_page (optimized): 4096 bytes page aligned: 2016.48 MB/s 4096 bytes page aligned: 2019.79 MB/s 4096 bytes page aligned: 2016.49 MB/s 4096 bytes page aligned: 2017.68 MB/s 4096 bytes page aligned: 2018.23 MB/s libc memcpy: Mixed from 1 to 1023 (power law), unaligned: 640.12 MB/s Mixed from 1 to 1023 (power law), unaligned: 640.23 MB/s Mixed from 1 to 1023 (power law), unaligned: 640.13 MB/s Mixed from 1 to 1023 (power law), unaligned: 640.34 MB/s Mixed from 1 to 1023 (power law), unaligned: 640.36 MB/s kernel memcpy (original): Mixed from 1 to 1023 (power law), unaligned: 681.11 MB/s Mixed from 1 to 1023 (power law), unaligned: 680.79 MB/s Mixed from 1 to 1023 (power law), unaligned: 681.19 MB/s Mixed from 1 to 1023 (power law), unaligned: 680.93 MB/s Mixed from 1 to 1023 (power law), unaligned: 681.05 MB/s kernel memcpy (optimized): Mixed from 1 to 1023 (power law), unaligned: 645.50 MB/s Mixed from 1 to 1023 (power law), unaligned: 644.98 MB/s Mixed from 1 to 1023 (power law), unaligned: 645.10 MB/s Mixed from 1 to 1023 (power law), unaligned: 644.91 MB/s Mixed from 1 to 1023 (power law), unaligned: 645.03 MB/s libc memset: Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1246.47 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1246.77 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1246.49 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1246.87 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1246.58 MB/s kernel memset (original): Mixed 
powers of 2 from 4 to 4096 (power law), word aligned: 1609.02 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1612.50 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1612.66 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1614.68 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1609.93 MB/s kernel memset (optimized): Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1744.85 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1747.18 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1748.65 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1745.03 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1745.42 MB/s kernel memzero (original): Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1509.51 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1510.41 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1509.70 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1508.00 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1508.73 MB/s kernel memzero (optimized): Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1615.44 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1617.76 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1612.05 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1616.54 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 1610.91 MB/s libc memset: Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 735.51 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 735.65 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 735.62 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 735.75 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 735.83 MB/s kernel memset (original): Mixed multiples of 4 from 4 to 1024 (power law), 
word aligned: 884.22 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 884.39 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 884.11 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 885.90 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 884.09 MB/s kernel memset (optimized): Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1025.79 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1025.70 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1025.98 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1025.56 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 1025.59 MB/s kernel memzero (original): Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 831.09 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 830.34 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 830.77 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 830.50 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 830.64 MB/s kernel memzero (optimized): Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 919.83 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 920.16 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 919.50 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 919.75 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 920.02 MB/s libc memset: 4096 bytes page aligned: 2789.85 MB/s 4096 bytes page aligned: 2790.47 MB/s 4096 bytes page aligned: 2789.64 MB/s 4096 bytes page aligned: 2790.60 MB/s 4096 bytes page aligned: 2789.42 MB/s kernel memset (original): 4096 bytes page aligned: 4292.31 MB/s 4096 bytes page aligned: 4292.19 MB/s 4096 bytes page aligned: 4291.39 MB/s 4096 bytes page aligned: 4291.91 MB/s 4096 bytes page aligned: 4291.29 MB/s kernel memset (optimized): 4096 bytes page 
aligned: 4321.51 MB/s 4096 bytes page aligned: 4319.98 MB/s 4096 bytes page aligned: 4321.53 MB/s 4096 bytes page aligned: 4319.93 MB/s 4096 bytes page aligned: 4321.46 MB/s kernel memzero (original): 4096 bytes page aligned: 4243.19 MB/s 4096 bytes page aligned: 4242.35 MB/s 4096 bytes page aligned: 4243.32 MB/s 4096 bytes page aligned: 4242.29 MB/s 4096 bytes page aligned: 4243.34 MB/s kernel memzero (optimized): 4096 bytes page aligned: 4261.67 MB/s 4096 bytes page aligned: 4262.59 MB/s 4096 bytes page aligned: 4262.13 MB/s 4096 bytes page aligned: 4262.75 MB/s 4096 bytes page aligned: 4262.62 MB/s libc memset: Mixed from 1 to 1023 (power law), unaligned: 1084.53 MB/s Mixed from 1 to 1023 (power law), unaligned: 1084.89 MB/s Mixed from 1 to 1023 (power law), unaligned: 1084.61 MB/s Mixed from 1 to 1023 (power law), unaligned: 1084.71 MB/s Mixed from 1 to 1023 (power law), unaligned: 1084.43 MB/s kernel memset (original): Mixed from 1 to 1023 (power law), unaligned: 1364.45 MB/s Mixed from 1 to 1023 (power law), unaligned: 1363.67 MB/s Mixed from 1 to 1023 (power law), unaligned: 1364.87 MB/s Mixed from 1 to 1023 (power law), unaligned: 1364.47 MB/s Mixed from 1 to 1023 (power law), unaligned: 1364.17 MB/s kernel memset (optimized): Mixed from 1 to 1023 (power law), unaligned: 1508.02 MB/s Mixed from 1 to 1023 (power law), unaligned: 1510.44 MB/s Mixed from 1 to 1023 (power law), unaligned: 1508.57 MB/s Mixed from 1 to 1023 (power law), unaligned: 1508.86 MB/s Mixed from 1 to 1023 (power law), unaligned: 1510.14 MB/s kernel memzero (original): Mixed from 1 to 1023 (power law), unaligned: 1261.52 MB/s Mixed from 1 to 1023 (power law), unaligned: 1261.24 MB/s Mixed from 1 to 1023 (power law), unaligned: 1262.57 MB/s Mixed from 1 to 1023 (power law), unaligned: 1260.26 MB/s Mixed from 1 to 1023 (power law), unaligned: 1261.35 MB/s kernel memzero (optimized): Mixed from 1 to 1023 (power law), unaligned: 1412.76 MB/s Mixed from 1 to 1023 (power law), unaligned: 
1412.17 MB/s Mixed from 1 to 1023 (power law), unaligned: 1413.32 MB/s Mixed from 1 to 1023 (power law), unaligned: 1412.77 MB/s Mixed from 1 to 1023 (power law), unaligned: 1413.13 MB/s ^ permalink raw reply [flat|nested] 18+ messages in thread
* Call for testing/opinions: Optimized memset/memcpy
  2013-07-13 17:24 ` Willy Tarreau
@ 2013-07-13 21:51   ` Harm Hanemaaijer
  2013-07-14  6:13     ` Willy Tarreau
  0 siblings, 1 reply; 18+ messages in thread
From: Harm Hanemaaijer @ 2013-07-13 21:51 UTC (permalink / raw)
  To: linux-arm-kernel

Willy Tarreau <w <at> 1wt.eu> writes:

> OK I've run bench.script on the following platforms :

Thanks, that's incredibly helpful!

Note that Thumb2 mode usually doesn't do much in synthetic benchmarks,
because the benchmark code will fit into the L1 instruction cache; the
benefit of Thumb2 happens in real-world usage when the active code
footprint becomes larger.

To summarize, memset seems to be in good shape and also the "fast path"
for common word-aligned memcpy of size <= 256 seems to be working well.

However, the copy_page and memcpy results for larger sizes seem to suggest
that the prefetch strategy isn't working well on these platforms. Note also
that on the quad core the existing copy_page is also highly sub-optimal.

Fixing the preload strategy for these platforms may simply be a case of
changing the configurable constant PREFETCH_DISTANCE from 3 to 2 (from an
offset of 192 bytes to 128 bytes), which more closely mimics the original
kernel memcpy. I have added PREFETCH_DISTANCE as a configurable parameter
in the Makefile in the latest version of test-arm-kernel-memcpy. It will
be interesting to see the results of testing with a PREFETCH_DISTANCE
of 2 especially on the quad-core platform or a similar one.

^ permalink raw reply	[flat|nested] 18+ messages in thread
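For readers unfamiliar with the prefetch-distance knob discussed in the message above, the idea can be sketched in plain C. This is only an illustration and not the code from the patch set, which is written in ARM assembler. The 64-byte unit is an assumption inferred from the figures quoted in the message (3 units = 192 bytes, 2 units = 128 bytes), and the names PREFETCH_UNIT, prefetch_offset_bytes and copy_with_prefetch are hypothetical. `__builtin_prefetch` is the GCC/Clang builtin, which typically maps to a PLD instruction on ARM.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Prefetch distance in units, overridable at build time, e.g.
 * cc -DPREFETCH_DISTANCE=2 ... (the default of 3 follows the message). */
#ifndef PREFETCH_DISTANCE
#define PREFETCH_DISTANCE 3
#endif

/* Assumed unit size: 192 / 3 = 128 / 2 = 64 bytes. */
#define PREFETCH_UNIT 64

/* Byte offset ahead of the current read position that gets preloaded. */
static size_t prefetch_offset_bytes(void)
{
    return (size_t)PREFETCH_DISTANCE * PREFETCH_UNIT;
}

/* Copy n bytes in 64-byte chunks, hinting the cache about the source
 * data that will be needed PREFETCH_DISTANCE chunks from now. */
static void copy_with_prefetch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    const size_t ahead = prefetch_offset_bytes();

    assert(dst != NULL && src != NULL);

    while (n >= PREFETCH_UNIT) {
        /* Only issue the hint while the target is still inside the
         * source buffer; read access, low temporal locality. */
        if (n > ahead)
            __builtin_prefetch(s + ahead, 0, 0);
        memcpy(d, s, PREFETCH_UNIT);
        d += PREFETCH_UNIT;
        s += PREFETCH_UNIT;
        n -= PREFETCH_UNIT;
    }
    if (n > 0)
        memcpy(d, s, n); /* tail shorter than one unit */
}
```

A smaller distance issues the hint closer to the point of use, so whether 2 or 3 is better depends on memory latency and how fast the core consumes each chunk, which is why the value is left as a build-time parameter.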
* Call for testing/opinions: Optimized memset/memcpy
  2013-07-13 21:51   ` Harm Hanemaaijer
@ 2013-07-14  6:13     ` Willy Tarreau
  2013-07-14 11:00       ` Harm Hanemaaijer
  0 siblings, 1 reply; 18+ messages in thread
From: Willy Tarreau @ 2013-07-14  6:13 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Sat, Jul 13, 2013 at 09:51:18PM +0000, Harm Hanemaaijer wrote:
> Willy Tarreau <w <at> 1wt.eu> writes:
>
> > OK I've run bench.script on the following platforms :
>
> Thanks, that's incredibly helpful!
>
> Note that Thumb2 mode usually doesn't do much in synthetic benchmarks,
> because the benchmark code will fit into the L1 instruction cache; the
> benefit of Thumb2 happens in real-world usage when the active code
> footprint becomes larger.
>
> To summarize, memset seems to be in good shape and also the "fast path"
> for common word-aligned memcpy of size <= 256 seems to be working well.
>
> However, the copy_page and memcpy results for larger sizes seem to suggest
> that the prefetch strategy isn't working well on these platforms. Note also
> that on the quad core the existing copy_page is also highly sub-optimal.
>
> Fixing the preload strategy for these platforms may simply be a case of
> changing the configurable constant PREFETCH_DISTANCE from 3 to 2 (from an
> offset of 192 bytes to 128 bytes), which more closely mimics the original
> kernel memcpy. I have added PREFETCH_DISTANCE as a configurable parameter
> in the Makefile in the latest version of test-arm-kernel-memcpy. It will
> be interesting to see the results of testing with a PREFETCH_DISTANCE
> of 2 especially on the quad-core platform or a similar one.

No problem, I ran it on the two in armv7+thumb mode again. Please find
the results attached. It seems that memcpy improved by 0.8% though that's
not even certain.

Regards, Willy -------------- next part -------------- libc memcpy: Mixed powers of 2 from 4 to 4096 (power law), word aligned: 870.97 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 870.98 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 870.96 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 870.88 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 870.63 MB/s kernel memcpy (original): Mixed powers of 2 from 4 to 4096 (power law), word aligned: 955.68 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 955.36 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 955.71 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 955.41 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 955.66 MB/s kernel memcpy (optimized): Mixed powers of 2 from 4 to 4096 (power law), word aligned: 850.25 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 850.26 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 850.16 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 849.91 MB/s Mixed powers of 2 from 4 to 4096 (power law), word aligned: 850.27 MB/s libc memcpy: Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 454.00 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 457.50 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 453.22 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 456.13 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 454.23 MB/s kernel memcpy (original): Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 508.77 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 508.95 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 509.26 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 509.19 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 509.46 MB/s kernel 
memcpy (optimized): Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 523.20 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 523.22 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 523.31 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 523.09 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 523.62 MB/s libc memcpy: Mixed multiples of 4 from 4 to 130, word aligned: 389.04 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 388.08 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 387.82 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 387.74 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 387.92 MB/s kernel memcpy (original): Mixed multiples of 4 from 4 to 130, word aligned: 429.52 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 430.19 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 430.10 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 430.02 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 429.45 MB/s kernel memcpy (optimized): Mixed multiples of 4 from 4 to 130, word aligned: 473.75 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 474.00 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 473.59 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 473.24 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 473.65 MB/s kernel copy_from_user (optimized): Mixed multiples of 4 from 4 to 130, word aligned: 452.37 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 452.11 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 452.91 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 451.84 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 452.71 MB/s kernel copy_to_user (optimized): Mixed multiples of 4 from 4 to 130, word aligned: 427.17 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 427.11 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 426.57 MB/s Mixed multiples of 4 from 4 to 130, word 
aligned: 426.67 MB/s Mixed multiples of 4 from 4 to 130, word aligned: 427.11 MB/s libc memcpy: 4096 bytes page aligned: 2703.64 MB/s 4096 bytes page aligned: 2702.35 MB/s 4096 bytes page aligned: 2705.23 MB/s 4096 bytes page aligned: 2702.31 MB/s 4096 bytes page aligned: 2703.18 MB/s kernel memcpy (original): 4096 bytes page aligned: 2735.75 MB/s 4096 bytes page aligned: 2736.98 MB/s 4096 bytes page aligned: 2739.54 MB/s 4096 bytes page aligned: 2736.56 MB/s 4096 bytes page aligned: 2735.81 MB/s kernel memcpy (optimized): 4096 bytes page aligned: 2019.77 MB/s 4096 bytes page aligned: 2019.01 MB/s 4096 bytes page aligned: 2019.78 MB/s 4096 bytes page aligned: 2019.88 MB/s 4096 bytes page aligned: 2018.68 MB/s kernel copy_page (original): 4096 bytes page aligned: 1533.13 MB/s 4096 bytes page aligned: 1532.51 MB/s 4096 bytes page aligned: 1534.12 MB/s 4096 bytes page aligned: 1532.53 MB/s 4096 bytes page aligned: 1533.16 MB/s kernel copy_page (optimized): 4096 bytes page aligned: 2012.66 MB/s 4096 bytes page aligned: 2013.76 MB/s 4096 bytes page aligned: 2013.53 MB/s 4096 bytes page aligned: 2013.34 MB/s 4096 bytes page aligned: 2013.62 MB/s libc memcpy: Mixed from 1 to 1023 (power law), unaligned: 641.26 MB/s Mixed from 1 to 1023 (power law), unaligned: 641.16 MB/s Mixed from 1 to 1023 (power law), unaligned: 640.95 MB/s Mixed from 1 to 1023 (power law), unaligned: 641.30 MB/s Mixed from 1 to 1023 (power law), unaligned: 640.65 MB/s kernel memcpy (original): Mixed from 1 to 1023 (power law), unaligned: 677.55 MB/s Mixed from 1 to 1023 (power law), unaligned: 677.50 MB/s Mixed from 1 to 1023 (power law), unaligned: 677.51 MB/s Mixed from 1 to 1023 (power law), unaligned: 677.09 MB/s Mixed from 1 to 1023 (power law), unaligned: 676.69 MB/s kernel memcpy (optimized): Mixed from 1 to 1023 (power law), unaligned: 660.80 MB/s Mixed from 1 to 1023 (power law), unaligned: 660.89 MB/s Mixed from 1 to 1023 (power law), unaligned: 660.50 MB/s Mixed from 1 to 1023 (power law), 
unaligned: 660.72 MB/s
Mixed from 1 to 1023 (power law), unaligned: 661.12 MB/s

Mixed powers of 2 from 4 to 4096 (power law), word aligned, MB/s over five runs:
libc memset:                        1241.64  1242.02  1241.66  1241.32  1241.57
kernel memset (original):           1603.86  1608.36  1605.22  1606.88  1606.02
kernel memset (optimized):          1733.22  1729.46  1737.01  1734.14  1733.59
kernel memzero (original):          1509.90  1507.44  1508.64  1508.11  1505.42
kernel memzero (optimized):         1616.59  1616.74  1617.85  1613.74  1621.71

Mixed multiples of 4 from 4 to 1024 (power law), word aligned, MB/s over five runs:
libc memset:                         742.55   742.68   742.64   742.52   742.60
kernel memset (original):            893.16   893.35   893.18   893.45   893.39
kernel memset (optimized):          1028.50  1028.49  1028.30  1028.37  1028.22
kernel memzero (original):           839.00   838.75   839.01   838.93   838.96
kernel memzero (optimized):          930.07   930.04   930.11   930.09   930.08

4096 bytes page aligned, MB/s over five runs:
libc memset:                        2787.64  2788.50  2788.44  2788.39  2788.18
kernel memset (original):           4285.78  4286.76  4285.85  4286.59  4285.58
kernel memset (optimized):          4314.98  4314.69  4314.15  4314.67  4313.65
kernel memzero (original):          4242.90  4241.60  4242.77  4241.56  4243.05
kernel memzero (optimized):         4265.52  4264.31  4265.14  4264.22  4265.74

Mixed from 1 to 1023 (power law), unaligned, MB/s over five runs:
libc memset:                        1083.33  1083.76  1083.22  1083.63  1083.44
kernel memset (original):           1361.29  1362.14  1361.44  1362.91  1361.52
kernel memset (optimized):          1511.68  1511.65  1512.21  1512.55  1512.37
kernel memzero (original):          1259.19  1259.69  1260.27  1259.07  1260.15
kernel memzero (optimized):         1410.53  1410.31  1410.48  1408.95  1412.63

-------------- next part --------------

Mixed powers of 2 from 4 to 4096 (power law), word aligned, MB/s over five runs:
libc memcpy:                         944.18   943.83   944.12   943.90   944.20
kernel memcpy (original):            999.62   999.90   999.98   999.64  1000.03
kernel memcpy (optimized):           869.93   870.49   870.24   870.35   870.49

Mixed multiples of 4 from 4 to 1024 (power law), word aligned, MB/s over five runs:
libc memcpy:                         505.38   505.22   505.65   505.57   505.54
kernel memcpy (original):            541.06   541.00   540.94   541.01   541.03
kernel memcpy (optimized):           549.25   549.45   549.94   549.20   549.48

Mixed multiples of 4 from 4 to 130, word aligned, MB/s over five runs:
libc memcpy:                         425.16   425.82   425.51   425.70   425.59
kernel memcpy (original):            458.28   458.62   459.25   458.18   459.43
kernel memcpy (optimized):           501.98   502.06   501.65   502.31   502.14
kernel copy_from_user (optimized):   484.64   484.08   483.97   485.09   485.96
kernel copy_to_user (optimized):     455.69   455.98   455.98   455.97   457.07

4096 bytes page aligned, MB/s over five runs:
libc memcpy:                        2739.85  2738.74  2739.70  2738.93  2739.83
kernel memcpy (original):           2770.15  2772.07  2771.84  2770.57  2771.75
kernel memcpy (optimized):          2016.25  2017.41  2017.92  2019.81  2016.19
kernel copy_page (original):        1537.52  1537.46  1536.99  1537.60  1536.97
kernel copy_page (optimized):       2032.28  2031.33  2032.23  2032.35  2031.26

Mixed from 1 to 1023 (power law), unaligned, MB/s over five runs:
libc memcpy:                         678.17   677.84   678.13   678.03   678.14
kernel memcpy (original):            706.55   706.16   706.71   706.09   706.90
kernel memcpy (optimized):           691.01   691.40   691.07   691.55   691.35

Mixed powers of 2 from 4 to 4096 (power law), word aligned, MB/s over five runs:
libc memset:                        1279.54  1280.04  1279.75  1279.82  1279.46
kernel memset (original):           1700.89  1699.79  1699.45  1699.46  1699.12
kernel memset (optimized):          1859.00  1855.05  1857.88  1858.97  1855.57
kernel memzero (original):          1603.50  1603.51  1602.76  1603.89  1604.60
kernel memzero (optimized):         1653.52  1652.73  1654.63  1652.44  1654.76

Mixed multiples of 4 from 4 to 1024 (power law), word aligned, MB/s over five runs:
libc memset:                         777.78   777.85   777.78   777.86   777.86
kernel memset (original):            966.31   966.26   966.17   966.31   966.12
kernel memset (optimized):          1161.60  1161.58  1161.33  1161.54  1161.27
kernel memzero (original):           912.78   912.68   912.72   912.83   912.75
kernel memzero (optimized):          978.47   978.58   978.63   978.51   977.65

4096 bytes page aligned, MB/s over five runs:
libc memset:                        2809.19  2809.15  2809.19  2808.39  2809.20
kernel memset (original):           4286.67  4287.73  4287.69  4287.50  4287.77
kernel memset (optimized):          4332.86  4333.92  4332.87  4333.86  4332.81
kernel memzero (original):          4286.77  4286.73  4285.68  4286.65  4285.85
kernel memzero (optimized):         4308.08  4307.07  4308.18  4307.95  4306.85

Mixed from 1 to 1023 (power law), unaligned, MB/s over five runs:
libc memset:                        1156.13  1156.08  1156.25  1156.23  1156.31
kernel memset (original):           1491.20  1491.11  1491.80  1491.44  1491.66
kernel memset (optimized):          1690.43  1691.03  1693.37  1691.31  1691.96
kernel memzero (original):          1364.67  1365.10  1364.98  1365.15  1365.25
kernel memzero (optimized):         1475.90  1476.30  1476.07  1476.49  1476.28

-------------- next part --------------

Mixed powers of 2 from 4 to 4096 (power law), word aligned, MB/s over five runs:
libc memcpy:                         652.61   649.67   652.72   649.61   652.57
kernel memcpy (original):            673.87   677.13   677.32   677.41   677.17
kernel memcpy (optimized):           662.60   663.56   659.15   664.26   659.52

Mixed multiples of 4 from 4 to 1024 (power law), word aligned, MB/s over five runs:
libc memcpy:                         364.58   364.71   362.93   364.58   363.00
kernel memcpy (original):            382.17   380.45   382.24   380.23   382.24
kernel memcpy (optimized):           424.01   421.91   423.94   421.65   423.90

Mixed multiples of 4 from 4 to 130, word aligned, MB/s over five runs:
libc memcpy:                         311.50   312.98   311.42   312.96   312.97
kernel memcpy (original):            327.64   329.20   327.67   329.21   327.65
kernel memcpy (optimized):           367.15   365.31   367.18   367.12   365.37
kernel copy_from_user (optimized):   365.11   363.52   365.17   363.37   365.18
kernel copy_to_user (optimized):     368.24   368.29   368.23   366.48   368.24

4096 bytes page aligned, MB/s over five runs:
libc memcpy:                         358.42   360.12   358.39   360.09   358.45
kernel memcpy (original):            360.40   358.72   360.39   358.79   360.46
kernel memcpy (optimized):           342.08   343.69   341.96   343.70   342.10
kernel copy_page (original):         386.91   385.04   386.90   385.13   386.90
kernel copy_page (optimized):        341.49   343.25   343.26   343.20   343.12

Mixed from 1 to 1023 (power law), unaligned, MB/s over five runs:
libc memcpy:                         514.14   515.74   514.14   515.79   514.18
kernel memcpy (original):            540.90   537.63   539.82   540.33   537.00
kernel memcpy (optimized):           540.31   537.17   540.38   539.03   542.41

Mixed powers of 2 from 4 to 4096 (power law), word aligned, MB/s over five runs:
libc memset:                         881.70   881.68   881.56   877.40   881.52
kernel memset (original):            954.65   958.99   954.36   959.20   958.94
kernel memset (optimized):           999.30  1004.01   999.36  1004.03   999.32
kernel memzero (original):           925.38   925.25   920.83   925.23   920.99
kernel memzero (optimized):          933.68   929.32   933.83   933.73   933.68

Mixed multiples of 4 from 4 to 1024 (power law), word aligned, MB/s over five runs:
libc memset:                         521.29   518.76   521.32   518.80   521.31
kernel memset (original):            588.12   590.97   591.00   588.13   590.94
kernel memset (optimized):           645.02   648.18   645.16   648.13   648.04
kernel memzero (original):           569.18   569.19   566.41   569.04   566.44
kernel memzero (optimized):          587.84   585.04   587.75   587.79   585.07

4096 bytes page aligned, MB/s over five runs:
libc memset:                        2052.96  2042.84  2052.52  2043.01  2052.58
kernel memset (original):           1912.63  1922.23  1921.84  1912.60  1921.86
kernel memset (optimized):          1892.39  1901.32  1892.51  1901.22  1901.58
kernel memzero (original):          1920.75  1920.38  1911.56  1920.81  1911.45
kernel memzero (optimized):         1928.78  1919.76  1928.75  1929.09  1919.61

Mixed from 1 to 1023 (power law), unaligned, MB/s over five runs:
libc memset:                         785.51   781.66   785.54   781.71   785.41
kernel memset (original):            816.79   820.37   820.29   817.25   820.35
kernel memset (optimized):           880.18   884.47   880.03   884.15   884.00
kernel memzero (original):           797.30   800.99   797.06   800.49   797.08
kernel memzero (optimized):          813.62   813.55   813.41   813.81   809.52

-------------- next part --------------

Mixed powers of 2 from 4 to 4096 (power law), word aligned, MB/s over five runs:
libc memcpy:                         628.06   623.94   626.71   623.43   627.13
kernel memcpy (original):            657.41   661.00   660.91   659.46   661.87
kernel memcpy (optimized):           657.37   661.33   659.10   662.16   658.66

Mixed multiples of 4 from 4 to 1024 (power law), word aligned, MB/s over five runs:
libc memcpy:                         332.21   330.70   332.24   332.27   330.55
kernel memcpy (original):            363.62   361.89   363.65   361.77   363.54
kernel memcpy (optimized):           397.26   399.06   397.13   399.11   399.11

Mixed multiples of 4 from 4 to 130, word aligned, MB/s over five runs:
libc memcpy:                         292.31   292.31   290.92   292.26   290.86
kernel memcpy (original):            311.41   309.88   311.35   309.86   311.41
kernel memcpy (optimized):           343.87   343.89   343.85   342.24   343.91
kernel copy_from_user (optimized):   336.13   337.70   336.16   337.76   336.12
kernel copy_to_user (optimized):     336.24   334.60   336.29   336.30   336.28

4096 bytes page aligned, MB/s over five runs:
libc memcpy:                         350.93   350.87   350.86   349.12   350.82
kernel memcpy (original):            349.41   351.20   349.45   351.11   349.44
kernel memcpy (optimized):           335.77   334.08   335.69   334.18   335.80
kernel copy_page (original):         376.23   377.99   376.22   378.12   376.26
kernel copy_page (optimized):        335.23   333.74   335.35   333.73   335.24

Mixed from 1 to 1023 (power law), unaligned, MB/s over five runs:
libc memcpy:                         491.15   494.03   491.42   493.73   493.67
kernel memcpy (original):            511.36   511.31   508.09   510.07   508.48
kernel memcpy (optimized):           504.81   502.20   504.56   502.11   504.76

Mixed powers of 2 from 4 to 4096 (power law), word aligned, MB/s over five runs:
libc memset:                         848.27   848.05   848.22   844.06   848.15
kernel memset (original):            904.37   908.54   904.19   908.48   903.71
kernel memset (optimized):           950.89   951.03   946.37   950.95   946.38
kernel memzero (original):           861.66   857.97   861.77   857.91   861.79
kernel memzero (optimized):          895.24   895.20   895.13   890.91   895.07

libc memset:
Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 501.37 MB/s
Mixed multiples of 4 from 4 to
1024 (power law), word aligned: 503.81 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 501.35 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 503.73 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 501.30 MB/s kernel memset (original): Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 569.17 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 569.17 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 569.07 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 569.06 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 566.40 MB/s kernel memset (optimized): Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 621.23 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 618.26 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 621.15 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 618.15 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 621.22 MB/s kernel memzero (original): Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 535.10 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 537.69 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 537.67 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 535.13 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 537.73 MB/s kernel memzero (optimized): Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 566.99 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 569.74 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 567.10 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 569.83 MB/s Mixed multiples of 4 from 4 to 1024 (power law), word aligned: 567.03 MB/s libc memset: 4096 bytes page aligned: 2041.83 MB/s 4096 bytes page aligned: 2032.34 MB/s 4096 bytes page aligned: 2042.07 
MB/s 4096 bytes page aligned: 2042.09 MB/s 4096 bytes page aligned: 2031.88 MB/s kernel memset (original): 4096 bytes page aligned: 1922.09 MB/s 4096 bytes page aligned: 1912.70 MB/s 4096 bytes page aligned: 1922.13 MB/s 4096 bytes page aligned: 1912.52 MB/s 4096 bytes page aligned: 1921.78 MB/s kernel memset (optimized): 4096 bytes page aligned: 1913.71 MB/s 4096 bytes page aligned: 1923.03 MB/s 4096 bytes page aligned: 1913.67 MB/s 4096 bytes page aligned: 1922.56 MB/s 4096 bytes page aligned: 1923.01 MB/s kernel memzero (original): 4096 bytes page aligned: 1888.00 MB/s 4096 bytes page aligned: 1897.21 MB/s 4096 bytes page aligned: 1887.74 MB/s 4096 bytes page aligned: 1896.99 MB/s 4096 bytes page aligned: 1887.97 MB/s kernel memzero (optimized): 4096 bytes page aligned: 1898.35 MB/s 4096 bytes page aligned: 1888.97 MB/s 4096 bytes page aligned: 1897.97 MB/s 4096 bytes page aligned: 1889.20 MB/s 4096 bytes page aligned: 1898.33 MB/s libc memset: Mixed from 1 to 1023 (power law), unaligned: 735.51 MB/s Mixed from 1 to 1023 (power law), unaligned: 732.16 MB/s Mixed from 1 to 1023 (power law), unaligned: 735.44 MB/s Mixed from 1 to 1023 (power law), unaligned: 731.94 MB/s Mixed from 1 to 1023 (power law), unaligned: 735.37 MB/s kernel memset (original): Mixed from 1 to 1023 (power law), unaligned: 782.22 MB/s Mixed from 1 to 1023 (power law), unaligned: 785.91 MB/s Mixed from 1 to 1023 (power law), unaligned: 782.22 MB/s Mixed from 1 to 1023 (power law), unaligned: 785.91 MB/s Mixed from 1 to 1023 (power law), unaligned: 785.99 MB/s kernel memset (optimized): Mixed from 1 to 1023 (power law), unaligned: 818.63 MB/s Mixed from 1 to 1023 (power law), unaligned: 818.80 MB/s Mixed from 1 to 1023 (power law), unaligned: 815.12 MB/s Mixed from 1 to 1023 (power law), unaligned: 818.64 MB/s Mixed from 1 to 1023 (power law), unaligned: 814.92 MB/s kernel memzero (original): Mixed from 1 to 1023 (power law), unaligned: 748.04 MB/s Mixed from 1 to 1023 (power law), unaligned: 
745.01 MB/s Mixed from 1 to 1023 (power law), unaligned: 748.67 MB/s Mixed from 1 to 1023 (power law), unaligned: 744.85 MB/s Mixed from 1 to 1023 (power law), unaligned: 748.90 MB/s kernel memzero (optimized): Mixed from 1 to 1023 (power law), unaligned: 784.81 MB/s Mixed from 1 to 1023 (power law), unaligned: 781.09 MB/s Mixed from 1 to 1023 (power law), unaligned: 784.40 MB/s Mixed from 1 to 1023 (power law), unaligned: 780.62 MB/s Mixed from 1 to 1023 (power law), unaligned: 784.59 MB/s ^ permalink raw reply [flat|nested] 18+ messages in thread
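One way to summarize the repeated runs above is to average each group of five and express the optimized result relative to the original. The following is only a small sketch of that arithmetic; the helper names `mean_mbps` and `percent_change` are made up for illustration and are not part of the benchmark utility. The two arrays are the copy_page figures copied from the results above.

```c
#include <stddef.h>

/* Arithmetic mean of n throughput samples (MB/s). */
static double mean_mbps(const double *runs, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += runs[i];
    return sum / (double)n;
}

/* Relative change of the optimized result versus the original, in percent.
 * Negative values mean the optimized version was slower. */
static double percent_change(double original, double optimized)
{
    return (optimized - original) / original * 100.0;
}

/* copy_page, 4096 bytes page aligned, taken from the results above. */
static const double copy_page_orig[] = { 376.23, 377.99, 376.22, 378.12, 376.26 };
static const double copy_page_opt[]  = { 335.23, 333.74, 335.35, 333.73, 335.24 };
```

Applied to these two arrays, the means come out at about 377.0 and 334.7 MB/s, so the optimized copy_page is roughly 11% slower in this particular run. The same two helpers can be applied to any other pair of groups.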
* Call for testing/opinions: Optimized memset/memcpy
  2013-07-14 6:13 ` Willy Tarreau
@ 2013-07-14 11:00 ` Harm Hanemaaijer
  2013-07-14 13:09 ` Russell King - ARM Linux
  2013-07-14 15:21 ` Siarhei Siamashka
  0 siblings, 2 replies; 18+ messages in thread
From: Harm Hanemaaijer @ 2013-07-14 11:00 UTC (permalink / raw)
To: linux-arm-kernel

Willy Tarreau <w <at> 1wt.eu> writes:

> Please find the results attached. It seems that memcpy improved by 0.8%
> though that's not even certain.

What is interesting is that
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0388f/Caccifbd.html,
and several other sources (such as other optimized memcpy implementations)
document the cache line size of the Cortex A9 as 32 bytes, which is an
anomaly in the armv7 family. However, it looks like the kernel is defining
L1_CACHE_BYTES as 64 (L1_CACHE_SHIFT == 6) for all armv7 platforms, which
looks like a serious configuring error for Cortex A9.

This explains why the large size memcpy results that you posted are not
optimal, and also explains the below-par copy_page performance in the
current kernel implementation, because copy_page uses L1_CACHE_BYTES to
determine the preload strategy, while the current memcpy doesn't (it is
hardcoded for L1_CACHE_BYTES of 32).

This merits further investigation, and there might potentially be other
kernel issues for Cortex A9 (including performance) related to this.

To confirm, does running 'zcat /proc/config.gz| grep L1_CACHE_SHIFT' on a
Cortex A9 show CONFIG_ARM_L1_CACHE_SHIFT defined as 6?
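The preload mechanism mentioned in the message above can be illustrated with a rough userspace sketch, assuming GCC or Clang for `__builtin_prefetch`. This is not the kernel's copy_page; `copy_page_sketch`, `PAGE_SZ`, `line` and `prefetch_dist` are hypothetical names chosen for the example. It only shows how a preload stride derived from an assumed line size decides which addresses receive a preload hint.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4096u  /* assumed page size for this sketch */

/*
 * Copy one page in chunks of `line` bytes, issuing one prefetch hint per
 * chunk, `prefetch_dist` bytes ahead of the current read position.
 * `line` is assumed to divide PAGE_SZ evenly.
 */
static void copy_page_sketch(void *dst, const void *src,
                             size_t line, size_t prefetch_dist)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t off = 0; off < PAGE_SZ; off += line) {
        /* Only hint addresses that are still inside the source page. */
        if (off + prefetch_dist < PAGE_SZ)
            __builtin_prefetch(s + off + prefetch_dist, 0, 0);
        memcpy(d + off, s + off, line);
    }
}
```

With `line` set to 64 on a CPU whose lines are 32 bytes, only every other hardware line would receive a hint. Whether that costs performance depends on the hardware, which this sketch does not model.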
* Call for testing/opinions: Optimized memset/memcpy
  2013-07-14 11:00 ` Harm Hanemaaijer
@ 2013-07-14 13:09 ` Russell King - ARM Linux
  2013-07-14 13:59 ` Harm Hanemaaijer
  2013-07-14 15:21 ` Siarhei Siamashka
  1 sibling, 1 reply; 18+ messages in thread
From: Russell King - ARM Linux @ 2013-07-14 13:09 UTC (permalink / raw)
To: linux-arm-kernel

On Sun, Jul 14, 2013 at 11:00:50AM +0000, Harm Hanemaaijer wrote:
> What is interesting is that
> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0388f/Caccifbd.html,
> and several other sources (such as other
> optimized memcpy implementations) document the cache line size of the Cortex
> A9 as 32 bytes, which is an anomaly in the armv7 family. However, it looks
> like the kernel is defining L1_CACHE_BYTES as 64 (L1_CACHE_SHIFT == 6) for
> all armv7 platforms, which looks like a serious configuring error for Cortex
> A9.

You're making wrong assumptions about what L1_CACHE_BYTES is.

Firstly, L1_CACHE_BYTES is not dynamic - it's a build time constant. You
have to make a decision what value it is to be set to when you build the
kernel. This is because it gets used to determine the alignment of
structures built into the kernel image, amongst other things, and we can't
dynamically relink the kernel at boot time.

So please, get out of your mind any idea that L1_CACHE_BYTES somehow
relates to the exact CPU you're running on. It doesn't. What it relates to
is the *maximum* cache line size of *any* CPU that we will run on.

Take a moment to think about why given the above. If you're booting on a
32 byte cache line CPU, will a structure aligned for a 64 byte cache line
also be aligned for a 32-byte cache line? How about the reverse case?

Now, there are various ARMv7 Cortex CPUs that have 64 byte cache lines out
there in the wild - Cortex A8 and Cortex A15 are two examples, both of them
are ARMv7 CPUs. As we can't distinguish at run time between these, and we
are working for a single zImage kernel, we have to assume that ARMv7 means
a 64 byte cache line as far as the L1_CACHE_* constants are concerned.

Yes, we used to set it for OMAP3 and some Samsung SoC too, but then others
came along and single zImage too - and that all makes trying to reduce it
down to the minimum rather pointless.

So, no, this is *not* a "serious configuring error" at all. It is totally
intended to be this way.
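The alignment question posed above can be checked with a few lines of C11. This is only an illustrative userspace sketch; `is_aligned` and `struct hot_data` are hypothetical names, and the value 64 stands in for a build-time maximum line size.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of align; align must be a power of two. */
static bool is_aligned(uintptr_t addr, uintptr_t align)
{
    return (addr & (align - 1)) == 0;
}

/* Hypothetical structure aligned for the largest line size we expect
 * to run on (64 bytes in this sketch). */
struct hot_data {
    _Alignas(64) unsigned char bytes[64];
};
```

Any address that is a multiple of 64 is also a multiple of 32, so an object of type `struct hot_data` starts on a line boundary on both kinds of CPU. The reverse does not hold: `is_aligned(32, 32)` is true, while `is_aligned(32, 64)` is false, so aligning for the smaller line size would let a structure straddle a line on a 64-byte-line CPU.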
* Call for testing/opinions: Optimized memset/memcpy
  2013-07-14 13:09 ` Russell King - ARM Linux
@ 2013-07-14 13:59 ` Harm Hanemaaijer
  0 siblings, 0 replies; 18+ messages in thread
From: Harm Hanemaaijer @ 2013-07-14 13:59 UTC (permalink / raw)
To: linux-arm-kernel

Russell King - ARM Linux <linux <at> arm.linux.org.uk> writes:

> You're making wrong assumptions about what L1_CACHE_BYTES is.

Thanks for the clarification. I have been focused too much on the concept
of a kernel image customized for a single device. I can see how having to
support multiple platforms with a single kernel image makes things more
difficult, especially when trying to optimize for something. I will have
to think about how to manage this when trying to optimize memcpy-related
functions.
* Call for testing/opinions: Optimized memset/memcpy
  2013-07-14 11:00 ` Harm Hanemaaijer
  2013-07-14 13:09 ` Russell King - ARM Linux
@ 2013-07-14 15:21 ` Siarhei Siamashka
  1 sibling, 0 replies; 18+ messages in thread
From: Siarhei Siamashka @ 2013-07-14 15:21 UTC (permalink / raw)
To: linux-arm-kernel

On Sun, 14 Jul 2013 11:00:50 +0000 (UTC)
Harm Hanemaaijer <fgenfb@yahoo.com> wrote:

> Willy Tarreau <w <at> 1wt.eu> writes:
>
> > Please find the results attached. It seems that memcpy improved by 0.8%
> > though that's not even certain.
>
> What is interesting is that
> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0388f/Caccifbd.html,
> and several other sources (such as other
> optimized memcpy implementations) document the cache line size of the Cortex
> A9 as 32 bytes, which is an anomaly in the armv7 family.

Yes, the cache line size is 32 bytes in Cortex-A9. However, in order to
mitigate poor memory bandwidth utilization, the L2 cache controller
implements a 'double linefill' feature:
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0246h/CHDHIECI.html

But 'double linefill' only first appeared in the r3p0 revision of the
L2C-310 L2 cache controller (also known as PL310) and was a bit buggy in
the revisions older than r3p2 according to the errata list:
http://infocenter.arm.com/help/topic/com.arm.doc.uan0013b/index.html

This makes double linefill usable only in modern Cortex-A9 based SoCs such
as Exynos4412, but unfortunately not in the older Cortex-A9 based systems.

When double linefill is enabled, two cache lines are allocated at once in
L2, so for memcpy-like workloads it looks somewhat similar to a real
64 byte cache line size.

Welcome to the diverse world of ARM hardware :)

--
Best regards,
Siarhei Siamashka
end of thread, other threads:[~2013-07-15 13:15 UTC | newest]

Thread overview: 18+ messages
2013-07-13 15:51 Call for testing/opinions: Optimized memset/memcpy Harm Hanemaaijer
2013-07-13 16:48 ` Dr. David Alan Gilbert
2013-07-13 21:13 ` Harm Hanemaaijer
2013-07-15 13:15 ` Catalin Marinas
2013-07-14 11:19 ` Harm Hanemaaijer
2013-07-14 11:32 ` Dr. David Alan Gilbert
2013-07-14 11:37 ` Ard Biesheuvel
2013-07-14 13:13 ` Russell King - ARM Linux
2013-07-14 13:33 ` Harm Hanemaaijer
2013-07-14 14:09 ` Ard Biesheuvel
2013-07-14 14:32 ` Russell King - ARM Linux
2013-07-13 17:24 ` Willy Tarreau
2013-07-13 21:51 ` Harm Hanemaaijer
2013-07-14 6:13 ` Willy Tarreau
2013-07-14 11:00 ` Harm Hanemaaijer
2013-07-14 13:09 ` Russell King - ARM Linux
2013-07-14 13:59 ` Harm Hanemaaijer
2013-07-14 15:21 ` Siarhei Siamashka