* [PATCH] x86/cpufeatures: Add feature flag for fast short rep movsb
From: Tony Luck @ 2019-12-12 21:49 UTC
To: Thomas Gleixner; +Cc: Tony Luck, x86, linux-kernel

From the Intel Optimization Reference Manual:

3.7.6.1 Fast Short REP MOVSB
Beginning with processors based on Ice Lake Client microarchitecture,
REP MOVSB performance of short operations is enhanced. The enhancement
applies to string lengths between 1 and 128 bytes long. Support for
fast-short REP MOVSB is enumerated by the CPUID feature flag: CPUID
[EAX=7H, ECX=0H).EDX.FAST_SHORT_REP_MOVSB[bit 4] = 1. There is no change
in the REP STOS performance.

Add an X86_FEATURE_FSRM flag for this.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---

Net effect of this patch is just to make "fsrm" appear in the
flags section of /proc/cpuinfo. Maybe someone can look into whether
we should make copy routines that use "rep movsb" check for this
flag to optimize copies on older CPUs that don't have it?

 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 0652d3eed9bd..ab441b15d582 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -356,6 +356,7 @@
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (EDX), word 18 */
 #define X86_FEATURE_AVX512_4VNNIW      (18*32+ 2) /* AVX-512 Neural Network Instructions */
 #define X86_FEATURE_AVX512_4FMAPS      (18*32+ 3) /* AVX-512 Multiply Accumulation Single precision */
+#define X86_FEATURE_FSRM               (18*32+ 4) /* Fast Short Rep Mov */
 #define X86_FEATURE_AVX512_VP2INTERSECT (18*32+ 8) /* AVX-512 Intersect for D/Q */
 #define X86_FEATURE_MD_CLEAR           (18*32+10) /* VERW clears CPU buffers */
 #define X86_FEATURE_TSX_FORCE_ABORT    (18*32+13) /* "" TSX_FORCE_ABORT */
--
2.20.1
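As an aside for readers who want to probe the bit themselves: the CPUID
enumeration quoted above (leaf 7, subleaf 0, EDX bit 4) can be checked from
userspace with the compiler's <cpuid.h> helpers. This is a minimal
illustration, not part of the patch, and assumes a GCC/Clang toolchain that
provides __get_cpuid_count():

/* Sketch: report whether the CPU advertises FSRM via CPUID.(EAX=7,ECX=0):EDX[4]. */
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
        unsigned int eax, ebx, ecx, edx;

        /* __get_cpuid_count() returns 0 if CPUID leaf 7 is not available. */
        if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
                puts("CPUID leaf 7 not supported");
                return 1;
        }

        printf("fsrm: %s\n", (edx & (1u << 4)) ? "yes" : "no");
        return 0;
}

On a kernel carrying this patch the same answer shows up as the "fsrm" entry
in /proc/cpuinfo.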
* Re: [PATCH] x86/cpufeatures: Add feature flag for fast short rep movsb
From: Borislav Petkov @ 2019-12-12 22:52 UTC
To: Tony Luck; +Cc: Thomas Gleixner, x86, linux-kernel

On Thu, Dec 12, 2019 at 01:49:08PM -0800, Tony Luck wrote:
> From the Intel Optimization Reference Manual:
>
> 3.7.6.1 Fast Short REP MOVSB
> Beginning with processors based on Ice Lake Client microarchitecture,
> REP MOVSB performance of short operations is enhanced. The enhancement
> applies to string lengths between 1 and 128 bytes long. Support for
> fast-short REP MOVSB is enumerated by the CPUID feature flag: CPUID
> [EAX=7H, ECX=0H).EDX.FAST_SHORT_REP_MOVSB[bit 4] = 1. There is no change
> in the REP STOS performance.
>
> Add an X86_FEATURE_FSRM flag for this.
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>
> Net effect of this patch is just to make "fsrm" appear in the
> flags section of /proc/cpuinfo. Maybe someone can look into whether
> we should make copy routines that use "rep movsb" check for this
> flag to optimize copies on older CPUs that don't have it?

We can then add the feature flag too. Just showing it in /proc/cpuinfo
without any users is kinda pointless...

--
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
* [PATCH] x86/cpufeatures: Add support for fast short rep mov
From: Tony Luck @ 2019-12-16 21:42 UTC
To: Borislav Petkov; +Cc: Tony Luck, Thomas Gleixner, x86, linux-kernel

From the Intel Optimization Reference Manual:

3.7.6.1 Fast Short REP MOVSB
Beginning with processors based on Ice Lake Client microarchitecture,
REP MOVSB performance of short operations is enhanced. The enhancement
applies to string lengths between 1 and 128 bytes long. Support for
fast-short REP MOVSB is enumerated by the CPUID feature flag: CPUID
[EAX=7H, ECX=0H).EDX.FAST_SHORT_REP_MOVSB[bit 4] = 1. There is no change
in the REP STOS performance.

Add an X86_FEATURE_FSRM flag for this.

memmove() avoids REP MOVSB for short (< 32 byte) copies. Fix it
to check FSRM and use REP MOVSB for short copies on systems that
support it.

Signed-off-by: Tony Luck <tony.luck@intel.com>

---

Time (cycles) for memmove() sizes 1..31 with neither source nor
destination in cache.

  [ ASCII plot: cycles (y axis, 0-1800) vs. copy size in bytes (x axis,
    0-35). 'memmove-orig' (####) runs at roughly 800-1600 cycles over
    these sizes; 'memmove-fsrm' (****) runs at roughly 400-600 cycles. ]

---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/lib/memmove_64.S          | 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index e9b62498fe75..98c60fa31ced 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -357,6 +357,7 @@
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (EDX), word 18 */
 #define X86_FEATURE_AVX512_4VNNIW      (18*32+ 2) /* AVX-512 Neural Network Instructions */
 #define X86_FEATURE_AVX512_4FMAPS      (18*32+ 3) /* AVX-512 Multiply Accumulation Single precision */
+#define X86_FEATURE_FSRM               (18*32+ 4) /* Fast Short Rep Mov */
 #define X86_FEATURE_AVX512_VP2INTERSECT (18*32+ 8) /* AVX-512 Intersect for D/Q */
 #define X86_FEATURE_MD_CLEAR           (18*32+10) /* VERW clears CPU buffers */
 #define X86_FEATURE_TSX_FORCE_ABORT    (18*32+13) /* "" TSX_FORCE_ABORT */
diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
index 337830d7a59c..4a23086806e6 100644
--- a/arch/x86/lib/memmove_64.S
+++ b/arch/x86/lib/memmove_64.S
@@ -29,10 +29,7 @@
 SYM_FUNC_START_ALIAS(memmove)
 SYM_FUNC_START(__memmove)
 
-	/* Handle more 32 bytes in loop */
 	mov %rdi, %rax
-	cmp $0x20, %rdx
-	jb 1f
 
 	/* Decide forward/backward copy mode */
 	cmp %rdi, %rsi
@@ -43,6 +40,7 @@ SYM_FUNC_START(__memmove)
 	jg 2f
 
 .Lmemmove_begin_forward:
+	ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM
 	ALTERNATIVE "", "movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS
 
 /*
@@ -114,6 +112,8 @@ SYM_FUNC_START(__memmove)
  */
 	.p2align 4
 2:
+	cmp $0x20, %rdx
+	jb 1f
 	cmp $680, %rdx
 	jb 6f
 	cmp %dil, %sil
--
2.20.1
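For readers unfamiliar with the kernel's ALTERNATIVE macro used in the hunk
above: at boot, apply_alternatives() replaces the first argument (the default
instruction sequence) with the second when the running CPU reports the named
feature, padding with NOPs where the replacement is shorter. Roughly, the pair
of ALTERNATIVE lines at .Lmemmove_begin_forward (the new FSRM one plus the
existing ERMS one) resolve to the following variants. This is an illustrative
sketch of the patched-in code paths, not generated output:

# Neither ERMS nor FSRM: keep the length check, no REP MOVSB fast path.
        cmp     $0x20, %rdx             # fewer than 32 bytes?
        jb      1f                      # yes -> short tail copy
        # (NOPs where the ERMS "movq %rdx, %rcx; rep movsb; retq" would go)

# ERMS but not FSRM: REP MOVSB only for copies of 32 bytes or more.
        cmp     $0x20, %rdx
        jb      1f
        movq    %rdx, %rcx
        rep movsb
        retq

# FSRM (which implies ERMS): no length check, REP MOVSB for every length.
        # (NOPs where the cmp/jb used to be)
        movq    %rdx, %rcx
        rep movsb
        retq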
* Re: [PATCH] x86/cpufeatures: Add support for fast short rep mov
From: Borislav Petkov @ 2020-01-07 18:40 UTC
To: Tony Luck; +Cc: Thomas Gleixner, x86, linux-kernel

On Mon, Dec 16, 2019 at 01:42:54PM -0800, Tony Luck wrote:
> From the Intel Optimization Reference Manual:
>
> 3.7.6.1 Fast Short REP MOVSB
> Beginning with processors based on Ice Lake Client microarchitecture,
> REP MOVSB performance of short operations is enhanced. The enhancement
> applies to string lengths between 1 and 128 bytes long. Support for
> fast-short REP MOVSB is enumerated by the CPUID feature flag: CPUID
> [EAX=7H, ECX=0H).EDX.FAST_SHORT_REP_MOVSB[bit 4] = 1. There is no change
> in the REP STOS performance.
>
> Add an X86_FEATURE_FSRM flag for this.
>
> memmove() avoids REP MOVSB for short (< 32 byte) copies. Fix it
> to check FSRM and use REP MOVSB for short copies on systems that
> support it.
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
>
> ---
>
> Time (cycles) for memmove() sizes 1..31 with neither source nor
> destination in cache.
>
> [ ASCII plot: cycles (y axis, 0-1800) vs. copy size in bytes (x axis,
>   0-35). 'memmove-orig' (####) runs at roughly 800-1600 cycles over
>   these sizes; 'memmove-fsrm' (****) runs at roughly 400-600 cycles. ]

I don't mind this graph being part of the commit message - it shows
nicely the speedup even if with some microbenchmark. Or you're not
adding it just because it is a microbenchmark and not something more
representative?

> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/lib/memmove_64.S          | 6 +++---
> 2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index e9b62498fe75..98c60fa31ced 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -357,6 +357,7 @@
>  /* Intel-defined CPU features, CPUID level 0x00000007:0 (EDX), word 18 */
>  #define X86_FEATURE_AVX512_4VNNIW      (18*32+ 2) /* AVX-512 Neural Network Instructions */
>  #define X86_FEATURE_AVX512_4FMAPS      (18*32+ 3) /* AVX-512 Multiply Accumulation Single precision */
> +#define X86_FEATURE_FSRM               (18*32+ 4) /* Fast Short Rep Mov */
>  #define X86_FEATURE_AVX512_VP2INTERSECT (18*32+ 8) /* AVX-512 Intersect for D/Q */
>  #define X86_FEATURE_MD_CLEAR           (18*32+10) /* VERW clears CPU buffers */
>  #define X86_FEATURE_TSX_FORCE_ABORT    (18*32+13) /* "" TSX_FORCE_ABORT */
> diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
> index 337830d7a59c..4a23086806e6 100644
> --- a/arch/x86/lib/memmove_64.S
> +++ b/arch/x86/lib/memmove_64.S
> @@ -29,10 +29,7 @@
>  SYM_FUNC_START_ALIAS(memmove)
>  SYM_FUNC_START(__memmove)
>
> -	/* Handle more 32 bytes in loop */
>  	mov %rdi, %rax
> -	cmp $0x20, %rdx
> -	jb 1f
>
>  	/* Decide forward/backward copy mode */
>  	cmp %rdi, %rsi
> @@ -43,6 +40,7 @@ SYM_FUNC_START(__memmove)
>  	jg 2f
>
>  .Lmemmove_begin_forward:
> +	ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM

So the enhancement is for string lengths up to two cachelines. Why
are you limiting this to 32 bytes?

I know, the function handles 32-bytes at a time but what I'd imagine
here is having the fastest variant upfront which does REP; MOVSB for all
lengths since FSRM means fast short strings and ERMS - and I'm strongly
assuming here FSRM *implies* ERMS - means fast "longer" strings, so to
speak, so FSRM would mean fast *all length* strings in the end, no?

Also, does the copy direction influence the FSRM's REP; MOVSB variant's
performance? If not, you can do something like this:

SYM_FUNC_START_ALIAS(memmove)
SYM_FUNC_START(__memmove)

	mov %rdi, %rax

	/* FSRM handles all possible string lengths and directions optimally. */
	ALTERNATIVE "", "movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_FSRM

	cmp $0x20, %rdx
	jb 1f

	...

Or?

--
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
* Re: [PATCH] x86/cpufeatures: Add support for fast short rep mov
From: Luck, Tony @ 2020-01-07 22:36 UTC
To: Borislav Petkov; +Cc: Thomas Gleixner, x86, linux-kernel

On Tue, Jan 07, 2020 at 07:40:03PM +0100, Borislav Petkov wrote:
> I don't mind this graph being part of the commit message - it shows
> nicely the speedup even if with some microbenchmark. Or you're not
> adding it just because it is a microbenchmark and not something more
> representative?

I'm not sure it should be archived forever in the commit message.
The benchmark was run on A-step silicon, so may not be representative
of production results.

> > .Lmemmove_begin_forward:
> > +	ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM
>
> So the enhancement is for string lengths up to two cachelines. Why
> are you limiting this to 32 bytes?
>
> I know, the function handles 32-bytes at a time but what I'd imagine
> here is having the fastest variant upfront which does REP; MOVSB for all
> lengths since FSRM means fast short strings and ERMS - and I'm strongly
> assuming here FSRM *implies* ERMS - means fast "longer" strings, so to
> speak, so FSRM would mean fast *all length* strings in the end, no?
>
> Also, does the copy direction influence the FSRM's REP; MOVSB variant's
> performance? If not, you can do something like this:

Yes FSRM implies ERMS

You can't use REP MOVS for overlapping src/dst strings (not even with
the fancy newer, faster, shinier FSRM version). So your suggestion will
not work.

The old memmove code looked something like:

	if (len < 32)
		copy tail (backwards ... 8/4/2/1 bytes. works for both overlap & non-overlap case)
		return
	else if overlap src/dst
		copy backwards 32-byte unrolled
		copy tail
		return
	else if (ERMS)
		REP MOVS; return
	else
		unrolled copy 32-byte
		copy tail

The new one with my changes looks something like:

	if (! overlap src/dst)
		if (FSRM)
			rep movs
			return
		if (len < 32)
			copy tail
			return
		if (ERMS)
			rep movs
			return
		unrolled copy
	else
		if (len < 32)
			copy tail
			return
		copy backwards 32-byte unrolled
		copy tail

-Tony
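Rendered as C, the new dispatch order Tony describes looks roughly like the
sketch below. It is an illustration only: have_fsrm/have_erms, rep_movsb()
and the byte loops are hypothetical stand-ins for the CPUID-derived feature
flags and the assembly paths in memmove_64.S, not the kernel's actual code.

/* Sketch of the reworked memmove() dispatch order described above. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static bool have_fsrm = true;   /* would come from CPUID at boot */
static bool have_erms = true;   /* FSRM implies ERMS */

static void rep_movsb(unsigned char *d, const unsigned char *s, size_t n)
{
        while (n--)             /* byte-loop stand-in for REP MOVSB */
                *d++ = *s++;
}

static void *memmove_sketch(void *dst, const void *src, size_t len)
{
        unsigned char *d = dst;
        const unsigned char *s = src;
        /* Forward copy is only unsafe when src < dst and the ranges overlap. */
        bool backward = (s < d) && (s + len > d);

        if (!backward) {
                if (have_fsrm) {                /* fast for every length */
                        rep_movsb(d, s, len);
                        return dst;
                }
                if (len >= 32 && have_erms) {   /* fast for longer copies only */
                        rep_movsb(d, s, len);
                        return dst;
                }
                while (len--)                   /* tail / unrolled forward path */
                        *d++ = *s++;
                return dst;
        }

        /* Overlapping src < dst: copy backwards so nothing is clobbered. */
        d += len;
        s += len;
        while (len--)
                *--d = *--s;
        return dst;
}

int main(void)
{
        char buf[] = "abcdefgh";

        memmove_sketch(buf + 2, buf, 6);        /* overlapping move */
        printf("%s\n", buf);                    /* prints "ababcdef" */
        return 0;
}

The key point is the one Tony makes above: the REP MOVSB paths are only
reachable on the non-overlapping (forward-copy) side of the branch.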
* Re: [PATCH] x86/cpufeatures: Add support for fast short rep mov
From: Borislav Petkov @ 2020-01-08 10:30 UTC
To: Luck, Tony; +Cc: Thomas Gleixner, x86, linux-kernel

On Tue, Jan 07, 2020 at 02:36:06PM -0800, Luck, Tony wrote:
> Yes FSRM implies ERMS

Ok, I've added this comment ontop so that it is clear what's going on
there:

	/* FSRM implies ERMS => no length checks, do the copy directly */

--
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
* [tip: x86/asm] x86/cpufeatures: Add support for fast short REP; MOVSB
From: tip-bot2 for Tony Luck @ 2020-01-08 10:38 UTC
To: linux-tip-commits; +Cc: Tony Luck, Borislav Petkov, x86, LKML

The following commit has been merged into the x86/asm branch of tip:

Commit-ID:     f444a5ff95dce07cf4353cbb85fc3e785019d430
Gitweb:        https://git.kernel.org/tip/f444a5ff95dce07cf4353cbb85fc3e785019d430
Author:        Tony Luck <tony.luck@intel.com>
AuthorDate:    Mon, 16 Dec 2019 13:42:54 -08:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Wed, 08 Jan 2020 11:29:25 +01:00

x86/cpufeatures: Add support for fast short REP; MOVSB

From the Intel Optimization Reference Manual:

3.7.6.1 Fast Short REP MOVSB
Beginning with processors based on Ice Lake Client microarchitecture,
REP MOVSB performance of short operations is enhanced. The enhancement
applies to string lengths between 1 and 128 bytes long. Support for
fast-short REP MOVSB is enumerated by the CPUID feature flag: CPUID
[EAX=7H, ECX=0H).EDX.FAST_SHORT_REP_MOVSB[bit 4] = 1. There is no change
in the REP STOS performance.

Add an X86_FEATURE_FSRM flag for this.

memmove() avoids REP MOVSB for short (< 32 byte) copies. Check FSRM and
use REP MOVSB for short copies on systems that support it.

 [ bp: Massage and add comment. ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191216214254.26492-1-tony.luck@intel.com
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/lib/memmove_64.S          | 7 ++++---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index e9b6249..98c60fa 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -357,6 +357,7 @@
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (EDX), word 18 */
 #define X86_FEATURE_AVX512_4VNNIW      (18*32+ 2) /* AVX-512 Neural Network Instructions */
 #define X86_FEATURE_AVX512_4FMAPS      (18*32+ 3) /* AVX-512 Multiply Accumulation Single precision */
+#define X86_FEATURE_FSRM               (18*32+ 4) /* Fast Short Rep Mov */
 #define X86_FEATURE_AVX512_VP2INTERSECT (18*32+ 8) /* AVX-512 Intersect for D/Q */
 #define X86_FEATURE_MD_CLEAR           (18*32+10) /* VERW clears CPU buffers */
 #define X86_FEATURE_TSX_FORCE_ABORT    (18*32+13) /* "" TSX_FORCE_ABORT */
diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
index 337830d..7ff00ea 100644
--- a/arch/x86/lib/memmove_64.S
+++ b/arch/x86/lib/memmove_64.S
@@ -29,10 +29,7 @@
 SYM_FUNC_START_ALIAS(memmove)
 SYM_FUNC_START(__memmove)
 
-	/* Handle more 32 bytes in loop */
 	mov %rdi, %rax
-	cmp $0x20, %rdx
-	jb 1f
 
 	/* Decide forward/backward copy mode */
 	cmp %rdi, %rsi
@@ -42,7 +39,9 @@ SYM_FUNC_START(__memmove)
 	cmp %rdi, %r8
 	jg 2f
 
+	/* FSRM implies ERMS => no length checks, do the copy directly */
 .Lmemmove_begin_forward:
+	ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM
 	ALTERNATIVE "", "movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS
 
 /*
@@ -114,6 +113,8 @@ SYM_FUNC_START(__memmove)
  */
 	.p2align 4
 2:
+	cmp $0x20, %rdx
+	jb 1f
 	cmp $680, %rdx
 	jb 6f
 	cmp %dil, %sil
* Re: [tip: x86/asm] x86/cpufeatures: Add support for fast short REP; MOVSB
From: Ingo Molnar @ 2020-01-08 11:54 UTC
To: linux-kernel
Cc: linux-tip-commits, Tony Luck, Borislav Petkov, x86, Thomas Gleixner, Peter Zijlstra

* tip-bot2 for Tony Luck <tip-bot2@linutronix.de> wrote:

> The following commit has been merged into the x86/asm branch of tip:
>
> Commit-ID:     f444a5ff95dce07cf4353cbb85fc3e785019d430
> Gitweb:        https://git.kernel.org/tip/f444a5ff95dce07cf4353cbb85fc3e785019d430
> Author:        Tony Luck <tony.luck@intel.com>
> AuthorDate:    Mon, 16 Dec 2019 13:42:54 -08:00
> Committer:     Borislav Petkov <bp@suse.de>
> CommitterDate: Wed, 08 Jan 2020 11:29:25 +01:00
>
> x86/cpufeatures: Add support for fast short REP; MOVSB
>
> From the Intel Optimization Reference Manual:
>
> 3.7.6.1 Fast Short REP MOVSB
> Beginning with processors based on Ice Lake Client microarchitecture,
> REP MOVSB performance of short operations is enhanced. The enhancement
> applies to string lengths between 1 and 128 bytes long. Support for
> fast-short REP MOVSB is enumerated by the CPUID feature flag: CPUID
> [EAX=7H, ECX=0H).EDX.FAST_SHORT_REP_MOVSB[bit 4] = 1. There is no change
> in the REP STOS performance.
>
> Add an X86_FEATURE_FSRM flag for this.
>
> memmove() avoids REP MOVSB for short (< 32 byte) copies. Check FSRM and
> use REP MOVSB for short copies on systems that support it.
>
> [ bp: Massage and add comment. ]
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Borislav Petkov <bp@suse.de>
> Link: https://lkml.kernel.org/r/20191216214254.26492-1-tony.luck@intel.com

BTW., just for the record, the 32-bit version of memmove() has a similar
cut-off as well, at 680 bytes (!):

	/*
	 * movs instruction have many startup latency
	 * so we handle small size by general register.
	 */
	"cmp  $680, %0\n\t"
	"jb 3f\n\t"

	...

	/*
	 * Start to prepare for backward copy.
	 */
	".p2align 4\n\t"
	"2:\n\t"
	"cmp  $680, %0\n\t"
	"jb 5f\n\t"

This logic was introduced in 2010 via:

  3b4b682becdf: ("x86, mem: Optimize memmove for small size and unaligned cases")

However because those patches came without actual performance
measurements, I'd be inclined to switch back to the old REP MOVSB
version - which would also automatically improve it should anyone run
32-bit kernels on the very latest CPUs.

Thanks,

	Ingo
Thread overview (8 messages):

2019-12-12 21:49 [PATCH] x86/cpufeatures: Add feature flag for fast short rep movsb Tony Luck
2019-12-12 22:52 ` Borislav Petkov
2019-12-16 21:42   ` [PATCH] x86/cpufeatures: Add support for fast short rep mov Tony Luck
2020-01-07 18:40     ` Borislav Petkov
2020-01-07 22:36       ` Luck, Tony
2020-01-08 10:30         ` Borislav Petkov
2020-01-08 10:38     ` [tip: x86/asm] x86/cpufeatures: Add support for fast short REP; MOVSB tip-bot2 for Tony Luck
2020-01-08 11:54       ` Ingo Molnar