* [PATCH v3 0/2] Fix BTT data corruptions after crash
@ 2016-02-11 21:24 Toshi Kani
  2016-02-11 21:24 ` [PATCH v3 1/2] x86/lib/copy_user_64.S: cleanup __copy_user_nocache() Toshi Kani
  2016-02-11 21:24 ` [PATCH v3 2/2] x86/lib/copy_user_64.S: Handle 4-byte nocache copy Toshi Kani
  0 siblings, 2 replies; 7+ messages in thread
From: Toshi Kani @ 2016-02-11 21:24 UTC (permalink / raw)
  To: tglx, mingo, hpa, bp, dan.j.williams
  Cc: ross.zwisler, vishal.l.verma, micah.parrish, brian.boylston, x86,
	linux-nvdimm, linux-kernel

Data corruption issues were observed in tests which initiated
a system crash/reset while accessing BTT devices.  This problem
is reproducible.

The BTT driver calls pmem_rw_bytes() to update data in pmem
devices.  This interface calls __copy_user_nocache(), which
uses non-temporal stores so that the stores to pmem are
persistent.

__copy_user_nocache() uses non-temporal stores when a request
size is 8 bytes or larger (and is aligned by 8 bytes).  The
BTT driver updates the BTT map table, whose entry size is
4 bytes.  Therefore, updates to the map table entries remain
cached, and are not written to pmem after a crash.  Since the
BTT driver frees previous blocks and reuses them for subsequent
writes, the map table ends up pointing to blocks allocated for
other LBAs after a crash.

Patch 1 cleans up __copy_user_nocache() before making changes.
Patch 2 makes __copy_user_nocache() handle 4-byte nocache copy.

---
v3:
 - Add a cleanup patch to rename numeric labels to descriptively
   named labels with ".L".  (Ingo Molnar, Borislav Petkov)
v2:
 - Add comments.  (Ingo Molnar)

---
Toshi Kani (2):
  1/2 x86/lib/copy_user_64.S: cleanup __copy_user_nocache()
  2/2 x86/lib/copy_user_64.S: Handle 4-byte nocache copy

---
 arch/x86/lib/copy_user_64.S | 142 +++++++++++++++++++++++++++++++-------------
 1 file changed, 101 insertions(+), 41 deletions(-)
* [PATCH v3 1/2] x86/lib/copy_user_64.S: cleanup __copy_user_nocache()
  2016-02-11 21:24 [PATCH v3 0/2] Fix BTT data corruptions after crash Toshi Kani
@ 2016-02-11 21:24 ` Toshi Kani
  2016-02-17  8:02   ` Ingo Molnar
  2016-02-17 12:13   ` [tip:x86/urgent] x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable tip-bot for Toshi Kani
  2016-02-11 21:24 ` [PATCH v3 2/2] x86/lib/copy_user_64.S: Handle 4-byte nocache copy Toshi Kani
  1 sibling, 2 replies; 7+ messages in thread
From: Toshi Kani @ 2016-02-11 21:24 UTC (permalink / raw)
  To: tglx, mingo, hpa, bp, dan.j.williams
  Cc: ross.zwisler, vishal.l.verma, micah.parrish, brian.boylston, x86,
	linux-nvdimm, linux-kernel, Toshi Kani

Add comments to __copy_user_nocache() to clarify its procedures
and alignment requirement.

Also change numeric branch target labels to named labels.  The
labels begin with ".L" and prefix "cun" (Copy User Nocache) to
keep them local and unique to the function.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
---
 arch/x86/lib/copy_user_64.S | 114 ++++++++++++++++++++++++++++---------------
 1 file changed, 73 insertions(+), 41 deletions(-)

diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index 982ce34..23042ff 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -232,17 +232,30 @@ ENDPROC(copy_user_enhanced_fast_string)
 
 /*
  * copy_user_nocache - Uncached memory copy with exception handling
- * This will force destination/source out of cache for more performance.
+ * This will force destination out of cache for more performance.
+ *
+ * Note: Cached memory copy is used when destination or size is not
+ * naturally aligned. That is:
+ *  - Require 8-byte alignment when size is 8 bytes or larger.
  */
 ENTRY(__copy_user_nocache)
 	ASM_STAC
+
+	/* If size is less than 8 bytes, goto byte copy */
 	cmpl $8,%edx
-	jb 20f		/* less then 8 bytes, go to byte copy loop */
+	jb .Lcun_1b_cache_copy_entry
+
+	/* If destination is not 8-byte aligned, "cache" copy to align it */
 	ALIGN_DESTINATION
+
+	/* Set 4x8-byte copy count and remainder */
 	movl %edx,%ecx
 	andl $63,%edx
 	shrl $6,%ecx
-	jz 17f
+	jz .Lcun_8b_nocache_copy_entry	/* jump if count is 0 */
+
+	/* Perform 4x8-byte nocache loop-copy */
+.Lcun_4x8b_nocache_copy_loop:
 1:	movq (%rsi),%r8
 2:	movq 1*8(%rsi),%r9
 3:	movq 2*8(%rsi),%r10
@@ -262,60 +275,79 @@ ENTRY(__copy_user_nocache)
 	leaq 64(%rsi),%rsi
 	leaq 64(%rdi),%rdi
 	decl %ecx
-	jnz 1b
-17:	movl %edx,%ecx
+	jnz .Lcun_4x8b_nocache_copy_loop
+
+	/* Set 8-byte copy count and remainder */
+.Lcun_8b_nocache_copy_entry:
+	movl %edx,%ecx
 	andl $7,%edx
 	shrl $3,%ecx
-	jz 20f
-18:	movq (%rsi),%r8
-19:	movnti %r8,(%rdi)
+	jz .Lcun_1b_cache_copy_entry	/* jump if count is 0 */
+
+	/* Perform 8-byte nocache loop-copy */
+.Lcun_8b_nocache_copy_loop:
+20:	movq (%rsi),%r8
+21:	movnti %r8,(%rdi)
 	leaq 8(%rsi),%rsi
 	leaq 8(%rdi),%rdi
 	decl %ecx
-	jnz 18b
-20:	andl %edx,%edx
-	jz 23f
+	jnz .Lcun_8b_nocache_copy_loop
+
+	/* If no byte left, we're done */
+.Lcun_1b_cache_copy_entry:
+	andl %edx,%edx
+	jz .Lcun_finish_copy
+
+	/* Perform byte "cache" loop-copy for the remainder */
 	movl %edx,%ecx
-21:	movb (%rsi),%al
-22:	movb %al,(%rdi)
+.Lcun_1b_cache_copy_loop:
+40:	movb (%rsi),%al
+41:	movb %al,(%rdi)
 	incq %rsi
 	incq %rdi
 	decl %ecx
-	jnz 21b
-23:	xorl %eax,%eax
+	jnz .Lcun_1b_cache_copy_loop
+
+	/* Finished copying; fence the prior stores */
+.Lcun_finish_copy:
+	xorl %eax,%eax
 	ASM_CLAC
 	sfence
 	ret
 
 	.section .fixup,"ax"
-30:	shll $6,%ecx
+.Lcun_fixup_4x8b_copy:
+	shll $6,%ecx
 	addl %ecx,%edx
-	jmp 60f
-40:	lea (%rdx,%rcx,8),%rdx
-	jmp 60f
-50:	movl %ecx,%edx
-60:	sfence
+	jmp .Lcun_fixup_handle_tail
+.Lcun_fixup_8b_copy:
+	lea (%rdx,%rcx,8),%rdx
+	jmp .Lcun_fixup_handle_tail
+.Lcun_fixup_1b_copy:
+	movl %ecx,%edx
+.Lcun_fixup_handle_tail:
+	sfence
 	jmp copy_user_handle_tail
 	.previous
 
-	_ASM_EXTABLE(1b,30b)
-	_ASM_EXTABLE(2b,30b)
-	_ASM_EXTABLE(3b,30b)
-	_ASM_EXTABLE(4b,30b)
-	_ASM_EXTABLE(5b,30b)
-	_ASM_EXTABLE(6b,30b)
-	_ASM_EXTABLE(7b,30b)
-	_ASM_EXTABLE(8b,30b)
-	_ASM_EXTABLE(9b,30b)
-	_ASM_EXTABLE(10b,30b)
-	_ASM_EXTABLE(11b,30b)
-	_ASM_EXTABLE(12b,30b)
-	_ASM_EXTABLE(13b,30b)
-	_ASM_EXTABLE(14b,30b)
-	_ASM_EXTABLE(15b,30b)
-	_ASM_EXTABLE(16b,30b)
-	_ASM_EXTABLE(18b,40b)
-	_ASM_EXTABLE(19b,40b)
-	_ASM_EXTABLE(21b,50b)
-	_ASM_EXTABLE(22b,50b)
+	_ASM_EXTABLE(1b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(2b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(3b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(4b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(5b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(6b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(7b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(8b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(9b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(10b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(11b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(12b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(13b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(14b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(15b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(16b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(20b,.Lcun_fixup_8b_copy)
+	_ASM_EXTABLE(21b,.Lcun_fixup_8b_copy)
+	_ASM_EXTABLE(40b,.Lcun_fixup_1b_copy)
+	_ASM_EXTABLE(41b,.Lcun_fixup_1b_copy)
ENDPROC(__copy_user_nocache)
* Re: [PATCH v3 1/2] x86/lib/copy_user_64.S: cleanup __copy_user_nocache()
  2016-02-11 21:24 ` [PATCH v3 1/2] x86/lib/copy_user_64.S: cleanup __copy_user_nocache() Toshi Kani
@ 2016-02-17  8:02   ` Ingo Molnar
  2016-02-17 15:52     ` Toshi Kani
  2016-02-17 12:13   ` [tip:x86/urgent] x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable tip-bot for Toshi Kani
  1 sibling, 1 reply; 7+ messages in thread
From: Ingo Molnar @ 2016-02-17  8:02 UTC (permalink / raw)
  To: Toshi Kani
  Cc: tglx, mingo, hpa, bp, dan.j.williams, ross.zwisler,
	vishal.l.verma, micah.parrish, brian.boylston, x86, linux-nvdimm,
	linux-kernel

* Toshi Kani <toshi.kani@hpe.com> wrote:

> Add comments to __copy_user_nocache() to clarify its procedures
> and alignment requirement.
>
> Also change numeric branch target labels to named labels.  The
> labels begin with ".L" and prefix "cun" (Copy User Nocache) to
> keep them local and unique to the function.

So the .L labels are local, i.e. they are not emitted into the symbol table.

I.e. no need to name them globally!

I've done a s/Lcun_/L_/ over the patch.

Thanks,

	Ingo
* Re: [PATCH v3 1/2] x86/lib/copy_user_64.S: cleanup __copy_user_nocache()
  2016-02-17  8:02   ` Ingo Molnar
@ 2016-02-17 15:52     ` Toshi Kani
  0 siblings, 0 replies; 7+ messages in thread
From: Toshi Kani @ 2016-02-17 15:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: tglx, mingo, hpa, bp, dan.j.williams, ross.zwisler,
	vishal.l.verma, micah.parrish, brian.boylston, x86, linux-nvdimm,
	linux-kernel

On Wed, 2016-02-17 at 09:02 +0100, Ingo Molnar wrote:
> * Toshi Kani <toshi.kani@hpe.com> wrote:
>
> > Add comments to __copy_user_nocache() to clarify its procedures
> > and alignment requirement.
> >
> > Also change numeric branch target labels to named labels.  The
> > labels begin with ".L" and prefix "cun" (Copy User Nocache) to
> > keep them local and unique to the function.
>
> So the .L labels are local, i.e. they are not emitted into the symbol
> table.
>
> I.e. no need to name them globally!

Right, but I thought there was a risk of the names conflicting with those
in the other copy functions in the file once they also start using
descriptive labels.  For instance, ".L_finish_copy" could easily be used
by other copy functions as well.

> I've done a s/Lcun_/L_/ over the patch.

Thanks,
-Toshi
* [tip:x86/urgent] x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable
  2016-02-11 21:24 ` [PATCH v3 1/2] x86/lib/copy_user_64.S: cleanup __copy_user_nocache() Toshi Kani
  2016-02-17  8:02   ` Ingo Molnar
@ 2016-02-17 12:13   ` tip-bot for Toshi Kani
  1 sibling, 0 replies; 7+ messages in thread
From: tip-bot for Toshi Kani @ 2016-02-17 12:13 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: luto, akpm, stable, mcgrof, bp, tglx, dvlasenk, torvalds,
	linux-kernel, mingo, toshi.kani, bp, toshi.kani, hpa, peterz,
	brgerst

Commit-ID:  ee9737c924706aaa72c2ead93e3ad5644681dc1c
Gitweb:     http://git.kernel.org/tip/ee9737c924706aaa72c2ead93e3ad5644681dc1c
Author:     Toshi Kani <toshi.kani@hpe.com>
AuthorDate: Thu, 11 Feb 2016 14:24:16 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 17 Feb 2016 09:10:22 +0100

x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable

Add comments to __copy_user_nocache() to clarify its procedures
and alignment requirements.

Also change numeric branch target labels to named local labels.

No code changed:

 arch/x86/lib/copy_user_64.o:

   text    data     bss     dec     hex filename
   1239       0       0    1239     4d7 copy_user_64.o.before
   1239       0       0    1239     4d7 copy_user_64.o.after

md5:
   58bed94c2db98c1ca9a2d46d0680aaae  copy_user_64.o.before.asm
   58bed94c2db98c1ca9a2d46d0680aaae  copy_user_64.o.after.asm

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: <stable@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: brian.boylston@hpe.com
Cc: dan.j.williams@intel.com
Cc: linux-nvdimm@lists.01.org
Cc: micah.parrish@hpe.com
Cc: ross.zwisler@linux.intel.com
Cc: vishal.l.verma@intel.com
Link: http://lkml.kernel.org/r/1455225857-12039-2-git-send-email-toshi.kani@hpe.com
[ Small readability edits and added object file comparison. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/lib/copy_user_64.S | 114 ++++++++++++++++++++++++++++----------------
 1 file changed, 73 insertions(+), 41 deletions(-)

diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index 982ce34..a644aad 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -232,17 +232,30 @@ ENDPROC(copy_user_enhanced_fast_string)
 
 /*
  * copy_user_nocache - Uncached memory copy with exception handling
- * This will force destination/source out of cache for more performance.
+ * This will force destination out of cache for more performance.
+ *
+ * Note: Cached memory copy is used when destination or size is not
+ * naturally aligned. That is:
+ *  - Require 8-byte alignment when size is 8 bytes or larger.
  */
 ENTRY(__copy_user_nocache)
 	ASM_STAC
+
+	/* If size is less than 8 bytes, go to byte copy */
 	cmpl $8,%edx
-	jb 20f		/* less then 8 bytes, go to byte copy loop */
+	jb .L_1b_cache_copy_entry
+
+	/* If destination is not 8-byte aligned, "cache" copy to align it */
 	ALIGN_DESTINATION
+
+	/* Set 4x8-byte copy count and remainder */
 	movl %edx,%ecx
 	andl $63,%edx
 	shrl $6,%ecx
-	jz 17f
+	jz .L_8b_nocache_copy_entry	/* jump if count is 0 */
+
+	/* Perform 4x8-byte nocache loop-copy */
+.L_4x8b_nocache_copy_loop:
 1:	movq (%rsi),%r8
 2:	movq 1*8(%rsi),%r9
 3:	movq 2*8(%rsi),%r10
@@ -262,60 +275,79 @@ ENTRY(__copy_user_nocache)
 	leaq 64(%rsi),%rsi
 	leaq 64(%rdi),%rdi
 	decl %ecx
-	jnz 1b
-17:	movl %edx,%ecx
+	jnz .L_4x8b_nocache_copy_loop
+
+	/* Set 8-byte copy count and remainder */
+.L_8b_nocache_copy_entry:
+	movl %edx,%ecx
 	andl $7,%edx
 	shrl $3,%ecx
-	jz 20f
-18:	movq (%rsi),%r8
-19:	movnti %r8,(%rdi)
+	jz .L_1b_cache_copy_entry	/* jump if count is 0 */
+
+	/* Perform 8-byte nocache loop-copy */
+.L_8b_nocache_copy_loop:
+20:	movq (%rsi),%r8
+21:	movnti %r8,(%rdi)
 	leaq 8(%rsi),%rsi
 	leaq 8(%rdi),%rdi
 	decl %ecx
-	jnz 18b
-20:	andl %edx,%edx
-	jz 23f
+	jnz .L_8b_nocache_copy_loop
+
+	/* If no byte left, we're done */
+.L_1b_cache_copy_entry:
+	andl %edx,%edx
+	jz .L_finish_copy
+
+	/* Perform byte "cache" loop-copy for the remainder */
 	movl %edx,%ecx
-21:	movb (%rsi),%al
-22:	movb %al,(%rdi)
+.L_1b_cache_copy_loop:
+40:	movb (%rsi),%al
+41:	movb %al,(%rdi)
 	incq %rsi
 	incq %rdi
 	decl %ecx
-	jnz 21b
-23:	xorl %eax,%eax
+	jnz .L_1b_cache_copy_loop
+
+	/* Finished copying; fence the prior stores */
+.L_finish_copy:
+	xorl %eax,%eax
 	ASM_CLAC
 	sfence
 	ret
 
 	.section .fixup,"ax"
-30:	shll $6,%ecx
+.L_fixup_4x8b_copy:
+	shll $6,%ecx
 	addl %ecx,%edx
-	jmp 60f
-40:	lea (%rdx,%rcx,8),%rdx
-	jmp 60f
-50:	movl %ecx,%edx
-60:	sfence
+	jmp .L_fixup_handle_tail
+.L_fixup_8b_copy:
+	lea (%rdx,%rcx,8),%rdx
+	jmp .L_fixup_handle_tail
+.L_fixup_1b_copy:
+	movl %ecx,%edx
+.L_fixup_handle_tail:
+	sfence
 	jmp copy_user_handle_tail
 	.previous
 
-	_ASM_EXTABLE(1b,30b)
-	_ASM_EXTABLE(2b,30b)
-	_ASM_EXTABLE(3b,30b)
-	_ASM_EXTABLE(4b,30b)
-	_ASM_EXTABLE(5b,30b)
-	_ASM_EXTABLE(6b,30b)
-	_ASM_EXTABLE(7b,30b)
-	_ASM_EXTABLE(8b,30b)
-	_ASM_EXTABLE(9b,30b)
-	_ASM_EXTABLE(10b,30b)
-	_ASM_EXTABLE(11b,30b)
-	_ASM_EXTABLE(12b,30b)
-	_ASM_EXTABLE(13b,30b)
-	_ASM_EXTABLE(14b,30b)
-	_ASM_EXTABLE(15b,30b)
-	_ASM_EXTABLE(16b,30b)
-	_ASM_EXTABLE(18b,40b)
-	_ASM_EXTABLE(19b,40b)
-	_ASM_EXTABLE(21b,50b)
-	_ASM_EXTABLE(22b,50b)
+	_ASM_EXTABLE(1b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(2b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(3b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(4b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(5b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(6b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(7b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(8b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(9b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(10b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(11b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(12b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(13b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(14b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(15b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(16b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(20b,.L_fixup_8b_copy)
+	_ASM_EXTABLE(21b,.L_fixup_8b_copy)
+	_ASM_EXTABLE(40b,.L_fixup_1b_copy)
+	_ASM_EXTABLE(41b,.L_fixup_1b_copy)
ENDPROC(__copy_user_nocache)
* [PATCH v3 2/2] x86/lib/copy_user_64.S: Handle 4-byte nocache copy
  2016-02-11 21:24 [PATCH v3 0/2] Fix BTT data corruptions after crash Toshi Kani
  2016-02-11 21:24 ` [PATCH v3 1/2] x86/lib/copy_user_64.S: cleanup __copy_user_nocache() Toshi Kani
@ 2016-02-11 21:24 ` Toshi Kani
  2016-02-17 12:14   ` [tip:x86/urgent] x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache() tip-bot for Toshi Kani
  1 sibling, 1 reply; 7+ messages in thread
From: Toshi Kani @ 2016-02-11 21:24 UTC (permalink / raw)
  To: tglx, mingo, hpa, bp, dan.j.williams
  Cc: ross.zwisler, vishal.l.verma, micah.parrish, brian.boylston, x86,
	linux-nvdimm, linux-kernel, Toshi Kani

Data corruption issues were observed in tests which initiated
a system crash/reset while accessing BTT devices.  This problem
is reproducible.

The BTT driver calls pmem_rw_bytes() to update data in pmem
devices.  This interface calls __copy_user_nocache(), which
uses non-temporal stores so that the stores to pmem are
persistent.

__copy_user_nocache() uses non-temporal stores when a request
size is 8 bytes or larger (and is aligned by 8 bytes).  The
BTT driver updates the BTT map table, whose entry size is
4 bytes.  Therefore, updates to the map table entries remain
cached, and are not written to pmem after a crash.

Change __copy_user_nocache() to use non-temporal store when
a request size is 4 bytes.  The change extends the current
byte-copy path for a less-than-8-bytes request, and does not
add any overhead to the regular path.

Reported-and-tested-by: Micah Parrish <micah.parrish@hpe.com>
Reported-and-tested-by: Brian Boylston <brian.boylston@hpe.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
---
 arch/x86/lib/copy_user_64.S | 36 ++++++++++++++++++++++++++++++++----
 1 file changed, 32 insertions(+), 4 deletions(-)

diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index 23042ff..9228ce6 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -237,13 +237,14 @@ ENDPROC(copy_user_enhanced_fast_string)
  * Note: Cached memory copy is used when destination or size is not
  * naturally aligned. That is:
  *  - Require 8-byte alignment when size is 8 bytes or larger.
+ *  - Require 4-byte alignment when size is 4 bytes.
  */
 ENTRY(__copy_user_nocache)
 	ASM_STAC
 
-	/* If size is less than 8 bytes, goto byte copy */
+	/* If size is less than 8 bytes, goto 4-byte copy */
 	cmpl $8,%edx
-	jb .Lcun_1b_cache_copy_entry
+	jb .Lcun_4b_nocache_copy_entry
 
 	/* If destination is not 8-byte aligned, "cache" copy to align it */
 	ALIGN_DESTINATION
@@ -282,7 +283,7 @@ ENTRY(__copy_user_nocache)
 	movl %edx,%ecx
 	andl $7,%edx
 	shrl $3,%ecx
-	jz .Lcun_1b_cache_copy_entry	/* jump if count is 0 */
+	jz .Lcun_4b_nocache_copy_entry	/* jump if count is 0 */
 
 	/* Perform 8-byte nocache loop-copy */
 .Lcun_8b_nocache_copy_loop:
@@ -294,11 +295,33 @@ ENTRY(__copy_user_nocache)
 	jnz .Lcun_8b_nocache_copy_loop
 
 	/* If no byte left, we're done */
-.Lcun_1b_cache_copy_entry:
+.Lcun_4b_nocache_copy_entry:
+	andl %edx,%edx
+	jz .Lcun_finish_copy
+
+	/* If destination is not 4-byte aligned, goto byte copy */
+	movl %edi,%ecx
+	andl $3,%ecx
+	jnz .Lcun_1b_cache_copy_entry
+
+	/* Set 4-byte copy count (1 or 0) and remainder */
+	movl %edx,%ecx
+	andl $3,%edx
+	shrl $2,%ecx
+	jz .Lcun_1b_cache_copy_entry	/* jump if count is 0 */
+
+	/* Perform 4-byte nocache copy */
+30:	movl (%rsi),%r8d
+31:	movnti %r8d,(%rdi)
+	leaq 4(%rsi),%rsi
+	leaq 4(%rdi),%rdi
+
+	/* If no byte left, we're done */
 	andl %edx,%edx
 	jz .Lcun_finish_copy
 
 	/* Perform byte "cache" loop-copy for the remainder */
+.Lcun_1b_cache_copy_entry:
 	movl %edx,%ecx
 .Lcun_1b_cache_copy_loop:
 40:	movb (%rsi),%al
@@ -323,6 +346,9 @@ ENTRY(__copy_user_nocache)
 .Lcun_fixup_8b_copy:
 	lea (%rdx,%rcx,8),%rdx
 	jmp .Lcun_fixup_handle_tail
+.Lcun_fixup_4b_copy:
+	lea (%rdx,%rcx,4),%rdx
+	jmp .Lcun_fixup_handle_tail
 .Lcun_fixup_1b_copy:
 	movl %ecx,%edx
 .Lcun_fixup_handle_tail:
@@ -348,6 +374,8 @@ ENTRY(__copy_user_nocache)
 	_ASM_EXTABLE(16b,.Lcun_fixup_4x8b_copy)
 	_ASM_EXTABLE(20b,.Lcun_fixup_8b_copy)
 	_ASM_EXTABLE(21b,.Lcun_fixup_8b_copy)
+	_ASM_EXTABLE(30b,.Lcun_fixup_4b_copy)
+	_ASM_EXTABLE(31b,.Lcun_fixup_4b_copy)
 	_ASM_EXTABLE(40b,.Lcun_fixup_1b_copy)
 	_ASM_EXTABLE(41b,.Lcun_fixup_1b_copy)
ENDPROC(__copy_user_nocache)
* [tip:x86/urgent] x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache()
  2016-02-11 21:24 ` [PATCH v3 2/2] x86/lib/copy_user_64.S: Handle 4-byte nocache copy Toshi Kani
@ 2016-02-17 12:14   ` tip-bot for Toshi Kani
  0 siblings, 0 replies; 7+ messages in thread
From: tip-bot for Toshi Kani @ 2016-02-17 12:14 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dvlasenk, toshi.kani, luto, hpa, akpm, mingo, linux-kernel,
	stable, peterz, vishal.l.verma, bp, ross.zwisler, bp, toshi.kani,
	torvalds, dan.j.williams, brian.boylston, tglx, micah.parrish,
	mcgrof, brgerst

Commit-ID:  a82eee7424525e34e98d821dd059ce14560a1e35
Gitweb:     http://git.kernel.org/tip/a82eee7424525e34e98d821dd059ce14560a1e35
Author:     Toshi Kani <toshi.kani@hpe.com>
AuthorDate: Thu, 11 Feb 2016 14:24:17 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 17 Feb 2016 09:10:23 +0100

x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache()

Data corruption issues were observed in tests which initiated
a system crash/reset while accessing BTT devices.  This problem
is reproducible.

The BTT driver calls pmem_rw_bytes() to update data in pmem
devices.  This interface calls __copy_user_nocache(), which
uses non-temporal stores so that the stores to pmem are
persistent.

__copy_user_nocache() uses non-temporal stores when a request
size is 8 bytes or larger (and is aligned by 8 bytes).  The
BTT driver updates the BTT map table, whose entry size is
4 bytes.  Therefore, updates to the map table entries remain
cached, and are not written to pmem after a crash.

Change __copy_user_nocache() to use non-temporal store when
a request size is 4 bytes.  The change extends the current
byte-copy path for a less-than-8-bytes request, and does not
add any overhead to the regular path.

Reported-and-tested-by: Micah Parrish <micah.parrish@hpe.com>
Reported-and-tested-by: Brian Boylston <brian.boylston@hpe.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: <stable@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/1455225857-12039-3-git-send-email-toshi.kani@hpe.com
[ Small readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/lib/copy_user_64.S | 36 ++++++++++++++++++++++++++++++++----
 1 file changed, 32 insertions(+), 4 deletions(-)

diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index a644aad..27f89c7 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -237,13 +237,14 @@ ENDPROC(copy_user_enhanced_fast_string)
  * Note: Cached memory copy is used when destination or size is not
  * naturally aligned. That is:
  *  - Require 8-byte alignment when size is 8 bytes or larger.
+ *  - Require 4-byte alignment when size is 4 bytes.
  */
 ENTRY(__copy_user_nocache)
 	ASM_STAC
 
-	/* If size is less than 8 bytes, go to byte copy */
+	/* If size is less than 8 bytes, go to 4-byte copy */
 	cmpl $8,%edx
-	jb .L_1b_cache_copy_entry
+	jb .L_4b_nocache_copy_entry
 
 	/* If destination is not 8-byte aligned, "cache" copy to align it */
 	ALIGN_DESTINATION
@@ -282,7 +283,7 @@ ENTRY(__copy_user_nocache)
 	movl %edx,%ecx
 	andl $7,%edx
 	shrl $3,%ecx
-	jz .L_1b_cache_copy_entry	/* jump if count is 0 */
+	jz .L_4b_nocache_copy_entry	/* jump if count is 0 */
 
 	/* Perform 8-byte nocache loop-copy */
 .L_8b_nocache_copy_loop:
@@ -294,11 +295,33 @@ ENTRY(__copy_user_nocache)
 	jnz .L_8b_nocache_copy_loop
 
 	/* If no byte left, we're done */
-.L_1b_cache_copy_entry:
+.L_4b_nocache_copy_entry:
+	andl %edx,%edx
+	jz .L_finish_copy
+
+	/* If destination is not 4-byte aligned, go to byte copy: */
+	movl %edi,%ecx
+	andl $3,%ecx
+	jnz .L_1b_cache_copy_entry
+
+	/* Set 4-byte copy count (1 or 0) and remainder */
+	movl %edx,%ecx
+	andl $3,%edx
+	shrl $2,%ecx
+	jz .L_1b_cache_copy_entry	/* jump if count is 0 */
+
+	/* Perform 4-byte nocache copy: */
+30:	movl (%rsi),%r8d
+31:	movnti %r8d,(%rdi)
+	leaq 4(%rsi),%rsi
+	leaq 4(%rdi),%rdi
+
+	/* If no bytes left, we're done: */
 	andl %edx,%edx
 	jz .L_finish_copy
 
 	/* Perform byte "cache" loop-copy for the remainder */
+.L_1b_cache_copy_entry:
 	movl %edx,%ecx
 .L_1b_cache_copy_loop:
 40:	movb (%rsi),%al
@@ -323,6 +346,9 @@ ENTRY(__copy_user_nocache)
 .L_fixup_8b_copy:
 	lea (%rdx,%rcx,8),%rdx
 	jmp .L_fixup_handle_tail
+.L_fixup_4b_copy:
+	lea (%rdx,%rcx,4),%rdx
+	jmp .L_fixup_handle_tail
 .L_fixup_1b_copy:
 	movl %ecx,%edx
 .L_fixup_handle_tail:
@@ -348,6 +374,8 @@ ENTRY(__copy_user_nocache)
 	_ASM_EXTABLE(16b,.L_fixup_4x8b_copy)
 	_ASM_EXTABLE(20b,.L_fixup_8b_copy)
 	_ASM_EXTABLE(21b,.L_fixup_8b_copy)
+	_ASM_EXTABLE(30b,.L_fixup_4b_copy)
+	_ASM_EXTABLE(31b,.L_fixup_4b_copy)
 	_ASM_EXTABLE(40b,.L_fixup_1b_copy)
 	_ASM_EXTABLE(41b,.L_fixup_1b_copy)
ENDPROC(__copy_user_nocache)