* [PATCH v3 0/2] Fix BTT data corruptions after crash
From: Toshi Kani @ 2016-02-11 21:24 UTC
  To: tglx, mingo, hpa, bp, dan.j.williams
  Cc: ross.zwisler, vishal.l.verma, micah.parrish, brian.boylston, x86,
	linux-nvdimm, linux-kernel

Data corruption issues were observed in tests that initiated a system
crash/reset while accessing BTT devices.  The problem is reproducible.

The BTT driver calls pmem_rw_bytes() to update data in pmem devices.
This interface calls __copy_user_nocache(), which uses non-temporal
stores so that the stores to pmem are persistent.

__copy_user_nocache() uses non-temporal stores only when a request is
8 bytes or larger (and the destination is 8-byte aligned).  The BTT
driver updates the BTT map table, whose entry size is 4 bytes.
Updates to the map table entries therefore remain in the CPU cache and
are not written to pmem across a crash.  Since the BTT driver frees
previous blocks and reuses them for subsequent writes, the map table
ends up pointing at blocks allocated for other LBAs after a crash.
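
As a rough illustration, the pre-patch dispatch behaves like the
following C-level sketch.  This is illustrative only: the helper names
(nocache_copy_8b, cached_copy_bytes, copy_user_nocache_sketch) are
hypothetical, memcpy() stands in for both store flavors, and the real
assembly routine additionally handles destination alignment and
exception fixup.

  #include <stdint.h>
  #include <string.h>

  /* Hypothetical stand-ins for the two store flavors. */
  static void nocache_copy_8b(void *dst, const void *src, size_t len)
  {
          memcpy(dst, src, len);   /* real code: movnti, bypasses the cache */
  }

  static void cached_copy_bytes(void *dst, const void *src, size_t len)
  {
          memcpy(dst, src, len);   /* ordinary cached stores */
  }

  /* Pre-patch: only 8-byte-aligned chunks of 8+ bytes go non-temporal. */
  static void copy_user_nocache_sketch(void *dst, const void *src, size_t len)
  {
          if (len >= 8 && ((uintptr_t)dst & 7) == 0) {
                  size_t chunk = len & ~(size_t)7;

                  nocache_copy_8b(dst, src, chunk);
                  dst = (char *)dst + chunk;
                  src = (const char *)src + chunk;
                  len &= 7;
          }
          /* A 4-byte BTT map entry update lands here: the store stays
           * in the CPU cache and can be lost across a crash. */
          cached_copy_bytes(dst, src, len);
  }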

Patch 1 cleans up __copy_user_nocache() before making changes.
Patch 2 makes __copy_user_nocache() handle 4-byte nocache copy.

---
v3:
 - Add a cleanup patch that renames numeric labels to descriptive
   ".L"-prefixed labels. (Ingo Molnar, Borislav Petkov)
v2:
 - Add comments (Ingo Molnar).
---
Toshi Kani (2):
 1/2 x86/lib/copy_user_64.S: cleanup __copy_user_nocache()
 2/2 x86/lib/copy_user_64.S: Handle 4-byte nocache copy

---
 arch/x86/lib/copy_user_64.S | 142 +++++++++++++++++++++++++++++++-------------
 1 file changed, 101 insertions(+), 41 deletions(-)


* [PATCH v3 1/2] x86/lib/copy_user_64.S: cleanup __copy_user_nocache()
From: Toshi Kani @ 2016-02-11 21:24 UTC
  To: tglx, mingo, hpa, bp, dan.j.williams
  Cc: ross.zwisler, vishal.l.verma, micah.parrish, brian.boylston, x86,
	linux-nvdimm, linux-kernel, Toshi Kani

Add comments to __copy_user_nocache() to clarify its procedures
and alignment requirements.

Also change the numeric branch target labels to named labels.  The
labels begin with ".L" and the prefix "cun" (Copy User Nocache) to
keep them local and unique to the function.
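
For illustration, here is a minimal standalone fragment (not taken
from this patch) contrasting the two label styles:

  /* Numeric local label: "1b" means "nearest label 1, searching
   * backwards", which is positional and easy to misread. */
  1:	decl %ecx
  	jnz 1b

  /* .L-prefixed named label: assembler-local and self-describing. */
  .Lcun_copy_loop:
  	decl %ecx
  	jnz .Lcun_copy_loop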

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
---
 arch/x86/lib/copy_user_64.S |  114 ++++++++++++++++++++++++++++---------------
 1 file changed, 73 insertions(+), 41 deletions(-)

diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index 982ce34..23042ff 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -232,17 +232,30 @@ ENDPROC(copy_user_enhanced_fast_string)
 
 /*
  * copy_user_nocache - Uncached memory copy with exception handling
- * This will force destination/source out of cache for more performance.
+ * This will force destination out of cache for more performance.
+ *
+ * Note: Cached memory copy is used when destination or size is not
+ * naturally aligned. That is:
+ *  - Require 8-byte alignment when size is 8 bytes or larger.
  */
 ENTRY(__copy_user_nocache)
 	ASM_STAC
+
+	/* If size is less than 8 bytes, go to byte copy */
 	cmpl $8,%edx
-	jb 20f		/* less then 8 bytes, go to byte copy loop */
+	jb .Lcun_1b_cache_copy_entry
+
+	/* If destination is not 8-byte aligned, "cache" copy to align it */
 	ALIGN_DESTINATION
+
+	/* Set 4x8-byte copy count and remainder */
 	movl %edx,%ecx
 	andl $63,%edx
 	shrl $6,%ecx
-	jz 17f
+	jz .Lcun_8b_nocache_copy_entry	/* jump if count is 0 */
+
+	/* Perform 4x8-byte nocache loop-copy */
+.Lcun_4x8b_nocache_copy_loop:
 1:	movq (%rsi),%r8
 2:	movq 1*8(%rsi),%r9
 3:	movq 2*8(%rsi),%r10
@@ -262,60 +275,79 @@ ENTRY(__copy_user_nocache)
 	leaq 64(%rsi),%rsi
 	leaq 64(%rdi),%rdi
 	decl %ecx
-	jnz 1b
-17:	movl %edx,%ecx
+	jnz .Lcun_4x8b_nocache_copy_loop
+
+	/* Set 8-byte copy count and remainder */
+.Lcun_8b_nocache_copy_entry:
+	movl %edx,%ecx
 	andl $7,%edx
 	shrl $3,%ecx
-	jz 20f
-18:	movq (%rsi),%r8
-19:	movnti %r8,(%rdi)
+	jz .Lcun_1b_cache_copy_entry	/* jump if count is 0 */
+
+	/* Perform 8-byte nocache loop-copy */
+.Lcun_8b_nocache_copy_loop:
+20:	movq (%rsi),%r8
+21:	movnti %r8,(%rdi)
 	leaq 8(%rsi),%rsi
 	leaq 8(%rdi),%rdi
 	decl %ecx
-	jnz 18b
-20:	andl %edx,%edx
-	jz 23f
+	jnz .Lcun_8b_nocache_copy_loop
+
+	/* If no byte left, we're done */
+.Lcun_1b_cache_copy_entry:
+	andl %edx,%edx
+	jz .Lcun_finish_copy
+
+	/* Perform byte "cache" loop-copy for the remainder */
 	movl %edx,%ecx
-21:	movb (%rsi),%al
-22:	movb %al,(%rdi)
+.Lcun_1b_cache_copy_loop:
+40:	movb (%rsi),%al
+41:	movb %al,(%rdi)
 	incq %rsi
 	incq %rdi
 	decl %ecx
-	jnz 21b
-23:	xorl %eax,%eax
+	jnz .Lcun_1b_cache_copy_loop
+
+	/* Finished copying; fence the prior stores */
+.Lcun_finish_copy:
+	xorl %eax,%eax
 	ASM_CLAC
 	sfence
 	ret
 
 	.section .fixup,"ax"
-30:	shll $6,%ecx
+.Lcun_fixup_4x8b_copy:
+	shll $6,%ecx
 	addl %ecx,%edx
-	jmp 60f
-40:	lea (%rdx,%rcx,8),%rdx
-	jmp 60f
-50:	movl %ecx,%edx
-60:	sfence
+	jmp .Lcun_fixup_handle_tail
+.Lcun_fixup_8b_copy:
+	lea (%rdx,%rcx,8),%rdx
+	jmp .Lcun_fixup_handle_tail
+.Lcun_fixup_1b_copy:
+	movl %ecx,%edx
+.Lcun_fixup_handle_tail:
+	sfence
 	jmp copy_user_handle_tail
 	.previous
 
-	_ASM_EXTABLE(1b,30b)
-	_ASM_EXTABLE(2b,30b)
-	_ASM_EXTABLE(3b,30b)
-	_ASM_EXTABLE(4b,30b)
-	_ASM_EXTABLE(5b,30b)
-	_ASM_EXTABLE(6b,30b)
-	_ASM_EXTABLE(7b,30b)
-	_ASM_EXTABLE(8b,30b)
-	_ASM_EXTABLE(9b,30b)
-	_ASM_EXTABLE(10b,30b)
-	_ASM_EXTABLE(11b,30b)
-	_ASM_EXTABLE(12b,30b)
-	_ASM_EXTABLE(13b,30b)
-	_ASM_EXTABLE(14b,30b)
-	_ASM_EXTABLE(15b,30b)
-	_ASM_EXTABLE(16b,30b)
-	_ASM_EXTABLE(18b,40b)
-	_ASM_EXTABLE(19b,40b)
-	_ASM_EXTABLE(21b,50b)
-	_ASM_EXTABLE(22b,50b)
+	_ASM_EXTABLE(1b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(2b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(3b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(4b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(5b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(6b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(7b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(8b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(9b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(10b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(11b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(12b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(13b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(14b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(15b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(16b,.Lcun_fixup_4x8b_copy)
+	_ASM_EXTABLE(20b,.Lcun_fixup_8b_copy)
+	_ASM_EXTABLE(21b,.Lcun_fixup_8b_copy)
+	_ASM_EXTABLE(40b,.Lcun_fixup_1b_copy)
+	_ASM_EXTABLE(41b,.Lcun_fixup_1b_copy)
 ENDPROC(__copy_user_nocache)


* [PATCH v3 2/2] x86/lib/copy_user_64.S: Handle 4-byte nocache copy
From: Toshi Kani @ 2016-02-11 21:24 UTC
  To: tglx, mingo, hpa, bp, dan.j.williams
  Cc: ross.zwisler, vishal.l.verma, micah.parrish, brian.boylston, x86,
	linux-nvdimm, linux-kernel, Toshi Kani

Data corruption issues were observed in tests that initiated
a system crash/reset while accessing BTT devices.  The problem
is reproducible.

The BTT driver calls pmem_rw_bytes() to update data in pmem
devices.  This interface calls __copy_user_nocache(), which
uses non-temporal stores so that the stores to pmem are
persistent.

__copy_user_nocache() uses non-temporal stores only when a request
is 8 bytes or larger (and the destination is 8-byte aligned).  The
BTT driver updates the BTT map table, whose entry size is 4 bytes.
Updates to the map table entries therefore remain cached, and are
not written to pmem across a crash.

Change __copy_user_nocache() to use a non-temporal store when
the request size is 4 bytes.  The change extends the current
byte-copy path for requests smaller than 8 bytes, and does not
add any overhead to the regular path.
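
In the same C-level sketch style as the cover letter (helper names
are hypothetical; the real change is in assembly and also extends the
exception tables), the new tail handling is roughly:

  #include <stdint.h>
  #include <string.h>

  static void nocache_copy_4b(void *dst, const void *src)
  {
          memcpy(dst, src, 4);     /* real code: movl + movnti %r8d */
  }

  static void cached_copy_bytes(void *dst, const void *src, size_t len)
  {
          memcpy(dst, src, len);   /* ordinary cached stores */
  }

  /* Post-patch tail: a 4-byte-aligned remainder of 4-7 bytes now gets
   * one non-temporal store; only 1-3 trailing bytes stay cached. */
  static void copy_tail_sketch(void *dst, const void *src, size_t len)
  {
          if (len >= 4 && ((uintptr_t)dst & 3) == 0) {
                  nocache_copy_4b(dst, src);
                  dst = (char *)dst + 4;
                  src = (const char *)src + 4;
                  len -= 4;
          }
          cached_copy_bytes(dst, src, len);  /* byte loop for the rest */
  }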

Reported-and-tested-by: Micah Parrish <micah.parrish@hpe.com>
Reported-and-tested-by: Brian Boylston <brian.boylston@hpe.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
---
 arch/x86/lib/copy_user_64.S |   36 ++++++++++++++++++++++++++++++++----
 1 file changed, 32 insertions(+), 4 deletions(-)

diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index 23042ff..9228ce6 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -237,13 +237,14 @@ ENDPROC(copy_user_enhanced_fast_string)
  * Note: Cached memory copy is used when destination or size is not
  * naturally aligned. That is:
  *  - Require 8-byte alignment when size is 8 bytes or larger.
+ *  - Require 4-byte alignment when size is 4 bytes.
  */
 ENTRY(__copy_user_nocache)
 	ASM_STAC
 
-	/* If size is less than 8 bytes, go to byte copy */
+	/* If size is less than 8 bytes, go to 4-byte copy */
 	cmpl $8,%edx
-	jb .Lcun_1b_cache_copy_entry
+	jb .Lcun_4b_nocache_copy_entry
 
 	/* If destination is not 8-byte aligned, "cache" copy to align it */
 	ALIGN_DESTINATION
@@ -282,7 +283,7 @@ ENTRY(__copy_user_nocache)
 	movl %edx,%ecx
 	andl $7,%edx
 	shrl $3,%ecx
-	jz .Lcun_1b_cache_copy_entry	/* jump if count is 0 */
+	jz .Lcun_4b_nocache_copy_entry	/* jump if count is 0 */
 
 	/* Perform 8-byte nocache loop-copy */
 .Lcun_8b_nocache_copy_loop:
@@ -294,11 +295,33 @@ ENTRY(__copy_user_nocache)
 	jnz .Lcun_8b_nocache_copy_loop
 
 	/* If no byte left, we're done */
-.Lcun_1b_cache_copy_entry:
+.Lcun_4b_nocache_copy_entry:
+	andl %edx,%edx
+	jz .Lcun_finish_copy
+
+	/* If destination is not 4-byte aligned, go to byte copy */
+	movl %edi,%ecx
+	andl $3,%ecx
+	jnz .Lcun_1b_cache_copy_entry
+
+	/* Set 4-byte copy count (1 or 0) and remainder */
+	movl %edx,%ecx
+	andl $3,%edx
+	shrl $2,%ecx
+	jz .Lcun_1b_cache_copy_entry	/* jump if count is 0 */
+
+	/* Perform 4-byte nocache copy */
+30:	movl (%rsi),%r8d
+31:	movnti %r8d,(%rdi)
+	leaq 4(%rsi),%rsi
+	leaq 4(%rdi),%rdi
+
+	/* If no byte left, we're done */
 	andl %edx,%edx
 	jz .Lcun_finish_copy
 
 	/* Perform byte "cache" loop-copy for the remainder */
+.Lcun_1b_cache_copy_entry:
 	movl %edx,%ecx
 .Lcun_1b_cache_copy_loop:
 40:	movb (%rsi),%al
@@ -323,6 +346,9 @@ ENTRY(__copy_user_nocache)
 .Lcun_fixup_8b_copy:
 	lea (%rdx,%rcx,8),%rdx
 	jmp .Lcun_fixup_handle_tail
+.Lcun_fixup_4b_copy:
+	lea (%rdx,%rcx,4),%rdx
+	jmp .Lcun_fixup_handle_tail
 .Lcun_fixup_1b_copy:
 	movl %ecx,%edx
 .Lcun_fixup_handle_tail:
@@ -348,6 +374,8 @@ ENTRY(__copy_user_nocache)
 	_ASM_EXTABLE(16b,.Lcun_fixup_4x8b_copy)
 	_ASM_EXTABLE(20b,.Lcun_fixup_8b_copy)
 	_ASM_EXTABLE(21b,.Lcun_fixup_8b_copy)
+	_ASM_EXTABLE(30b,.Lcun_fixup_4b_copy)
+	_ASM_EXTABLE(31b,.Lcun_fixup_4b_copy)
 	_ASM_EXTABLE(40b,.Lcun_fixup_1b_copy)
 	_ASM_EXTABLE(41b,.Lcun_fixup_1b_copy)
 ENDPROC(__copy_user_nocache)


* Re: [PATCH v3 1/2] x86/lib/copy_user_64.S: cleanup __copy_user_nocache()
From: Ingo Molnar @ 2016-02-17  8:02 UTC
  To: Toshi Kani
  Cc: tglx, mingo, hpa, bp, dan.j.williams, ross.zwisler,
	vishal.l.verma, micah.parrish, brian.boylston, x86, linux-nvdimm,
	linux-kernel


* Toshi Kani <toshi.kani@hpe.com> wrote:

> Add comments to __copy_user_nocache() to clarify its procedures
> and alignment requirements.
> 
> Also change the numeric branch target labels to named labels.  The
> labels begin with ".L" and the prefix "cun" (Copy User Nocache) to
> keep them local and unique to the function.

So the .L labels are local, i.e. they are not emitted into the symbol table.

There is thus no need to make their names globally unique!
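
This is easy to check with a throwaway file (commands and output
abbreviated):

  $ cat labels.S
  .text
  bar:
  	nop
  .L_baz:		/* assembler-local, never reaches the object file */
  	nop
  $ as labels.S -o labels.o && nm labels.o
  0000000000000000 t bar

Only "bar" shows up; ".L_baz" is resolved entirely inside the
assembler.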

I've done a s/Lcun_/L_/ over the patch.

Thanks,

	Ingo


* [tip:x86/urgent] x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable
From: tip-bot for Toshi Kani @ 2016-02-17 12:13 UTC
  To: linux-tip-commits
  Cc: luto, akpm, stable, mcgrof, bp, tglx, dvlasenk, torvalds,
	linux-kernel, mingo, toshi.kani, bp, toshi.kani, hpa, peterz,
	brgerst

Commit-ID:  ee9737c924706aaa72c2ead93e3ad5644681dc1c
Gitweb:     http://git.kernel.org/tip/ee9737c924706aaa72c2ead93e3ad5644681dc1c
Author:     Toshi Kani <toshi.kani@hpe.com>
AuthorDate: Thu, 11 Feb 2016 14:24:16 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 17 Feb 2016 09:10:22 +0100

x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable

Add comments to __copy_user_nocache() to clarify its procedures
and alignment requirements.

Also change numeric branch target labels to named local labels.

No code changed:

 arch/x86/lib/copy_user_64.o:

    text    data     bss     dec     hex filename
    1239       0       0    1239     4d7 copy_user_64.o.before
    1239       0       0    1239     4d7 copy_user_64.o.after

 md5:
    58bed94c2db98c1ca9a2d46d0680aaae  copy_user_64.o.before.asm
    58bed94c2db98c1ca9a2d46d0680aaae  copy_user_64.o.after.asm

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: <stable@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: brian.boylston@hpe.com
Cc: dan.j.williams@intel.com
Cc: linux-nvdimm@lists.01.org
Cc: micah.parrish@hpe.com
Cc: ross.zwisler@linux.intel.com
Cc: vishal.l.verma@intel.com
Link: http://lkml.kernel.org/r/1455225857-12039-2-git-send-email-toshi.kani@hpe.com
[ Small readability edits and added object file comparison. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/lib/copy_user_64.S | 114 ++++++++++++++++++++++++++++----------------
 1 file changed, 73 insertions(+), 41 deletions(-)

diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index 982ce34..a644aad 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -232,17 +232,30 @@ ENDPROC(copy_user_enhanced_fast_string)
 
 /*
  * copy_user_nocache - Uncached memory copy with exception handling
- * This will force destination/source out of cache for more performance.
+ * This will force destination out of cache for more performance.
+ *
+ * Note: Cached memory copy is used when destination or size is not
+ * naturally aligned. That is:
+ *  - Require 8-byte alignment when size is 8 bytes or larger.
  */
 ENTRY(__copy_user_nocache)
 	ASM_STAC
+
+	/* If size is less than 8 bytes, go to byte copy */
 	cmpl $8,%edx
-	jb 20f		/* less then 8 bytes, go to byte copy loop */
+	jb .L_1b_cache_copy_entry
+
+	/* If destination is not 8-byte aligned, "cache" copy to align it */
 	ALIGN_DESTINATION
+
+	/* Set 4x8-byte copy count and remainder */
 	movl %edx,%ecx
 	andl $63,%edx
 	shrl $6,%ecx
-	jz 17f
+	jz .L_8b_nocache_copy_entry	/* jump if count is 0 */
+
+	/* Perform 4x8-byte nocache loop-copy */
+.L_4x8b_nocache_copy_loop:
 1:	movq (%rsi),%r8
 2:	movq 1*8(%rsi),%r9
 3:	movq 2*8(%rsi),%r10
@@ -262,60 +275,79 @@ ENTRY(__copy_user_nocache)
 	leaq 64(%rsi),%rsi
 	leaq 64(%rdi),%rdi
 	decl %ecx
-	jnz 1b
-17:	movl %edx,%ecx
+	jnz .L_4x8b_nocache_copy_loop
+
+	/* Set 8-byte copy count and remainder */
+.L_8b_nocache_copy_entry:
+	movl %edx,%ecx
 	andl $7,%edx
 	shrl $3,%ecx
-	jz 20f
-18:	movq (%rsi),%r8
-19:	movnti %r8,(%rdi)
+	jz .L_1b_cache_copy_entry	/* jump if count is 0 */
+
+	/* Perform 8-byte nocache loop-copy */
+.L_8b_nocache_copy_loop:
+20:	movq (%rsi),%r8
+21:	movnti %r8,(%rdi)
 	leaq 8(%rsi),%rsi
 	leaq 8(%rdi),%rdi
 	decl %ecx
-	jnz 18b
-20:	andl %edx,%edx
-	jz 23f
+	jnz .L_8b_nocache_copy_loop
+
+	/* If no byte left, we're done */
+.L_1b_cache_copy_entry:
+	andl %edx,%edx
+	jz .L_finish_copy
+
+	/* Perform byte "cache" loop-copy for the remainder */
 	movl %edx,%ecx
-21:	movb (%rsi),%al
-22:	movb %al,(%rdi)
+.L_1b_cache_copy_loop:
+40:	movb (%rsi),%al
+41:	movb %al,(%rdi)
 	incq %rsi
 	incq %rdi
 	decl %ecx
-	jnz 21b
-23:	xorl %eax,%eax
+	jnz .L_1b_cache_copy_loop
+
+	/* Finished copying; fence the prior stores */
+.L_finish_copy:
+	xorl %eax,%eax
 	ASM_CLAC
 	sfence
 	ret
 
 	.section .fixup,"ax"
-30:	shll $6,%ecx
+.L_fixup_4x8b_copy:
+	shll $6,%ecx
 	addl %ecx,%edx
-	jmp 60f
-40:	lea (%rdx,%rcx,8),%rdx
-	jmp 60f
-50:	movl %ecx,%edx
-60:	sfence
+	jmp .L_fixup_handle_tail
+.L_fixup_8b_copy:
+	lea (%rdx,%rcx,8),%rdx
+	jmp .L_fixup_handle_tail
+.L_fixup_1b_copy:
+	movl %ecx,%edx
+.L_fixup_handle_tail:
+	sfence
 	jmp copy_user_handle_tail
 	.previous
 
-	_ASM_EXTABLE(1b,30b)
-	_ASM_EXTABLE(2b,30b)
-	_ASM_EXTABLE(3b,30b)
-	_ASM_EXTABLE(4b,30b)
-	_ASM_EXTABLE(5b,30b)
-	_ASM_EXTABLE(6b,30b)
-	_ASM_EXTABLE(7b,30b)
-	_ASM_EXTABLE(8b,30b)
-	_ASM_EXTABLE(9b,30b)
-	_ASM_EXTABLE(10b,30b)
-	_ASM_EXTABLE(11b,30b)
-	_ASM_EXTABLE(12b,30b)
-	_ASM_EXTABLE(13b,30b)
-	_ASM_EXTABLE(14b,30b)
-	_ASM_EXTABLE(15b,30b)
-	_ASM_EXTABLE(16b,30b)
-	_ASM_EXTABLE(18b,40b)
-	_ASM_EXTABLE(19b,40b)
-	_ASM_EXTABLE(21b,50b)
-	_ASM_EXTABLE(22b,50b)
+	_ASM_EXTABLE(1b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(2b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(3b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(4b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(5b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(6b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(7b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(8b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(9b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(10b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(11b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(12b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(13b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(14b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(15b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(16b,.L_fixup_4x8b_copy)
+	_ASM_EXTABLE(20b,.L_fixup_8b_copy)
+	_ASM_EXTABLE(21b,.L_fixup_8b_copy)
+	_ASM_EXTABLE(40b,.L_fixup_1b_copy)
+	_ASM_EXTABLE(41b,.L_fixup_1b_copy)
 ENDPROC(__copy_user_nocache)


* [tip:x86/urgent] x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache()
From: tip-bot for Toshi Kani @ 2016-02-17 12:14 UTC
  To: linux-tip-commits
  Cc: dvlasenk, toshi.kani, luto, hpa, akpm, mingo, linux-kernel,
	stable, peterz, vishal.l.verma, bp, ross.zwisler, bp, toshi.kani,
	torvalds, dan.j.williams, brian.boylston, tglx, micah.parrish,
	mcgrof, brgerst

Commit-ID:  a82eee7424525e34e98d821dd059ce14560a1e35
Gitweb:     http://git.kernel.org/tip/a82eee7424525e34e98d821dd059ce14560a1e35
Author:     Toshi Kani <toshi.kani@hpe.com>
AuthorDate: Thu, 11 Feb 2016 14:24:17 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 17 Feb 2016 09:10:23 +0100

x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache()

Data corruption issues were observed in tests that initiated
a system crash/reset while accessing BTT devices.  The problem
is reproducible.

The BTT driver calls pmem_rw_bytes() to update data in pmem
devices.  This interface calls __copy_user_nocache(), which
uses non-temporal stores so that the stores to pmem are
persistent.

__copy_user_nocache() uses non-temporal stores only when a request
is 8 bytes or larger (and the destination is 8-byte aligned).  The
BTT driver updates the BTT map table, whose entry size is 4 bytes.
Updates to the map table entries therefore remain cached, and are
not written to pmem across a crash.

Change __copy_user_nocache() to use a non-temporal store when
the request size is 4 bytes.  The change extends the current
byte-copy path for requests smaller than 8 bytes, and does not
add any overhead to the regular path.

Reported-and-tested-by: Micah Parrish <micah.parrish@hpe.com>
Reported-and-tested-by: Brian Boylston <brian.boylston@hpe.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: <stable@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/1455225857-12039-3-git-send-email-toshi.kani@hpe.com
[ Small readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/lib/copy_user_64.S | 36 ++++++++++++++++++++++++++++++++----
 1 file changed, 32 insertions(+), 4 deletions(-)

diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index a644aad..27f89c7 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -237,13 +237,14 @@ ENDPROC(copy_user_enhanced_fast_string)
  * Note: Cached memory copy is used when destination or size is not
  * naturally aligned. That is:
  *  - Require 8-byte alignment when size is 8 bytes or larger.
+ *  - Require 4-byte alignment when size is 4 bytes.
  */
 ENTRY(__copy_user_nocache)
 	ASM_STAC
 
-	/* If size is less than 8 bytes, go to byte copy */
+	/* If size is less than 8 bytes, go to 4-byte copy */
 	cmpl $8,%edx
-	jb .L_1b_cache_copy_entry
+	jb .L_4b_nocache_copy_entry
 
 	/* If destination is not 8-byte aligned, "cache" copy to align it */
 	ALIGN_DESTINATION
@@ -282,7 +283,7 @@ ENTRY(__copy_user_nocache)
 	movl %edx,%ecx
 	andl $7,%edx
 	shrl $3,%ecx
-	jz .L_1b_cache_copy_entry	/* jump if count is 0 */
+	jz .L_4b_nocache_copy_entry	/* jump if count is 0 */
 
 	/* Perform 8-byte nocache loop-copy */
 .L_8b_nocache_copy_loop:
@@ -294,11 +295,33 @@ ENTRY(__copy_user_nocache)
 	jnz .L_8b_nocache_copy_loop
 
 	/* If no byte left, we're done */
-.L_1b_cache_copy_entry:
+.L_4b_nocache_copy_entry:
+	andl %edx,%edx
+	jz .L_finish_copy
+
+	/* If destination is not 4-byte aligned, go to byte copy: */
+	movl %edi,%ecx
+	andl $3,%ecx
+	jnz .L_1b_cache_copy_entry
+
+	/* Set 4-byte copy count (1 or 0) and remainder */
+	movl %edx,%ecx
+	andl $3,%edx
+	shrl $2,%ecx
+	jz .L_1b_cache_copy_entry	/* jump if count is 0 */
+
+	/* Perform 4-byte nocache copy: */
+30:	movl (%rsi),%r8d
+31:	movnti %r8d,(%rdi)
+	leaq 4(%rsi),%rsi
+	leaq 4(%rdi),%rdi
+
+	/* If no bytes left, we're done: */
 	andl %edx,%edx
 	jz .L_finish_copy
 
 	/* Perform byte "cache" loop-copy for the remainder */
+.L_1b_cache_copy_entry:
 	movl %edx,%ecx
 .L_1b_cache_copy_loop:
 40:	movb (%rsi),%al
@@ -323,6 +346,9 @@ ENTRY(__copy_user_nocache)
 .L_fixup_8b_copy:
 	lea (%rdx,%rcx,8),%rdx
 	jmp .L_fixup_handle_tail
+.L_fixup_4b_copy:
+	lea (%rdx,%rcx,4),%rdx
+	jmp .L_fixup_handle_tail
 .L_fixup_1b_copy:
 	movl %ecx,%edx
 .L_fixup_handle_tail:
@@ -348,6 +374,8 @@ ENTRY(__copy_user_nocache)
 	_ASM_EXTABLE(16b,.L_fixup_4x8b_copy)
 	_ASM_EXTABLE(20b,.L_fixup_8b_copy)
 	_ASM_EXTABLE(21b,.L_fixup_8b_copy)
+	_ASM_EXTABLE(30b,.L_fixup_4b_copy)
+	_ASM_EXTABLE(31b,.L_fixup_4b_copy)
 	_ASM_EXTABLE(40b,.L_fixup_1b_copy)
 	_ASM_EXTABLE(41b,.L_fixup_1b_copy)
 ENDPROC(__copy_user_nocache)


* Re: [PATCH v3 1/2] x86/lib/copy_user_64.S: cleanup __copy_user_nocache()
From: Toshi Kani @ 2016-02-17 15:52 UTC
  To: Ingo Molnar
  Cc: tglx, mingo, hpa, bp, dan.j.williams, ross.zwisler,
	vishal.l.verma, micah.parrish, brian.boylston, x86, linux-nvdimm,
	linux-kernel

On Wed, 2016-02-17 at 09:02 +0100, Ingo Molnar wrote:
> * Toshi Kani <toshi.kani@hpe.com> wrote:
> 
> > Add comments to __copy_user_nocache() to clarify its procedures
> > and alignment requirements.
> > 
> > Also change the numeric branch target labels to named labels.  The
> > labels begin with ".L" and the prefix "cun" (Copy User Nocache) to
> > keep them local and unique to the function.
> 
> So the .L labels are local, i.e. they are not emitted into the symbol
> table.
> 
> There is thus no need to make their names globally unique!

Right, but I thought there was a risk of the names conflicting with those
in other copy functions in the file once they also start using descriptive
labels.  For instance, ".L_finish_copy" could easily be picked by other
copy functions as well.
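
A tiny example of the collision (error text approximate):

  copy_a:
  	nop
  .L_finish_copy:	/* first definition: fine */
  	ret

  copy_b:
  	nop
  .L_finish_copy:	/* as: symbol `.L_finish_copy' is already defined */
  	ret

Both routines live in copy_user_64.S, so the second definition would
fail to assemble.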


> I've done a s/Lcun_/L_/ over the patch.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 7+ messages in thread
