* [PATCH 0/6] use memcpy_mcsafe() for copy_to_iter()
From: Dan Williams @ 2018-05-01 20:45 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: tony.luck, Peter Zijlstra, x86, linux-kernel, Andy Lutomirski,
	Ingo Molnar, Borislav Petkov, Al Viro, Thomas Gleixner,
	Linus Torvalds, Andrew Morton

Currently memcpy_mcsafe() is only deployed in the pmem driver when
reading through a /dev/pmemX block device. However, a filesystem
mounted in dax mode on a /dev/pmemX block device bypasses the block
layer, and therefore the driver, for reads: the filesystem-dax (fsdax)
read path goes through dax_direct_access() and copy_to_iter() instead.

The result of the bypass is that the kernel treats machine checks during
read as system fatal (reboot) when they could simply be flagged as an
I/O error, similar to performing reads through the pmem driver. Prevent
this fatal condition by deploying memcpy_mcsafe() in the fsdax read
path.

The main differences between the new copy_to_user_mcsafe() and
copy_user_generic_unrolled() are:

* Typical tail/residue handling after a fault retries the copy
  byte-by-byte until the fault happens again. Re-triggering machine
  checks is potentially fatal, so the implementation instead uses its
  source-alignment and poison-alignment assumptions to limit the
  residue copying to known-good bytes.

* SMAP coordination is handled external to the assembly with
  __uaccess_begin() and __uaccess_end().

* ITER_KVEC and ITER_BVEC can now end prematurely with an error.

The new MCSAFE_DEBUG facility is proposed as a way to unit test the
exception handling without requiring an ACPI EINJ-capable platform.

Thanks to Tony Luck for his review, testing, and implementation ideas
on initial versions of this patchset.
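
To make the intended consumption model concrete, here is a minimal
sketch of how a dax read loop uses the new helper (a simplified
rendering of the dax_iomap_actor() change in patch 5; the copy_out()
wrapper itself is hypothetical):

/*
 * Sketch: a short return from copy_to_iter_mcsafe() means a machine
 * check truncated the transfer. The caller keeps the bytes that did
 * land and converts "no progress" into an error instead of letting
 * the kernel panic.
 */
static ssize_t copy_out(void *kaddr, size_t map_len, struct iov_iter *iter)
{
	size_t xfer = copy_to_iter_mcsafe(kaddr, map_len, iter);

	if (xfer < map_len)
		return xfer ? xfer : -EFAULT;
	return xfer;
}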

---

Dan Williams (6):
      x86, memcpy_mcsafe: update labels in support of write fault handling
      x86, memcpy_mcsafe: return bytes remaining
      x86, memcpy_mcsafe: add write-protection-fault handling
      x86, memcpy_mcsafe: define copy_to_iter_mcsafe()
      dax: use copy_to_iter_mcsafe() in dax_iomap_actor()
      x86, nfit_test: unit test for memcpy_mcsafe()


 arch/x86/Kconfig.debug              |    3 +
 arch/x86/include/asm/mcsafe_debug.h |   50 ++++++++++
 arch/x86/include/asm/string_64.h    |    8 +-
 arch/x86/include/asm/uaccess_64.h   |   14 +++
 arch/x86/lib/memcpy_64.S            |  178 ++++++++++++++++++++++++++++-------
 arch/x86/lib/usercopy_64.c          |   12 ++
 drivers/nvdimm/claim.c              |    3 -
 drivers/nvdimm/pmem.c               |    6 +
 fs/dax.c                            |   20 ++--
 include/linux/string.h              |    4 -
 include/linux/uio.h                 |   10 ++
 lib/iov_iter.c                      |   59 ++++++++++++
 tools/testing/nvdimm/test/nfit.c    |   48 +++++++++
 13 files changed, 360 insertions(+), 55 deletions(-)
 create mode 100644 arch/x86/include/asm/mcsafe_debug.h

* [PATCH 1/6] x86, memcpy_mcsafe: update labels in support of write fault handling
From: Dan Williams @ 2018-05-01 20:45 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: tony.luck, Peter Zijlstra, x86, linux-kernel, Andy Lutomirski,
	Ingo Molnar, Borislav Petkov, Al Viro, Thomas Gleixner,
	Linus Torvalds, Andrew Morton

The memcpy_mcsafe() implementation handles CPU exceptions when reading
from the source address. Before it can be used for user copies, it
needs to grow support for handling write faults. In preparation for
adding that exception handling, update the labels for the read cache
word X case (.L_cache_rX) and the write cache word X case
(.L_cache_wX).

Cc: <x86@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/lib/memcpy_64.S |   71 ++++++++++++++++++++++++----------------------
 1 file changed, 37 insertions(+), 34 deletions(-)

diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S
index 9a53a06e5a3e..6a416a7df8ee 100644
--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -204,13 +204,14 @@ ENTRY(memcpy_mcsafe_unrolled)
 	subl $8, %ecx
 	negl %ecx
 	subl %ecx, %edx
-.L_copy_leading_bytes:
+.L_read_leading_bytes:
 	movb (%rsi), %al
+.L_write_leading_bytes:
 	movb %al, (%rdi)
 	incq %rsi
 	incq %rdi
 	decl %ecx
-	jnz .L_copy_leading_bytes
+	jnz .L_read_leading_bytes
 
 .L_8byte_aligned:
 	/* Figure out how many whole cache lines (64-bytes) to copy */
@@ -220,26 +221,26 @@ ENTRY(memcpy_mcsafe_unrolled)
 	jz .L_no_whole_cache_lines
 
 	/* Loop copying whole cache lines */
-.L_cache_w0: movq (%rsi), %r8
-.L_cache_w1: movq 1*8(%rsi), %r9
-.L_cache_w2: movq 2*8(%rsi), %r10
-.L_cache_w3: movq 3*8(%rsi), %r11
-	movq %r8, (%rdi)
-	movq %r9, 1*8(%rdi)
-	movq %r10, 2*8(%rdi)
-	movq %r11, 3*8(%rdi)
-.L_cache_w4: movq 4*8(%rsi), %r8
-.L_cache_w5: movq 5*8(%rsi), %r9
-.L_cache_w6: movq 6*8(%rsi), %r10
-.L_cache_w7: movq 7*8(%rsi), %r11
-	movq %r8, 4*8(%rdi)
-	movq %r9, 5*8(%rdi)
-	movq %r10, 6*8(%rdi)
-	movq %r11, 7*8(%rdi)
+.L_cache_r0: movq (%rsi), %r8
+.L_cache_r1: movq 1*8(%rsi), %r9
+.L_cache_r2: movq 2*8(%rsi), %r10
+.L_cache_r3: movq 3*8(%rsi), %r11
+.L_cache_w0: movq %r8, (%rdi)
+.L_cache_w1: movq %r9, 1*8(%rdi)
+.L_cache_w2: movq %r10, 2*8(%rdi)
+.L_cache_w3: movq %r11, 3*8(%rdi)
+.L_cache_r4: movq 4*8(%rsi), %r8
+.L_cache_r5: movq 5*8(%rsi), %r9
+.L_cache_r6: movq 6*8(%rsi), %r10
+.L_cache_r7: movq 7*8(%rsi), %r11
+.L_cache_w4: movq %r8, 4*8(%rdi)
+.L_cache_w5: movq %r9, 5*8(%rdi)
+.L_cache_w6: movq %r10, 6*8(%rdi)
+.L_cache_w7: movq %r11, 7*8(%rdi)
 	leaq 64(%rsi), %rsi
 	leaq 64(%rdi), %rdi
 	decl %ecx
-	jnz .L_cache_w0
+	jnz .L_cache_r0
 
 	/* Are there any trailing 8-byte words? */
 .L_no_whole_cache_lines:
@@ -249,13 +250,14 @@ ENTRY(memcpy_mcsafe_unrolled)
 	jz .L_no_whole_words
 
 	/* Copy trailing words */
-.L_copy_trailing_words:
+.L_read_trailing_words:
 	movq (%rsi), %r8
+.L_write_trailing_words:
 	mov %r8, (%rdi)
 	leaq 8(%rsi), %rsi
 	leaq 8(%rdi), %rdi
 	decl %ecx
-	jnz .L_copy_trailing_words
+	jnz .L_read_trailing_words
 
 	/* Any trailing bytes? */
 .L_no_whole_words:
@@ -264,13 +266,14 @@ ENTRY(memcpy_mcsafe_unrolled)
 
 	/* Copy trailing bytes */
 	movl %edx, %ecx
-.L_copy_trailing_bytes:
+.L_read_trailing_bytes:
 	movb (%rsi), %al
+.L_write_trailing_bytes:
 	movb %al, (%rdi)
 	incq %rsi
 	incq %rdi
 	decl %ecx
-	jnz .L_copy_trailing_bytes
+	jnz .L_read_trailing_bytes
 
 	/* Copy successful. Return zero */
 .L_done_memcpy_trap:
@@ -287,15 +290,15 @@ EXPORT_SYMBOL_GPL(memcpy_mcsafe_unrolled)
 
 	.previous
 
-	_ASM_EXTABLE_FAULT(.L_copy_leading_bytes, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_w0, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_w1, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_w2, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_w3, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_w4, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_w5, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_w6, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_w7, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_copy_trailing_words, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_copy_trailing_bytes, .L_memcpy_mcsafe_fail)
+	_ASM_EXTABLE_FAULT(.L_read_leading_bytes, .L_memcpy_mcsafe_fail)
+	_ASM_EXTABLE_FAULT(.L_cache_r0, .L_memcpy_mcsafe_fail)
+	_ASM_EXTABLE_FAULT(.L_cache_r1, .L_memcpy_mcsafe_fail)
+	_ASM_EXTABLE_FAULT(.L_cache_r2, .L_memcpy_mcsafe_fail)
+	_ASM_EXTABLE_FAULT(.L_cache_r3, .L_memcpy_mcsafe_fail)
+	_ASM_EXTABLE_FAULT(.L_cache_r4, .L_memcpy_mcsafe_fail)
+	_ASM_EXTABLE_FAULT(.L_cache_r5, .L_memcpy_mcsafe_fail)
+	_ASM_EXTABLE_FAULT(.L_cache_r6, .L_memcpy_mcsafe_fail)
+	_ASM_EXTABLE_FAULT(.L_cache_r7, .L_memcpy_mcsafe_fail)
+	_ASM_EXTABLE_FAULT(.L_read_trailing_words, .L_memcpy_mcsafe_fail)
+	_ASM_EXTABLE_FAULT(.L_read_trailing_bytes, .L_memcpy_mcsafe_fail)
 #endif


* [PATCH 2/6] x86, memcpy_mcsafe: return bytes remaining
From: Dan Williams @ 2018-05-01 20:45 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: tony.luck, Peter Zijlstra, x86, linux-kernel, Andy Lutomirski,
	Ingo Molnar, Borislav Petkov, Al Viro, Thomas Gleixner,
	Linus Torvalds, Andrew Morton

Machine check safe memory copies are currently deployed in the pmem
driver whenever reading from persistent memory media, so that -EIO is
returned rather than triggering a kernel panic. While this protects
most pmem accesses, it is not complete in the filesystem-dax case.
When filesystem-dax is enabled, reads may bypass the block layer and
the driver via dax_iomap_actor() and its usage of copy_to_iter().

In preparation for creating a copy_to_iter() variant that can handle
machine checks, teach memcpy_mcsafe() to return the number of bytes
remaining rather than -EFAULT when an exception occurs.

Given that the source buffer is aligned to 8 bytes and that x86
reports poison in terms of cachelines, we can assume that all read
faults occur at cacheline boundaries. When an exception occurs we have
succeeded in reading some data before the poisoned cacheline.
mcsafe_handle_tail() is introduced as a common helper to complete the
copy operation on the good data, while being careful to limit accesses
to the known-good cachelines and so reduce the chance of additional
machine check exceptions.
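
For callers the change reduces to checking for a non-zero remainder.
A minimal sketch (the copy_from_pmem() wrapper is hypothetical; the
real conversions are in the claim.c and pmem.c hunks below):

/*
 * Sketch: memcpy_mcsafe() now returns 0 on success, or the number of
 * bytes not copied when a machine check fires. The leading
 * (len - rem) bytes of @dst are valid either way.
 */
static int copy_from_pmem(void *dst, const void *src, size_t len)
{
	unsigned long rem = memcpy_mcsafe(dst, src, len);

	return rem ? -EIO : 0;
}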

Cc: <x86@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/include/asm/string_64.h  |    8 ++-
 arch/x86/include/asm/uaccess_64.h |    3 +
 arch/x86/lib/memcpy_64.S          |   85 +++++++++++++++++++++++++++++++------
 arch/x86/lib/usercopy_64.c        |   12 +++++
 drivers/nvdimm/claim.c            |    3 +
 drivers/nvdimm/pmem.c             |    6 +--
 include/linux/string.h            |    4 +-
 7 files changed, 98 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 533f74c300c2..92ee5e187113 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -116,7 +116,8 @@ int strcmp(const char *cs, const char *ct);
 #endif
 
 #define __HAVE_ARCH_MEMCPY_MCSAFE 1
-__must_check int memcpy_mcsafe_unrolled(void *dst, const void *src, size_t cnt);
+__must_check unsigned long memcpy_mcsafe_unrolled(void *dst, const void *src,
+		size_t cnt);
 DECLARE_STATIC_KEY_FALSE(mcsafe_key);
 
 /**
@@ -131,9 +132,10 @@ DECLARE_STATIC_KEY_FALSE(mcsafe_key);
  * actually do machine check recovery. Everyone else can just
  * use memcpy().
  *
- * Return 0 for success, -EFAULT for fail
+ * Return 0 for success, or number of bytes not copied if there was an
+ * exception.
  */
-static __always_inline __must_check int
+static __always_inline __must_check unsigned long
 memcpy_mcsafe(void *dst, const void *src, size_t cnt)
 {
 #ifdef CONFIG_X86_MCE
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index 62546b3a398e..c064a77e8fcb 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -194,4 +194,7 @@ __copy_from_user_flushcache(void *dst, const void __user *src, unsigned size)
 unsigned long
 copy_user_handle_tail(char *to, char *from, unsigned len);
 
+unsigned long
+mcsafe_handle_tail(char *to, char *from, unsigned len, unsigned limit);
+
 #endif /* _ASM_X86_UACCESS_64_H */
diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S
index 6a416a7df8ee..97b772fcf62f 100644
--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -283,22 +283,79 @@ ENDPROC(memcpy_mcsafe_unrolled)
 EXPORT_SYMBOL_GPL(memcpy_mcsafe_unrolled)
 
 	.section .fixup, "ax"
-	/* Return -EFAULT for any failure */
-.L_memcpy_mcsafe_fail:
-	mov	$-EFAULT, %rax
+	/* Return number of bytes not copied for any failure */
+
+	/*
+	 * For .E_cache_{1,2,3} we have successfully read {8,16,24}
+	 * bytes before crossing into the poison cacheline. Arrange for
+	 * mcsafe_handle_tail to write those {8,16,24} bytes to the
+	 * destination without re-triggering the machine check. %ecx
+	 * contains the limit and %edx contains total bytes remaining.
+	 */
+.E_cache_1:
+	shll	$6, %ecx
+	addl	%ecx, %edx
+	movl	$8, %ecx
+	jmp mcsafe_handle_tail
+.E_cache_2:
+	shll	$6, %ecx
+	addl	%ecx, %edx
+	movl	$16, %ecx
+	jmp mcsafe_handle_tail
+.E_cache_3:
+	shll	$6, %ecx
+	addl	%ecx, %edx
+	movl	$24, %ecx
+	jmp mcsafe_handle_tail
+	/*
+	 * In contrast to .E_cache_{1,2,3}, .E_cache_{5,6,7} have
+	 * successfully copied 32-bytes before crossing into the
+	 * poisoned cacheline.
+	 */
+.E_cache_5:
+	shll	$6, %ecx
+	addl	%ecx, %edx
+	movl	$8, %ecx
+	jmp .E_cache_upper
+.E_cache_6:
+	shll	$6, %ecx
+	addl	%ecx, %edx
+	movl	$16, %ecx
+	jmp .E_cache_upper
+.E_cache_7:
+	shll	$6, %ecx
+	addl	%ecx, %edx
+	movl	$24, %ecx
+	jmp .E_cache_upper
+.E_cache_upper:
+	addq	$32, %rsi
+	addq	$32, %rdi
+	subl	$32, %edx
+	jmp mcsafe_handle_tail
+.E_trailing_words:
+	shll	$3, %ecx
+	jmp .E_leading_bytes
+.E_cache_4:
+	subl	$32, %edx
+.E_cache_0:
+	shll	$6, %ecx
+.E_leading_bytes:
+	addl	%edx, %ecx
+.E_trailing_bytes:
+	mov	%ecx, %eax
 	ret
 
 	.previous
 
-	_ASM_EXTABLE_FAULT(.L_read_leading_bytes, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_r0, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_r1, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_r2, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_r3, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_r4, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_r5, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_r6, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_r7, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_read_trailing_words, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_read_trailing_bytes, .L_memcpy_mcsafe_fail)
+	_ASM_EXTABLE_FAULT(.L_read_leading_bytes, .E_leading_bytes)
+	_ASM_EXTABLE_FAULT(.L_cache_r0, .E_cache_0)
+	_ASM_EXTABLE_FAULT(.L_cache_r1, .E_cache_1)
+	_ASM_EXTABLE_FAULT(.L_cache_r2, .E_cache_2)
+	_ASM_EXTABLE_FAULT(.L_cache_r3, .E_cache_3)
+	_ASM_EXTABLE_FAULT(.L_cache_r4, .E_cache_4)
+	_ASM_EXTABLE_FAULT(.L_cache_r5, .E_cache_5)
+	_ASM_EXTABLE_FAULT(.L_cache_r6, .E_cache_6)
+	_ASM_EXTABLE_FAULT(.L_cache_r7, .E_cache_7)
+	_ASM_EXTABLE_FAULT(.L_read_trailing_words, .E_trailing_words)
+	_ASM_EXTABLE_FAULT(.L_read_trailing_bytes, .E_trailing_bytes)
 #endif
diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
index 75d3776123cc..e2bcc7d85436 100644
--- a/arch/x86/lib/usercopy_64.c
+++ b/arch/x86/lib/usercopy_64.c
@@ -75,6 +75,18 @@ copy_user_handle_tail(char *to, char *from, unsigned len)
 	return len;
 }
 
+__visible unsigned long
+mcsafe_handle_tail(char *to, char *from, unsigned len, unsigned limit)
+{
+	for (; len && limit; --len, --limit, to++) {
+		unsigned long rem = memcpy_mcsafe_unrolled(to, from, 1);
+
+		if (rem)
+			break;
+	}
+	return len;
+}
+
 #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
 /**
  * clean_cache_range - write back a cache range with CLWB
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index 30852270484f..2e96b34bc936 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -276,7 +276,8 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
 	if (rw == READ) {
 		if (unlikely(is_bad_pmem(&nsio->bb, sector, sz_align)))
 			return -EIO;
-		return memcpy_mcsafe(buf, nsio->addr + offset, size);
+		if (memcpy_mcsafe(buf, nsio->addr + offset, size) != 0)
+			return -EIO;
 	}
 
 	if (unlikely(is_bad_pmem(&nsio->bb, sector, sz_align))) {
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 9d714926ecf5..e023d6aa22b5 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -101,15 +101,15 @@ static blk_status_t read_pmem(struct page *page, unsigned int off,
 		void *pmem_addr, unsigned int len)
 {
 	unsigned int chunk;
-	int rc;
+	unsigned long rem;
 	void *mem;
 
 	while (len) {
 		mem = kmap_atomic(page);
 		chunk = min_t(unsigned int, len, PAGE_SIZE);
-		rc = memcpy_mcsafe(mem + off, pmem_addr, chunk);
+		rem = memcpy_mcsafe(mem + off, pmem_addr, chunk);
 		kunmap_atomic(mem);
-		if (rc)
+		if (rem)
 			return BLK_STS_IOERR;
 		len -= chunk;
 		off = 0;
diff --git a/include/linux/string.h b/include/linux/string.h
index dd39a690c841..4a5a0eb7df51 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -147,8 +147,8 @@ extern int memcmp(const void *,const void *,__kernel_size_t);
 extern void * memchr(const void *,int,__kernel_size_t);
 #endif
 #ifndef __HAVE_ARCH_MEMCPY_MCSAFE
-static inline __must_check int memcpy_mcsafe(void *dst, const void *src,
-		size_t cnt)
+static inline __must_check unsigned long memcpy_mcsafe(void *dst,
+		const void *src, size_t cnt)
 {
 	memcpy(dst, src, cnt);
 	return 0;


* [PATCH 3/6] x86, memcpy_mcsafe: add write-protection-fault handling
From: Dan Williams @ 2018-05-01 20:45 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: tony.luck, Peter Zijlstra, x86, linux-kernel, Andy Lutomirski,
	Ingo Molnar, Borislav Petkov, Al Viro, Thomas Gleixner,
	Linus Torvalds, Andrew Morton

In preparation for using memcpy_mcsafe() to handle user copies, it
needs to be able to handle write-protection faults while writing to
user pages. Add MMU-fault handlers alongside the machine-check
exception handlers.

Note that the machine check fault exception handling makes assumptions
about source buffer alignment and poison alignment. In the write fault
case, given the destination buffer is arbitrarily aligned, it needs a
separate / additional fault handling approach. The mcsafe_handle_tail()
helper is reused. The @limit argument is set to @len since there is no
safety concern about retriggering an MMU fault, and this simplifies the
assembly.
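
Conceptually the write-fault fixup amounts to the following C
rendering (a sketch of the assembly's intent, not literal kernel
code):

/*
 * Sketch: an MMU write fault has no poison to avoid, so the
 * byte-by-byte retry limit equals the full remaining length. The
 * machine-check read fixups instead cap @limit at the known-good
 * bytes ahead of the poisoned cacheline.
 */
static unsigned long fixup_write_fault(char *to, char *from, unsigned len)
{
	return mcsafe_handle_tail(to, from, len, len);
}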

Cc: <x86@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/lib/memcpy_64.S |   21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S
index 97b772fcf62f..fc9c1f594c71 100644
--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -345,6 +345,16 @@ EXPORT_SYMBOL_GPL(memcpy_mcsafe_unrolled)
 	mov	%ecx, %eax
 	ret
 
+.E_write_cache_X:
+	shll	$6, %ecx
+	jmp	.E_handle_tail
+.E_write_trailing_words:
+	shll	$3, %ecx
+.E_handle_tail:
+	addl	%edx, %ecx
+	movl	%ecx, %edx
+	jmp mcsafe_handle_tail
+
 	.previous
 
 	_ASM_EXTABLE_FAULT(.L_read_leading_bytes, .E_leading_bytes)
@@ -358,4 +368,15 @@ EXPORT_SYMBOL_GPL(memcpy_mcsafe_unrolled)
 	_ASM_EXTABLE_FAULT(.L_cache_r7, .E_cache_7)
 	_ASM_EXTABLE_FAULT(.L_read_trailing_words, .E_trailing_words)
 	_ASM_EXTABLE_FAULT(.L_read_trailing_bytes, .E_trailing_bytes)
+	_ASM_EXTABLE(.L_write_leading_bytes, .E_leading_bytes)
+	_ASM_EXTABLE(.L_cache_w0, .E_write_cache_X)
+	_ASM_EXTABLE(.L_cache_w1, .E_write_cache_X)
+	_ASM_EXTABLE(.L_cache_w2, .E_write_cache_X)
+	_ASM_EXTABLE(.L_cache_w3, .E_write_cache_X)
+	_ASM_EXTABLE(.L_cache_w4, .E_write_cache_X)
+	_ASM_EXTABLE(.L_cache_w5, .E_write_cache_X)
+	_ASM_EXTABLE(.L_cache_w6, .E_write_cache_X)
+	_ASM_EXTABLE(.L_cache_w7, .E_write_cache_X)
+	_ASM_EXTABLE(.L_write_trailing_words, .E_write_trailing_words)
+	_ASM_EXTABLE(.L_write_trailing_bytes, .E_trailing_bytes)
 #endif


* [PATCH 4/6] x86, memcpy_mcsafe: define copy_to_iter_mcsafe()
From: Dan Williams @ 2018-05-01 20:45 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: tony.luck, Peter Zijlstra, x86, linux-kernel, Andy Lutomirski,
	Ingo Molnar, Borislav Petkov, Al Viro, Thomas Gleixner,
	Linus Torvalds, Andrew Morton

Use the updated memcpy_mcsafe() implementation to define
copy_to_user_mcsafe() and copy_to_iter_mcsafe(). The most significant
difference from typical copy_to_iter() is that the ITER_KVEC and
ITER_BVEC iterator types can fail to complete a full transfer.
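
A usage sketch for the new helper (hypothetical caller, assuming a
populated iov_iter in kernel context):

/*
 * Sketch: unlike copy_to_iter(), a short return here can mean poison
 * was consumed mid-transfer, not just a write fault on an iovec
 * destination.
 */
static ssize_t copy_chunk(const void *src, size_t nbytes, struct iov_iter *iter)
{
	size_t copied = copy_to_iter_mcsafe(src, nbytes, iter);

	return copied ? copied : -EIO;
}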

Cc: <x86@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/include/asm/uaccess_64.h |   11 +++++++
 include/linux/uio.h               |   10 ++++++
 lib/iov_iter.c                    |   59 +++++++++++++++++++++++++++++++++++++
 3 files changed, 80 insertions(+)

diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index c064a77e8fcb..e0e2cbdf3e2b 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -47,6 +47,17 @@ copy_user_generic(void *to, const void *from, unsigned len)
 }
 
 static __always_inline __must_check unsigned long
+copy_to_user_mcsafe(void *to, const void *from, unsigned len)
+{
+	unsigned long ret;
+
+	__uaccess_begin();
+	ret = memcpy_mcsafe(to, from, len);
+	__uaccess_end();
+	return ret;
+}
+
+static __always_inline __must_check unsigned long
 raw_copy_from_user(void *dst, const void __user *src, unsigned long size)
 {
 	int ret = 0;
diff --git a/include/linux/uio.h b/include/linux/uio.h
index e67e12adb136..0f9923321983 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -92,6 +92,7 @@ size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
 			 struct iov_iter *i);
 
 size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
+size_t _copy_to_iter_mcsafe(const void *addr, size_t bytes, struct iov_iter *i);
 size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i);
 bool _copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i);
 size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i);
@@ -107,6 +108,15 @@ size_t copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 }
 
 static __always_inline __must_check
+size_t copy_to_iter_mcsafe(const void *addr, size_t bytes, struct iov_iter *i)
+{
+	if (unlikely(!check_copy_size(addr, bytes, true)))
+		return 0;
+	else
+		return _copy_to_iter_mcsafe(addr, bytes, i);
+}
+
+static __always_inline __must_check
 size_t copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
 {
 	if (unlikely(!check_copy_size(addr, bytes, false)))
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 970212670b6a..e1a52c49e79c 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -139,6 +139,15 @@ static int copyout(void __user *to, const void *from, size_t n)
 	return n;
 }
 
+static int copyout_mcsafe(void __user *to, const void *from, size_t n)
+{
+	if (access_ok(VERIFY_WRITE, to, n)) {
+		kasan_check_read(from, n);
+		n = copy_to_user_mcsafe((__force void *) to, from, n);
+	}
+	return n;
+}
+
 static int copyin(void *to, const void __user *from, size_t n)
 {
 	if (access_ok(VERIFY_READ, from, n)) {
@@ -461,6 +470,19 @@ static void memcpy_to_page(struct page *page, size_t offset, const char *from, s
 	kunmap_atomic(to);
 }
 
+static unsigned long memcpy_mcsafe_to_page(struct page *page, size_t offset,
+		const char *from, size_t len)
+{
+	unsigned long ret;
+	char *to;
+
+	to = kmap_atomic(page);
+	ret = memcpy_mcsafe(to + offset, from, len);
+	kunmap_atomic(to);
+
+	return ret;
+}
+
 static void memzero_page(struct page *page, size_t offset, size_t len)
 {
 	char *addr = kmap_atomic(page);
@@ -573,6 +595,43 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 }
 EXPORT_SYMBOL(_copy_to_iter);
 
+size_t _copy_to_iter_mcsafe(const void *addr, size_t bytes, struct iov_iter *i)
+{
+	const char *from = addr;
+	unsigned long rem, curr_addr, s_addr = (unsigned long) addr;
+
+	if (unlikely(i->type & ITER_PIPE)) {
+		WARN_ON(1);
+		return 0;
+	}
+	if (iter_is_iovec(i))
+		might_fault();
+	iterate_and_advance(i, bytes, v,
+		copyout_mcsafe(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len),
+		({
+		rem = memcpy_mcsafe_to_page(v.bv_page, v.bv_offset,
+                               (from += v.bv_len) - v.bv_len, v.bv_len);
+		if (rem) {
+			curr_addr = (unsigned long) from;
+			bytes = curr_addr - s_addr - rem;
+			return bytes;
+		}
+		}),
+		({
+		rem = memcpy_mcsafe(v.iov_base, (from += v.iov_len) - v.iov_len,
+				v.iov_len);
+		if (rem) {
+			curr_addr = (unsigned long) from;
+			bytes = curr_addr - s_addr - rem;
+			return bytes;
+		}
+		})
+	)
+
+	return bytes;
+}
+EXPORT_SYMBOL(_copy_to_iter_mcsafe);
+
 size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
 {
 	char *to = addr;


* [PATCH 5/6] dax: use copy_to_iter_mcsafe() in dax_iomap_actor()
From: Dan Williams @ 2018-05-01 20:45 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: tony.luck, Peter Zijlstra, x86, linux-kernel, Andy Lutomirski,
	Ingo Molnar, Borislav Petkov, Al Viro, Thomas Gleixner,
	Linus Torvalds, Andrew Morton

Protect the dax read(2) path from media errors with
copy_to_iter_mcsafe(). If a machine check truncates a transfer, we
abort the remainder of the transfer and communicate the number of
bytes successfully completed.

Cc: <x86@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/dax.c |   20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index aaec72ded1b6..e7894ab791cb 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -991,6 +991,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	struct iov_iter *iter = data;
 	loff_t end = pos + length, done = 0;
 	ssize_t ret = 0;
+	size_t xfer;
 	int id;
 
 	if (iov_iter_rw(iter) == READ) {
@@ -1054,18 +1055,19 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		 * vfs_write(), depending on which operation we are doing.
 		 */
 		if (iov_iter_rw(iter) == WRITE)
-			map_len = dax_copy_from_iter(dax_dev, pgoff, kaddr,
+			xfer = dax_copy_from_iter(dax_dev, pgoff, kaddr,
 					map_len, iter);
 		else
-			map_len = copy_to_iter(kaddr, map_len, iter);
-		if (map_len <= 0) {
-			ret = map_len ? map_len : -EFAULT;
-			break;
-		}
+			xfer = copy_to_iter_mcsafe(kaddr, map_len, iter);
 
-		pos += map_len;
-		length -= map_len;
-		done += map_len;
+		pos += xfer;
+		length -= xfer;
+		done += xfer;
+
+		if (xfer == 0)
+			ret = -EFAULT;
+		if (xfer < map_len)
+			break;
 	}
 	dax_read_unlock(id);
 


* [PATCH 6/6] x86, nfit_test: unit test for memcpy_mcsafe()
  2018-05-01 20:45 ` Dan Williams
@ 2018-05-01 20:45   ` Dan Williams
  0 siblings, 0 replies; 56+ messages in thread
From: Dan Williams @ 2018-05-01 20:45 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: tony.luck, Peter Zijlstra, x86, linux-kernel, Andy Lutomirski,
	Ingo Molnar, Borislav Petkov, Al Viro, Thomas Gleixner,
	Linus Torvalds, Andrew Morton

Given that the ACPI "EINJ" (error injection) facility is not
universally available, implement software infrastructure to validate
the memcpy_mcsafe() exception handling implementation.

For each potential read exception point in memcpy_mcsafe(), inject an
emulated exception at the address identified by the 'mcsafe_inject'
variable. With this infrastructure, implement a test to validate that
the 'bytes remaining' calculation is correct across a range of source
buffer alignments.

This code is compiled out by default. The CONFIG_MCSAFE_DEBUG
configuration symbol needs to be manually enabled by editing
Kconfig.debug, i.e. this functionality cannot be accidentally enabled
by a user or distro; it is only for development.
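
As an illustration (a C restatement of the MCSAFE_DEBUG assembly macro
added below, not code from this patch), a read of 'count' bytes at
'addr' trips the emulated exception once it would reach the injected
poison address:

	/* sketch: mirrors the leaq/addq/cmp/jg sequence in the macro */
	static bool mcsafe_would_fault(unsigned long addr, unsigned long count)
	{
		return addr + count > mcsafe_inject;
	}

In the test below, the source starts at offset 1024 - i with the poison
injected at offset 1024, so i bytes are readable and 512 - i bytes are
expected to remain.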

Cc: <x86@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/Kconfig.debug              |    3 ++
 arch/x86/include/asm/mcsafe_debug.h |   50 +++++++++++++++++++++++++++++++++++
 arch/x86/lib/memcpy_64.S            |   39 ++++++++++++++++++++++-----
 tools/testing/nvdimm/test/nfit.c    |   48 ++++++++++++++++++++++++++++++++++
 4 files changed, 132 insertions(+), 8 deletions(-)
 create mode 100644 arch/x86/include/asm/mcsafe_debug.h

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 192e4d2f9efc..8bdec78a405f 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -72,6 +72,9 @@ config EARLY_PRINTK_USB_XDBC
 	  You should normally say N here, unless you want to debug early
 	  crashes or need a very simple printk logging facility.
 
+config MCSAFE_DEBUG
+	def_bool n
+
 config X86_PTDUMP_CORE
 	def_bool n
 
diff --git a/arch/x86/include/asm/mcsafe_debug.h b/arch/x86/include/asm/mcsafe_debug.h
new file mode 100644
index 000000000000..0f85d24b46c5
--- /dev/null
+++ b/arch/x86/include/asm/mcsafe_debug.h
@@ -0,0 +1,50 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _MCSAFE_DEBUG_H_
+#define _MCSAFE_DEBUG_H_
+
+#ifndef __ASSEMBLY__
+#ifdef CONFIG_MCSAFE_DEBUG
+extern unsigned long mcsafe_inject;
+
+static inline void set_mcsafe_inject(void *addr)
+{
+	if (addr)
+		mcsafe_inject = (unsigned long) addr;
+	else
+		mcsafe_inject = ~0UL;
+}
+#else /* CONFIG_MCSAFE_DEBUG */
+static inline void set_mcsafe_inject(void *addr)
+{
+}
+#endif /* CONFIG_MCSAFE_DEBUG */
+
+#else /* __ASSEMBLY__ */
+#include <asm/export.h>
+
+#ifdef CONFIG_MCSAFE_DEBUG
+.macro MCSAFE_DEBUG_CTL
+	.pushsection .data
+	.align 8
+	.globl mcsafe_inject
+	mcsafe_inject:
+		.quad 0
+	EXPORT_SYMBOL_GPL(mcsafe_inject)
+	.popsection
+.endm
+
+.macro MCSAFE_DEBUG offset reg count target
+	leaq \offset(\reg), %r9
+	addq \count, %r9
+	cmp mcsafe_inject, %r9
+	jg \target
+.endm
+#else
+.macro MCSAFE_DEBUG_CTL
+.endm
+
+.macro MCSAFE_DEBUG offset reg count target
+.endm
+#endif /* CONFIG_MCSAFE_DEBUG */
+#endif /* __ASSEMBLY__ */
+#endif /* _MCSAFE_DEBUG_H_ */
diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S
index fc9c1f594c71..e47e8efe3e29 100644
--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -3,6 +3,7 @@
 #include <linux/linkage.h>
 #include <asm/errno.h>
 #include <asm/cpufeatures.h>
+#include <asm/mcsafe_debug.h>
 #include <asm/alternative-asm.h>
 #include <asm/export.h>
 
@@ -183,6 +184,9 @@ ENTRY(memcpy_orig)
 ENDPROC(memcpy_orig)
 
 #ifndef CONFIG_UML
+
+MCSAFE_DEBUG_CTL
+
 /*
  * memcpy_mcsafe_unrolled - memory copy with machine check exception handling
  * Note that we only catch machine checks when reading the source addresses.
@@ -205,6 +209,7 @@ ENTRY(memcpy_mcsafe_unrolled)
 	negl %ecx
 	subl %ecx, %edx
 .L_read_leading_bytes:
+	MCSAFE_DEBUG 0 %rsi $1 .E_leading_bytes
 	movb (%rsi), %al
 .L_write_leading_bytes:
 	movb %al, (%rdi)
@@ -221,18 +226,34 @@ ENTRY(memcpy_mcsafe_unrolled)
 	jz .L_no_whole_cache_lines
 
 	/* Loop copying whole cache lines */
-.L_cache_r0: movq (%rsi), %r8
-.L_cache_r1: movq 1*8(%rsi), %r9
-.L_cache_r2: movq 2*8(%rsi), %r10
-.L_cache_r3: movq 3*8(%rsi), %r11
+.L_cache_r0:
+	MCSAFE_DEBUG 0 %rsi $8 .E_cache_0
+	movq (%rsi), %r8
+.L_cache_r1:
+	MCSAFE_DEBUG 1*8 %rsi $8 .E_cache_1
+	movq 1*8(%rsi), %r9
+.L_cache_r2:
+	MCSAFE_DEBUG 2*8 %rsi $8 .E_cache_2
+	movq 2*8(%rsi), %r10
+.L_cache_r3:
+	MCSAFE_DEBUG 3*8 %rsi $8 .E_cache_3
+	movq 3*8(%rsi), %r11
 .L_cache_w0: movq %r8, (%rdi)
 .L_cache_w1: movq %r9, 1*8(%rdi)
 .L_cache_w2: movq %r10, 2*8(%rdi)
 .L_cache_w3: movq %r11, 3*8(%rdi)
-.L_cache_r4: movq 4*8(%rsi), %r8
-.L_cache_r5: movq 5*8(%rsi), %r9
-.L_cache_r6: movq 6*8(%rsi), %r10
-.L_cache_r7: movq 7*8(%rsi), %r11
+.L_cache_r4:
+	MCSAFE_DEBUG 4*8 %rsi $8 .E_cache_4
+	movq 4*8(%rsi), %r8
+.L_cache_r5:
+	MCSAFE_DEBUG 5*8 %rsi $8 .E_cache_5
+	movq 5*8(%rsi), %r9
+.L_cache_r6:
+	MCSAFE_DEBUG 6*8 %rsi $8 .E_cache_6
+	movq 6*8(%rsi), %r10
+.L_cache_r7:
+	MCSAFE_DEBUG 7*8 %rsi $8 .E_cache_7
+	movq 7*8(%rsi), %r11
 .L_cache_w4: movq %r8, 4*8(%rdi)
 .L_cache_w5: movq %r9, 5*8(%rdi)
 .L_cache_w6: movq %r10, 6*8(%rdi)
@@ -251,6 +272,7 @@ ENTRY(memcpy_mcsafe_unrolled)
 
 	/* Copy trailing words */
 .L_read_trailing_words:
+	MCSAFE_DEBUG 0 %rsi $8 .E_trailing_words
 	movq (%rsi), %r8
 .L_write_trailing_words:
 	mov %r8, (%rdi)
@@ -267,6 +289,7 @@ ENTRY(memcpy_mcsafe_unrolled)
 	/* Copy trailing bytes */
 	movl %edx, %ecx
 .L_read_trailing_bytes:
+	MCSAFE_DEBUG 0 %rsi $1 .E_trailing_bytes
 	movb (%rsi), %al
 .L_write_trailing_bytes:
 	movb %al, (%rdi)
diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c
index 4ea385be528f..dc039e91711e 100644
--- a/tools/testing/nvdimm/test/nfit.c
+++ b/tools/testing/nvdimm/test/nfit.c
@@ -29,6 +29,8 @@
 #include "nfit_test.h"
 #include "../watermark.h"
 
+#include <asm/mcsafe_debug.h>
+
 /*
  * Generate an NFIT table to describe the following topology:
  *
@@ -2681,6 +2683,51 @@ static struct platform_driver nfit_test_driver = {
 	.id_table = nfit_test_id,
 };
 
+static char mcsafe_buf[PAGE_SIZE] __attribute__((__aligned__(PAGE_SIZE)));
+
+void mcsafe_test(void)
+{
+	bool do_inject = false;
+	int i;
+
+	if (IS_ENABLED(CONFIG_MCSAFE_DEBUG)) {
+		pr_info("%s: run...\n", __func__);
+	} else {
+		pr_info("%s: disabled, skip.\n", __func__);
+		return;
+	}
+
+retry:
+	for (i = 0; i < 512; i++) {
+		unsigned long expect, rem;
+		void *src, *dst;
+
+		if (do_inject) {
+			set_mcsafe_inject(&mcsafe_buf[1024]);
+			expect = 512 - i;
+		} else {
+			set_mcsafe_inject(NULL);
+			expect = 0;
+		}
+
+		dst = &mcsafe_buf[2048];
+		src = &mcsafe_buf[1024 - i];
+		rem = memcpy_mcsafe_unrolled(dst, src, 512);
+		if (rem == expect)
+			continue;
+		pr_info("%s: copy(%#lx, %#lx, %d) offset: %d got: %ld expect: %ld\n",
+				__func__, ((unsigned long) dst) & ~PAGE_MASK,
+				((unsigned long) src) & ~PAGE_MASK,
+				512, i, rem, expect);
+	}
+
+	if (!do_inject) {
+		do_inject = true;
+		goto retry;
+	}
+	set_mcsafe_inject(NULL);
+}
+
 static __init int nfit_test_init(void)
 {
 	int rc, i;
@@ -2689,6 +2736,7 @@ static __init int nfit_test_init(void)
 	libnvdimm_test();
 	acpi_nfit_test();
 	device_dax_test();
+	mcsafe_test();
 
 	nfit_test_setup(nfit_test_lookup, nfit_test_evaluate_dsm);
 

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/6] use memcpy_mcsafe() for copy_to_iter()
  2018-05-01 20:45 ` Dan Williams
@ 2018-05-01 21:05   ` Linus Torvalds
  0 siblings, 0 replies; 56+ messages in thread
From: Linus Torvalds @ 2018-05-01 21:05 UTC (permalink / raw)
  To: Dan Williams
  Cc: Tony Luck, linux-nvdimm, Peter Zijlstra,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Ingo Molnar, Borislav Petkov, Al Viro,
	Thomas Gleixner, Andrew Morton

On Tue, May 1, 2018 at 1:55 PM Dan Williams <dan.j.williams@intel.com>
wrote:

> The result of the bypass is that the kernel treats machine checks during
> read as system fatal (reboot) when they could simply be flagged as an
> I/O error, similar to performing reads through the pmem driver. Prevent
> this fatal condition by deploying memcpy_mcsafe() in the fsdax read
> path.

How about just changing the rules, and go the old "Don't do that then" way?

IOW, get rid of the whole idea that MCS errors should be fatal. It's wrong
and pointless anyway.

The whole approach seems fundamentally buggered, if you ever want to mmap
one of these things. And don't you want that?

So why continue down a fundamentally broken path?

                  Linus

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/6] x86, memcpy_mcsafe: define copy_to_iter_mcsafe()
  2018-05-01 20:45   ` Dan Williams
@ 2018-05-01 22:17     ` kbuild test robot
  0 siblings, 0 replies; 56+ messages in thread
From: kbuild test robot @ 2018-05-01 22:17 UTC (permalink / raw)
  To: Dan Williams
  Cc: Tony Luck, linux-nvdimm, Peter Zijlstra, x86, linux-kernel,
	Andy Lutomirski, Ingo Molnar, Borislav Petkov, kbuild-all,
	Thomas Gleixner, Linus Torvalds, Andrew Morton, Al Viro

Hi Dan,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.17-rc3 next-20180501]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Dan-Williams/use-memcpy_mcsafe-for-copy_to_iter/20180502-045742
config: i386-randconfig-s1-201817 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   lib/iov_iter.c: In function 'copyout_mcsafe':
>> lib/iov_iter.c:146:7: error: implicit declaration of function 'copy_to_user_mcsafe' [-Werror=implicit-function-declaration]
      n = copy_to_user_mcsafe((__force void *) to, from, n);
          ^~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/copy_to_user_mcsafe +146 lib/iov_iter.c

   141	
   142	static int copyout_mcsafe(void __user *to, const void *from, size_t n)
   143	{
   144		if (access_ok(VERIFY_WRITE, to, n)) {
   145			kasan_check_read(from, n);
 > 146			n = copy_to_user_mcsafe((__force void *) to, from, n);
   147		}
   148		return n;
   149	}
   150	
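
One hypothetical way to keep non-x86-64 configurations building (a
sketch only, not necessarily the fix that was applied in the series)
is a generic fallback that simply lacks machine-check recovery:

	#ifndef copy_to_user_mcsafe
	static inline unsigned long
	copy_to_user_mcsafe(void *dst, const void *src, unsigned long cnt)
	{
		/* no MCE recovery; behaves like a plain uaccess copy */
		return copy_to_user((void __user *) dst, src, cnt);
	}
	#endif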

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/6] x86, memcpy_mcsafe: define copy_to_iter_mcsafe()
  2018-05-01 20:45   ` Dan Williams
@ 2018-05-01 22:49     ` kbuild test robot
  0 siblings, 0 replies; 56+ messages in thread
From: kbuild test robot @ 2018-05-01 22:49 UTC (permalink / raw)
  To: Dan Williams
  Cc: Tony Luck, linux-nvdimm, Peter Zijlstra, x86, linux-kernel,
	Andy Lutomirski, Ingo Molnar, Borislav Petkov, kbuild-all,
	Thomas Gleixner, Linus Torvalds, Andrew Morton, Al Viro

Hi Dan,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.17-rc3 next-20180501]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Dan-Williams/use-memcpy_mcsafe-for-copy_to_iter/20180502-045742
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   lib/iov_iter.c: In function 'copyout_mcsafe':
>> lib/iov_iter.c:146:7: error: implicit declaration of function 'copy_to_user_mcsafe'; did you mean 'copy_to_iter_mcsafe'? [-Werror=implicit-function-declaration]
      n = copy_to_user_mcsafe((__force void *) to, from, n);
          ^~~~~~~~~~~~~~~~~~~
          copy_to_iter_mcsafe
   cc1: some warnings being treated as errors

vim +146 lib/iov_iter.c

   141	
   142	static int copyout_mcsafe(void __user *to, const void *from, size_t n)
   143	{
   144		if (access_ok(VERIFY_WRITE, to, n)) {
   145			kasan_check_read(from, n);
 > 146			n = copy_to_user_mcsafe((__force void *) to, from, n);
   147		}
   148		return n;
   149	}
   150	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/6] use memcpy_mcsafe() for copy_to_iter()
  2018-05-01 21:05   ` Linus Torvalds
@ 2018-05-01 23:02     ` Dan Williams
  0 siblings, 0 replies; 56+ messages in thread
From: Dan Williams @ 2018-05-01 23:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tony Luck, linux-nvdimm, Peter Zijlstra,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Ingo Molnar, Borislav Petkov, Al Viro,
	Thomas Gleixner, Andrew Morton

On Tue, May 1, 2018 at 2:05 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, May 1, 2018 at 1:55 PM Dan Williams <dan.j.williams@intel.com>
> wrote:
>
>> The result of the bypass is that the kernel treats machine checks during
>> read as system fatal (reboot) when they could simply be flagged as an
>> I/O error, similar to performing reads through the pmem driver. Prevent
>> this fatal condition by deploying memcpy_mcsafe() in the fsdax read
>> path.
>
> How about just changing the rules, and go the old "Don't do that then" way?
>
> IOW, get rid of the whole idea that MCS errors should be fatal. It's wrong
> and pointless anyway.
>
> The whole approach seems fundamentally buggered, if you ever want to mmap
> one of these things. And don't you want that?
>
> So why continue down a fundamentally broken path?

I'm confused. Are you talking about getting rid of the block-layer
bypass or changing how MCS errors are handled? If it's the former, I've
gotten pushback in the past trying to remove the bypass, but I feel
better about my chances to slay that beast wielding the +5 Hammer of
Linus. If it's the latter, MCS error handling, I don't see how to get
around something like copy_to_iter_mcsafe().

You mention mmap. Yes, we want the predominant access model to be
dax-mmap for Persistent Memory, but there's still the question about
what to do with media errors. To date we are trying to mirror the
error handling model for System Memory, i.e. SIGBUS to the process
that consumed the error. Is that error handling model also problematic
in your view?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/6] use memcpy_mcsafe() for copy_to_iter()
  2018-05-01 23:02     ` Dan Williams
@ 2018-05-01 23:28       ` Andy Lutomirski
  0 siblings, 0 replies; 56+ messages in thread
From: Andy Lutomirski @ 2018-05-01 23:28 UTC (permalink / raw)
  To: Dan Williams
  Cc: Tony Luck, linux-nvdimm, Peter Zijlstra, X86 ML, LKML,
	Ingo Molnar, Borislav Petkov, Al Viro, Thomas Gleixner,
	Linus Torvalds, Andrew Morton

On Tue, May 1, 2018 at 4:02 PM Dan Williams <dan.j.williams@intel.com>
wrote:

> On Tue, May 1, 2018 at 2:05 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Tue, May 1, 2018 at 1:55 PM Dan Williams <dan.j.williams@intel.com>
> > wrote:
> >
> >> The result of the bypass is that the kernel treats machine checks
> >> during read as system fatal (reboot) when they could simply be flagged
> >> as an I/O error, similar to performing reads through the pmem driver.
> >> Prevent this fatal condition by deploying memcpy_mcsafe() in the fsdax
> >> read path.
> >
> > How about just changing the rules, and go the old "Don't do that then"
> > way?
> >
> > IOW, get rid of the whole idea that MCS errors should be fatal. It's
> > wrong and pointless anyway.
> >
> > The whole approach seems fundamentally buggered, if you ever want to
> > mmap one of these things. And don't you want that?
> >
> > So why continue down a fundamentally broken path?

> I'm confused. Are you talking about getting rid of the block-layer
> bypass or changing how MCS errors are handled? If it's the former I've
> gotten push back in the past trying to remove the bypass, but I feel
> better about my chances to slay that beast wielding the +5 Hammer of
> Linus. If it's the latter, MCS error handling, I don't see how to get
> around something like copy_to_iter_mcsafe().

> You mention mmap. Yes, we want the predominant access model to be
> dax-mmap for Persistent Memory, but there's still the question about
> what to do with media errors. To date we are trying to mirror the
> error handling model for System Memory, i.e. SIGBUS to the process
> that consumed the error. Is that error handling model also problematic
> in your view?

I'm not sure exactly what you mean here, but my understanding of the status
quo is that memory errors in user code are non-fatal but that memory errors
in kernel code are fatal unless there's an appropriate extable entry.  The
old iov_iter code assumes that memcpy() on kernel addresses can't fail.
I'm not sure how else it could work.
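
A tiny sketch of that distinction (illustrative and simplified, not the
exact kernel code; 'uptr', 'dst', 'src', and 'len' are assumed locals):

	int val;

	/* uaccess has an exception-table fixup: failure returns -EFAULT */
	if (get_user(val, uptr))
		return -EFAULT;

	/* a bare memcpy() has no fixup: a machine check here is fatal */
	memcpy(dst, src, len);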

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/6] use memcpy_mcsafe() for copy_to_iter()
  2018-05-01 23:28       ` Andy Lutomirski
@ 2018-05-01 23:31         ` Dan Williams
  0 siblings, 0 replies; 56+ messages in thread
From: Dan Williams @ 2018-05-01 23:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Tony Luck, linux-nvdimm, Peter Zijlstra, X86 ML, LKML,
	Ingo Molnar, Borislav Petkov, Al Viro, Thomas Gleixner,
	Linus Torvalds, Andrew Morton

On Tue, May 1, 2018 at 4:28 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Tue, May 1, 2018 at 4:02 PM Dan Williams <dan.j.williams@intel.com>
> wrote:
>
>> On Tue, May 1, 2018 at 2:05 PM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>> > On Tue, May 1, 2018 at 1:55 PM Dan Williams <dan.j.williams@intel.com>
>> > wrote:
>> >
>> >> The result of the bypass is that the kernel treats machine checks
>> >> during read as system fatal (reboot) when they could simply be flagged
>> >> as an I/O error, similar to performing reads through the pmem driver.
>> >> Prevent this fatal condition by deploying memcpy_mcsafe() in the fsdax
>> >> read path.
>> >
>> > How about just changing the rules, and go the old "Don't do that then"
>> > way?
>> >
>> > IOW, get rid of the whole idea that MCS errors should be fatal. It's
>> > wrong and pointless anyway.
>> >
>> > The whole approach seems fundamentally buggered, if you ever want to
>> > mmap one of these things. And don't you want that?
>> >
>> > So why continue down a fundamentally broken path?
>
>> I'm confused. Are you talking about getting rid of the block-layer
>> bypass or changing how MCS errors are handled? If it's the former I've
>> gotten push back in the past trying to remove the bypass, but I feel
>> better about my chances to slay that beast wielding the +5 Hammer of
>> Linus. If it's the latter, MCS error handling, I don't see how to get
>> around something like copy_to_iter_mcsafe().
>
>> You mention mmap. Yes, we want the predominant access model to be
>> dax-mmap for Persistent Memory, but there's still the question about
>> what to do with media errors. To date we are trying to mirror the
>> error handling model for System Memory, i.e. SIGBUS to the process
>> that consumed the error. Is that error handling model also problematic
>> in your view?
>
> I'm not sure exactly what you mean here, but my understanding of the status
> quo is that memory errors in user code are non-fatal but that memory errors
> in kernel code are fatal unless there's an appropriate extable entry.  The
> old iov_iter code assumes that memcpy() on kernel addresses can't fail.
> I'm not sure how else it could work.

Right, I'm trying to clarify the "IOW, get rid of the whole idea that
MCS errors should be fatal" comment. Especially as I am about to go
fix memory_failure() to understand that ZONE_DEVICE pages != typical
"struct page", and do the right thing with respect to un-mapping
userspace dax mapped pages.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/6] use memcpy_mcsafe() for copy_to_iter()
  2018-05-01 23:02     ` Dan Williams
@ 2018-05-02  0:09       ` Linus Torvalds
  0 siblings, 0 replies; 56+ messages in thread
From: Linus Torvalds @ 2018-05-02  0:09 UTC (permalink / raw)
  To: Dan Williams
  Cc: Tony Luck, linux-nvdimm, Peter Zijlstra,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Ingo Molnar, Borislav Petkov, Al Viro,
	Thomas Gleixner, Andrew Morton

On Tue, May 1, 2018 at 4:03 PM Dan Williams <dan.j.williams@intel.com>
wrote:

> I'm confused. Are you talking about getting rid of the block-layer
> bypass or changing how MCS errors are handled?

The latter.

> If it's the latter, MCS error handling, I don't see how to get
> around something like copy_to_iter_mcsafe().

So the basic issue is that since everybody wants mmap() to be at least an
option (and preferably one of the _main_ options), I think that the whole
"MCS errors are fatal" approach is fundamentally flawed.

Which means that MCS errors can't be fatal.

Which in turn means that the whole "special memcpy" seems very suspect.

Can't we just do

  - use a normal memcpy()

  - basically set an "IO error flag" on MCE.

  - for a user access the IO error flag potentially causes a SIGBUS as you
mention, but even there it's not 100% clear that's necessarily possible or
a good idea (I'm assuming that it can be damned hard to figure out _who_
caused the problem if it was a cached write that causes an MCE much much
later).

  - for the kernel, the "IO error flag" can hopefully then (again,
assuming you can correlate the MCE with the right process) be turned into
EIO.
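
A rough sketch of that flow (the TIF_MCE_IO_ERROR flag name is invented
here, purely for illustration):

	memcpy(dst, src, len);	/* plain copy; MCE handler marks the task */
	if (test_and_clear_thread_flag(TIF_MCE_IO_ERROR))	/* hypothetical */
		return -EIO;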

> You mention mmap. Yes, we want the predominant access model to be
> dax-mmap for Persistent Memory, but there's still the question about
> what to do with media errors. To date we are trying to mirror the
> error handling model for System Memory, i.e. SIGBUS to the process
> that consumed the error. Is that error handling model also problematic
> in your view?

See above: if you can handle user space errors "gracefully" (ie with a
SIGBUS, no crazy "system fatal (reboot)" garbage), then I really don't see
why you can't do the same for the kernel accesses.

IOW, why do we need that special "copy_to_iter_mcsafe()", when a normal
"copy_to_iter()" should just work (and basically _has_ to work) anyway?

Put another way: I think the whole basic premise of your patch is wrong,
because (to quote your original patch description), the fundamental starting
point is garbage:

    The result of the bypass is that the kernel treats machine checks during
    read as system fatal (reboot) [..]

See? If you are able to map that memory into user space, and recover, then
why the whole crazy "system fatal" thing for kernel accesses?

             Linus

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/6] use memcpy_mcsafe() for copy_to_iter()
  2018-05-02  0:09       ` Linus Torvalds
@ 2018-05-02  2:25         ` Dan Williams
  0 siblings, 0 replies; 56+ messages in thread
From: Dan Williams @ 2018-05-02  2:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tony Luck, linux-nvdimm, Peter Zijlstra,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Ingo Molnar, Borislav Petkov, Al Viro,
	Thomas Gleixner, Andrew Morton

On Tue, May 1, 2018 at 5:09 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, May 1, 2018 at 4:03 PM Dan Williams <dan.j.williams@intel.com>
> wrote:
>
>> I'm confused. Are you talking about getting rid of the block-layer
>> bypass or changing how MCS errors are handled?
>
> The latter.
>
>> If it's the latter, MCS error handling, I don't see how to get
>> around something like copy_to_iter_mcsafe().
>
> So the basic issue is that since everybody wants mmap() to be at least an
> option (and preferably one of the _main_ options), I think that the whole
> "MCS errors are fatal" is fundamentally flawed.
>
> Which means that MCS errors can't be fatal.
>
> Which in turn means that the whole "special memcpy" seems very suspect.
>
> Can't we just do
>
>   - use a normal memcpy()
>
>   - basically set an "IO error flag" on MCE.
>
>   - for a user access the IO error flag potentially causes a SIGBUS as you
> mention, but even there it's not 100% clear that's necessarily possible or
> a good idea (I'm assuming that it can be damned hard to figure out _who_
> caused the problem if it was a cached write that causes an MCE much much
> later).

Writes don't trigger MCE. Only consumed poison / media errors trigger
MCE. I.e. even a read-modify-write operation to write back a partially
dirty cacheline will not trigger an MCE, because the read is not
consumed by the core, only by the cache. We'll get notified when that
happens, but only by a CMCI interrupt, not an MCE exception.

>   - for the kernel, the "IO error flag" can hopefully then (again,
> assuming you can correlate the MCE with the right process) be turned into
> EIO.

This is precisely the current implementation / usage of
memcpy_mcsafe(). Reads go through the driver and the driver does the
right / simple thing to turn an MCE into EIO. I'd like to make this
the only model and kill the driver bypass in fs/dax.c so that the vfs
does not need to contend with these low-level architecture details.

To be clear, I'm not against a dax-specific optimization that does not go
through the block layer, but it should still be a driver call.

>> You mention mmap. Yes, we want the predominant access model to be
>> dax-mmap for Persistent Memory, but there's still the question about
>> what to do with media errors. To date we are trying to mirror the
>> error handling model for System Memory, i.e. SIGBUS to the process
>> that consumed the error. Is that error handling model also problematic
>> in your view?
>
> See above: if you can handle user space errors "gracefully" (ie with a
> SIGBUS, no crazy "system fatal (reboot)" garbage), then I really don't see
> why you can't do the same for the kernel accesses.
>
> IOW, why do we need that special "copy_to_iter_mcsafe()", when a normal
> "copy_to_iter()" should just work (and basically _has_ to work) anyway?
>
> Put another way: I think the whole basic premise of your patch is wrong,
> because (to quote your original patch description), the fundamental starting
> point is garbage:
>
>     The result of the bypass is that the kernel treats machine checks during
>     read as system fatal (reboot) [..]
>
> See? If you are able to map that memory into user space, and recover, then
> why the whole crazy "system fatal" thing for kernel accesses?

Right, but the only way to make MCE non-fatal is to teach the machine
check handler about recoverable conditions. This patch teaches the
machine check handler how to recover copy_to_iter() errors.

We already have copy_from_iter_flushcache(), which is used as a 'struct
dax_operations' op. I can do the same for this copy_to_iter() case so
that at least it's up to the driver, and not the vfs (fs/dax.c), to
decide how to handle this case.
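
A sketch of that direction (the ->copy_to_iter member here is
hypothetical; the existing ->copy_from_iter op is the model):

	struct dax_operations {
		/* ... existing ops elided ... */
		size_t (*copy_from_iter)(struct dax_device *dax_dev,
				pgoff_t pgoff, void *addr, size_t bytes,
				struct iov_iter *i);
		/* hypothetical mirror for the mcsafe read side: */
		size_t (*copy_to_iter)(struct dax_device *dax_dev,
				pgoff_t pgoff, void *addr, size_t bytes,
				struct iov_iter *i);
	};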

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/6] use memcpy_mcsafe() for copy_to_iter()
  2018-05-02  2:25         ` Dan Williams
@ 2018-05-02  2:53           ` Linus Torvalds
  -1 siblings, 0 replies; 56+ messages in thread
From: Linus Torvalds @ 2018-05-02  2:53 UTC (permalink / raw)
  To: Dan Williams
  Cc: Tony Luck, linux-nvdimm, Peter Zijlstra,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Ingo Molnar, Borislav Petkov, Al Viro,
	Thomas Gleixner, Andrew Morton

On Tue, May 1, 2018 at 7:26 PM Dan Williams <dan.j.williams@intel.com>
wrote:

> Right, but the only way to make MCE non-fatal is to teach the machine
> check handler about recoverable conditions. This patch teaches the
> machine check handler how to recover copy_to_iter() errors.

Why not just unmap the page and remap a new page in its place? Meaning
that it needs absolutely no special error handling in the callers.

IOW, treat it *exactly* like the whole page poisoning.

We _have_ the technology. Why does this code think it's such a special
snow-flake?

              Linus

* Re: [PATCH 0/6] use memcpy_mcsafe() for copy_to_iter()
  2018-05-02  2:53           ` Linus Torvalds
@ 2018-05-02  3:02             ` Dan Williams
  -1 siblings, 0 replies; 56+ messages in thread
From: Dan Williams @ 2018-05-02  3:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tony Luck, linux-nvdimm, Peter Zijlstra,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Ingo Molnar, Borislav Petkov, Al Viro,
	Thomas Gleixner, Andrew Morton

On Tue, May 1, 2018 at 7:53 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, May 1, 2018 at 7:26 PM Dan Williams <dan.j.williams@intel.com>
> wrote:
>
>> Right, but the only way to make MCE non-fatal is to teach the machine
>> check handler about recoverable conditions. This patch teaches the
>> machine check handler how to recover copy_to_iter() errors.
>
> Why not just unmap the page and remap a new page in its place? Meaning
> that it needs absolutely no special error handling in the callers.
>
> IOW, treat it *exactly* like the whole page poisoning.
>
> We _have_ the technology. Why does this code think it's such a special
> snow-flake?

Because dax. There's no page cache indirection games we can play here
to poison a page and map in another page. The mapped page is 1:1
associated with the filesystem block and physical memory address.

* Re: [PATCH 0/6] use memcpy_mcsafe() for copy_to_iter()
  2018-05-02  3:02             ` Dan Williams
@ 2018-05-02  3:13               ` Linus Torvalds
  -1 siblings, 0 replies; 56+ messages in thread
From: Linus Torvalds @ 2018-05-02  3:13 UTC (permalink / raw)
  To: Dan Williams
  Cc: Tony Luck, linux-nvdimm, Peter Zijlstra,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Ingo Molnar, Borislav Petkov, Al Viro,
	Thomas Gleixner, Andrew Morton

On Tue, May 1, 2018 at 8:03 PM Dan Williams <dan.j.williams@intel.com>
wrote:

> Because dax. There's no page cache indirection games we can play here
> to poison a page and map in another page. The mapped page is 1:1
> associated with the filesystem block and physical memory address.

I'm not talking page cache indirection.

I'm talking literally mapping a different page into the kernel virtual
address space that the failing read was done for.

But you seem to be right that we don't actually support that. I'm guessing
the hwpoison code has never had to run in that kind of situation and will
just give up.

That would seem to be sad. It really feels like the obvious solution to any
MCEs - just map a dummy page at the address that causes problems.

That can have bad effects for real memory (because who knows what internal
kernel data structure might be in there), but would seem to be the
_optimal_ solution for some random pmem access. And makes it absolutely
trivial to just return to the execution that got the error exception.

                Linus

* Re: [PATCH 0/6] use memcpy_mcsafe() for copy_to_iter()
  2018-05-02  3:13               ` Linus Torvalds
@ 2018-05-02  3:20                 ` Dan Williams
  -1 siblings, 0 replies; 56+ messages in thread
From: Dan Williams @ 2018-05-02  3:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tony Luck, linux-nvdimm, Peter Zijlstra,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Ingo Molnar, Borislav Petkov, Al Viro,
	Thomas Gleixner, Andrew Morton

On Tue, May 1, 2018 at 8:13 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, May 1, 2018 at 8:03 PM Dan Williams <dan.j.williams@intel.com>
> wrote:
>
>> Because dax. There's no page cache indirection games we can play here
>> to poison a page and map in another page. The mapped page is 1:1
>> associated with the filesystem block and physical memory address.
>
> I'm not talking page cache indirection.
>
> I'm talking literally mapping a different page into the kernel virtual
> address space that the failing read was done for.
>
> But you seem to be right that we don't actually support that. I'm guessing
> the hwpoison code has never had to run in that kind of situation and will
> just give up.
>
> That would seem to be sad. It really feels like the obvious solution to any
> MCEs - just map a dummy page at the address that causes problems.
>
> That can have bad effects for real memory (because who knows what internal
> kernel data structure might be in there), but would seem to be the
> _optimal_ solution for some random pmem access. And makes it absolutely
> trivial to just return to the execution that got the error exception.

The other property of pmem that we need to contend with, one that makes
it a snowflake relative to typical memory, is that errors can be
repaired by sending a slow-path command to the DIMM device. We trap
block-layer writes in the pmem driver that target known 'badblocks' and
send the sideband command to clear the error along with the new data.
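
For reference, condensed from memory from the pmem_do_bvec() write path
in drivers/nvdimm/pmem.c (trimmed; is_bad_pmem() and pmem_clear_poison()
are the existing driver helpers):

	static blk_status_t pmem_do_write(struct pmem_device *pmem,
			struct page *page, unsigned int off,
			sector_t sector, unsigned int len)
	{
		blk_status_t rc = BLK_STS_OK;
		phys_addr_t pmem_off = sector * 512 + pmem->data_offset;
		void *pmem_addr = pmem->virt_addr + pmem_off;

		/* write first: clear-poison may simply revalidate data */
		write_pmem(pmem_addr, page, off, len);
		if (unlikely(is_bad_pmem(&pmem->bb, sector, len))) {
			/* slow-path sideband command to the DIMM */
			rc = pmem_clear_poison(pmem, pmem_off, len);
			/* rewrite now that the range is valid again */
			write_pmem(pmem_addr, page, off, len);
		}
		return rc;
	}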

* Re: [PATCH 0/6] use memcpy_mcsafe() for copy_to_iter()
  2018-05-02  3:20                 ` Dan Williams
@ 2018-05-02  3:22                   ` Dan Williams
  -1 siblings, 0 replies; 56+ messages in thread
From: Dan Williams @ 2018-05-02  3:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tony Luck, linux-nvdimm, Peter Zijlstra,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Ingo Molnar, Borislav Petkov, Al Viro,
	Thomas Gleixner, Andrew Morton

On Tue, May 1, 2018 at 8:20 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Tue, May 1, 2018 at 8:13 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> On Tue, May 1, 2018 at 8:03 PM Dan Williams <dan.j.williams@intel.com>
>> wrote:
>>
>>> Because dax. There's no page cache indirection games we can play here
>>> to poison a page and map in another page. The mapped page is 1:1
>>> associated with the filesystem block and physical memory address.
>>
>> I'm not talking page cache indirection.
>>
>> I'm talking literally mapping a different page into the kernel virtual
>> address space that the failing read was done for.
>>
>> But you seem to be right that we don't actually support that. I'm guessing
>> the hwpoison code has never had to run in that kind of situation and will
>> just give up.
>>
>> That would seem to be sad. It really feels like the obvious solution to any
>> MCEs - just map a dummy page at the address that causes problems.
>>
>> That can have bad effects for real memory (because who knows what internal
>> kernel data structure might be in there), but would seem to be the
>> _optimal_ solution for some random pmem access. And makes it absolutely
>> trivial to just return to the execution that got the error exception.
>
> The other property of pmem that we need to contend with, one that makes
> it a snowflake relative to typical memory, is that errors can be
> repaired by sending a slow-path command to the DIMM device. We trap
> block-layer writes in the pmem driver that target known 'badblocks' and
> send the sideband command to clear the error along with the new data.

All that to say that having a typical RAM page covering poisoned pmem
would complicate the 'clear badblocks' implementation.

* Re: [PATCH 0/6] use memcpy_mcsafe() for copy_to_iter()
  2018-05-02  3:22                   ` Dan Williams
@ 2018-05-02  3:33                     ` Linus Torvalds
  -1 siblings, 0 replies; 56+ messages in thread
From: Linus Torvalds @ 2018-05-02  3:33 UTC (permalink / raw)
  To: Dan Williams
  Cc: Tony Luck, linux-nvdimm, Peter Zijlstra,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Ingo Molnar, Borislav Petkov, Al Viro,
	Thomas Gleixner, Andrew Morton

On Tue, May 1, 2018 at 8:22 PM Dan Williams <dan.j.williams@intel.com>
wrote:

> All that to say that having a typical RAM page covering poisoned pmem
> would complicate the 'clear badblocks' implementation.

Ugh, ok.

I guess the good news is that your patches aren't so big, and don't really
affect anything else.

But can we at least take this to be the impetus for just getting rid of
that disgusting unrolled memcpy? About half of the lines in the patch set
come from that thing.

Is anybody seriously going to use pmem with some in-order chip that can't
even get something as simple as a memory copy loop right? "git blame"
fingers Tony Luck, I think he may have been influenced by the fumes from
Itanium.

I have some dim memory of "rep movs doesn't work well for pmem", but does
it *seriously* need unrolling to cacheline boundaries? And if it does, who
designed it, and why is anybody using it?

              Linus

* Re: [PATCH 0/6] use memcpy_mcsafe() for copy_to_iter()
  2018-05-02  3:33                     ` Linus Torvalds
@ 2018-05-02  4:00                       ` Dan Williams
  -1 siblings, 0 replies; 56+ messages in thread
From: Dan Williams @ 2018-05-02  4:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tony Luck, linux-nvdimm, Peter Zijlstra,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Ingo Molnar, Borislav Petkov, Al Viro,
	Thomas Gleixner, Andrew Morton

On Tue, May 1, 2018 at 8:33 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, May 1, 2018 at 8:22 PM Dan Williams <dan.j.williams@intel.com>
> wrote:
>
>> All that to say that having a typical RAM page covering poisoned pmem
>> would complicate the 'clear badblocks' implementation.
>
> Ugh, ok.
>
> I guess the good news is that your patches aren't so big, and don't really
> affect anything else.
>
> But can we at least take this to be the impetus for just getting rid of
> that disgusting unrolled memcpy? About half of the lines in the patch set
> come from that thing.
>
> Is anybody seriously going to use pmem with some in-order chip that can't
> even get something as simple as a memory copy loop right? "git blame"
> fingers Tony Luck, I think he may have been influenced by the fumes from
> Itanium.
>
> I have some dim memory of "rep movs doesn't work well for pmem", but does
> it *seriously* need unrolling to cacheline boundaries? And if it does, who
> designed it, and why is anybody using it?
>

I think this is an FAQ from the original submission, in fact some guy
named "Linus Torvalds" asked [1]:

---

>  - why does this use the complex - and slower, on modern machines -
> unrolled manual memory copy, when you might as well just use a single
>
>      rep ; movsb
>
>     which not only makes it smaller, but makes the exception fixup trivial.

Because current generation cpus don't give a recoverable machine
check if we consume with a "rep ; movsb" :-(
When we have that we can pick the best copy function based
on the capabilities of the cpu we are running on.

---

[1]: https://lkml.org/lkml/2016/2/18/608
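
The capability-based dispatch half of that already exists; from memory,
arch/x86/include/asm/string_64.h does roughly:

	DECLARE_STATIC_KEY_FALSE(mcsafe_key);

	static __always_inline __must_check int
	memcpy_mcsafe(void *dst, const void *src, size_t cnt)
	{
	#ifdef CONFIG_X86_MCE
		/* only CPUs that can recover get the exception-aware copy */
		if (static_branch_unlikely(&mcsafe_key))
			return memcpy_mcsafe_unrolled(dst, src, cnt);
		else
	#endif
			memcpy(dst, src, cnt);
		return 0;
	}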

* Re: [PATCH 0/6] use memcpy_mcsafe() for copy_to_iter()
  2018-05-02  4:00                       ` Dan Williams
@ 2018-05-02  4:14                         ` Linus Torvalds
  -1 siblings, 0 replies; 56+ messages in thread
From: Linus Torvalds @ 2018-05-02  4:14 UTC (permalink / raw)
  To: Dan Williams
  Cc: Tony Luck, linux-nvdimm, Peter Zijlstra,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Ingo Molnar, Borislav Petkov, Al Viro,
	Thomas Gleixner, Andrew Morton

On Tue, May 1, 2018 at 9:00 PM Dan Williams <dan.j.williams@intel.com>
wrote:
> >
> > I have some dim memory of "rep movs doesn't work well for pmem", but does
> > it *seriously* need unrolling to cacheline boundaries? And if it does, who
> > designed it, and why is anybody using it?
> >

> I think this is an FAQ from the original submission, in fact some guy
> named "Linus Torvalds" asked [1]:

Oh, I already mentioned that I remembered that "rep movs" didn't work well.

But there's a big gap between "just use 'rep movs'" and "do some cacheline
unrolling".

Why isn't it just doing a simple word-at-a-time loop and letting the CPU do
the unrolling that it will already do on its own?

I may have gotten that answered too, but there's no comment in the code
about why it's such a disgusting mess, so I've long since forgotten _why_
it's such a disgusting mess.

That loop unrolling _used_ to be "hey, it's simple".

Now it's "Hey, that's truly disgusting", with the separate fault handling
for every single case in the unrolled loop.

Just look at the nasty _ASM_EXTABLE_FAULT() uses and those E_cache_x error
labels, and getting the number of bytes copied right.

And then ask yourself "what if we didn't unroll that thing 8 times, AND WE
COULD GET RID OF ALL OF THOSE?"

Maybe you already did ask yourself.  But I'm asking because it sure isn't
explained in the code.

             Linus

* Re: [PATCH 0/6] use memcpy_mcsafe() for copy_to_iter()
  2018-05-02  4:14                         ` Linus Torvalds
@ 2018-05-02  5:37                           ` Dan Williams
  -1 siblings, 0 replies; 56+ messages in thread
From: Dan Williams @ 2018-05-02  5:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tony Luck, linux-nvdimm, Peter Zijlstra,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Ingo Molnar, Borislav Petkov, Al Viro,
	Thomas Gleixner, Andrew Morton

On Tue, May 1, 2018 at 9:14 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, May 1, 2018 at 9:00 PM Dan Williams <dan.j.williams@intel.com>
> wrote:
>> >
>> > I have some dim memory of "rep movs doesn't work well for pmem", but does
>> > it *seriously* need unrolling to cacheline boundaries? And if it does, who
>> > designed it, and why is anybody using it?
>> >
>
>> I think this is an FAQ from the original submission, in fact some guy
>> named "Linus Torvalds" asked [1]:
>
> Oh, I already mentioned that I remembered that "rep movs" didn't work well.
>
> But there's a big gap between "just use 'rep movs'" and "do some cacheline
> unrolling".
>
> Why isn't it just doing a simple word-at-a-time loop and letting the CPU do
> the unrolling that it will already do on its own?
>
> I may have gotten that answered too, but there's no comment in the code
> about why it's such a disgusting mess, so I've long since forgotten _why_
> it's such a disgusting mess.
>
> That loop unrolling _used_ to be "hey, it's simple".
>
> Now it's "Hey, that's truly disgusting", with the separate fault handling
> for every single case in the unrolled loop.
>
> Just look at the nasty _ASM_EXTABLE_FAULT() uses and those E_cache_x error
> labels, and getting the number of bytes copied right.
>
> And then ask yourself "what if we didn't unroll that thing 8 times, AND WE
> COULD GET RID OF ALL OF THOSE?"
>
> Maybe you already did ask yourself.  But I'm asking because it sure isn't
> explained in the code.

Ah, sorry. Yeah, I don't see a good reason to keep the unrolling. It
would definitely clean up the fault handling; I'll respin.
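
As a plain C stand-in for the idea (illustration only, not the actual
asm): a single word-at-a-time loop body leaves exactly one load site to
annotate with _ASM_EXTABLE_FAULT(), instead of one per unrolled slot:

	static size_t copy_words(void *dst, const void *src, size_t bytes)
	{
		unsigned long *d = dst;
		const unsigned long *s = src;
		size_t i, words = bytes / sizeof(unsigned long);

		for (i = 0; i < words; i++)
			d[i] = s[i];	/* the one load needing a fixup */

		/* tail bytes would be handled by a separate byte loop */
		return bytes - words * sizeof(unsigned long);
	}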

* Re: [PATCH 0/6] use memcpy_mcsafe() for copy_to_iter()
  2018-05-02  2:25         ` Dan Williams
@ 2018-05-02  8:30           ` Borislav Petkov
  -1 siblings, 0 replies; 56+ messages in thread
From: Borislav Petkov @ 2018-05-02  8:30 UTC (permalink / raw)
  To: Dan Williams
  Cc: Tony Luck, linux-nvdimm, Peter Zijlstra,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Ingo Molnar, Al Viro, Thomas Gleixner,
	Linus Torvalds, Andrew Morton

On Tue, May 01, 2018 at 07:25:57PM -0700, Dan Williams wrote:
> Right, but the only way to make MCE non-fatal is to teach the machine
> check handler about recoverable conditions. This patch teaches the
> machine check handler how to recover copy_to_iter() errors.

Yeah, about that: maybe we talked about this at the time but does the
actual MCE signature state the error was caused by a read from an nvdimm
range?

Because if so, we could lower the severity of the error when we look at
it in the #MC handler and do some more graceful handling later...

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

* Re: [PATCH 0/6] use memcpy_mcsafe() for copy_to_iter()
  2018-05-02  8:30           ` Borislav Petkov
@ 2018-05-02 13:52             ` Dan Williams
  -1 siblings, 0 replies; 56+ messages in thread
From: Dan Williams @ 2018-05-02 13:52 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Tony Luck, linux-nvdimm, Peter Zijlstra,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Ingo Molnar, Al Viro, Thomas Gleixner,
	Linus Torvalds, Andrew Morton

On Wed, May 2, 2018 at 1:30 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Tue, May 01, 2018 at 07:25:57PM -0700, Dan Williams wrote:
>> Right, but the only way to make MCE non-fatal is to teach the machine
>> check handler about recoverable conditions. This patch teaches the
>> machine check handler how to recover copy_to_iter() errors.
>
> Yeah, about that: maybe we talked about this at the time but does the
> actual MCE signature state the error was caused by a read from an nvdimm
> range?

It does not, and this routine would still need to support emulated
persistent memory, or physical address ranges that the administrator has
forced the kernel to treat as pmem and that are otherwise not known to
platform firmware.
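
(For example, pmem carved out of regular memory with the memmap= kernel
parameter, something like:

	memmap=4G!12G

...which treats 4G of address space starting at the 12G physical offset
as persistent memory that firmware knows nothing about.)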

* Re: [PATCH 0/6] use memcpy_mcsafe() for copy_to_iter()
  2018-05-02  3:33                     ` Linus Torvalds
@ 2018-05-02 16:19                       ` Andy Lutomirski
  -1 siblings, 0 replies; 56+ messages in thread
From: Andy Lutomirski @ 2018-05-02 16:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tony Luck, Andrew Morton, linux-nvdimm, Peter Zijlstra, X86 ML,
	LKML, Ingo Molnar, Borislav Petkov, Al Viro, Thomas Gleixner

On Tue, May 1, 2018 at 8:34 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:

> On Tue, May 1, 2018 at 8:22 PM Dan Williams <dan.j.williams@intel.com>
> wrote:

> > All that to say that having a typical RAM page covering poisoned pmem
> > would complicate the 'clear badblocks' implementation.

> Ugh, ok.

> I guess the good news is that your patches aren't so big, and don't really
> affect anything else.


I pondered this a bit.  Doing better might be a big pain in the arse.  The
interesting case is where ordinary kernel code (memcpy, plain old memory
operands, etc.) accesses faulty pmem.  This means that there's no extable
entry around.  If we actually try to recover, we have a few problems:

  - We can't sanely skip the instruction without causing random errors.

  - If the access was through the kernel direct map, then we could plausibly
remap a different page in place of the faulty page.  The problem is that,
if the page is *writable* and we share it between more than one faulty
page, then we're enabling a giant information leak.  But we still need to
figure out how we're supposed to invalidate the old mapping from a random,
potentially atomic context.

  - If the access is through kmap or similar, then we're talking about
modifying a PTE out from under kernel code that really isn't expecting us
to modify it.

  - How are we supposed to signal the process or fail a syscall?  The fault
could have come from interrupt context, softirq context, kernel thread
context, etc, and figuring out who's to blame seems quite awkward and
fragile.

All that being said, I suspect that we still have issues even with accesses
to user VAs that are protected by extable entries.  The whole #MC mechanism
is a supremely shitty interface for recoverable errors (especially on
Intel), and I'm a bit scared of what happens if the offending access is,
say, inside a perf NMI.

Dan, is there any chance you could put some pressure on the architecture
folks to invent an entirely new, less shitty way to tell the OS about
recoverable memory errors?  And to make it testable by normal people?
Needing big metal EINJ hardware to test the house of cards that is #MC is
just awful and means that there are few enough kernel developers that are
actually able to test that I can probably count them on one hand.  And I'm
not one of them...

* Re: [PATCH 0/6] use memcpy_mcsafe() for copy_to_iter()
  2018-05-02 16:19                       ` Andy Lutomirski
@ 2018-05-02 17:47                         ` Dan Williams
  -1 siblings, 0 replies; 56+ messages in thread
From: Dan Williams @ 2018-05-02 17:47 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Tony Luck, linux-nvdimm, Peter Zijlstra, X86 ML, LKML,
	Ingo Molnar, Borislav Petkov, Al Viro, Thomas Gleixner,
	Linus Torvalds, Andrew Morton

On Wed, May 2, 2018 at 9:19 AM, Andy Lutomirski <luto@kernel.org> wrote:
> On Tue, May 1, 2018 at 8:34 PM Linus Torvalds
> <torvalds@linux-foundation.org>
> wrote:
>
>> On Tue, May 1, 2018 at 8:22 PM Dan Williams <dan.j.williams@intel.com>
>> wrote:
>
>> > All that to say that having a typical RAM page covering poisoned pmem
>> > would complicate the 'clear badblocks' implementation.
>
>> Ugh, ok.
>
>> I guess the good news is that your patches aren't so big, and don't really
>> affect anything else.
>
>
> I pondered this a bit.  Doing better might be a big pain in the arse.  The
> interesting case is where ordinary kernel code (memcpy, plain old memory
> operands, etc) access faulty pmem.  This means that there's no extable
> entry around.  If we actually try to recover, we have a few problems:
>
>   - We can't sanely skip the instruction without causing random errors.
>
>   - If the access was through the kernel direct map, then we could plausibly
> remap a different page in place of the faulty page.  The problem is that,
> if the page is *writable* and we share it between more than one faulty
> page, then we're enabling a giant information leak.  But we still need to
> figure out how we're supposed to invalidate the old mapping from a random,
> potentially atomic context.
>
>   - If the access is through kmap or similar, then we're talking about
> modifying a PTE out from under kernel code that really isn't expecting us
> to modify it.
>
>   - How are we supposed to signal the process or fail a syscall?  The fault
> could have come from interrupt context, softirq context, kernel thread
> context, etc, and figuring out who's to blame seems quite awkward and
> fragile.
>
> All that being said, I suspect that we still have issues even with accesses
> to user VAs that are protected by extable entries.  The whole #MC mechanism
> is a supremely shitty interface for recoverable errors (especially on
> Intel), and I'm a bit scared of what happens if the offending access is,
> say, inside a perf NMI.
>
> Dan, is there any chance you could put some pressure on the architecture
> folks to invent an entirely new, less shitty way to tell the OS about
> recoverable memory errors?  And to make it testable by normal people?
> Needing big metal EINJ hardware to test the house of cards that is #MC is
> just awful and means that there are few enough kernel developers that are
> actually able to test that I can probably count them on one hand.  And I'm
> not one of them...

I feel this testing pain too. The EINJ facility is not ubiquitous, which
is why I punted and wrote patch 6 to unit test this. You're right that
this does not scale to all the potential places we'd like to be able to
safely handle memory errors in the kernel.
