linux-fsdevel.vger.kernel.org archive mirror
* [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages
@ 2018-06-03  5:22 Dan Williams
  2018-06-03  5:22 ` [PATCH v2 01/11] device-dax: Convert to vmf_insert_mixed and vm_fault_t Dan Williams
                   ` (11 more replies)
  0 siblings, 12 replies; 32+ messages in thread
From: Dan Williams @ 2018-06-03  5:22 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: linux-edac, Tony Luck, Borislav Petkov, Jérôme Glisse,
	Jan Kara, H. Peter Anvin, x86, Thomas Gleixner,
	Christoph Hellwig, Ross Zwisler, Matthew Wilcox, Ingo Molnar,
	Michal Hocko, Naoya Horiguchi, Souptick Joarder, linux-mm,
	linux-fsdevel, jack

Changes since v1 [1]:
* Rework the locking to avoid lock_page(); instead use a combination of
  rcu_read_lock(), xa_lock_irq(&mapping->pages), and igrab() to validate
  that dax pages are still associated with the given mapping, and to
  prevent the address_space from being freed while memory_failure() is
  busy. (Jan)

* Fix use of MF_COUNT_INCREASED in madvise_inject_error() to account for
  the case where the injected error is a dax mapping and the pinned
  reference needs to be dropped. (Naoya)

* Clarify with a comment that VM_FAULT_NOPAGE may not always indicate a
  mapping of the storage capacity; it could also indicate the zero page.
  (Jan)

[1]: https://lists.01.org/pipermail/linux-nvdimm/2018-May/015932.html

---

As it stands, memory_failure() gets thoroughly confused by dev_pagemap
backed mappings. The recovery code has specific enabling for several
possible page states and needs new enabling to handle poison in dax
mappings.

In order to support reliable reverse mapping of user space addresses:

1/ Add new locking in the memory_failure() rmap path to prevent races
that would typically be handled by the page lock.

2/ Since dev_pagemap pages are hidden from the page allocator and the
"compound page" accounting machinery, add a mechanism to determine the
size of the mapping that encompasses a given poisoned pfn.

3/ Given that pmem errors can be repaired, change the speculative-access
poison protection, mce_unmap_kpfn(), to be reversible and otherwise
allow ongoing access from the kernel.

---

Dan Williams (11):
      device-dax: Convert to vmf_insert_mixed and vm_fault_t
      device-dax: Cleanup vm_fault de-reference chains
      device-dax: Enable page_mapping()
      device-dax: Set page->index
      filesystem-dax: Set page->index
      mm, madvise_inject_error: Let memory_failure() optionally take a page reference
      x86, memory_failure: Introduce {set,clear}_mce_nospec()
      mm, memory_failure: Pass page size to kill_proc()
      mm, memory_failure: Fix page->mapping assumptions relative to the page lock
      mm, memory_failure: Teach memory_failure() about dev_pagemap pages
      libnvdimm, pmem: Restore page attributes when clearing errors


 arch/x86/include/asm/set_memory.h         |   29 ++++
 arch/x86/kernel/cpu/mcheck/mce-internal.h |   15 --
 arch/x86/kernel/cpu/mcheck/mce.c          |   38 -----
 drivers/dax/device.c                      |   97 ++++++++-----
 drivers/nvdimm/pmem.c                     |   26 ++++
 drivers/nvdimm/pmem.h                     |   13 ++
 fs/dax.c                                  |   16 ++
 include/linux/huge_mm.h                   |    5 -
 include/linux/mm.h                        |    1 
 include/linux/set_memory.h                |   14 ++
 mm/huge_memory.c                          |    4 -
 mm/madvise.c                              |   18 ++
 mm/memory-failure.c                       |  209 ++++++++++++++++++++++++++---
 13 files changed, 366 insertions(+), 119 deletions(-)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 01/11] device-dax: Convert to vmf_insert_mixed and vm_fault_t
  2018-06-03  5:22 [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages Dan Williams
@ 2018-06-03  5:22 ` Dan Williams
  2018-06-03  5:22 ` [PATCH v2 02/11] device-dax: Cleanup vm_fault de-reference chains Dan Williams
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Dan Williams @ 2018-06-03  5:22 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Souptick Joarder, Matthew Wilcox, Ross Zwisler, hch, linux-mm,
	linux-fsdevel, jack

Use the new return type vm_fault_t for the fault and huge_fault
handlers. For now, this is just documenting that the function returns a
VM_FAULT value rather than an errno. Once all instances are converted,
vm_fault_t will become a distinct type. See commit 1c8f422059ae ("mm:
change return type to vm_fault_t").

Previously vm_insert_mixed() returned an error code which the driver
mapped into a VM_FAULT_* value. The new vmf_insert_mixed() removes this
translation by returning a VM_FAULT_* value directly.

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/device.c    |   26 +++++++++++---------------
 include/linux/huge_mm.h |    5 +++--
 mm/huge_memory.c        |    4 ++--
 3 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index aff2c1594220..d44d98c54d0f 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -244,11 +244,11 @@ __weak phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff,
 	return -1;
 }
 
-static int __dev_dax_pte_fault(struct dev_dax *dev_dax, struct vm_fault *vmf)
+static vm_fault_t __dev_dax_pte_fault(struct dev_dax *dev_dax,
+				struct vm_fault *vmf)
 {
 	struct device *dev = &dev_dax->dev;
 	struct dax_region *dax_region;
-	int rc = VM_FAULT_SIGBUS;
 	phys_addr_t phys;
 	pfn_t pfn;
 	unsigned int fault_size = PAGE_SIZE;
@@ -274,17 +274,11 @@ static int __dev_dax_pte_fault(struct dev_dax *dev_dax, struct vm_fault *vmf)
 
 	pfn = phys_to_pfn_t(phys, dax_region->pfn_flags);
 
-	rc = vm_insert_mixed(vmf->vma, vmf->address, pfn);
-
-	if (rc == -ENOMEM)
-		return VM_FAULT_OOM;
-	if (rc < 0 && rc != -EBUSY)
-		return VM_FAULT_SIGBUS;
-
-	return VM_FAULT_NOPAGE;
+	return vmf_insert_mixed(vmf->vma, vmf->address, pfn);
 }
 
-static int __dev_dax_pmd_fault(struct dev_dax *dev_dax, struct vm_fault *vmf)
+static vm_fault_t __dev_dax_pmd_fault(struct dev_dax *dev_dax,
+				struct vm_fault *vmf)
 {
 	unsigned long pmd_addr = vmf->address & PMD_MASK;
 	struct device *dev = &dev_dax->dev;
@@ -334,7 +328,8 @@ static int __dev_dax_pmd_fault(struct dev_dax *dev_dax, struct vm_fault *vmf)
 }
 
 #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
-static int __dev_dax_pud_fault(struct dev_dax *dev_dax, struct vm_fault *vmf)
+static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax,
+				struct vm_fault *vmf)
 {
 	unsigned long pud_addr = vmf->address & PUD_MASK;
 	struct device *dev = &dev_dax->dev;
@@ -384,13 +379,14 @@ static int __dev_dax_pud_fault(struct dev_dax *dev_dax, struct vm_fault *vmf)
 			vmf->flags & FAULT_FLAG_WRITE);
 }
 #else
-static int __dev_dax_pud_fault(struct dev_dax *dev_dax, struct vm_fault *vmf)
+static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax,
+				struct vm_fault *vmf)
 {
 	return VM_FAULT_FALLBACK;
 }
 #endif /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
 
-static int dev_dax_huge_fault(struct vm_fault *vmf,
+static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
 		enum page_entry_size pe_size)
 {
 	int rc, id;
@@ -420,7 +416,7 @@ static int dev_dax_huge_fault(struct vm_fault *vmf,
 	return rc;
 }
 
-static int dev_dax_fault(struct vm_fault *vmf)
+static vm_fault_t dev_dax_fault(struct vm_fault *vmf)
 {
 	return dev_dax_huge_fault(vmf, PE_SIZE_PTE);
 }
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index a8a126259bc4..d3bbf6bea9e9 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -3,6 +3,7 @@
 #define _LINUX_HUGE_MM_H
 
 #include <linux/sched/coredump.h>
+#include <linux/mm_types.h>
 
 #include <linux/fs.h> /* only for vma_is_dax() */
 
@@ -46,9 +47,9 @@ extern bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 extern int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 			unsigned long addr, pgprot_t newprot,
 			int prot_numa);
-int vmf_insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
+vm_fault_t vmf_insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
 			pmd_t *pmd, pfn_t pfn, bool write);
-int vmf_insert_pfn_pud(struct vm_area_struct *vma, unsigned long addr,
+vm_fault_t vmf_insert_pfn_pud(struct vm_area_struct *vma, unsigned long addr,
 			pud_t *pud, pfn_t pfn, bool write);
 enum transparent_hugepage_flag {
 	TRANSPARENT_HUGEPAGE_FLAG,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a3a1815f8e11..6af976472a5d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -755,7 +755,7 @@ static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
 	spin_unlock(ptl);
 }
 
-int vmf_insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
+vm_fault_t vmf_insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
 			pmd_t *pmd, pfn_t pfn, bool write)
 {
 	pgprot_t pgprot = vma->vm_page_prot;
@@ -815,7 +815,7 @@ static void insert_pfn_pud(struct vm_area_struct *vma, unsigned long addr,
 	spin_unlock(ptl);
 }
 
-int vmf_insert_pfn_pud(struct vm_area_struct *vma, unsigned long addr,
+vm_fault_t vmf_insert_pfn_pud(struct vm_area_struct *vma, unsigned long addr,
 			pud_t *pud, pfn_t pfn, bool write)
 {
 	pgprot_t pgprot = vma->vm_page_prot;


* [PATCH v2 02/11] device-dax: Cleanup vm_fault de-reference chains
  2018-06-03  5:22 [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages Dan Williams
  2018-06-03  5:22 ` [PATCH v2 01/11] device-dax: Convert to vmf_insert_mixed and vm_fault_t Dan Williams
@ 2018-06-03  5:22 ` Dan Williams
  2018-06-03  5:22 ` [PATCH v2 03/11] device-dax: Enable page_mapping() Dan Williams
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Dan Williams @ 2018-06-03  5:22 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: hch, linux-mm, linux-fsdevel, jack

Define a local 'vma' variable rather than repeatedly dereferencing the
passed-in 'struct vm_fault *' instance.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/device.c |   30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index d44d98c54d0f..686de08e120b 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -247,13 +247,14 @@ __weak phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff,
 static vm_fault_t __dev_dax_pte_fault(struct dev_dax *dev_dax,
 				struct vm_fault *vmf)
 {
+	struct vm_area_struct *vma = vmf->vma;
 	struct device *dev = &dev_dax->dev;
 	struct dax_region *dax_region;
 	phys_addr_t phys;
 	pfn_t pfn;
 	unsigned int fault_size = PAGE_SIZE;
 
-	if (check_vma(dev_dax, vmf->vma, __func__))
+	if (check_vma(dev_dax, vma, __func__))
 		return VM_FAULT_SIGBUS;
 
 	dax_region = dev_dax->region;
@@ -274,13 +275,14 @@ static vm_fault_t __dev_dax_pte_fault(struct dev_dax *dev_dax,
 
 	pfn = phys_to_pfn_t(phys, dax_region->pfn_flags);
 
-	return vmf_insert_mixed(vmf->vma, vmf->address, pfn);
+	return vmf_insert_mixed(vma, vmf->address, pfn);
 }
 
 static vm_fault_t __dev_dax_pmd_fault(struct dev_dax *dev_dax,
 				struct vm_fault *vmf)
 {
 	unsigned long pmd_addr = vmf->address & PMD_MASK;
+	struct vm_area_struct *vma = vmf->vma;
 	struct device *dev = &dev_dax->dev;
 	struct dax_region *dax_region;
 	phys_addr_t phys;
@@ -288,7 +290,7 @@ static vm_fault_t __dev_dax_pmd_fault(struct dev_dax *dev_dax,
 	pfn_t pfn;
 	unsigned int fault_size = PMD_SIZE;
 
-	if (check_vma(dev_dax, vmf->vma, __func__))
+	if (check_vma(dev_dax, vma, __func__))
 		return VM_FAULT_SIGBUS;
 
 	dax_region = dev_dax->region;
@@ -310,11 +312,10 @@ static vm_fault_t __dev_dax_pmd_fault(struct dev_dax *dev_dax,
 		return VM_FAULT_FALLBACK;
 
 	/* if we are outside of the VMA */
-	if (pmd_addr < vmf->vma->vm_start ||
-			(pmd_addr + PMD_SIZE) > vmf->vma->vm_end)
+	if (pmd_addr < vma->vm_start || (pmd_addr + PMD_SIZE) > vma->vm_end)
 		return VM_FAULT_SIGBUS;
 
-	pgoff = linear_page_index(vmf->vma, pmd_addr);
+	pgoff = linear_page_index(vma, pmd_addr);
 	phys = dax_pgoff_to_phys(dev_dax, pgoff, PMD_SIZE);
 	if (phys == -1) {
 		dev_dbg(dev, "pgoff_to_phys(%#lx) failed\n", pgoff);
@@ -323,7 +324,7 @@ static vm_fault_t __dev_dax_pmd_fault(struct dev_dax *dev_dax,
 
 	pfn = phys_to_pfn_t(phys, dax_region->pfn_flags);
 
-	return vmf_insert_pfn_pmd(vmf->vma, vmf->address, vmf->pmd, pfn,
+	return vmf_insert_pfn_pmd(vma, vmf->address, vmf->pmd, pfn,
 			vmf->flags & FAULT_FLAG_WRITE);
 }
 
@@ -332,6 +333,7 @@ static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax,
 				struct vm_fault *vmf)
 {
 	unsigned long pud_addr = vmf->address & PUD_MASK;
+	struct vm_area_struct *vma = vmf->vma;
 	struct device *dev = &dev_dax->dev;
 	struct dax_region *dax_region;
 	phys_addr_t phys;
@@ -340,7 +342,7 @@ static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax,
 	unsigned int fault_size = PUD_SIZE;
 
 
-	if (check_vma(dev_dax, vmf->vma, __func__))
+	if (check_vma(dev_dax, vma, __func__))
 		return VM_FAULT_SIGBUS;
 
 	dax_region = dev_dax->region;
@@ -362,11 +364,10 @@ static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax,
 		return VM_FAULT_FALLBACK;
 
 	/* if we are outside of the VMA */
-	if (pud_addr < vmf->vma->vm_start ||
-			(pud_addr + PUD_SIZE) > vmf->vma->vm_end)
+	if (pud_addr < vma->vm_start || (pud_addr + PUD_SIZE) > vma->vm_end)
 		return VM_FAULT_SIGBUS;
 
-	pgoff = linear_page_index(vmf->vma, pud_addr);
+	pgoff = linear_page_index(vma, pud_addr);
 	phys = dax_pgoff_to_phys(dev_dax, pgoff, PUD_SIZE);
 	if (phys == -1) {
 		dev_dbg(dev, "pgoff_to_phys(%#lx) failed\n", pgoff);
@@ -375,7 +376,7 @@ static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax,
 
 	pfn = phys_to_pfn_t(phys, dax_region->pfn_flags);
 
-	return vmf_insert_pfn_pud(vmf->vma, vmf->address, vmf->pud, pfn,
+	return vmf_insert_pfn_pud(vma, vmf->address, vmf->pud, pfn,
 			vmf->flags & FAULT_FLAG_WRITE);
 }
 #else
@@ -390,12 +391,13 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
 		enum page_entry_size pe_size)
 {
 	int rc, id;
-	struct file *filp = vmf->vma->vm_file;
+	struct vm_area_struct *vma = vmf->vma;
+	struct file *filp = vma->vm_file;
 	struct dev_dax *dev_dax = filp->private_data;
 
 	dev_dbg(&dev_dax->dev, "%s: %s (%#lx - %#lx) size = %d\n", current->comm,
 			(vmf->flags & FAULT_FLAG_WRITE) ? "write" : "read",
-			vmf->vma->vm_start, vmf->vma->vm_end, pe_size);
+			vma->vm_start, vma->vm_end, pe_size);
 
 	id = dax_read_lock();
 	switch (pe_size) {


* [PATCH v2 03/11] device-dax: Enable page_mapping()
  2018-06-03  5:22 [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages Dan Williams
  2018-06-03  5:22 ` [PATCH v2 01/11] device-dax: Convert to vmf_insert_mixed and vm_fault_t Dan Williams
  2018-06-03  5:22 ` [PATCH v2 02/11] device-dax: Cleanup vm_fault de-reference chains Dan Williams
@ 2018-06-03  5:22 ` Dan Williams
  2018-06-03  5:23 ` [PATCH v2 04/11] device-dax: Set page->index Dan Williams
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Dan Williams @ 2018-06-03  5:22 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: hch, linux-mm, linux-fsdevel, jack

In support of enabling memory_failure() handling for device-dax
mappings, set the ->mapping association of pages backing device-dax
mappings. The rmap implementation requires page_mapping() to return the
address_space hosting the vmas that map the page.

The ->mapping pointer is never cleared. There is no possibility for the
page to become associated with another address_space while the device is
enabled. When the device is disabled, the 'struct page' array for the
device is destroyed and later reinitialized to zero.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/device.c |   53 +++++++++++++++++++++++++++++++++++---------------
 1 file changed, 37 insertions(+), 16 deletions(-)

diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 686de08e120b..f7e926f4ec12 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -245,13 +245,12 @@ __weak phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff,
 }
 
 static vm_fault_t __dev_dax_pte_fault(struct dev_dax *dev_dax,
-				struct vm_fault *vmf)
+				struct vm_fault *vmf, pfn_t *pfn)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct device *dev = &dev_dax->dev;
 	struct dax_region *dax_region;
 	phys_addr_t phys;
-	pfn_t pfn;
 	unsigned int fault_size = PAGE_SIZE;
 
 	if (check_vma(dev_dax, vma, __func__))
@@ -273,13 +272,13 @@ static vm_fault_t __dev_dax_pte_fault(struct dev_dax *dev_dax,
 		return VM_FAULT_SIGBUS;
 	}
 
-	pfn = phys_to_pfn_t(phys, dax_region->pfn_flags);
+	*pfn = phys_to_pfn_t(phys, dax_region->pfn_flags);
 
-	return vmf_insert_mixed(vma, vmf->address, pfn);
+	return vmf_insert_mixed(vma, vmf->address, *pfn);
 }
 
 static vm_fault_t __dev_dax_pmd_fault(struct dev_dax *dev_dax,
-				struct vm_fault *vmf)
+				struct vm_fault *vmf, pfn_t *pfn)
 {
 	unsigned long pmd_addr = vmf->address & PMD_MASK;
 	struct vm_area_struct *vma = vmf->vma;
@@ -287,7 +286,6 @@ static vm_fault_t __dev_dax_pmd_fault(struct dev_dax *dev_dax,
 	struct dax_region *dax_region;
 	phys_addr_t phys;
 	pgoff_t pgoff;
-	pfn_t pfn;
 	unsigned int fault_size = PMD_SIZE;
 
 	if (check_vma(dev_dax, vma, __func__))
@@ -322,15 +320,15 @@ static vm_fault_t __dev_dax_pmd_fault(struct dev_dax *dev_dax,
 		return VM_FAULT_SIGBUS;
 	}
 
-	pfn = phys_to_pfn_t(phys, dax_region->pfn_flags);
+	*pfn = phys_to_pfn_t(phys, dax_region->pfn_flags);
 
-	return vmf_insert_pfn_pmd(vma, vmf->address, vmf->pmd, pfn,
+	return vmf_insert_pfn_pmd(vma, vmf->address, vmf->pmd, *pfn,
 			vmf->flags & FAULT_FLAG_WRITE);
 }
 
 #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
 static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax,
-				struct vm_fault *vmf)
+				struct vm_fault *vmf, pfn_t *pfn)
 {
 	unsigned long pud_addr = vmf->address & PUD_MASK;
 	struct vm_area_struct *vma = vmf->vma;
@@ -338,7 +336,6 @@ static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax,
 	struct dax_region *dax_region;
 	phys_addr_t phys;
 	pgoff_t pgoff;
-	pfn_t pfn;
 	unsigned int fault_size = PUD_SIZE;
 
 
@@ -374,9 +371,9 @@ static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax,
 		return VM_FAULT_SIGBUS;
 	}
 
-	pfn = phys_to_pfn_t(phys, dax_region->pfn_flags);
+	*pfn = phys_to_pfn_t(phys, dax_region->pfn_flags);
 
-	return vmf_insert_pfn_pud(vma, vmf->address, vmf->pud, pfn,
+	return vmf_insert_pfn_pud(vma, vmf->address, vmf->pud, *pfn,
 			vmf->flags & FAULT_FLAG_WRITE);
 }
 #else
@@ -390,9 +387,11 @@ static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax,
 static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
 		enum page_entry_size pe_size)
 {
-	int rc, id;
 	struct vm_area_struct *vma = vmf->vma;
 	struct file *filp = vma->vm_file;
+	unsigned long fault_size;
+	int rc, id;
+	pfn_t pfn;
 	struct dev_dax *dev_dax = filp->private_data;
 
 	dev_dbg(&dev_dax->dev, "%s: %s (%#lx - %#lx) size = %d\n", current->comm,
@@ -402,17 +401,39 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
 	id = dax_read_lock();
 	switch (pe_size) {
 	case PE_SIZE_PTE:
-		rc = __dev_dax_pte_fault(dev_dax, vmf);
+		fault_size = PAGE_SIZE;
+		rc = __dev_dax_pte_fault(dev_dax, vmf, &pfn);
 		break;
 	case PE_SIZE_PMD:
-		rc = __dev_dax_pmd_fault(dev_dax, vmf);
+		fault_size = PMD_SIZE;
+		rc = __dev_dax_pmd_fault(dev_dax, vmf, &pfn);
 		break;
 	case PE_SIZE_PUD:
-		rc = __dev_dax_pud_fault(dev_dax, vmf);
+		fault_size = PUD_SIZE;
+		rc = __dev_dax_pud_fault(dev_dax, vmf, &pfn);
 		break;
 	default:
 		rc = VM_FAULT_SIGBUS;
 	}
+
+	if (rc == VM_FAULT_NOPAGE) {
+		unsigned long i;
+
+		/*
+		 * In the device-dax case the only possibility for a
+		 * VM_FAULT_NOPAGE result is when device-dax capacity is
+		 * mapped. No need to consider the zero page, or racing
+		 * conflicting mappings.
+		 */
+		for (i = 0; i < fault_size / PAGE_SIZE; i++) {
+			struct page *page;
+
+			page = pfn_to_page(pfn_t_to_pfn(pfn) + i);
+			if (page->mapping)
+				continue;
+			page->mapping = filp->f_mapping;
+		}
+	}
 	dax_read_unlock(id);
 
 	return rc;


* [PATCH v2 04/11] device-dax: Set page->index
  2018-06-03  5:22 [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages Dan Williams
                   ` (2 preceding siblings ...)
  2018-06-03  5:22 ` [PATCH v2 03/11] device-dax: Enable page_mapping() Dan Williams
@ 2018-06-03  5:23 ` Dan Williams
  2018-06-03  5:23 ` [PATCH v2 05/11] filesystem-dax: " Dan Williams
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Dan Williams @ 2018-06-03  5:23 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: hch, linux-mm, linux-fsdevel, jack

In support of enabling memory_failure() handling for device-dax
mappings, set ->index to the pgoff of the page. The rmap implementation
requires ->index to bound the search through the vma interval tree.

The ->index value is never cleared. There is no possibility for the
page to become associated with another pgoff while the device is
enabled. When the device is disabled the 'struct page' array for the
device is destroyed and ->index is reinitialized to zero.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/device.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index f7e926f4ec12..499299e36ee2 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -418,6 +418,7 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
 
 	if (rc == VM_FAULT_NOPAGE) {
 		unsigned long i;
+		pgoff_t pgoff;
 
 		/*
 		 * In the device-dax case the only possibility for a
@@ -425,6 +426,8 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
 		 * mapped. No need to consider the zero page, or racing
 		 * conflicting mappings.
 		 */
+		pgoff = linear_page_index(vma, vmf->address
+				& ~(fault_size - 1));
 		for (i = 0; i < fault_size / PAGE_SIZE; i++) {
 			struct page *page;
 
@@ -432,6 +435,7 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
 			if (page->mapping)
 				continue;
 			page->mapping = filp->f_mapping;
+			page->index = pgoff + i;
 		}
 	}
 	dax_read_unlock(id);


* [PATCH v2 05/11] filesystem-dax: Set page->index
  2018-06-03  5:22 [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages Dan Williams
                   ` (3 preceding siblings ...)
  2018-06-03  5:23 ` [PATCH v2 04/11] device-dax: Set page->index Dan Williams
@ 2018-06-03  5:23 ` Dan Williams
  2018-06-03  5:23 ` [PATCH v2 06/11] mm, madvise_inject_error: Let memory_failure() optionally take a page reference Dan Williams
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Dan Williams @ 2018-06-03  5:23 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Jan Kara, Christoph Hellwig, Matthew Wilcox, Ross Zwisler,
	linux-mm, linux-fsdevel, jack

In support of enabling memory_failure() handling for filesystem-dax
mappings, set ->index to the pgoff of the page. The rmap implementation
requires ->index to bound the search through the vma interval tree. The
index is set and cleared at dax_associate_entry() and
dax_disassociate_entry() time respectively.

Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/dax.c |   16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index aaec72ded1b6..cccf6cad1a7a 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -319,18 +319,27 @@ static unsigned long dax_radix_end_pfn(void *entry)
 	for (pfn = dax_radix_pfn(entry); \
 			pfn < dax_radix_end_pfn(entry); pfn++)
 
-static void dax_associate_entry(void *entry, struct address_space *mapping)
+/*
+ * TODO: for reflink+dax we need a way to associate a single page with
+ * multiple address_space instances at different linear_page_index()
+ * offsets.
+ */
+static void dax_associate_entry(void *entry, struct address_space *mapping,
+		struct vm_area_struct *vma, unsigned long address)
 {
-	unsigned long pfn;
+	unsigned long size = dax_entry_size(entry), pfn, index;
+	int i = 0;
 
 	if (IS_ENABLED(CONFIG_FS_DAX_LIMITED))
 		return;
 
+	index = linear_page_index(vma, address & ~(size - 1));
 	for_each_mapped_pfn(entry, pfn) {
 		struct page *page = pfn_to_page(pfn);
 
 		WARN_ON_ONCE(page->mapping);
 		page->mapping = mapping;
+		page->index = index + i++;
 	}
 }
 
@@ -348,6 +357,7 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping,
 		WARN_ON_ONCE(trunc && page_ref_count(page) > 1);
 		WARN_ON_ONCE(page->mapping && page->mapping != mapping);
 		page->mapping = NULL;
+		page->index = 0;
 	}
 }
 
@@ -604,7 +614,7 @@ static void *dax_insert_mapping_entry(struct address_space *mapping,
 	new_entry = dax_radix_locked_entry(pfn, flags);
 	if (dax_entry_size(entry) != dax_entry_size(new_entry)) {
 		dax_disassociate_entry(entry, mapping, false);
-		dax_associate_entry(new_entry, mapping);
+		dax_associate_entry(new_entry, mapping, vmf->vma, vmf->address);
 	}
 
 	if (dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) {


* [PATCH v2 06/11] mm, madvise_inject_error: Let memory_failure() optionally take a page reference
  2018-06-03  5:22 [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages Dan Williams
                   ` (4 preceding siblings ...)
  2018-06-03  5:23 ` [PATCH v2 05/11] filesystem-dax: " Dan Williams
@ 2018-06-03  5:23 ` Dan Williams
  2018-06-03  5:23 ` [PATCH v2 07/11] x86, memory_failure: Introduce {set, clear}_mce_nospec() Dan Williams
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Dan Williams @ 2018-06-03  5:23 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Michal Hocko, Naoya Horiguchi, hch, linux-mm, linux-fsdevel, jack

The madvise_inject_error() routine uses get_user_pages_fast() to look up
the pfn and other information for the injected error, but it does not
release that pin. The assumption is that failed pages should be taken out of
circulation.

However, for dax mappings it is not possible to take pages out of
circulation since they are 1:1 physically mapped as filesystem blocks,
or device-dax capacity. They also typically represent persistent memory
which has an error clearing capability.

In preparation for adding a special handler for dax mappings, shift the
responsibility of taking the page reference to memory_failure(). I.e.
drop the page reference and do not specify MF_COUNT_INCREASED to
memory_failure().

Cc: Michal Hocko <mhocko@suse.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 mm/madvise.c |   18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 4d3c922ea1a1..b731933dddae 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -631,11 +631,13 @@ static int madvise_inject_error(int behavior,
 
 
 	for (; start < end; start += PAGE_SIZE << order) {
+		unsigned long pfn;
 		int ret;
 
 		ret = get_user_pages_fast(start, 1, 0, &page);
 		if (ret != 1)
 			return ret;
+		pfn = page_to_pfn(page);
 
 		/*
 		 * When soft offlining hugepages, after migrating the page
@@ -651,17 +653,27 @@ static int madvise_inject_error(int behavior,
 
 		if (behavior == MADV_SOFT_OFFLINE) {
 			pr_info("Soft offlining pfn %#lx at process virtual address %#lx\n",
-						page_to_pfn(page), start);
+					pfn, start);
 
 			ret = soft_offline_page(page, MF_COUNT_INCREASED);
 			if (ret)
 				return ret;
 			continue;
 		}
+
 		pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
-						page_to_pfn(page), start);
+				pfn, start);
+
+		ret = memory_failure(pfn, 0);
+
+		/*
+		 * Drop the page reference taken by get_user_pages_fast(). In
+		 * the absence of MF_COUNT_INCREASED the memory_failure()
+		 * routine is responsible for pinning the page to prevent it
+		 * from being released back to the page allocator.
+		 */
+		put_page(page);
 
-		ret = memory_failure(page_to_pfn(page), MF_COUNT_INCREASED);
 		if (ret)
 			return ret;
 	}


* [PATCH v2 07/11] x86, memory_failure: Introduce {set, clear}_mce_nospec()
  2018-06-03  5:22 [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages Dan Williams
                   ` (5 preceding siblings ...)
  2018-06-03  5:23 ` [PATCH v2 06/11] mm, madvise_inject_error: Let memory_failure() optionally take a page reference Dan Williams
@ 2018-06-03  5:23 ` Dan Williams
  2018-06-04 17:08   ` Luck, Tony
  2018-06-03  5:23 ` [PATCH v2 08/11] mm, memory_failure: Pass page size to kill_proc() Dan Williams
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 32+ messages in thread
From: Dan Williams @ 2018-06-03  5:23 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Borislav Petkov, linux-edac, x86, hch, linux-mm, linux-fsdevel,
	jack

Currently memory_failure() returns zero if the error was handled. On
that result mce_unmap_kpfn() is called to zap the page out of the kernel
linear mapping to prevent speculative fetches of potentially poisoned
memory. However, in the case of dax-mapped devmap pages the page may be
in active permanent use by the device driver, so it cannot be unmapped
from the kernel.

Instead of marking the page not present, marking the page UC should
be sufficient for preventing poison from being pre-fetched into the
cache. Convert mce_unmap_kpfn() to set_mce_nospec(), remapping the page
as UC to hide it from speculative accesses.

Given that persistent memory errors can be cleared by the driver,
include a facility to restore the page to cacheable operation,
clear_mce_nospec().

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: <linux-edac@vger.kernel.org>
Cc: <x86@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/include/asm/set_memory.h         |   29 ++++++++++++++++++++++
 arch/x86/kernel/cpu/mcheck/mce-internal.h |   15 -----------
 arch/x86/kernel/cpu/mcheck/mce.c          |   38 ++---------------------------
 include/linux/set_memory.h                |   14 +++++++++++
 4 files changed, 46 insertions(+), 50 deletions(-)

diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index bd090367236c..debc1fee1457 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -88,4 +88,33 @@ extern int kernel_set_to_readonly;
 void set_kernel_text_rw(void);
 void set_kernel_text_ro(void);
 
+#ifdef CONFIG_X86_64
+/*
+ * Mark the linear address as UC to disable speculative pre-fetches into
+ * potentially poisoned memory.
+ */
+static inline int set_mce_nospec(unsigned long pfn)
+{
+	int rc;
+
+	rc = set_memory_uc((unsigned long) __va(PFN_PHYS(pfn)), 1);
+	if (rc)
+		pr_warn("Could not invalidate pfn=0x%lx from 1:1 map\n", pfn);
+	return rc;
+}
+#define set_mce_nospec set_mce_nospec
+
+/* Restore full speculative operation to the pfn. */
+static inline int clear_mce_nospec(unsigned long pfn)
+{
+	return set_memory_wb((unsigned long) __va(PFN_PHYS(pfn)), 1);
+}
+#define clear_mce_nospec clear_mce_nospec
+#else
+/*
+ * Few people would run a 32-bit kernel on a machine that supports
+ * recoverable errors because they have too much memory to boot 32-bit.
+ */
+#endif
+
 #endif /* _ASM_X86_SET_MEMORY_H */
diff --git a/arch/x86/kernel/cpu/mcheck/mce-internal.h b/arch/x86/kernel/cpu/mcheck/mce-internal.h
index 374d1aa66952..ceb67cd5918f 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-internal.h
+++ b/arch/x86/kernel/cpu/mcheck/mce-internal.h
@@ -113,21 +113,6 @@ static inline void mce_register_injector_chain(struct notifier_block *nb)	{ }
 static inline void mce_unregister_injector_chain(struct notifier_block *nb)	{ }
 #endif
 
-#ifndef CONFIG_X86_64
-/*
- * On 32-bit systems it would be difficult to safely unmap a poison page
- * from the kernel 1:1 map because there are no non-canonical addresses that
- * we can use to refer to the address without risking a speculative access.
- * However, this isn't much of an issue because:
- * 1) Few unmappable pages are in the 1:1 map. Most are in HIGHMEM which
- *    are only mapped into the kernel as needed
- * 2) Few people would run a 32-bit kernel on a machine that supports
- *    recoverable errors because they have too much memory to boot 32-bit.
- */
-static inline void mce_unmap_kpfn(unsigned long pfn) {}
-#define mce_unmap_kpfn mce_unmap_kpfn
-#endif
-
 struct mca_config {
 	bool dont_log_ce;
 	bool cmci_disabled;
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 42cf2880d0ed..a0fbf0a8b7e6 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -42,6 +42,7 @@
 #include <linux/irq_work.h>
 #include <linux/export.h>
 #include <linux/jump_label.h>
+#include <linux/set_memory.h>
 
 #include <asm/intel-family.h>
 #include <asm/processor.h>
@@ -50,7 +51,6 @@
 #include <asm/mce.h>
 #include <asm/msr.h>
 #include <asm/reboot.h>
-#include <asm/set_memory.h>
 
 #include "mce-internal.h"
 
@@ -108,10 +108,6 @@ static struct irq_work mce_irq_work;
 
 static void (*quirk_no_way_out)(int bank, struct mce *m, struct pt_regs *regs);
 
-#ifndef mce_unmap_kpfn
-static void mce_unmap_kpfn(unsigned long pfn);
-#endif
-
 /*
  * CPU/chipset specific EDAC code can register a notifier call here to print
  * MCE errors in a human-readable form.
@@ -602,7 +598,7 @@ static int srao_decode_notifier(struct notifier_block *nb, unsigned long val,
 	if (mce_usable_address(mce) && (mce->severity == MCE_AO_SEVERITY)) {
 		pfn = mce->addr >> PAGE_SHIFT;
 		if (!memory_failure(pfn, 0))
-			mce_unmap_kpfn(pfn);
+			set_mce_nospec(pfn);
 	}
 
 	return NOTIFY_OK;
@@ -1070,38 +1066,10 @@ static int do_memory_failure(struct mce *m)
 	if (ret)
 		pr_err("Memory error not recovered");
 	else
-		mce_unmap_kpfn(m->addr >> PAGE_SHIFT);
+		set_mce_nospec(m->addr >> PAGE_SHIFT);
 	return ret;
 }
 
-#ifndef mce_unmap_kpfn
-static void mce_unmap_kpfn(unsigned long pfn)
-{
-	unsigned long decoy_addr;
-
-	/*
-	 * Unmap this page from the kernel 1:1 mappings to make sure
-	 * we don't log more errors because of speculative access to
-	 * the page.
-	 * We would like to just call:
-	 *	set_memory_np((unsigned long)pfn_to_kaddr(pfn), 1);
-	 * but doing that would radically increase the odds of a
-	 * speculative access to the poison page because we'd have
-	 * the virtual address of the kernel 1:1 mapping sitting
-	 * around in registers.
-	 * Instead we get tricky.  We create a non-canonical address
-	 * that looks just like the one we want, but has bit 63 flipped.
-	 * This relies on set_memory_np() not checking whether we passed
-	 * a legal address.
-	 */
-
-	decoy_addr = (pfn << PAGE_SHIFT) + (PAGE_OFFSET ^ BIT(63));
-
-	if (set_memory_np(decoy_addr, 1))
-		pr_warn("Could not invalidate pfn=0x%lx from 1:1 map\n", pfn);
-}
-#endif
-
 /*
  * The actual machine check handler. This only handles real
  * exceptions when something got corrupted coming in through int 18.
diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
index da5178216da5..2a986d282a97 100644
--- a/include/linux/set_memory.h
+++ b/include/linux/set_memory.h
@@ -17,6 +17,20 @@ static inline int set_memory_x(unsigned long addr,  int numpages) { return 0; }
 static inline int set_memory_nx(unsigned long addr, int numpages) { return 0; }
 #endif
 
+#ifndef set_mce_nospec
+static inline int set_mce_nospec(unsigned long pfn)
+{
+	return 0;
+}
+#endif
+
+#ifndef clear_mce_nospec
+static inline int clear_mce_nospec(unsigned long pfn)
+{
+	return 0;
+}
+#endif
+
 #ifndef CONFIG_ARCH_HAS_MEM_ENCRYPT
 static inline int set_memory_encrypted(unsigned long addr, int numpages)
 {

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 08/11] mm, memory_failure: Pass page size to kill_proc()
  2018-06-03  5:22 [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages Dan Williams
                   ` (6 preceding siblings ...)
  2018-06-03  5:23 ` [PATCH v2 07/11] x86, memory_failure: Introduce {set, clear}_mce_nospec() Dan Williams
@ 2018-06-03  5:23 ` Dan Williams
  2018-06-03  5:23 ` [PATCH v2 09/11] mm, memory_failure: Fix page->mapping assumptions relative to the page lock Dan Williams
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Dan Williams @ 2018-06-03  5:23 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: Naoya Horiguchi, hch, linux-mm, linux-fsdevel, jack

Given that ZONE_DEVICE / dev_pagemap pages are never assembled into
compound pages, the size determination logic in kill_proc() needs
updating for the dev_pagemap case. In preparation for dev_pagemap
support, rework memory_failure() and kill_proc() to pass / consume the
page size explicitly.

Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 mm/memory-failure.c |   16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 9d142b9b86dc..42a193ee14d3 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -179,18 +179,16 @@ EXPORT_SYMBOL_GPL(hwpoison_filter);
  * ``action required'' if error happened in current execution context
  */
 static int kill_proc(struct task_struct *t, unsigned long addr,
-			unsigned long pfn, struct page *page, int flags)
+			unsigned long pfn, unsigned size_shift, int flags)
 {
-	short addr_lsb;
 	int ret;
 
 	pr_err("Memory failure: %#lx: Killing %s:%d due to hardware memory corruption\n",
 		pfn, t->comm, t->pid);
-	addr_lsb = compound_order(compound_head(page)) + PAGE_SHIFT;
 
 	if ((flags & MF_ACTION_REQUIRED) && t->mm == current->mm) {
 		ret = force_sig_mceerr(BUS_MCEERR_AR, (void __user *)addr,
-				       addr_lsb, current);
+				       size_shift, current);
 	} else {
 		/*
 		 * Don't use force here, it's convenient if the signal
@@ -199,7 +197,7 @@ static int kill_proc(struct task_struct *t, unsigned long addr,
 		 * to SIG_IGN, but hopefully no one will do that?
 		 */
 		ret = send_sig_mceerr(BUS_MCEERR_AO, (void __user *)addr,
-				      addr_lsb, t);  /* synchronous? */
+				      size_shift, t);  /* synchronous? */
 	}
 	if (ret < 0)
 		pr_info("Memory failure: Error sending signal to %s:%d: %d\n",
@@ -318,7 +316,7 @@ static void add_to_kill(struct task_struct *tsk, struct page *p,
  * wrong earlier.
  */
 static void kill_procs(struct list_head *to_kill, int forcekill,
-			  bool fail, struct page *page, unsigned long pfn,
+			  bool fail, unsigned size_shift, unsigned long pfn,
 			  int flags)
 {
 	struct to_kill *tk, *next;
@@ -343,7 +341,7 @@ static void kill_procs(struct list_head *to_kill, int forcekill,
 			 * process anyways.
 			 */
 			else if (kill_proc(tk->tsk, tk->addr,
-					      pfn, page, flags) < 0)
+					      pfn, size_shift, flags) < 0)
 				pr_err("Memory failure: %#lx: Cannot send advisory machine check signal to %s:%d\n",
 				       pfn, tk->tsk->comm, tk->tsk->pid);
 		}
@@ -928,6 +926,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
 	struct address_space *mapping;
 	LIST_HEAD(tokill);
 	bool unmap_success;
+	unsigned size_shift;
 	int kill = 1, forcekill;
 	struct page *hpage = *hpagep;
 	bool mlocked = PageMlocked(hpage);
@@ -1012,7 +1011,8 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
 	 * any accesses to the poisoned memory.
 	 */
 	forcekill = PageDirty(hpage) || (flags & MF_MUST_KILL);
-	kill_procs(&tokill, forcekill, !unmap_success, p, pfn, flags);
+	size_shift = compound_order(compound_head(p)) + PAGE_SHIFT;
+	kill_procs(&tokill, forcekill, !unmap_success, size_shift, pfn, flags);
 
 	return unmap_success;
 }


* [PATCH v2 09/11] mm, memory_failure: Fix page->mapping assumptions relative to the page lock
  2018-06-03  5:22 [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages Dan Williams
                   ` (7 preceding siblings ...)
  2018-06-03  5:23 ` [PATCH v2 08/11] mm, memory_failure: Pass page size to kill_proc() Dan Williams
@ 2018-06-03  5:23 ` Dan Williams
  2018-06-03  5:23 ` [PATCH v2 10/11] mm, memory_failure: Teach memory_failure() about dev_pagemap pages Dan Williams
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Dan Williams @ 2018-06-03  5:23 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: hch, linux-mm, linux-fsdevel, jack

The current memory_failure() implementation assumes that lock_page() is
sufficient for stabilizing page->mapping and that ->mapping->host will
not be freed. The dax implementation, on the other hand, relies on
xa_lock_irq() for stabilizing the page->mapping relationship and it is
not possible to hold the lock over current routines in the
memory_failure() path that run under lock_page().

Teach the various memory_failure() helpers to pin the address_space and
revalidate page->mapping under xa_lock_irq(mapping->i_pages).

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 mm/memory-failure.c |   56 +++++++++++++++++++++++++++++++++++----------------
 1 file changed, 38 insertions(+), 18 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 42a193ee14d3..b6efb78ba49b 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -179,12 +179,20 @@ EXPORT_SYMBOL_GPL(hwpoison_filter);
  * ``action required'' if error happened in current execution context
  */
 static int kill_proc(struct task_struct *t, unsigned long addr,
-			unsigned long pfn, unsigned size_shift, int flags)
+		struct address_space *mapping, struct page *page,
+		unsigned size_shift, int flags)
 {
-	int ret;
+	int ret = 0;
+
+	/* revalidate the page before killing the process */
+	xa_lock_irq(&mapping->i_pages);
+	if (page->mapping != mapping) {
+		xa_unlock_irq(&mapping->i_pages);
+		return 0;
+	}
 
 	pr_err("Memory failure: %#lx: Killing %s:%d due to hardware memory corruption\n",
-		pfn, t->comm, t->pid);
+			page_to_pfn(page), t->comm, t->pid);
 
 	if ((flags & MF_ACTION_REQUIRED) && t->mm == current->mm) {
 		ret = force_sig_mceerr(BUS_MCEERR_AR, (void __user *)addr,
@@ -199,6 +207,7 @@ static int kill_proc(struct task_struct *t, unsigned long addr,
 		ret = send_sig_mceerr(BUS_MCEERR_AO, (void __user *)addr,
 				      size_shift, t);  /* synchronous? */
 	}
+	xa_unlock_irq(&mapping->i_pages);
 	if (ret < 0)
 		pr_info("Memory failure: Error sending signal to %s:%d: %d\n",
 			t->comm, t->pid, ret);
@@ -316,8 +325,8 @@ static void add_to_kill(struct task_struct *tsk, struct page *p,
  * wrong earlier.
  */
 static void kill_procs(struct list_head *to_kill, int forcekill,
-			  bool fail, unsigned size_shift, unsigned long pfn,
-			  int flags)
+		bool fail, unsigned size_shift, struct address_space *mapping,
+		struct page *page, int flags)
 {
 	struct to_kill *tk, *next;
 
@@ -330,7 +339,8 @@ static void kill_procs(struct list_head *to_kill, int forcekill,
 			 */
 			if (fail || tk->addr_valid == 0) {
 				pr_err("Memory failure: %#lx: forcibly killing %s:%d because of failure to unmap corrupted page\n",
-				       pfn, tk->tsk->comm, tk->tsk->pid);
+						page_to_pfn(page), tk->tsk->comm,
+						tk->tsk->pid);
 				force_sig(SIGKILL, tk->tsk);
 			}
 
@@ -341,9 +351,10 @@ static void kill_procs(struct list_head *to_kill, int forcekill,
 			 * process anyways.
 			 */
 			else if (kill_proc(tk->tsk, tk->addr,
-					      pfn, size_shift, flags) < 0)
+					      mapping, page, size_shift, flags) < 0)
 				pr_err("Memory failure: %#lx: Cannot send advisory machine check signal to %s:%d\n",
-				       pfn, tk->tsk->comm, tk->tsk->pid);
+						page_to_pfn(page), tk->tsk->comm,
+						tk->tsk->pid);
 		}
 		put_task_struct(tk->tsk);
 		kfree(tk);
@@ -429,21 +440,27 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill,
 /*
  * Collect processes when the error hit a file mapped page.
  */
-static void collect_procs_file(struct page *page, struct list_head *to_kill,
-			      struct to_kill **tkc, int force_early)
+static void collect_procs_file(struct address_space *mapping, struct page *page,
+		struct list_head *to_kill, struct to_kill **tkc,
+		int force_early)
 {
 	struct vm_area_struct *vma;
 	struct task_struct *tsk;
-	struct address_space *mapping = page->mapping;
 
 	i_mmap_lock_read(mapping);
 	read_lock(&tasklist_lock);
 	for_each_process(tsk) {
-		pgoff_t pgoff = page_to_pgoff(page);
+		pgoff_t pgoff;
 		struct task_struct *t = task_early_kill(tsk, force_early);
 
 		if (!t)
 			continue;
+		xa_lock_irq(&mapping->i_pages);
+		if (page->mapping != mapping) {
+			xa_unlock_irq(&mapping->i_pages);
+			break;
+		}
+		pgoff = page_to_pgoff(page);
 		vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff,
 				      pgoff) {
 			/*
@@ -456,6 +473,7 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
 			if (vma->vm_mm == t->mm)
 				add_to_kill(t, page, vma, to_kill, tkc);
 		}
+		xa_unlock_irq(&mapping->i_pages);
 	}
 	read_unlock(&tasklist_lock);
 	i_mmap_unlock_read(mapping);
@@ -467,12 +485,12 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
  * First preallocate one tokill structure outside the spin locks,
  * so that we can kill at least one process reasonably reliable.
  */
-static void collect_procs(struct page *page, struct list_head *tokill,
-				int force_early)
+static void collect_procs(struct address_space *mapping, struct page *page,
+		struct list_head *tokill, int force_early)
 {
 	struct to_kill *tk;
 
-	if (!page->mapping)
+	if (!mapping)
 		return;
 
 	tk = kmalloc(sizeof(struct to_kill), GFP_NOIO);
@@ -481,7 +499,7 @@ static void collect_procs(struct page *page, struct list_head *tokill,
 	if (PageAnon(page))
 		collect_procs_anon(page, tokill, &tk, force_early);
 	else
-		collect_procs_file(page, tokill, &tk, force_early);
+		collect_procs_file(mapping, page, tokill, &tk, force_early);
 	kfree(tk);
 }
 
@@ -986,7 +1004,8 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
 	 * there's nothing that can be done.
 	 */
 	if (kill)
-		collect_procs(hpage, &tokill, flags & MF_ACTION_REQUIRED);
+		collect_procs(mapping, hpage, &tokill,
+				flags & MF_ACTION_REQUIRED);
 
 	unmap_success = try_to_unmap(hpage, ttu);
 	if (!unmap_success)
@@ -1012,7 +1031,8 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
 	 */
 	forcekill = PageDirty(hpage) || (flags & MF_MUST_KILL);
 	size_shift = compound_order(compound_head(p)) + PAGE_SHIFT;
-	kill_procs(&tokill, forcekill, !unmap_success, size_shift, pfn, flags);
+	kill_procs(&tokill, forcekill, !unmap_success, size_shift, mapping,
+			hpage, flags);
 
 	return unmap_success;
 }


* [PATCH v2 10/11] mm, memory_failure: Teach memory_failure() about dev_pagemap pages
  2018-06-03  5:22 [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages Dan Williams
                   ` (8 preceding siblings ...)
  2018-06-03  5:23 ` [PATCH v2 09/11] mm, memory_failure: Fix page->mapping assumptions relative to the page lock Dan Williams
@ 2018-06-03  5:23 ` Dan Williams
  2018-06-03  5:23 ` [PATCH v2 11/11] libnvdimm, pmem: Restore page attributes when clearing errors Dan Williams
  2018-06-04 12:40 ` [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages Michal Hocko
  11 siblings, 0 replies; 32+ messages in thread
From: Dan Williams @ 2018-06-03  5:23 UTC (permalink / raw)
  To: linux-nvdimm
  Cc: Jan Kara, Christoph Hellwig, Jérôme Glisse,
	Matthew Wilcox, Naoya Horiguchi, Ross Zwisler, linux-mm,
	linux-fsdevel, jack

    mce: Uncorrected hardware memory error in user-access at af34214200
    {1}[Hardware Error]: It has been corrected by h/w and requires no further action
    mce: [Hardware Error]: Machine check events logged
    {1}[Hardware Error]: event severity: corrected
    Memory failure: 0xaf34214: reserved kernel page still referenced by 1 users
    [..]
    Memory failure: 0xaf34214: recovery action for reserved kernel page: Failed
    mce: Memory error not recovered

In contrast to typical memory, dev_pagemap pages may be dax mapped. With
dax there is no possibility to map in another page dynamically since dax
establishes 1:1 physical-address-to-file-offset associations. Also,
dev_pagemap pages associated with NVDIMM / persistent memory devices can
internally remap/repair addresses with poison. While memory_failure()
assumes that it can discard typical poisoned pages and keep them
unmapped indefinitely, dev_pagemap pages may be returned to service
after the error is cleared.

Teach memory_failure() to detect and handle MEMORY_DEVICE_HOST
dev_pagemap pages that have poison consumed by userspace. Mark the
memory as UC instead of unmapping it completely to allow ongoing access
via the device driver (nd_pmem). Later, nd_pmem will grow support for
marking the page back to WB when the error is cleared.

Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/mm.h  |    1 
 mm/memory-failure.c |  145 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 146 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1ac1f06a4be6..566c972e03e7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2669,6 +2669,7 @@ enum mf_action_page_type {
 	MF_MSG_TRUNCATED_LRU,
 	MF_MSG_BUDDY,
 	MF_MSG_BUDDY_2ND,
+	MF_MSG_DAX,
 	MF_MSG_UNKNOWN,
 };
 
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b6efb78ba49b..de0bc897d6e7 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -55,6 +55,7 @@
 #include <linux/hugetlb.h>
 #include <linux/memory_hotplug.h>
 #include <linux/mm_inline.h>
+#include <linux/memremap.h>
 #include <linux/kfifo.h>
 #include <linux/ratelimit.h>
 #include "internal.h"
@@ -531,6 +532,7 @@ static const char * const action_page_types[] = {
 	[MF_MSG_TRUNCATED_LRU]		= "already truncated LRU page",
 	[MF_MSG_BUDDY]			= "free buddy page",
 	[MF_MSG_BUDDY_2ND]		= "free buddy page (2nd try)",
+	[MF_MSG_DAX]			= "dax page",
 	[MF_MSG_UNKNOWN]		= "unknown page",
 };
 
@@ -1132,6 +1134,144 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 	return res;
 }
 
+static unsigned long dax_mapping_size(struct address_space *mapping,
+		struct page *page)
+{
+	pgoff_t pgoff = page_to_pgoff(page);
+	struct vm_area_struct *vma;
+	unsigned long size = 0;
+
+	i_mmap_lock_read(mapping);
+	xa_lock_irq(&mapping->i_pages);
+	/* validate that @page is still linked to @mapping */
+	if (page->mapping != mapping) {
+		xa_unlock_irq(&mapping->i_pages);
+		i_mmap_unlock_read(mapping);
+		return 0;
+	}
+	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
+		unsigned long address = vma_address(page, vma);
+		pgd_t *pgd;
+		p4d_t *p4d;
+		pud_t *pud;
+		pmd_t *pmd;
+		pte_t *pte;
+
+		pgd = pgd_offset(vma->vm_mm, address);
+		if (!pgd_present(*pgd))
+			continue;
+		p4d = p4d_offset(pgd, address);
+		if (!p4d_present(*p4d))
+			continue;
+		pud = pud_offset(p4d, address);
+		if (!pud_present(*pud))
+			continue;
+		if (pud_devmap(*pud)) {
+			size = PUD_SIZE;
+			break;
+		}
+		pmd = pmd_offset(pud, address);
+		if (!pmd_present(*pmd))
+			continue;
+		if (pmd_devmap(*pmd)) {
+			size = PMD_SIZE;
+			break;
+		}
+		pte = pte_offset_map(pmd, address);
+		if (!pte_present(*pte))
+			continue;
+		if (pte_devmap(*pte)) {
+			size = PAGE_SIZE;
+			break;
+		}
+	}
+	xa_unlock_irq(&mapping->i_pages);
+	i_mmap_unlock_read(mapping);
+
+	return size;
+}
+
+static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
+		struct dev_pagemap *pgmap)
+{
+	struct page *page = pfn_to_page(pfn);
+	const bool unmap_success = true;
+	struct address_space *mapping;
+	unsigned long size;
+	LIST_HEAD(tokill);
+	int rc = -EBUSY;
+	loff_t start;
+
+	/*
+	 * Prevent the inode from being freed while we are interrogating
+	 * the address_space, typically this would be handled by
+	 * lock_page(), but dax pages do not use the page lock.
+	 */
+	rcu_read_lock();
+	mapping = page->mapping;
+	if (!mapping) {
+		rcu_read_unlock();
+		goto out;
+	}
+	if (!igrab(mapping->host)) {
+		mapping = NULL;
+		rcu_read_unlock();
+		goto out;
+	}
+	rcu_read_unlock();
+
+	if (hwpoison_filter(page)) {
+		rc = 0;
+		goto out;
+	}
+
+	switch (pgmap->type) {
+	case MEMORY_DEVICE_PRIVATE:
+	case MEMORY_DEVICE_PUBLIC:
+		/*
+		 * TODO: Handle HMM pages which may need coordination
+		 * with device-side memory.
+		 */
+		goto out;
+	default:
+		break;
+	}
+
+	/*
+	 * If the page is not mapped in userspace then report it as
+	 * unhandled.
+	 */
+	size = dax_mapping_size(mapping, page);
+	if (!size) {
+		pr_err("Memory failure: %#lx: failed to unmap page\n", pfn);
+		goto out;
+	}
+
+	SetPageHWPoison(page);
+
+	/*
+	 * Unlike System-RAM there is no possibility to swap in a
+	 * different physical page at a given virtual address, so all
+	 * userspace consumption of ZONE_DEVICE memory necessitates
+	 * SIGBUS (i.e. MF_MUST_KILL)
+	 */
+	flags |= MF_ACTION_REQUIRED | MF_MUST_KILL;
+	collect_procs(mapping, page, &tokill, flags & MF_ACTION_REQUIRED);
+
+	start = (page->index << PAGE_SHIFT) & ~(size - 1);
+	unmap_mapping_range(page->mapping, start, start + size, 0);
+
+	kill_procs(&tokill, flags & MF_MUST_KILL, !unmap_success, ilog2(size),
+			mapping, page, flags);
+	rc = 0;
+out:
+	if (mapping)
+		iput(mapping->host);
+	put_dev_pagemap(pgmap);
+	action_result(pfn, MF_MSG_DAX, rc ? MF_FAILED : MF_RECOVERED);
+	return rc;
+}
+
 /**
  * memory_failure - Handle memory failure of a page.
  * @pfn: Page Number of the corrupted page
@@ -1154,6 +1294,7 @@ int memory_failure(unsigned long pfn, int flags)
 	struct page *p;
 	struct page *hpage;
 	struct page *orig_head;
+	struct dev_pagemap *pgmap;
 	int res;
 	unsigned long page_flags;
 
@@ -1166,6 +1307,10 @@ int memory_failure(unsigned long pfn, int flags)
 		return -ENXIO;
 	}
 
+	pgmap = get_dev_pagemap(pfn, NULL);
+	if (pgmap)
+		return memory_failure_dev_pagemap(pfn, flags, pgmap);
+
 	p = pfn_to_page(pfn);
 	if (PageHuge(p))
 		return memory_failure_hugetlb(pfn, flags);


* [PATCH v2 11/11] libnvdimm, pmem: Restore page attributes when clearing errors
  2018-06-03  5:22 [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages Dan Williams
                   ` (9 preceding siblings ...)
  2018-06-03  5:23 ` [PATCH v2 10/11] mm, memory_failure: Teach memory_failure() about dev_pagemap pages Dan Williams
@ 2018-06-03  5:23 ` Dan Williams
  2018-06-04 12:40 ` [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages Michal Hocko
  11 siblings, 0 replies; 32+ messages in thread
From: Dan Williams @ 2018-06-03  5:23 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: hch, linux-mm, linux-fsdevel, jack

Use clear_mce_nospec() to restore WB mode for the kernel linear mapping
of a pmem page that was marked 'HWPoison'. A page with 'HWPoison' set
has also been marked UC in PAT (page attribute table) via
set_mce_nospec() to prevent speculative retrievals of poison.

The 'HWPoison' flag is only cleared when overwriting an entire page.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/nvdimm/pmem.c |   26 ++++++++++++++++++++++++++
 drivers/nvdimm/pmem.h |   13 +++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 9d714926ecf5..04ee1fdee219 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -20,6 +20,7 @@
 #include <linux/hdreg.h>
 #include <linux/init.h>
 #include <linux/platform_device.h>
+#include <linux/set_memory.h>
 #include <linux/module.h>
 #include <linux/moduleparam.h>
 #include <linux/badblocks.h>
@@ -51,6 +52,30 @@ static struct nd_region *to_region(struct pmem_device *pmem)
 	return to_nd_region(to_dev(pmem)->parent);
 }
 
+static void hwpoison_clear(struct pmem_device *pmem,
+		phys_addr_t phys, unsigned int len)
+{
+	unsigned long pfn_start, pfn_end, pfn;
+
+	/* only pmem in the linear map supports HWPoison */
+	if (is_vmalloc_addr(pmem->virt_addr))
+		return;
+
+	pfn_start = PHYS_PFN(phys);
+	pfn_end = pfn_start + PHYS_PFN(len);
+	for (pfn = pfn_start; pfn < pfn_end; pfn++) {
+		struct page *page = pfn_to_page(pfn);
+
+		/*
+		 * Note, no need to hold a get_dev_pagemap() reference
+		 * here since we're in the driver I/O path and
+		 * outstanding I/O requests pin the dev_pagemap.
+		 */
+		if (test_and_clear_pmem_poison(page))
+			clear_mce_nospec(pfn);
+	}
+}
+
 static blk_status_t pmem_clear_poison(struct pmem_device *pmem,
 		phys_addr_t offset, unsigned int len)
 {
@@ -65,6 +90,7 @@ static blk_status_t pmem_clear_poison(struct pmem_device *pmem,
 	if (cleared < len)
 		rc = BLK_STS_IOERR;
 	if (cleared > 0 && cleared / 512) {
+		hwpoison_clear(pmem, pmem->phys_addr + offset, cleared);
 		cleared /= 512;
 		dev_dbg(dev, "%#llx clear %ld sector%s\n",
 				(unsigned long long) sector, cleared,
diff --git a/drivers/nvdimm/pmem.h b/drivers/nvdimm/pmem.h
index a64ebc78b5df..59cfe13ea8a8 100644
--- a/drivers/nvdimm/pmem.h
+++ b/drivers/nvdimm/pmem.h
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef __NVDIMM_PMEM_H__
 #define __NVDIMM_PMEM_H__
+#include <linux/page-flags.h>
 #include <linux/badblocks.h>
 #include <linux/types.h>
 #include <linux/pfn_t.h>
@@ -27,4 +28,16 @@ struct pmem_device {
 
 long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
 		long nr_pages, void **kaddr, pfn_t *pfn);
+
+#ifdef CONFIG_MEMORY_FAILURE
+static inline bool test_and_clear_pmem_poison(struct page *page)
+{
+	return TestClearPageHWPoison(page);
+}
+#else
+static inline bool test_and_clear_pmem_poison(struct page *page)
+{
+	return false;
+}
+#endif
 #endif /* __NVDIMM_PMEM_H__ */


* Re: [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages
  2018-06-03  5:22 [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages Dan Williams
                   ` (10 preceding siblings ...)
  2018-06-03  5:23 ` [PATCH v2 11/11] libnvdimm, pmem: Restore page attributes when clearing errors Dan Williams
@ 2018-06-04 12:40 ` Michal Hocko
  2018-06-04 14:31   ` Dan Williams
  11 siblings, 1 reply; 32+ messages in thread
From: Michal Hocko @ 2018-06-04 12:40 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-nvdimm, linux-edac, Tony Luck, Borislav Petkov,
	Jérôme Glisse, Jan Kara, H. Peter Anvin, x86,
	Thomas Gleixner, Christoph Hellwig, Ross Zwisler, Matthew Wilcox,
	Ingo Molnar, Naoya Horiguchi, Souptick Joarder, linux-mm,
	linux-fsdevel

On Sat 02-06-18 22:22:43, Dan Williams wrote:
> Changes since v1 [1]:
> * Rework the locking to not use lock_page() instead use a combination of
>   rcu_read_lock(), xa_lock_irq(&mapping->pages), and igrab() to validate
>   that dax pages are still associated with the given mapping, and to
>   prevent the address_space from being freed while memory_failure() is
>   busy. (Jan)
> 
> * Fix use of MF_COUNT_INCREASED in madvise_inject_error() to account for
>   the case where the injected error is a dax mapping and the pinned
>   reference needs to be dropped. (Naoya)
> 
> * Clarify with a comment that VM_FAULT_NOPAGE may not always indicate a
>   mapping of the storage capacity, it could also indicate the zero page.
>   (Jan)
> 
> [1]: https://lists.01.org/pipermail/linux-nvdimm/2018-May/015932.html
> 
> ---
> 
> As it stands, memory_failure() gets thoroughly confused by dev_pagemap
> backed mappings. The recovery code has specific enabling for several
> possible page states and needs new enabling to handle poison in dax
> mappings.
> 
> In order to support reliable reverse mapping of user space addresses:
> 
> 1/ Add new locking in the memory_failure() rmap path to prevent races
> that would typically be handled by the page lock.
> 
> 2/ Since dev_pagemap pages are hidden from the page allocator and the
> "compound page" accounting machinery, add a mechanism to determine the
> size of the mapping that encompasses a given poisoned pfn.
> 
> 3/ Given pmem errors can be repaired, change the speculatively accessed
> poison protection, mce_unmap_kpfn(), to be reversible and otherwise
> allow ongoing access from the kernel.

This doesn't really describe the problem you are trying to solve and why
do you believe that HWPoison is the best way to handle it. As things
stand HWPoison is rather ad-hoc and I am not sure adding more to it is
really great without some deep reconsidering how the whole thing is done
right now IMHO. Are you actually trying to solve some real world problem
or you merely want to make soft offlining work properly?

> ---
> 
> Dan Williams (11):
>       device-dax: Convert to vmf_insert_mixed and vm_fault_t
>       device-dax: Cleanup vm_fault de-reference chains
>       device-dax: Enable page_mapping()
>       device-dax: Set page->index
>       filesystem-dax: Set page->index
>       mm, madvise_inject_error: Let memory_failure() optionally take a page reference
>       x86, memory_failure: Introduce {set,clear}_mce_nospec()
>       mm, memory_failure: Pass page size to kill_proc()
>       mm, memory_failure: Fix page->mapping assumptions relative to the page lock
>       mm, memory_failure: Teach memory_failure() about dev_pagemap pages
>       libnvdimm, pmem: Restore page attributes when clearing errors
> 
> 
>  arch/x86/include/asm/set_memory.h         |   29 ++++
>  arch/x86/kernel/cpu/mcheck/mce-internal.h |   15 --
>  arch/x86/kernel/cpu/mcheck/mce.c          |   38 -----
>  drivers/dax/device.c                      |   97 ++++++++-----
>  drivers/nvdimm/pmem.c                     |   26 ++++
>  drivers/nvdimm/pmem.h                     |   13 ++
>  fs/dax.c                                  |   16 ++
>  include/linux/huge_mm.h                   |    5 -
>  include/linux/mm.h                        |    1 
>  include/linux/set_memory.h                |   14 ++
>  mm/huge_memory.c                          |    4 -
>  mm/madvise.c                              |   18 ++
>  mm/memory-failure.c                       |  209 ++++++++++++++++++++++++++---
>  13 files changed, 366 insertions(+), 119 deletions(-)

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages
  2018-06-04 12:40 ` [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages Michal Hocko
@ 2018-06-04 14:31   ` Dan Williams
  2018-06-05 14:11     ` Michal Hocko
  0 siblings, 1 reply; 32+ messages in thread
From: Dan Williams @ 2018-06-04 14:31 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-nvdimm, linux-edac, Tony Luck, Borislav Petkov,
	Jérôme Glisse, Jan Kara, H. Peter Anvin, X86 ML,
	Thomas Gleixner, Christoph Hellwig, Ross Zwisler, Matthew Wilcox,
	Ingo Molnar, Naoya Horiguchi, Souptick Joarder, Linux MM,
	linux-fsdevel

On Mon, Jun 4, 2018 at 5:40 AM, Michal Hocko <mhocko@kernel.org> wrote:
> On Sat 02-06-18 22:22:43, Dan Williams wrote:
>> Changes since v1 [1]:
>> * Rework the locking to not use lock_page() instead use a combination of
>>   rcu_read_lock(), xa_lock_irq(&mapping->pages), and igrab() to validate
>>   that dax pages are still associated with the given mapping, and to
>>   prevent the address_space from being freed while memory_failure() is
>>   busy. (Jan)
>>
>> * Fix use of MF_COUNT_INCREASED in madvise_inject_error() to account for
>>   the case where the injected error is a dax mapping and the pinned
>>   reference needs to be dropped. (Naoya)
>>
>> * Clarify with a comment that VM_FAULT_NOPAGE may not always indicate a
>>   mapping of the storage capacity, it could also indicate the zero page.
>>   (Jan)
>>
>> [1]: https://lists.01.org/pipermail/linux-nvdimm/2018-May/015932.html
>>
>> ---
>>
>> As it stands, memory_failure() gets thoroughly confused by dev_pagemap
>> backed mappings. The recovery code has specific enabling for several
>> possible page states and needs new enabling to handle poison in dax
>> mappings.
>>
>> In order to support reliable reverse mapping of user space addresses:
>>
>> 1/ Add new locking in the memory_failure() rmap path to prevent races
>> that would typically be handled by the page lock.
>>
>> 2/ Since dev_pagemap pages are hidden from the page allocator and the
>> "compound page" accounting machinery, add a mechanism to determine the
>> size of the mapping that encompasses a given poisoned pfn.
>>
>> 3/ Given pmem errors can be repaired, change the speculatively accessed
>> poison protection, mce_unmap_kpfn(), to be reversible and otherwise
>> allow ongoing access from the kernel.
>
> This doesn't really describe the problem you are trying to solve and why
> do you believe that HWPoison is the best way to handle it. As things
> stand HWPoison is rather ad-hoc and I am not sure adding more to it is
> really great without some deep reconsidering how the whole thing is done
> right now IMHO. Are you actually trying to solve some real world problem
> or you merely want to make soft offlining work properly?

I'm trying to solve this real world problem when real poison is
consumed through a dax mapping:

        mce: Uncorrected hardware memory error in user-access at af34214200
        {1}[Hardware Error]: It has been corrected by h/w and requires
no further action
        mce: [Hardware Error]: Machine check events logged
        {1}[Hardware Error]: event severity: corrected
        Memory failure: 0xaf34214: reserved kernel page still
referenced by 1 users
        [..]
        Memory failure: 0xaf34214: recovery action for reserved kernel
page: Failed
        mce: Memory error not recovered

...i.e. currently all poison consumed through dax mappings is
needlessly system fatal.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 07/11] x86, memory_failure: Introduce {set, clear}_mce_nospec()
  2018-06-03  5:23 ` [PATCH v2 07/11] x86, memory_failure: Introduce {set, clear}_mce_nospec() Dan Williams
@ 2018-06-04 17:08   ` Luck, Tony
  2018-06-04 17:39     ` Dan Williams
  0 siblings, 1 reply; 32+ messages in thread
From: Luck, Tony @ 2018-06-04 17:08 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-nvdimm, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, linux-edac, x86, hch, linux-mm, linux-fsdevel,
	jack

On Sat, Jun 02, 2018 at 10:23:20PM -0700, Dan Williams wrote:
> +static inline int set_mce_nospec(unsigned long pfn)
> +{
> +	int rc;
> +
> +	rc = set_memory_uc((unsigned long) __va(PFN_PHYS(pfn)), 1);

You should really do the decoy_addr thing here that I had in mce_unmap_kpfn().
Putting the virtual address of the page you mustn't accidentally prefetch
from into a register is a pretty good way to make sure that the processor
does do a prefetch.

-Tony

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 07/11] x86, memory_failure: Introduce {set, clear}_mce_nospec()
  2018-06-04 17:08   ` Luck, Tony
@ 2018-06-04 17:39     ` Dan Williams
  2018-06-04 18:08       ` Luck, Tony
  0 siblings, 1 reply; 32+ messages in thread
From: Dan Williams @ 2018-06-04 17:39 UTC (permalink / raw)
  To: Luck, Tony
  Cc: linux-nvdimm, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, linux-edac, X86 ML, Christoph Hellwig, Linux MM,
	linux-fsdevel, Jan Kara

On Mon, Jun 4, 2018 at 10:08 AM, Luck, Tony <tony.luck@intel.com> wrote:
> On Sat, Jun 02, 2018 at 10:23:20PM -0700, Dan Williams wrote:
>> +static inline int set_mce_nospec(unsigned long pfn)
>> +{
>> +     int rc;
>> +
>> +     rc = set_memory_uc((unsigned long) __va(PFN_PHYS(pfn)), 1);
>
> You should really do the decoy_addr thing here that I had in mce_unmap_kpfn().
> Putting the virtual address of the page you mustn't accidentally prefetch
> from into a register is a pretty good way to make sure that the processor
> does do a prefetch.

Maybe I'm misreading, but doesn't that make the page completely
inaccessible? We still want to read pmem through the driver and the
linear mapping with memcpy_mcsafe(). Alternatively I could just drop
this patch and setup a private / alias mapping for the pmem driver to
use. It seems aliased mappings would be the safer option, but I want
to make sure I've comprehended your suggestion correctly?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 07/11] x86, memory_failure: Introduce {set, clear}_mce_nospec()
  2018-06-04 17:39     ` Dan Williams
@ 2018-06-04 18:08       ` Luck, Tony
  2018-06-04 18:35         ` Dan Williams
  0 siblings, 1 reply; 32+ messages in thread
From: Luck, Tony @ 2018-06-04 18:08 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-nvdimm, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, linux-edac, X86 ML, Christoph Hellwig, Linux MM,
	linux-fsdevel, Jan Kara

On Mon, Jun 04, 2018 at 10:39:48AM -0700, Dan Williams wrote:
> On Mon, Jun 4, 2018 at 10:08 AM, Luck, Tony <tony.luck@intel.com> wrote:
> > On Sat, Jun 02, 2018 at 10:23:20PM -0700, Dan Williams wrote:
> >> +static inline int set_mce_nospec(unsigned long pfn)
> >> +{
> >> +     int rc;
> >> +
> >> +     rc = set_memory_uc((unsigned long) __va(PFN_PHYS(pfn)), 1);
> >
> > You should really do the decoy_addr thing here that I had in mce_unmap_kpfn().
> > Putting the virtual address of the page you mustn't accidentally prefetch
> > from into a register is a pretty good way to make sure that the processor
> > does do a prefetch.
> 
> Maybe I'm misreading, but doesn't that make the page completely
> inaccessible? We still want to read pmem through the driver and the
> linear mapping with memcpy_mcsafe(). Alternatively I could just drop
> this patch and setup a private / alias mapping for the pmem driver to
> use. It seems aliased mappings would be the safer option, but I want
> to make sure I've comprehended your suggestion correctly?

I'm OK with the call to set_memory_uc() to make this uncacheable
instead of set_memory_np() to make it inaccessible.

The problem is how to achieve that.

The result of __va(PFN_PHYS(pfn) is the virtual address where the poison
page is currently mapped into the kernel. That value gets put into
register %rdi to make the call to set_memory_uc() (which goes on to
call a bunch of other functions passing the virtual address along
the way).

Now imagine an impatient super-speculative processor is waiting for
some result to decide where to jump next, and picks a path that isn't
going to be taken ... out in the weeds somewhere it runs into:

	movzbl	(%rdi), %eax

Oops ... now you just read from the address you were trying to
avoid. So we log an error. Eventually the speculation gets sorted
out and the processor knows not to signal a machine check. But the
log is sitting in a machine check bank waiting to cause an overflow
if we try to log a second error.

The decoy_addr trick in mce_unmap_kpfn() throws in the high bit
to the address passed.  The set_memory_np() code (and I assume the
set_memory_uc()) code ignores it, but it means any stray speculative
access won't point at the poison page.

-Tony

Note: this is *mostly* a problem if the poison is in the first
cache line of the page.  But you could hit other lines if the
instruction you speculatively ran into had the right offset. E.g.
to hit the third line:

	movzbl	128(%rdi), %eax

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 07/11] x86, memory_failure: Introduce {set, clear}_mce_nospec()
  2018-06-04 18:08       ` Luck, Tony
@ 2018-06-04 18:35         ` Dan Williams
  0 siblings, 0 replies; 32+ messages in thread
From: Dan Williams @ 2018-06-04 18:35 UTC (permalink / raw)
  To: Luck, Tony
  Cc: linux-nvdimm, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, linux-edac, X86 ML, Christoph Hellwig, Linux MM,
	linux-fsdevel, Jan Kara

On Mon, Jun 4, 2018 at 11:08 AM, Luck, Tony <tony.luck@intel.com> wrote:
> On Mon, Jun 04, 2018 at 10:39:48AM -0700, Dan Williams wrote:
>> On Mon, Jun 4, 2018 at 10:08 AM, Luck, Tony <tony.luck@intel.com> wrote:
>> > On Sat, Jun 02, 2018 at 10:23:20PM -0700, Dan Williams wrote:
>> >> +static inline int set_mce_nospec(unsigned long pfn)
>> >> +{
>> >> +     int rc;
>> >> +
>> >> +     rc = set_memory_uc((unsigned long) __va(PFN_PHYS(pfn)), 1);
>> >
>> > You should really do the decoy_addr thing here that I had in mce_unmap_kpfn().
>> > Putting the virtual address of the page you mustn't accidentally prefetch
>> > from into a register is a pretty good way to make sure that the processor
>> > does do a prefetch.
>>
>> Maybe I'm misreading, but doesn't that make the page completely
>> inaccessible? We still want to read pmem through the driver and the
>> linear mapping with memcpy_mcsafe(). Alternatively I could just drop
>> this patch and setup a private / alias mapping for the pmem driver to
>> use. It seems aliased mappings would be the safer option, but I want
>> to make sure I've comprehended your suggestion correctly?
>
> I'm OK with the call to set_memory_uc() to make this uncacheable
> instead of set_memory_np() to make it inaccessible.
>
> The problem is how to achieve that.
>
> The result of __va(PFN_PHYS(pfn) is the virtual address where the poison
> page is currently mapped into the kernel. That value gets put into
> register %rdi to make the call to set_memory_uc() (which goes on to
> call a bunch of other functions passing the virtual address along
> the way).
>
> Now imagine an impatient super-speculative processor is waiting for
> some result to decide where to jump next, and picks a path that isn't
> going to be taken ... out in the weeds somewhere it runs into:
>
>         movzbl  (%rdi), %eax
>
> Oops ... now you just read from the address you were trying to
> avoid. So we log an error. Eventually the speculation gets sorted
> out and the processor knows not to signal a machine check. But the
> log is sitting in a machine check bank waiting to cause an overflow
> if we try to log a second error.
>
> The decoy_addr trick in mce_unmap_kpfn() throws in the high bit
> to the address passed.  The set_memory_np() code (and I assume the
> set_memory_uc()) code ignores it, but it means any stray speculative
> access won't point at the poison page.
>
> -Tony
>
> Note: this is *mostly* a problem if the poison is in the first
> cache line of the page.  But you could hit other lines if the
> instruction you speculatively ran into had the right offset. E.g.
> to hit the third line:
>
>         movzbl  128(%rdi), %eax

Ok, makes sense and I do see now that this decoy resolves to the same
physical address once PTE_PFN_MASK is applied when we start messing
with page tables.

However, set_memory_uc() is currently not prepared for this trick as
it specifies the unmasked physical address to reserve_memtype().

        ret = reserve_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE,
                              _PAGE_CACHE_MODE_UC_MINUS, NULL);

...compared to set_memory_np() which does not manipulate the memtype tracking.

I'll fix up reserve_memtype() and free_memtype() to be prepared for
decoy addresses.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages
  2018-06-04 14:31   ` Dan Williams
@ 2018-06-05 14:11     ` Michal Hocko
  2018-06-05 14:33       ` Dan Williams
  0 siblings, 1 reply; 32+ messages in thread
From: Michal Hocko @ 2018-06-05 14:11 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-nvdimm, linux-edac, Tony Luck, Borislav Petkov,
	Jérôme Glisse, Jan Kara, H. Peter Anvin, X86 ML,
	Thomas Gleixner, Christoph Hellwig, Ross Zwisler, Matthew Wilcox,
	Ingo Molnar, Naoya Horiguchi, Souptick Joarder, Linux MM,
	linux-fsdevel

On Mon 04-06-18 07:31:25, Dan Williams wrote:
[...]
> I'm trying to solve this real world problem when real poison is
> consumed through a dax mapping:
> 
>         mce: Uncorrected hardware memory error in user-access at af34214200
>         {1}[Hardware Error]: It has been corrected by h/w and requires
> no further action
>         mce: [Hardware Error]: Machine check events logged
>         {1}[Hardware Error]: event severity: corrected
>         Memory failure: 0xaf34214: reserved kernel page still
> referenced by 1 users
>         [..]
>         Memory failure: 0xaf34214: recovery action for reserved kernel
> page: Failed
>         mce: Memory error not recovered
> 
> ...i.e. currently all poison consumed through dax mappings is
> needlessly system fatal.

Thanks. That should be a part of the changelog. It would be great to
describe why this cannot be simply handled by hwpoison code without any
ZONE_DEVICE specific hacks? The error is recoverable so why does
hwpoison code even care?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages
  2018-06-05 14:11     ` Michal Hocko
@ 2018-06-05 14:33       ` Dan Williams
  2018-06-06  7:39         ` Michal Hocko
  0 siblings, 1 reply; 32+ messages in thread
From: Dan Williams @ 2018-06-05 14:33 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-nvdimm, linux-edac, Tony Luck, Borislav Petkov,
	Jérôme Glisse, Jan Kara, H. Peter Anvin, X86 ML,
	Thomas Gleixner, Christoph Hellwig, Ross Zwisler, Matthew Wilcox,
	Ingo Molnar, Naoya Horiguchi, Souptick Joarder, Linux MM,
	linux-fsdevel

On Tue, Jun 5, 2018 at 7:11 AM, Michal Hocko <mhocko@kernel.org> wrote:
> On Mon 04-06-18 07:31:25, Dan Williams wrote:
> [...]
>> I'm trying to solve this real world problem when real poison is
>> consumed through a dax mapping:
>>
>>         mce: Uncorrected hardware memory error in user-access at af34214200
>>         {1}[Hardware Error]: It has been corrected by h/w and requires
>> no further action
>>         mce: [Hardware Error]: Machine check events logged
>>         {1}[Hardware Error]: event severity: corrected
>>         Memory failure: 0xaf34214: reserved kernel page still
>> referenced by 1 users
>>         [..]
>>         Memory failure: 0xaf34214: recovery action for reserved kernel
>> page: Failed
>>         mce: Memory error not recovered
>>
>> ...i.e. currently all poison consumed through dax mappings is
>> needlessly system fatal.
>
> Thanks. That should be a part of the changelog.

...added for v3:
https://lists.01.org/pipermail/linux-nvdimm/2018-June/016153.html

> It would be great to
> describe why this cannot be simply handled by hwpoison code without any
> ZONE_DEVICE specific hacks? The error is recoverable so why does
> hwpoison code even care?
>

Up until we started testing hardware poison recovery for persistent
memory I assumed that the kernel did not need any new enabling to get
basic support for recovering userspace consumed poison.

However, the recovery code has a dedicated path for many different
page states (see: action_page_types). Without any changes it
incorrectly assumes that a dax mapped page is a page cache page
undergoing dma, or some other pinned operation. It also assumes that
the page must be offlined which is not correct / possible for dax
mapped pages. There is a possibility to repair poison to dax mapped
persistent memory pages, and the pages can't otherwise be offlined
because they 1:1 correspond with a physical storage block, i.e.
offlining pmem would be equivalent to punching a hole in the physical
address space.

There's also the entanglement of device-dax which guarantees a given
mapping size (4K, 2M, 1G). This requires determining the size of the
mapping encompassing a given pfn to know how much to unmap. Since dax
mapped pfns don't come from the page allocator we need to read the
page size from the page tables, not compound_order(page).

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages
  2018-06-05 14:33       ` Dan Williams
@ 2018-06-06  7:39         ` Michal Hocko
  2018-06-06 13:44           ` Dan Williams
  0 siblings, 1 reply; 32+ messages in thread
From: Michal Hocko @ 2018-06-06  7:39 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-nvdimm, linux-edac, Tony Luck, Borislav Petkov,
	Jérôme Glisse, Jan Kara, H. Peter Anvin, X86 ML,
	Thomas Gleixner, Christoph Hellwig, Ross Zwisler, Matthew Wilcox,
	Ingo Molnar, Naoya Horiguchi, Souptick Joarder, Linux MM,
	linux-fsdevel

On Tue 05-06-18 07:33:17, Dan Williams wrote:
> On Tue, Jun 5, 2018 at 7:11 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > On Mon 04-06-18 07:31:25, Dan Williams wrote:
> > [...]
> >> I'm trying to solve this real world problem when real poison is
> >> consumed through a dax mapping:
> >>
> >>         mce: Uncorrected hardware memory error in user-access at af34214200
> >>         {1}[Hardware Error]: It has been corrected by h/w and requires
> >> no further action
> >>         mce: [Hardware Error]: Machine check events logged
> >>         {1}[Hardware Error]: event severity: corrected
> >>         Memory failure: 0xaf34214: reserved kernel page still
> >> referenced by 1 users
> >>         [..]
> >>         Memory failure: 0xaf34214: recovery action for reserved kernel
> >> page: Failed
> >>         mce: Memory error not recovered
> >>
> >> ...i.e. currently all poison consumed through dax mappings is
> >> needlessly system fatal.
> >
> > Thanks. That should be a part of the changelog.
> 
> ...added for v3:
> https://lists.01.org/pipermail/linux-nvdimm/2018-June/016153.html
> 
> > It would be great to
> > describe why this cannot be simply handled by hwpoison code without any
> > ZONE_DEVICE specific hacks? The error is recoverable so why does
> > hwpoison code even care?
> >
> 
> Up until we started testing hardware poison recovery for persistent
> memory I assumed that the kernel did not need any new enabling to get
> basic support for recovering userspace consumed poison.
> 
> However, the recovery code has a dedicated path for many different
> page states (see: action_page_types). Without any changes it
> incorrectly assumes that a dax mapped page is a page cache page
> undergoing dma, or some other pinned operation. It also assumes that
> the page must be offlined which is not correct / possible for dax
> mapped pages. There is a possibility to repair poison to dax mapped
> persistent memory pages, and the pages can't otherwise be offlined
> because they 1:1 correspond with a physical storage block, i.e.
> offlining pmem would be equivalent to punching a hole in the physical
> address space.
> 
> There's also the entanglement of device-dax which guarantees a given
> mapping size (4K, 2M, 1G). This requires determining the size of the
> mapping encompassing a given pfn to know how much to unmap. Since dax
> mapped pfns don't come from the page allocator we need to read the
> page size from the page tables, not compound_order(page).

OK, but my question is still. Do we really want to do more on top of the
existing code and add even more special casing or it is time to rethink
the whole hwpoison design?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages
  2018-06-06  7:39         ` Michal Hocko
@ 2018-06-06 13:44           ` Dan Williams
  2018-06-07 14:37             ` Michal Hocko
  0 siblings, 1 reply; 32+ messages in thread
From: Dan Williams @ 2018-06-06 13:44 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-nvdimm, linux-edac, Tony Luck, Borislav Petkov,
	Jérôme Glisse, Jan Kara, H. Peter Anvin, X86 ML,
	Thomas Gleixner, Christoph Hellwig, Ross Zwisler, Matthew Wilcox,
	Ingo Molnar, Naoya Horiguchi, Souptick Joarder, Linux MM,
	linux-fsdevel

On Wed, Jun 6, 2018 at 12:39 AM, Michal Hocko <mhocko@kernel.org> wrote:
> On Tue 05-06-18 07:33:17, Dan Williams wrote:
>> On Tue, Jun 5, 2018 at 7:11 AM, Michal Hocko <mhocko@kernel.org> wrote:
>> > On Mon 04-06-18 07:31:25, Dan Williams wrote:
>> > [...]
>> >> I'm trying to solve this real world problem when real poison is
>> >> consumed through a dax mapping:
>> >>
>> >>         mce: Uncorrected hardware memory error in user-access at af34214200
>> >>         {1}[Hardware Error]: It has been corrected by h/w and requires
>> >> no further action
>> >>         mce: [Hardware Error]: Machine check events logged
>> >>         {1}[Hardware Error]: event severity: corrected
>> >>         Memory failure: 0xaf34214: reserved kernel page still
>> >> referenced by 1 users
>> >>         [..]
>> >>         Memory failure: 0xaf34214: recovery action for reserved kernel
>> >> page: Failed
>> >>         mce: Memory error not recovered
>> >>
>> >> ...i.e. currently all poison consumed through dax mappings is
>> >> needlessly system fatal.
>> >
>> > Thanks. That should be a part of the changelog.
>>
>> ...added for v3:
>> https://lists.01.org/pipermail/linux-nvdimm/2018-June/016153.html
>>
>> > It would be great to
>> > describe why this cannot be simply handled by hwpoison code without any
>> > ZONE_DEVICE specific hacks? The error is recoverable so why does
>> > hwpoison code even care?
>> >
>>
>> Up until we started testing hardware poison recovery for persistent
>> memory I assumed that the kernel did not need any new enabling to get
>> basic support for recovering userspace consumed poison.
>>
>> However, the recovery code has a dedicated path for many different
>> page states (see: action_page_types). Without any changes it
>> incorrectly assumes that a dax mapped page is a page cache page
>> undergoing dma, or some other pinned operation. It also assumes that
>> the page must be offlined which is not correct / possible for dax
>> mapped pages. There is a possibility to repair poison to dax mapped
>> persistent memory pages, and the pages can't otherwise be offlined
>> because they 1:1 correspond with a physical storage block, i.e.
>> offlining pmem would be equivalent to punching a hole in the physical
>> address space.
>>
>> There's also the entanglement of device-dax which guarantees a given
>> mapping size (4K, 2M, 1G). This requires determining the size of the
>> mapping encompassing a given pfn to know how much to unmap. Since dax
>> mapped pfns don't come from the page allocator we need to read the
>> page size from the page tables, not compound_order(page).
>
> OK, but my question is still. Do we really want to do more on top of the
> existing code and add even more special casing or it is time to rethink
> the whole hwpoison design?

Well, there's the immediate problem that the current implementation is
broken for dax and then the longer term problem that the current
design appears to be too literal with a lot of custom marshaling.

At least for dax in the long term we want to offer an alternative
error handling model and get the filesystem much more involved. That
filesystem redesign work has been waiting for the reverse-block-map
effort to settle in xfs. However, that's more custom work for dax and
not a redesign that helps the core-mm more generically.

I think the unmap and SIGBUS portion of poison handling is relatively
straightforward. It's the handling of the page HWPoison page flag that
seems a bit ad hoc. The current implementation certainly was not
prepared for the concept that memory can be repaired. set_mce_nospec()
is a step in the direction of generic memory error handling.

Thoughts on other pain points in the design that are on your mind, Michal?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages
  2018-06-06 13:44           ` Dan Williams
@ 2018-06-07 14:37             ` Michal Hocko
  2018-06-07 16:52               ` Dan Williams
  0 siblings, 1 reply; 32+ messages in thread
From: Michal Hocko @ 2018-06-07 14:37 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-nvdimm, linux-edac, Tony Luck, Borislav Petkov,
	Jérôme Glisse, Jan Kara, H. Peter Anvin, X86 ML,
	Thomas Gleixner, Christoph Hellwig, Ross Zwisler, Matthew Wilcox,
	Ingo Molnar, Naoya Horiguchi, Souptick Joarder, Linux MM,
	linux-fsdevel

On Wed 06-06-18 06:44:45, Dan Williams wrote:
> On Wed, Jun 6, 2018 at 12:39 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > On Tue 05-06-18 07:33:17, Dan Williams wrote:
> >> On Tue, Jun 5, 2018 at 7:11 AM, Michal Hocko <mhocko@kernel.org> wrote:
> >> > On Mon 04-06-18 07:31:25, Dan Williams wrote:
> >> > [...]
> >> >> I'm trying to solve this real world problem when real poison is
> >> >> consumed through a dax mapping:
> >> >>
> >> >>         mce: Uncorrected hardware memory error in user-access at af34214200
> >> >>         {1}[Hardware Error]: It has been corrected by h/w and requires
> >> >> no further action
> >> >>         mce: [Hardware Error]: Machine check events logged
> >> >>         {1}[Hardware Error]: event severity: corrected
> >> >>         Memory failure: 0xaf34214: reserved kernel page still
> >> >> referenced by 1 users
> >> >>         [..]
> >> >>         Memory failure: 0xaf34214: recovery action for reserved kernel
> >> >> page: Failed
> >> >>         mce: Memory error not recovered
> >> >>
> >> >> ...i.e. currently all poison consumed through dax mappings is
> >> >> needlessly system fatal.
> >> >
> >> > Thanks. That should be a part of the changelog.
> >>
> >> ...added for v3:
> >> https://lists.01.org/pipermail/linux-nvdimm/2018-June/016153.html
> >>
> >> > It would be great to
> >> > describe why this cannot be simply handled by hwpoison code without any
> >> > ZONE_DEVICE specific hacks? The error is recoverable so why does
> >> > hwpoison code even care?
> >> >
> >>
> >> Up until we started testing hardware poison recovery for persistent
> >> memory I assumed that the kernel did not need any new enabling to get
> >> basic support for recovering userspace consumed poison.
> >>
> >> However, the recovery code has a dedicated path for many different
> >> page states (see: action_page_types). Without any changes it
> >> incorrectly assumes that a dax mapped page is a page cache page
> >> undergoing dma, or some other pinned operation. It also assumes that
> >> the page must be offlined which is not correct / possible for dax
> >> mapped pages. There is a possibility to repair poison to dax mapped
> >> persistent memory pages, and the pages can't otherwise be offlined
> >> because they 1:1 correspond with a physical storage block, i.e.
> >> offlining pmem would be equivalent to punching a hole in the physical
> >> address space.
> >>
> >> There's also the entanglement of device-dax which guarantees a given
> >> mapping size (4K, 2M, 1G). This requires determining the size of the
> >> mapping encompassing a given pfn to know how much to unmap. Since dax
> >> mapped pfns don't come from the page allocator we need to read the
> >> page size from the page tables, not compound_order(page).
> >
> > OK, but my question is still. Do we really want to do more on top of the
> > existing code and add even more special casing or it is time to rethink
> > the whole hwpoison design?
> 
> Well, there's the immediate problem that the current implementation is
> broken for dax and then the longer term problem that the current
> design appears to be too literal with a lot of custom marshaling.
> 
> At least for dax in the long term we want to offer an alternative
> error handling model and get the filesystem much more involved. That
> filesystem redesign work has been waiting for the reverse-block-map
> effort to settle in xfs. However, that's more custom work for dax and
> not a redesign that helps the core-mm more generically.
> 
> I think the unmap and SIGBUS portion of poison handling is relatively
> straightforward. It's the handling of the page HWPoison page flag that
> seems a bit ad hoc. The current implementation certainly was not
> prepared for the concept that memory can be repaired. set_mce_nospec()
> is a step in the direction of generic memory error handling.

Agreed! Moreover random checks for HWPoison pages is just a maintenance
hell.

> Thoughts on other pain points in the design that are on your mind, Michal?

we have discussed those at LSFMM this year https://lwn.net/Articles/753261/
The main problem is that there is besically no design description so the
whole feature is very easy to break. Yours is another good example of
that. We need to get back to the drawing board and think about how to
make this more robust.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages
  2018-06-07 14:37             ` Michal Hocko
@ 2018-06-07 16:52               ` Dan Williams
  2018-06-11  7:50                 ` Michal Hocko
  0 siblings, 1 reply; 32+ messages in thread
From: Dan Williams @ 2018-06-07 16:52 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-nvdimm, linux-edac, Tony Luck, Borislav Petkov,
	Jérôme Glisse, Jan Kara, H. Peter Anvin, X86 ML,
	Thomas Gleixner, Christoph Hellwig, Ross Zwisler, Matthew Wilcox,
	Ingo Molnar, Naoya Horiguchi, Souptick Joarder, Linux MM,
	linux-fsdevel

On Thu, Jun 7, 2018 at 7:37 AM, Michal Hocko <mhocko@kernel.org> wrote:
> On Wed 06-06-18 06:44:45, Dan Williams wrote:
>> On Wed, Jun 6, 2018 at 12:39 AM, Michal Hocko <mhocko@kernel.org> wrote:
>> > On Tue 05-06-18 07:33:17, Dan Williams wrote:
>> >> On Tue, Jun 5, 2018 at 7:11 AM, Michal Hocko <mhocko@kernel.org> wrote:
>> >> > On Mon 04-06-18 07:31:25, Dan Williams wrote:
>> >> > [...]
>> >> >> I'm trying to solve this real world problem when real poison is
>> >> >> consumed through a dax mapping:
>> >> >>
>> >> >>         mce: Uncorrected hardware memory error in user-access at af34214200
>> >> >>         {1}[Hardware Error]: It has been corrected by h/w and requires
>> >> >> no further action
>> >> >>         mce: [Hardware Error]: Machine check events logged
>> >> >>         {1}[Hardware Error]: event severity: corrected
>> >> >>         Memory failure: 0xaf34214: reserved kernel page still
>> >> >> referenced by 1 users
>> >> >>         [..]
>> >> >>         Memory failure: 0xaf34214: recovery action for reserved kernel
>> >> >> page: Failed
>> >> >>         mce: Memory error not recovered
>> >> >>
>> >> >> ...i.e. currently all poison consumed through dax mappings is
>> >> >> needlessly system fatal.
>> >> >
>> >> > Thanks. That should be a part of the changelog.
>> >>
>> >> ...added for v3:
>> >> https://lists.01.org/pipermail/linux-nvdimm/2018-June/016153.html
>> >>
>> >> > It would be great to
>> >> > describe why this cannot be simply handled by hwpoison code without any
>> >> > ZONE_DEVICE specific hacks? The error is recoverable so why does
>> >> > hwpoison code even care?
>> >> >
>> >>
>> >> Up until we started testing hardware poison recovery for persistent
>> >> memory I assumed that the kernel did not need any new enabling to get
>> >> basic support for recovering userspace consumed poison.
>> >>
>> >> However, the recovery code has a dedicated path for many different
>> >> page states (see: action_page_types). Without any changes it
>> >> incorrectly assumes that a dax mapped page is a page cache page
>> >> undergoing dma, or some other pinned operation. It also assumes that
>> >> the page must be offlined which is not correct / possible for dax
>> >> mapped pages. There is a possibility to repair poison to dax mapped
>> >> persistent memory pages, and the pages can't otherwise be offlined
>> >> because they 1:1 correspond with a physical storage block, i.e.
>> >> offlining pmem would be equivalent to punching a hole in the physical
>> >> address space.
>> >>
>> >> There's also the entanglement of device-dax which guarantees a given
>> >> mapping size (4K, 2M, 1G). This requires determining the size of the
>> >> mapping encompassing a given pfn to know how much to unmap. Since dax
>> >> mapped pfns don't come from the page allocator we need to read the
>> >> page size from the page tables, not compound_order(page).
>> >
>> > OK, but my question is still. Do we really want to do more on top of the
>> > existing code and add even more special casing or it is time to rethink
>> > the whole hwpoison design?
>>
>> Well, there's the immediate problem that the current implementation is
>> broken for dax and then the longer term problem that the current
>> design appears to be too literal with a lot of custom marshaling.
>>
>> At least for dax in the long term we want to offer an alternative
>> error handling model and get the filesystem much more involved. That
>> filesystem redesign work has been waiting for the reverse-block-map
>> effort to settle in xfs. However, that's more custom work for dax and
>> not a redesign that helps the core-mm more generically.
>>
>> I think the unmap and SIGBUS portion of poison handling is relatively
>> straightforward. It's the handling of the page HWPoison page flag that
>> seems a bit ad hoc. The current implementation certainly was not
>> prepared for the concept that memory can be repaired. set_mce_nospec()
>> is a step in the direction of generic memory error handling.
>
> Agreed! Moreover, random checks for HWPoison pages are just a maintenance
> hell.
>
>> Thoughts on other pain points in the design that are on your mind, Michal?
>
> We have discussed those at LSFMM this year: https://lwn.net/Articles/753261/
> The main problem is that there is basically no design description, so the
> whole feature is very easy to break. Yours is another good example of
> that. We need to get back to the drawing board and think about how to
> make this more robust.

I saw that article, but to be honest I did not glean any direct
suggestions that read on these current patches. I'm interested in
discussing a redesign, but I'm not interested in leaving poison
unhandled for DAX while we figure it out. Developers are actively
testing media error handling for persistent memory applications, and
expect the current SIGBUS + BUS_MCEERR_AR kernel ABI to work for
memory errors in userspace mappings.

The article mentioned that PUD poison does not work. That may be the
case for THP, but it does work for DAX. Here's the simple test [1];
yes, more test-driven design of poison handling, as you lamented.

[1]: https://patchwork.kernel.org/patch/10451131/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages
  2018-06-07 16:52               ` Dan Williams
@ 2018-06-11  7:50                 ` Michal Hocko
  2018-06-11 14:44                   ` Dan Williams
  0 siblings, 1 reply; 32+ messages in thread
From: Michal Hocko @ 2018-06-11  7:50 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-nvdimm, linux-edac, Tony Luck, Borislav Petkov,
	Jérôme Glisse, Jan Kara, H. Peter Anvin, X86 ML,
	Thomas Gleixner, Christoph Hellwig, Ross Zwisler, Matthew Wilcox,
	Ingo Molnar, Naoya Horiguchi, Souptick Joarder, Linux MM,
	linux-fsdevel

On Thu 07-06-18 09:52:22, Dan Williams wrote:
> On Thu, Jun 7, 2018 at 7:37 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > On Wed 06-06-18 06:44:45, Dan Williams wrote:
> >> On Wed, Jun 6, 2018 at 12:39 AM, Michal Hocko <mhocko@kernel.org> wrote:
> >> > On Tue 05-06-18 07:33:17, Dan Williams wrote:
> >> >> On Tue, Jun 5, 2018 at 7:11 AM, Michal Hocko <mhocko@kernel.org> wrote:
> >> >> > On Mon 04-06-18 07:31:25, Dan Williams wrote:
> >> >> > [...]
> >> >> >> I'm trying to solve this real world problem when real poison is
> >> >> >> consumed through a dax mapping:
> >> >> >>
> >> >> >>         mce: Uncorrected hardware memory error in user-access at af34214200
> >> >> >>         {1}[Hardware Error]: It has been corrected by h/w and requires
> >> >> >> no further action
> >> >> >>         mce: [Hardware Error]: Machine check events logged
> >> >> >>         {1}[Hardware Error]: event severity: corrected
> >> >> >>         Memory failure: 0xaf34214: reserved kernel page still
> >> >> >> referenced by 1 users
> >> >> >>         [..]
> >> >> >>         Memory failure: 0xaf34214: recovery action for reserved kernel
> >> >> >> page: Failed
> >> >> >>         mce: Memory error not recovered
> >> >> >>
> >> >> >> ...i.e. currently all poison consumed through dax mappings is
> >> >> >> needlessly system fatal.
> >> >> >
> >> >> > Thanks. That should be a part of the changelog.
> >> >>
> >> >> ...added for v3:
> >> >> https://lists.01.org/pipermail/linux-nvdimm/2018-June/016153.html
> >> >>
> >> >> > It would be great to
> >> >> > describe why this cannot be simply handled by hwpoison code without any
> >> >> > ZONE_DEVICE specific hacks? The error is recoverable so why does
> >> >> > hwpoison code even care?
> >> >> >
> >> >>
> >> >> Up until we started testing hardware poison recovery for persistent
> >> >> memory I assumed that the kernel did not need any new enabling to get
> >> >> basic support for recovering userspace consumed poison.
> >> >>
> >> >> However, the recovery code has a dedicated path for many different
> >> >> page states (see: action_page_types). Without any changes it
> >> >> incorrectly assumes that a dax mapped page is a page cache page
> >> >> undergoing dma, or some other pinned operation. It also assumes that
> >> >> the page must be offlined which is not correct / possible for dax
> >> >> mapped pages. There is a possibility to repair poison to dax mapped
> >> >> persistent memory pages, and the pages can't otherwise be offlined
> >> >> because they 1:1 correspond with a physical storage block, i.e.
> >> >> offlining pmem would be equivalent to punching a hole in the physical
> >> >> address space.
> >> >>
> >> >> There's also the entanglement of device-dax which guarantees a given
> >> >> mapping size (4K, 2M, 1G). This requires determining the size of the
> >> >> mapping encompassing a given pfn to know how much to unmap. Since dax
> >> >> mapped pfns don't come from the page allocator we need to read the
> >> >> page size from the page tables, not compound_order(page).
> >> >
> >> > OK, but my question is still. Do we really want to do more on top of the
> >> > existing code and add even more special casing or it is time to rethink
> >> > the whole hwpoison design?
> >>
> >> Well, there's the immediate problem that the current implementation is
> >> broken for dax and then the longer term problem that the current
> >> design appears to be too literal with a lot of custom marshaling.
> >>
> >> At least for dax in the long term we want to offer an alternative
> >> error handling model and get the filesystem much more involved. That
> >> filesystem redesign work has been waiting for the reverse-block-map
> >> effort to settle in xfs. However, that's more custom work for dax and
> >> not a redesign that helps the core-mm more generically.
> >>
> >> I think the unmap and SIGBUS portion of poison handling is relatively
> >> straightforward. It's the handling of the page HWPoison page flag that
> >> seems a bit ad hoc. The current implementation certainly was not
> >> prepared for the concept that memory can be repaired. set_mce_nospec()
> >> is a step in the direction of generic memory error handling.
> >
> > Agreed! Moreover, random checks for HWPoison pages are just a maintenance
> > hell.
> >
> >> Thoughts on other pain points in the design that are on your mind, Michal?
> >
> > We have discussed those at LSFMM this year: https://lwn.net/Articles/753261/
> > The main problem is that there is basically no design description, so the
> > whole feature is very easy to break. Yours is another good example of
> > that. We need to get back to the drawing board and think about how to
> > make this more robust.
> 
> I saw that article, but to be honest I did not glean any direct
> suggestions that read on these current patches. I'm interested in
> discussing a redesign, but I'm not interested in leaving poison
> unhandled for DAX while we figure it out.

Sure, but that just keeps the status quo and grows a DAX special case like
we've done for hugetlb. We should not repeat that mistake. So we had
better start figuring out the design _now_ rather than build the house of
cards even higher.

> Developers are actively
> testing media error handling for persistent memory applications, and
> expect the current SIGBUS + BUS_MCEERR_AR kernel ABI to work for
> memory errors in userspace mappings.

I do understand you want your use case to be covered, but that is exactly
the reason why we have ended up in the current state. I am not going to
nack your patches on that ground, although I would be tempted to do
so, but I would really like to see some improvements here. I am sorry I
cannot do that myself right now, but somebody with use cases _should_;
otherwise we stay in the unfortunate whack-a-mole state forever.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages
  2018-06-11  7:50                 ` Michal Hocko
@ 2018-06-11 14:44                   ` Dan Williams
  2018-06-11 14:56                     ` Michal Hocko
  0 siblings, 1 reply; 32+ messages in thread
From: Dan Williams @ 2018-06-11 14:44 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-nvdimm, linux-edac, Tony Luck, Borislav Petkov,
	Jérôme Glisse, Jan Kara, H. Peter Anvin, X86 ML,
	Thomas Gleixner, Christoph Hellwig, Ross Zwisler, Matthew Wilcox,
	Ingo Molnar, Naoya Horiguchi, Souptick Joarder, Linux MM,
	linux-fsdevel

On Mon, Jun 11, 2018 at 12:50 AM, Michal Hocko <mhocko@kernel.org> wrote:
> On Thu 07-06-18 09:52:22, Dan Williams wrote:
>> On Thu, Jun 7, 2018 at 7:37 AM, Michal Hocko <mhocko@kernel.org> wrote:
>> > On Wed 06-06-18 06:44:45, Dan Williams wrote:
>> >> On Wed, Jun 6, 2018 at 12:39 AM, Michal Hocko <mhocko@kernel.org> wrote:
>> >> > On Tue 05-06-18 07:33:17, Dan Williams wrote:
>> >> >> On Tue, Jun 5, 2018 at 7:11 AM, Michal Hocko <mhocko@kernel.org> wrote:
>> >> >> > On Mon 04-06-18 07:31:25, Dan Williams wrote:
>> >> >> > [...]
>> >> >> >> I'm trying to solve this real world problem when real poison is
>> >> >> >> consumed through a dax mapping:
>> >> >> >>
>> >> >> >>         mce: Uncorrected hardware memory error in user-access at af34214200
>> >> >> >>         {1}[Hardware Error]: It has been corrected by h/w and requires
>> >> >> >> no further action
>> >> >> >>         mce: [Hardware Error]: Machine check events logged
>> >> >> >>         {1}[Hardware Error]: event severity: corrected
>> >> >> >>         Memory failure: 0xaf34214: reserved kernel page still
>> >> >> >> referenced by 1 users
>> >> >> >>         [..]
>> >> >> >>         Memory failure: 0xaf34214: recovery action for reserved kernel
>> >> >> >> page: Failed
>> >> >> >>         mce: Memory error not recovered
>> >> >> >>
>> >> >> >> ...i.e. currently all poison consumed through dax mappings is
>> >> >> >> needlessly system fatal.
>> >> >> >
>> >> >> > Thanks. That should be a part of the changelog.
>> >> >>
>> >> >> ...added for v3:
>> >> >> https://lists.01.org/pipermail/linux-nvdimm/2018-June/016153.html
>> >> >>
>> >> >> > It would be great to
>> >> >> > describe why this cannot be simply handled by hwpoison code without any
>> >> >> > ZONE_DEVICE specific hacks? The error is recoverable so why does
>> >> >> > hwpoison code even care?
>> >> >> >
>> >> >>
>> >> >> Up until we started testing hardware poison recovery for persistent
>> >> >> memory I assumed that the kernel did not need any new enabling to get
>> >> >> basic support for recovering userspace consumed poison.
>> >> >>
>> >> >> However, the recovery code has a dedicated path for many different
>> >> >> page states (see: action_page_types). Without any changes it
>> >> >> incorrectly assumes that a dax mapped page is a page cache page
>> >> >> undergoing dma, or some other pinned operation. It also assumes that
>> >> >> the page must be offlined which is not correct / possible for dax
>> >> >> mapped pages. There is a possibility to repair poison to dax mapped
>> >> >> persistent memory pages, and the pages can't otherwise be offlined
>> >> >> because they 1:1 correspond with a physical storage block, i.e.
>> >> >> offlining pmem would be equivalent to punching a hole in the physical
>> >> >> address space.
>> >> >>
>> >> >> There's also the entanglement of device-dax which guarantees a given
>> >> >> mapping size (4K, 2M, 1G). This requires determining the size of the
>> >> >> mapping encompassing a given pfn to know how much to unmap. Since dax
>> >> >> mapped pfns don't come from the page allocator we need to read the
>> >> >> page size from the page tables, not compound_order(page).
>> >> >
>> >> > OK, but my question is still. Do we really want to do more on top of the
>> >> > existing code and add even more special casing or it is time to rethink
>> >> > the whole hwpoison design?
>> >>
>> >> Well, there's the immediate problem that the current implementation is
>> >> broken for dax and then the longer term problem that the current
>> >> design appears to be too literal with a lot of custom marshaling.
>> >>
>> >> At least for dax in the long term we want to offer an alternative
>> >> error handling model and get the filesystem much more involved. That
>> >> filesystem redesign work has been waiting for the reverse-block-map
>> >> effort to settle in xfs. However, that's more custom work for dax and
>> >> not a redesign that helps the core-mm more generically.
>> >>
>> >> I think the unmap and SIGBUS portion of poison handling is relatively
>> >> straightforward. It's the handling of the page HWPoison page flag that
>> >> seems a bit ad hoc. The current implementation certainly was not
>> >> prepared for the concept that memory can be repaired. set_mce_nospec()
>> >> is a step in the direction of generic memory error handling.
>> >
>> > Agreed! Moreover, random checks for HWPoison pages are just a maintenance
>> > hell.
>> >
>> >> Thoughts on other pain points in the design that are on your mind, Michal?
>> >
>> > We have discussed those at LSFMM this year: https://lwn.net/Articles/753261/
>> > The main problem is that there is basically no design description, so the
>> > whole feature is very easy to break. Yours is another good example of
>> > that. We need to get back to the drawing board and think about how to
>> > make this more robust.
>>
>> I saw that article, but to be honest I did not glean any direct
>> suggestions that read on these current patches. I'm interested in
>> discussing a redesign, but I'm not interested in leaving poison
>> unhandled for DAX while we figure it out.
>
> Sure, but that just keeps the status quo and grows a DAX special case like
> we've done for hugetlb. We should not repeat that mistake. So we had
> better start figuring out the design _now_ rather than build the house of
> cards even higher.
>
>> Developers are actively
>> testing media error handling for persistent memory applications, and
>> expect the current SIGBUS + BUS_MCEERR_AR kernel ABI to work for
>> memory errors in userspace mappings.
>
> I do understand you want your use case to be covered, but that is exactly
> the reason why we have ended up in the current state. I am not going to
> nack your patches on that ground, although I would be tempted to do
> so, but I would really like to see some improvements here. I am sorry I
> cannot do that myself right now, but somebody with use cases _should_;
> otherwise we stay in the unfortunate whack-a-mole state forever.

I'm still trying to understand the next level of detail on where you
think the design should go next? Is it just the HWPoison page flag?
Are you concerned about supporting greater than PAGE_SIZE poison?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages
  2018-06-11 14:44                   ` Dan Williams
@ 2018-06-11 14:56                     ` Michal Hocko
  2018-06-11 15:19                       ` Dan Williams
  0 siblings, 1 reply; 32+ messages in thread
From: Michal Hocko @ 2018-06-11 14:56 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-nvdimm, linux-edac, Tony Luck, Borislav Petkov,
	Jérôme Glisse, Jan Kara, H. Peter Anvin, X86 ML,
	Thomas Gleixner, Christoph Hellwig, Ross Zwisler, Matthew Wilcox,
	Ingo Molnar, Naoya Horiguchi, Souptick Joarder, Linux MM,
	linux-fsdevel

On Mon 11-06-18 07:44:39, Dan Williams wrote:
[...]
> I'm still trying to understand the next level of detail on where you
> think the design should go next? Is it just the HWPoison page flag?
> Are you concerned about supporting greater than PAGE_SIZE poison?

I simply do not want to check for HWPoison at a zillion places and have
each type of page need its own special handling, which can go wrong very
easily. I am not clear on the details here; this is something for the
users of hwpoison to define: what are the reasonable scenarios where the
feature is useful, then turn that into a feature list that can actually
be turned into a design document. See the difference from the
let's-put-some-more-on-top approach...

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages
  2018-06-11 14:56                     ` Michal Hocko
@ 2018-06-11 15:19                       ` Dan Williams
  2018-06-11 17:35                         ` Andi Kleen
  2018-06-12  1:50                         ` Naoya Horiguchi
  0 siblings, 2 replies; 32+ messages in thread
From: Dan Williams @ 2018-06-11 15:19 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-nvdimm, linux-edac, Tony Luck, Borislav Petkov,
	Jérôme Glisse, Jan Kara, H. Peter Anvin, X86 ML,
	Thomas Gleixner, Christoph Hellwig, Ross Zwisler, Matthew Wilcox,
	Ingo Molnar, Naoya Horiguchi, Souptick Joarder, Linux MM,
	linux-fsdevel, Andi Kleen

On Mon, Jun 11, 2018 at 7:56 AM, Michal Hocko <mhocko@kernel.org> wrote:
> On Mon 11-06-18 07:44:39, Dan Williams wrote:
> [...]
>> I'm still trying to understand the next level of detail on where you
>> think the design should go next? Is it just the HWPoison page flag?
>> Are you concerned about supporting greater than PAGE_SIZE poison?
>
> I simply do not want to check for HWPoison at a zillion places and have
> each type of page need its own special handling, which can go wrong very
> easily. I am not clear on the details here; this is something for the
> users of hwpoison to define: what are the reasonable scenarios where the
> feature is useful, then turn that into a feature list that can actually
> be turned into a design document. See the difference from the
> let's-put-some-more-on-top approach...
>

So you want me to pay the toll of writing a design document justifying
all the existing use cases of HWPoison before we fix the DAX bugs, and
the design document may or may not result in any substantive change to
these patches?

Naoya or Andi, can you chime in here?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages
  2018-06-11 15:19                       ` Dan Williams
@ 2018-06-11 17:35                         ` Andi Kleen
  2018-06-12  1:50                         ` Naoya Horiguchi
  1 sibling, 0 replies; 32+ messages in thread
From: Andi Kleen @ 2018-06-11 17:35 UTC (permalink / raw)
  To: Dan Williams
  Cc: Michal Hocko, linux-nvdimm, linux-edac, Tony Luck,
	Borislav Petkov, Jérôme Glisse, Jan Kara,
	H. Peter Anvin, X86 ML, Thomas Gleixner, Christoph Hellwig,
	Ross Zwisler, Matthew Wilcox, Ingo Molnar, Naoya Horiguchi,
	Souptick Joarder, Linux MM, linux-fsdevel

On Mon, Jun 11, 2018 at 08:19:54AM -0700, Dan Williams wrote:
> On Mon, Jun 11, 2018 at 7:56 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > On Mon 11-06-18 07:44:39, Dan Williams wrote:
> > [...]
> >> I'm still trying to understand the next level of detail on where you
> >> think the design should go next? Is it just the HWPoison page flag?
> >> Are you concerned about supporting greater than PAGE_SIZE poison?
> >
> > I simply do not want to check for HWPoison at a zillion places and have
> > each type of page need its own special handling, which can go wrong very
> > easily. I am not clear on the details here; this is something for the
> > users of hwpoison to define: what are the reasonable scenarios where the
> > feature is useful, then turn that into a feature list that can actually
> > be turned into a design document. See the difference from the
> > let's-put-some-more-on-top approach...
> >
> 
> So you want me to pay the toll of writing a design document justifying
> all the existing use cases of HWPoison before we fix the DAX bugs, and
> the design document may or may not result in any substantive change to
> these patches?
> 
> Naoya or Andi, can you chime in here?

A new document doesn't make any sense. We have the commit messages and
the code comments as design documents, and as usual the ultimate authority
is what the code does.

The guiding light for new memory recovery code is just these sentences (taken
from the beginning of the main file):

 * In general any code for handling new cases should only be added iff:
 * - You know how to test it.
 * - You have a test that can be added to mce-test
 *   https://git.kernel.org/cgit/utils/cpu/mce/mce-test.git/
 * - The case actually shows up as a frequent (top 10) page state in
 *   tools/vm/page-types when running a real workload.

Since persistent memory is so big it makes sense to add support
for it in common code paths. That is usually just kernel copies and
user space execution.

-Andi

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages
  2018-06-11 15:19                       ` Dan Williams
  2018-06-11 17:35                         ` Andi Kleen
@ 2018-06-12  1:50                         ` Naoya Horiguchi
  2018-06-12  1:58                           ` Dan Williams
  2018-06-12  4:04                           ` Jane Chu
  1 sibling, 2 replies; 32+ messages in thread
From: Naoya Horiguchi @ 2018-06-12  1:50 UTC (permalink / raw)
  To: Dan Williams
  Cc: Michal Hocko, linux-nvdimm, linux-edac, Tony Luck,
	Borislav Petkov, Jérôme Glisse, Jan Kara,
	H. Peter Anvin, X86 ML, Thomas Gleixner, Christoph Hellwig,
	Ross Zwisler, Matthew Wilcox, Ingo Molnar, Souptick Joarder,
	Linux MM, linux-fsdevel, Andi Kleen

On Mon, Jun 11, 2018 at 08:19:54AM -0700, Dan Williams wrote:
> On Mon, Jun 11, 2018 at 7:56 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > On Mon 11-06-18 07:44:39, Dan Williams wrote:
> > [...]
> >> I'm still trying to understand the next level of detail on where you
> >> think the design should go next? Is it just the HWPoison page flag?
> >> Are you concerned about supporting greater than PAGE_SIZE poison?
> >
> > I simply do not want to check for HWPoison at a zillion places and have
> > each type of page need its own special handling, which can go wrong very
> > easily. I am not clear on the details here; this is something for the
> > users of hwpoison to define: what are the reasonable scenarios where the
> > feature is useful, then turn that into a feature list that can actually
> > be turned into a design document. See the difference from the
> > let's-put-some-more-on-top approach...
> >
> 
> So you want me to pay the toll of writing a design document justifying
> all the existing use cases of HWPoison before we fix the DAX bugs, and
> the design document may or may not result in any substantive change to
> these patches?
> 
> Naoya or Andi, can you chime in here?

memory_failure() does 3 things:

 - unmapping the error page from processes using it,
 - isolating the error page with PageHWPoison,
 - logging/reporting.

The unmapping part and the isolating part are quite page-type dependent,
so it seems hard to me to do them in a generic manner (supporting a new
page type always needs case-specific new code).
But I agree that we can improve the code and documentation to help
developers add support for a new page type.

About documenting: the content of Documentation/vm/hwpoison.rst has not
been updated since 2009, so some update covering the design might be
required. My current thoughts on items to update are:

  - detailing general workflow,
  - adding some about soft offline,
  - guideline for developers to support new type of memory,
  (- and anything helpful/requested.)

Making the code more readable/self-descriptive would be helpful, though
I'm not yet clear on how.

Anyway I'll find time to work on this, while now I'm testing the dax
support patches and fixing a bug I found recently.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages
  2018-06-12  1:50                         ` Naoya Horiguchi
@ 2018-06-12  1:58                           ` Dan Williams
  2018-06-12  4:04                           ` Jane Chu
  1 sibling, 0 replies; 32+ messages in thread
From: Dan Williams @ 2018-06-12  1:58 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Michal Hocko, linux-nvdimm, linux-edac, Tony Luck,
	Borislav Petkov, Jérôme Glisse, Jan Kara,
	H. Peter Anvin, X86 ML, Thomas Gleixner, Christoph Hellwig,
	Ross Zwisler, Matthew Wilcox, Ingo Molnar, Souptick Joarder,
	Linux MM, linux-fsdevel, Andi Kleen

On Mon, Jun 11, 2018 at 6:50 PM, Naoya Horiguchi
<n-horiguchi@ah.jp.nec.com> wrote:
> On Mon, Jun 11, 2018 at 08:19:54AM -0700, Dan Williams wrote:
[..]
> Anyway I'll find time to work on this, while now I'm testing the dax
> support patches and fixing a bug I found recently.

Ok, with this and other review feedback these patches are not ready
for 4.18. I'll circle back for 4.19 and we can try again.

Thanks for taking a look!

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages
  2018-06-12  1:50                         ` Naoya Horiguchi
  2018-06-12  1:58                           ` Dan Williams
@ 2018-06-12  4:04                           ` Jane Chu
  1 sibling, 0 replies; 32+ messages in thread
From: Jane Chu @ 2018-06-12  4:04 UTC (permalink / raw)
  To: Naoya Horiguchi, Dan Williams
  Cc: Andi Kleen, Tony Luck, Jan Kara, Matthew Wilcox, linux-nvdimm,
	X86 ML, Michal Hocko, Linux MM, Jérôme Glisse,
	Ingo Molnar, Borislav Petkov, Souptick Joarder, H. Peter Anvin,
	linux-fsdevel, Thomas Gleixner, Christoph Hellwig, linux-edac

On 6/11/2018 6:50 PM, Naoya Horiguchi wrote:

> On Mon, Jun 11, 2018 at 08:19:54AM -0700, Dan Williams wrote:
>> On Mon, Jun 11, 2018 at 7:56 AM, Michal Hocko <mhocko@kernel.org> wrote:
>>> On Mon 11-06-18 07:44:39, Dan Williams wrote:
>>> [...]
>>>> I'm still trying to understand the next level of detail on where you
>>>> think the design should go next? Is it just the HWPoison page flag?
>>>> Are you concerned about supporting greater than PAGE_SIZE poison?
>>> I simply do not want to check for HWPoison at a zillion places and have
>>> each type of page need its own special handling, which can go wrong very
>>> easily. I am not clear on the details here; this is something for the
>>> users of hwpoison to define: what are the reasonable scenarios where the
>>> feature is useful, then turn that into a feature list that can actually
>>> be turned into a design document. See the difference from the
>>> let's-put-some-more-on-top approach...
>>>
>> So you want me to pay the toll of writing a design document justifying
>> all the existing use cases of HWPoison before we fix the DAX bugs, and
>> the design document may or may not result in any substantive change to
>> these patches?
>>
>> Naoya or Andi, can you chime in here?
> memory_failure() does 3 things:
>
>   - unmapping the error page from processes using it,
>   - isolating the error page with PageHWPoison,
>   - logging/reporting.
>
> The unmapping part and the isolating part are quite page-type dependent,
> so it seems hard to me to do them in a generic manner (supporting a new
> page type always needs case-specific new code).
> But I agree that we can improve the code and documentation to help
> developers add support for a new page type.
>
> About documenting: the content of Documentation/vm/hwpoison.rst has not
> been updated since 2009, so some update covering the design might be
> required. My current thoughts on items to update are:
>
>    - detailing general workflow,
>    - adding some about soft offline,
>    - guideline for developers to support new type of memory,
>    (- and anything helpful/requested.)
>
> Making the code more readable/self-descriptive would be helpful, though
> I'm not yet clear on how.
>
> Anyway I'll find time to work on this, while now I'm testing the dax
> support patches and fixing a bug I found recently.

Thank you. Maybe it's already on your mind, but just in case: when you
update the document, would you add these characteristics of pmem error
handling?
   . UE/poison can be repaired until the wear and tear reaches a max level
   . many user applications mmap the entire capacity, leaving no spare pages
     for swapping (unlike the volatile memory UE handling)
   . the what-you-see-is-what-you-get nature
Regarding the HWPOISON redesign, a nagging thought is that a memory UE
typically implicates a very small blast radius, less than 4KB. But it
seems that the larger the page size, the greater the 'penalty' in terms
of how much memory ends up being offlined. Is there a way to be more
frugal?

Thanks!
-jane

>
> Thanks,
> Naoya Horiguchi
> _______________________________________________
> Linux-nvdimm mailing list
> Linux-nvdimm@lists.01.org
> https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2018-06-12  4:04 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-03  5:22 [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages Dan Williams
2018-06-03  5:22 ` [PATCH v2 01/11] device-dax: Convert to vmf_insert_mixed and vm_fault_t Dan Williams
2018-06-03  5:22 ` [PATCH v2 02/11] device-dax: Cleanup vm_fault de-reference chains Dan Williams
2018-06-03  5:22 ` [PATCH v2 03/11] device-dax: Enable page_mapping() Dan Williams
2018-06-03  5:23 ` [PATCH v2 04/11] device-dax: Set page->index Dan Williams
2018-06-03  5:23 ` [PATCH v2 05/11] filesystem-dax: " Dan Williams
2018-06-03  5:23 ` [PATCH v2 06/11] mm, madvise_inject_error: Let memory_failure() optionally take a page reference Dan Williams
2018-06-03  5:23 ` [PATCH v2 07/11] x86, memory_failure: Introduce {set, clear}_mce_nospec() Dan Williams
2018-06-04 17:08   ` Luck, Tony
2018-06-04 17:39     ` Dan Williams
2018-06-04 18:08       ` Luck, Tony
2018-06-04 18:35         ` Dan Williams
2018-06-03  5:23 ` [PATCH v2 08/11] mm, memory_failure: Pass page size to kill_proc() Dan Williams
2018-06-03  5:23 ` [PATCH v2 09/11] mm, memory_failure: Fix page->mapping assumptions relative to the page lock Dan Williams
2018-06-03  5:23 ` [PATCH v2 10/11] mm, memory_failure: Teach memory_failure() about dev_pagemap pages Dan Williams
2018-06-03  5:23 ` [PATCH v2 11/11] libnvdimm, pmem: Restore page attributes when clearing errors Dan Williams
2018-06-04 12:40 ` [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages Michal Hocko
2018-06-04 14:31   ` Dan Williams
2018-06-05 14:11     ` Michal Hocko
2018-06-05 14:33       ` Dan Williams
2018-06-06  7:39         ` Michal Hocko
2018-06-06 13:44           ` Dan Williams
2018-06-07 14:37             ` Michal Hocko
2018-06-07 16:52               ` Dan Williams
2018-06-11  7:50                 ` Michal Hocko
2018-06-11 14:44                   ` Dan Williams
2018-06-11 14:56                     ` Michal Hocko
2018-06-11 15:19                       ` Dan Williams
2018-06-11 17:35                         ` Andi Kleen
2018-06-12  1:50                         ` Naoya Horiguchi
2018-06-12  1:58                           ` Dan Williams
2018-06-12  4:04                           ` Jane Chu
