linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Dan Williams <dan.j.williams@intel.com>
Cc: linux-nvdimm@lists.01.org, linux-edac@vger.kernel.org,
	"Tony Luck" <tony.luck@intel.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"Jérôme Glisse" <jglisse@redhat.com>, "Jan Kara" <jack@suse.cz>,
	"H. Peter Anvin" <hpa@zytor.com>,
	x86@kernel.org, "Thomas Gleixner" <tglx@linutronix.de>,
	"Christoph Hellwig" <hch@lst.de>,
	"Ross Zwisler" <ross.zwisler@linux.intel.com>,
	"Matthew Wilcox" <mawilcox@microsoft.com>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Naoya Horiguchi" <n-horiguchi@ah.jp.nec.com>,
	"Souptick Joarder" <jrdr.linux@gmail.com>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages
Date: Mon, 4 Jun 2018 14:40:31 +0200	[thread overview]
Message-ID: <20180604124031.GP19202@dhcp22.suse.cz> (raw)
In-Reply-To: <152800336321.17112.3300876636370683279.stgit@dwillia2-desk3.amr.corp.intel.com>

On Sat 02-06-18 22:22:43, Dan Williams wrote:
> Changes since v1 [1]:
> * Rework the locking to not use lock_page() instead use a combination of
>   rcu_read_lock(), xa_lock_irq(&mapping->pages), and igrab() to validate
>   that dax pages are still associated with the given mapping, and to
>   prevent the address_space from being freed while memory_failure() is
>   busy. (Jan)
> 
> * Fix use of MF_COUNT_INCREASED in madvise_inject_error() to account for
>   the case where the injected error is a dax mapping and the pinned
>   reference needs to be dropped. (Naoya)
> 
> * Clarify with a comment that VM_FAULT_NOPAGE may not always indicate a
>   mapping of the storage capacity, it could also indicate the zero page.
>   (Jan)
> 
> [1]: https://lists.01.org/pipermail/linux-nvdimm/2018-May/015932.html
> 
> ---
> 
> As it stands, memory_failure() gets thoroughly confused by dev_pagemap
> backed mappings. The recovery code has specific enabling for several
> possible page states and needs new enabling to handle poison in dax
> mappings.
> 
> In order to support reliable reverse mapping of user space addresses:
> 
> 1/ Add new locking in the memory_failure() rmap path to prevent races
> that would typically be handled by the page lock.
> 
> 2/ Since dev_pagemap pages are hidden from the page allocator and the
> "compound page" accounting machinery, add a mechanism to determine the
> size of the mapping that encompasses a given poisoned pfn.
> 
> 3/ Given pmem errors can be repaired, change the speculatively accessed
> poison protection, mce_unmap_kpfn(), to be reversible and otherwise
> allow ongoing access from the kernel.

This doesn't really describe the problem you are trying to solve and why
do you believe that HWPoison is the best way to handle it. As things
stand HWPoison is rather ad-hoc and I am not sure adding more to it is
really great without some deep reconsidering how the whole thing is done
right now IMHO. Are you actually trying to solve some real world problem
or you merely want to make soft offlining work properly?

> ---
> 
> Dan Williams (11):
>       device-dax: Convert to vmf_insert_mixed and vm_fault_t
>       device-dax: Cleanup vm_fault de-reference chains
>       device-dax: Enable page_mapping()
>       device-dax: Set page->index
>       filesystem-dax: Set page->index
>       mm, madvise_inject_error: Let memory_failure() optionally take a page reference
>       x86, memory_failure: Introduce {set,clear}_mce_nospec()
>       mm, memory_failure: Pass page size to kill_proc()
>       mm, memory_failure: Fix page->mapping assumptions relative to the page lock
>       mm, memory_failure: Teach memory_failure() about dev_pagemap pages
>       libnvdimm, pmem: Restore page attributes when clearing errors
> 
> 
>  arch/x86/include/asm/set_memory.h         |   29 ++++
>  arch/x86/kernel/cpu/mcheck/mce-internal.h |   15 --
>  arch/x86/kernel/cpu/mcheck/mce.c          |   38 -----
>  drivers/dax/device.c                      |   97 ++++++++-----
>  drivers/nvdimm/pmem.c                     |   26 ++++
>  drivers/nvdimm/pmem.h                     |   13 ++
>  fs/dax.c                                  |   16 ++
>  include/linux/huge_mm.h                   |    5 -
>  include/linux/mm.h                        |    1 
>  include/linux/set_memory.h                |   14 ++
>  mm/huge_memory.c                          |    4 -
>  mm/madvise.c                              |   18 ++
>  mm/memory-failure.c                       |  209 ++++++++++++++++++++++++++---
>  13 files changed, 366 insertions(+), 119 deletions(-)

-- 
Michal Hocko
SUSE Labs

  parent reply	other threads:[~2018-06-04 12:40 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-03  5:22 [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages Dan Williams
2018-06-03  5:22 ` [PATCH v2 01/11] device-dax: Convert to vmf_insert_mixed and vm_fault_t Dan Williams
2018-06-03  5:22 ` [PATCH v2 02/11] device-dax: Cleanup vm_fault de-reference chains Dan Williams
2018-06-03  5:22 ` [PATCH v2 03/11] device-dax: Enable page_mapping() Dan Williams
2018-06-03  5:23 ` [PATCH v2 04/11] device-dax: Set page->index Dan Williams
2018-06-03  5:23 ` [PATCH v2 05/11] filesystem-dax: " Dan Williams
2018-06-03  5:23 ` [PATCH v2 06/11] mm, madvise_inject_error: Let memory_failure() optionally take a page reference Dan Williams
2018-06-03  5:23 ` [PATCH v2 07/11] x86, memory_failure: Introduce {set, clear}_mce_nospec() Dan Williams
2018-06-04 17:08   ` Luck, Tony
2018-06-04 17:39     ` Dan Williams
2018-06-04 18:08       ` Luck, Tony
2018-06-04 18:35         ` Dan Williams
2018-06-03  5:23 ` [PATCH v2 08/11] mm, memory_failure: Pass page size to kill_proc() Dan Williams
2018-06-03  5:23 ` [PATCH v2 09/11] mm, memory_failure: Fix page->mapping assumptions relative to the page lock Dan Williams
2018-06-03  5:23 ` [PATCH v2 10/11] mm, memory_failure: Teach memory_failure() about dev_pagemap pages Dan Williams
2018-06-03  5:23 ` [PATCH v2 11/11] libnvdimm, pmem: Restore page attributes when clearing errors Dan Williams
2018-06-04 12:40 ` Michal Hocko [this message]
2018-06-04 14:31   ` [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages Dan Williams
2018-06-05 14:11     ` Michal Hocko
2018-06-05 14:33       ` Dan Williams
2018-06-06  7:39         ` Michal Hocko
2018-06-06 13:44           ` Dan Williams
2018-06-07 14:37             ` Michal Hocko
2018-06-07 16:52               ` Dan Williams
2018-06-11  7:50                 ` Michal Hocko
2018-06-11 14:44                   ` Dan Williams
2018-06-11 14:56                     ` Michal Hocko
2018-06-11 15:19                       ` Dan Williams
2018-06-11 17:35                         ` Andi Kleen
2018-06-12  1:50                         ` Naoya Horiguchi
2018-06-12  1:58                           ` Dan Williams
2018-06-12  4:04                           ` Jane Chu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180604124031.GP19202@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=bp@alien8.de \
    --cc=dan.j.williams@intel.com \
    --cc=hch@lst.de \
    --cc=hpa@zytor.com \
    --cc=jack@suse.cz \
    --cc=jglisse@redhat.com \
    --cc=jrdr.linux@gmail.com \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=mawilcox@microsoft.com \
    --cc=mingo@redhat.com \
    --cc=n-horiguchi@ah.jp.nec.com \
    --cc=ross.zwisler@linux.intel.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).