Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: nao.horiguchi@gmail.com
To: linux-mm@kvack.org
Cc: mhocko@kernel.org, akpm@linux-foundation.org,
	mike.kravetz@oracle.com, osalvador@suse.de, tony.luck@intel.com,
	david@redhat.com, aneesh.kumar@linux.vnet.ibm.com,
	zeil@yandex-team.ru, cai@lca.pw, naoya.horiguchi@nec.com,
	linux-kernel@vger.kernel.org
Subject: [PATCH v5 00/16] HWPOISON: soft offline rework
Date: Fri, 31 Jul 2020 12:20:56 +0000
Message-ID: <20200731122112.11263-1-nao.horiguchi@gmail.com> (raw)

This patchset is the latest version of soft offline rework patchset
targetted for v5.9.

Main focus of this series is to stabilize soft offline.  Historically soft
offlined pages have suffered from racy conditions because PageHWPoison is
used to a little too aggressively, which (directly or indirectly) invades
other mm code which cares little about hwpoison.  This results in unexpected
behavior or kernel panic, which is very far from soft offline's "do not
disturb userspace or other kernel component" policy.

Main point of this change set is to contain target page "via buddy allocator",
where we first free the target page as we do for normal pages, and remove
from buddy only when we confirm that it reaches free list. There is surely
race window of page allocation, but that's fine because someone really want
that page and the page is still working, so soft offline can happily give up.

v4 from Oscar tries to handle the race around reallocation, but that part
seems still work in progress, so I decide to separate it for changes into
v5.9.  Thank you for your contribution, Oscar.

The issue reported by Qian Cai is fixed by patch 16/16.

This patchset is based on v5.8-rc7-mmotm-2020-07-27-18-18, but I applied
this series after reverting previous version.
Maybe https://github.com/Naoya-Horiguchi/linux/commits/soft-offline-rework.v5
shows what I did more precisely.

Any other comment/suggestion/help would be appreciated.

Thanks,
Naoya Horiguchi
---
Previous versions:
  v1: https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com/
  v2: https://lore.kernel.org/linux-mm/20191017142123.24245-1-osalvador@suse.de/
  v3: https://lore.kernel.org/linux-mm/20200624150137.7052-1-nao.horiguchi@gmail.com/
  v4: https://lore.kernel.org/linux-mm/20200716123810.25292-1-osalvador@suse.de/
---
Summary:

Naoya Horiguchi (8):
      mm,hwpoison: cleanup unused PageHuge() check
      mm, hwpoison: remove recalculating hpage
      mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED
      mm,hwpoison-inject: don't pin for hwpoison_filter
      mm,hwpoison: remove MF_COUNT_INCREASED
      mm,hwpoison: remove flag argument from soft offline functions
      mm,hwpoison: introduce MF_MSG_UNSPLIT_THP
      mm,hwpoison: double-check page count in __get_any_page()

Oscar Salvador (8):
      mm,madvise: Refactor madvise_inject_error
      mm,hwpoison: Un-export get_hwpoison_page and make it static
      mm,hwpoison: Kill put_hwpoison_page
      mm,hwpoison: Unify THP handling for hard and soft offline
      mm,hwpoison: Rework soft offline for free pages
      mm,hwpoison: Rework soft offline for in-use pages
      mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page
      mm,hwpoison: Return 0 if the page is already poisoned in soft-offline

 drivers/base/memory.c      |   2 +-
 include/linux/mm.h         |  12 +-
 include/linux/page-flags.h |   6 +-
 include/ras/ras_event.h    |   3 +
 mm/hwpoison-inject.c       |  18 +--
 mm/madvise.c               |  39 +++---
 mm/memory-failure.c        | 334 ++++++++++++++++++++-------------------------
 mm/migrate.c               |  11 +-
 mm/page_alloc.c            |  60 ++++++--
 9 files changed, 233 insertions(+), 252 deletions(-)


             reply index

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-31 12:20 nao.horiguchi [this message]
2020-07-31 12:20 ` [PATCH v5 01/16] mm,hwpoison: cleanup unused PageHuge() check nao.horiguchi
2020-07-31 12:20 ` [PATCH v5 02/16] mm, hwpoison: remove recalculating hpage nao.horiguchi
2020-07-31 12:20 ` [PATCH v5 03/16] mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED nao.horiguchi
2020-07-31 12:21 ` [PATCH v5 04/16] mm,madvise: Refactor madvise_inject_error nao.horiguchi
2020-07-31 12:21 ` [PATCH v5 05/16] mm,hwpoison-inject: don't pin for hwpoison_filter nao.horiguchi
2020-07-31 12:21 ` [PATCH v5 06/16] mm,hwpoison: Un-export get_hwpoison_page and make it static nao.horiguchi
2020-07-31 12:21 ` [PATCH v5 07/16] mm,hwpoison: Kill put_hwpoison_page nao.horiguchi
2020-07-31 12:21 ` [PATCH v5 08/16] mm,hwpoison: remove MF_COUNT_INCREASED nao.horiguchi
2020-07-31 12:21 ` [PATCH v5 09/16] mm,hwpoison: remove flag argument from soft offline functions nao.horiguchi
2020-07-31 12:21 ` [PATCH v5 10/16] mm,hwpoison: Unify THP handling for hard and soft offline nao.horiguchi
2020-07-31 12:21 ` [PATCH v5 11/16] mm,hwpoison: Rework soft offline for free pages nao.horiguchi
2020-07-31 12:21 ` [PATCH v5 12/16] mm,hwpoison: Rework soft offline for in-use pages nao.horiguchi
2020-07-31 12:21 ` [PATCH v5 13/16] mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page nao.horiguchi
2020-07-31 12:21 ` [PATCH v5 14/16] mm,hwpoison: Return 0 if the page is already poisoned in soft-offline nao.horiguchi
2020-07-31 12:21 ` [PATCH v5 15/16] mm,hwpoison: introduce MF_MSG_UNSPLIT_THP nao.horiguchi
     [not found] ` <20200803123954.GA4631@lca.pw>
2020-08-03 13:36   ` [PATCH v5 00/16] HWPOISON: soft offline rework HORIGUCHI NAOYA(堀口 直也)
     [not found]     ` <20200803151907.GA8894@lca.pw>
2020-08-05 20:43       ` HORIGUCHI NAOYA(堀口 直也)
     [not found] ` <20200803190709.GB8894@lca.pw>
2020-08-04  1:16   ` HORIGUCHI NAOYA(堀口 直也)
2020-08-04  1:49     ` Qian Cai
2020-08-04  8:13       ` osalvador
2020-08-05 20:44       ` HORIGUCHI NAOYA(堀口 直也)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200731122112.11263-1-nao.horiguchi@gmail.com \
    --to=nao.horiguchi@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=cai@lca.pw \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=naoya.horiguchi@nec.com \
    --cc=osalvador@suse.de \
    --cc=tony.luck@intel.com \
    --cc=zeil@yandex-team.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git