linux-kernel.vger.kernel.org archive mirror
* [PATCH v6 00/12] HWPOISON: soft offline rework
@ 2020-08-06 18:49 nao.horiguchi
  2020-08-06 18:49 ` [PATCH v6 01/12] mm,hwpoison: cleanup unused PageHuge() check nao.horiguchi
                   ` (12 more replies)
  0 siblings, 13 replies; 24+ messages in thread
From: nao.horiguchi @ 2020-08-06 18:49 UTC
  To: linux-mm
  Cc: mhocko, akpm, mike.kravetz, osalvador, tony.luck, david,
	aneesh.kumar, zeil, cai, naoya.horiguchi, linux-kernel

Hi,

This patchset is the latest version of the soft offline rework patchset,
targeted for v5.9.

Since v5, I have dropped some patches that tweaked refcount handling in
madvise_inject_error() to avoid the "unknown refcount page" error.
I have not confirmed the fix (the error did not reproduce with v5 in my
environment), but with this change soft_offline_page() is always called
after a refcount is held, so the error should no longer happen.
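A minimal userspace sketch of why holding a reference before entering the soft offline path closes a window, assuming a toy page with a single refcount field. Every name below (struct model_page, model_pin(), model_offline_pinned(), and so on) is hypothetical and only loosely mirrors the idea behind the kernel's get_page_unless_zero(); this is not the kernel's code and does not reproduce the actual "unknown refcount page" check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a page; NOT kernel code. */
struct model_page {
    int refcount; /* references currently held on the page */
};

/* Outcome of the modeled offline path's look at the page. */
enum model_verdict { VERDICT_IN_USE, VERDICT_UNKNOWN_REFCOUNT, VERDICT_SKIPPED_FREE };

/* Pin the page only if it is still live, in the spirit of get_page_unless_zero(). */
static bool model_pin(struct model_page *p)
{
    if (p->refcount == 0)
        return false;
    p->refcount++;
    return true;
}

static void model_unpin(struct model_page *p)
{
    assert(p->refcount > 0); /* dropping a reference we never took is a bug */
    p->refcount--;
}

/* Stand-in for the page's owner dropping its reference concurrently. */
static void model_owner_releases(struct model_page *p)
{
    if (p->refcount > 0)
        p->refcount--;
}

/* With refcount 0 the page is neither clearly in use nor confirmed free;
 * this model reports that ambiguous case as "unknown refcount". */
static enum model_verdict model_classify(const struct model_page *p)
{
    return p->refcount > 0 ? VERDICT_IN_USE : VERDICT_UNKNOWN_REFCOUNT;
}

/* Caller holds no reference: the owner can drop the last one before we look. */
static enum model_verdict model_offline_unpinned(struct model_page *p)
{
    model_owner_releases(p);
    return model_classify(p);
}

/* Caller pins first and keeps the pin across the whole check. */
static enum model_verdict model_offline_pinned(struct model_page *p)
{
    if (!model_pin(p))
        return VERDICT_SKIPPED_FREE;
    model_owner_releases(p);
    enum model_verdict v = model_classify(p);
    model_unpin(p);
    return v;
}
```

In the unpinned path the owner's release can leave the refcount at zero at the moment it is inspected; in the pinned path the caller's own reference keeps it at one or more until the check is done.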

Dropped patches
- mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED
- mm,madvise: Refactor madvise_inject_error
- mm,hwpoison: remove MF_COUNT_INCREASED
- mm,hwpoison: remove flag argument from soft offline functions

Thanks,
Naoya Horiguchi

Quoting cover letter of v5:
----
The main focus of this series is to stabilize soft offline.  Historically,
soft-offlined pages have suffered from race conditions because PageHWPoison
is used a little too aggressively, which (directly or indirectly) intrudes
on other mm code that cares little about hwpoison.  This results in
unexpected behavior or kernel panics, which is very far from soft offline's
"do not disturb userspace or other kernel components" policy.

The main point of this change set is to contain the target page "via the
buddy allocator": we first free the target page as we do for normal pages,
and remove it from the buddy allocator only once we confirm that it has
reached the free list.  There is certainly a race window in which the page
can be allocated, but that's fine: someone really wants that page and the
page still works, so soft offline can happily give up.
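The containment idea can be sketched with a toy allocator, assuming a fixed array of page frames in place of the real buddy free lists and skipping the migration step that an in-use page would need first. All names (struct model_zone, model_free_page(), model_take_off_free_list(), model_soft_offline()) are hypothetical; this is not the patchset's code. The racer_allocates flag stands in for an allocation landing inside the race window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_PAGES 8

/* Hypothetical per-frame state in a toy allocator; NOT the kernel's buddy allocator. */
enum model_state { MODEL_IN_USE, MODEL_FREE, MODEL_POISONED };

struct model_zone {
    enum model_state state[MODEL_NR_PAGES];
};

/* Return a page to the free list, as for any normal page. */
static void model_free_page(struct model_zone *z, size_t pfn)
{
    assert(pfn < MODEL_NR_PAGES && z->state[pfn] == MODEL_IN_USE);
    z->state[pfn] = MODEL_FREE;
}

/* Allocate the lowest-numbered free page; poisoned pages are never handed out.
 * Returns -1 when nothing is free. */
static long model_alloc_page(struct model_zone *z)
{
    for (size_t pfn = 0; pfn < MODEL_NR_PAGES; pfn++) {
        if (z->state[pfn] == MODEL_FREE) {
            z->state[pfn] = MODEL_IN_USE;
            return (long)pfn;
        }
    }
    return -1;
}

/* Remove the page from the free list only if it actually reached it. */
static bool model_take_off_free_list(struct model_zone *z, size_t pfn)
{
    if (z->state[pfn] != MODEL_FREE)
        return false; /* someone allocated it; the page still works, so give up */
    z->state[pfn] = MODEL_POISONED;
    return true;
}

/* Soft-offline a page: free it normally, then try to pull it off the free list.
 * racer_allocates injects a modeled allocation between those two steps. */
static bool model_soft_offline(struct model_zone *z, size_t pfn, bool racer_allocates)
{
    model_free_page(z, pfn);
    if (racer_allocates)
        (void)model_alloc_page(z);
    return model_take_off_free_list(z, pfn);
}
```

If the modeled allocation wins the race, model_soft_offline() returns false and leaves the page in use, matching the "soft offline can happily give up" policy; once a page is marked poisoned, model_alloc_page() never hands it out again.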

v4 from Oscar tried to handle the race around reallocation, but that part
still seems to be a work in progress, so I decided to separate it from the
changes going into v5.9.  Thank you for your contribution, Oscar.

---
Previous versions:
  v1: https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com/
  v2: https://lore.kernel.org/linux-mm/20191017142123.24245-1-osalvador@suse.de/
  v3: https://lore.kernel.org/linux-mm/20200624150137.7052-1-nao.horiguchi@gmail.com/
  v4: https://lore.kernel.org/linux-mm/20200716123810.25292-1-osalvador@suse.de/
  v5: https://lore.kernel.org/linux-mm/20200805204354.GA16406@hori.linux.bs1.fc.nec.co.jp/T/#t
---
Summary:

Naoya Horiguchi (5):
      mm,hwpoison: cleanup unused PageHuge() check
      mm, hwpoison: remove recalculating hpage
      mm,hwpoison-inject: don't pin for hwpoison_filter
      mm,hwpoison: introduce MF_MSG_UNSPLIT_THP
      mm,hwpoison: double-check page count in __get_any_page()

Oscar Salvador (7):
      mm,hwpoison: Un-export get_hwpoison_page and make it static
      mm,hwpoison: Kill put_hwpoison_page
      mm,hwpoison: Unify THP handling for hard and soft offline
      mm,hwpoison: Rework soft offline for free pages
      mm,hwpoison: Rework soft offline for in-use pages
      mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page
      mm,hwpoison: Return 0 if the page is already poisoned in soft-offline

 include/linux/mm.h         |   3 +-
 include/linux/page-flags.h |   6 +-
 include/ras/ras_event.h    |   3 +
 mm/hwpoison-inject.c       |  18 +--
 mm/madvise.c               |   5 -
 mm/memory-failure.c        | 307 +++++++++++++++++++++------------------------
 mm/migrate.c               |  11 +-
 mm/page_alloc.c            |  60 +++++++--
 8 files changed, 203 insertions(+), 210 deletions(-)

Thread overview: 24+ messages
2020-08-06 18:49 [PATCH v6 00/12] HWPOISON: soft offline rework nao.horiguchi
2020-08-06 18:49 ` [PATCH v6 01/12] mm,hwpoison: cleanup unused PageHuge() check nao.horiguchi
2020-08-06 18:49 ` [PATCH v6 02/12] mm, hwpoison: remove recalculating hpage nao.horiguchi
2020-08-06 18:49 ` [PATCH v6 03/12] mm,hwpoison-inject: don't pin for hwpoison_filter nao.horiguchi
2020-08-06 18:49 ` [PATCH v6 04/12] mm,hwpoison: Un-export get_hwpoison_page and make it static nao.horiguchi
2020-08-06 18:49 ` [PATCH v6 05/12] mm,hwpoison: Kill put_hwpoison_page nao.horiguchi
2020-08-06 18:49 ` [PATCH v6 06/12] mm,hwpoison: Unify THP handling for hard and soft offline nao.horiguchi
2020-08-06 18:49 ` [PATCH v6 07/12] mm,hwpoison: Rework soft offline for free pages nao.horiguchi
2020-08-06 18:49 ` [PATCH v6 08/12] mm,hwpoison: Rework soft offline for in-use pages nao.horiguchi
2020-09-18  7:58   ` osalvador
2020-09-19  0:23     ` Andrew Morton
2020-09-19  8:26       ` osalvador
2020-08-06 18:49 ` [PATCH v6 09/12] mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page nao.horiguchi
2020-08-06 18:49 ` [PATCH v6 10/12] mm,hwpoison: Return 0 if the page is already poisoned in soft-offline nao.horiguchi
2020-08-06 18:49 ` [PATCH v6 11/12] mm,hwpoison: introduce MF_MSG_UNSPLIT_THP nao.horiguchi
2020-08-06 18:49 ` [PATCH v6 12/12] mm,hwpoison: double-check page count in __get_any_page() nao.horiguchi
2020-08-24 12:21   ` Oscar Salvador
2020-08-10 15:22 ` [PATCH v6 00/12] HWPOISON: soft offline rework Qian Cai
2020-08-11  3:11   ` HORIGUCHI NAOYA(堀口 直也)
2020-08-11  3:45     ` Qian Cai
2020-08-11  3:56       ` HORIGUCHI NAOYA(堀口 直也)
2020-08-11 17:39     ` Qian Cai
2020-08-11 19:32       ` Naoya Horiguchi
2020-08-11 22:06         ` Qian Cai
