Linux-mm Archive on lore.kernel.org
From: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
To: Qian Cai <cai@lca.pw>
Cc: "nao.horiguchi@gmail.com" <nao.horiguchi@gmail.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"mhocko@kernel.org" <mhocko@kernel.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"mike.kravetz@oracle.com" <mike.kravetz@oracle.com>,
	"osalvador@suse.de" <osalvador@suse.de>,
	"tony.luck@intel.com" <tony.luck@intel.com>,
	"david@redhat.com" <david@redhat.com>,
	"aneesh.kumar@linux.vnet.ibm.com"
	<aneesh.kumar@linux.vnet.ibm.com>,
	"zeil@yandex-team.ru" <zeil@yandex-team.ru>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v5 00/16] HWPOISON: soft offline rework
Date: Tue, 4 Aug 2020 01:16:45 +0000
Message-ID: <20200804011644.GA25028@hori.linux.bs1.fc.nec.co.jp>
In-Reply-To: <20200803190709.GB8894@lca.pw>

On Mon, Aug 03, 2020 at 03:07:09PM -0400, Qian Cai wrote:
> On Fri, Jul 31, 2020 at 12:20:56PM +0000, nao.horiguchi@gmail.com wrote:
> > This patchset is the latest version of the soft offline rework patchset
> > targeted for v5.9.
> > 
> > The main focus of this series is to stabilize soft offline.  Historically, soft
> > offlined pages have suffered from race conditions because PageHWPoison is
> > used a little too aggressively, which (directly or indirectly) invades
> > other mm code that cares little about hwpoison.  This results in unexpected
> > behavior or kernel panics, which is very far from soft offline's "do not
> > disturb userspace or other kernel component" policy.
> > 
> > The main point of this change set is to contain the target page "via buddy allocator",
> > where we first free the target page as we do for normal pages, and remove it
> > from buddy only when we confirm that it reaches the free list. There is surely
> > a race window with page allocation, but that's fine because someone really wants
> > that page and the page is still working, so soft offline can happily give up.
> > 
> > v4 from Oscar tries to handle the race around reallocation, but that part
> > still seems to be work in progress, so I decided to separate it from the changes
> > for v5.9.  Thank you for your contribution, Oscar.
> > 
> > The issue reported by Qian Cai is fixed by patch 16/16.
> > 
> > This patchset is based on v5.8-rc7-mmotm-2020-07-27-18-18, but I applied
> > this series after reverting previous version.
> > Maybe https://github.com/Naoya-Horiguchi/linux/commits/soft-offline-rework.v5
> > shows what I did more precisely.
> > 
> > Any other comment/suggestion/help would be appreciated.
> 
> There is another issue with this patchset (with and without the patch [1]).
> 
> [1] https://lore.kernel.org/lkml/20200803133657.GA13307@hori.linux.bs1.fc.nec.co.jp/
> 
> Arm64 using 512M-size hugepages starts to fail allocations prematurely.
> 
> # ./random 1
> - start: migrate_huge_offline
> - use NUMA nodes 0,1.
> - mmap and free 2147483648 bytes hugepages on node 0
> - mmap and free 2147483648 bytes hugepages on node 1
> madvise: Cannot allocate memory
> 
> [  284.388061][ T3706] soft offline: 0x956000: hugepage isolation failed: 0, page count 2, type 17ffff80001000e (referenced|uptodate|dirty|head)
> [  284.400777][ T3706] Soft offlining pfn 0x8e000 at process virtual address 0xffff80000000
> [  284.893412][ T3706] Soft offlining pfn 0x8a000 at process virtual address 0xffff60000000
> [  284.901539][ T3706] soft offline: 0x8a000: hugepage isolation failed: 0, page count 2, type 7ffff80001000e (referenced|uptodate|dirty|head)
> [  284.914129][ T3706] Soft offlining pfn 0x8c000 at process virtual address 0xffff80000000
> [  285.433497][ T3706] Soft offlining pfn 0x88000 at process virtual address 0xffff60000000
> [  285.720377][ T3706] Soft offlining pfn 0x8a000 at process virtual address 0xffff80000000
> [  286.281620][ T3706] Soft offlining pfn 0xa000 at process virtual address 0xffff60000000
> [  286.290065][ T3706] soft offline: 0xa000: hugepage migration failed -12, type 7ffff80001000e (referenced|uptodate|dirty|head)

I think this is due to a lack of contiguous memory.
The test program soft-offlines hugepages many times, so eventually one
page in every 512MB range is removed from buddy, and then we can no longer
allocate a hugepage even if enough free pages remain.
This is not good for heavy hugepage users, but it should be the intended behavior.
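
To illustrate the point about contiguity, here is a minimal sketch in C of a
toy model (not kernel code). It assumes memory is divided into aligned
hugepage-sized blocks and that a block can back a hugepage only if none of
its base pages has been soft-offlined. The function names, the offlined[]
array and the 8192 pages-per-block figure (512MB with 64KB base pages) are
assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model, not kernel code: offlined[i] is the number of base pages in
 * hugepage-sized block i that soft offline has removed from the buddy
 * allocator. All names here are hypothetical.
 */

/* Count blocks that are still fully contiguous, i.e. can back a hugepage. */
size_t usable_hugepage_blocks(const unsigned *offlined, size_t nr_blocks)
{
    size_t usable = 0;

    for (size_t i = 0; i < nr_blocks; i++)
        if (offlined[i] == 0)
            usable++;
    return usable;
}

/* Count base pages that remain free across all blocks. */
size_t free_base_pages(const unsigned *offlined, size_t nr_blocks,
                       size_t pages_per_block)
{
    size_t free_pages = 0;

    for (size_t i = 0; i < nr_blocks; i++) {
        /* A block cannot lose more pages than it contains. */
        assert(offlined[i] <= pages_per_block);
        free_pages += pages_per_block - offlined[i];
    }
    return free_pages;
}
```

With one offlined page in each of 4 blocks of 8192 pages, free_base_pages()
still reports 4 * 8191 free pages while usable_hugepage_blocks() reports 0,
which is the ENOMEM situation described above.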

It seems that random.c calls madvise(MADV_SOFT_OFFLINE) for 2 hugepages,
and iterates it 1000 (==NR_LOOP) times, so if the system doesn't have
enough memory to cover the range of 2000 hugepages (1000GB in the Arm64
system), this ENOMEM should reproduce as expected.
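
The arithmetic behind the 1000GB figure can be written out as a small C
sketch. The values come from the description above (2 hugepages per
iteration, NR_LOOP == 1000, 512MB hugepages); the macro and function names
other than NR_LOOP are hypothetical, and the sketch assumes every iteration
lands on a different 512MB range.

```c
#include <assert.h>
#include <stdint.h>

#define NR_LOOP            1000 /* iteration count cited for random.c */
#define HUGEPAGES_PER_LOOP 2    /* hugepages soft-offlined per iteration */
#define HUGEPAGE_SIZE_MB   512  /* Arm64 hugepage size in the report */

/*
 * Memory range (in MB) touched if each soft-offlined hugepage comes from
 * a different hugepage-sized range. Hypothetical helper for illustration.
 */
uint64_t soft_offline_span_mb(uint64_t loops, uint64_t per_loop,
                              uint64_t hugepage_mb)
{
    assert(hugepage_mb > 0);
    return loops * per_loop * hugepage_mb;
}
```

soft_offline_span_mb(NR_LOOP, HUGEPAGES_PER_LOOP, HUGEPAGE_SIZE_MB) gives
1024000 MB, i.e. 2000 hugepages covering 1000GB.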

> 
> Reverting this patchset and its dependency patchset [2] (reverting the
> dependency alone did not help) fixed it,

But it's still not clear to me why this was not visible before this
patchset, so I need to look into it further.

Thanks,
Naoya Horiguchi

> 
> # ./random 1
> - start: migrate_huge_offline
> - use NUMA nodes 0,1.
> - mmap and free 2147483648 bytes hugepages on node 0
> - mmap and free 2147483648 bytes hugepages on node 1
> - pass: mmap_offline_node_huge
> 
> [2] https://lore.kernel.org/linux-mm/1594622517-20681-1-git-send-email-iamjoonsoo.kim@lge.com/ 
> 
> > 
> > Thanks,
> > Naoya Horiguchi
> > ---
> > Previous versions:
> >   v1: https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com/
> >   v2: https://lore.kernel.org/linux-mm/20191017142123.24245-1-osalvador@suse.de/
> >   v3: https://lore.kernel.org/linux-mm/20200624150137.7052-1-nao.horiguchi@gmail.com/
> >   v4: https://lore.kernel.org/linux-mm/20200716123810.25292-1-osalvador@suse.de/
> > ---
> > Summary:
> > 
> > Naoya Horiguchi (8):
> >       mm,hwpoison: cleanup unused PageHuge() check
> >       mm, hwpoison: remove recalculating hpage
> >       mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED
> >       mm,hwpoison-inject: don't pin for hwpoison_filter
> >       mm,hwpoison: remove MF_COUNT_INCREASED
> >       mm,hwpoison: remove flag argument from soft offline functions
> >       mm,hwpoison: introduce MF_MSG_UNSPLIT_THP
> >       mm,hwpoison: double-check page count in __get_any_page()
> > 
> > Oscar Salvador (8):
> >       mm,madvise: Refactor madvise_inject_error
> >       mm,hwpoison: Un-export get_hwpoison_page and make it static
> >       mm,hwpoison: Kill put_hwpoison_page
> >       mm,hwpoison: Unify THP handling for hard and soft offline
> >       mm,hwpoison: Rework soft offline for free pages
> >       mm,hwpoison: Rework soft offline for in-use pages
> >       mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page
> >       mm,hwpoison: Return 0 if the page is already poisoned in soft-offline
> > 
> >  drivers/base/memory.c      |   2 +-
> >  include/linux/mm.h         |  12 +-
> >  include/linux/page-flags.h |   6 +-
> >  include/ras/ras_event.h    |   3 +
> >  mm/hwpoison-inject.c       |  18 +--
> >  mm/madvise.c               |  39 +++---
> >  mm/memory-failure.c        | 334 ++++++++++++++++++++-------------------------
> >  mm/migrate.c               |  11 +-
> >  mm/page_alloc.c            |  60 ++++++--
> >  9 files changed, 233 insertions(+), 252 deletions(-)
> 