Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>,
	"mike.kravetz@oracle.com" <mike.kravetz@oracle.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH v2 10/16] mm,hwpoison: Rework soft offline for free pages
Date: Wed, 23 Oct 2019 02:01:33 +0000
Message-ID: <20191023020133.GA24383@hori.linux.bs1.fc.nec.co.jp> (raw)
In-Reply-To: <20191022095852.GB20429@linux>

On Tue, Oct 22, 2019 at 11:58:52AM +0200, Oscar Salvador wrote:
> On Tue, Oct 22, 2019 at 11:22:56AM +0200, Michal Hocko wrote:
> > Hmm, that might be a misunderstanding on my end. I thought that it is
> > the MCE handler to say whether the failure is recoverable or not. If yes
> > then we can touch the content of the memory (that would imply the
> > migration). Other than that both paths should be essentially the same,
> > no? Well unrecoverable case would be essentially force migration failure
> > path.
> > 
> > MADV_HWPOISON is explicitly documented to test MCE handling IIUC:
> > : This feature is intended for testing of memory error-handling
> > : code; it is available only if the kernel was configured with
> > : CONFIG_MEMORY_FAILURE.
> > 
> > There is no explicit note about the type of the error that is injected
> > but I think it is reasonably safe to assume this is a recoverable one.
> 
> MADV_HWPOISON stands for hard-offline.
> MADV_SOFT_OFFLINE stands for soft-offline.

Maybe MADV_HWPOISON should've be named like MADV_HARD_OFFLINE, although
it's API and hard to change once implemented.

> 
> MADV_SOFT_OFFLINE (since Linux 2.6.33)
>               Soft offline the pages in the range specified by addr and
>               length.  The memory of each page in the specified range is
>               preserved (i.e., when next accessed, the same content will be
>               visible, but in a new physical page frame), and the original
>               page is offlined (i.e., no longer used, and taken out of
>               normal memory management).  The effect of the
>               MADV_SOFT_OFFLINE operation is invisible to (i.e., does not
>               change the semantics of) the calling process.
> 
>               This feature is intended for testing of memory error-handling
>               code;

Although this expression might not clear enough, madvise(MADV_HWPOISON or
MADV_SOFT_OFFLINE) only covers memory error handling part, not MCE handling
part.  We have some other injection methods in the lower layers like
mce-inject and APEI.

> it is available only if the kernel was configured with
>               CONFIG_MEMORY_FAILURE.
> 
> 
> But both follow different approaches.
> 
> I think it is up to some controlers to trigger soft-offline or hard-offline:

Yes, I think so.  One usecase of soft offline is triggered by CMCI interrupt
in Intel CPU.  CMCI handler stores corrected error events in /dev/mcelog.
mcelogd polls on this device file and if corrected errors occur often enough
(IIRC the default threshold is "10 events in 24 hours",) mcelogd triggers
soft-offline via soft_offline_page under /sys.

OTOH, hard-offline is triggered directly (accurately over ring buffer to separate
context) from MCE handler.  mcelogd logs MCE events but does not involve in
page offline logic.

> 
> static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int sev)
> {
> #ifdef CONFIG_ACPI_APEI_MEMORY_FAILURE
> 	...
>         /* iff following two events can be handled properly by now */
>         if (sec_sev == GHES_SEV_CORRECTED &&
>             (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED))
>                 flags = MF_SOFT_OFFLINE;
>         if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE)
>                 flags = 0;
> 
>         if (flags != -1)
>                 memory_failure_queue(pfn, flags);
> 	...
> #endif
>  }
> 
> 
> static void memory_failure_work_func(struct work_struct *work)
> {
> 	...
> 	for (;;) {
> 		spin_lock_irqsave(&mf_cpu->lock, proc_flags);
> 		gotten = kfifo_get(&mf_cpu->fifo, &entry);
> 		spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
> 		if (!gotten)
> 			break;
> 		if (entry.flags & MF_SOFT_OFFLINE)
> 			soft_offline_page(pfn_to_page(entry.pfn), entry.flags);
> 		else
> 			memory_failure(entry.pfn, entry.flags);
> 	}
>  }
> 
> AFAICS, for hard-offline case, a recovered event would be if:
> 
> - the page to shut down is already free
> - the page was unmapped
> 
> In some cases we need to kill the process if it holds dirty pages.

One caveat is that even if the process maps dirty error pages, we
don't have to kill it unless the error data is consumed.

Thanks,
Naoya Horiguchi


  parent reply index

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-17 14:21 [RFC PATCH v2 00/16] Hwpoison rework {hard,soft}-offline Oscar Salvador
2019-10-17 14:21 ` [RFC PATCH v2 01/16] mm,hwpoison: cleanup unused PageHuge() check Oscar Salvador
2019-10-18 11:48   ` Michal Hocko
2019-10-21  7:00     ` Naoya Horiguchi
2019-10-21 12:16       ` Michal Hocko
2019-11-12 12:22       ` Aneesh Kumar K.V
2019-11-13  6:02         ` Naoya Horiguchi
2019-10-17 14:21 ` [RFC PATCH v2 02/16] mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED Oscar Salvador
2019-10-18 11:52   ` Michal Hocko
2019-10-21  7:02     ` Naoya Horiguchi
2019-10-21 12:20       ` Michal Hocko
2019-10-17 14:21 ` [RFC PATCH v2 03/16] mm,madvise: Refactor madvise_inject_error Oscar Salvador
2019-10-21  7:03   ` Naoya Horiguchi
2019-10-17 14:21 ` [RFC PATCH v2 04/16] mm,hwpoison-inject: don't pin for hwpoison_filter Oscar Salvador
2019-10-17 14:21 ` [RFC PATCH v2 05/16] mm,hwpoison: Un-export get_hwpoison_page and make it static Oscar Salvador
2019-10-21  7:03   ` Naoya Horiguchi
2019-10-17 14:21 ` [RFC PATCH v2 06/16] mm,hwpoison: Kill put_hwpoison_page Oscar Salvador
2019-10-21  7:04   ` Naoya Horiguchi
2019-10-17 14:21 ` [RFC PATCH v2 07/16] mm,hwpoison: remove MF_COUNT_INCREASED Oscar Salvador
2019-10-17 14:21 ` [RFC PATCH v2 08/16] mm,hwpoison: remove flag argument from soft offline functions Oscar Salvador
2019-10-17 14:21 ` [RFC PATCH v2 09/16] mm,hwpoison: Unify THP handling for hard and soft offline Oscar Salvador
2019-10-21  7:04   ` Naoya Horiguchi
2019-10-21  9:51     ` [PATCH 17/16] mm,hwpoison: introduce MF_MSG_UNSPLIT_THP Naoya Horiguchi
2019-10-22  8:00       ` Oscar Salvador
2019-10-17 14:21 ` [RFC PATCH v2 10/16] mm,hwpoison: Rework soft offline for free pages Oscar Salvador
2019-10-18 12:06   ` Michal Hocko
2019-10-21 12:58     ` Oscar Salvador
2019-10-21 15:41       ` Michal Hocko
2019-10-22  7:46         ` Oscar Salvador
2019-10-22  8:26           ` Michal Hocko
2019-10-22  8:35             ` Oscar Salvador
2019-10-22  9:22               ` Michal Hocko
2019-10-22  9:58                 ` Oscar Salvador
2019-10-22 10:24                   ` Michal Hocko
2019-10-22 10:33                     ` Oscar Salvador
2019-10-23  2:15                       ` Naoya Horiguchi
2019-10-23  2:01                   ` Naoya Horiguchi [this message]
2019-10-21  7:45   ` Naoya Horiguchi
2019-10-22  8:00     ` Oscar Salvador
2019-10-17 14:21 ` [RFC PATCH v2 11/16] mm,hwpoison: Rework soft offline for in-use pages Oscar Salvador
2019-10-18 12:39   ` Michal Hocko
2019-10-21 13:48     ` Oscar Salvador
2019-10-21 14:06       ` Michal Hocko
2019-10-22  7:56         ` Oscar Salvador
2019-10-22  8:30           ` Michal Hocko
2019-10-22  9:40             ` Oscar Salvador
2019-10-17 14:21 ` [RFC PATCH v2 12/16] mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page Oscar Salvador
2019-10-17 14:21 ` [RFC PATCH v2 13/16] mm,hwpoison: Take pages off the buddy when hard-offlining Oscar Salvador
2019-10-17 14:21 ` [RFC PATCH v2 14/16] mm,hwpoison: Return 0 if the page is already poisoned in soft-offline Oscar Salvador
2019-10-21  9:20   ` Naoya Horiguchi
2019-10-17 14:21 ` [RFC PATCH v2 15/16] mm/hwpoison-inject: Rip off duplicated checks Oscar Salvador
2019-10-21  9:40   ` David Hildenbrand
2019-10-22  7:57     ` Oscar Salvador
2019-10-17 14:21 ` [RFC PATCH v2 16/16] mm, soft-offline: convert parameter to pfn Oscar Salvador
2019-10-18  8:15   ` David Hildenbrand
2020-06-11 16:43 ` [RFC PATCH v2 00/16] Hwpoison rework {hard,soft}-offline Dmitry Yakunin
2020-06-15  6:19   ` HORIGUCHI NAOYA(堀口 直也)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191023020133.GA24383@hori.linux.bs1.fc.nec.co.jp \
    --to=n-horiguchi@ah.jp.nec.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=osalvador@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git