From: Michal Hocko <mhocko@kernel.org>
To: Oscar Salvador <osalvador@suse.de>
Cc: n-horiguchi@ah.jp.nec.com, mike.kravetz@oracle.com,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v2 10/16] mm,hwpoison: Rework soft offline for free pages
Date: Tue, 22 Oct 2019 12:24:57 +0200
Message-ID: <20191022102457.GJ9379@dhcp22.suse.cz>
In-Reply-To: <20191022095852.GB20429@linux>

On Tue 22-10-19 11:58:52, Oscar Salvador wrote:
> On Tue, Oct 22, 2019 at 11:22:56AM +0200, Michal Hocko wrote:
> > Hmm, that might be a misunderstanding on my end. I thought that it is
> > the MCE handler to say whether the failure is recoverable or not. If yes
> > then we can touch the content of the memory (that would imply the
> > migration). Other than that both paths should be essentially the same,
> > no? Well unrecoverable case would be essentially force migration failure
> > path.
> >
> > MADV_HWPOISON is explicitly documented to test MCE handling IIUC:
> > : This feature is intended for testing of memory error-handling
> > : code; it is available only if the kernel was configured with
> > : CONFIG_MEMORY_FAILURE.
> >
> > There is no explicit note about the type of the error that is injected
> > but I think it is reasonably safe to assume this is a recoverable one.
>
> MADV_HWPOISON stands for hard-offline.
> MADV_SOFT_OFFLINE stands for soft-offline.
>
> MADV_SOFT_OFFLINE (since Linux 2.6.33)
> Soft offline the pages in the range specified by addr and
> length. The memory of each page in the specified range is
> preserved (i.e., when next accessed, the same content will be
> visible, but in a new physical page frame), and the original
> page is offlined (i.e., no longer used, and taken out of
> normal memory management). The effect of the
> MADV_SOFT_OFFLINE operation is invisible to (i.e., does not
> change the semantics of) the calling process.
>
> This feature is intended for testing of memory error-handling
> code; it is available only if the kernel was configured with
> CONFIG_MEMORY_FAILURE.

I have missed that one somehow. Thanks for pointing that out.

[...]
> AFAICS, for hard-offline case, a recovered event would be if:
>
> - the page to shut down is already free
> - the page was unmapped
>
> In some cases we need to kill the process if it holds dirty pages.

Yes, I would expect that the page table would be poisoned and the
process would receive a SIGBUS when accessing that memory.

> But we never migrate contents in hard-offline path.
> I guess it is because we cannot really trust the contents anymore.

Yes, that makes perfect sense. What I am saying is that the migration
(aka trying to recover) is the main and only difference. The soft
offline should poison page tables when not able to migrate as well
IIUC.
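
For illustration only, here is a minimal userspace sketch of the
soft-offline semantics quoted from the man page above: the page content
should still be visible after madvise(MADV_SOFT_OFFLINE), just backed by
a different physical frame. The helper name try_soft_offline() and its
return convention are hypothetical, and the fallback value 101 for
MADV_SOFT_OFFLINE is an assumption that matches the generic Linux ABI
but may differ on some architectures. The call is expected to fail
(typically with EPERM or EINVAL) without CAP_SYS_ADMIN or without
CONFIG_MEMORY_FAILURE, so the sketch treats a refusal as a normal
outcome. MADV_HWPOISON is deliberately not exercised, since a later
access to the poisoned page may raise SIGBUS.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_SOFT_OFFLINE
#define MADV_SOFT_OFFLINE 101 /* assumption: generic Linux ABI value */
#endif

/*
 * Hypothetical helper, not kernel code.
 *
 * Returns  1 if the soft offline succeeded and the content was preserved,
 *          0 if madvise() refused the request (errno stored in *err),
 *         -1 if mmap() failed or the content changed.
 */
static int try_soft_offline(int *err)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *p;
	int rc, intact = 1;
	long i;

	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Write a pattern so the page is faulted in and has content. */
	memset(p, 0x5a, (size_t)page);

	/* Ask the kernel to migrate the content and offline the old frame. */
	rc = madvise(p, (size_t)page, MADV_SOFT_OFFLINE);
	*err = rc ? errno : 0;

	/* Whether or not the request was honoured, the data must be intact. */
	for (i = 0; i < page; i++) {
		if (p[i] != 0x5a) {
			intact = 0;
			break;
		}
	}

	munmap(p, (size_t)page);

	if (!intact)
		return -1;
	return rc == 0 ? 1 : 0;
}
```

A caller would pass an int for the errno value and treat a return of 0
as "not permitted or not supported here" rather than as a test failure.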
--
Michal Hocko
SUSE Labs