Re: [RFC] Kernel Support of Memory Error Detection.

From: Jiaqi Yan <jiaqiyan@google.com>
To: "Luck, Tony" <tony.luck@intel.com>,
	"HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
Cc: "dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	"david@redhat.com" <david@redhat.com>,
	 "Aktas, Erdem" <erdemaktas@google.com>,
	"pgonda@google.com" <pgonda@google.com>,
	 "rientjes@google.com" <rientjes@google.com>,
	"Hsiao, Duen-wen" <duenwen@google.com>,
	 "Vilas.Sridharan@amd.com" <Vilas.Sridharan@amd.com>,
	"Malvestuto, Mike" <mike.malvestuto@intel.com>,
	 "gthelen@google.com" <gthelen@google.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	 "jthoughton@google.com" <jthoughton@google.com>,
	"Ghannam, Yazen" <Yazen.Ghannam@amd.com>,
	 Sean Christopherson <seanjc@google.com>
Subject: Re: [RFC] Kernel Support of Memory Error Detection.
Date: Thu, 10 Nov 2022 12:25:03 -0800
Message-ID: <CACw3F527uDU3t6S_buRhCRsPFrabKCumF2H_5Jp_WzBTVVD8qQ@mail.gmail.com>
In-Reply-To: <SJ1PR11MB6083AE844ABC664FD1A8276BFC3E9@SJ1PR11MB6083.namprd11.prod.outlook.com>

On Wed, Nov 9, 2022 at 8:16 AM Luck, Tony <tony.luck@intel.com> wrote:
>
> > I think that another viewpoint of how we prioritize memory type to scan
> > is kernel vs userspace memory. Current hwpoison mechanism does little to
> > recover from errors in kernel pages (slab, reserved), so there seems
> > little benefit to detect such errors proactively and beforehand.  If the
> > resource for scanning is limited, the user might think of focusing on
> > scanning userspace memory.
>
> Page cache is (in many use cases) a large user of kernel memory, and there
> would be options for recovery if errors were pre-emptively found: clean page ->
> re-read from storage, modified page -> mark in some way to force EIO for read()
> and fail(?) mmap().
>
> -Tony

With the page cache added to the discussion, I would like to separate
the memory scanner from mm's recovery mechanism.

We want to build a type-agnostic in-kernel scanner that safely detects
memory errors in physical memory (e.g. on Intel x86, all usable
physical pages in e820), ideally without needing to know the "memory
type" (owned by user vs kernel? free vs allocated? page cache dirty vs
clean? owned by virtualization guest vs host?).
After the scanner detects that a PFN has a memory error, it reports the
PFN to the memory-failure module, which classifies the type of the
memory page and takes recovery actions accordingly.
(For example, page cache will be handled by me_pagecache_dirty/clean;
I believe that's basically what Tony described.)
So the proactive scanner should always improve the kernel's memory
reliability by recovering more error pages and by recovering them
proactively (not waiting until someone accesses them).
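
To make the split concrete, here is a rough userspace model in Python
(not kernel code). The PFN table, the three page types and the action
strings are made up for illustration; only the shape matters: the
scanner walks PFNs without looking at the page type, and the
memory-failure side does all the classification.

```python
from enum import Enum, auto


class PageType(Enum):
    PAGECACHE_CLEAN = auto()
    PAGECACHE_DIRTY = auto()
    KERNEL = auto()


# Toy stand-in for per-page state, keyed by PFN (hypothetical data).
PAGES = {
    0x100: PageType.PAGECACHE_CLEAN,
    0x101: PageType.PAGECACHE_DIRTY,
    0x102: PageType.KERNEL,
}

# Simplified recovery action per page type, paraphrasing this thread.
ACTIONS = {
    PageType.PAGECACHE_CLEAN: "drop page, re-read from storage",
    PageType.PAGECACHE_DIRTY: "isolate page, force EIO on later access",
    PageType.KERNEL: "little or no recovery",
}


def memory_failure(pfn):
    """Model of the recovery side: classify the page, pick an action."""
    return ACTIONS[PAGES[pfn]]


def scan(pfns, has_error):
    """Type-agnostic scanner: it only asks whether a PFN has an error."""
    return {pfn: memory_failure(pfn) for pfn in pfns if has_error(pfn)}


# Pretend the hardware reports errors on two of the three pages.
bad = {0x101, 0x102}
report = scan(sorted(PAGES), lambda pfn: pfn in bad)
```

Note that scan() never touches PAGES directly, so adding a new page
type only changes the recovery side.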

That being said, prioritizing the scanning of a certain type of memory
is then hard (if not impossible), because the in-kernel background
thread design treats all memory as the same type, physical memory, to
keep things simple.

The alternative is to assume there is a caller that drives the scanner.
This caller can be in either userspace or kernel space (our RFC chooses
userspace). The caller can then prioritize, or exclusively scan, a
certain type of memory, but it has to secure the memory regions before
passing them to the scanner.
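
A rough Python model of the caller-driven design, again only a sketch
and not the RFC's interface. The names secured(), scan_region() and
drive_scanner() are hypothetical, and modelling "secure" as pinning a
region for the duration of the scan is my assumption. The caller owns
the policy (priority order and page budget); the scanner only accepts
regions that have been secured.

```python
from contextlib import contextmanager


@contextmanager
def secured(region, pinned):
    """Hypothetical 'secure' step: hold the region while it is scanned."""
    pinned.add(region)
    try:
        yield region
    finally:
        pinned.discard(region)


def scan_region(region, bad_pfns, pinned):
    """Scanner side: refuses regions the caller has not secured."""
    if region not in pinned:
        raise RuntimeError("region must be secured before scanning")
    start, end = region
    return [pfn for pfn in range(start, end) if pfn in bad_pfns]


def drive_scanner(regions, bad_pfns, budget):
    """Caller side: scan in priority order until the page budget runs out.

    regions is a list of (priority, (start_pfn, end_pfn)); a lower
    priority number is scanned first.
    """
    pinned = set()
    found = []
    for _prio, region in sorted(regions):
        size = region[1] - region[0]
        if size > budget:
            break
        with secured(region, pinned):
            found += scan_region(region, bad_pfns, pinned)
        budget -= size
    return found


# Made-up layout: three 16-page regions with different priorities.
regions = [(1, (0x200, 0x210)), (0, (0x100, 0x110)), (2, (0x300, 0x310))]
bad_pfns = {0x105, 0x20F, 0x301}
found = drive_scanner(regions, bad_pfns, budget=32)
```

With a budget of 32 pages only the two highest-priority regions get
scanned, so the error at 0x301 stays undetected until a later pass.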

The "How to Scan" section in the RFC has more details. Please do share
your opinion/preference on the two designs.