On Wed, Apr 12, 2023 at 11:14 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
On 04/11/23 17:27, Liu Shixin wrote:
> Patch a873dfe1032a ("mm, hwpoison: try to recover from copy-on write faults")
> introduced a new copy_mc_user_highpage() function, and fixed the kernel crash
> when the kernel is copying a normal page as the result of a copy-on-write
> fault and runs into an uncorrectable error. But it doesn't work for HugeTLB.

Andrew asked about user-visible effects.  Perhaps, a better way of
stating this in the commit message might be:

Commit a873dfe1032a ("mm, hwpoison: try to recover from copy-on write
faults") introduced the routine copy_mc_user_highpage() to gracefully
handle copying of user pages with uncorrectable errors.  Previously,
such copies would result in a kernel crash.  hugetlb has separate code
paths for copy-on-write and does not benefit from the changes made in
commit a873dfe1032a.

Modify hugetlb copy-on-write code paths to use copy_mc_user_highpage()
so that they can also gracefully handle uncorrectable errors in user
pages.  This involves changing the hugetlb specific routine
copy_user_folio() from type void to int so that it can return an error.
Modify the hugetlb userfaultfd code in the same way so that it can return
-EHWPOISON if it encounters an uncorrectable error.
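
For illustration only, the void-to-int change described above could be
sketched in user space roughly as follows.  This is a minimal sketch, not
the code in the patch: struct mock_page, mock_copy_mc_page(),
copy_large_page_mc() and the "poisoned" flag are all hypothetical
stand-ins, and the uncorrectable error is simulated rather than detected.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* asm-generic value, for libcs that lack it */
#endif

#define SUBPAGE_SIZE 4096

/* Hypothetical stand-in for one base page of a huge page. */
struct mock_page {
	unsigned char data[SUBPAGE_SIZE];
	bool poisoned;	/* simulates an uncorrectable memory error */
};

/*
 * Hypothetical analogue of a machine-check-safe page copy: returns 0 on
 * success and nonzero if the source page could not be read.
 */
static int mock_copy_mc_page(struct mock_page *dst,
			     const struct mock_page *src)
{
	if (src->poisoned)
		return -1;
	memcpy(dst->data, src->data, SUBPAGE_SIZE);
	return 0;
}

/*
 * Copy nr subpages.  A routine like this could be void while copies were
 * assumed to always succeed; returning int lets the caller see the
 * failure and back out instead of crashing.
 */
static int copy_large_page_mc(struct mock_page *dst,
			      const struct mock_page *src, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (mock_copy_mc_page(&dst[i], &src[i]))
			return -EHWPOISON;
	}
	return 0;
}
```

A caller would check the return value and propagate -EHWPOISON (or
abandon the fault) rather than assume the destination is fully populated.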

NOTE - There is still some churn in the series that introduces
copy_user_folio() and the name may change.

> This is to support HugeTLB by using copy_mc_user_highpage() in copy_subpage()
> and copy_user_gigantic_page() too.
>
> Moreover, this is also used by userfaultfd, it will return -EHWPOISON if
> running into an uncorrectable error.
>
> Signed-off-by: Liu Shixin <liushixin2@huawei.com>
> ---
>  include/linux/mm.h |  6 ++---
>  mm/hugetlb.c       | 19 +++++++++++----
>  mm/memory.c        | 59 +++++++++++++++++++++++++++++-----------------
>  3 files changed, 56 insertions(+), 28 deletions(-)

Code changes look good to me.

Acked-by: Mike Kravetz <mike.kravetz@oracle.com>

Related question perhaps for Tony not directly impacting this patch.
This patch touches the hugetlb clear page paths without consequence.

Just wondering if we can/should create something like clear_mc_user_highpage
to address clearing pages as well?  Apologies if this was previously
discussed.

Tony may have better answers, but allow me to chime in on this question: a memory-related #MC only happens when a kernel read hits a hardware uncorrectable memory error. Writes (such as clearing a memory page) are “safe” for the kernel, at least in the sense that they generate no #MC. So I don’t think clear_user_highpage needs a #MC-handling version (or that one is even possible).


--
Mike Kravetz