Subject: Re: Question about hwpoison handling of 1GB hugepage
To: HORIGUCHI NAOYA(堀口　直也) <naoya.horiguchi@nec.com>
References: <0af88a11-4dfe-9a4e-7b94-08f12caafcf3@huawei.com>
 <20220403234250.GA2217943@hori.linux.bs1.fc.nec.co.jp>
CC: Andrew Morton <akpm@linux-foundation.org>, "linux-mm@kvack.org"
	<linux-mm@kvack.org>, Linux Kernel Mailing List
	<linux-kernel@vger.kernel.org>
From: Liu Shixin <liushixin2@huawei.com>
Message-ID: <d7c7cad5-3ed6-7191-5f7c-0e18e1a9bfbd@huawei.com>
Date: Wed, 6 Apr 2022 10:15:52 +0800
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.7.1
MIME-Version: 1.0
In-Reply-To: <20220403234250.GA2217943@hori.linux.bs1.fc.nec.co.jp>
Content-Type: text/plain; charset="utf-8"
List-ID: <linux-mm.kvack.org>


On 2022/4/4 7:42, HORIGUCHI NAOYA(堀口　直也) wrote:
> On Thu, Mar 31, 2022 at 06:56:25PM +0800, Liu Shixin wrote:
>> Hi,
>>
>> Recently, I found a problem with hwpoison 1GB hugepage.
>> I created a process and mapped 1GB hugepage. This process will then fork a
>> child process and write/read this 1GB hugepage. Then I inject hwpoison into
>> this 1GB hugepage. The child process triggers the memory failure and is
>> being killed as expected. After this, the parent process will try to fork a
>> new child process and do the same thing. It is killed again and finally it
>> goes into such an infinite loop. I found this was caused by
>> commit 31286a8484a8 ("mm: hwpoison: disable memory error handling on 1GB hugepage")
> Hello Shixin,
>
> It's little unclear to me about what behavior you are expecting and
> what the infinite loop repeats, could you explain little more about them?
> (I briefly tried to reproduce it, but didn't make it...)

There are two processes in my environment. The parent process first maps
a 1GB hugepage, then forks a child process and monitors it. If the child
process is killed, the parent process forks a new child process. The child
process writes to the hugepage.

After we inject hwpoison into the 1GB hugepage (madvise(MADV_HWPOISON)),
the child process is killed by MCE when writing to the hugepage. Then the
parent process forks a new child process.

I expect the new child process to be able to allocate a new 1GB hugepage
and no longer be killed. But currently the new child process writes to the
hwpoisoned hugepage again and is killed. As a result, the parent process
keeps forking new child processes, and each child process writes to the
hwpoisoned hugepage and is killed.
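
A minimal sketch of the setup described above, in C, in case it helps with
reproducing. It is an illustration under assumptions, not the exact test
program: the helper names (map_1gb_hugepage, run_child_once,
monitor_children) are made up, the mapping is assumed to be
MAP_SHARED | MAP_ANONYMOUS, and the child is assumed to call
madvise(MADV_HWPOISON) itself before writing.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fallbacks for older headers; values match the Linux UAPI. */
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif
#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB (30 << MAP_HUGE_SHIFT)
#endif
#ifndef MADV_HWPOISON
#define MADV_HWPOISON 100
#endif

#define HUGEPAGE_SIZE (1UL << 30)

/* Map one 1GB hugepage. Returns MAP_FAILED if none is reserved.
 * Assumption: shared anonymous mapping, so parent and child see one page. */
void *map_1gb_hugepage(void)
{
    return mmap(NULL, HUGEPAGE_SIZE, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
                -1, 0);
}

/* Fork one child that injects hwpoison (assumption: done in the child)
 * and then writes to the hugepage.
 * Returns 1 if the child was killed by a signal, 0 if it exited, -1 on error. */
int run_child_once(char *addr)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        long page = sysconf(_SC_PAGESIZE);
        if (madvise(addr, (size_t)page, MADV_HWPOISON) != 0)
            _exit(2);           /* injection not permitted or not supported */
        addr[0] = 1;            /* write to the poisoned hugepage */
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? 1 : 0;
}

/* Parent loop: fork a new child each time the previous one is killed,
 * bounded by max_respawns so the sketch cannot loop forever.
 * Returns the number of children that were killed, or -1 on error. */
int monitor_children(char *addr, int max_respawns)
{
    int killed = 0;
    for (int i = 0; i < max_respawns; i++) {
        int r = run_child_once(addr);
        if (r < 0)
            return -1;
        if (r == 0)
            break;              /* child survived, stop respawning */
        killed++;
    }
    return killed;
}
```

To use it, call map_1gb_hugepage() from main(), check the result against
MAP_FAILED, and pass it to monitor_children() with a small bound such as 10.
Note that MADV_HWPOISON needs CAP_SYS_ADMIN and CONFIG_MEMORY_FAILURE, and it
really marks the page as poisoned, so this should only be run on a test
machine with at least one 1GB hugepage reserved.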

>
>> It looks like there is a bug for hwpoison 1GB hugepage so I try to reproduce
>> the bug described. After trying to revert the patch in an earlier version of
>> the kernel, I reproduce the bug described. Then I try to revert the patch in
>> latest version, and find the bug is no longer reproduced.
>>
>> I compare the code paths of 1 GB hugepage and 2 MB hugepage for second madvise(MADV_HWPOISON),
>> and find that the problem is caused because in gup_pud_range(), pud_none() and
>> pud_huge() both return false and then trigger the bug. But in gup_pmd_range(),
>> the pmd_none() is modified to pmd_present() which will make code return directly.
>> Then I find that it is commit 15494520b776 ("mm: fix gup_pud_range") which
>> cause latest version not reproduced. I backport commit 15494520b776 in
>> earlier version and find the bug is no longer reproduced either.
> Thank you for the analysis.
> So this patch might make 31286a8484a8 unnecessary, that's a good news.
>
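
To make the difference between the two checks in the analysis quoted above
concrete, here is a toy model in C. It is not kernel code: the bit layout,
the toy_* helpers and the TOY_* constants are all made up for illustration.
It only shows why an entry that is neither empty nor present (such as a
hwpoison entry) passes a pud_none()-style check but is stopped by a
pud_present()-style check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy entry bits. These are NOT the real x86 page-table bits. */
#define TOY_PRESENT  0x1u
#define TOY_HUGE     0x2u
#define TOY_HWPOISON 0x100u   /* stands for a non-present hwpoison entry */

enum toy_walk { TOY_STOP, TOY_HUGE_LEAF, TOY_DESCEND };

bool toy_pud_none(uint32_t pud)    { return pud == 0; }
bool toy_pud_present(uint32_t pud) { return (pud & TOY_PRESENT) != 0; }
bool toy_pud_huge(uint32_t pud)    { return (pud & TOY_HUGE) != 0; }

/* Walk step guarded by a pud_none()-style check. */
enum toy_walk toy_walk_none_check(uint32_t pud)
{
    if (toy_pud_none(pud))
        return TOY_STOP;
    if (toy_pud_huge(pud))
        return TOY_HUGE_LEAF;
    /* A hwpoison entry ends up here and is misread as a table pointer. */
    return TOY_DESCEND;
}

/* Walk step guarded by a pud_present()-style check. */
enum toy_walk toy_walk_present_check(uint32_t pud)
{
    if (!toy_pud_present(pud))
        return TOY_STOP;        /* hwpoison entry is rejected here */
    if (toy_pud_huge(pud))
        return TOY_HUGE_LEAF;
    return TOY_DESCEND;
}
```

In this model, toy_walk_none_check(TOY_HWPOISON) descends while
toy_walk_present_check(TOY_HWPOISON) stops; both behave the same for an
empty entry and for a present huge entry.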
>> So I'd like to consult that is it the time to revert commit 31286a8484a8?
>> Or if we modify pud_huge to be similar with pmd_huge, is it sufficient?
>>
>> I also noticed there is a TODO comment in memory_failure_hugetlb():
>>     - conversion of a pud that maps an error hugetlb into hwpoison
>>       entry properly works, and
>>     - other mm code walking over page table is aware of pud-aligned
>>       hwpoison entries.
> These are simply minimum requirements, but might not be sufficient.
> We need testing (with removing 31286a8484a8) to make sure that
> there's no issues on some corner cases.
> (I start to extend existing hugetlb-related testcases to 1GB ones.)
Looking forward to the testcases and further conclusions.
>
> Thanks,
> Naoya Horiguchi
>
>> I'm not sure whether the above fix are sufficient, so is there anything else need
>> to analysis that I haven't considered?