* Re: [RFC] a question about reuse hwpoison page in soft_offline_page()
       [not found] <99235479-716d-4c40-8f61-8e44c242abf8.xishi.qiuxishi@alibaba-inc.com>
@ 2018-07-06  8:18 ` Naoya Horiguchi
  2018-07-20  7:50   ` Xie XiuQi
  0 siblings, 1 reply; 8+ messages in thread
From: Naoya Horiguchi @ 2018-07-06  8:18 UTC (permalink / raw)
  To: 裘稀石(稀石)
  Cc: linux-mm, linux-kernel, zy.zhengyi

On Fri, Jul 06, 2018 at 11:37:41AM +0800, 裘稀石(稀石) wrote:
> Commit add05cec ("mm: soft-offline: don't free target page in successful
> page migration") removes set_migratetype_isolate() and
> unset_migratetype_isolate() from soft_offline_page().
> 
> And commit 243abd5b ("mm: hugetlb: prevent reuse of hwpoisoned free
> hugepages") changes if (!is_migrate_isolate_page(page)) to
> if (!PageHWPoison(page)), so that nobody can reuse the free hugetlb page
> once the hwpoison flag has been set in soft_offline_free_page().
> 
> My question: if someone reuses the free hugetlb page after get_any_page()
> but before soft_offline_free_page(), it ends up using the hwpoison page,
> and this may trigger an MCE kill later, right?

Hi Xishi,

Thank you for pointing out the issue. That's a nice catch.

I think the race condition itself can happen, but it doesn't lead to an
MCE kill, because PageHWPoison is not visible to the hardware that
triggers MCEs. PageHWPoison is just a flag in struct page used to report
the memory error from the kernel to userspace. So even if a CPU accesses
a page whose struct page has PageHWPoison set, that doesn't cause an MCE
unless the page is physically broken.
The type of memory error that soft offline tries to handle is a corrected
one, which is not a failure yet although the memory is starting to wear.
So such a PageHWPoison page can be reused, but that's not critical because
the page is freed at some point afterward and error containment completes.

However, I noticed a small problem in the free hugetlb case.
We call dissolve_free_huge_page() in soft_offline_free_page(), which moves
the PageHWPoison flag from the head page to the raw error page.
If the reported race happens, dissolve_free_huge_page() just returns
without doing any dissolve work, because the
"if (PageHuge(page) && !page_count(page))" block is skipped.
The hugepage is allocated and used as usual, but containment doesn't
complete as it would for a normal page, because free_huge_page() doesn't
call dissolve_free_huge_page() for a hwpoison hugepage. This is not
critical because such an error hugepage just resides in the free hugepage
list, but it might look like a kind of memory leak. Even worse, when the
hugepage pool is shrunk and the hwpoison hugepage is freed, the
PageHWPoison flag is still on the head page, which is unlikely to be the
actual error page.

So I think we need an improvement here. How about a fix like the one below?

  (not tested yet, sorry)

  diff --git a/mm/memory-failure.c b/mm/memory-failure.c
  --- a/mm/memory-failure.c
  +++ b/mm/memory-failure.c
  @@ -1883,6 +1883,11 @@ static void soft_offline_free_page(struct page *page)
          struct page *head = compound_head(page);
  
          if (!TestSetPageHWPoison(head)) {
  +               if (page_count(head)) {
  +                       ClearPageHWPoison(head);
  +                       return;
  +               }
  +
                  num_poisoned_pages_inc();
                  if (PageHuge(head))
                          dissolve_free_huge_page(page);
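
To illustrate the intent of the added check outside the kernel, here is a
minimal user-space sketch. It is only a model under stated assumptions:
struct model_page, model_test_set_hwpoison(), model_soft_offline_free_page()
and model_num_poisoned are hypothetical stand-ins, not the kernel's
struct page or the real soft_offline_free_page(), and real kernel code
needs proper atomics and locking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the fields the check needs. */
struct model_page {
	int refcount;   /* 0 means the page is still free */
	bool hwpoison;  /* models the PageHWPoison flag */
};

/* Models num_poisoned_pages: counts pages that were actually contained. */
static int model_num_poisoned;

/* Models TestSetPageHWPoison(): set the flag, return its previous value. */
static bool model_test_set_hwpoison(struct model_page *p)
{
	bool old = p->hwpoison;

	p->hwpoison = true;
	return old;
}

/*
 * Models the patched flow. Returns true if the page was contained, false
 * if it was already poisoned or if it lost the race against allocation.
 */
static bool model_soft_offline_free_page(struct model_page *head)
{
	assert(head);
	if (model_test_set_hwpoison(head))
		return false;           /* already poisoned earlier */
	if (head->refcount > 0) {
		head->hwpoison = false; /* page got allocated: back off */
		return false;
	}
	model_num_poisoned++;
	return true;
}
```

In this model, a page that was grabbed by someone else between the two
steps is left unpoisoned, which is the behavior the check is meant to give.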

Thanks,
Naoya Horiguchi

* Re: [RFC] a question about reuse hwpoison page in soft_offline_page()
  2018-07-06  8:18 ` [RFC] a question about reuse hwpoison page in soft_offline_page() Naoya Horiguchi
@ 2018-07-20  7:50   ` Xie XiuQi
  2018-07-20  8:50     ` Re: " Zhangfei (Tyler)
  0 siblings, 1 reply; 8+ messages in thread
From: Xie XiuQi @ 2018-07-20  7:50 UTC (permalink / raw)
  To: Naoya Horiguchi, 裘稀石(稀石)
  Cc: linux-mm, linux-kernel, zy.zhengyi, Zhangfei (Tyler),
	lvzhipeng, meinanjing, Zhong Jiang

Hi Naoya, Xishi,

We have a similar problem.
@zhangfei, could you please describe your problem here?

-- 
Thanks,
Xie XiuQi

* Re: [RFC] a question about reuse hwpoison page in soft_offline_page()
  2018-07-20  7:50   ` Xie XiuQi
@ 2018-07-20  8:50     ` Zhangfei (Tyler)
  2018-07-23  6:11       ` Naoya Horiguchi
  0 siblings, 1 reply; 8+ messages in thread
From: Zhangfei (Tyler) @ 2018-07-20  8:50 UTC (permalink / raw)
  To: Xiexiuqi, Naoya Horiguchi, 裘稀石(稀石)
  Cc: linux-mm, linux-kernel, zy.zhengyi, lvzhipeng, meinanjing,
	zhongjiang, Dukaitian, Chenglongfei

Hi Naoya, Xishi,

We have a similar problem. The difference is that we did not enable
hugepages: the soft-offline was executed on normal 4K pages, and in the
end the MCE kill was triggered (the fault path finds the hwpoison flag
already set --> ret = VM_FAULT_HWPOISON --> mm_fault_error --> do_sigbus
--> MCE kill). We noticed that the new patch modifies the hugepage offline
case, but how can we avoid this race condition for normal pages?
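
The chain described above can be sketched roughly as follows. This is a
user-space model, not kernel code: the MODEL_FAULT_* constants and the
model_* functions are hypothetical names chosen for illustration, and only
the general shape (poison flag seen at fault time, fault code, SIGBUS) is
taken from the description.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical fault result codes, used by this model only. */
enum model_fault { MODEL_FAULT_OK = 0, MODEL_FAULT_HWPOISON = 1 };

struct model_page {
	bool hwpoison; /* models the PageHWPoison flag */
};

/* Fault handler: a page whose poison flag is set is refused. */
static enum model_fault model_handle_fault(const struct model_page *p)
{
	assert(p);
	return p->hwpoison ? MODEL_FAULT_HWPOISON : MODEL_FAULT_OK;
}

/* Error path: returns the signal that would be delivered, or 0 for none. */
static int model_fault_error(enum model_fault f)
{
	return f == MODEL_FAULT_HWPOISON ? SIGBUS : 0;
}
```

Note that in this model the signal comes purely from the software flag,
whether or not the memory behind the page is actually broken.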

Thanks!

* Re: [RFC] a question about reuse hwpoison page in soft_offline_page()
  2018-07-20  8:50     ` Re: " Zhangfei (Tyler)
@ 2018-07-23  6:11       ` Naoya Horiguchi
  2018-11-20 11:36           ` Zhangfei (Tyler)
  0 siblings, 1 reply; 8+ messages in thread
From: Naoya Horiguchi @ 2018-07-23  6:11 UTC (permalink / raw)
  To: Zhangfei (Tyler)
  Cc: Xiexiuqi, 裘稀石(稀石),
	linux-mm, linux-kernel, zy.zhengyi, lvzhipeng, meinanjing,
	zhongjiang, Dukaitian, Chenglongfei

On Fri, Jul 20, 2018 at 08:50:22AM +0000, Zhangfei (Tyler) wrote:
> Hi Naoya, Xishi,
>
> We have a similar problem. The difference is that we did not enable
> hugepages: the soft-offline was executed on normal 4K pages, and in the
> end the MCE kill was triggered (the fault path finds the hwpoison flag
> already set --> ret = VM_FAULT_HWPOISON --> mm_fault_error --> do_sigbus
> --> MCE kill). We noticed that the new patch modifies the hugepage
> offline case, but how can we avoid this race condition for normal pages?

Hi Tyler,

The latest version of the fix is available at
https://lkml.org/lkml/2018/7/17/60.
I'm still discussing a better design for this area with Michal, but
I think we'll go with this as a short-term fix.

Thanks,
Naoya Horiguchi

* Re: [RFC] a question about reuse hwpoison page in soft_offline_page()
  2018-07-23  6:11       ` Naoya Horiguchi
@ 2018-11-20 11:36           ` Zhangfei (Tyler)
  0 siblings, 0 replies; 8+ messages in thread
From: Zhangfei (Tyler) @ 2018-11-20 11:36 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Xiexiuqi, linux-mm, linux-kernel, Lvzhipeng (pang,
	Intelligent Computing R&D),
	meinanjing, Dukaitian, Chenglongfei (Longfei Cheng,
	Intelligent Computing R&D),
	Liugang (Robert, Intelligent Computing R&D),
	Chenzhuowei (Intelligent Computing R&D)

Hi Naoya,

Any update on this issue? Is there a final conclusion on how to fix it?

Thanks

* Re: Re: [RFC] a question about reuse hwpoison page in soft_offline_page()
  2018-11-20 11:36           ` Zhangfei (Tyler)
@ 2018-11-22  6:42             ` Naoya Horiguchi
  -1 siblings, 0 replies; 8+ messages in thread
From: Naoya Horiguchi @ 2018-11-22  6:42 UTC (permalink / raw)
  To: Zhangfei (Tyler)
  Cc: Xiexiuqi, linux-mm, linux-kernel, Lvzhipeng (pang,
	Intelligent Computing R&D),
	meinanjing, Dukaitian, Chenglongfei (Longfei Cheng,
	Intelligent Computing R&D),
	Liugang (Robert, Intelligent Computing R&D),
	Chenzhuowei (Intelligent Computing R&D)

Hi Zhangfei,

On Tue, Nov 20, 2018 at 11:36:16AM +0000, Zhangfei (Tyler) wrote:
> Hi Naoya,
>
> Any update on this issue? Is there a final conclusion on how to fix it?

This issue is solved by the following commit for 4kB pages:

    commit d4ae9916ea2947341180d2b538f48875ff393a86
    Author: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Date:   Thu Aug 23 17:00:42 2018 -0700
    
        mm: soft-offline: close the race against page allocation

The reported SIGBUS issue should no longer reproduce, because PageHWPoison
is now set under zone->lock. We safely give up page containment if the
race happens.
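
As a rough illustration of the idea (not the code from the commit), the
sketch below models a zone whose single lock is shared by the allocator
and by soft offline. All model_* names are hypothetical, and the spinlock
built on C11 atomic_flag merely stands in for zone->lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_page {
	bool is_free;   /* true while the page sits in the free list */
	bool hwpoison;  /* models the PageHWPoison flag */
};

struct model_zone {
	atomic_flag lock; /* stands in for zone->lock */
};

static void model_lock(struct model_zone *z)
{
	while (atomic_flag_test_and_set_explicit(&z->lock, memory_order_acquire))
		; /* spin until the current holder releases the lock */
}

static void model_unlock(struct model_zone *z)
{
	atomic_flag_clear_explicit(&z->lock, memory_order_release);
}

/* Allocator side: hand out the page only if it is free and not poisoned. */
static bool model_alloc(struct model_zone *z, struct model_page *p)
{
	bool ok;

	assert(z && p);
	model_lock(z);
	ok = p->is_free && !p->hwpoison;
	if (ok)
		p->is_free = false;
	model_unlock(z);
	return ok;
}

/* Soft-offline side: poison the page only while it is still free. */
static bool model_set_hwpoison_free_page(struct model_zone *z,
					 struct model_page *p)
{
	bool ok;

	assert(z && p);
	model_lock(z);
	ok = p->is_free && !p->hwpoison;
	if (ok)
		p->hwpoison = true;
	model_unlock(z);
	return ok; /* false: lost the race, give up containment */
}
```

Because both sides check and update the page state under the same lock,
the model never ends up with a page that is both allocated and poisoned.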

Thanks,
Naoya Horiguchi

> 
> Thinks
> 	
> -----邮件原件-----
> 发件人: Naoya Horiguchi [mailto:n-horiguchi@ah.jp.nec.com] 
> 发送时间: 2018年7月23日 14:11
> 收件人: Zhangfei (Tyler) <tyler.zhang@huawei.com>
> 抄送: Xiexiuqi <xiexiuqi@huawei.com>; 裘稀石(稀石) <xishi.qiuxishi@alibaba-inc.com>; linux-mm <linux-mm@kvack.org>; linux-kernel <linux-kernel@vger.kernel.org>; zy.zhengyi <zy.zhengyi@alibaba-inc.com>; lvzhipeng <lvzhipeng@huawei.com>; meinanjing <meinanjing@huawei.com>; zhongjiang <zhongjiang@huawei.com>; Dukaitian <dukaitian@huawei.com>; Chenglongfei <chenglongfei@huawei.com>
> 主题: Re: [RFC] a question about reuse hwpoison page in soft_offline_page()
> 
> On Fri, Jul 20, 2018 at 08:50:22AM +0000, Zhangfei (Tyler) wrote:
> > Hi Naoya&xishi:
> > 	We have a similar problem, the difference is that we did not Enable hugepage, the soft-offline was executed in the case of normal 4K pages, and finally the MCE kill was triggered(find hwpoison flag is already set-->ret = VM_FAULT_HWPOISON-->mm_fault_error -->do_sigbus --> mce kill). We noticed that the new patch made some modifications to the case of huge page offline, But how can we avoid this race condition for the case of normal page?
> 
> Hi Tyler,
> 
> The latest version of the fix is available at https://lkml.org/lkml/2018/7/17/60.
> I'm still discussing a better design for this area with Michal, but I think we'll go with this as a short-term fix.
> 
> Thanks,
> Naoya Horiguchi
> 
> > 
> > -----Original Message-----
> > From: Xiexiuqi
> > Sent: July 20, 2018 15:50
> > To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>; 裘稀石(稀石) 
> > <xishi.qiuxishi@alibaba-inc.com>
> > Cc: linux-mm <linux-mm@kvack.org>; linux-kernel 
> > <linux-kernel@vger.kernel.org>; zy.zhengyi 
> > <zy.zhengyi@alibaba-inc.com>; Zhangfei (Tyler) 
> > <tyler.zhang@huawei.com>; lvzhipeng <lvzhipeng@huawei.com>; meinanjing 
> > <meinanjing@huawei.com>; zhongjiang <zhongjiang@huawei.com>
> > Subject: Re: [RFC] a question about reuse hwpoison page in 
> > soft_offline_page()
> > 
> > Hi Naoya, Xishi,
> > 
> > We have a similar problem.
> > @zhangfei, could you please describe your problem here.
> > 
> > On 2018/7/6 16:18, Naoya Horiguchi wrote:
> > > On Fri, Jul 06, 2018 at 11:37:41AM +0800, 裘稀石(稀石) wrote:
> > >> This patch add05cec
> > >> (mm: soft-offline: don't free target page in successful page
> > >> migration) removes
> > >> set_migratetype_isolate() and unset_migratetype_isolate() in
> > >> soft_offline_page().
> > >>
> > >> And this patch 243abd5b
> > >> (mm: hugetlb: prevent reuse of hwpoisoned free hugepages) changes
> > >> if (!is_migrate_isolate_page(page)) to if (!PageHWPoison(page)), so it
> > >> could prevent someone from reusing the free hugetlb page after the
> > >> hwpoison flag is set in soft_offline_free_page().
> > >>
> > >> My question is: if someone reuses the free hugetlb page before
> > >> soft_offline_free_page() and after get_any_page(), then it uses the
> > >> hwpoison page, and this may trigger an MCE kill later, right?
> > > 
> > > Hi Xishi,
> > > 
> > > Thank you for pointing out the issue. That's a nice catch.
> > > 
> > > I think that the race condition itself could happen, but it doesn't
> > > lead to an MCE kill because PageHWPoison is not visible to the HW which
> > > triggers MCEs. The PageHWPoison flag is just a flag in struct page used
> > > to report the memory error from the kernel to userspace. So even if a
> > > CPU is accessing a page whose struct page has PageHWPoison set, that
> > > doesn't cause an MCE unless the page is physically broken.
> > > The type of memory error that soft offline tries to handle is a
> > > corrected one, which is not a failure yet although the memory is
> > > starting to wear.
> > > So such a PageHWPoison page can be reused, but that's not critical
> > > because the page is freed at some point afterward and error containment
> > > completes.
> > > 
> > > However, I noticed that there's a small pain point in the free hugetlb case.
> > > We call dissolve_free_huge_page() in soft_offline_free_page(), which
> > > moves the PageHWPoison flag from the head page to the raw error page.
> > > If the reported race happens, dissolve_free_huge_page() just returns
> > > without doing any dissolve work because the
> > > "if (PageHuge(page) && !page_count(page))" block is skipped.
> > > The hugepage is allocated and used as usual, but the containment
> > > doesn't complete as expected in the normal page case, because
> > > free_huge_page() doesn't call dissolve_free_huge_page() for a
> > > hwpoison hugepage. This is not critical because such an error hugepage
> > > just resides in the free hugepage list, but it might look like a kind
> > > of memory leak. Even worse, when the hugepage pool is shrunk and
> > > the hwpoison hugepage is freed, the PageHWPoison flag is still on the
> > > head page, which is unlikely to be the actual error page.
> > > 
> > > So I think we need an improvement here. How about a fix like the one below?
> > > 
> > >   (not tested yet, sorry)
> > > 
> > >   diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > >   --- a/mm/memory-failure.c
> > >   +++ b/mm/memory-failure.c
> > >   @@ -1883,6 +1883,11 @@ static void soft_offline_free_page(struct page *page)
> > >           struct page *head = compound_head(page);
> > >   
> > >           if (!TestSetPageHWPoison(head)) {
> > >   +               if (page_count(head)) {
> > >   +                       ClearPageHWPoison(head);
> > >   +                       return;
> > >   +               }
> > >   +
> > >                   num_poisoned_pages_inc();
> > >                   if (PageHuge(head))
> > >                           dissolve_free_huge_page(page);
> > > 
> > > Thanks,
> > > Naoya Horiguchi
> > > 
> > > .
> > > 
> > 
> > --
> > Thanks,
> > Xie XiuQi
> > 

end of thread, other threads:[~2018-11-22  6:44 UTC | newest]

Thread overview: 8+ messages
     [not found] <99235479-716d-4c40-8f61-8e44c242abf8.xishi.qiuxishi@alibaba-inc.com>
2018-07-06  8:18 ` [RFC] a question about reuse hwpoison page in soft_offline_page() Naoya Horiguchi
2018-07-20  7:50   ` Xie XiuQi
2018-07-20  8:50     ` RE: " Zhangfei (Tyler)
2018-07-23  6:11       ` Naoya Horiguchi
2018-11-20 11:36         ` RE: " Zhangfei (Tyler)
2018-11-20 11:36           ` Zhangfei (Tyler)
2018-11-22  6:42           ` Naoya Horiguchi
2018-11-22  6:42             ` Naoya Horiguchi
