All of lore.kernel.org
 help / color / mirror / Atom feed
From: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
To: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Liu Shixin <liushixin2@huawei.com>,
	Yang Shi <shy828301@gmail.com>,
	Oscar Salvador <osalvador@suse.de>,
	Muchun Song <songmuchun@bytedance.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [mm-unstable PATCH v4 4/9] mm, hwpoison, hugetlb: support saving mechanism of raw error pages
Date: Wed, 6 Jul 2022 23:06:15 +0000	[thread overview]
Message-ID: <20220706230615.GA2560241@hori.linux.bs1.fc.nec.co.jp> (raw)
In-Reply-To: <3b6bdbbc-289c-4d38-5728-ed3efcde25d8@huawei.com>

On Wed, Jul 06, 2022 at 10:37:50AM +0800, Miaohe Lin wrote:
> On 2022/7/4 9:33, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > 
> > When handling memory error on a hugetlb page, the error handler tries to
> > dissolve and turn it into 4kB pages.  If it's successfully dissolved,
> > PageHWPoison flag is moved to the raw error page, so that's all right.
> > However, dissolve sometimes fails, then the error page is left as
> > hwpoisoned hugepage. It's useful if we can retry to dissolve it to save
> > healthy pages, but that's not possible now because the information about
> > where the raw error pages is lost.
> > 
> > Use the private field of a few tail pages to keep that information.  The
> > code path of shrinking hugepage pool uses this info to try delayed dissolve.
> > In order to remember multiple errors in a hugepage, a singly-linked list
> > originated from SUBPAGE_INDEX_HWPOISON-th tail page is constructed.  Only
> > simple operations (adding an entry or clearing all) are required and the
> > list is assumed not to be very long, so this simple data structure should
> > be enough.
> > 
> > If we failed to save raw error info, the hwpoison hugepage has errors on
> > unknown subpage, then this new saving mechanism does not work any more,
> > so disable saving new raw error info and freeing hwpoison hugepages.
> > 
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > ---
...
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 66bb39e0fce8..ccd470f0194c 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1535,17 +1535,15 @@ static void __update_and_free_page(struct hstate *h, struct page *page)
> >  	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
> >  		return;
> >  
> > -	if (hugetlb_vmemmap_restore(h, page)) {
> > -		spin_lock_irq(&hugetlb_lock);
> > -		/*
> > -		 * If we cannot allocate vmemmap pages, just refuse to free the
> > -		 * page and put the page back on the hugetlb free list and treat
> > -		 * as a surplus page.
> > -		 */
> > -		add_hugetlb_page(h, page, true);
> > -		spin_unlock_irq(&hugetlb_lock);
> > -		return;
> > -	}
> > +	if (hugetlb_vmemmap_restore(h, page))
> > +		goto fail;
> > +
> > +	/*
> > +	 * Move PageHWPoison flag from head page to the raw error pages,
> > +	 * which makes any healthy subpages reusable.
> > +	 */
> > +	if (unlikely(PageHWPoison(page) && hugetlb_clear_page_hwpoison(page)))
> > +		goto fail;
> 
> IIUC, HPageVmemmapOptimized must have been cleared via hugetlb_vmemmap_restore above. So
> VM_BUG_ON_PAGE(!HPageVmemmapOptimized(page), page) in add_hugetlb_page will be triggered
> if we go to fail here. add_hugetlb_page is expected to be called when we cannot allocate
> vmemmap pages.

Thanks a lot, you're right. I shouldn't simply factor the failure path with
the goto label.  I think that it's hard to undo hugetlb_vmemmap_restore(), so
checking HPageRawHwpUnreliable() before hugetlb_vmemmap_restore(), then
try to hugetlb_clear_page_hwpoison() after it (where tail pages are available).

> 
> >  
> >  	for (i = 0; i < pages_per_huge_page(h);
> >  	     i++, subpage = mem_map_next(subpage, page, i)) {
> > @@ -1566,6 +1564,16 @@ static void __update_and_free_page(struct hstate *h, struct page *page)
> >  	} else {
> >  		__free_pages(page, huge_page_order(h));
> >  	}
> > +	return;
> > +fail:
> > +	spin_lock_irq(&hugetlb_lock);
> > +	/*
> > +	 * If we cannot allocate vmemmap pages or cannot identify raw hwpoison
> > +	 * subpages reliably, just refuse to free the page and put the page
> > +	 * back on the hugetlb free list and treat as a surplus page.
> > +	 */
> > +	add_hugetlb_page(h, page, true);
> > +	spin_unlock_irq(&hugetlb_lock);
> >  }
> >  
> >  /*
> > @@ -2109,15 +2117,6 @@ int dissolve_free_huge_page(struct page *page)
> >  		 */
> >  		rc = hugetlb_vmemmap_restore(h, head);
> >  		if (!rc) {
> > -			/*
> > -			 * Move PageHWPoison flag from head page to the raw
> > -			 * error page, which makes any subpages rather than
> > -			 * the error page reusable.
> > -			 */
> > -			if (PageHWPoison(head) && page != head) {
> > -				SetPageHWPoison(page);
> > -				ClearPageHWPoison(head);
> > -			}
> >  			update_and_free_page(h, head, false);
> >  		} else {
> >  			spin_lock_irq(&hugetlb_lock);
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index c9931c676335..53bf7486a245 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -1664,6 +1664,82 @@ int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
> >  EXPORT_SYMBOL_GPL(mf_dax_kill_procs);
> >  #endif /* CONFIG_FS_DAX */
> >  
> > +/*
> > + * Struct raw_hwp_page represents information about "raw error page",
> > + * constructing singly linked list originated from ->private field of
> > + * SUBPAGE_INDEX_HWPOISON-th tail page.
> > + */
> > +struct raw_hwp_page {
> > +	struct llist_node node;
> > +	struct page *page;
> > +};
> > +
> > +static inline struct llist_head *raw_hwp_list_head(struct page *hpage)
> > +{
> > +	return (struct llist_head *)&page_private(hpage + SUBPAGE_INDEX_HWPOISON);
> > +}
> > +
> > +static inline int hugetlb_set_page_hwpoison(struct page *hpage,
> > +					struct page *page)
> > +{
> > +	struct llist_head *head;
> > +	struct raw_hwp_page *raw_hwp;
> > +	struct llist_node *t, *tnode;
> > +	int ret;
> > +
> > +	/*
> > +	 * Once the hwpoison hugepage has lost reliable raw error info,
> > +	 * there is little meaning to keep additional error info precisely,
> > +	 * so skip to add additional raw error info.
> > +	 */
> > +	if (HPageRawHwpUnreliable(hpage))
> > +		return -EHWPOISON;
> 
> If we return here, num_poisoned_pages can't reflect the all hwpoisoned hugepages?

No, it can't now.  Currently we try to (and fail) count only the number
hwpoison subpages with raw_hwp_info.  If we want to track all corrupted
pages (including hwpoison subpages without raw_hwp_info), maybe running the
following part first in this function could make it better.

	ret = TestSetPageHWPoison(hpage) ? -EHWPOISON : 0;
	/* the first error event will be counted in action_result(). */
	if (ret)
		num_poisoned_pages_inc();

But I like the option you suggest below.

> 
> > +	head = raw_hwp_list_head(hpage);
> > +	llist_for_each_safe(tnode, t, head->first) {
> > +		struct raw_hwp_page *p = container_of(tnode, struct raw_hwp_page, node);
> > +
> > +		if (p->page == page)
> > +			return -EHWPOISON;
> > +	}
> > +
> > +	ret = TestSetPageHWPoison(hpage) ? -EHWPOISON : 0;
> > +	/* the first error event will be counted in action_result(). */
> > +	if (ret)
> > +		num_poisoned_pages_inc();
> > +
> > +	raw_hwp = kmalloc(sizeof(struct raw_hwp_page), GFP_ATOMIC);
> > +	if (raw_hwp) {
> > +		raw_hwp->page = page;
> > +		llist_add(&raw_hwp->node, head);
> 
> IMHO, we might need to do num_poisoned_pages_inc here because we decrement the
> num_poisoned_pages according to the llist length.

Yes, if we'd like to count only hwpoisoned subpages with raw_hwp_info,
doing num_poisoned_pages_inc here is fine.

> > +	} else {
> > +		/*
> > +		 * Failed to save raw error info.  We no longer trace all
> > +		 * hwpoisoned subpages, and we need refuse to free/dissolve
> > +		 * this hwpoisoned hugepage.
> > +		 */
> > +		SetHPageRawHwpUnreliable(hpage);
> > +	}
> > +	return ret;
> > +}
> > +
> > +inline int hugetlb_clear_page_hwpoison(struct page *hpage)
> 
> off-the-topic: Is "inline" needed here? I see hugetlb_clear_page_hwpoison is "extern" above.

Maybe not, this code is not performance-sensitive and actually this inline
is a leftover of updates in the previous versions. I'll remove this.

> 
> > +{
> > +	struct llist_head *head;
> > +	struct llist_node *t, *tnode;
> > +
> > +	if (!HPageRawHwpUnreliable(hpage))
> > +		ClearPageHWPoison(hpage);
> > +	head = raw_hwp_list_head(hpage);
> > +	llist_for_each_safe(tnode, t, head->first) {
> > +		struct raw_hwp_page *p = container_of(tnode, struct raw_hwp_page, node);
> > +
> > +		SetPageHWPoison(p->page);
> 
> IMHO, in HPageRawHwpUnreliable(hpage) case, it's better not to do SetPageHWPoison here.
> Because hugepage won't be dissolved and thus we cannot write any data to some tail struct
> pages if HugeTLB Vmemmap Optimization is enabled. Freeing the memory here should be enough.

This is a good point, too. Current version surely does not work with HVO, so
I think of simply giving up clearing hwpoison when HPageVmemmapOptimized is
true. And I should leave some inline comment about this.

> > +		kfree(p);
> > +	}
> > +	llist_del_all(head);
> > +	return 0;
> > +}
> 
> I thought num_poisoned_pages_dec is missing and return value is unneeded. But this is changed
> in next patch. So it should be fine here.

OK.

> 
> > +
> >  /*
> >   * Called from hugetlb code with hugetlb_lock held.
> >   *
> > @@ -1698,7 +1774,7 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
> >  		goto out;
> >  	}
> >  
> > -	if (TestSetPageHWPoison(head)) {
> > +	if (hugetlb_set_page_hwpoison(head, page)) {
> >  		ret = -EHWPOISON;
> >  		goto out;
> >  	}
> > @@ -1751,7 +1827,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
> >  	lock_page(head);
> >  
> >  	if (hwpoison_filter(p)) {
> > -		ClearPageHWPoison(head);
> > +		hugetlb_clear_page_hwpoison(head);
> >  		res = -EOPNOTSUPP;
> >  		goto out;
> >  	}
> > 
> 
> Many thanks for your hard work. :)

Thanks for the detailed review and feedback.

- Naoya Horiguchi

  reply	other threads:[~2022-07-06 23:06 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-04  1:33 [mm-unstable PATCH v4 0/9] mm, hwpoison: enable 1GB hugepage support (v4) Naoya Horiguchi
2022-07-04  1:33 ` [mm-unstable PATCH v4 1/9] mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages() Naoya Horiguchi
2022-07-05  2:16   ` Miaohe Lin
2022-07-05  6:39     ` HORIGUCHI NAOYA(堀口 直也)
2022-07-06  3:04       ` Miaohe Lin
2022-07-06  3:22         ` Mike Kravetz
2022-07-07  2:59           ` Miaohe Lin
2022-07-06 21:51   ` Mike Kravetz
2022-07-07  0:56     ` HORIGUCHI NAOYA(堀口 直也)
2022-07-04  1:33 ` [mm-unstable PATCH v4 2/9] mm/hugetlb: separate path for hwpoison entry in copy_hugetlb_page_range() Naoya Horiguchi
2022-07-04  1:42   ` Andrew Morton
2022-07-04  2:04     ` HORIGUCHI NAOYA(堀口 直也)
2022-07-04  1:33 ` [mm-unstable PATCH v4 3/9] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry Naoya Horiguchi
2022-07-05  2:46   ` Miaohe Lin
2022-07-05  9:04     ` HORIGUCHI NAOYA(堀口 直也)
2022-07-06  3:07       ` Miaohe Lin
2022-07-06 22:21   ` Mike Kravetz
2022-07-04  1:33 ` [mm-unstable PATCH v4 4/9] mm, hwpoison, hugetlb: support saving mechanism of raw error pages Naoya Horiguchi
2022-07-06  2:37   ` Miaohe Lin
2022-07-06 23:06     ` HORIGUCHI NAOYA(堀口 直也) [this message]
2022-07-07  3:22       ` Miaohe Lin
2022-07-04  1:33 ` [mm-unstable PATCH v4 5/9] mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage Naoya Horiguchi
2022-07-06  2:58   ` Miaohe Lin
2022-07-06 23:06     ` HORIGUCHI NAOYA(堀口 直也)
2022-07-07  1:35       ` HORIGUCHI NAOYA(堀口 直也)
2022-07-07  3:08         ` Miaohe Lin
2022-07-04  1:33 ` [mm-unstable PATCH v4 6/9] mm, hwpoison: set PG_hwpoison for busy hugetlb pages Naoya Horiguchi
2022-07-04  1:33 ` [mm-unstable PATCH v4 7/9] mm, hwpoison: make __page_handle_poison returns int Naoya Horiguchi
2022-07-04  1:33 ` [mm-unstable PATCH v4 8/9] mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage Naoya Horiguchi
2022-07-04  1:33 ` [mm-unstable PATCH v4 9/9] mm, hwpoison: enable memory error handling on " Naoya Horiguchi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220706230615.GA2560241@hori.linux.bs1.fc.nec.co.jp \
    --to=naoya.horiguchi@nec.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=liushixin2@huawei.com \
    --cc=mike.kravetz@oracle.com \
    --cc=naoya.horiguchi@linux.dev \
    --cc=osalvador@suse.de \
    --cc=shy828301@gmail.com \
    --cc=songmuchun@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.