linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Mike Kravetz <mike.kravetz@oracle.com>
To: Michal Hocko <mhocko@suse.com>, Baoquan He <bhe@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>,
	akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 10/10] mm/hugetlb: not necessary to abuse temporary page to workaround the nasty free_huge_page
Date: Tue, 11 Aug 2020 14:43:28 -0700	[thread overview]
Message-ID: <eb9d1e13-7455-0c4e-1f94-0c859c36c0bb@oracle.com> (raw)
In-Reply-To: <20200811065406.GC4793@dhcp22.suse.cz>

On 8/10/20 11:54 PM, Michal Hocko wrote:
> 
> I have managed to forgot all the juicy details since I have made that
> change. All that remains is that the surplus pages accounting was quite
> tricky and back then I didn't figure out a simpler method that would
> achieve the consistent look at those counters. As mentioned above I
> suspect this could lead to pre-mature allocation failures while the
> migration is ongoing.

It is likely lost in the e-mail thread, but the suggested change was to
alloc_surplus_huge_page().  The code which allocates the migration target
(alloc_migrate_huge_page) will not be changed.  So, this should not be
an issue.

>                       Sure quite unlikely to happen and the race window
> is likely very small. Maybe this is even acceptable but I would strongly
> recommend to have all this thinking documented in the changelog.

I wrote down a description of what happens in the two different approaches
"temporary page" vs "surplus page".  It is at the very end of this e-mail.
When looking at the details, I came up with what may be an even better
approach.  Why not just call the low level routine to free the page instead
of going through put_page/free_huge_page?  At the very least, it saves a
lock roundtrip and there is no need to worry about the counters/accounting.

Here is a patch to do that.  However, we are optimizing a return path in
a race condition that we are unlikely to ever hit.  I 'tested' it by allocating
an 'extra' page and freeing it via this method in alloc_surplus_huge_page.

From 864c5f8ef4900c95ca3f6f2363a85f3cb25e793e Mon Sep 17 00:00:00 2001
From: Mike Kravetz <mike.kravetz@oracle.com>
Date: Tue, 11 Aug 2020 12:45:41 -0700
Subject: [PATCH] hugetlb: optimize race error return in
 alloc_surplus_huge_page

The routine alloc_surplus_huge_page() could race with with a pool
size change.  If this happens, the allocated page may not be needed.
To free the page, the current code will 'Abuse temporary page to
workaround the nasty free_huge_page codeflow'.  Instead, directly
call the low level routine that free_huge_page uses.  This works
out well because the page is new, we hold the only reference and
already hold the hugetlb_lock.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/hugetlb.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 590111ea6975..ac89b91fba86 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1923,14 +1923,17 @@ static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask,
 	/*
 	 * We could have raced with the pool size change.
 	 * Double check that and simply deallocate the new page
-	 * if we would end up overcommiting the surpluses. Abuse
-	 * temporary page to workaround the nasty free_huge_page
-	 * codeflow
+	 * if we would end up overcommiting the surpluses.
 	 */
 	if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages) {
-		SetPageHugeTemporary(page);
+		/*
+		 * Since this page is new, we hold the only reference, and
+		 * we already hold the hugetlb_lock call the low level free
+		 * page routine.  This saves at least a lock roundtrip.
+		 */
+		(void)put_page_testzero(page); /* don't call destructor */
+		update_and_free_page(h, page);
 		spin_unlock(&hugetlb_lock);
-		put_page(page);
 		return NULL;
 	} else {
 		h->surplus_huge_pages++;
-- 
2.25.4


Here is a description of the difference in "Temporary Page" vs "Surplus
Page" approach.

Both only allocate a fresh huge page if surplus_huge_pages is less than
nr_overcommit_huge_pages.  Of course, the lock protecting those counts
must be dropped to perform the allocation.  After reacquiring the lock
is where we have the proposed difference in behavior.

temporary page behavior
-----------------------
if surplus_huge_pages >= h->nr_overcommit_huge_pages
	SetPageHugeTemporary(page)
	spin_unlock(&hugetlb_lock);
	put_page(page);

At this time we know surplus_huge_pages is 'at least' nr_overcommit_huge_pages.
As a result, any new allocation will fail.
Only user visible result is that number of huge pages will be one greater than
that specified by user and overcommit values.  This is only visible for the
short time until the page is actully freed as a result of put_page().

free_huge_page()
	number of huge pages will be decremented

suprlus page behavior
---------------------
surplus_huge_pages++
surplus_huge_pages_node[page_to_nid(page)]++
if surplus_huge_pages > nr_overcommit_huge_pages
	spin_unlock(&hugetlb_lock);
	put_page(page);

At this time we know surplus_huge_pages is greater than
nr_overcommit_huge_pages.  As a result, any new allocation will fail.
User visible result is an increase in surplus pages as well as number of
huge pages.  In addition, surplus pages will exceed overcommit.  This is
only visible for the short time until the page is actully freed as a
result of put_page().

free_huge_page()
	number of huge pages will be decremented
	h->surplus_huge_pages--;
	h->surplus_huge_pages_node[nid]--;



  reply	other threads:[~2020-08-11 21:43 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-07  9:12 [PATCH 00/10] mm/hugetlb: code refine and simplification Wei Yang
2020-08-07  9:12 ` [PATCH 01/10] mm/hugetlb: not necessary to coalesce regions recursively Wei Yang
2020-08-07 12:47   ` Baoquan He
2020-08-10 20:22   ` Mike Kravetz
2020-08-07  9:12 ` [PATCH 02/10] mm/hugetlb: make sure to get NULL when list is empty Wei Yang
2020-08-07 12:49   ` Baoquan He
2020-08-07 14:28     ` Wei Yang
2020-08-10  0:57       ` Baoquan He
2020-08-10 20:28       ` Mike Kravetz
2020-08-10 23:05         ` Wei Yang
2020-08-07  9:12 ` [PATCH 03/10] mm/hugetlb: use list_splice to merge two list at once Wei Yang
2020-08-07 12:53   ` Baoquan He
2020-08-10 21:07   ` Mike Kravetz
2020-08-07  9:12 ` [PATCH 04/10] mm/hugetlb: count file_region to be added when regions_needed != NULL Wei Yang
2020-08-07 12:54   ` Baoquan He
2020-08-10 21:46   ` Mike Kravetz
2020-08-07  9:12 ` [PATCH 05/10] mm/hugetlb: remove the redundant check on non_swap_entry() Wei Yang
2020-08-07 12:55   ` Baoquan He
2020-08-07 14:28     ` Wei Yang
2020-08-07  9:12 ` [PATCH 06/10] mm/hugetlb: remove redundant huge_pte_alloc() in hugetlb_fault() Wei Yang
2020-08-07 12:59   ` Baoquan He
2020-08-10 22:00   ` Mike Kravetz
2020-08-07  9:12 ` [PATCH 07/10] mm/hugetlb: a page from buddy is not on any list Wei Yang
2020-08-07 13:06   ` Baoquan He
2020-08-10 22:25   ` Mike Kravetz
2020-08-07  9:12 ` [PATCH 08/10] mm/hugetlb: return non-isolated page in the loop instead of break and check Wei Yang
2020-08-07 13:09   ` Baoquan He
2020-08-07 14:32     ` Wei Yang
2020-08-10 22:55   ` Mike Kravetz
2020-08-07  9:12 ` [PATCH 09/10] mm/hugetlb: narrow the hugetlb_lock protection area during preparing huge page Wei Yang
2020-08-07 13:12   ` Baoquan He
2020-08-10 23:02   ` Mike Kravetz
2020-08-07  9:12 ` [PATCH 10/10] mm/hugetlb: not necessary to abuse temporary page to workaround the nasty free_huge_page Wei Yang
2020-08-10  2:17   ` Baoquan He
2020-08-11  0:19     ` Mike Kravetz
2020-08-11  1:51       ` Baoquan He
2020-08-11  6:54         ` Michal Hocko
2020-08-11 21:43           ` Mike Kravetz [this message]
2020-08-11 23:19             ` Wei Yang
2020-08-11 23:25               ` Mike Kravetz
2020-08-12  5:40             ` Baoquan He
2020-08-13 11:46             ` Michal Hocko
2020-08-17  3:04               ` Wei Yang
2020-08-11 23:55           ` Baoquan He
2020-08-07 22:25 ` [PATCH 00/10] mm/hugetlb: code refine and simplification Mike Kravetz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=eb9d1e13-7455-0c4e-1f94-0c859c36c0bb@oracle.com \
    --to=mike.kravetz@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=bhe@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=richard.weiyang@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).