All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Kravetz <mike.kravetz@oracle.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Roman Gushchin <guro@fb.com>, Michal Hocko <mhocko@suse.com>,
	Shakeel Butt <shakeelb@google.com>,
	Oscar Salvador <osalvador@suse.de>,
	David Hildenbrand <david@redhat.com>,
	Muchun Song <songmuchun@bytedance.com>,
	David Rientjes <rientjes@google.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Matthew Wilcox <willy@infradead.org>,
	HORIGUCHI NAOYA <naoya.horiguchi@nec.com>,
	"Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>,
	Waiman Long <longman@redhat.com>, Peter Xu <peterx@redhat.com>,
	Mina Almasry <almasrymina@google.com>,
	Hillf Danton <hdanton@sina.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Barry Song <song.bao.hua@hisilicon.com>,
	Will Deacon <will@kernel.org>,
	Mike Kravetz <mike.kravetz@oracle.com>
Subject: [PATCH v5 3/8] hugetlb: add per-hstate mutex to synchronize user adjustments
Date: Fri,  9 Apr 2021 13:52:49 -0700	[thread overview]
Message-ID: <20210409205254.242291-4-mike.kravetz@oracle.com> (raw)
In-Reply-To: <20210409205254.242291-1-mike.kravetz@oracle.com>

The helper routine hstate_next_node_to_alloc accesses and modifies the
hstate variable next_nid_to_alloc.  The helper is used by the routines
alloc_pool_huge_page and adjust_pool_surplus.  adjust_pool_surplus is
called with hugetlb_lock held.  However, alloc_pool_huge_page can not
be called with the hugetlb lock held as it will call the page allocator.
Two instances of alloc_pool_huge_page could be run in parallel or
alloc_pool_huge_page could run in parallel with adjust_pool_surplus
which may result in the variable next_nid_to_alloc becoming invalid
for the caller and pages being allocated on the wrong node.

Both alloc_pool_huge_page and adjust_pool_surplus are only called from
the routine set_max_huge_pages after boot.  set_max_huge_pages is only
called as the reusult of a user writing to the proc/sysfs nr_hugepages,
or nr_hugepages_mempolicy file to adjust the number of hugetlb pages.

It makes little sense to allow multiple adjustment to the number of
hugetlb pages in parallel.  Add a mutex to the hstate and use it to only
allow one hugetlb page adjustment at a time.  This will synchronize
modifications to the next_nid_to_alloc variable.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
 include/linux/hugetlb.h | 1 +
 mm/hugetlb.c            | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index a7f7d5f328dc..09f1fd12a6fa 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -566,6 +566,7 @@ HPAGEFLAG(Freed, freed)
 #define HSTATE_NAME_LEN 32
 /* Defines one hugetlb page size */
 struct hstate {
+	struct mutex resize_lock;
 	int next_nid_to_alloc;
 	int next_nid_to_free;
 	unsigned int order;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3a10b96a2124..c8799a480784 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2615,6 +2615,11 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 	else
 		return -ENOMEM;
 
+	/*
+	 * resize_lock mutex prevents concurrent adjustments to number of
+	 * pages in hstate via the proc/sysfs interfaces.
+	 */
+	mutex_lock(&h->resize_lock);
 	spin_lock(&hugetlb_lock);
 
 	/*
@@ -2647,6 +2652,7 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 	if (hstate_is_gigantic(h) && !IS_ENABLED(CONFIG_CONTIG_ALLOC)) {
 		if (count > persistent_huge_pages(h)) {
 			spin_unlock(&hugetlb_lock);
+			mutex_unlock(&h->resize_lock);
 			NODEMASK_FREE(node_alloc_noretry);
 			return -EINVAL;
 		}
@@ -2721,6 +2727,7 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 out:
 	h->max_huge_pages = persistent_huge_pages(h);
 	spin_unlock(&hugetlb_lock);
+	mutex_unlock(&h->resize_lock);
 
 	NODEMASK_FREE(node_alloc_noretry);
 
@@ -3208,6 +3215,7 @@ void __init hugetlb_add_hstate(unsigned int order)
 	BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
 	BUG_ON(order == 0);
 	h = &hstates[hugetlb_max_hstate++];
+	mutex_init(&h->resize_lock);
 	h->order = order;
 	h->mask = ~(huge_page_size(h) - 1);
 	for (i = 0; i < MAX_NUMNODES; ++i)
-- 
2.30.2


  parent reply	other threads:[~2021-04-09 20:54 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-09 20:52 [PATCH v5 0/8] make hugetlb put_page safe for all calling contexts Mike Kravetz
2021-04-09 20:52 ` [PATCH v5 1/8] mm/cma: change cma mutex to irq safe spinlock Mike Kravetz
2021-04-09 20:52 ` [PATCH v5 2/8] hugetlb: no need to drop hugetlb_lock to call cma_release Mike Kravetz
2021-04-09 20:52 ` Mike Kravetz [this message]
2021-04-09 20:52 ` [PATCH v5 4/8] hugetlb: create remove_hugetlb_page() to separate functionality Mike Kravetz
2021-04-12  7:33   ` Oscar Salvador
2021-04-12 18:03     ` Mike Kravetz
2021-04-09 20:52 ` [PATCH v5 5/8] hugetlb: call update_and_free_page without hugetlb_lock Mike Kravetz
2021-04-09 20:52 ` [PATCH v5 6/8] hugetlb: change free_pool_huge_page to remove_pool_huge_page Mike Kravetz
2021-04-09 20:52 ` [PATCH v5 7/8] hugetlb: make free_huge_page irq safe Mike Kravetz
2021-04-09 20:52 ` [PATCH v5 8/8] hugetlb: add lockdep_assert_held() calls for hugetlb_lock Mike Kravetz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210409205254.242291-4-mike.kravetz@oracle.com \
    --to=mike.kravetz@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=almasrymina@google.com \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=david@redhat.com \
    --cc=guro@fb.com \
    --cc=hdanton@sina.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=longman@redhat.com \
    --cc=mhocko@suse.com \
    --cc=naoya.horiguchi@nec.com \
    --cc=osalvador@suse.de \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rientjes@google.com \
    --cc=shakeelb@google.com \
    --cc=song.bao.hua@hisilicon.com \
    --cc=songmuchun@bytedance.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.