* [RFC PATCH v1 0/4] hugetlb: parallelize hugetlb page allocation on boot
@ 2023-11-23 13:30 Gang Li
  2023-11-23 13:30 ` [RFC PATCH v1 1/4] hugetlb: code clean for hugetlb_hstate_alloc_pages Gang Li
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Gang Li @ 2023-11-23 13:30 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Andrew Morton; +Cc: linux-mm, linux-kernel, Gang Li

From: Gang Li <ligang.bdlg@bytedance.com>

Inspired by these patches [1][2], this series aims to speed up the
initialization of hugetlb during the boot process through
parallelization.

It is particularly effective in large systems. On a machine equipped
with 1TB of memory and two NUMA nodes, the time for hugetlb
initialization was reduced from 2 seconds to 1 second.

As memory sizes continue to grow, the time saved by this change
will grow as well.

This series currently focuses on optimizing 2MB hugetlb. Since
gigantic pages are few in number, their optimization effects
are not as pronounced. We may explore optimizations for
gigantic pages in the future.
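For context, the boot-time pools that hugetlb_hstate_alloc_pages() fills are
requested on the kernel command line. The values below are illustrative only,
not the configuration used in the test above:

```shell
# Boot-time hugetlb pools are requested on the kernel command line.
# Balanced across all NUMA nodes (the path this series parallelizes):
cmdline="hugepagesz=2M hugepages=262144"
# Node-specific counts take the per-node path refactored in patch 1/4:
cmdline_pernode="hugepagesz=2M hugepages=0:131072,1:131072"
echo "$cmdline"
echo "$cmdline_pernode"
```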

Thanks,
Gang Li

Gang Li (4):
  hugetlb: code clean for hugetlb_hstate_alloc_pages
  hugetlb: split hugetlb_hstate_alloc_pages
  hugetlb: add timing to hugetlb allocations on boot
  hugetlb: parallelize hugetlb page allocation

 mm/hugetlb.c | 191 ++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 134 insertions(+), 57 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC PATCH v1 1/4] hugetlb: code clean for hugetlb_hstate_alloc_pages
  2023-11-23 13:30 [RFC PATCH v1 0/4] hugetlb: parallelize hugetlb page allocation on boot Gang Li
@ 2023-11-23 13:30 ` Gang Li
  2023-11-23 13:30 ` [RFC PATCH v1 2/4] hugetlb: split hugetlb_hstate_alloc_pages Gang Li
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Gang Li @ 2023-11-23 13:30 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Andrew Morton; +Cc: linux-mm, linux-kernel, Gang Li

From: Gang Li <ligang.bdlg@bytedance.com>

This patch focuses on cleaning up the code related to per-node allocation
and error reporting in hugetlb allocation:

- hugetlb_hstate_alloc_pages_node_specific(): iterates through each
  online node and performs allocation if necessary.
- hugetlb_hstate_alloc_pages_report(): reports errors during allocation,
  and updates the value of h->max_huge_pages accordingly.

This patch has no functional changes.

Signed-off-by: Gang Li <ligang.bdlg@bytedance.com>
---
 mm/hugetlb.c | 46 +++++++++++++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 17 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c466551e2fd9..7af2ee08ad1b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3482,6 +3482,33 @@ static void __init hugetlb_hstate_alloc_pages_onenode(struct hstate *h, int nid)
 	h->max_huge_pages_node[nid] = i;
 }
 
+static bool __init hugetlb_hstate_alloc_pages_node_specific(struct hstate *h)
+{
+	int i;
+	bool node_specific_alloc = false;
+
+	for_each_online_node(i) {
+		if (h->max_huge_pages_node[i] > 0) {
+			hugetlb_hstate_alloc_pages_onenode(h, i);
+			node_specific_alloc = true;
+		}
+	}
+
+	return node_specific_alloc;
+}
+
+static void __init hugetlb_hstate_alloc_pages_report(unsigned long allocated, struct hstate *h)
+{
+	if (allocated < h->max_huge_pages) {
+		char buf[32];
+
+		string_get_size(huge_page_size(h), 1, STRING_UNITS_2, buf, 32);
+		pr_warn("HugeTLB: allocating %lu of page size %s failed.  Only allocated %lu hugepages.\n",
+			h->max_huge_pages, buf, allocated);
+		h->max_huge_pages = allocated;
+	}
+}
+
 /*
  * NOTE: this routine is called in different contexts for gigantic and
  * non-gigantic pages.
@@ -3499,7 +3526,6 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 	struct folio *folio;
 	LIST_HEAD(folio_list);
 	nodemask_t *node_alloc_noretry;
-	bool node_specific_alloc = false;
 
 	/* skip gigantic hugepages allocation if hugetlb_cma enabled */
 	if (hstate_is_gigantic(h) && hugetlb_cma_size) {
@@ -3508,14 +3534,7 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 	}
 
 	/* do node specific alloc */
-	for_each_online_node(i) {
-		if (h->max_huge_pages_node[i] > 0) {
-			hugetlb_hstate_alloc_pages_onenode(h, i);
-			node_specific_alloc = true;
-		}
-	}
-
-	if (node_specific_alloc)
+	if (hugetlb_hstate_alloc_pages_node_specific(h))
 		return;
 
 	/* below will do all node balanced alloc */
@@ -3558,14 +3577,7 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 	/* list will be empty if hstate_is_gigantic */
 	prep_and_add_allocated_folios(h, &folio_list);
 
-	if (i < h->max_huge_pages) {
-		char buf[32];
-
-		string_get_size(huge_page_size(h), 1, STRING_UNITS_2, buf, 32);
-		pr_warn("HugeTLB: allocating %lu of page size %s failed.  Only allocated %lu hugepages.\n",
-			h->max_huge_pages, buf, i);
-		h->max_huge_pages = i;
-	}
+	hugetlb_hstate_alloc_pages_report(i, h);
 	kfree(node_alloc_noretry);
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v1 2/4] hugetlb: split hugetlb_hstate_alloc_pages
  2023-11-23 13:30 [RFC PATCH v1 0/4] hugetlb: parallelize hugetlb page allocation on boot Gang Li
  2023-11-23 13:30 ` [RFC PATCH v1 1/4] hugetlb: code clean for hugetlb_hstate_alloc_pages Gang Li
@ 2023-11-23 13:30 ` Gang Li
  2023-11-23 13:30 ` [RFC PATCH v1 3/4] hugetlb: add timing to hugetlb allocations on boot Gang Li
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Gang Li @ 2023-11-23 13:30 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Andrew Morton; +Cc: linux-mm, linux-kernel, Gang Li

From: Gang Li <ligang.bdlg@bytedance.com>

Split hugetlb_hstate_alloc_pages into gigantic and non-gigantic paths.

This patch has no functional changes.

Signed-off-by: Gang Li <ligang.bdlg@bytedance.com>
---
 mm/hugetlb.c | 86 +++++++++++++++++++++++++++-------------------------
 1 file changed, 45 insertions(+), 41 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7af2ee08ad1b..7f9ff0855dd0 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3509,6 +3509,47 @@ static void __init hugetlb_hstate_alloc_pages_report(unsigned long allocated, st
 	}
 }
 
+static unsigned long __init hugetlb_hstate_alloc_pages_gigantic(struct hstate *h)
+{
+	unsigned long i;
+
+	for (i = 0; i < h->max_huge_pages; ++i) {
+		/*
+		 * gigantic pages not added to list as they are not
+		 * added to pools now.
+		 */
+		if (!alloc_bootmem_huge_page(h, NUMA_NO_NODE))
+			break;
+		cond_resched();
+	}
+
+	return i;
+}
+
+static unsigned long __init hugetlb_hstate_alloc_pages_non_gigantic(struct hstate *h)
+{
+	unsigned long i;
+	struct folio *folio;
+	LIST_HEAD(folio_list);
+	nodemask_t node_alloc_noretry;
+
+	/* Bit mask controlling how hard we retry per-node allocations.*/
+	nodes_clear(node_alloc_noretry);
+
+	for (i = 0; i < h->max_huge_pages; ++i) {
+		folio = alloc_pool_huge_folio(h, &node_states[N_MEMORY],
+						&node_alloc_noretry);
+		if (!folio)
+			break;
+		list_add(&folio->lru, &folio_list);
+		cond_resched();
+	}
+
+	prep_and_add_allocated_folios(h, &folio_list);
+
+	return i;
+}
+
 /*
  * NOTE: this routine is called in different contexts for gigantic and
  * non-gigantic pages.
@@ -3522,10 +3563,7 @@ static void __init hugetlb_hstate_alloc_pages_report(unsigned long allocated, st
  */
 static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 {
-	unsigned long i;
-	struct folio *folio;
-	LIST_HEAD(folio_list);
-	nodemask_t *node_alloc_noretry;
+	unsigned long allocated;
 
 	/* skip gigantic hugepages allocation if hugetlb_cma enabled */
 	if (hstate_is_gigantic(h) && hugetlb_cma_size) {
@@ -3539,46 +3577,12 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 
 	/* below will do all node balanced alloc */
 	if (!hstate_is_gigantic(h)) {
-		/*
-		 * Bit mask controlling how hard we retry per-node allocations.
-		 * Ignore errors as lower level routines can deal with
-		 * node_alloc_noretry == NULL.  If this kmalloc fails at boot
-		 * time, we are likely in bigger trouble.
-		 */
-		node_alloc_noretry = kmalloc(sizeof(*node_alloc_noretry),
-						GFP_KERNEL);
+		allocated = hugetlb_hstate_alloc_pages_non_gigantic(h);
 	} else {
-		/* allocations done at boot time */
-		node_alloc_noretry = NULL;
-	}
-
-	/* bit mask controlling how hard we retry per-node allocations */
-	if (node_alloc_noretry)
-		nodes_clear(*node_alloc_noretry);
-
-	for (i = 0; i < h->max_huge_pages; ++i) {
-		if (hstate_is_gigantic(h)) {
-			/*
-			 * gigantic pages not added to list as they are not
-			 * added to pools now.
-			 */
-			if (!alloc_bootmem_huge_page(h, NUMA_NO_NODE))
-				break;
-		} else {
-			folio = alloc_pool_huge_folio(h, &node_states[N_MEMORY],
-							node_alloc_noretry);
-			if (!folio)
-				break;
-			list_add(&folio->lru, &folio_list);
-		}
-		cond_resched();
+		allocated = hugetlb_hstate_alloc_pages_gigantic(h);
 	}
 
-	/* list will be empty if hstate_is_gigantic */
-	prep_and_add_allocated_folios(h, &folio_list);
-
-	hugetlb_hstate_alloc_pages_report(i, h);
-	kfree(node_alloc_noretry);
+	hugetlb_hstate_alloc_pages_report(allocated, h);
 }
 
 static void __init hugetlb_init_hstates(void)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v1 3/4] hugetlb: add timing to hugetlb allocations on boot
  2023-11-23 13:30 [RFC PATCH v1 0/4] hugetlb: parallelize hugetlb page allocation on boot Gang Li
  2023-11-23 13:30 ` [RFC PATCH v1 1/4] hugetlb: code clean for hugetlb_hstate_alloc_pages Gang Li
  2023-11-23 13:30 ` [RFC PATCH v1 2/4] hugetlb: split hugetlb_hstate_alloc_pages Gang Li
@ 2023-11-23 13:30 ` Gang Li
  2023-11-23 13:30 ` [RFC PATCH v1 4/4] hugetlb: parallelize hugetlb page allocation Gang Li
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Gang Li @ 2023-11-23 13:30 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Andrew Morton; +Cc: linux-mm, linux-kernel, Gang Li

From: Gang Li <ligang.bdlg@bytedance.com>

Add timing to boot-time hugetlb allocations to establish a baseline
for further optimization.
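The pr_info() added in the diff below lands in the boot log. A hypothetical way
to pull the figure back out, demonstrated here on a simulated log line rather
than a live dmesg:

```shell
# Simulated boot-log line in the format this patch prints; on a patched
# kernel one would grep the real `dmesg` output instead.
line="[    2.345678] HugeTLB: Allocation takes 1024 ms"
echo "$line" | grep -o 'HugeTLB: Allocation takes [0-9]* ms'
```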

Signed-off-by: Gang Li <ligang.bdlg@bytedance.com>
---
 mm/hugetlb.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7f9ff0855dd0..ac8558724cc2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3563,7 +3563,7 @@ static unsigned long __init hugetlb_hstate_alloc_pages_non_gigantic(struct hstat
  */
 static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 {
-	unsigned long allocated;
+	unsigned long allocated, start;
 
 	/* skip gigantic hugepages allocation if hugetlb_cma enabled */
 	if (hstate_is_gigantic(h) && hugetlb_cma_size) {
@@ -3576,11 +3576,13 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 		return;
 
 	/* below will do all node balanced alloc */
+	start = jiffies;
 	if (!hstate_is_gigantic(h)) {
 		allocated = hugetlb_hstate_alloc_pages_non_gigantic(h);
 	} else {
 		allocated = hugetlb_hstate_alloc_pages_gigantic(h);
 	}
+	pr_info("HugeTLB: Allocation takes %u ms\n", jiffies_to_msecs(jiffies - start));
 
 	hugetlb_hstate_alloc_pages_report(allocated, h);
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v1 4/4] hugetlb: parallelize hugetlb page allocation
  2023-11-23 13:30 [RFC PATCH v1 0/4] hugetlb: parallelize hugetlb page allocation on boot Gang Li
                   ` (2 preceding siblings ...)
  2023-11-23 13:30 ` [RFC PATCH v1 3/4] hugetlb: add timing to hugetlb allocations on boot Gang Li
@ 2023-11-23 13:30 ` Gang Li
  2023-11-23 13:58 ` [RFC PATCH v1 0/4] hugetlb: parallelize hugetlb page allocation on boot Gang Li
  2023-11-23 14:10 ` David Hildenbrand
  5 siblings, 0 replies; 14+ messages in thread
From: Gang Li @ 2023-11-23 13:30 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song, Andrew Morton; +Cc: linux-mm, linux-kernel, Gang Li

From: Gang Li <ligang.bdlg@bytedance.com>

By distributing the allocation across threads, large hugetlb
configurations can allocate pages faster, improving boot speed.

Signed-off-by: Gang Li <ligang.bdlg@bytedance.com>
---
 mm/hugetlb.c | 89 +++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 74 insertions(+), 15 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ac8558724cc2..df3fbe95989e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3509,6 +3509,55 @@ static void __init hugetlb_hstate_alloc_pages_report(unsigned long allocated, st
 	}
 }
 
+struct hugetlb_work {
+	struct work_struct work;
+	struct hstate *h;
+	int num;
+	int nid;
+};
+
+static atomic_t hugetlb_hstate_alloc_n_undone __initdata;
+static __initdata DECLARE_COMPLETION(hugetlb_hstate_alloc_comp);
+
+static void __init hugetlb_alloc_node(struct work_struct *w)
+{
+	struct hugetlb_work *hw = container_of(w, struct hugetlb_work, work);
+	struct hstate *h = hw->h;
+	int i, num = hw->num;
+	nodemask_t node_alloc_noretry;
+	unsigned long flags;
+
+	/* Bit mask controlling how hard we retry per-node allocations.*/
+	nodes_clear(node_alloc_noretry);
+
+	for (i = 0; i < num; ++i) {
+		struct folio *folio = alloc_pool_huge_folio(h, &node_states[N_MEMORY],
+						&node_alloc_noretry);
+		if (!folio)
+			break;
+		spin_lock_irqsave(&hugetlb_lock, flags);
+		__prep_account_new_huge_page(h, folio_nid(folio));
+		enqueue_hugetlb_folio(h, folio);
+		spin_unlock_irqrestore(&hugetlb_lock, flags);
+		cond_resched();
+	}
+
+	if (atomic_dec_and_test(&hugetlb_hstate_alloc_n_undone))
+		complete(&hugetlb_hstate_alloc_comp);
+}
+
+static void __init hugetlb_vmemmap_optimize_node(struct work_struct *w)
+{
+	struct hugetlb_work *hw = container_of(w, struct hugetlb_work, work);
+	struct hstate *h = hw->h;
+	int nid = hw->nid;
+
+	hugetlb_vmemmap_optimize_folios(h, &h->hugepage_freelists[nid]);
+
+	if (atomic_dec_and_test(&hugetlb_hstate_alloc_n_undone))
+		complete(&hugetlb_hstate_alloc_comp);
+}
+
 static unsigned long __init hugetlb_hstate_alloc_pages_gigantic(struct hstate *h)
 {
 	unsigned long i;
@@ -3528,26 +3577,36 @@ static unsigned long __init hugetlb_hstate_alloc_pages_gigantic(struct hstate *h
 
 static unsigned long __init hugetlb_hstate_alloc_pages_non_gigantic(struct hstate *h)
 {
-	unsigned long i;
-	struct folio *folio;
-	LIST_HEAD(folio_list);
-	nodemask_t node_alloc_noretry;
+	int nid;
+	struct hugetlb_work *works;
 
-	/* Bit mask controlling how hard we retry per-node allocations.*/
-	nodes_clear(node_alloc_noretry);
+	works = kcalloc(num_node_state(N_MEMORY), sizeof(*works), GFP_KERNEL);
+	if (works == NULL) {
+		pr_warn("HugeTLB: allocating struct hugetlb_work failed.\n");
+		return 0;
+	}
 
-	for (i = 0; i < h->max_huge_pages; ++i) {
-		folio = alloc_pool_huge_folio(h, &node_states[N_MEMORY],
-						&node_alloc_noretry);
-		if (!folio)
-			break;
-		list_add(&folio->lru, &folio_list);
-		cond_resched();
+	atomic_set(&hugetlb_hstate_alloc_n_undone, num_node_state(N_MEMORY));
+	for_each_node_state(nid, N_MEMORY) {
+		works[nid].h = h;
+		works[nid].num = h->max_huge_pages/num_node_state(N_MEMORY);
+		if (nid == 0)
+			works[nid].num += h->max_huge_pages % num_node_state(N_MEMORY);
+		INIT_WORK(&works[nid].work, hugetlb_alloc_node);
+		queue_work_node(nid, system_unbound_wq, &works[nid].work);
 	}
+	wait_for_completion(&hugetlb_hstate_alloc_comp);
 
-	prep_and_add_allocated_folios(h, &folio_list);
+	atomic_set(&hugetlb_hstate_alloc_n_undone, num_node_state(N_MEMORY));
+	for_each_node_state(nid, N_MEMORY) {
+		works[nid].nid = nid;
+		INIT_WORK(&works[nid].work, hugetlb_vmemmap_optimize_node);
+		queue_work_node(nid, system_unbound_wq, &works[nid].work);
+	}
+	wait_for_completion(&hugetlb_hstate_alloc_comp);
 
-	return i;
+	kfree(works);
+	return h->nr_huge_pages;
 }
 
 /*
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v1 0/4] hugetlb: parallelize hugetlb page allocation on boot
  2023-11-23 13:30 [RFC PATCH v1 0/4] hugetlb: parallelize hugetlb page allocation on boot Gang Li
                   ` (3 preceding siblings ...)
  2023-11-23 13:30 ` [RFC PATCH v1 4/4] hugetlb: parallelize hugetlb page allocation Gang Li
@ 2023-11-23 13:58 ` Gang Li
  2023-11-23 14:10 ` David Hildenbrand
  5 siblings, 0 replies; 14+ messages in thread
From: Gang Li @ 2023-11-23 13:58 UTC (permalink / raw)
  To: Gang Li; +Cc: Mike Kravetz, Muchun Song, Andrew Morton, linux-mm, linux-kernel

> Inspired by these patches [1][2], this series aims to speed up the
> initialization of hugetlb during the boot process through
> parallelization.

[1] https://lore.kernel.org/all/20200527173608.2885243-1-daniel.m.jordan@oracle.com/
[2] https://lore.kernel.org/all/20230906112605.2286994-1-usama.arif@bytedance.com/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v1 0/4] hugetlb: parallelize hugetlb page allocation on boot
  2023-11-23 13:30 [RFC PATCH v1 0/4] hugetlb: parallelize hugetlb page allocation on boot Gang Li
                   ` (4 preceding siblings ...)
  2023-11-23 13:58 ` [RFC PATCH v1 0/4] hugetlb: parallelize hugetlb page allocation on boot Gang Li
@ 2023-11-23 14:10 ` David Hildenbrand
  2023-11-24 19:44   ` David Rientjes
  5 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2023-11-23 14:10 UTC (permalink / raw)
  To: Gang Li, Mike Kravetz, Muchun Song, Andrew Morton
  Cc: linux-mm, linux-kernel, Gang Li

On 23.11.23 14:30, Gang Li wrote:
> From: Gang Li <ligang.bdlg@bytedance.com>
> 
> Inspired by these patches [1][2], this series aims to speed up the
> initialization of hugetlb during the boot process through
> parallelization.
> 
> It is particularly effective in large systems. On a machine equipped
> with 1TB of memory and two NUMA nodes, the time for hugetlb
> initialization was reduced from 2 seconds to 1 second.

Sorry to say, but why is that a scenario worth adding complexity for / 
optimizing for? You don't cover that, so there is a clear lack in the 
motivation.

2 vs. 1 second on a 1 TiB system is usually really just noise.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v1 0/4] hugetlb: parallelize hugetlb page allocation on boot
  2023-11-23 14:10 ` David Hildenbrand
@ 2023-11-24 19:44   ` David Rientjes
  2023-11-24 19:47     ` David Hildenbrand
  0 siblings, 1 reply; 14+ messages in thread
From: David Rientjes @ 2023-11-24 19:44 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Gang Li, Mike Kravetz, Muchun Song, Andrew Morton, linux-mm,
	linux-kernel, Gang Li

On Thu, 23 Nov 2023, David Hildenbrand wrote:

> On 23.11.23 14:30, Gang Li wrote:
> > From: Gang Li <ligang.bdlg@bytedance.com>
> > 
> > Inspired by these patches [1][2], this series aims to speed up the
> > initialization of hugetlb during the boot process through
> > parallelization.
> > 
> > It is particularly effective in large systems. On a machine equipped
> > with 1TB of memory and two NUMA nodes, the time for hugetlb
> > initialization was reduced from 2 seconds to 1 second.
> 
> Sorry to say, but why is that a scenario worth adding complexity for /
> optimizing for? You don't cover that, so there is a clear lack in the
> motivation.
> 
> 2 vs. 1 second on a 1 TiB system is usually really just noise.
> 

The cost will continue to grow over time, so I presume that Gang is trying 
to get out in front of the issue even though it may not be a large savings 
today.

Running single boot tests, with the latest upstream kernel, allocating 
1,440 1GB hugetlb pages on a 1.5TB AMD host appears to take 1.47s.

But allocating 11,776 1GB hugetlb pages on a 12TB Intel host takes 65.2s 
today with the current implementation.

So it's likely something worth optimizing.

Gang, I'm curious about this in the cover letter:

"""
This series currently focuses on optimizing 2MB hugetlb. Since
gigantic pages are few in number, their optimization effects
are not as pronounced. We may explore optimizations for
gigantic pages in the future.
"""

For >1TB hosts, why the emphasis on 2MB hugetlb? :)  I would have expected 
1GB pages.  Are you really allocating ~500k 2MB hugetlb pages?

So if the patchset optimizes for the more likely scenario on these large 
hosts, which would be 1GB pages, that would be great.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v1 0/4] hugetlb: parallelize hugetlb page allocation on boot
  2023-11-24 19:44   ` David Rientjes
@ 2023-11-24 19:47     ` David Hildenbrand
  2023-11-24 20:00       ` David Rientjes
  0 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2023-11-24 19:47 UTC (permalink / raw)
  To: David Rientjes
  Cc: Gang Li, Mike Kravetz, Muchun Song, Andrew Morton, linux-mm,
	linux-kernel, Gang Li

On 24.11.23 20:44, David Rientjes wrote:
> On Thu, 23 Nov 2023, David Hildenbrand wrote:
> 
>> On 23.11.23 14:30, Gang Li wrote:
>>> From: Gang Li <ligang.bdlg@bytedance.com>
>>>
>>> Inspired by these patches [1][2], this series aims to speed up the
>>> initialization of hugetlb during the boot process through
>>> parallelization.
>>>
>>> It is particularly effective in large systems. On a machine equipped
>>> with 1TB of memory and two NUMA nodes, the time for hugetlb
>>> initialization was reduced from 2 seconds to 1 second.
>>
>> Sorry to say, but why is that a scenario worth adding complexity for /
>> optimizing for? You don't cover that, so there is a clear lack in the
>> motivation.
>>
>> 2 vs. 1 second on a 1 TiB system is usually really just noise.
>>
> 
> The cost will continue to grow over time, so I presume that Gang is trying
> to get out in front of the issue even though it may not be a large savings
> today.
> 
> Running single boot tests, with the latest upstream kernel, allocating
> 1,440 1GB hugetlb pages on a 1.5TB AMD host appears to take 1.47s.
> 
> But allocating 11,776 1GB hugetlb pages on a 12TB Intel host takes 65.2s
> today with the current implementation.

And there, the 65.2s won't be noise because that 12TB system is up by a 
snap of a finger? :)

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v1 0/4] hugetlb: parallelize hugetlb page allocation on boot
  2023-11-24 19:47     ` David Hildenbrand
@ 2023-11-24 20:00       ` David Rientjes
  2023-11-28  3:18         ` Gang Li
  0 siblings, 1 reply; 14+ messages in thread
From: David Rientjes @ 2023-11-24 20:00 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Gang Li, Mike Kravetz, Muchun Song, Andrew Morton, linux-mm,
	linux-kernel, Gang Li

On Fri, 24 Nov 2023, David Hildenbrand wrote:

> On 24.11.23 20:44, David Rientjes wrote:
> > On Thu, 23 Nov 2023, David Hildenbrand wrote:
> > 
> > > On 23.11.23 14:30, Gang Li wrote:
> > > > From: Gang Li <ligang.bdlg@bytedance.com>
> > > > 
> > > > Inspired by these patches [1][2], this series aims to speed up the
> > > > initialization of hugetlb during the boot process through
> > > > parallelization.
> > > > 
> > > > It is particularly effective in large systems. On a machine equipped
> > > > with 1TB of memory and two NUMA nodes, the time for hugetlb
> > > > initialization was reduced from 2 seconds to 1 second.
> > > 
> > > Sorry to say, but why is that a scenario worth adding complexity for /
> > > optimizing for? You don't cover that, so there is a clear lack in the
> > > motivation.
> > > 
> > > 2 vs. 1 second on a 1 TiB system is usually really just noise.
> > > 
> > 
> > The cost will continue to grow over time, so I presume that Gang is trying
> > to get out in front of the issue even though it may not be a large savings
> > today.
> > 
> > Running single boot tests, with the latest upstream kernel, allocating
> > 1,440 1GB hugetlb pages on a 1.5TB AMD host appears to take 1.47s.
> > 
> > But allocating 11,776 1GB hugetlb pages on a 12TB Intel host takes 65.2s
> > today with the current implementation.
> 
> And there, the 65.2s won't be noise because that 12TB system is up by a snap
> of a finger? :)
> 

In this single boot test, total boot time was 373.78s, so 1GB hugetlb
allocation is 17.4% of that.
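(The quoted fraction is easy to verify from the two figures above:)

```shell
# 65.2s of 1GB hugetlb allocation out of a 373.78s total boot time.
awk 'BEGIN { printf "%.1f%%\n", 65.2 / 373.78 * 100 }'
```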

Would love to see what the numbers would look like if 1GB pages were
supported.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v1 0/4] hugetlb: parallelize hugetlb page allocation on boot
  2023-11-24 20:00       ` David Rientjes
@ 2023-11-28  3:18         ` Gang Li
  2023-11-28  6:52           ` Gang Li
  2023-11-29 19:41           ` David Rientjes
  0 siblings, 2 replies; 14+ messages in thread
From: Gang Li @ 2023-11-28  3:18 UTC (permalink / raw)
  To: David Rientjes, David Hildenbrand
  Cc: Gang Li, Mike Kravetz, Muchun Song, Andrew Morton, linux-mm,
	linux-kernel


On 2023/11/25 04:00, David Rientjes wrote:
> On Fri, 24 Nov 2023, David Hildenbrand wrote:
> 
>> And there, the 65.2s won't be noise because that 12TB system is up by a snap
>> of a finger? :)
>>
> 
> In this single boot test, total boot time was 373.78s, so 1GB hugetlb
> allocation is 17.4% of that.

Thank you for sharing these data. Currently, I don't have access to a 
machine of such large capacity, so the benefits in my tests are not as 
pronounced.

I believe testing on a system of this scale would yield significant 
benefits.

> 
> Would love to see what the numbers would look like if 1GB pages were
> supported.
> 

Support for 1GB hugetlb is not yet perfect, so it wasn't included in v1. 
But I'm happy to refine and introduce 1GB hugetlb support in future 
versions.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v1 0/4] hugetlb: parallelize hugetlb page allocation on boot
  2023-11-28  3:18         ` Gang Li
@ 2023-11-28  6:52           ` Gang Li
  2023-11-28  8:09             ` David Hildenbrand
  2023-11-29 19:41           ` David Rientjes
  1 sibling, 1 reply; 14+ messages in thread
From: Gang Li @ 2023-11-28  6:52 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Gang Li, Mike Kravetz, Muchun Song, Andrew Morton, linux-mm,
	linux-kernel

Hi David Hildenbrand :),

On 2023/11/23 22:10, David Hildenbrand wrote:
> Sorry to say, but why is that a scenario worth adding complexity for /
> optimizing for? You don't cover that, so there is a clear lack in the
> motivation.

Regarding your concern about complexity, this is indeed something to
consider. There is a precedent of parallelization in pgdata[1] which
might be reused (or other methods) to reduce the complexity of this
series.

[1] 
https://lore.kernel.org/all/20200527173608.2885243-1-daniel.m.jordan@oracle.com/


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v1 0/4] hugetlb: parallelize hugetlb page allocation on boot
  2023-11-28  6:52           ` Gang Li
@ 2023-11-28  8:09             ` David Hildenbrand
  0 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2023-11-28  8:09 UTC (permalink / raw)
  To: Gang Li
  Cc: Gang Li, Mike Kravetz, Muchun Song, Andrew Morton, linux-mm,
	linux-kernel

On 28.11.23 07:52, Gang Li wrote:
> Hi David Hildenbrand :),
> 
> On 2023/11/23 22:10, David Hildenbrand wrote:
>> Sorry to say, but why is that a scenario worth adding complexity for /
>> optimizing for? You don't cover that, so there is a clear lack in the
>> motivation.
> 
> Regarding your concern about complexity, this is indeed something to
> consider. There is a precedent of parallelization in pgdata[1] which
> might be reused (or other methods) to reduce the complexity of this
> series.

Yes, please!

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v1 0/4] hugetlb: parallelize hugetlb page allocation on boot
  2023-11-28  3:18         ` Gang Li
  2023-11-28  6:52           ` Gang Li
@ 2023-11-29 19:41           ` David Rientjes
  1 sibling, 0 replies; 14+ messages in thread
From: David Rientjes @ 2023-11-29 19:41 UTC (permalink / raw)
  To: Gang Li
  Cc: David Hildenbrand, Gang Li, Mike Kravetz, Muchun Song,
	Andrew Morton, linux-mm, linux-kernel

On Tue, 28 Nov 2023, Gang Li wrote:

> > 
> > > And there, the 65.2s won't be noise because that 12TB system is up by a
> > > snap
> > > of a finger? :)
> > > 
> > 
> > In this single boot test, total boot time was 373.78s, so 1GB hugetlb
> > allocation is 17.4% of that.
> 
> Thank you for sharing these data. Currently, I don't have access to a machine
> of such large capacity, so the benefits in my tests are not as pronounced.
> 
> I believe testing on a system of this scale would yield significant benefits.
> 
> > 
> > Would love to see what the numbers would look like if 1GB pages were
> > supported.
> > 
> 
> Support for 1GB hugetlb is not yet perfect, so it wasn't included in v1. But
> I'm happy to refine and introduce 1GB hugetlb support in future versions.
> 

That would be very appreciated, thank you!  I'm happy to test and collect 
data for any proposed patch series on 12TB systems booted with a lot of 
1GB hugetlb pages on the kernel command line.

^ permalink raw reply	[flat|nested] 14+ messages in thread
