[PATCH] percpu: reduce the number of searches calculating best upa

* [PATCH] percpu: reduce the number of searches calculating best upa
@ 2020-11-02  5:26 Wonhuyk Yang
  2020-11-06 23:33 ` Dennis Zhou
  0 siblings, 1 reply; 2+ messages in thread
From: Wonhuyk Yang @ 2020-11-02  5:26 UTC (permalink / raw)
  To: Dennis Zhou; +Cc: Tejun Heo, Christoph Lameter, linux-mm, Wonhyuk Yang

From: Wonhyuk Yang <vvghjk1234@gmail.com>

Best upa is determined by iterating 1 to max_upa. If the size of
alloc_size is power of 2, numbers of iteration decrease to
logarithmic level.

Prime factorization of alloc_size makes it easy to get possible
upas. When alloc_size is power of 2, we can avoid cost of the
prime factorization and possible upas are 1, 2, 4, ... max_upa.

Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com>
---
 mm/percpu.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 66a93f096394..a24f3973744f 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -2689,18 +2689,17 @@ static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
 
 	/*
 	 * Determine min_unit_size, alloc_size and max_upa such that
-	 * alloc_size is multiple of atom_size and is the smallest
-	 * which can accommodate 4k aligned segments which are equal to
-	 * or larger than min_unit_size.
+	 * alloc_size is the maximu value of min_unit_size, atom_size.
+	 * Also, alloc_size is power of 2 because both min_unit_size
+	 * and atom_size are power of 2.
 	 */
 	min_unit_size = max_t(size_t, size_sum, PCPU_MIN_UNIT_SIZE);
+	min_unit_size = roundup_pow_of_two(min_unit_size);
 
 	/* determine the maximum # of units that can fit in an allocation */
-	alloc_size = roundup(min_unit_size, atom_size);
-	upa = alloc_size / min_unit_size;
-	while (alloc_size % upa || (offset_in_page(alloc_size / upa)))
-		upa--;
-	max_upa = upa;
+	alloc_size = max_t(size_t, min_unit_size, atom_size);
+	max_upa = alloc_size / min_unit_size;
+
 
 	/* group cpus according to their proximity */
 	for_each_possible_cpu(cpu) {
@@ -2727,12 +2726,9 @@ static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
 	 * Related to atom_size, which could be much larger than the unit_size.
 	 */
 	last_allocs = INT_MAX;
-	for (upa = max_upa; upa; upa--) {
+	for (upa = max_upa; upa; upa >>= 1) {
 		int allocs = 0, wasted = 0;
 
-		if (alloc_size % upa || (offset_in_page(alloc_size / upa)))
-			continue;
-
 		for (group = 0; group < nr_groups; group++) {
 			int this_allocs = DIV_ROUND_UP(group_cnt[group], upa);
 			allocs += this_allocs;
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 2+ messages in thread