linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] percpu: reduce the number of searches calculating best upa
@ 2020-11-02  5:26 Wonhuyk Yang
  2020-11-06 23:33 ` Dennis Zhou
  0 siblings, 1 reply; 2+ messages in thread
From: Wonhuyk Yang @ 2020-11-02  5:26 UTC (permalink / raw)
  To: Dennis Zhou; +Cc: Tejun Heo, Christoph Lameter, linux-mm, Wonhyuk Yang

From: Wonhyuk Yang <vvghjk1234@gmail.com>

Best upa is determined by iterating 1 to max_upa. If the size of
alloc_size is power of 2, numbers of iteration decrease to
logarithmic level.

Prime factorization of alloc_size makes it easy to get possible
upas. When alloc_size is power of 2, we can avoid cost of the
prime factorization and possible upas are 1, 2, 4, ... max_upa.

Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com>
---
 mm/percpu.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 66a93f096394..a24f3973744f 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -2689,18 +2689,17 @@ static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
 
 	/*
 	 * Determine min_unit_size, alloc_size and max_upa such that
-	 * alloc_size is multiple of atom_size and is the smallest
-	 * which can accommodate 4k aligned segments which are equal to
-	 * or larger than min_unit_size.
+	 * alloc_size is the maximu value of min_unit_size, atom_size.
+	 * Also, alloc_size is power of 2 because both min_unit_size
+	 * and atom_size are power of 2.
 	 */
 	min_unit_size = max_t(size_t, size_sum, PCPU_MIN_UNIT_SIZE);
+	min_unit_size = roundup_pow_of_two(min_unit_size);
 
 	/* determine the maximum # of units that can fit in an allocation */
-	alloc_size = roundup(min_unit_size, atom_size);
-	upa = alloc_size / min_unit_size;
-	while (alloc_size % upa || (offset_in_page(alloc_size / upa)))
-		upa--;
-	max_upa = upa;
+	alloc_size = max_t(size_t, min_unit_size, atom_size);
+	max_upa = alloc_size / min_unit_size;
+
 
 	/* group cpus according to their proximity */
 	for_each_possible_cpu(cpu) {
@@ -2727,12 +2726,9 @@ static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
 	 * Related to atom_size, which could be much larger than the unit_size.
 	 */
 	last_allocs = INT_MAX;
-	for (upa = max_upa; upa; upa--) {
+	for (upa = max_upa; upa; upa >>= 1) {
 		int allocs = 0, wasted = 0;
 
-		if (alloc_size % upa || (offset_in_page(alloc_size / upa)))
-			continue;
-
 		for (group = 0; group < nr_groups; group++) {
 			int this_allocs = DIV_ROUND_UP(group_cnt[group], upa);
 			allocs += this_allocs;
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH] percpu: reduce the number of searches calculating best upa
  2020-11-02  5:26 [PATCH] percpu: reduce the number of searches calculating best upa Wonhuyk Yang
@ 2020-11-06 23:33 ` Dennis Zhou
  0 siblings, 0 replies; 2+ messages in thread
From: Dennis Zhou @ 2020-11-06 23:33 UTC (permalink / raw)
  To: Wonhuyk Yang; +Cc: Tejun Heo, Christoph Lameter, linux-mm

Hi,

On Mon, Nov 02, 2020 at 02:26:47PM +0900, Wonhuyk Yang wrote:
> From: Wonhyuk Yang <vvghjk1234@gmail.com>
> 
> Best upa is determined by iterating 1 to max_upa. If the size of
> alloc_size is power of 2, numbers of iteration decrease to
> logarithmic level.
> 
> Prime factorization of alloc_size makes it easy to get possible
> upas. When alloc_size is power of 2, we can avoid cost of the
> prime factorization and possible upas are 1, 2, 4, ... max_upa.
> 
> Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com>
> ---
>  mm/percpu.c | 20 ++++++++------------
>  1 file changed, 8 insertions(+), 12 deletions(-)
> 
> diff --git a/mm/percpu.c b/mm/percpu.c
> index 66a93f096394..a24f3973744f 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -2689,18 +2689,17 @@ static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
>  
>  	/*
>  	 * Determine min_unit_size, alloc_size and max_upa such that
> -	 * alloc_size is multiple of atom_size and is the smallest
> -	 * which can accommodate 4k aligned segments which are equal to
> -	 * or larger than min_unit_size.
> +	 * alloc_size is the maximu value of min_unit_size, atom_size.
> +	 * Also, alloc_size is power of 2 because both min_unit_size
> +	 * and atom_size are power of 2.
>  	 */
>  	min_unit_size = max_t(size_t, size_sum, PCPU_MIN_UNIT_SIZE);
> +	min_unit_size = roundup_pow_of_two(min_unit_size);

While this may make sense for the vast majority of users, there remain
users such as embedded devices that page in the first chunk and have
fairly limited use of percpu memory. In these cases, we wouldn't want to
round up the min_unit_size as that might be wasteful albeit not much but
still to be not really worth changing the behavior here.

>  
>  	/* determine the maximum # of units that can fit in an allocation */
> -	alloc_size = roundup(min_unit_size, atom_size);
> -	upa = alloc_size / min_unit_size;
> -	while (alloc_size % upa || (offset_in_page(alloc_size / upa)))
> -		upa--;
> -	max_upa = upa;
> +	alloc_size = max_t(size_t, min_unit_size, atom_size);
> +	max_upa = alloc_size / min_unit_size;
> +
>  
>  	/* group cpus according to their proximity */
>  	for_each_possible_cpu(cpu) {
> @@ -2727,12 +2726,9 @@ static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
>  	 * Related to atom_size, which could be much larger than the unit_size.
>  	 */
>  	last_allocs = INT_MAX;
> -	for (upa = max_upa; upa; upa--) {
> +	for (upa = max_upa; upa; upa >>= 1) {
>  		int allocs = 0, wasted = 0;
>  
> -		if (alloc_size % upa || (offset_in_page(alloc_size / upa)))
> -			continue;
> -
>  		for (group = 0; group < nr_groups; group++) {
>  			int this_allocs = DIV_ROUND_UP(group_cnt[group], upa);
>  			allocs += this_allocs;
> -- 
> 2.17.1
> 

Overall, I'm not inclined to take this because it is trying to optimize
boot time code, which runs at most a few times, by introducing a new
assumption. I personally find this code a little complex to parse, so
I'd rather not make the change unless it aided in maintainability or was
side effect free.

Thanks,
Dennis


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2020-11-06 23:33 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-02  5:26 [PATCH] percpu: reduce the number of searches calculating best upa Wonhuyk Yang
2020-11-06 23:33 ` Dennis Zhou

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).