LinuxPPC-Dev Archive on lore.kernel.org
 help / color / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Leonardo Bras <leobras.c@gmail.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>,
	David Hildenbrand <david@redhat.com>,
	Scott Cheloha <cheloha@linux.ibm.com>,
	linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	Nicholas Piggin <npiggin@gmail.com>,
	Bharata B Rao <bharata@linux.ibm.com>,
	Paul Mackerras <paulus@samba.org>,
	Sandipan Das <sandipan@linux.ibm.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Laurent Dufour <ldufour@linux.ibm.com>,
	Logan Gunthorpe <logang@deltatee.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Mike Rapoport <rppt@kernel.org>
Subject: Re: [PATCH 1/3] powerpc/mm/hash: Avoid resizing-down HPT on first memory hotplug
Date: Mon, 22 Mar 2021 17:49:53 +1100
Message-ID: <YFg+Edy6dfmZx3lr@yekko.fritz.box> (raw)
In-Reply-To: <20210312072940.598696-2-leobras.c@gmail.com>


[-- Attachment #1: Type: text/plain, Size: 4282 bytes --]

On Fri, Mar 12, 2021 at 04:29:39AM -0300, Leonardo Bras wrote:
> Because hypervisors may need to create HPTs without knowing the guest
> page size, the smallest used page-size (4k) may be chosen, resulting in
> a HPT that is possibly bigger than needed.
> 
> On a guest with bigger page-sizes, the amount of entries for HTP may be
> too high, causing the guest to ask for a HPT resize-down on the first
> hotplug.
> 
> This becomes a problem when HPT resize-down fails, and causes the
> HPT resize to be performed on every LMB added, until HPT size is
> compatible to guest memory size, causing a major slowdown.
> 
> So, avoiding HPT resizing-down on hot-add significantly improves memory
> hotplug times.
> 
> As an example, hotplugging 256GB on a 129GB guest took 710s without this
> patch, and 21s after applied.
> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>

I don't love this approach.  Adding the extra flag at this level seems
a bit inelegant, and it means we're passing up an easy opportunity to
reduce our resource footprint on the host.

But... maybe we'll have to do it.  I'd like to see if we can get
things to work well enough with just the "batching" to avoid multiple
resize attempts first.

> ---
>  arch/powerpc/mm/book3s64/hash_utils.c | 36 ++++++++++++++++-----------
>  1 file changed, 21 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index 73b06adb6eeb..cfb3ec164f56 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -794,7 +794,7 @@ static unsigned long __init htab_get_table_size(void)
>  }
>  
>  #ifdef CONFIG_MEMORY_HOTPLUG
> -static int resize_hpt_for_hotplug(unsigned long new_mem_size)
> +static int resize_hpt_for_hotplug(unsigned long new_mem_size, bool shrinking)
>  {
>  	unsigned target_hpt_shift;
>  
> @@ -803,19 +803,25 @@ static int resize_hpt_for_hotplug(unsigned long new_mem_size)
>  
>  	target_hpt_shift = htab_shift_for_mem_size(new_mem_size);
>  
> -	/*
> -	 * To avoid lots of HPT resizes if memory size is fluctuating
> -	 * across a boundary, we deliberately have some hysterisis
> -	 * here: we immediately increase the HPT size if the target
> -	 * shift exceeds the current shift, but we won't attempt to
> -	 * reduce unless the target shift is at least 2 below the
> -	 * current shift
> -	 */
> -	if (target_hpt_shift > ppc64_pft_size ||
> -	    target_hpt_shift < ppc64_pft_size - 1)
> -		return mmu_hash_ops.resize_hpt(target_hpt_shift);
> +	if (shrinking) {
>  
> -	return 0;
> +		/*
> +		 * To avoid lots of HPT resizes if memory size is fluctuating
> +		 * across a boundary, we deliberately have some hysterisis
> +		 * here: we immediately increase the HPT size if the target
> +		 * shift exceeds the current shift, but we won't attempt to
> +		 * reduce unless the target shift is at least 2 below the
> +		 * current shift
> +		 */
> +
> +		if (target_hpt_shift >= ppc64_pft_size - 1)
> +			return 0;
> +
> +	} else if (target_hpt_shift <= ppc64_pft_size) {
> +		return 0;
> +	}
> +
> +	return mmu_hash_ops.resize_hpt(target_hpt_shift);
>  }
>  
>  int hash__create_section_mapping(unsigned long start, unsigned long end,
> @@ -828,7 +834,7 @@ int hash__create_section_mapping(unsigned long start, unsigned long end,
>  		return -1;
>  	}
>  
> -	resize_hpt_for_hotplug(memblock_phys_mem_size());
> +	resize_hpt_for_hotplug(memblock_phys_mem_size(), false);
>  
>  	rc = htab_bolt_mapping(start, end, __pa(start),
>  			       pgprot_val(prot), mmu_linear_psize,
> @@ -847,7 +853,7 @@ int hash__remove_section_mapping(unsigned long start, unsigned long end)
>  	int rc = htab_remove_mapping(start, end, mmu_linear_psize,
>  				     mmu_kernel_ssize);
>  
> -	if (resize_hpt_for_hotplug(memblock_phys_mem_size()) == -ENOSPC)
> +	if (resize_hpt_for_hotplug(memblock_phys_mem_size(), true) == -ENOSPC)
>  		pr_warn("Hash collision while resizing HPT\n");
>  
>  	return rc;

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply index

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-12  7:29 [PATCH 0/3] powerpc/mm/hash: Time improvements for memory hot(un)plug Leonardo Bras
2021-03-12  7:29 ` [PATCH 1/3] powerpc/mm/hash: Avoid resizing-down HPT on first memory hotplug Leonardo Bras
2021-03-22  6:49   ` David Gibson [this message]
2021-04-09  2:16     ` Leonardo Bras
2021-03-12  7:29 ` [PATCH 2/3] powerpc/mm/hash: Avoid multiple HPT resize-ups on " Leonardo Bras
2021-03-22  7:55   ` David Gibson
2021-04-09  2:51     ` Leonardo Bras
2021-04-19  5:34       ` David Gibson
2021-03-12  7:29 ` [PATCH 3/3] powerpc/mm/hash: Avoid multiple HPT resize-downs on memory hotunplug Leonardo Bras
2021-03-22 23:45   ` David Gibson
2021-04-09  3:31     ` Leonardo Bras
2021-04-19  5:37       ` David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YFg+Edy6dfmZx3lr@yekko.fritz.box \
    --to=david@gibson.dropbear.id.au \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=bharata@linux.ibm.com \
    --cc=cheloha@linux.ibm.com \
    --cc=dan.j.williams@intel.com \
    --cc=david@redhat.com \
    --cc=ldufour@linux.ibm.com \
    --cc=leobras.c@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=logang@deltatee.com \
    --cc=nathanl@linux.ibm.com \
    --cc=npiggin@gmail.com \
    --cc=paulus@samba.org \
    --cc=rppt@kernel.org \
    --cc=sandipan@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LinuxPPC-Dev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linuxppc-dev/0 linuxppc-dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linuxppc-dev linuxppc-dev/ https://lore.kernel.org/linuxppc-dev \
		linuxppc-dev@lists.ozlabs.org linuxppc-dev@ozlabs.org
	public-inbox-index linuxppc-dev

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.ozlabs.lists.linuxppc-dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git