LinuxPPC-Dev Archive on lore.kernel.org
 help / color / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Leonardo Bras <leobras.c@gmail.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>,
	David Hildenbrand <david@redhat.com>,
	Scott Cheloha <cheloha@linux.ibm.com>,
	linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	Nicholas Piggin <npiggin@gmail.com>,
	Bharata B Rao <bharata@linux.ibm.com>,
	Paul Mackerras <paulus@samba.org>,
	Sandipan Das <sandipan@linux.ibm.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Laurent Dufour <ldufour@linux.ibm.com>,
	Logan Gunthorpe <logang@deltatee.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Mike Rapoport <rppt@kernel.org>
Subject: Re: [PATCH 2/3] powerpc/mm/hash: Avoid multiple HPT resize-ups on memory hotplug
Date: Mon, 19 Apr 2021 15:34:25 +1000
Message-ID: <YH0WYaXvGMf9YsWi@yekko.fritz.box> (raw)
In-Reply-To: <e5c924479839815c025de29d650d82419b18c0dc.camel@gmail.com>


[-- Attachment #1: Type: text/plain, Size: 3759 bytes --]

On Thu, Apr 08, 2021 at 11:51:36PM -0300, Leonardo Bras wrote:
> Hello David, thanks for the feedback!
> 
> On Mon, 2021-03-22 at 18:55 +1100, David Gibson wrote:
> > > +void hash_memory_batch_expand_prepare(unsigned long newsize)
> > > +{
> > > +	/*
> > > +	 * Resizing-up HPT should never fail, but there are some cases system starts with higher
> > > +	 * SHIFT than required, and we go through the funny case of resizing HPT down while
> > > +	 * adding memory
> > > +	 */
> > > +
> > > +	while (resize_hpt_for_hotplug(newsize, false) == -ENOSPC) {
> > > +		newsize *= 2;
> > > +		pr_warn("Hash collision while resizing HPT\n");
> > 
> > This unbounded increase in newsize makes me nervous - we should be
> > bounded by the current size of the HPT at least.  In practice we
> > should be fine, since the resize should always succeed by the time we
> > reach our current HPT size, but that's far from obvious from this
> > point in the code.
> 
> Sure, I will add bounds in v2.
> 
> > 
> > And... you're doubling newsize which is a value which might not be a
> > power of 2.  I'm wondering if there's an edge case where this could
> > actually cause us to skip the current size and erroneously resize to
> > one bigger than we have currently.
> 
> I also though that at the start, but it seems quite reliable.
> Before using this value, htab_shift_for_mem_size() will always round it
> to next power of 2. 
> Ex.
> Any value between 0b0101 and 0b1000 will be rounded to 0b1000 for shift
> calculation. If we multiply it by 2 (same as << 1), we have that
> anything between 0b01010 and 0b10000 will be rounded to 0b10000. 

Ah, good point.

> This works just fine as long as we are multiplying. 
> Division may have the behavior you expect, as 0b0101 >> 1 would become
> 0b010 and skip a shift.
> 	
> > > +void memory_batch_expand_prepare(unsigned long newsize)
> > 
> > This wrapper doesn't seem useful.
> 
> Yeah, it does little, but I can't just jump into hash_* functions
> directly from hotplug-memory.c, without even knowing if it's using hash
> pagetables. (in case the suggestion would be test for disable_radix
> inside hash_memory_batch*)
> 
> > 
> > > +{
> > > +	if (!radix_enabled())
> > > +		hash_memory_batch_expand_prepare(newsize);
> > > +}
> > >  #endif /* CONFIG_MEMORY_HOTPLUG */
> > >  
> > > 
> > > +	memory_batch_expand_prepare(memblock_phys_mem_size() +
> > > +				     drmem_info->n_lmbs * drmem_lmb_size());
> > 
> > This doesn't look right.  memory_add_by_index() is adding a *single*
> > LMB, I think using drmem_info->n_lmbs here means you're counting this
> > as adding again as much memory as you already have hotplugged.
> 
> Yeah, my mistake. This makes sense.
> I will change it to something like 
> memblock_phys_mem_size() + drmem_lmb_size()
> 
> > > 
> > > +	memory_batch_expand_prepare(memblock_phys_mem_size() + lmbs_to_add * drmem_lmb_size());
> > > +
> > >  	for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {
> > >  		if (lmb->flags & DRCONF_MEM_ASSIGNED)
> > >  			continue;
> > 
> > I don't see memory_batch_expand_prepare() suppressing any existing HPT
> > resizes.  Won't this just resize to the right size for the full add,
> > then resize several times again as we perform the add?  Or.. I guess
> > that will be suppressed by patch 1/3. 
> 
> Correct.
> 
> >  That's seems kinda fragile, though.
> 
> What do you mean by fragile here?
> What would you suggest doing different?
> 
> Best regards,
> Leonardo Bras
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply index

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-12  7:29 [PATCH 0/3] powerpc/mm/hash: Time improvements for memory hot(un)plug Leonardo Bras
2021-03-12  7:29 ` [PATCH 1/3] powerpc/mm/hash: Avoid resizing-down HPT on first memory hotplug Leonardo Bras
2021-03-22  6:49   ` David Gibson
2021-04-09  2:16     ` Leonardo Bras
2021-03-12  7:29 ` [PATCH 2/3] powerpc/mm/hash: Avoid multiple HPT resize-ups on " Leonardo Bras
2021-03-22  7:55   ` David Gibson
2021-04-09  2:51     ` Leonardo Bras
2021-04-19  5:34       ` David Gibson [this message]
2021-03-12  7:29 ` [PATCH 3/3] powerpc/mm/hash: Avoid multiple HPT resize-downs on memory hotunplug Leonardo Bras
2021-03-22 23:45   ` David Gibson
2021-04-09  3:31     ` Leonardo Bras
2021-04-19  5:37       ` David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YH0WYaXvGMf9YsWi@yekko.fritz.box \
    --to=david@gibson.dropbear.id.au \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=bharata@linux.ibm.com \
    --cc=cheloha@linux.ibm.com \
    --cc=dan.j.williams@intel.com \
    --cc=david@redhat.com \
    --cc=ldufour@linux.ibm.com \
    --cc=leobras.c@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=logang@deltatee.com \
    --cc=nathanl@linux.ibm.com \
    --cc=npiggin@gmail.com \
    --cc=paulus@samba.org \
    --cc=rppt@kernel.org \
    --cc=sandipan@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LinuxPPC-Dev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linuxppc-dev/0 linuxppc-dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linuxppc-dev linuxppc-dev/ https://lore.kernel.org/linuxppc-dev \
		linuxppc-dev@lists.ozlabs.org linuxppc-dev@ozlabs.org
	public-inbox-index linuxppc-dev

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.ozlabs.lists.linuxppc-dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git