All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paul Mackerras <paulus@ozlabs.org>
To: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>,
	lvivier@redhat.com, thuth@redhat.com, aik@ozlabs.ru,
	linuxppc-dev@lists.ozlabs.org,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Subject: Re: [RFCv2 5/9] arch/powerpc: Split hash page table sizing heuristic into a helper
Date: Mon, 8 Feb 2016 16:57:22 +1100	[thread overview]
Message-ID: <20160208055722.GE30807@oak.ozlabs.ibm.com> (raw)
In-Reply-To: <56B32E54.90307@linux.vnet.ibm.com>

On Thu, Feb 04, 2016 at 04:26:20PM +0530, Anshuman Khandual wrote:
> On 02/02/2016 06:34 AM, David Gibson wrote:
> > On Mon, Feb 01, 2016 at 12:34:32PM +0530, Anshuman Khandual wrote:
> >> On 01/29/2016 10:53 AM, David Gibson wrote:
> >>> htab_get_table_size() either retrieve the size of the hash page table (HPT)
> >>> from the device tree - if the HPT size is determined by firmware - or
> >>> uses a heuristic to determine a good size based on RAM size if the kernel
> >>> is responsible for allocating the HPT.
> >>>
> >>> To support a PAPR extension allowing resizing of the HPT, we're going to
> >>> want the memory size -> HPT size logic elsewhere, so split it out into a
> >>> helper function.
> >>>
> >>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> >>> ---
> >>>  arch/powerpc/include/asm/mmu-hash64.h |  3 +++
> >>>  arch/powerpc/mm/hash_utils_64.c       | 30 +++++++++++++++++-------------
> >>>  2 files changed, 20 insertions(+), 13 deletions(-)
> >>>
> >>> diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
> >>> index 7352d3f..cf070fd 100644
> >>> --- a/arch/powerpc/include/asm/mmu-hash64.h
> >>> +++ b/arch/powerpc/include/asm/mmu-hash64.h
> >>> @@ -607,6 +607,9 @@ static inline unsigned long get_kernel_vsid(unsigned long ea, int ssize)
> >>>  	context = (MAX_USER_CONTEXT) + ((ea >> 60) - 0xc) + 1;
> >>>  	return get_vsid(context, ea, ssize);
> >>>  }
> >>> +
> >>> +unsigned htab_shift_for_mem_size(unsigned long mem_size);
> >>> +
> >>>  #endif /* __ASSEMBLY__ */
> >>>  
> >>>  #endif /* _ASM_POWERPC_MMU_HASH64_H_ */
> >>> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> >>> index e88a86e..d63f7dc 100644
> >>> --- a/arch/powerpc/mm/hash_utils_64.c
> >>> +++ b/arch/powerpc/mm/hash_utils_64.c
> >>> @@ -606,10 +606,24 @@ static int __init htab_dt_scan_pftsize(unsigned long node,
> >>>  	return 0;
> >>>  }
> >>>  
> >>> -static unsigned long __init htab_get_table_size(void)
> >>> +unsigned htab_shift_for_mem_size(unsigned long mem_size)
> >>>  {
> >>> -	unsigned long mem_size, rnd_mem_size, pteg_count, psize;
> >>> +	unsigned memshift = __ilog2(mem_size);
> >>> +	unsigned pshift = mmu_psize_defs[mmu_virtual_psize].shift;
> >>> +	unsigned pteg_shift;
> >>> +
> >>> +	/* round mem_size up to next power of 2 */
> >>> +	if ((1UL << memshift) < mem_size)
> >>> +		memshift += 1;
> >>> +
> >>> +	/* aim for 2 pages / pteg */
> >>
> >> While here I guess its a good opportunity to write couple of lines
> >> about why one PTE group for every two physical pages on the system,
> > 
> > Well, that don't really know, it's just copied from the existing code.
> 
> Aneesh, would you know why ?

1 PTEG per 2 pages means 4 HPTEs per page, which means you can map
each page to an average of 4 different virtual addresses.  It's a
heuristic that has been around for a long time and dates back to the
early days of AIX.  For Linux, running on machines which typically
have quite a lot of memory, it's probably overkill.

> > 
> >> why minimum (1UL << 11 = 2048) number of PTE groups required,
> 
> Aneesh, would you know why ?

It's in the architecture, which specifies the minimum size of the HPT
as 256kB.  The reason is because not all of the virtual address bits
are present in the HPT.  That's OK because some of the virtual address
bits are implied by the HPTEG index in the hash table.  If the HPT was
less than 256kB (2048 HPTEGs) there would be the possibility of
collisions where two different virtual addresses could hash to the
same HPTEG and their HPTEs would be impossible to tell apart.

> 
> > 
> > Ok.
> > 
> >> why
> >> (1U << 7 = 128) entries per PTE group
> > 
> > Um.. what?  Because that's how big a PTEG is, I don't think
> > re-explaining the HPT structure here is useful.
> 
> Agreed, though think some where these things should be macros not used
> as hard coded numbers like this.

Using symbols instead of constant numbers is not always clearer.  The
symbol name can give some context (but so can a suitable comment) but
has the cost of obscuring the actual numeric value.

Paul.

  reply	other threads:[~2016-02-08  6:01 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-29  5:23 [RFCv2 0/9] PAPR hash page table resizing (guest side) David Gibson
2016-01-29  5:23 ` [RFCv2 1/9] memblock: Don't mark memblock_phys_mem_size() as __init David Gibson
2016-02-01  5:50   ` Anshuman Khandual
2016-02-08  2:46   ` Paul Mackerras
2016-01-29  5:23 ` [RFCv2 2/9] arch/powerpc: Clean up error handling for htab_remove_mapping David Gibson
2016-02-01  5:54   ` Anshuman Khandual
2016-02-08  2:48   ` Paul Mackerras
2016-01-29  5:23 ` [RFCv2 3/9] arch/powerpc: Handle removing maybe-present bolted HPTEs David Gibson
2016-02-01  5:58   ` Anshuman Khandual
2016-02-02  1:08     ` David Gibson
2016-02-02 13:49   ` Denis Kirjanov
2016-02-08  2:54   ` Paul Mackerras
2016-02-09  0:43     ` David Gibson
2016-01-29  5:23 ` [RFCv2 4/9] arch/powerpc: Clean up memory hotplug failure paths David Gibson
2016-02-01  6:29   ` Anshuman Khandual
2016-02-02 15:04   ` Nathan Fontenot
2016-02-03  4:31     ` David Gibson
2016-02-08  5:47   ` Paul Mackerras
2016-01-29  5:23 ` [RFCv2 5/9] arch/powerpc: Split hash page table sizing heuristic into a helper David Gibson
2016-02-01  7:04   ` Anshuman Khandual
2016-02-02  1:04     ` David Gibson
2016-02-04 10:56       ` Anshuman Khandual
2016-02-08  5:57         ` Paul Mackerras [this message]
2016-01-29  5:24 ` [RFCv2 6/9] pseries: Add hypercall wrappers for hash page table resizing David Gibson
2016-02-01  7:11   ` Anshuman Khandual
2016-02-02  0:58     ` David Gibson
2016-02-04 11:11       ` Anshuman Khandual
2016-02-07 22:33         ` David Gibson
2016-02-08  5:58   ` Paul Mackerras
2016-01-29  5:24 ` [RFCv2 7/9] pseries: Add support for hash " David Gibson
2016-02-01  8:31   ` Anshuman Khandual
2016-02-01 11:04     ` David Gibson
2016-02-08  5:59   ` Paul Mackerras
2016-01-29  5:24 ` [RFCv2 8/9] pseries: Advertise HPT resizing support via CAS David Gibson
2016-02-01  8:36   ` Anshuman Khandual
2016-02-08  6:00   ` Paul Mackerras
2016-01-29  5:24 ` [RFCv2 9/9] pseries: Automatically resize HPT for memory hot add/remove David Gibson
2016-02-01  8:51   ` Anshuman Khandual
2016-02-01 10:55     ` David Gibson
2016-02-08  6:01   ` Paul Mackerras
2016-02-01  5:50 ` [RFCv2 0/9] PAPR hash page table resizing (guest side) Anshuman Khandual
2016-02-02  0:57   ` David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160208055722.GE30807@oak.ozlabs.ibm.com \
    --to=paulus@ozlabs.org \
    --cc=aik@ozlabs.ru \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=khandual@linux.vnet.ibm.com \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=lvivier@redhat.com \
    --cc=thuth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.