From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	paulus@samba.org, mpe@ellerman.id.au,
	Scott Wood <scottwood@freescale.com>
Cc: linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH V4 00/31] powerpc/mm: Update page table format for book3s 64
Date: Mon, 19 Oct 2015 08:47:10 +0530	[thread overview]
Message-ID: <87lhaz4keh.fsf@linux.vnet.ibm.com> (raw)
In-Reply-To: <1445088167.24309.58.camel@kernel.crashing.org>

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Sat, 2015-10-17 at 15:38 +0530, Aneesh Kumar K.V wrote:
>> Hi All,
>> 
>> This patch series attempts to update the book3s 64 Linux page table format to
>> make it more flexible. Our current pte format is very restrictive and we
>> overload multiple pte bits. This is due to the non-availability of free bits
>> in pte_t. We use pte_t to track the validity of 4K subpages. This patch
>> series frees up 11 bits in pte_t by moving 4K subpage tracking to the
>> lower half of the PTE page. The pte format is updated so that we have a
>> better method for identifying a pte entry at the pmd level. This will also
>> enable us to implement hugetlb migration (not yet done in this series).
>
> I still have serious concerns about the fact that we now use 4 times
> more memory for page tables than strictly necessary. We were using
> twice as much before.
>
> We need to find a way to not allocate all those "other halves" when not
> needed.
>
> I understand it's tricky, we tend to notice we need the second half too
> late...
>
> Maybe if we could escalate the hash miss into a minor fault when the
> second half is needed and not present, we could then allocate it from
> the fault handler.
>
> For demotion of the vmap space, we might have to be a bit smarter,
> maybe detect at ioremap/vmap time and flag the mm as needing second
> halves for everything (and allocate them).
>
> Of course if the machine doesn't do hw 64k, we would always allocate
> the second half.
>
> The question then becomes how to reference it from the first half.
>
> A completely parallel tree means a lot more walks for each PTE; is
> there something in the PTE page's struct page we can use, maybe?
>

We could use page->index. I have an early patch (it still finds wrong
ptes) which increases the fragment count to 32. IMHO, we should get
this series merged and not make it depend on increasing the fragment
count to 32 now. I was planning to get that change merged in the next
merge window. That gives us sufficient time to test and also avoids
pushing all the changes in one merge window, which will allow us to
isolate issues more easily if they arise.
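
To make the sizing arithmetic easier to follow, here is a rough
stand-alone sketch of the math (plain user-space C, purely
illustrative; PTE_FRAG_NR, PTE_FRAG_SIZE and PTE_4K_FRAG_SIZE mirror
the patch below, the other names are made up):

#include <assert.h>
#include <stdio.h>

#define PAGE_SIZE_64K		(64 * 1024UL)
#define PTE_FRAG_NR		32
#define PTE_FRAG_SIZE_SHIFT	11
#define PTE_FRAG_SIZE		(1UL << PTE_FRAG_SIZE_SHIFT)	/* 2K fragment */
#define PTE_4K_FRAG_SIZE	(PTE_FRAG_SIZE << 1)		/* 4K tracking per fragment */

int main(void)
{
	unsigned long ptes_per_frag = PTE_FRAG_SIZE / 8;		/* 8-byte ptes: 256 */
	unsigned long ptes_per_page = PTE_FRAG_NR * ptes_per_frag;	/* 8192 */
	unsigned long track_per_pte = 16;				/* one byte per 4K hpte slot */

	/* 32 fragments of 2K exactly fill one 64K pte page */
	assert(PTE_FRAG_NR * PTE_FRAG_SIZE == PAGE_SIZE_64K);
	/* per-fragment tracking: 256 ptes * 16 bytes = 4K */
	assert(ptes_per_frag * track_per_pte == PTE_4K_FRAG_SIZE);
	/* whole tracking area: 8192 ptes * 16 bytes = 128K, i.e. an order-1 64K allocation */
	assert(ptes_per_page * track_per_pte == 2 * PAGE_SIZE_64K);

	printf("tracking area per pte page: %lu bytes\n",
	       ptes_per_page * track_per_pte);
	return 0;
}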

    powerpc/mm: Move subpage tracking allocation out of pgtable
    
    This is the first part of the attempt to reduce pagetable usage for 64k.
    The goal is to allocate 4k subpage tracking memory only if we need it.
    
    Not-Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

diff --git a/arch/powerpc/include/asm/pgalloc-64.h b/arch/powerpc/include/asm/pgalloc-64.h
index 4f1cc6c46728..f4243d9264c4 100644
--- a/arch/powerpc/include/asm/pgalloc-64.h
+++ b/arch/powerpc/include/asm/pgalloc-64.h
@@ -166,13 +166,24 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
 /*
  * we support 8 fragments per PTE page.
  */
-#define PTE_FRAG_NR	8
+#define PTE_FRAG_NR	32
 /*
  * We use a 2K PTE page fragment and another 4K for storing
  * real_pte_t hash index. Rounding the entire thing to 8K
  */
-#define PTE_FRAG_SIZE_SHIFT  13
+#define PTE_FRAG_SIZE_SHIFT  11
 #define PTE_FRAG_SIZE (1UL << PTE_FRAG_SIZE_SHIFT)
+/*
+ * Size of the array used to track the 4K subpage details.
+ * Total pte entries in a pte page:
+ *   (PTE_FRAG_NR * PTE_FRAG_SIZE) / 8
+ * Space needed per 64K pte:
+ *   16 bytes (one byte per 4K hpte slot)
+ * Total space needed for 4K subpage tracking:
+ *   (PTE_FRAG_NR * PTE_FRAG_SIZE) * 2
+ * i.e. order-1 pages.
+ */
+#define PTE_4K_FRAG_SIZE (PTE_FRAG_SIZE << 1)
 
 extern pte_t *page_table_alloc(struct mm_struct *, unsigned long, int);
 extern void page_table_free(struct mm_struct *, unsigned long *, int);
diff --git a/arch/powerpc/mm/hash64_64k.c b/arch/powerpc/mm/hash64_64k.c
index 84867a1491a2..08c141a844ca 100644
--- a/arch/powerpc/mm/hash64_64k.c
+++ b/arch/powerpc/mm/hash64_64k.c
@@ -15,24 +15,40 @@
 #include <linux/mm.h>
 #include <asm/machdep.h>
 #include <asm/mmu.h>
+#include <asm/pgalloc.h>
+
+static inline int pte_frag_index(unsigned long ptep, unsigned long pte_page)
+{
+	return (ptep - pte_page) / PTE_FRAG_SIZE;
+
+}
+
+static inline int pte_4k_track_index(int frag_index, int pte_index)
+{
+	return (PTE_4K_FRAG_SIZE * frag_index) + (pte_index * 16);
+
+}
 
 real_pte_t __real_pte(unsigned long addr, pte_t pte, pte_t *ptep)
 {
 	int indx;
 	real_pte_t rpte;
-	pte_t *pte_headp;
+	struct page *pte_page = virt_to_page(ptep);
 
 	rpte.pte = pte;
 	rpte.hidx = NULL;
 	if (pte_val(pte) & _PAGE_COMBO) {
+		int frag_index = pte_frag_index((unsigned long) ptep,
+					(unsigned long)page_address(pte_page));
+
 		indx = pte_index(addr);
-		pte_headp = ptep - indx;
 		/*
 		 * Make sure we order the hidx load against the _PAGE_COMBO
 		 * check. The store side ordering is done in __hash_page_4K
 		 */
 		smp_rmb();
-		rpte.hidx = (unsigned char *)(pte_headp + PTRS_PER_PTE) + (16 * indx);
+		rpte.hidx = ((unsigned char *)pte_page->index) +
+				pte_4k_track_index(frag_index, indx);
 	}
 	return rpte;
 }
diff --git a/arch/powerpc/mm/mmu_context_hash64.c b/arch/powerpc/mm/mmu_context_hash64.c
index 4e4efbc2658e..e25cba47febb 100644
--- a/arch/powerpc/mm/mmu_context_hash64.c
+++ b/arch/powerpc/mm/mmu_context_hash64.c
@@ -121,6 +121,7 @@ static void destroy_pagetable_page(struct mm_struct *mm)
 	count = atomic_sub_return(PTE_FRAG_NR - count, &page->_count);
 	if (!count) {
 		pgtable_page_dtor(page);
+		free_pages((unsigned long)page->index, 1);
 		free_hot_cold_page(page, 0);
 	}
 }
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index ea6bc31debb0..e6fe21af45ab 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -386,10 +386,18 @@ static pte_t *get_from_cache(struct mm_struct *mm)
 static pte_t *__alloc_for_cache(struct mm_struct *mm, int kernel)
 {
 	void *ret = NULL;
+	struct page *subpage_pte;
 	struct page *page = alloc_page(GFP_KERNEL | __GFP_NOTRACK |
 				       __GFP_REPEAT | __GFP_ZERO);
 	if (!page)
 		return NULL;
+
+	subpage_pte = alloc_pages(GFP_KERNEL | __GFP_NOTRACK |
+				  __GFP_REPEAT | __GFP_ZERO, 1);
+	WARN(page->index, "Page index is not null\n");
+	WARN(page->freelist, "Free list is not null\n");
+	page->index = (unsigned long)page_address(subpage_pte);
+
 	if (!kernel && !pgtable_page_ctor(page)) {
 		__free_page(page);
 		return NULL;
@@ -426,6 +434,8 @@ void page_table_free(struct mm_struct *mm, unsigned long *table, int kernel)
 {
 	struct page *page = virt_to_page(table);
 	if (put_page_testzero(page)) {
+		free_pages(page->index, 1);
+		page->index = 0;
 		if (!kernel)
 			pgtable_page_dtor(page);
 		free_hot_cold_page(page, 0);
@@ -437,6 +447,8 @@ static void page_table_free_rcu(void *table)
 {
 	struct page *page = virt_to_page(table);
 	if (put_page_testzero(page)) {
+		free_pages(page->index, 1);
+		page->index = 0;
 		pgtable_page_dtor(page);
 		free_hot_cold_page(page, 0);
 	}
@@ -471,6 +483,8 @@ void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift)
 		/* PTE page needs special handling */
 		struct page *page = virt_to_page(table);
 		if (put_page_testzero(page)) {
+			free_pages(page->index, 1);
+			page->index = 0;
 			pgtable_page_dtor(page);
 			free_hot_cold_page(page, 0);
 		}
 

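For clarity, a minimal stand-alone sketch of the lookup arithmetic
used by __real_pte() above (plain C on made-up addresses and
hypothetical names; the real code operates on pte_t pointers and the
order-1 area pointed to by page->index):

#include <stdio.h>

#define PTE_FRAG_SIZE		(1UL << 11)		/* 2K fragment, as in the patch */
#define PTE_4K_FRAG_SIZE	(PTE_FRAG_SIZE << 1)	/* 4K tracking per fragment */

/*
 * Byte offset of a pte's 16 hidx bytes inside the tracking area, using
 * the same arithmetic as pte_frag_index() + pte_4k_track_index() above,
 * but on plain integers.
 */
static unsigned long hidx_offset(unsigned long ptep, unsigned long pte_page_va,
				 unsigned long pte_index)
{
	unsigned long frag_index = (ptep - pte_page_va) / PTE_FRAG_SIZE;

	return frag_index * PTE_4K_FRAG_SIZE + pte_index * 16;
}

int main(void)
{
	unsigned long base = 0x10000;	/* pretend page_address() of the pte page */
	/* a pte in fragment 3, at slot 10 of that fragment (8-byte ptes) */
	unsigned long ptep = base + 3 * PTE_FRAG_SIZE + 10 * 8;

	/* prints 3 * 4096 + 10 * 16 = 12448 */
	printf("hidx bytes at tracking base + %lu\n", hidx_offset(ptep, base, 10));
	return 0;
}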