From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	paulus@samba.org, mpe@ellerman.id.au,
	Scott Wood <scottwood@freescale.com>
Cc: linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH V4 00/31] powerpc/mm: Update page table format for book3s 64
Date: Mon, 19 Oct 2015 14:01:09 +0530
Message-ID: <87io6345v6.fsf@linux.vnet.ibm.com>
In-Reply-To: <1445088167.24309.58.camel@kernel.crashing.org>

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Sat, 2015-10-17 at 15:38 +0530, Aneesh Kumar K.V wrote:
>> Hi All,
>> 
>> This patch series attempt to update book3s 64 linux page table format to
>> make it more flexible. Our current pte format is very restrictive and we
>> overload multiple pte bits. This is due to the non-availability of free bits
>> in pte_t. We use pte_t to track the validity of 4K subpages. This patch
>> series free up pte_t of 11 bits by moving 4K subpage tracking to the
>> lower half of PTE page. The pte format is updated such that we have a
>> better method for identifying a pte entry at pmd level. This will also enable
>> us to implement hugetlb migration(not yet done in this series). 
>
> I still have serious concerns about the fact that we now use 4 times
> more memory for page tables than strictly necessary. We were using
> twice as much before.
>
> We need to find a way to not allocate all those "other halves" when not
> needed.
>
> I understand it's tricky, we tend to notice we need the second half too
> late...
>
> Maybe if we could escalate the hash miss into a minor fault when the
> second half is needed and not present, we can then allocate it from the
>
> For demotion of the vmap space, we might have to be a bit smarter,
> maybe detect at ioremap/vmap time and flag the mm as needed second
> halves for everything (and allocate them).
>
> Of course if the machine doesn't do hw 64k, we would always allocate
> the second half.
>

I am now trying to do the conditional alloc, and one unpleasant side
effect is that 4K subpage information gets spread all around the code.
We have the cases below:

1) subpage protection:
Calling sys_subpage_prot will result in demotion of the segment.
A new allocation of pgtable_t will then also allocate the subpage
tracking page, via a check in pte_alloc_one as below:

pte_t *page_table_alloc(struct mm_struct *mm, unsigned long vmaddr, int kernel)
{
	pte_t *pte;

	pte = get_from_cache(mm);
	if (pte)
		goto out;

	pte = __alloc_for_cache(mm, kernel);
	if (!pte)
		return NULL;
out:
	if (REGION_ID(vmaddr) == USER_REGION_ID) {
		int slice_psize;

		slice_psize = get_slice_psize(mm, vmaddr);
		/*
		 * 64K linux page size with 4K segment base page size. Allocate
		 * the 4k subpage track page
		 */
		if (slice_psize == MMU_PAGE_4K)
			alloc_and_update_4k(pte);
	}
	return pte;
}

For an existing allocation, trying to insert a 4K hpte will push the fault
to handle_mm_fault, and I am looking at adding the below in update_mmu_cache:

	/*
	 * The fault was sent to us because we didn't have the subpage
	 * tracking page
	 */
	if (pte_val(*ptep) & _PAGE_COMBO)
		alloc_and_update_4k(ptep);

We would have marked the pte _PAGE_COMBO in __hash_page_4k.
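
The two-step flow described above (mark the pte in the hash path, allocate
the tracking area later from the fault path) can be sketched as a small
user-space model. This is only an illustration under stated assumptions:
every model_* name, the MODEL_PAGE_COMBO bit value, the 256-entry table and
the one-byte-per-4K-subpage tracking layout are hypothetical and are not
taken from the patch series.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* All names and values below are hypothetical stand-ins for this model. */
#define MODEL_PAGE_COMBO	0x1UL	/* assumed bit; not the real _PAGE_COMBO */
#define MODEL_PTRS_PER_PTE	256	/* assumed ptes per table */
#define MODEL_SUBPAGES		16	/* 64K page / 4K subpage */

struct model_pte_table {
	uint64_t pte[MODEL_PTRS_PER_PTE];
	uint8_t *subpage_track;		/* "second half"; NULL until needed */
};

/* Allocate the tracking area once; returns 0 on success, -1 on failure. */
static int model_alloc_and_update_4k(struct model_pte_table *tbl)
{
	if (tbl->subpage_track)
		return 0;
	tbl->subpage_track = calloc(MODEL_PTRS_PER_PTE, MODEL_SUBPAGES);
	return tbl->subpage_track ? 0 : -1;
}

/*
 * Hash-miss path: a 4K insert marks the pte COMBO. Without the tracking
 * area it cannot record the subpage, so it returns 1 to escalate the
 * miss into a fault.
 */
static int model_hash_page_4k(struct model_pte_table *tbl, unsigned int idx)
{
	assert(idx < MODEL_PTRS_PER_PTE);
	tbl->pte[idx] |= MODEL_PAGE_COMBO;
	if (!tbl->subpage_track)
		return 1;
	tbl->subpage_track[idx * MODEL_SUBPAGES] = 1;	/* subpage 0 hashed */
	return 0;
}

/* Fault path: allocate the tracking area only for ptes marked COMBO. */
static int model_update_mmu_cache(struct model_pte_table *tbl, unsigned int idx)
{
	assert(idx < MODEL_PTRS_PER_PTE);
	if (tbl->pte[idx] & MODEL_PAGE_COMBO)
		return model_alloc_and_update_4k(tbl);
	return 0;
}
```

In this model the first 4K insert on a table fails over to the fault path,
which allocates the tracking area; a retry of the same insert then succeeds.
Tables that never see a 4K insert never pay for the second half.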

2) ioremap with cache-inhibited restrictions:
We can handle that in map_kernel_page:
		/*
		 * if pte is non-cacheable and we have restrictions on
		 * using non cacheable large pages, we will have to use
		 * 4K subpages. Allocate the 4K region, but skip the
		 * segment demote
		 */
		if (mmu_ci_restrictions && (flags & _PAGE_NO_CACHE)) {
			alloc_and_update_4k(ptep);
		}

3) remap_4k_pfn:
Can we mark the segment demoted here? If so, the pte alloc will handle
it when the remap address is in the user region.

4) vmalloc space demotion because somebody did a remap_4k_pfn to an address
in that range:
Not yet sure about this.

5) Anything else I missed?
You mentioned vmap space. I was not able to find out when we would
demote an address in the 0xf region.

I will try to see if there is a better way to isolate the subpage
handling, but it looks like we are making the code more complex by
doing the above?

Also, how do I evaluate the impact of doubling the pgtable_t size?
Looking at the above, I am wondering if this is really an issue we need
to worry about. We are better off than other archs with a 4K page size in
terms of pte pages: we allocate one pte page per 128MB of address range,
whereas for 4K page size architectures that is one per every 2MB?
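
As a quick sanity check of those two figures, the arithmetic can be written
out as below. This is a sketch under assumed geometries that are not stated
in this mail: 8-byte ptes with 512 per 4K table page on the 4K side, and on
the 64K side a 64K pte page split into 8K fragments, each fragment mapping
256 ptes. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Address range mapped by `nr_ptes` ptes of `page_size` bytes each. */
static uint64_t pte_page_coverage(uint64_t nr_ptes, uint64_t page_size)
{
	/* guard against overflow of the multiplication */
	assert(page_size == 0 || nr_ptes <= UINT64_MAX / page_size);
	return nr_ptes * page_size;
}

/* 4K pages, 8-byte ptes: one 4K table page holds 4096 / 8 = 512 ptes. */
static uint64_t coverage_4k(void)
{
	return pte_page_coverage(4096 / 8, 4096);	/* 512 * 4K = 2MB */
}

/*
 * 64K pages. ASSUMPTION: each 64K pte page is carved into 8K fragments
 * (ptes plus the subpage tracking half), and each fragment maps 256 ptes.
 */
static uint64_t coverage_64k(void)
{
	uint64_t frags = 65536 / 8192;			/* 8 fragments */

	return frags * pte_page_coverage(256, 65536);	/* 8 * 16MB = 128MB */
}
```

Under these assumptions coverage_4k() gives 2MB and coverage_64k() gives
128MB, a 64x difference in address range covered per pte page.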

-aneesh
