linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Alexey Kardashevskiy <aik@au1.ibm.com>,
	linuxppc-dev@lists.ozlabs.org, paulus@samba.org
Subject: Re: [PATCH -V10 00/15] THP support for PPC64
Date: Sun, 16 Jun 2013 14:06:07 +1000	[thread overview]
Message-ID: <1371355567.21896.101.camel@pasglop> (raw)
In-Reply-To: <1371353865.21896.94.camel@pasglop>

On Sun, 2013-06-16 at 13:37 +1000, Benjamin Herrenschmidt wrote:
> On Sun, 2013-06-16 at 12:00 +1000, Benjamin Herrenschmidt wrote:
> > So at this point, hash_page might *still* see the old pmd. Unless I
> > missed something, you did nothing that will prevent that (the only way
> > to lock against hash_page is really an IPI & wait or to take the PTE's
> > busy and make them !present or something). So as far as I can tell,
> > a concurrent hash_page can still sneak into the hash some "small"
> > entries after you have supposedly flushed them.
> 
> Note that the _PAGE_PRESENT bit is removed eventually ... but much
> later, in __collapse_huge_page_copy() which will also flush the hash, so
> at least we will remove a stale hash entry that would have been added by
> the race above I suppose...  but:
> 
>  - _PAGE_ACCESSED can still potentially be set after it was supposed to
> be stable
> 
>  - The clearing happens *after* copy_user_highpage(), ie, unless I
> missed something here, we potentially still have something writing to
> the 4k page while it's being copied, which is BAD.
> 
> Now, let me know if I did miss something here :-)

An additional issue is that this all collides a bit with Alexey's work
to support TCEs in real mode in KVM, which is necessary to have "usable"
PCI pass-through.

Look at patches http://patchwork.ozlabs.org/patch/248920/ and followup,
he basically walks the page tables here in a slightly different way than
Paul does in H_ENTER. It's more like gup_fast. It will need to handle
concurrent split/collapse etc... as well which it doesn't right now.

I'm considering merging Alexei stuff first (provided I don't find major
problems with it), then you can provide a new THP series on top of it.

While at it, also fix:

 - Some of your patches are bug fixes (like the one about subpage
protection). They need to be either merged in the main patch or put
before the patch that enables THP.

 - I haven't completely yet considered the impact of the demotion of
segments, but neither do you :-) IE. Under some circumstances, we can
demote entire segments from 64K HW pages to 4K HW pages in the SLB. For
example if a driver (such as HCA) sets the 4K_PFN bit in a PTE, this
will happen at hashing time. I don't think your code deals with that at
all, am I correct ? It *might* be that the right approach with those is:

  * If you find a THP in hash_page and the segment size is 4k, fault

  * In do_page_fault, re-check for that condition (or maybe we can make
hash_page return a specific bit that gets ORed into the error_code into
do_page_fault ?) and split huge pages there.

But that's just an idea off the top of my mind, there might be a better
way. Of course this needs to be tested.

BTW. For the subpage protection, similarily, you need to make sure you
properly map the entire segment as "no THP", not just the range
passed-in by the user.

Cheers,
Ben.
 

  reply	other threads:[~2013-06-16  4:06 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-05 15:28 [PATCH -V10 00/15] THP support for PPC64 Aneesh Kumar K.V
2013-06-05 15:28 ` [PATCH -V10 01/15] powerpc/mm: handle hugepage size correctly when invalidating hpte entries Aneesh Kumar K.V
2013-06-05 15:28 ` [PATCH -V10 02/15] powerpc/THP: Double the PMD table size for THP Aneesh Kumar K.V
2013-06-05 15:28 ` [PATCH -V10 03/15] powerpc/THP: Implement transparent hugepages for ppc64 Aneesh Kumar K.V
2013-06-05 15:28 ` [PATCH -V10 04/15] powerpc: move find_linux_pte_or_hugepte and gup_hugepte to common code Aneesh Kumar K.V
2013-06-05 15:28 ` [PATCH -V10 05/15] powerpc: Update find_linux_pte_or_hugepte to handle transparent hugepages Aneesh Kumar K.V
2013-06-05 15:28 ` [PATCH -V10 06/15] powerpc: Replace find_linux_pte with find_linux_pte_or_hugepte Aneesh Kumar K.V
2013-06-05 15:28 ` [PATCH -V10 07/15] powerpc: Update gup_pmd_range to handle transparent hugepages Aneesh Kumar K.V
2013-06-05 15:28 ` [PATCH -V10 08/15] powerpc/THP: Add code to handle HPTE faults for hugepages Aneesh Kumar K.V
2013-06-05 15:28 ` [PATCH -V10 09/15] powerpc: Make linux pagetable walk safe with THP enabled Aneesh Kumar K.V
2013-06-05 15:28 ` [PATCH -V10 10/15] powerpc: Prevent gcc to re-read the pagetables Aneesh Kumar K.V
2013-06-05 15:41   ` David Laight
2013-06-05 22:39     ` Benjamin Herrenschmidt
2013-06-05 15:28 ` [PATCH -V10 11/15] powerpc/THP: Enable THP on PPC64 Aneesh Kumar K.V
2013-06-05 15:28 ` [PATCH -V10 12/15] powerpc: Optimize hugepage invalidate Aneesh Kumar K.V
2013-06-05 15:28 ` [PATCH -V10 13/15] powerpc: disable assert_pte_locked for collapse_huge_page Aneesh Kumar K.V
2013-06-05 15:28 ` [PATCH -V10 14/15] powerpc: use smp_rmb when looking at deposisted pgtable to store hash index Aneesh Kumar K.V
2013-06-05 15:28 ` [PATCH -V10 15/15] powerpc: split hugepage when using subpage protection Aneesh Kumar K.V
2013-06-05 23:31 ` [PATCH -V10 00/15] THP support for PPC64 Benjamin Herrenschmidt
2013-06-06  0:13   ` Andrew Morton
2013-06-06  6:05     ` Aneesh Kumar K.V
2013-06-06  7:20       ` Andrew Morton
2013-06-16  2:00 ` Benjamin Herrenschmidt
2013-06-16  3:37   ` Benjamin Herrenschmidt
2013-06-16  4:06     ` Benjamin Herrenschmidt [this message]
2013-06-16 23:02       ` Benjamin Herrenschmidt
2013-06-18 18:46       ` Aneesh Kumar K.V
2013-06-18 22:03         ` Benjamin Herrenschmidt
2013-06-19  3:30           ` Aneesh Kumar K.V
2013-06-18 17:54   ` Aneesh Kumar K.V

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1371355567.21896.101.camel@pasglop \
    --to=benh@kernel.crashing.org \
    --cc=aik@au1.ibm.com \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=paulus@samba.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).