linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrea Arcangeli <andrea@qumranet.com>
To: Gerd Hoffmann <kraxel@redhat.com>
Cc: Christoph Lameter <clameter@sgi.com>,
	Andrew Morton <akpm@osdl.org>, Nick Piggin <npiggin@suse.de>,
	kvm-devel@lists.sourceforge.net,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	steiner@sgi.com, linux-kernel@vger.kernel.org,
	Avi Kivity <avi@qumranet.com>,
	linux-mm@kvack.org, daniel.blueman@quadrics.com, holt@sgi.com,
	Hugh Dickins <hugh@veritas.com>
Subject: Re: [kvm-devel] [PATCH] export notifier #1
Date: Wed, 23 Jan 2008 16:41:30 +0100	[thread overview]
Message-ID: <20080123154130.GC7141@v2.random> (raw)
In-Reply-To: <4797384B.7080200@redhat.com>

Hi Kraxel,

On Wed, Jan 23, 2008 at 01:51:23PM +0100, Gerd Hoffmann wrote:
> That would render the notifies useless for Xen too.  Xen needs to
> intercept the actual pte clear and instead of just zapping it use the
> hypercall to do the unmap and release the grant.

I think it has yet to be demonstrated that doing the invalidate
_before_ clearing the linux pte is workable at all for
shadow-pte/RDMA. Infact even doing it _after_ still requires some form
of serialization but it's less obviously broken and perhaps more
fixable unlike doing it before that seems hardly fixable given the
refill event running in the remote node is supposed to wait on a
bitflag of a page in the master node to return ON. What Christoph
didn't specify after hinting you have to wait for the PageExported
bitflag to return on, is that such page may be back in the freelist by
the time the secondary-tlb page fault starts checking that bit. And
nobody is setting that bit anyway in the VM so good luck waiting that
bit to return on in a page in the freelist (nothing keeps that page
pinned anymore by the time the ->invalidate_page returns: the whole
point of the invalidate is so the VM code can finally free it and put
it in the freelist).

Until there's some more reasonable theory of how invalidating the
remote tlbs/ptes _before_ the main linux pte can remotely work, I'm
"quite" skeptical it's the way to go for the invalidate_page callback.

Like Avi said, Xen is dealing with the linux pte only, so there's no
racy smp page fault to serialize against. Perhaps we can add another
notifier for Xen though.

But I think it's still not enough for Xen to have a method called
before the ptep_clear_flush: rmap.c would get confused in
page_mkclean_one for example. It might be possible that vm_ops is the
right way for you even if it further clutters the VM. Like Avi pointed
me out once, with our current mmu_notifiers we can export the KVM
address space with remote dma and keep swapping the whole KVM asset
just fine despite the triple MMU running the system (qemu using linux
pte, KVM using spte, quadrics using pcimmu). And the core Linux VM
code (not some obscure hypervisor) will deal with all aging and VM
issues like a normal task (especially with my last patch that reflects
the accessed bitflag in the spte the same way the accessed bitflag is
reflected for the regular ptes).

Nevertheless if you've any idea on how to use the notifiers for Xen
I'd be glad to help. Perhaps one workable way to change my patch to
work for you could be to pass the retval of ptep_clear_flush to the
notifiers themself. something like:

#define ptep_clear_flush(__vma, __address, __ptep)			\
({									\
	pte_t __pte;							\
	__pte = ptep_get_and_clear((__vma)->vm_mm, __address, __ptep);	\
	flush_tlb_page(__vma, __address);				\
	__pte = mmu_notifier(invalidate_page, (__vma)->vm_mm, __address, __pte, __ptep);	\
	__pte;								\
})

But this would kind of need an exclusive registration or this loop
wouldn't work well if everyone pretends to overwrite the memory
pointed by __ptep with its own value calculated in function of __pte.

    for_each_notifier(mn,mm)
        mn->invalidate_page(mm, __address, __pte, __ptep);

You could get a pte_none page fault in between the old value and the
new value though. (but you wouldn't need to flush the tlb inside
invalidate_pte, only us need to flush the secondary tlb for the spte
inside the invalidate_page obviously)

Let me know if you're interested in the above.

Thanks!


  parent reply	other threads:[~2008-01-23 15:41 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-13 16:24 [PATCH] mmu notifiers #v2 Andrea Arcangeli
2008-01-13 21:11 ` Benjamin Herrenschmidt
2008-01-14 20:02 ` Christoph Lameter
2008-01-15  4:28   ` Benjamin Herrenschmidt
2008-01-15 12:44   ` Andrea Arcangeli
2008-01-15 20:18     ` Benjamin Herrenschmidt
2008-01-16  1:06       ` Andrea Arcangeli
2008-01-16  9:01 ` Brice Goglin
2008-01-16 10:19   ` Andrea Arcangeli
2008-01-16 17:42 ` Rik van Riel
2008-01-16 17:48   ` Izik Eidus
2008-01-17 16:23     ` Andrea Arcangeli
2008-01-17 18:21       ` Izik Eidus
2008-01-17 19:32         ` Andrea Arcangeli
2008-01-21 12:52           ` [PATCH] mmu notifiers #v3 Andrea Arcangeli
2008-01-22  2:21             ` Rik van Riel
2008-01-22 14:12             ` [kvm-devel] " Avi Kivity
2008-01-22 14:43               ` Andrea Arcangeli
2008-01-22 20:08                 ` [kvm-devel] [PATCH] mmu notifiers #v4 Andrea Arcangeli
2008-01-22 20:34                   ` [kvm-devel] [PATCH] export notifier #1 Christoph Lameter
2008-01-22 22:31                     ` Andrea Arcangeli
2008-01-22 22:53                       ` Christoph Lameter
2008-01-23 10:27                         ` Avi Kivity
2008-01-23 10:52                           ` Robin Holt
2008-01-23 12:04                             ` Andrea Arcangeli
2008-01-23 12:34                               ` Robin Holt
2008-01-23 19:48                               ` Christoph Lameter
2008-01-23 19:58                                 ` Robin Holt
2008-01-23 19:47                             ` Christoph Lameter
2008-01-24  5:56                               ` Avi Kivity
2008-01-24 12:26                                 ` Andrea Arcangeli
2008-01-24 12:34                                   ` Avi Kivity
2008-01-23 11:41                         ` Andrea Arcangeli
2008-01-23 12:32                           ` Robin Holt
2008-01-23 17:33                             ` Andrea Arcangeli
2008-01-23 20:27                               ` Christoph Lameter
2008-01-24 15:42                                 ` Andrea Arcangeli
2008-01-24 20:07                                   ` Christoph Lameter
2008-01-25  6:35                                     ` Avi Kivity
2008-01-23 20:18                           ` Christoph Lameter
2008-01-24 14:34                             ` Andrea Arcangeli
2008-01-24 14:41                               ` Andrea Arcangeli
2008-01-24 15:15                               ` Avi Kivity
2008-01-24 15:18                                 ` Avi Kivity
2008-01-24 20:01                               ` Christoph Lameter
2008-01-22 23:36                     ` Benjamin Herrenschmidt
2008-01-23  0:40                       ` Christoph Lameter
2008-01-23  1:21                         ` Robin Holt
2008-01-23 12:51                     ` Gerd Hoffmann
2008-01-23 13:19                       ` Robin Holt
2008-01-23 14:12                         ` Gerd Hoffmann
2008-01-23 14:18                           ` Robin Holt
2008-01-23 14:35                             ` Gerd Hoffmann
2008-01-23 15:48                               ` Robin Holt
2008-01-23 14:17                         ` Avi Kivity
2008-01-24  4:03                           ` Benjamin Herrenschmidt
2008-01-23 15:41                       ` Andrea Arcangeli [this message]
2008-01-23 17:47                         ` Gerd Hoffmann
2008-01-24  6:01                           ` Avi Kivity
2008-01-24  6:45                           ` Jeremy Fitzhardinge
2008-01-23 20:40                         ` Christoph Lameter
2008-01-24  2:00                   ` Enhance mmu notifiers to accomplish a lockless implementation (incomplete) Robin Holt
2008-01-24  4:05                     ` Robin Holt
2008-01-22 19:28             ` [PATCH] mmu notifiers #v3 Peter Zijlstra
2008-01-22 20:31               ` Christoph Lameter
2008-01-22 20:31               ` Andrea Arcangeli
2008-01-22 22:10                 ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080123154130.GC7141@v2.random \
    --to=andrea@qumranet.com \
    --cc=akpm@osdl.org \
    --cc=avi@qumranet.com \
    --cc=benh@kernel.crashing.org \
    --cc=clameter@sgi.com \
    --cc=daniel.blueman@quadrics.com \
    --cc=holt@sgi.com \
    --cc=hugh@veritas.com \
    --cc=kraxel@redhat.com \
    --cc=kvm-devel@lists.sourceforge.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@suse.de \
    --cc=steiner@sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).