linux-kernel.vger.kernel.org archive mirror
From: Andrea Arcangeli <andrea@qumranet.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: clameter@sgi.com, steiner@sgi.com, holt@sgi.com, npiggin@suse.de,
	a.p.zijlstra@chello.nl, kvm-devel@lists.sourceforge.net,
	kanojsarcar@yahoo.com, rdreier@cisco.com,
	swise@opengridcomputing.com, linux-kernel@vger.kernel.org,
	avi@qumranet.com, linux-mm@kvack.org,
	general@lists.openfabrics.org, hugh@veritas.com,
	rusty@rustcorp.com.au, aliguori@us.ibm.com, chrisw@redhat.com,
	marcelo@kvack.org, dada1@cosmosbay.com, paulmck@us.ibm.com
Subject: Re: [PATCH 08 of 11] anon-vma-rwsem
Date: Thu, 8 May 2008 01:39:53 +0200
Message-ID: <20080507233953.GM8276@duo.random>
In-Reply-To: <20080507155914.d7790069.akpm@linux-foundation.org>

Hi Andrew,

On Wed, May 07, 2008 at 03:59:14PM -0700, Andrew Morton wrote:
> 	CPU0:			CPU1:
> 
> 	spin_lock(global_lock)	
> 	spin_lock(a->lock);	spin_lock(b->lock);
				================== mmu_notifier_register()
> 	spin_lock(b->lock);	spin_unlock(b->lock);
> 				spin_lock(a->lock);
> 				spin_unlock(a->lock);
> 
> also OK.

But the problem is that we have to stop the critical section at the
place I marked with "========" while mmu_notifier_register
runs. Otherwise the driver calling mmu_notifier_register won't know
whether it's safe to start establishing secondary sptes/tlbs. If the
driver establishes sptes/tlbs with get_user_pages/follow_page, the
page could be freed immediately afterwards when zap_page_range starts.

So if CPU1 doesn't take the global_lock before proceeding in
zap_page_range (inside vmtruncate, under i_mmap_lock, which is
represented as b->lock above) we're in trouble.
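
To make the ordering requirement concrete, here is a minimal userspace
sketch using pthread mutexes. Everything in it is an assumption made
for illustration: register_begin(), register_end() and try_zap() are
hypothetical names, not kernel functions, and b_lock merely plays the
role of i_mmap_lock. The zap side uses a trylock only so that the
exclusion is observable from a single thread; a real path would block
on the lock instead.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins: global_lock is the proposed global lock,
 * b_lock plays the role of i_mmap_lock (b->lock in the diagram). */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t b_lock = PTHREAD_MUTEX_INITIALIZER;

/* Registration side: hold global_lock for the whole registration so
 * that no zap critical section can start while it runs. */
static void register_begin(void)
{
	pthread_mutex_lock(&global_lock);
}

static void register_end(void)
{
	pthread_mutex_unlock(&global_lock);
}

/* Zap side: global_lock must be taken before b_lock. Returns false
 * if a registration currently holds global_lock. */
static bool try_zap(void)
{
	int rc = pthread_mutex_trylock(&global_lock);

	if (rc == EBUSY)
		return false;
	assert(rc == 0);

	pthread_mutex_lock(&b_lock);
	/* ... the zap_page_range-like work would happen here ... */
	pthread_mutex_unlock(&b_lock);
	pthread_mutex_unlock(&global_lock);
	return true;
}
```

Calling try_zap() between register_begin() and register_end() fails,
which is the exclusion the driver relies on. The price is that every
zap takes global_lock first, even when no registration is running.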

What we can do is replace the mm_lock with a
spin_lock(&global_lock), but only if all places that take i_mmap_lock
take the global lock first, and that hurts the scalability of
performance-critical fast paths like vmtruncate and
anon_vma->lock. Perhaps they're not so performance critical, but
surely much more performance critical than mmu_notifier_register ;).

The idea of polluting various scalable paths in the VM, like the
truncate() syscall, with a global spinlock frightens me. I'd rather
return to invalidate_page() inside the PT lock, removing both
invalidate_range_start/end. Then all serialization against the mmu
notifiers will be provided by the PT lock, which the secondary mmu page
fault also has to take in get_user_pages (or follow_page). In any case
that is a better solution that won't slow down the VM when
MMU_NOTIFIER=y, even if it's a bit slower for GRU; for KVM, performance
is about the same with or without invalidate_range_start/end. I didn't
think anybody would care how long mmu_notifier_register takes to
return, compared to all the heavyweight operations that happen when
starting a VM (not only in the kernel but in the guest too).
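
A toy userspace model of that alternative, where one lock serializes
both sides, might look as follows. This is only a sketch under stated
assumptions: struct toy_mapping and the toy_* functions are invented
for illustration, the two booleans stand in for the primary pte and
the secondary spte, and a pthread mutex stands in for the PT lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: one lock guards both the primary mapping and
 * the secondary mapping derived from it. */
struct toy_mapping {
	pthread_mutex_t pt_lock;   /* stands in for the PT lock */
	bool pte_present;          /* primary mapping */
	bool spte_present;         /* secondary (driver) mapping */
};

static void toy_init(struct toy_mapping *m)
{
	int rc = pthread_mutex_init(&m->pt_lock, NULL);

	assert(rc == 0);
	(void)rc;
	m->pte_present = true;
	m->spte_present = false;
}

/* Primary side: tear down the pte and invalidate the secondary
 * mapping inside the same critical section. */
static void toy_zap(struct toy_mapping *m)
{
	pthread_mutex_lock(&m->pt_lock);
	m->pte_present = false;
	m->spte_present = false;   /* plays the invalidate_page() role */
	pthread_mutex_unlock(&m->pt_lock);
}

/* Secondary fault: only establish the spte while the pte is seen
 * present under the lock. Returns whether it was established. */
static bool toy_secondary_fault(struct toy_mapping *m)
{
	bool ok;

	pthread_mutex_lock(&m->pt_lock);
	ok = m->pte_present;
	if (ok)
		m->spte_present = true;
	pthread_mutex_unlock(&m->pt_lock);
	return ok;
}
```

Because both functions take pt_lock, a secondary fault either sees the
pte before the zap and is invalidated by it, or runs after the zap and
establishes nothing. No global lock and no range_start/end pair is
needed in this model.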

In fact, if it's security that we worry about here, we can put a cap
on the _time_ that mmu_notifier_register can take before it fails, and
fail to start a VM if it takes more than 5sec. That's still fine, as
the failure could happen for other reasons too, like vmalloc shortage,
and we already handle it just fine. This 5sec delay can't possibly
happen in practice anyway in the only interesting scenario, just like
the vmalloc shortage. This is obviously a superior solution to
polluting the VM with a useless global spinlock that will destroy
truncate/AIM on numa.
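
The time cap could be sketched in userspace roughly like this. It is
an illustration under assumptions, not the mm_lock code:
lock_all_with_cap(), unlock_all() and now_ms() are hypothetical
helpers, pthread mutexes stand in for the per-object locks, and the
cap is passed in milliseconds rather than hardcoded to 5sec.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>
#include <time.h>

/* Monotonic clock in milliseconds. */
static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Release the first n locks, in reverse acquisition order. */
static void unlock_all(pthread_mutex_t *locks, size_t n)
{
	while (n > 0)
		pthread_mutex_unlock(&locks[--n]);
}

/* Take all n locks in order, giving up once cap_ms has elapsed.
 * Returns 0 with every lock held, or -ETIMEDOUT with none held. */
static int lock_all_with_cap(pthread_mutex_t *locks, size_t n,
			     long long cap_ms)
{
	long long deadline = now_ms() + cap_ms;

	assert(locks != NULL);
	for (size_t i = 0; i < n; i++) {
		while (pthread_mutex_trylock(&locks[i]) != 0) {
			if (now_ms() >= deadline) {
				/* Undo the partial acquisition. */
				unlock_all(locks, i);
				return -ETIMEDOUT;
			}
			sched_yield();
		}
	}
	return 0;
}
```

A caller would treat -ETIMEDOUT like any other registration failure,
the same way an allocation failure is already handled.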

Anyway Christoph, I uploaded my latest version here:

       	http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.25/mmu-notifier-v16

(applies and runs fine on 26-rc1)

You're more than welcome to take over from it; I kind of feel my time
may now be better spent emulating the mmu-notifier-core with kprobes.
