From: Andrea Arcangeli <andrea@cpushare.com>
To: Izik Eidus <izike@qumranet.com>
Cc: Rik van Riel <riel@redhat.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	kvm-devel@lists.sourceforge.net, Avi Kivity <avi@qumranet.com>,
	clameter@sgi.com, daniel.blueman@quadrics.com, holt@sgi.com,
	steiner@sgi.com, Andrew Morton <akpm@osdl.org>,
	Hugh Dickins <hugh@veritas.com>, Nick Piggin <npiggin@suse.de>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	andrea@qumranet.com
Subject: Re: [PATCH] mmu notifiers #v2
Date: Thu, 17 Jan 2008 17:23:02 +0100
Message-ID: <20080117162302.GI7170@v2.random>
In-Reply-To: <478E4356.7030303@qumranet.com>

On Wed, Jan 16, 2008 at 07:48:06PM +0200, Izik Eidus wrote:
> Rik van Riel wrote:
>> On Sun, 13 Jan 2008 17:24:18 +0100
>> Andrea Arcangeli <andrea@qumranet.com> wrote:
>>
>>   
>>> In my basic initial patch I only track the tlb flushes which should be
>>> the minimum required to have a nice linux-VM controlled swapping
>>> behavior of the KVM gphysical memory.     
>>
>> I have a vaguely related question on KVM swapping.
>>
>> Do page accesses inside KVM guests get propagated to the host
>> OS, so Linux can choose a reasonable page for eviction, or is
>> the pageout of KVM guest pages essentially random?

Right, selection of the guest OS pages to swap is partly random but
wait: _only_ for the long-cached and hot spte entries. It's certainly
not entirely random.

As the shadow cache is a bit dynamic, every newly instantiated spte
will already refresh the PG_referenced bit in follow_page (through
minor faults). Not-present faults on sptes whose pages were swapped
out can trigger minor faults from swapcache too, and those will
refresh the young bit of the regular ptes.

> right now when kvm removes a pte from the shadow cache, it marks as
> accessed the page that this pte pointed to.

Yes: the referenced bit in the mmu-notifier invalidate case isn't
useful because it's set right before freeing the page.

> it was a good solution until the mmu notifiers because the pages were
> pinned and couldn't be swapped to disk

It probably still makes sense for sptes removed because of other
reasons (not mmu notifier invalidates).

> so now it will have to do something more sophisticated or at least mark as
> accessed every page pointed to by a pte
> that gets inserted into the shadow cache....

I think that should already be the case: see the mark_page_accessed in
follow_page. FOLL_TOUCH is set, isn't it?

The only thing we clearly miss is logic that refreshes the
PG_referenced bitflag for "hot" sptes that remain instantiated and
cached for a long time. For regular linux ptes this is done by the cpu
through the young bitflag. But note that not all architectures have
the young bitflag support in hardware! So I suppose swapping the KVM
task is like swapping any other task, but on an alpha CPU. It works
well enough in practice even if we clearly have room for further
optimizations in this area (as there would be on archs w/o the young
bit updated in hardware too).

To refresh the PG_referenced bit for long-lived hot sptes, I think the
easiest solution is to chain the sptes in an lru, and to start dropping
them when memory pressure starts. We could drop one spte every X pages
collected by the VM. So the "age" time factor depends on the VM
velocity and we totally avoid useless shadow page faults when there's
no VM pressure. When VM pressure increases, the kvm non-present fault
will then take care of refreshing the PG_referenced bit. This should
solve the aging issue for long-lived and hot sptes, and it should
improve the responsiveness of the guest OS during "initial" swap
pressure (after the initial swap pressure, the working set finds
itself in ram again). So it should avoid some unnecessary
swapout/swapin jitter during the initial swap. I see this mostly as a
kvm internal optimization, not strictly related to the mmu notifiers
though.
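
To make the lru idea a bit more concrete, here is a rough userspace
model of it, not a patch and not real KVM code. Every name in it
(spte_model, spte_lru, spte_lru_note_reclaim, drop_interval) is made
up for illustration, and the page_referenced bool only stands in for
PG_referenced on the backing page. It assumes the VM reports how many
pages it collected, and that a dropped spte is re-instantiated by the
next guest access through the usual non-present fault path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a shadow pte; not the real KVM layout. */
struct spte_model {
	bool present;          /* spte is instantiated in the shadow cache */
	bool page_referenced;  /* models PG_referenced of the backing page */
	struct spte_model *prev, *next;  /* lru links, head is the oldest */
};

struct spte_lru {
	struct spte_model *head, *tail;
	unsigned long drop_interval;  /* "X": pages collected per spte drop */
	unsigned long reclaimed;      /* pages collected since the last drop */
};

static void spte_lru_init(struct spte_lru *lru, unsigned long drop_interval)
{
	assert(drop_interval > 0);
	lru->head = lru->tail = NULL;
	lru->drop_interval = drop_interval;
	lru->reclaimed = 0;
}

static void spte_lru_unlink(struct spte_lru *lru, struct spte_model *s)
{
	if (s->prev)
		s->prev->next = s->next;
	else
		lru->head = s->next;
	if (s->next)
		s->next->prev = s->prev;
	else
		lru->tail = s->prev;
	s->prev = s->next = NULL;
}

/*
 * Models the non-present fault path: the spte is (re)instantiated,
 * the backing page is marked referenced (what follow_page with
 * FOLL_TOUCH would do), and the spte goes to the young end of the lru.
 */
static void spte_instantiate(struct spte_lru *lru, struct spte_model *s)
{
	assert(!s->present);
	s->present = true;
	s->page_referenced = true;
	s->next = NULL;
	s->prev = lru->tail;
	if (lru->tail)
		lru->tail->next = s;
	else
		lru->head = s;
	lru->tail = s;
}

/*
 * Called with the number of pages the VM just collected. Drops the
 * oldest spte once per drop_interval pages; with no memory pressure
 * nothing is ever dropped. Returns how many sptes were dropped.
 */
static unsigned int spte_lru_note_reclaim(struct spte_lru *lru,
					  unsigned long nr_pages)
{
	unsigned int dropped = 0;

	lru->reclaimed += nr_pages;
	while (lru->reclaimed >= lru->drop_interval && lru->head) {
		struct spte_model *oldest = lru->head;

		spte_lru_unlink(lru, oldest);
		oldest->present = false;
		lru->reclaimed -= lru->drop_interval;
		dropped++;
	}
	return dropped;
}
```

In this model a hot spte that gets dropped is touched again by the
guest almost immediately, so spte_instantiate runs and the referenced
bit is set again before the VM gets to the page, while a cold spte
simply stays dropped. How X should be chosen is left open here.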
