From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756397AbYAQSbP (ORCPT ); Thu, 17 Jan 2008 13:31:15 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1752960AbYAQSa7 (ORCPT ); Thu, 17 Jan 2008 13:30:59 -0500 Received: from mis011.exch011.intermedia.net ([64.78.21.10]:12346 "EHLO mis011.exch011.intermedia.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751621AbYAQSa6 (ORCPT ); Thu, 17 Jan 2008 13:30:58 -0500 Message-ID: <478F9C9C.7070500@qumranet.com> Date: Thu, 17 Jan 2008 20:21:16 +0200 From: Izik Eidus User-Agent: Thunderbird 2.0.0.9 (X11/20071115) MIME-Version: 1.0 To: Andrea Arcangeli CC: Rik van Riel , linux-kernel@vger.kernel.org, linux-mm@kvack.org, kvm-devel@lists.sourceforge.net, Avi Kivity , clameter@sgi.com, daniel.blueman@quadrics.com, holt@sgi.com, steiner@sgi.com, Andrew Morton , Hugh Dickins , Nick Piggin , Benjamin Herrenschmidt , andrea@qumranet.com Subject: Re: [PATCH] mmu notifiers #v2 References: <20080113162418.GE8736@v2.random> <20080116124256.44033d48@bree.surriel.com> <478E4356.7030303@qumranet.com> <20080117162302.GI7170@v2.random> In-Reply-To: <20080117162302.GI7170@v2.random> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-OriginalArrivalTime: 17 Jan 2008 18:30:56.0929 (UTC) FILETIME=[19B2F110:01C85937] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Andrea Arcangeli wrote: > On Wed, Jan 16, 2008 at 07:48:06PM +0200, Izik Eidus wrote: > >> Rik van Riel wrote: >> >>> On Sun, 13 Jan 2008 17:24:18 +0100 >>> Andrea Arcangeli wrote: >>> >>> >>> >>>> In my basic initial patch I only track the tlb flushes which should be >>>> the minimum required to have a nice linux-VM controlled swapping >>>> behavior of the KVM gphysical memory. >>>> >>> I have a vaguely related question on KVM swapping. >>> >>> Do page accesses inside KVM guests get propagated to the host >>> OS, so Linux can choose a reasonable page for eviction, or is >>> the pageout of KVM guest pages essentially random? >>> > > Right, selection of the guest OS pages to swap is partly random but > wait: _only_ for the long-cached and hot spte entries. It's certainly > not entirely random. > > As the shadow-cache is a bit dynamic, every new instantiated spte will > refresh the PG_referenced bit in follow_page already (through minor > faults). not-present fault of swapped non-present sptes, can trigger > minor faults from swapcache too and they'll refresh young regular > ptes. > > >> right now when kvm remove pte from the shadow cache, it mark as access the >> page that this pte pointed to. >> > > Yes: the referenced bit in the mmu-notifier invalidate case isn't > useful because it's set right before freeing the page. > > >> it was a good solution untill the mmut notifiers beacuse the pages were >> pinned and couldnt be swapped to disk >> > > It probably still makes sense for sptes removed because of other > reasons (not mmu notifier invalidates). > agree > >> so now it will have to do something more sophisticated or at least mark as >> access every page pointed by pte >> that get insrted to the shadow cache.... >> > > I think that should already be the case, see the mark_page_accessed in > follow_page, isn't FOLL_TOUCH set, isn't it? > yes you are right FOLL_TOUCH is set. > The only thing we clearly miss is a logic that refreshes the > PG_referenced bitflag for "hot" sptes that remains instantiated and > cached for a long time. For regular linux ptes this is done by the cpu > through the young bitflag. But note that not all architectures have > the young bitflag support in hardware! So I suppose the swapping of > the KVM task, is like the swapping any other task but on an alpha > CPU. It works good enough in practice even if we clearly have room for > further optimizations in this area (like there would be on archs w/o > young bit updated in hardware too). > > To refresh the PG_referenced bit for long lived hot sptes, I think the > easiest solution is to chain the sptes in a lru, and to start dropping > them when memory pressure start. We could drop one spte every X pages > collected by the VM. So the "age" time factor depends on the VM > velocity and we totally avoid useless shadow page faults when there's > no VM pressure. When VM pressure increases, the kvm non-present fault > will then take care to refresh the PG_referenced bit. This should > solve the aging-issue for long lived and hot sptes. This should > improve the responsiveness of the guest OS during "initial" swap > pressure (after the initial swap pressure, the working set finds > itself in ram again). So it should avoid some swapout/swapin not > required jitter during the initial swap. I see this mostly as a kvm > internal optimization, not strictly related to the mmu notifiers > though. > ohh i like it, this is cleaver solution, and i guess the cost of the vmexits wont be too high if it will be not too much aggressive....