Date: Wed, 20 Feb 2008 01:52:06 +0100
From: Andrea Arcangeli
To: Nick Piggin
Cc: Jack Steiner, akpm@linux-foundation.org, Robin Holt, Avi Kivity, Izik Eidus, kvm-devel@lists.sourceforge.net, Peter Zijlstra, general@lists.openfabrics.org, Steve Wise, Roland Dreier, Kanoj Sarcar, linux-kernel@vger.kernel.org, linux-mm@kvack.org, daniel.blueman@quadrics.com, Christoph Lameter
Subject: Re: [patch] my mmu notifiers
Message-ID: <20080220005206.GP7128@v2.random>
References: <20080219084357.GA22249@wotan.suse.de> <20080219135851.GI7128@v2.random> <20080219142725.GA23200@sgi.com> <20080219230427.GB18912@wotan.suse.de>
In-Reply-To: <20080219230427.GB18912@wotan.suse.de>

On Wed, Feb 20, 2008 at 12:04:27AM +0100, Nick Piggin wrote:
> On Tue, Feb 19, 2008 at 08:27:25AM -0600, Jack Steiner wrote:
> > > On Tue, Feb 19, 2008 at 02:58:51PM +0100, Andrea Arcangeli wrote:
> > > > understand the need for invalidate_begin/invalidate_end pairs at all.
> > >
> > > The need of the pairs is crystal clear to me: range_begin is needed
> > > for GRU _but_only_if_ range_end is called after releasing the
> > > reference that the VM holds on the page.
> > > _begin will flush the GRU tlb
> > > and at the same time it will take a mutex that will block further GRU
> > > tlb-miss-interrupts (no idea how they manage those nightmare locking,
> > > I didn't even try to add more locking to KVM and I get away with the
> > > fact KVM takes the pin on the page itself).
> >
> > As it turns out, no actual mutex is required. _begin_ simply increments a
> > count of active range invalidates, _end_ decrements the count. New TLB
> > dropins are deferred while range callouts are active.
> >
> > This would appear to be racy but the GRU has special hardware that
> > simplifies locking. When the GRU sees a TLB invalidate, all outstanding
> > misses & potentially inflight TLB dropins are marked by the GRU with a
> > "kill" bit. When the dropin finally occurs, the dropin is ignored & the
> > instruction is simply restarted. The instruction will fault again & the TLB
> > dropin will be repeated. This is optimized for the case where invalidates
> > are rare - true for users of the GRU.
>
> OK (thanks to Robin as well). Now I understand why you are using it,
> but I don't understand why you don't defer new TLBs after the point
> where the linux pte changes. If you can do that, then you look and
> act much more like a TLB from the point of view of the Linux vm.

Christoph was forced to put the invalidate_range callback _after_
dropping the PT lock because xpmem has to wait for I/O there. But
invalidate_range is called after freeing the VM reference on the pages,
so GRU needed a _range_begin too, because GRU has to flush the tlb
before the VM reference on the page is released (xpmem and KVM pin the
pages mapped by the secondary mmu, GRU doesn't). So invalidate_range
was renamed to invalidate_range_end.
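
To make the begin/end pairing that Jack describes concrete, here is a rough user-space sketch in C. It is only a model under my own assumptions, not the GRU driver code: the struct and every function name (secondary_mmu, mmu_init, range_begin, range_end, miss_start, dropin_allowed) are hypothetical, and the generation counter is a software stand-in for the hardware "kill" bit, which the real GRU handles by itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a secondary MMU; not a real kernel structure. */
struct secondary_mmu {
    atomic_int  active_invalidates; /* range_begin calls not yet matched by range_end */
    atomic_uint invalidate_gen;     /* bumped on every flush; stands in for the "kill" bit */
};

static inline void mmu_init(struct secondary_mmu *m)
{
    atomic_init(&m->active_invalidates, 0);
    atomic_init(&m->invalidate_gen, 0);
}

/* Called before the VM drops its reference on the pages. */
static inline void range_begin(struct secondary_mmu *m)
{
    atomic_fetch_add(&m->active_invalidates, 1);
    /* A real driver would flush the secondary TLB here. Bumping the
     * generation marks every miss already in flight as stale. */
    atomic_fetch_add(&m->invalidate_gen, 1);
}

/* Called after the VM reference on the pages has been released. */
static inline void range_end(struct secondary_mmu *m)
{
    int prev = atomic_fetch_sub(&m->active_invalidates, 1);
    assert(prev > 0); /* begin/end must be balanced */
    (void)prev;
}

/* Sampled when a secondary-TLB miss starts being serviced. */
static inline unsigned miss_start(struct secondary_mmu *m)
{
    return atomic_load(&m->invalidate_gen);
}

/* True if the dropin may be installed. False means: ignore the dropin
 * and let the instruction fault again, as in the "kill" bit case. */
static inline bool dropin_allowed(struct secondary_mmu *m, unsigned gen_at_miss)
{
    if (atomic_load(&m->active_invalidates) != 0)
        return false; /* deferred while a range callout is active */
    return atomic_load(&m->invalidate_gen) == gen_at_miss;
}
```

In this model a miss that started before range_begin is refused both during the invalidate and after range_end, because its sampled generation is stale. Only a miss that restarts afterwards gets its dropin installed. The sketch deliberately leaves out the window between the final check and the actual TLB write, which is the part the GRU hardware is said to close.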