From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1761321AbYCGRuc (ORCPT ); Fri, 7 Mar 2008 12:50:32 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1758050AbYCGRuW (ORCPT ); Fri, 7 Mar 2008 12:50:22 -0500
Received: from host36-195-149-62.serverdedicati.aruba.it ([62.149.195.36]:58887
	"EHLO mx.cpushare.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754280AbYCGRuW (ORCPT );
	Fri, 7 Mar 2008 12:50:22 -0500
Date: Fri, 7 Mar 2008 18:50:19 +0100
From: Andrea Arcangeli
To: Peter Zijlstra
Cc: Christoph Lameter , Jack Steiner , Nick Piggin ,
	akpm@linux-foundation.org, Robin Holt , Avi Kivity ,
	kvm-devel@lists.sourceforge.net, general@lists.openfabrics.org,
	Steve Wise , Roland Dreier , Kanoj Sarcar ,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	daniel.blueman@quadrics.com
Subject: Re: [PATCH] 3/4 combine RCU with seqlock to allow mmu notifier
	methods to sleep (#v9 was 1/4)
Message-ID: <20080307175019.GK24114@v2.random>
References: <20080303220502.GA5301@v2.random>
	<47CC9B57.5050402@qumranet.com>
	<20080304133020.GC5301@v2.random>
	<20080304222030.GB8951@v2.random>
	<20080307151722.GD24114@v2.random>
	<20080307152328.GE24114@v2.random>
	<1204908762.8514.114.camel@twins>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1204908762.8514.114.camel@twins>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Mar 07, 2008 at 05:52:42PM +0100, Peter Zijlstra wrote:
> hlist_del_rcu(&mn->hlist)
>
> > +		rcu_read_unlock();
>
> kfree(mn);
>
> > 		young |= mn->ops->clear_flush_young(mn, mm, address);
>
> *BANG*

My objective was to allow mmu_notifier_register/unregister to be called
with the same mmu notifier object. I didn't mean that the object could
be freed before ->release is called.

However, you reminded me that after unregistering, ->release won't be
called, so unregister isn't very useful and I doubt we can keep it ;).

In the meantime I've also been thinking that we may need the
write_seqlock in mmu_notifier_register, so that the list walk knows when
to restart if somebody does mmu_notifier_register; synchronize_rcu().
Otherwise there's no way to be sure the mmu notifier will start firing
immediately after synchronize_rcu(). I'm unsure whether it's acceptable
for in-progress mmu notifier invocations not to notice that somebody did
mmu_notifier_register; synchronize_rcu(). If they don't need to notice,
then we can just drop unregister and all the rcu_read_lock()s instead of
adding write_seqlock to the register operation.

Overall, my effort is to avoid expanding the list walk with explicit
memory barriers as in EMM, while trying to be equally efficient.

Another issue is that the _begin/_end logic doesn't provide any
guarantee that _begin will start firing before _end, if a kernel module
is loaded while another cpu is already running inside some munmap
operation etc. KVM's usage of the mmu notifier has no problem with that
detail, but KVM doesn't use _begin at all; I wonder if others would have
problems. This is a somewhat separate problem, but closely related to
the question of whether the notifiers must be guaranteed to start firing
immediately after mmu_notifier_register; synchronize_rcu() or not, which
is why I mentioned it here.

Once I get comments on the suggested direction for these details, I'll
quickly repost a replacement patch for 3/4.

Thanks Peter!
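
For illustration only, the register-side write_seqlock idea above can be
sketched in userspace C11. This is not the patch and not kernel code:
struct notifier, struct notifier_list, notifier_list_init,
notifier_register, read_begin, read_retry, notify_clear_flush_young and
report_young are all hypothetical names, C11 atomics stand in for the
kernel seqlock, writers are assumed to be serialized by a lock that is
not shown, and there is no unregister, so nodes are never freed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an mmu notifier with a single method. */
struct notifier {
	struct notifier *next;
	int (*clear_flush_young)(struct notifier *n, unsigned long address);
};

struct notifier_list {
	atomic_uint seq;			/* even: stable, odd: register in progress */
	_Atomic(struct notifier *) head;
};

static void notifier_list_init(struct notifier_list *l)
{
	atomic_init(&l->seq, 0);
	atomic_init(&l->head, NULL);
}

/* Writer side. Assumption: callers hold an external lock (not shown). */
static void notifier_register(struct notifier_list *l, struct notifier *n)
{
	assert(n != NULL && n->clear_flush_young != NULL);
	atomic_fetch_add_explicit(&l->seq, 1, memory_order_acq_rel);	/* now odd */
	n->next = atomic_load_explicit(&l->head, memory_order_relaxed);
	atomic_store_explicit(&l->head, n, memory_order_release);
	atomic_fetch_add_explicit(&l->seq, 1, memory_order_release);	/* even again */
}

/* Reader side: sample the sequence, waiting out an in-progress register. */
static unsigned read_begin(struct notifier_list *l)
{
	unsigned s;

	while ((s = atomic_load_explicit(&l->seq, memory_order_acquire)) & 1)
		;	/* spin until the writer finishes */
	return s;
}

/* True if a register happened since read_begin() returned 'start'. */
static bool read_retry(struct notifier_list *l, unsigned start)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&l->seq, memory_order_relaxed) != start;
}

/* Walk the list; restart the whole walk if a register raced with it. */
static int notify_clear_flush_young(struct notifier_list *l, unsigned long address)
{
	unsigned seq;
	int young;

	do {
		seq = read_begin(l);
		young = 0;
		for (struct notifier *n = atomic_load_explicit(&l->head,
				memory_order_acquire); n; n = n->next)
			young |= n->clear_flush_young(n, address);
	} while (read_retry(l, seq));
	return young;
}

/* Hypothetical callback used only to exercise the walk. */
static int report_young(struct notifier *n, unsigned long address)
{
	(void)n;
	(void)address;
	return 1;
}
```

Note that a restarted walk invokes the methods of already-visited
notifiers a second time, so this only fits methods that tolerate being
called twice for the same address.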