From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750757AbVJXTMJ (ORCPT ); Mon, 24 Oct 2005 15:12:09 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751133AbVJXTMJ (ORCPT ); Mon, 24 Oct 2005 15:12:09 -0400 Received: from gold.veritas.com ([143.127.12.110]:55911 "EHLO gold.veritas.com") by vger.kernel.org with ESMTP id S1750757AbVJXTMI (ORCPT ); Mon, 24 Oct 2005 15:12:08 -0400 Date: Mon, 24 Oct 2005 20:11:10 +0100 (BST) From: Hugh Dickins X-X-Sender: hugh@goblin.wat.veritas.com To: David Howells cc: Anton Altaparmakov , Andrew Morton , torvalds@osdl.org, Christoph Hellwig , linux-kernel@vger.kernel.org Subject: Re: [PATCH] Add notification of page becoming writable to VMA ops In-Reply-To: <9792.1130171024@warthog.cambridge.redhat.com> Message-ID: References: <1130168619.19518.43.camel@imp.csi.cam.ac.uk> <1130167005.19518.35.camel@imp.csi.cam.ac.uk> <7872.1130167591@warthog.cambridge.redhat.com> <9792.1130171024@warthog.cambridge.redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-OriginalArrivalTime: 24 Oct 2005 19:12:08.0008 (UTC) FILETIME=[D3E96C80:01C5D8CE] Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 24 Oct 2005, David Howells wrote: > > The attached patch adds a new VMA operation to notify a filesystem or other > driver about the MMU generating a fault because userspace attempted to write > to a page mapped through a read-only PTE. > > This facility permits the filesystem or driver to: > > (*) Implement storage allocation/reservation on attempted write, and so to > deal with problems such as ENOSPC more gracefully (perhaps by generating > SIGBUS). > > (*) Delay making the page writable until the contents have been written to a > backing cache. This is useful for NFS/AFS when using FS-Cache/CacheFS. > It permits the filesystem to have some guarantee about the state of the > cache. I've only given it a quick look, it looks pretty good, but too hastily thrown together, without understanding of the intervening changes: > --- linux-2.6.14-rc4-mm1/mm/memory.c 2005-10-17 14:26:44.000000000 +0100 > +++ linux-2.6.14-rc4-mm1-cachefs/mm/memory.c 2005-10-20 18:53:04.000000000 +0100 > @@ -1261,19 +1261,53 @@ static int do_wp_page(struct mm_struct * > + if (unlikely(vma->vm_flags & VM_SHARED)) { > + if (vma->vm_ops && vma->vm_ops->page_mkwrite) { > + /* > + * Notify the page owner without the lock held, > + * so they can sleep if they want to. > + */ > + pte_unmap(page_table); > + if (!PageReserved(old_page)) > + page_cache_get(old_page); > + spin_unlock(&mm->page_table_lock); No, you need to pay attention to Nick's PageReserved removal, and my pte lock stuff, throughout do_wp_page - there shouldn't be any references to PageReserved or page_table_lock there now (and you'll need to recheck the mapping/locking/unlocking/unmapping). Sorry, I don't have the time to spare to do it myself right now. > + page_table = pte_offset_map(pmd, address); > + if (!pte_same(*page_table, orig_pte)) { > + ret |= VM_FAULT_WRITE; No, don't add VM_FAULT_WRITE in this case: you should only do that when you've gone through the maybe_mkwrite yourself; this case should remain the default VM_FAULT_MINOR. > @@ -1847,18 +1890,28 @@ retry: > + } else { > + /* if the page will be shareable, see if the backing > + * address space wants to know that the page is about > + * to become writable */ > + if (vma->vm_ops->page_mkwrite && > + vma->vm_ops->page_mkwrite(vma, new_page) < 0) > + return VM_FAULT_SIGBUS; > + } > } This isn't necessarily wrong, and may be exactly how it was before, I don't remember. But it implies that when page_mkwrite fails, it page_cache_releases the page. Is that desirable? Or should that be left to the caller? > @@ -1945,7 +1998,7 @@ static int do_file_page(struct mm_struct Drop all those changes to do_file_page (which I added), they're no longer necessary. A case appeared which made it clear that we cannot rely on resolving this issue for get_user_pages in a single call to handle_mm_fault, and that's why the VM_FAULT_WRITE stuff got added. This complication of do_file_page was always ugly, and I'm delighted to drop it. Whereas the call to do_wp_page from do_swap_page is less obtrusive and may still be a worthwhile optimization, though I added it for the same disgraced reason a year or more back. Hugh