From mboxrd@z Thu Jan 1 00:00:00 1970
From: Boaz Harrosh
Subject: Re: [PATCH 1/3] mm(v4.1): New pfn_mkwrite same as page_mkwrite for VM_PFNMAP
Date: Tue, 07 Apr 2015 11:51:40 +0300
Message-ID: <55239A9C.5060303@plexistor.com>
References: <55239645.9000507@plexistor.com> <552397E6.5030506@plexistor.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Cc: Stable Tree
To: Boaz Harrosh, Dave Chinner, Matthew Wilcox, Andrew Morton, "Kirill A. Shutemov", Jan Kara, Hugh Dickins, Mel Gorman, linux-mm@kvack.org, linux-nvdimm, linux-fsdevel, Eryu Guan, Christoph Hellwig
Received: from mail-wg0-f47.google.com ([74.125.82.47]:34526 "EHLO mail-wg0-f47.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751242AbbDGIvp (ORCPT ); Tue, 7 Apr 2015 04:51:45 -0400
Received: by wgbdm7 with SMTP id dm7so48667802wgb.1 for ; Tue, 07 Apr 2015 01:51:43 -0700 (PDT)
In-Reply-To: <552397E6.5030506@plexistor.com>
Sender: linux-fsdevel-owner@vger.kernel.org

On 04/07/2015 11:40 AM, Boaz Harrosh wrote:
>

Crap, this is the wrong version. I have a [v3] with some more fixes
for Kirill's comments.

Will resend.
Sorry for the noise,
Boaz

> [v2]
> Based on linux-next/akpm [3dc4623]. For v4.1 merge window
> Incorporated comments from Andrew And Kirill
>
> [v1]
> This will allow FS that uses VM_PFNMAP | VM_MIXEDMAP (no page structs)
> to get notified when access is a write to a read-only PFN.
>
> This can happen if we mmap() a file then first mmap-read from it
> to page-in a read-only PFN, then we mmap-write to the same page.
>
> We need this functionality to fix a DAX bug, where in the scenario
> above we fail to set ctime/mtime though we modified the file.
> An xfstest is attached to this patchset that shows the failure
> and the fix. (A DAX patch will follow)
>
> This functionality is extra important for us, because upon
> dirtying of a pmem page we also want to RDMA the page to a
> remote cluster node.
>
> We define a new pfn_mkwrite and do not reuse page_mkwrite because
>   1 - The name ;-)
>   2 - But mainly because it would take a very long and tedious
>       audit of all page_mkwrite functions of VM_MIXEDMAP/VM_PFNMAP
>       users. To make sure they do not now CRASH. For example current
>       DAX code (which this is for) would crash.
>       If we would want to reuse page_mkwrite, We will need to first
>       patch all users, so to not-crash-on-no-page. Then enable this
>       patch. But even if I did that I would not sleep so well at night.
>       Adding a new vector is the safest thing to do, and is not that
>       expensive. an extra pointer at a static function vector per driver.
>       Also the new vector is better for performance, because else we
>       Will call all current Kernel vectors, so to:
>         check-ha-no-page-do-nothing and return.
>
> No need to call it from do_shared_fault because do_wp_page is called to
> change pte permissions anyway.
>
> CC: Matthew Wilcox
> CC: Kirill A. Shutemov
> CC: Jan Kara
> CC: Andrew Morton
> CC: Hugh Dickins
> CC: Mel Gorman
> CC: linux-mm@kvack.org
>
> Signed-off-by: Yigal Korman
> Signed-off-by: Boaz Harrosh
> ---
>  include/linux/mm.h |  3 +++
>  mm/memory.c        | 35 +++++++++++++++++++++++++++++++----
>  2 files changed, 34 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index d584b95..70c47f2 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -251,6 +251,9 @@ struct vm_operations_struct {
>  	 * writable, if an error is returned it will cause a SIGBUS */
>  	int (*page_mkwrite)(struct vm_area_struct *vma, struct vm_fault *vmf);
>  
> +	/* same as page_mkwrite when using VM_PFNMAP|VM_MIXEDMAP */
> +	int (*pfn_mkwrite)(struct vm_area_struct *vma, struct vm_fault *vmf);
> +
>  	/* called by access_process_vm when get_user_pages() fails, typically
>  	 * for use by special VMAs that can switch between memory and hardware
>  	 */
> diff --git a/mm/memory.c b/mm/memory.c
> index 59f6268..6e8f3f6 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1982,6 +1982,19 @@ static int do_page_mkwrite(struct vm_area_struct *vma, struct page *page,
>  	return ret;
>  }
>  
> +static int do_pfn_mkwrite(struct vm_area_struct *vma, unsigned long address)
> +{
> +	struct vm_fault vmf = {
> +		.page = 0,
> +		.pgoff = (((address & PAGE_MASK) - vma->vm_start)
> +				>> PAGE_SHIFT) + vma->vm_pgoff,
> +		.virtual_address = (void __user *)(address & PAGE_MASK),
> +		.flags = FAULT_FLAG_WRITE | FAULT_FLAG_MKWRITE,
> +	};
> +
> +	return vma->vm_ops->pfn_mkwrite(vma, &vmf);
> +}
> +
>  /*
>   * Handle write page faults for pages that can be reused in the current vma
>   *
> @@ -2259,14 +2272,28 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  		 * VM_PFNMAP VMA.
>  		 *
>  		 * We should not cow pages in a shared writeable mapping.
> -		 * Just mark the pages writable as we can't do any dirty
> -		 * accounting on raw pfn maps.
> +		 * Just mark the pages writable and/or call ops->pfn_mkwrite.
>  		 */
>  		if ((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
> -				     (VM_WRITE|VM_SHARED))
> +				     (VM_WRITE|VM_SHARED)) {
> +			if (vma->vm_ops && vma->vm_ops->pfn_mkwrite) {
> +				int ret;
> +
> +				pte_unmap_unlock(page_table, ptl);
> +				ret = do_pfn_mkwrite(vma, address);
> +				if (ret & VM_FAULT_ERROR)
> +					return ret;
> +				page_table = pte_offset_map_lock(mm, pmd,
> +								 address, &ptl);
> +				/* Did pfn_mkwrite already fixed up the pte */
> +				if (!pte_same(*page_table, orig_pte)) {
> +					pte_unmap_unlock(page_table, ptl);
> +					return ret;
> +				}
> +			}
>  			return wp_page_reuse(mm, vma, address, page_table, ptl,
>  					     orig_pte, old_page, 0, 0);
> -
> +		}
>  		pte_unmap_unlock(page_table, ptl);
>  		return wp_page_copy(mm, vma, address, page_table, pmd,
>  				    orig_pte, old_page);
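
For readers following the quoted patch: the .pgoff and .virtual_address
initializers in do_pfn_mkwrite() turn a faulting user address into a
page-aligned address and a file offset counted in pages. Below is a
minimal user-space sketch of that arithmetic only. It is not kernel code:
struct demo_vma, the demo_* helpers, and the fixed 4 KiB page size are
assumptions made for illustration, standing in for vm_area_struct,
PAGE_SHIFT and PAGE_MASK.

```c
#include <assert.h>

/* Assumption: 4 KiB pages. The kernel takes these from the architecture. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_MASK  (~(DEMO_PAGE_SIZE - 1))

/* Hypothetical stand-in for the vm_area_struct fields the patch reads. */
struct demo_vma {
	unsigned long vm_start;	/* first virtual address of the mapping */
	unsigned long vm_end;	/* first address past the mapping */
	unsigned long vm_pgoff;	/* file offset of vm_start, in pages */
};

/* Page-aligned fault address, as stored in .virtual_address. */
static unsigned long demo_fault_page_address(unsigned long address)
{
	return address & DEMO_PAGE_MASK;
}

/* Same arithmetic as the .pgoff initializer in do_pfn_mkwrite(). */
static unsigned long demo_fault_pgoff(const struct demo_vma *vma,
				      unsigned long address)
{
	/* The fault must lie inside the mapping for the math to hold. */
	assert(address >= vma->vm_start && address < vma->vm_end);

	return ((demo_fault_page_address(address) - vma->vm_start)
			>> DEMO_PAGE_SHIFT) + vma->vm_pgoff;
}
```

With vm_start = 0x40000000 and vm_pgoff = 16, a fault at 0x40002abc
rounds down to 0x40002000, which is two pages into the mapping, so the
handler would see pgoff 18.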
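
The access pattern described under [v1] (mmap a file, read through the
mapping first, then write to the same page) can be sketched from user
space as below. This is not the xfstest attached to the patchset, only
an illustration of the sequence under some assumptions: the helper name
and the temporary file under /tmp are arbitrary, and on an ordinary
page-cache filesystem the write fault goes through page_mkwrite rather
than the new pfn_mkwrite, which only matters for VM_PFNMAP/VM_MIXEDMAP
mappings such as DAX.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read-fault one page of a shared file mapping, then store to that page.
 * Returns the first byte of the file as seen through pread() afterwards,
 * or -1 on any error. The temporary file name is an arbitrary choice.
 */
static int demo_mmap_read_then_write(void)
{
	char path[] = "/tmp/pfn_mkwrite_demoXXXXXX";
	long page = sysconf(_SC_PAGESIZE);
	volatile char *map;
	char seen, byte = 0;
	int result = -1;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);	/* the open descriptor keeps the file alive */

	if (page <= 0 || ftruncate(fd, page) != 0)
		goto out;

	map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		   MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	seen = map[0];	/* read fault: the page may be mapped read-only */
	(void)seen;
	map[0] = 'W';	/* write to the same page: the mkwrite path */

	/* Flush the dirty page and read it back through the file. */
	if (msync((void *)map, (size_t)page, MS_SYNC) == 0 &&
	    pread(fd, &byte, 1, 0) == 1)
		result = byte;

	munmap((void *)map, (size_t)page);
out:
	close(fd);
	return result;
}
```

A real regression test would also compare st_mtime and st_ctime from
fstat() before and after the store, since the missing timestamp update
is the bug the patch description refers to.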