From: Andrew Morton <akpm@linux-foundation.org>
To: Boaz Harrosh <boaz@plexistor.com>
Cc: Dave Chinner <david@fromorbit.com>,
	Matthew Wilcox <matthew.r.wilcox@intel.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Jan Kara <jack@suse.cz>, Hugh Dickins <hughd@google.com>,
	Mel Gorman <mgorman@suse.de>, linux-mm@kvack.org,
	linux-nvdimm <linux-nvdimm@ml01.01.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Eryu Guan <eguan@redhat.com>
Subject: Re: [PATCH 1/3] mm: New pfn_mkwrite same as page_mkwrite for VM_PFNMAP
Date: Mon, 23 Mar 2015 15:49:03 -0700
Message-ID: <20150323154903.5f5263095a4f7eff59bc9bb8@linux-foundation.org>
In-Reply-To: <55100BDC.7000901@plexistor.com>

On Mon, 23 Mar 2015 14:49:32 +0200 Boaz Harrosh <boaz@plexistor.com> wrote:

> From: Yigal Korman <yigal@plexistor.com>
>
> This will allow FS that uses VM_PFNMAP | VM_MIXEDMAP (no page structs)
> to get notified when access is a write to a read-only PFN.
>
> This can happen if we mmap() a file then first mmap-read from it
> to page-in a read-only PFN, than we mmap-write to the same page.
>
> We need this functionality to fix a DAX bug, where in the scenario
> above we fail to set ctime/mtime though we modified the file.
> An xfstest is attached to this patchset that shows the failure
> and the fix. (A DAX patch will follow)
>
> This functionality is extra important for us, because upon
> dirtying of a pmem page we also want to RDMA the page to a
> remote cluster node.
>
> We define a new pfn_mkwrite and do not reuse page_mkwrite because
> 1 - The name ;-)
> 2 - But mainly because it would take a very long and tedious
>     audit of all page_mkwrite functions of VM_MIXEDMAP/VM_PFNMAP
>     users. To make sure they do not now CRASH. For example current
>     DAX code (which this is for) would crash.
>     If we would want to reuse page_mkwrite, We will need to first
>     patch all users, so to not-crash-on-no-page. Then enable this
>     patch. But even if I did that I would not sleep so well at night.
>     Adding a new vector is the safest thing to do, and is not that
>     expensive. an extra pointer at a static function vector per driver.
>     Also the new vector is better for performance, because else we
>     Will call all current Kernel vectors, so to:
>       check-ha-no-page-do-nothing and return.
>
> No need to call it from do_shared_fault because do_wp_page is called to
> change pte permissions anyway.

Looks OK to me.

> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1982,6 +1982,22 @@ static int do_page_mkwrite(struct vm_area_struct *vma, struct page *page,
>  	return ret;
>  }
>
> +static int do_pfn_mkwrite(struct vm_area_struct *vma, unsigned long address)
> +{
> +	struct vm_fault vmf;
> +
> +	if (!vma->vm_ops || !vma->vm_ops->pfn_mkwrite)
> +		return 0;
> +
> +	vmf.page = 0;
> +	vmf.pgoff = (((address & PAGE_MASK) - vma->vm_start) >> PAGE_SHIFT) +
> +			vma->vm_pgoff;
> +	vmf.virtual_address = (void __user *)(address & PAGE_MASK);
> +	vmf.flags = FAULT_FLAG_WRITE|FAULT_FLAG_MKWRITE;
> +
> +	return vma->vm_ops->pfn_mkwrite(vma, &vmf);
> +}

It might be a little neater to use

	if (vma->vm_ops && vma->vm_ops->pfn_mkwrite) {
		struct vm_fault vmf = {
			...
		};

		...
	}

>  /*
>   * This routine handles present pages, when users try to write
>   * to a shared page. It is done by copying the page to a new address
> @@ -2025,8 +2041,17 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  			 * accounting on raw pfn maps.
>  			 */
>  			if ((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
> -				     (VM_WRITE|VM_SHARED))
> +				     (VM_WRITE|VM_SHARED)) {
> +				pte_unmap_unlock(page_table, ptl);
> +				ret = do_pfn_mkwrite(vma, address);
> +				if (ret & VM_FAULT_ERROR)
> +					return ret;
> +				page_table = pte_offset_map_lock(mm, pmd, address,
> +								 &ptl);
> +				if (!pte_same(*page_table, orig_pte))
> +					goto unlock;
>  				goto reuse;
> +			}
>  			goto gotten;

There are significant pending changes in this area.  See linux-next, or
http://ozlabs.org/~akpm/mmots/broken-out/mm-refactor-*
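
For readers who want to see the designated-initializer form suggested in the review spelled out, here is a minimal userspace sketch. It is not the code that was posted or merged: the struct definitions, the PAGE_* and FAULT_FLAG_* values, and the record_fault() handler are simplified stand-ins invented for illustration so the block compiles outside the kernel. Only the pgoff, virtual_address and flags expressions are taken from the quoted patch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for kernel definitions; values are illustrative only. */
#define PAGE_SHIFT		12
#define PAGE_SIZE		(1UL << PAGE_SHIFT)
#define PAGE_MASK		(~(PAGE_SIZE - 1))
#define FAULT_FLAG_WRITE	0x01
#define FAULT_FLAG_MKWRITE	0x04

struct vm_area_struct;

/* Reduced to the fields the quoted patch touches. */
struct vm_fault {
	unsigned int flags;
	unsigned long pgoff;
	void *virtual_address;
	void *page;
};

struct vm_operations_struct {
	int (*pfn_mkwrite)(struct vm_area_struct *vma, struct vm_fault *vmf);
};

struct vm_area_struct {
	unsigned long vm_start;
	unsigned long vm_pgoff;
	const struct vm_operations_struct *vm_ops;
};

/* Same logic as the patch, restructured around a designated initializer. */
static int do_pfn_mkwrite(struct vm_area_struct *vma, unsigned long address)
{
	assert(vma != NULL);

	if (vma->vm_ops && vma->vm_ops->pfn_mkwrite) {
		struct vm_fault vmf = {
			.page = NULL,
			.pgoff = (((address & PAGE_MASK) - vma->vm_start)
					>> PAGE_SHIFT) + vma->vm_pgoff,
			.virtual_address = (void *)(address & PAGE_MASK),
			.flags = FAULT_FLAG_WRITE | FAULT_FLAG_MKWRITE,
		};

		return vma->vm_ops->pfn_mkwrite(vma, &vmf);
	}

	return 0;
}

/* Hypothetical handler: records the fault it was handed. */
static struct vm_fault last_fault;

static int record_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
	(void)vma;
	last_fault = *vmf;
	return 0;
}
```

One side effect of the initializer form is that any vm_fault field not named is zero-initialized, whereas the field-by-field assignments in the patch leave unlisted fields of the on-stack struct indeterminate.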