* [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches
@ 2016-09-27 16:08 Jan Kara
From: Jan Kara @ 2016-09-27 16:08 UTC
  To: linux-mm
  Cc: linux-fsdevel, linux-nvdimm, Dan Williams, Ross Zwisler,
	Kirill A. Shutemov, Jan Kara

Hello,

this is the third revision of my patches to clear dirty bits from the radix
tree of DAX inodes when the caches for the corresponding pfns have been
flushed. This patch set is significantly larger than the previous version
because I'm changing how the ->fault, ->page_mkwrite, and ->pfn_mkwrite
handlers may choose to handle the fault, so that we don't have to leak details
about DAX locking into the generic code. In principle, these patches enable
handlers to easily update PTEs and do other work necessary to finish the fault
without duplicating the functionality present in the generic code. I'd really
like feedback from mm folks on whether such changes to the fault handling code
are fine, or what they'd do differently.

The patches pass testing with xfstests on ext4 and xfs on my end
- just be aware they are the basis for further DAX fixes without which some
stress tests can still trigger failures. I'll be sending those fixes separately
in order to keep this series at a reasonable size. For full testing, you
can pull all the patches from

git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git dax

but be aware that I will likely rebase that branch and do other nasty stuff
with it, so don't use it as a basis for your git trees.

Changes since v2:
* rebased on top of 4.8-rc8 - this involved dealing with new fault_env
  structure
* changed calling convention for fault helpers

Changes since v1:
* make sure all PTE updates happen under radix tree entry lock to protect
  against races between faults & write-protecting code
* remove information about DAX locking from mm/memory.c
* smaller updates based on Ross' feedback

----
Background information regarding the motivation:

Currently we never clear dirty bits in the radix tree of a DAX inode. Thus
fsync(2) flushes all the dirty pfns again and again. This patch set implements
clearing of the dirty tag in the radix tree so that we issue a flush only when
needed.

The difficulty with clearing the dirty tag is that we have to protect against
a concurrent page fault setting the dirty tag and writing new data into the
page. So we need a lock that serializes page faults against the clearing of
the dirty tag and the write-protecting of PTEs (so that we get another page
fault when the pfn is written to again and set the dirty tag again).
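
To make this concrete, the flushing side has to follow roughly this ordering
(a minimal sketch only - the lock/unlock and write-protect helpers below are
hypothetical placeholders, not the functions this series adds):

	/* Hold the radix tree entry lock across the whole sequence so
	 * that a concurrent fault cannot redirty the pfn unnoticed. */
	entry = lock_radix_tree_entry(mapping, index);	/* hypothetical */
	writeprotect_ptes(mapping, index);	/* next write re-faults */
	flush_caches_for_pfn(entry);		/* hypothetical */
	radix_tree_tag_clear(&mapping->page_tree, index,
			     PAGECACHE_TAG_DIRTY);
	unlock_radix_tree_entry(mapping, index);	/* hypothetical */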

The effect of the patch set is easily visible:

Writing 1 GB of data via mmap, then fsync twice.

Before this patch set, both fsyncs take ~205 ms on my test machine. After the
patch set, the first fsync takes ~283 ms (the additional cost of walking PTEs,
clearing dirty bits, etc. is very noticeable) while the second fsync takes
below 1 us.
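
For reference, the test corresponds to roughly the following userspace
sequence (a sketch with error handling omitted; the file path and the DAX
mount point are assumptions):

	#include <fcntl.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		size_t len = 1UL << 30;		/* 1 GB */
		int fd = open("/mnt/dax/file", O_RDWR);
		char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_SHARED, fd, 0);

		memset(p, 0xab, len);	/* dirty every page via mmap */
		fsync(fd);	/* flush caches, clear dirty tags */
		fsync(fd);	/* nothing dirty anymore -> near instant */
		munmap(p, len);
		close(fd);
		return 0;
	}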

As a bonus, these patches make filesystem freezing for DAX filesystems
reliable, because mappings are now properly write-protected while freezing the
fs.

Patches have passed xfstests for both xfs and ext4.

								Honza



* [PATCH 01/20] mm: Change type of vmf->virtual_address
From: Jan Kara @ 2016-09-27 16:08 UTC
  To: linux-mm
  Cc: linux-fsdevel, linux-nvdimm, Dan Williams, Ross Zwisler,
	Kirill A. Shutemov, Jan Kara

Every single user of vmf->virtual_address casts that field to unsigned
long before doing anything with it. So just change the type of the field
to unsigned long directly.

Signed-off-by: Jan Kara <jack@suse.cz>
---
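
The change collapses the same cast at every call site. Schematically, with
vmf being the struct vm_fault passed to a handler:

	/* Before: the field was void __user *, so every user cast it. */
	unsigned long addr = (unsigned long)vmf->virtual_address;

	/* After: the field is already an unsigned long. */
	unsigned long addr = vmf->virtual_address;
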
 arch/powerpc/platforms/cell/spufs/file.c     |  4 ++--
 arch/x86/entry/vdso/vma.c                    |  4 ++--
 drivers/char/agp/alpha-agp.c                 |  2 +-
 drivers/char/mspec.c                         |  2 +-
 drivers/dax/dax.c                            |  2 +-
 drivers/gpu/drm/armada/armada_gem.c          |  2 +-
 drivers/gpu/drm/drm_vm.c                     |  9 ++++-----
 drivers/gpu/drm/etnaviv/etnaviv_gem.c        |  7 +++----
 drivers/gpu/drm/exynos/exynos_drm_gem.c      |  5 ++---
 drivers/gpu/drm/gma500/framebuffer.c         |  2 +-
 drivers/gpu/drm/gma500/gem.c                 |  5 ++---
 drivers/gpu/drm/i915/i915_gem.c              |  5 ++---
 drivers/gpu/drm/msm/msm_gem.c                |  7 +++----
 drivers/gpu/drm/omapdrm/omap_gem.c           | 17 +++++++----------
 drivers/gpu/drm/tegra/gem.c                  |  4 ++--
 drivers/gpu/drm/ttm/ttm_bo_vm.c              |  2 +-
 drivers/gpu/drm/udl/udl_gem.c                |  5 ++---
 drivers/gpu/drm/vgem/vgem_drv.c              |  2 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c    |  5 ++---
 drivers/misc/cxl/context.c                   |  2 +-
 drivers/misc/sgi-gru/grumain.c               |  2 +-
 drivers/staging/android/ion/ion.c            |  2 +-
 drivers/staging/lustre/lustre/llite/vvp_io.c |  8 +++++---
 drivers/xen/privcmd.c                        |  2 +-
 fs/dax.c                                     |  4 ++--
 include/linux/mm.h                           |  2 +-
 mm/memory.c                                  |  7 +++----
 27 files changed, 55 insertions(+), 65 deletions(-)

diff --git a/arch/powerpc/platforms/cell/spufs/file.c b/arch/powerpc/platforms/cell/spufs/file.c
index 06254467e4dd..f7b33a477b95 100644
--- a/arch/powerpc/platforms/cell/spufs/file.c
+++ b/arch/powerpc/platforms/cell/spufs/file.c
@@ -236,7 +236,7 @@ static int
 spufs_mem_mmap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
 	struct spu_context *ctx	= vma->vm_file->private_data;
-	unsigned long address = (unsigned long)vmf->virtual_address;
+	unsigned long address = vmf->virtual_address;
 	unsigned long pfn, offset;
 
 	offset = vmf->pgoff << PAGE_SHIFT;
@@ -355,7 +355,7 @@ static int spufs_ps_fault(struct vm_area_struct *vma,
 		down_read(&current->mm->mmap_sem);
 	} else {
 		area = ctx->spu->problem_phys + ps_offs;
-		vm_insert_pfn(vma, (unsigned long)vmf->virtual_address,
+		vm_insert_pfn(vma, vmf->virtual_address,
 					(area + offset) >> PAGE_SHIFT);
 		spu_context_trace(spufs_ps_fault__insert, ctx, ctx->spu);
 	}
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index f840766659a8..113e0155c6b5 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -157,7 +157,7 @@ static int vvar_fault(const struct vm_special_mapping *sm,
 		return VM_FAULT_SIGBUS;
 
 	if (sym_offset == image->sym_vvar_page) {
-		ret = vm_insert_pfn(vma, (unsigned long)vmf->virtual_address,
+		ret = vm_insert_pfn(vma, vmf->virtual_address,
 				    __pa_symbol(&__vvar_page) >> PAGE_SHIFT);
 	} else if (sym_offset == image->sym_pvclock_page) {
 		struct pvclock_vsyscall_time_info *pvti =
@@ -165,7 +165,7 @@ static int vvar_fault(const struct vm_special_mapping *sm,
 		if (pvti && vclock_was_used(VCLOCK_PVCLOCK)) {
 			ret = vm_insert_pfn(
 				vma,
-				(unsigned long)vmf->virtual_address,
+				vmf->virtual_address,
 				__pa(pvti) >> PAGE_SHIFT);
 		}
 	}
diff --git a/drivers/char/agp/alpha-agp.c b/drivers/char/agp/alpha-agp.c
index 199b8e99f7d7..537b1dc14c9f 100644
--- a/drivers/char/agp/alpha-agp.c
+++ b/drivers/char/agp/alpha-agp.c
@@ -19,7 +19,7 @@ static int alpha_core_agp_vm_fault(struct vm_area_struct *vma,
 	unsigned long pa;
 	struct page *page;
 
-	dma_addr = (unsigned long)vmf->virtual_address - vma->vm_start
+	dma_addr = vmf->virtual_address - vma->vm_start
 						+ agp->aperture.bus_base;
 	pa = agp->ops->translate(agp, dma_addr);
 
diff --git a/drivers/char/mspec.c b/drivers/char/mspec.c
index f3f92d5fcda0..36eb17c16951 100644
--- a/drivers/char/mspec.c
+++ b/drivers/char/mspec.c
@@ -227,7 +227,7 @@ mspec_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	 * be because another thread has installed the pte first, so it
 	 * is no problem.
 	 */
-	vm_insert_pfn(vma, (unsigned long)vmf->virtual_address, pfn);
+	vm_insert_pfn(vma, vmf->virtual_address, pfn);
 
 	return VM_FAULT_NOPAGE;
 }
diff --git a/drivers/dax/dax.c b/drivers/dax/dax.c
index 29f600f2c447..c4e9e5cdf9dd 100644
--- a/drivers/dax/dax.c
+++ b/drivers/dax/dax.c
@@ -381,7 +381,7 @@ static phys_addr_t pgoff_to_phys(struct dax_dev *dax_dev, pgoff_t pgoff,
 static int __dax_dev_fault(struct dax_dev *dax_dev, struct vm_area_struct *vma,
 		struct vm_fault *vmf)
 {
-	unsigned long vaddr = (unsigned long) vmf->virtual_address;
+	unsigned long vaddr = vmf->virtual_address;
 	struct device *dev = dax_dev->dev;
 	struct dax_region *dax_region;
 	int rc = VM_FAULT_SIGBUS;
diff --git a/drivers/gpu/drm/armada/armada_gem.c b/drivers/gpu/drm/armada/armada_gem.c
index cb8f0347b934..11cdd8f0273a 100644
--- a/drivers/gpu/drm/armada/armada_gem.c
+++ b/drivers/gpu/drm/armada/armada_gem.c
@@ -17,7 +17,7 @@
 static int armada_gem_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
 	struct armada_gem_object *obj = drm_to_armada_gem(vma->vm_private_data);
-	unsigned long addr = (unsigned long)vmf->virtual_address;
+	unsigned long addr = vmf->virtual_address;
 	unsigned long pfn = obj->phys_addr >> PAGE_SHIFT;
 	int ret;
 
diff --git a/drivers/gpu/drm/drm_vm.c b/drivers/gpu/drm/drm_vm.c
index caa4e4ca616d..47b1aed4a142 100644
--- a/drivers/gpu/drm/drm_vm.c
+++ b/drivers/gpu/drm/drm_vm.c
@@ -124,8 +124,7 @@ static int drm_do_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 		 * Using vm_pgoff as a selector forces us to use this unusual
 		 * addressing scheme.
 		 */
-		resource_size_t offset = (unsigned long)vmf->virtual_address -
-			vma->vm_start;
+		resource_size_t offset = vmf->virtual_address - vma->vm_start;
 		resource_size_t baddr = map->offset + offset;
 		struct drm_agp_mem *agpmem;
 		struct page *page;
@@ -195,7 +194,7 @@ static int drm_do_vm_shm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	if (!map)
 		return VM_FAULT_SIGBUS;	/* Nothing allocated */
 
-	offset = (unsigned long)vmf->virtual_address - vma->vm_start;
+	offset = vmf->virtual_address - vma->vm_start;
 	i = (unsigned long)map->handle + offset;
 	page = vmalloc_to_page((void *)i);
 	if (!page)
@@ -301,7 +300,7 @@ static int drm_do_vm_dma_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	if (!dma->pagelist)
 		return VM_FAULT_SIGBUS;	/* Nothing allocated */
 
-	offset = (unsigned long)vmf->virtual_address - vma->vm_start;	/* vm_[pg]off[set] should be 0 */
+	offset = vmf->virtual_address - vma->vm_start;	/* vm_[pg]off[set] should be 0 */
 	page_nr = offset >> PAGE_SHIFT; /* page_nr could just be vmf->pgoff */
 	page = virt_to_page((void *)dma->pagelist[page_nr]);
 
@@ -337,7 +336,7 @@ static int drm_do_vm_sg_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	if (!entry->pagelist)
 		return VM_FAULT_SIGBUS;	/* Nothing allocated */
 
-	offset = (unsigned long)vmf->virtual_address - vma->vm_start;
+	offset = vmf->virtual_address - vma->vm_start;
 	map_offset = map->offset - (unsigned long)dev->sg->virtual;
 	page_offset = (offset >> PAGE_SHIFT) + (map_offset >> PAGE_SHIFT);
 	page = entry->pagelist[page_offset];
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
index 5ce3603e6eac..4bfc8e67dbb0 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
@@ -202,15 +202,14 @@ int etnaviv_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	}
 
 	/* We don't use vmf->pgoff since that has the fake offset: */
-	pgoff = ((unsigned long)vmf->virtual_address -
-			vma->vm_start) >> PAGE_SHIFT;
+	pgoff = (vmf->virtual_address - vma->vm_start) >> PAGE_SHIFT;
 
 	page = pages[pgoff];
 
-	VERB("Inserting %p pfn %lx, pa %lx", vmf->virtual_address,
+	VERB("Inserting %p pfn %lx, pa %lx", (void *)vmf->virtual_address,
 	     page_to_pfn(page), page_to_pfn(page) << PAGE_SHIFT);
 
-	ret = vm_insert_page(vma, (unsigned long)vmf->virtual_address, page);
+	ret = vm_insert_page(vma, vmf->virtual_address, page);
 
 out:
 	switch (ret) {
diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c b/drivers/gpu/drm/exynos/exynos_drm_gem.c
index f2ae72ba7d5a..283305afa06a 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_gem.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c
@@ -455,8 +455,7 @@ int exynos_drm_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	pgoff_t page_offset;
 	int ret;
 
-	page_offset = ((unsigned long)vmf->virtual_address -
-			vma->vm_start) >> PAGE_SHIFT;
+	page_offset = (vmf->virtual_address - vma->vm_start) >> PAGE_SHIFT;
 
 	if (page_offset >= (exynos_gem->size >> PAGE_SHIFT)) {
 		DRM_ERROR("invalid page offset\n");
@@ -465,7 +464,7 @@ int exynos_drm_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	}
 
 	pfn = page_to_pfn(exynos_gem->pages[page_offset]);
-	ret = vm_insert_mixed(vma, (unsigned long)vmf->virtual_address,
+	ret = vm_insert_mixed(vma, vmf->virtual_address,
 			__pfn_to_pfn_t(pfn, PFN_DEV));
 
 out:
diff --git a/drivers/gpu/drm/gma500/framebuffer.c b/drivers/gpu/drm/gma500/framebuffer.c
index 0fcdce0817de..a6093bfa57bf 100644
--- a/drivers/gpu/drm/gma500/framebuffer.c
+++ b/drivers/gpu/drm/gma500/framebuffer.c
@@ -126,7 +126,7 @@ static int psbfb_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 				  psbfb->gtt->offset;
 
 	page_num = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
-	address = (unsigned long)vmf->virtual_address - (vmf->pgoff << PAGE_SHIFT);
+	address = vmf->virtual_address - (vmf->pgoff << PAGE_SHIFT);
 
 	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
 
diff --git a/drivers/gpu/drm/gma500/gem.c b/drivers/gpu/drm/gma500/gem.c
index 6d1cb6b370b1..a720c46f8ebb 100644
--- a/drivers/gpu/drm/gma500/gem.c
+++ b/drivers/gpu/drm/gma500/gem.c
@@ -197,15 +197,14 @@ int psb_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 
 	/* Page relative to the VMA start - we must calculate this ourselves
 	   because vmf->pgoff is the fake GEM offset */
-	page_offset = ((unsigned long) vmf->virtual_address - vma->vm_start)
-				>> PAGE_SHIFT;
+	page_offset = (vmf->virtual_address - vma->vm_start) >> PAGE_SHIFT;
 
 	/* CPU view of the page, don't go via the GART for CPU writes */
 	if (r->stolen)
 		pfn = (dev_priv->stolen_base + r->offset) >> PAGE_SHIFT;
 	else
 		pfn = page_to_pfn(r->pages[page_offset]);
-	ret = vm_insert_pfn(vma, (unsigned long)vmf->virtual_address, pfn);
+	ret = vm_insert_pfn(vma, vmf->virtual_address, pfn);
 
 fail:
 	mutex_unlock(&dev_priv->mmap_mutex);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a77ce9983f69..b13d929b8cab 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2020,8 +2020,7 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	intel_runtime_pm_get(dev_priv);
 
 	/* We don't use vmf->pgoff since that has the fake offset */
-	page_offset = ((unsigned long)vmf->virtual_address - vma->vm_start) >>
-		PAGE_SHIFT;
+	page_offset = (vmf->virtual_address - vma->vm_start) >> PAGE_SHIFT;
 
 	ret = i915_mutex_lock_interruptible(dev);
 	if (ret)
@@ -2112,7 +2111,7 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 			obj->fault_mappable = true;
 		} else
 			ret = vm_insert_pfn(vma,
-					    (unsigned long)vmf->virtual_address,
+					    vmf->virtual_address,
 					    pfn + page_offset);
 	}
 unpin:
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 85f3047e05ae..e099c43b9875 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -225,15 +225,14 @@ int msm_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	}
 
 	/* We don't use vmf->pgoff since that has the fake offset: */
-	pgoff = ((unsigned long)vmf->virtual_address -
-			vma->vm_start) >> PAGE_SHIFT;
+	pgoff = (vmf->virtual_address - vma->vm_start) >> PAGE_SHIFT;
 
 	pfn = page_to_pfn(pages[pgoff]);
 
-	VERB("Inserting %p pfn %lx, pa %lx", vmf->virtual_address,
+	VERB("Inserting %p pfn %lx, pa %lx", (void *)vmf->virtual_address,
 			pfn, pfn << PAGE_SHIFT);
 
-	ret = vm_insert_mixed(vma, (unsigned long)vmf->virtual_address,
+	ret = vm_insert_mixed(vma, vmf->virtual_address,
 			__pfn_to_pfn_t(pfn, PFN_DEV));
 
 out_unlock:
diff --git a/drivers/gpu/drm/omapdrm/omap_gem.c b/drivers/gpu/drm/omapdrm/omap_gem.c
index 505dee0db973..2da0c8f06763 100644
--- a/drivers/gpu/drm/omapdrm/omap_gem.c
+++ b/drivers/gpu/drm/omapdrm/omap_gem.c
@@ -396,8 +396,7 @@ static int fault_1d(struct drm_gem_object *obj,
 	pgoff_t pgoff;
 
 	/* We don't use vmf->pgoff since that has the fake offset: */
-	pgoff = ((unsigned long)vmf->virtual_address -
-			vma->vm_start) >> PAGE_SHIFT;
+	pgoff = (vmf->virtual_address - vma->vm_start) >> PAGE_SHIFT;
 
 	if (omap_obj->pages) {
 		omap_gem_cpu_sync(obj, pgoff);
@@ -407,10 +406,10 @@ static int fault_1d(struct drm_gem_object *obj,
 		pfn = (omap_obj->paddr >> PAGE_SHIFT) + pgoff;
 	}
 
-	VERB("Inserting %p pfn %lx, pa %lx", vmf->virtual_address,
+	VERB("Inserting %p pfn %lx, pa %lx", (void *)vmf->virtual_address,
 			pfn, pfn << PAGE_SHIFT);
 
-	return vm_insert_mixed(vma, (unsigned long)vmf->virtual_address,
+	return vm_insert_mixed(vma, vmf->virtual_address,
 			__pfn_to_pfn_t(pfn, PFN_DEV));
 }
 
@@ -425,7 +424,7 @@ static int fault_2d(struct drm_gem_object *obj,
 	struct page *pages[64];  /* XXX is this too much to have on stack? */
 	unsigned long pfn;
 	pgoff_t pgoff, base_pgoff;
-	void __user *vaddr;
+	unsigned long vaddr;
 	int i, ret, slots;
 
 	/*
@@ -445,8 +444,7 @@ static int fault_2d(struct drm_gem_object *obj,
 	const int m = 1 + ((omap_obj->width << fmt) / PAGE_SIZE);
 
 	/* We don't use vmf->pgoff since that has the fake offset: */
-	pgoff = ((unsigned long)vmf->virtual_address -
-			vma->vm_start) >> PAGE_SHIFT;
+	pgoff = (vmf->virtual_address - vma->vm_start) >> PAGE_SHIFT;
 
 	/*
 	 * Actual address we start mapping at is rounded down to previous slot
@@ -501,12 +499,11 @@ static int fault_2d(struct drm_gem_object *obj,
 
 	pfn = entry->paddr >> PAGE_SHIFT;
 
-	VERB("Inserting %p pfn %lx, pa %lx", vmf->virtual_address,
+	VERB("Inserting %p pfn %lx, pa %lx", (void *)vmf->virtual_address,
 			pfn, pfn << PAGE_SHIFT);
 
 	for (i = n; i > 0; i--) {
-		vm_insert_mixed(vma, (unsigned long)vaddr,
-				__pfn_to_pfn_t(pfn, PFN_DEV));
+		vm_insert_mixed(vma, vaddr, __pfn_to_pfn_t(pfn, PFN_DEV));
 		pfn += priv->usergart[fmt].stride_pfn;
 		vaddr += PAGE_SIZE * m;
 	}
diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c
index aa60d9909ea2..55c2d846fd85 100644
--- a/drivers/gpu/drm/tegra/gem.c
+++ b/drivers/gpu/drm/tegra/gem.c
@@ -427,10 +427,10 @@ static int tegra_bo_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	if (!bo->pages)
 		return VM_FAULT_SIGBUS;
 
-	offset = ((unsigned long)vmf->virtual_address - vma->vm_start) >> PAGE_SHIFT;
+	offset = (vmf->virtual_address - vma->vm_start) >> PAGE_SHIFT;
 	page = bo->pages[offset];
 
-	err = vm_insert_page(vma, (unsigned long)vmf->virtual_address, page);
+	err = vm_insert_page(vma, vmf->virtual_address, page);
 	switch (err) {
 	case -EAGAIN:
 	case 0:
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index a6ed9d5e5167..9f703d7ea1a4 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -101,7 +101,7 @@ static int ttm_bo_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	struct page *page;
 	int ret;
 	int i;
-	unsigned long address = (unsigned long)vmf->virtual_address;
+	unsigned long address = vmf->virtual_address;
 	int retval = VM_FAULT_NOPAGE;
 	struct ttm_mem_type_manager *man =
 		&bdev->man[bo->mem.mem_type];
diff --git a/drivers/gpu/drm/udl/udl_gem.c b/drivers/gpu/drm/udl/udl_gem.c
index 818e70712b18..db3f5b912602 100644
--- a/drivers/gpu/drm/udl/udl_gem.c
+++ b/drivers/gpu/drm/udl/udl_gem.c
@@ -107,14 +107,13 @@ int udl_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	unsigned int page_offset;
 	int ret = 0;
 
-	page_offset = ((unsigned long)vmf->virtual_address - vma->vm_start) >>
-		PAGE_SHIFT;
+	page_offset = (vmf->virtual_address - vma->vm_start) >> PAGE_SHIFT;
 
 	if (!obj->pages)
 		return VM_FAULT_SIGBUS;
 
 	page = obj->pages[page_offset];
-	ret = vm_insert_page(vma, (unsigned long)vmf->virtual_address, page);
+	ret = vm_insert_page(vma, vmf->virtual_address, page);
 	switch (ret) {
 	case -EAGAIN:
 	case 0:
diff --git a/drivers/gpu/drm/vgem/vgem_drv.c b/drivers/gpu/drm/vgem/vgem_drv.c
index c15bafb06665..914c59960d76 100644
--- a/drivers/gpu/drm/vgem/vgem_drv.c
+++ b/drivers/gpu/drm/vgem/vgem_drv.c
@@ -54,7 +54,7 @@ static int vgem_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
 	struct drm_vgem_gem_object *obj = vma->vm_private_data;
 	/* We don't use vmf->pgoff since that has the fake offset */
-	unsigned long vaddr = (unsigned long)vmf->virtual_address;
+	unsigned long vaddr = vmf->virtual_address;
 	struct page *page;
 
 	page = shmem_read_mapping_page(file_inode(obj->base.filp)->i_mapping,
diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c
index f300f060b3f3..eaa30933f51b 100644
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -436,13 +436,12 @@ static int videobuf_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	struct page *page;
 
 	dprintk(3, "fault: fault @ %08lx [vma %08lx-%08lx]\n",
-		(unsigned long)vmf->virtual_address,
-		vma->vm_start, vma->vm_end);
+		vmf->virtual_address, vma->vm_start, vma->vm_end);
 
 	page = alloc_page(GFP_USER | __GFP_DMA32);
 	if (!page)
 		return VM_FAULT_OOM;
-	clear_user_highpage(page, (unsigned long)vmf->virtual_address);
+	clear_user_highpage(page, vmf->virtual_address);
 	vmf->page = page;
 
 	return 0;
diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
index c466ee2b0c97..76031a56cfb7 100644
--- a/drivers/misc/cxl/context.c
+++ b/drivers/misc/cxl/context.c
@@ -117,7 +117,7 @@ int cxl_context_init(struct cxl_context *ctx, struct cxl_afu *afu, bool master,
 static int cxl_mmap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
 	struct cxl_context *ctx = vma->vm_file->private_data;
-	unsigned long address = (unsigned long)vmf->virtual_address;
+	unsigned long address = vmf->virtual_address;
 	u64 area, offset;
 
 	offset = vmf->pgoff << PAGE_SHIFT;
diff --git a/drivers/misc/sgi-gru/grumain.c b/drivers/misc/sgi-gru/grumain.c
index 1525870f460a..e06daa6c2a04 100644
--- a/drivers/misc/sgi-gru/grumain.c
+++ b/drivers/misc/sgi-gru/grumain.c
@@ -932,7 +932,7 @@ int gru_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	unsigned long paddr, vaddr;
 	unsigned long expires;
 
-	vaddr = (unsigned long)vmf->virtual_address;
+	vaddr = vmf->virtual_address;
 	gru_dbg(grudev, "vma %p, vaddr 0x%lx (0x%lx)\n",
 		vma, vaddr, GSEG_BASE(vaddr));
 	STAT(nopfn);
diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
index a2cf93b59016..da869117fb2f 100644
--- a/drivers/staging/android/ion/ion.c
+++ b/drivers/staging/android/ion/ion.c
@@ -1022,7 +1022,7 @@ static int ion_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	BUG_ON(!buffer->pages || !buffer->pages[vmf->pgoff]);
 
 	pfn = page_to_pfn(ion_buffer_page(buffer->pages[vmf->pgoff]));
-	ret = vm_insert_pfn(vma, (unsigned long)vmf->virtual_address, pfn);
+	ret = vm_insert_pfn(vma, vmf->virtual_address, pfn);
 	mutex_unlock(&buffer->lock);
 	if (ret)
 		return VM_FAULT_ERROR;
diff --git a/drivers/staging/lustre/lustre/llite/vvp_io.c b/drivers/staging/lustre/lustre/llite/vvp_io.c
index 94916dcc6caa..feaf77895727 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_io.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_io.c
@@ -1002,7 +1002,7 @@ static int vvp_io_kernel_fault(struct vvp_fault_io *cfio)
 		       "page %p map %p index %lu flags %lx count %u priv %0lx: got addr %p type NOPAGE\n",
 		       vmf->page, vmf->page->mapping, vmf->page->index,
 		       (long)vmf->page->flags, page_count(vmf->page),
-		       page_private(vmf->page), vmf->virtual_address);
+		       page_private(vmf->page), (void *)vmf->virtual_address);
 		if (unlikely(!(cfio->ft_flags & VM_FAULT_LOCKED))) {
 			lock_page(vmf->page);
 			cfio->ft_flags |= VM_FAULT_LOCKED;
@@ -1013,12 +1013,14 @@ static int vvp_io_kernel_fault(struct vvp_fault_io *cfio)
 	}
 
 	if (cfio->ft_flags & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV)) {
-		CDEBUG(D_PAGE, "got addr %p - SIGBUS\n", vmf->virtual_address);
+		CDEBUG(D_PAGE, "got addr %p - SIGBUS\n",
+		       (void *)vmf->virtual_address);
 		return -EFAULT;
 	}
 
 	if (cfio->ft_flags & VM_FAULT_OOM) {
-		CDEBUG(D_PAGE, "got addr %p - OOM\n", vmf->virtual_address);
+		CDEBUG(D_PAGE, "got addr %p - OOM\n",
+		       (void *)vmf->virtual_address);
 		return -ENOMEM;
 	}
 
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 702040fe2001..731eb53aead3 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -602,7 +602,7 @@ static int privcmd_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
 	printk(KERN_DEBUG "privcmd_fault: vma=%p %lx-%lx, pgoff=%lx, uv=%p\n",
 	       vma, vma->vm_start, vma->vm_end,
-	       vmf->pgoff, vmf->virtual_address);
+	       vmf->pgoff, (void *)vmf->virtual_address);
 
 	return VM_FAULT_SIGBUS;
 }
diff --git a/fs/dax.c b/fs/dax.c
index cc025f82ef07..0dc251ca77b8 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -794,7 +794,7 @@ static int dax_insert_mapping(struct address_space *mapping,
 		struct block_device *bdev, sector_t sector, size_t size,
 		void **entryp, struct vm_area_struct *vma, struct vm_fault *vmf)
 {
-	unsigned long vaddr = (unsigned long)vmf->virtual_address;
+	unsigned long vaddr = vmf->virtual_address;
 	struct blk_dax_ctl dax = {
 		.sector = sector,
 		.size = size,
@@ -832,7 +832,7 @@ int dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 	struct inode *inode = mapping->host;
 	void *entry;
 	struct buffer_head bh;
-	unsigned long vaddr = (unsigned long)vmf->virtual_address;
+	unsigned long vaddr = vmf->virtual_address;
 	unsigned blkbits = inode->i_blkbits;
 	sector_t block;
 	pgoff_t size;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ef815b9cd426..a5636d646022 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -295,7 +295,7 @@ struct vm_fault {
 	unsigned int flags;		/* FAULT_FLAG_xxx flags */
 	gfp_t gfp_mask;			/* gfp mask to be used for allocations */
 	pgoff_t pgoff;			/* Logical page offset based on vma */
-	void __user *virtual_address;	/* Faulting virtual address */
+	unsigned long virtual_address;	/* Faulting virtual address */
 
 	struct page *cow_page;		/* Handler may choose to COW */
 	struct page *page;		/* ->fault handlers should return a
diff --git a/mm/memory.c b/mm/memory.c
index 793fe0f9841c..406b8728e141 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2040,7 +2040,7 @@ static int do_page_mkwrite(struct vm_area_struct *vma, struct page *page,
 	struct vm_fault vmf;
 	int ret;
 
-	vmf.virtual_address = (void __user *)(address & PAGE_MASK);
+	vmf.virtual_address = address & PAGE_MASK;
 	vmf.pgoff = page->index;
 	vmf.flags = FAULT_FLAG_WRITE|FAULT_FLAG_MKWRITE;
 	vmf.gfp_mask = __get_fault_gfp_mask(vma);
@@ -2275,8 +2275,7 @@ static int wp_pfn_shared(struct fault_env *fe,  pte_t orig_pte)
 		struct vm_fault vmf = {
 			.page = NULL,
 			.pgoff = linear_page_index(vma, fe->address),
-			.virtual_address =
-				(void __user *)(fe->address & PAGE_MASK),
+			.virtual_address = fe->address & PAGE_MASK,
 			.flags = FAULT_FLAG_WRITE | FAULT_FLAG_MKWRITE,
 		};
 		int ret;
@@ -2850,7 +2849,7 @@ static int __do_fault(struct fault_env *fe, pgoff_t pgoff,
 	struct vm_fault vmf;
 	int ret;
 
-	vmf.virtual_address = (void __user *)(fe->address & PAGE_MASK);
+	vmf.virtual_address = fe->address & PAGE_MASK;
 	vmf.pgoff = pgoff;
 	vmf.flags = fe->flags;
 	vmf.page = NULL;
-- 
2.6.6



* [PATCH 02/20] mm: Join struct fault_env and vm_fault
From: Jan Kara @ 2016-09-27 16:08 UTC
  To: linux-mm
  Cc: linux-fsdevel, linux-nvdimm, Dan Williams, Ross Zwisler,
	Kirill A. Shutemov, Jan Kara

Currently we have two different structures for passing fault information
around - struct vm_fault and struct fault_env. DAX will need more
information in struct vm_fault to handle its faults, so the content of
that structure would become even closer to fault_env. Furthermore, it
would need to generate a struct fault_env to be able to call some of the
generic functions. So at this point I don't think there's much use in
keeping these two structures separate. Just embed into struct vm_fault
all that is needed to use it for both purposes.

Signed-off-by: Jan Kara <jack@suse.cz>
---
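
With the two structures joined, a fault path can carry everything in a single
struct vm_fault. A minimal sketch (the handler and do_write_fault() below are
hypothetical; the field names match the hunks that follow):

	static int example_fault(struct vm_fault *vmf)
	{
		struct vm_area_struct *vma = vmf->vma;	/* was fe->vma */
		unsigned long addr = vmf->address;	/* was fe->address */

		/* flags, pmd, pte, ptl all travel in the same struct now */
		if (vmf->flags & FAULT_FLAG_WRITE)
			return do_write_fault(vma, addr); /* hypothetical */
		return 0;
	}
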
 Documentation/filesystems/Locking |   2 +-
 fs/userfaultfd.c                  |  22 +-
 include/linux/huge_mm.h           |  10 +-
 include/linux/mm.h                |  28 +-
 include/linux/userfaultfd_k.h     |   4 +-
 mm/filemap.c                      |  14 +-
 mm/huge_memory.c                  | 173 ++++++------
 mm/internal.h                     |   2 +-
 mm/khugepaged.c                   |  20 +-
 mm/memory.c                       | 549 +++++++++++++++++++-------------------
 mm/nommu.c                        |   2 +-
 11 files changed, 414 insertions(+), 412 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index d30fb2cb5066..02961390f4ba 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -549,7 +549,7 @@ till "end_pgoff". ->map_pages() is called with page table locked and must
 not block.  If it's not possible to reach a page without blocking,
 filesystem should skip it. Filesystem should use do_set_pte() to setup
 page table entry. Pointer to entry associated with the page is passed in
-"pte" field in fault_env structure. Pointers to entries for other offsets
+"pte" field in vm_fault structure. Pointers to entries for other offsets
 should be calculated relative to "pte".
 
 	->page_mkwrite() is called when a previously read-only pte is
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 85959d8324df..d96e2f30084b 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -257,9 +257,9 @@ out:
  * fatal_signal_pending()s, and the mmap_sem must be released before
  * returning it.
  */
-int handle_userfault(struct fault_env *fe, unsigned long reason)
+int handle_userfault(struct vm_fault *vmf, unsigned long reason)
 {
-	struct mm_struct *mm = fe->vma->vm_mm;
+	struct mm_struct *mm = vmf->vma->vm_mm;
 	struct userfaultfd_ctx *ctx;
 	struct userfaultfd_wait_queue uwq;
 	int ret;
@@ -268,7 +268,7 @@ int handle_userfault(struct fault_env *fe, unsigned long reason)
 	BUG_ON(!rwsem_is_locked(&mm->mmap_sem));
 
 	ret = VM_FAULT_SIGBUS;
-	ctx = fe->vma->vm_userfaultfd_ctx.ctx;
+	ctx = vmf->vma->vm_userfaultfd_ctx.ctx;
 	if (!ctx)
 		goto out;
 
@@ -301,17 +301,18 @@ int handle_userfault(struct fault_env *fe, unsigned long reason)
 	 * without first stopping userland access to the memory. For
 	 * VM_UFFD_MISSING userfaults this is enough for now.
 	 */
-	if (unlikely(!(fe->flags & FAULT_FLAG_ALLOW_RETRY))) {
+	if (unlikely(!(vmf->flags & FAULT_FLAG_ALLOW_RETRY))) {
 		/*
 		 * Validate the invariant that nowait must allow retry
 		 * to be sure not to return SIGBUS erroneously on
 		 * nowait invocations.
 		 */
-		BUG_ON(fe->flags & FAULT_FLAG_RETRY_NOWAIT);
+		BUG_ON(vmf->flags & FAULT_FLAG_RETRY_NOWAIT);
 #ifdef CONFIG_DEBUG_VM
 		if (printk_ratelimit()) {
 			printk(KERN_WARNING
-			       "FAULT_FLAG_ALLOW_RETRY missing %x\n", fe->flags);
+			       "FAULT_FLAG_ALLOW_RETRY missing %x\n",
+			       vmf->flags);
 			dump_stack();
 		}
 #endif
@@ -323,7 +324,7 @@ int handle_userfault(struct fault_env *fe, unsigned long reason)
 	 * and wait.
 	 */
 	ret = VM_FAULT_RETRY;
-	if (fe->flags & FAULT_FLAG_RETRY_NOWAIT)
+	if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
 		goto out;
 
 	/* take the reference before dropping the mmap_sem */
@@ -331,11 +332,11 @@ int handle_userfault(struct fault_env *fe, unsigned long reason)
 
 	init_waitqueue_func_entry(&uwq.wq, userfaultfd_wake_function);
 	uwq.wq.private = current;
-	uwq.msg = userfault_msg(fe->address, fe->flags, reason);
+	uwq.msg = userfault_msg(vmf->address, vmf->flags, reason);
 	uwq.ctx = ctx;
 
 	return_to_userland =
-		(fe->flags & (FAULT_FLAG_USER|FAULT_FLAG_KILLABLE)) ==
+		(vmf->flags & (FAULT_FLAG_USER|FAULT_FLAG_KILLABLE)) ==
 		(FAULT_FLAG_USER|FAULT_FLAG_KILLABLE);
 
 	spin_lock(&ctx->fault_pending_wqh.lock);
@@ -353,7 +354,8 @@ int handle_userfault(struct fault_env *fe, unsigned long reason)
 			  TASK_KILLABLE);
 	spin_unlock(&ctx->fault_pending_wqh.lock);
 
-	must_wait = userfaultfd_must_wait(ctx, fe->address, fe->flags, reason);
+	must_wait = userfaultfd_must_wait(ctx, vmf->address, vmf->flags,
+					  reason);
 	up_read(&mm->mmap_sem);
 
 	if (likely(must_wait && !ACCESS_ONCE(ctx->released) &&
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 6f14de45b5ce..79c7fc511c4c 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -1,12 +1,12 @@
 #ifndef _LINUX_HUGE_MM_H
 #define _LINUX_HUGE_MM_H
 
-extern int do_huge_pmd_anonymous_page(struct fault_env *fe);
+extern int do_huge_pmd_anonymous_page(struct vm_fault *vmf);
 extern int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 			 pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr,
 			 struct vm_area_struct *vma);
-extern void huge_pmd_set_accessed(struct fault_env *fe, pmd_t orig_pmd);
-extern int do_huge_pmd_wp_page(struct fault_env *fe, pmd_t orig_pmd);
+extern void huge_pmd_set_accessed(struct vm_fault *vmf, pmd_t orig_pmd);
+extern int do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd);
 extern struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 					  unsigned long addr,
 					  pmd_t *pmd,
@@ -138,7 +138,7 @@ static inline int hpage_nr_pages(struct page *page)
 	return 1;
 }
 
-extern int do_huge_pmd_numa_page(struct fault_env *fe, pmd_t orig_pmd);
+extern int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t orig_pmd);
 
 extern struct page *huge_zero_page;
 
@@ -203,7 +203,7 @@ static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd,
 	return NULL;
 }
 
-static inline int do_huge_pmd_numa_page(struct fault_env *fe, pmd_t orig_pmd)
+static inline int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t orig_pmd)
 {
 	return 0;
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a5636d646022..5fc6daf5242c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -292,10 +292,16 @@ extern pgprot_t protection_map[16];
  * pgoff should be used in favour of virtual_address, if possible.
  */
 struct vm_fault {
+	struct vm_area_struct *vma;	/* Target VMA */
 	unsigned int flags;		/* FAULT_FLAG_xxx flags */
 	gfp_t gfp_mask;			/* gfp mask to be used for allocations */
 	pgoff_t pgoff;			/* Logical page offset based on vma */
-	unsigned long virtual_address;	/* Faulting virtual address */
+	unsigned long address;		/* Faulting virtual address */
+	unsigned long virtual_address;	/* Faulting virtual address masked by
+					 * PAGE_MASK */
+	pmd_t *pmd;			/* Pointer to pmd entry matching
+					 * the 'address'
+					 */
 
 	struct page *cow_page;		/* Handler may choose to COW */
 	struct page *page;		/* ->fault handlers should return a
@@ -309,19 +315,7 @@ struct vm_fault {
 					 * VM_FAULT_DAX_LOCKED and fill in
 					 * entry here.
 					 */
-};
-
-/*
- * Page fault context: passes though page fault handler instead of endless list
- * of function arguments.
- */
-struct fault_env {
-	struct vm_area_struct *vma;	/* Target VMA */
-	unsigned long address;		/* Faulting virtual address */
-	unsigned int flags;		/* FAULT_FLAG_xxx flags */
-	pmd_t *pmd;			/* Pointer to pmd entry matching
-					 * the 'address'
-					 */
+	/* These three entries are valid only while holding ptl lock */
 	pte_t *pte;			/* Pointer to pte entry matching
 					 * the 'address'. NULL if the page
 					 * table hasn't been allocated.
@@ -351,7 +345,7 @@ struct vm_operations_struct {
 	int (*fault)(struct vm_area_struct *vma, struct vm_fault *vmf);
 	int (*pmd_fault)(struct vm_area_struct *, unsigned long address,
 						pmd_t *, unsigned int flags);
-	void (*map_pages)(struct fault_env *fe,
+	void (*map_pages)(struct vm_fault *vmf,
 			pgoff_t start_pgoff, pgoff_t end_pgoff);
 
 	/* notification that a previously read-only page is about to become
@@ -625,7 +619,7 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
 	return pte;
 }
 
-int alloc_set_pte(struct fault_env *fe, struct mem_cgroup *memcg,
+int alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
 		struct page *page);
 #endif
 
@@ -2103,7 +2097,7 @@ extern void truncate_inode_pages_final(struct address_space *);
 
 /* generic vm_area_ops exported for stackable file systems */
 extern int filemap_fault(struct vm_area_struct *, struct vm_fault *);
-extern void filemap_map_pages(struct fault_env *fe,
+extern void filemap_map_pages(struct vm_fault *vmf,
 		pgoff_t start_pgoff, pgoff_t end_pgoff);
 extern int filemap_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
 
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index dd66a952e8cd..11b92b047a1e 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -27,7 +27,7 @@
 #define UFFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
 #define UFFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS)
 
-extern int handle_userfault(struct fault_env *fe, unsigned long reason);
+extern int handle_userfault(struct vm_fault *vmf, unsigned long reason);
 
 extern ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start,
 			    unsigned long src_start, unsigned long len);
@@ -55,7 +55,7 @@ static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 #else /* CONFIG_USERFAULTFD */
 
 /* mm helpers */
-static inline int handle_userfault(struct fault_env *fe, unsigned long reason)
+static inline int handle_userfault(struct vm_fault *vmf, unsigned long reason)
 {
 	return VM_FAULT_SIGBUS;
 }
diff --git a/mm/filemap.c b/mm/filemap.c
index 8a287dfc5372..6e0b98e7fe43 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2193,12 +2193,12 @@ page_not_uptodate:
 }
 EXPORT_SYMBOL(filemap_fault);
 
-void filemap_map_pages(struct fault_env *fe,
+void filemap_map_pages(struct vm_fault *vmf,
 		pgoff_t start_pgoff, pgoff_t end_pgoff)
 {
 	struct radix_tree_iter iter;
 	void **slot;
-	struct file *file = fe->vma->vm_file;
+	struct file *file = vmf->vma->vm_file;
 	struct address_space *mapping = file->f_mapping;
 	pgoff_t last_pgoff = start_pgoff;
 	loff_t size;
@@ -2254,11 +2254,11 @@ repeat:
 		if (file->f_ra.mmap_miss > 0)
 			file->f_ra.mmap_miss--;
 
-		fe->address += (iter.index - last_pgoff) << PAGE_SHIFT;
-		if (fe->pte)
-			fe->pte += iter.index - last_pgoff;
+		vmf->address += (iter.index - last_pgoff) << PAGE_SHIFT;
+		if (vmf->pte)
+			vmf->pte += iter.index - last_pgoff;
 		last_pgoff = iter.index;
-		if (alloc_set_pte(fe, NULL, page))
+		if (alloc_set_pte(vmf, NULL, page))
 			goto unlock;
 		unlock_page(page);
 		goto next;
@@ -2268,7 +2268,7 @@ skip:
 		put_page(page);
 next:
 		/* Huge page is mapped? No need to proceed. */
-		if (pmd_trans_huge(*fe->pmd))
+		if (pmd_trans_huge(*vmf->pmd))
 			break;
 		if (iter.index == end_pgoff)
 			break;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 53ae6d00656a..4fed6ac52318 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -469,13 +469,13 @@ void prep_transhuge_page(struct page *page)
 	set_compound_page_dtor(page, TRANSHUGE_PAGE_DTOR);
 }
 
-static int __do_huge_pmd_anonymous_page(struct fault_env *fe, struct page *page,
+static int __do_huge_pmd_anonymous_page(struct vm_fault *vmf, struct page *page,
 		gfp_t gfp)
 {
-	struct vm_area_struct *vma = fe->vma;
+	struct vm_area_struct *vma = vmf->vma;
 	struct mem_cgroup *memcg;
 	pgtable_t pgtable;
-	unsigned long haddr = fe->address & HPAGE_PMD_MASK;
+	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
 
 	VM_BUG_ON_PAGE(!PageCompound(page), page);
 
@@ -500,9 +500,9 @@ static int __do_huge_pmd_anonymous_page(struct fault_env *fe, struct page *page,
 	 */
 	__SetPageUptodate(page);
 
-	fe->ptl = pmd_lock(vma->vm_mm, fe->pmd);
-	if (unlikely(!pmd_none(*fe->pmd))) {
-		spin_unlock(fe->ptl);
+	vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
+	if (unlikely(!pmd_none(*vmf->pmd))) {
+		spin_unlock(vmf->ptl);
 		mem_cgroup_cancel_charge(page, memcg, true);
 		put_page(page);
 		pte_free(vma->vm_mm, pgtable);
@@ -513,11 +513,11 @@ static int __do_huge_pmd_anonymous_page(struct fault_env *fe, struct page *page,
 		if (userfaultfd_missing(vma)) {
 			int ret;
 
-			spin_unlock(fe->ptl);
+			spin_unlock(vmf->ptl);
 			mem_cgroup_cancel_charge(page, memcg, true);
 			put_page(page);
 			pte_free(vma->vm_mm, pgtable);
-			ret = handle_userfault(fe, VM_UFFD_MISSING);
+			ret = handle_userfault(vmf, VM_UFFD_MISSING);
 			VM_BUG_ON(ret & VM_FAULT_FALLBACK);
 			return ret;
 		}
@@ -527,11 +527,11 @@ static int __do_huge_pmd_anonymous_page(struct fault_env *fe, struct page *page,
 		page_add_new_anon_rmap(page, vma, haddr, true);
 		mem_cgroup_commit_charge(page, memcg, false, true);
 		lru_cache_add_active_or_unevictable(page, vma);
-		pgtable_trans_huge_deposit(vma->vm_mm, fe->pmd, pgtable);
-		set_pmd_at(vma->vm_mm, haddr, fe->pmd, entry);
+		pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
+		set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry);
 		add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
 		atomic_long_inc(&vma->vm_mm->nr_ptes);
-		spin_unlock(fe->ptl);
+		spin_unlock(vmf->ptl);
 		count_vm_event(THP_FAULT_ALLOC);
 	}
 
@@ -578,12 +578,12 @@ static bool set_huge_zero_page(pgtable_t pgtable, struct mm_struct *mm,
 	return true;
 }
 
-int do_huge_pmd_anonymous_page(struct fault_env *fe)
+int do_huge_pmd_anonymous_page(struct vm_fault *vmf)
 {
-	struct vm_area_struct *vma = fe->vma;
+	struct vm_area_struct *vma = vmf->vma;
 	gfp_t gfp;
 	struct page *page;
-	unsigned long haddr = fe->address & HPAGE_PMD_MASK;
+	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
 
 	if (haddr < vma->vm_start || haddr + HPAGE_PMD_SIZE > vma->vm_end)
 		return VM_FAULT_FALLBACK;
@@ -591,7 +591,7 @@ int do_huge_pmd_anonymous_page(struct fault_env *fe)
 		return VM_FAULT_OOM;
 	if (unlikely(khugepaged_enter(vma, vma->vm_flags)))
 		return VM_FAULT_OOM;
-	if (!(fe->flags & FAULT_FLAG_WRITE) &&
+	if (!(vmf->flags & FAULT_FLAG_WRITE) &&
 			!mm_forbids_zeropage(vma->vm_mm) &&
 			transparent_hugepage_use_zero_page()) {
 		pgtable_t pgtable;
@@ -607,22 +607,22 @@ int do_huge_pmd_anonymous_page(struct fault_env *fe)
 			count_vm_event(THP_FAULT_FALLBACK);
 			return VM_FAULT_FALLBACK;
 		}
-		fe->ptl = pmd_lock(vma->vm_mm, fe->pmd);
+		vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
 		ret = 0;
 		set = false;
-		if (pmd_none(*fe->pmd)) {
+		if (pmd_none(*vmf->pmd)) {
 			if (userfaultfd_missing(vma)) {
-				spin_unlock(fe->ptl);
-				ret = handle_userfault(fe, VM_UFFD_MISSING);
+				spin_unlock(vmf->ptl);
+				ret = handle_userfault(vmf, VM_UFFD_MISSING);
 				VM_BUG_ON(ret & VM_FAULT_FALLBACK);
 			} else {
 				set_huge_zero_page(pgtable, vma->vm_mm, vma,
-						   haddr, fe->pmd, zero_page);
-				spin_unlock(fe->ptl);
+						   haddr, vmf->pmd, zero_page);
+				spin_unlock(vmf->ptl);
 				set = true;
 			}
 		} else
-			spin_unlock(fe->ptl);
+			spin_unlock(vmf->ptl);
 		if (!set) {
 			pte_free(vma->vm_mm, pgtable);
 			put_huge_zero_page();
@@ -636,7 +636,7 @@ int do_huge_pmd_anonymous_page(struct fault_env *fe)
 		return VM_FAULT_FALLBACK;
 	}
 	prep_transhuge_page(page);
-	return __do_huge_pmd_anonymous_page(fe, page, gfp);
+	return __do_huge_pmd_anonymous_page(vmf, page, gfp);
 }
 
 static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
@@ -807,30 +807,30 @@ out:
 	return ret;
 }
 
-void huge_pmd_set_accessed(struct fault_env *fe, pmd_t orig_pmd)
+void huge_pmd_set_accessed(struct vm_fault *vmf, pmd_t orig_pmd)
 {
 	pmd_t entry;
 	unsigned long haddr;
 
-	fe->ptl = pmd_lock(fe->vma->vm_mm, fe->pmd);
-	if (unlikely(!pmd_same(*fe->pmd, orig_pmd)))
+	vmf->ptl = pmd_lock(vmf->vma->vm_mm, vmf->pmd);
+	if (unlikely(!pmd_same(*vmf->pmd, orig_pmd)))
 		goto unlock;
 
 	entry = pmd_mkyoung(orig_pmd);
-	haddr = fe->address & HPAGE_PMD_MASK;
-	if (pmdp_set_access_flags(fe->vma, haddr, fe->pmd, entry,
-				fe->flags & FAULT_FLAG_WRITE))
-		update_mmu_cache_pmd(fe->vma, fe->address, fe->pmd);
+	haddr = vmf->address & HPAGE_PMD_MASK;
+	if (pmdp_set_access_flags(vmf->vma, haddr, vmf->pmd, entry,
+				vmf->flags & FAULT_FLAG_WRITE))
+		update_mmu_cache_pmd(vmf->vma, vmf->address, vmf->pmd);
 
 unlock:
-	spin_unlock(fe->ptl);
+	spin_unlock(vmf->ptl);
 }
 
-static int do_huge_pmd_wp_page_fallback(struct fault_env *fe, pmd_t orig_pmd,
+static int do_huge_pmd_wp_page_fallback(struct vm_fault *vmf, pmd_t orig_pmd,
 		struct page *page)
 {
-	struct vm_area_struct *vma = fe->vma;
-	unsigned long haddr = fe->address & HPAGE_PMD_MASK;
+	struct vm_area_struct *vma = vmf->vma;
+	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
 	struct mem_cgroup *memcg;
 	pgtable_t pgtable;
 	pmd_t _pmd;
@@ -849,7 +849,7 @@ static int do_huge_pmd_wp_page_fallback(struct fault_env *fe, pmd_t orig_pmd,
 	for (i = 0; i < HPAGE_PMD_NR; i++) {
 		pages[i] = alloc_page_vma_node(GFP_HIGHUSER_MOVABLE |
 					       __GFP_OTHER_NODE, vma,
-					       fe->address, page_to_nid(page));
+					       vmf->address, page_to_nid(page));
 		if (unlikely(!pages[i] ||
 			     mem_cgroup_try_charge(pages[i], vma->vm_mm,
 				     GFP_KERNEL, &memcg, false))) {
@@ -880,15 +880,15 @@ static int do_huge_pmd_wp_page_fallback(struct fault_env *fe, pmd_t orig_pmd,
 	mmun_end   = haddr + HPAGE_PMD_SIZE;
 	mmu_notifier_invalidate_range_start(vma->vm_mm, mmun_start, mmun_end);
 
-	fe->ptl = pmd_lock(vma->vm_mm, fe->pmd);
-	if (unlikely(!pmd_same(*fe->pmd, orig_pmd)))
+	vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
+	if (unlikely(!pmd_same(*vmf->pmd, orig_pmd)))
 		goto out_free_pages;
 	VM_BUG_ON_PAGE(!PageHead(page), page);
 
-	pmdp_huge_clear_flush_notify(vma, haddr, fe->pmd);
+	pmdp_huge_clear_flush_notify(vma, haddr, vmf->pmd);
 	/* leave pmd empty until pte is filled */
 
-	pgtable = pgtable_trans_huge_withdraw(vma->vm_mm, fe->pmd);
+	pgtable = pgtable_trans_huge_withdraw(vma->vm_mm, vmf->pmd);
 	pmd_populate(vma->vm_mm, &_pmd, pgtable);
 
 	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
@@ -897,20 +897,20 @@ static int do_huge_pmd_wp_page_fallback(struct fault_env *fe, pmd_t orig_pmd,
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 		memcg = (void *)page_private(pages[i]);
 		set_page_private(pages[i], 0);
-		page_add_new_anon_rmap(pages[i], fe->vma, haddr, false);
+		page_add_new_anon_rmap(pages[i], vmf->vma, haddr, false);
 		mem_cgroup_commit_charge(pages[i], memcg, false, false);
 		lru_cache_add_active_or_unevictable(pages[i], vma);
-		fe->pte = pte_offset_map(&_pmd, haddr);
-		VM_BUG_ON(!pte_none(*fe->pte));
-		set_pte_at(vma->vm_mm, haddr, fe->pte, entry);
-		pte_unmap(fe->pte);
+		vmf->pte = pte_offset_map(&_pmd, haddr);
+		VM_BUG_ON(!pte_none(*vmf->pte));
+		set_pte_at(vma->vm_mm, haddr, vmf->pte, entry);
+		pte_unmap(vmf->pte);
 	}
 	kfree(pages);
 
 	smp_wmb(); /* make pte visible before pmd */
-	pmd_populate(vma->vm_mm, fe->pmd, pgtable);
+	pmd_populate(vma->vm_mm, vmf->pmd, pgtable);
 	page_remove_rmap(page, true);
-	spin_unlock(fe->ptl);
+	spin_unlock(vmf->ptl);
 
 	mmu_notifier_invalidate_range_end(vma->vm_mm, mmun_start, mmun_end);
 
@@ -921,7 +921,7 @@ out:
 	return ret;
 
 out_free_pages:
-	spin_unlock(fe->ptl);
+	spin_unlock(vmf->ptl);
 	mmu_notifier_invalidate_range_end(vma->vm_mm, mmun_start, mmun_end);
 	for (i = 0; i < HPAGE_PMD_NR; i++) {
 		memcg = (void *)page_private(pages[i]);
@@ -933,23 +933,23 @@ out_free_pages:
 	goto out;
 }
 
-int do_huge_pmd_wp_page(struct fault_env *fe, pmd_t orig_pmd)
+int do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
 {
-	struct vm_area_struct *vma = fe->vma;
+	struct vm_area_struct *vma = vmf->vma;
 	struct page *page = NULL, *new_page;
 	struct mem_cgroup *memcg;
-	unsigned long haddr = fe->address & HPAGE_PMD_MASK;
+	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
 	unsigned long mmun_start;	/* For mmu_notifiers */
 	unsigned long mmun_end;		/* For mmu_notifiers */
 	gfp_t huge_gfp;			/* for allocation and charge */
 	int ret = 0;
 
-	fe->ptl = pmd_lockptr(vma->vm_mm, fe->pmd);
+	vmf->ptl = pmd_lockptr(vma->vm_mm, vmf->pmd);
 	VM_BUG_ON_VMA(!vma->anon_vma, vma);
 	if (is_huge_zero_pmd(orig_pmd))
 		goto alloc;
-	spin_lock(fe->ptl);
-	if (unlikely(!pmd_same(*fe->pmd, orig_pmd)))
+	spin_lock(vmf->ptl);
+	if (unlikely(!pmd_same(*vmf->pmd, orig_pmd)))
 		goto out_unlock;
 
 	page = pmd_page(orig_pmd);
@@ -962,13 +962,13 @@ int do_huge_pmd_wp_page(struct fault_env *fe, pmd_t orig_pmd)
 		pmd_t entry;
 		entry = pmd_mkyoung(orig_pmd);
 		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
-		if (pmdp_set_access_flags(vma, haddr, fe->pmd, entry,  1))
-			update_mmu_cache_pmd(vma, fe->address, fe->pmd);
+		if (pmdp_set_access_flags(vma, haddr, vmf->pmd, entry,  1))
+			update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
 		ret |= VM_FAULT_WRITE;
 		goto out_unlock;
 	}
 	get_page(page);
-	spin_unlock(fe->ptl);
+	spin_unlock(vmf->ptl);
 alloc:
 	if (transparent_hugepage_enabled(vma) &&
 	    !transparent_hugepage_debug_cow()) {
@@ -981,12 +981,12 @@ alloc:
 		prep_transhuge_page(new_page);
 	} else {
 		if (!page) {
-			split_huge_pmd(vma, fe->pmd, fe->address);
+			split_huge_pmd(vma, vmf->pmd, vmf->address);
 			ret |= VM_FAULT_FALLBACK;
 		} else {
-			ret = do_huge_pmd_wp_page_fallback(fe, orig_pmd, page);
+			ret = do_huge_pmd_wp_page_fallback(vmf, orig_pmd, page);
 			if (ret & VM_FAULT_OOM) {
-				split_huge_pmd(vma, fe->pmd, fe->address);
+				split_huge_pmd(vma, vmf->pmd, vmf->address);
 				ret |= VM_FAULT_FALLBACK;
 			}
 			put_page(page);
@@ -998,7 +998,7 @@ alloc:
 	if (unlikely(mem_cgroup_try_charge(new_page, vma->vm_mm,
 					huge_gfp, &memcg, true))) {
 		put_page(new_page);
-		split_huge_pmd(vma, fe->pmd, fe->address);
+		split_huge_pmd(vma, vmf->pmd, vmf->address);
 		if (page)
 			put_page(page);
 		ret |= VM_FAULT_FALLBACK;
@@ -1018,11 +1018,11 @@ alloc:
 	mmun_end   = haddr + HPAGE_PMD_SIZE;
 	mmu_notifier_invalidate_range_start(vma->vm_mm, mmun_start, mmun_end);
 
-	spin_lock(fe->ptl);
+	spin_lock(vmf->ptl);
 	if (page)
 		put_page(page);
-	if (unlikely(!pmd_same(*fe->pmd, orig_pmd))) {
-		spin_unlock(fe->ptl);
+	if (unlikely(!pmd_same(*vmf->pmd, orig_pmd))) {
+		spin_unlock(vmf->ptl);
 		mem_cgroup_cancel_charge(new_page, memcg, true);
 		put_page(new_page);
 		goto out_mn;
@@ -1030,12 +1030,12 @@ alloc:
 		pmd_t entry;
 		entry = mk_huge_pmd(new_page, vma->vm_page_prot);
 		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
-		pmdp_huge_clear_flush_notify(vma, haddr, fe->pmd);
+		pmdp_huge_clear_flush_notify(vma, haddr, vmf->pmd);
 		page_add_new_anon_rmap(new_page, vma, haddr, true);
 		mem_cgroup_commit_charge(new_page, memcg, false, true);
 		lru_cache_add_active_or_unevictable(new_page, vma);
-		set_pmd_at(vma->vm_mm, haddr, fe->pmd, entry);
-		update_mmu_cache_pmd(vma, fe->address, fe->pmd);
+		set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry);
+		update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
 		if (!page) {
 			add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
 			put_huge_zero_page();
@@ -1046,13 +1046,13 @@ alloc:
 		}
 		ret |= VM_FAULT_WRITE;
 	}
-	spin_unlock(fe->ptl);
+	spin_unlock(vmf->ptl);
 out_mn:
 	mmu_notifier_invalidate_range_end(vma->vm_mm, mmun_start, mmun_end);
 out:
 	return ret;
 out_unlock:
-	spin_unlock(fe->ptl);
+	spin_unlock(vmf->ptl);
 	return ret;
 }
 
@@ -1125,12 +1125,12 @@ out:
 }
 
 /* NUMA hinting page fault entry point for trans huge pmds */
-int do_huge_pmd_numa_page(struct fault_env *fe, pmd_t pmd)
+int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
 {
-	struct vm_area_struct *vma = fe->vma;
+	struct vm_area_struct *vma = vmf->vma;
 	struct anon_vma *anon_vma = NULL;
 	struct page *page;
-	unsigned long haddr = fe->address & HPAGE_PMD_MASK;
+	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
 	int page_nid = -1, this_nid = numa_node_id();
 	int target_nid, last_cpupid = -1;
 	bool page_locked;
@@ -1138,8 +1138,8 @@ int do_huge_pmd_numa_page(struct fault_env *fe, pmd_t pmd)
 	bool was_writable;
 	int flags = 0;
 
-	fe->ptl = pmd_lock(vma->vm_mm, fe->pmd);
-	if (unlikely(!pmd_same(pmd, *fe->pmd)))
+	vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
+	if (unlikely(!pmd_same(pmd, *vmf->pmd)))
 		goto out_unlock;
 
 	/*
@@ -1147,9 +1147,9 @@ int do_huge_pmd_numa_page(struct fault_env *fe, pmd_t pmd)
 	 * without disrupting NUMA hinting information. Do not relock and
 	 * check_same as the page may no longer be mapped.
 	 */
-	if (unlikely(pmd_trans_migrating(*fe->pmd))) {
-		page = pmd_page(*fe->pmd);
-		spin_unlock(fe->ptl);
+	if (unlikely(pmd_trans_migrating(*vmf->pmd))) {
+		page = pmd_page(*vmf->pmd);
+		spin_unlock(vmf->ptl);
 		wait_on_page_locked(page);
 		goto out;
 	}
@@ -1182,7 +1182,7 @@ int do_huge_pmd_numa_page(struct fault_env *fe, pmd_t pmd)
 
 	/* Migration could have started since the pmd_trans_migrating check */
 	if (!page_locked) {
-		spin_unlock(fe->ptl);
+		spin_unlock(vmf->ptl);
 		wait_on_page_locked(page);
 		page_nid = -1;
 		goto out;
@@ -1193,12 +1193,12 @@ int do_huge_pmd_numa_page(struct fault_env *fe, pmd_t pmd)
 	 * to serialises splits
 	 */
 	get_page(page);
-	spin_unlock(fe->ptl);
+	spin_unlock(vmf->ptl);
 	anon_vma = page_lock_anon_vma_read(page);
 
 	/* Confirm the PMD did not change while page_table_lock was released */
-	spin_lock(fe->ptl);
-	if (unlikely(!pmd_same(pmd, *fe->pmd))) {
+	spin_lock(vmf->ptl);
+	if (unlikely(!pmd_same(pmd, *vmf->pmd))) {
 		unlock_page(page);
 		put_page(page);
 		page_nid = -1;
@@ -1216,9 +1216,9 @@ int do_huge_pmd_numa_page(struct fault_env *fe, pmd_t pmd)
 	 * Migrate the THP to the requested node, returns with page unlocked
 	 * and access rights restored.
 	 */
-	spin_unlock(fe->ptl);
+	spin_unlock(vmf->ptl);
 	migrated = migrate_misplaced_transhuge_page(vma->vm_mm, vma,
-				fe->pmd, pmd, fe->address, page, target_nid);
+				vmf->pmd, pmd, vmf->address, page, target_nid);
 	if (migrated) {
 		flags |= TNF_MIGRATED;
 		page_nid = target_nid;
@@ -1233,18 +1233,19 @@ clear_pmdnuma:
 	pmd = pmd_mkyoung(pmd);
 	if (was_writable)
 		pmd = pmd_mkwrite(pmd);
-	set_pmd_at(vma->vm_mm, haddr, fe->pmd, pmd);
-	update_mmu_cache_pmd(vma, fe->address, fe->pmd);
+	set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd);
+	update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
 	unlock_page(page);
 out_unlock:
-	spin_unlock(fe->ptl);
+	spin_unlock(vmf->ptl);
 
 out:
 	if (anon_vma)
 		page_unlock_anon_vma_read(anon_vma);
 
 	if (page_nid != -1)
-		task_numa_fault(last_cpupid, page_nid, HPAGE_PMD_NR, fe->flags);
+		task_numa_fault(last_cpupid, page_nid, HPAGE_PMD_NR,
+				vmf->flags);
 
 	return 0;
 }
diff --git a/mm/internal.h b/mm/internal.h
index 1501304f87a4..cc80060914f6 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -36,7 +36,7 @@
 /* Do not use these with a slab allocator */
 #define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK)
 
-int do_swap_page(struct fault_env *fe, pte_t orig_pte);
+int do_swap_page(struct vm_fault *vmf, pte_t orig_pte);
 
 void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
 		unsigned long floor, unsigned long ceiling);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 728d7790dc2d..f88b2d3810a7 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -875,7 +875,7 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
 {
 	pte_t pteval;
 	int swapped_in = 0, ret = 0;
-	struct fault_env fe = {
+	struct vm_fault vmf = {
 		.vma = vma,
 		.address = address,
 		.flags = FAULT_FLAG_ALLOW_RETRY,
@@ -887,19 +887,19 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
 		trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
 		return false;
 	}
-	fe.pte = pte_offset_map(pmd, address);
-	for (; fe.address < address + HPAGE_PMD_NR*PAGE_SIZE;
-			fe.pte++, fe.address += PAGE_SIZE) {
-		pteval = *fe.pte;
+	vmf.pte = pte_offset_map(pmd, address);
+	for (; vmf.address < address + HPAGE_PMD_NR*PAGE_SIZE;
+			vmf.pte++, vmf.address += PAGE_SIZE) {
+		pteval = *vmf.pte;
 		if (!is_swap_pte(pteval))
 			continue;
 		swapped_in++;
-		ret = do_swap_page(&fe, pteval);
+		ret = do_swap_page(&vmf, pteval);
 
 		/* do_swap_page returns VM_FAULT_RETRY with released mmap_sem */
 		if (ret & VM_FAULT_RETRY) {
 			down_read(&mm->mmap_sem);
-			if (hugepage_vma_revalidate(mm, address, &fe.vma)) {
+			if (hugepage_vma_revalidate(mm, address, &vmf.vma)) {
 				/* vma is no longer available, don't continue to swapin */
 				trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
 				return false;
@@ -913,10 +913,10 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
 			return false;
 		}
 		/* pte is unmapped now, we need to map it */
-		fe.pte = pte_offset_map(pmd, fe.address);
+		vmf.pte = pte_offset_map(pmd, vmf.address);
 	}
-	fe.pte--;
-	pte_unmap(fe.pte);
+	vmf.pte--;
+	pte_unmap(vmf.pte);
 	trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 1);
 	return true;
 }
diff --git a/mm/memory.c b/mm/memory.c
index 406b8728e141..447a1ef4a9e3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2070,11 +2070,11 @@ static int do_page_mkwrite(struct vm_area_struct *vma, struct page *page,
  * case, all we need to do here is to mark the page as writable and update
  * any related book-keeping.
  */
-static inline int wp_page_reuse(struct fault_env *fe, pte_t orig_pte,
+static inline int wp_page_reuse(struct vm_fault *vmf, pte_t orig_pte,
 			struct page *page, int page_mkwrite, int dirty_shared)
-	__releases(fe->ptl)
+	__releases(vmf->ptl)
 {
-	struct vm_area_struct *vma = fe->vma;
+	struct vm_area_struct *vma = vmf->vma;
 	pte_t entry;
 	/*
 	 * Clear the pages cpupid information as the existing
@@ -2084,12 +2084,12 @@ static inline int wp_page_reuse(struct fault_env *fe, pte_t orig_pte,
 	if (page)
 		page_cpupid_xchg_last(page, (1 << LAST_CPUPID_SHIFT) - 1);
 
-	flush_cache_page(vma, fe->address, pte_pfn(orig_pte));
+	flush_cache_page(vma, vmf->address, pte_pfn(orig_pte));
 	entry = pte_mkyoung(orig_pte);
 	entry = maybe_mkwrite(pte_mkdirty(entry), vma);
-	if (ptep_set_access_flags(vma, fe->address, fe->pte, entry, 1))
-		update_mmu_cache(vma, fe->address, fe->pte);
-	pte_unmap_unlock(fe->pte, fe->ptl);
+	if (ptep_set_access_flags(vma, vmf->address, vmf->pte, entry, 1))
+		update_mmu_cache(vma, vmf->address, vmf->pte);
+	pte_unmap_unlock(vmf->pte, vmf->ptl);
 
 	if (dirty_shared) {
 		struct address_space *mapping;
@@ -2135,15 +2135,15 @@ static inline int wp_page_reuse(struct fault_env *fe, pte_t orig_pte,
  *   held to the old page, as well as updating the rmap.
  * - In any case, unlock the PTL and drop the reference we took to the old page.
  */
-static int wp_page_copy(struct fault_env *fe, pte_t orig_pte,
+static int wp_page_copy(struct vm_fault *vmf, pte_t orig_pte,
 		struct page *old_page)
 {
-	struct vm_area_struct *vma = fe->vma;
+	struct vm_area_struct *vma = vmf->vma;
 	struct mm_struct *mm = vma->vm_mm;
 	struct page *new_page = NULL;
 	pte_t entry;
 	int page_copied = 0;
-	const unsigned long mmun_start = fe->address & PAGE_MASK;
+	const unsigned long mmun_start = vmf->address & PAGE_MASK;
 	const unsigned long mmun_end = mmun_start + PAGE_SIZE;
 	struct mem_cgroup *memcg;
 
@@ -2151,15 +2151,16 @@ static int wp_page_copy(struct fault_env *fe, pte_t orig_pte,
 		goto oom;
 
 	if (is_zero_pfn(pte_pfn(orig_pte))) {
-		new_page = alloc_zeroed_user_highpage_movable(vma, fe->address);
+		new_page = alloc_zeroed_user_highpage_movable(vma,
+							      vmf->address);
 		if (!new_page)
 			goto oom;
 	} else {
 		new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma,
-				fe->address);
+				vmf->address);
 		if (!new_page)
 			goto oom;
-		cow_user_page(new_page, old_page, fe->address, vma);
+		cow_user_page(new_page, old_page, vmf->address, vma);
 	}
 
 	if (mem_cgroup_try_charge(new_page, mm, GFP_KERNEL, &memcg, false))
@@ -2172,8 +2173,8 @@ static int wp_page_copy(struct fault_env *fe, pte_t orig_pte,
 	/*
 	 * Re-check the pte - we dropped the lock
 	 */
-	fe->pte = pte_offset_map_lock(mm, fe->pmd, fe->address, &fe->ptl);
-	if (likely(pte_same(*fe->pte, orig_pte))) {
+	vmf->pte = pte_offset_map_lock(mm, vmf->pmd, vmf->address, &vmf->ptl);
+	if (likely(pte_same(*vmf->pte, orig_pte))) {
 		if (old_page) {
 			if (!PageAnon(old_page)) {
 				dec_mm_counter_fast(mm,
@@ -2183,7 +2184,7 @@ static int wp_page_copy(struct fault_env *fe, pte_t orig_pte,
 		} else {
 			inc_mm_counter_fast(mm, MM_ANONPAGES);
 		}
-		flush_cache_page(vma, fe->address, pte_pfn(orig_pte));
+		flush_cache_page(vma, vmf->address, pte_pfn(orig_pte));
 		entry = mk_pte(new_page, vma->vm_page_prot);
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 		/*
@@ -2192,8 +2193,8 @@ static int wp_page_copy(struct fault_env *fe, pte_t orig_pte,
 		 * seen in the presence of one thread doing SMC and another
 		 * thread doing COW.
 		 */
-		ptep_clear_flush_notify(vma, fe->address, fe->pte);
-		page_add_new_anon_rmap(new_page, vma, fe->address, false);
+		ptep_clear_flush_notify(vma, vmf->address, vmf->pte);
+		page_add_new_anon_rmap(new_page, vma, vmf->address, false);
 		mem_cgroup_commit_charge(new_page, memcg, false, false);
 		lru_cache_add_active_or_unevictable(new_page, vma);
 		/*
@@ -2201,8 +2202,8 @@ static int wp_page_copy(struct fault_env *fe, pte_t orig_pte,
 		 * mmu page tables (such as kvm shadow page tables), we want the
 		 * new page to be mapped directly into the secondary page table.
 		 */
-		set_pte_at_notify(mm, fe->address, fe->pte, entry);
-		update_mmu_cache(vma, fe->address, fe->pte);
+		set_pte_at_notify(mm, vmf->address, vmf->pte, entry);
+		update_mmu_cache(vma, vmf->address, vmf->pte);
 		if (old_page) {
 			/*
 			 * Only after switching the pte to the new page may
@@ -2239,7 +2240,7 @@ static int wp_page_copy(struct fault_env *fe, pte_t orig_pte,
 	if (new_page)
 		put_page(new_page);
 
-	pte_unmap_unlock(fe->pte, fe->ptl);
+	pte_unmap_unlock(vmf->pte, vmf->ptl);
 	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
 	if (old_page) {
 		/*
@@ -2267,42 +2268,42 @@ oom:
  * Handle write page faults for VM_MIXEDMAP or VM_PFNMAP for a VM_SHARED
  * mapping
  */
-static int wp_pfn_shared(struct fault_env *fe,  pte_t orig_pte)
+static int wp_pfn_shared(struct vm_fault *vmf, pte_t orig_pte)
 {
-	struct vm_area_struct *vma = fe->vma;
+	struct vm_area_struct *vma = vmf->vma;
 
 	if (vma->vm_ops && vma->vm_ops->pfn_mkwrite) {
-		struct vm_fault vmf = {
+		struct vm_fault vmf2 = {
 			.page = NULL,
-			.pgoff = linear_page_index(vma, fe->address),
-			.virtual_address = fe->address & PAGE_MASK,
+			.pgoff = linear_page_index(vma, vmf->address),
+			.virtual_address = vmf->address & PAGE_MASK,
 			.flags = FAULT_FLAG_WRITE | FAULT_FLAG_MKWRITE,
 		};
 		int ret;
 
-		pte_unmap_unlock(fe->pte, fe->ptl);
-		ret = vma->vm_ops->pfn_mkwrite(vma, &vmf);
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
+		ret = vma->vm_ops->pfn_mkwrite(vma, &vmf2);
 		if (ret & VM_FAULT_ERROR)
 			return ret;
-		fe->pte = pte_offset_map_lock(vma->vm_mm, fe->pmd, fe->address,
-				&fe->ptl);
+		vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
+				vmf->address, &vmf->ptl);
 		/*
 		 * We might have raced with another page fault while we
 		 * released the pte_offset_map_lock.
 		 */
-		if (!pte_same(*fe->pte, orig_pte)) {
-			pte_unmap_unlock(fe->pte, fe->ptl);
+		if (!pte_same(*vmf->pte, orig_pte)) {
+			pte_unmap_unlock(vmf->pte, vmf->ptl);
 			return 0;
 		}
 	}
-	return wp_page_reuse(fe, orig_pte, NULL, 0, 0);
+	return wp_page_reuse(vmf, orig_pte, NULL, 0, 0);
 }
 
-static int wp_page_shared(struct fault_env *fe, pte_t orig_pte,
+static int wp_page_shared(struct vm_fault *vmf, pte_t orig_pte,
 		struct page *old_page)
-	__releases(fe->ptl)
+	__releases(vmf->ptl)
 {
-	struct vm_area_struct *vma = fe->vma;
+	struct vm_area_struct *vma = vmf->vma;
 	int page_mkwrite = 0;
 
 	get_page(old_page);
@@ -2310,8 +2311,8 @@ static int wp_page_shared(struct fault_env *fe, pte_t orig_pte,
 	if (vma->vm_ops && vma->vm_ops->page_mkwrite) {
 		int tmp;
 
-		pte_unmap_unlock(fe->pte, fe->ptl);
-		tmp = do_page_mkwrite(vma, old_page, fe->address);
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
+		tmp = do_page_mkwrite(vma, old_page, vmf->address);
 		if (unlikely(!tmp || (tmp &
 				      (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
 			put_page(old_page);
@@ -2323,18 +2324,18 @@ static int wp_page_shared(struct fault_env *fe, pte_t orig_pte,
 		 * they did, we just return, as we can count on the
 		 * MMU to tell us if they didn't also make it writable.
 		 */
-		fe->pte = pte_offset_map_lock(vma->vm_mm, fe->pmd, fe->address,
-						 &fe->ptl);
-		if (!pte_same(*fe->pte, orig_pte)) {
+		vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
+						vmf->address, &vmf->ptl);
+		if (!pte_same(*vmf->pte, orig_pte)) {
 			unlock_page(old_page);
-			pte_unmap_unlock(fe->pte, fe->ptl);
+			pte_unmap_unlock(vmf->pte, vmf->ptl);
 			put_page(old_page);
 			return 0;
 		}
 		page_mkwrite = 1;
 	}
 
-	return wp_page_reuse(fe, orig_pte, old_page, page_mkwrite, 1);
+	return wp_page_reuse(vmf, orig_pte, old_page, page_mkwrite, 1);
 }
 
 /*
@@ -2355,13 +2356,13 @@ static int wp_page_shared(struct fault_env *fe, pte_t orig_pte,
  * but allow concurrent faults), with pte both mapped and locked.
  * We return with mmap_sem still held, but pte unmapped and unlocked.
  */
-static int do_wp_page(struct fault_env *fe, pte_t orig_pte)
-	__releases(fe->ptl)
+static int do_wp_page(struct vm_fault *vmf, pte_t orig_pte)
+	__releases(vmf->ptl)
 {
-	struct vm_area_struct *vma = fe->vma;
+	struct vm_area_struct *vma = vmf->vma;
 	struct page *old_page;
 
-	old_page = vm_normal_page(vma, fe->address, orig_pte);
+	old_page = vm_normal_page(vma, vmf->address, orig_pte);
 	if (!old_page) {
 		/*
 		 * VM_MIXEDMAP !pfn_valid() case, or VM_SOFTDIRTY clear on a
@@ -2372,10 +2373,10 @@ static int do_wp_page(struct fault_env *fe, pte_t orig_pte)
 		 */
 		if ((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
 				     (VM_WRITE|VM_SHARED))
-			return wp_pfn_shared(fe, orig_pte);
+			return wp_pfn_shared(vmf, orig_pte);
 
-		pte_unmap_unlock(fe->pte, fe->ptl);
-		return wp_page_copy(fe, orig_pte, old_page);
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
+		return wp_page_copy(vmf, orig_pte, old_page);
 	}
 
 	/*
@@ -2386,13 +2387,13 @@ static int do_wp_page(struct fault_env *fe, pte_t orig_pte)
 		int total_mapcount;
 		if (!trylock_page(old_page)) {
 			get_page(old_page);
-			pte_unmap_unlock(fe->pte, fe->ptl);
+			pte_unmap_unlock(vmf->pte, vmf->ptl);
 			lock_page(old_page);
-			fe->pte = pte_offset_map_lock(vma->vm_mm, fe->pmd,
-					fe->address, &fe->ptl);
-			if (!pte_same(*fe->pte, orig_pte)) {
+			vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
+					vmf->address, &vmf->ptl);
+			if (!pte_same(*vmf->pte, orig_pte)) {
 				unlock_page(old_page);
-				pte_unmap_unlock(fe->pte, fe->ptl);
+				pte_unmap_unlock(vmf->pte, vmf->ptl);
 				put_page(old_page);
 				return 0;
 			}
@@ -2410,12 +2411,12 @@ static int do_wp_page(struct fault_env *fe, pte_t orig_pte)
 				page_move_anon_rmap(old_page, vma);
 			}
 			unlock_page(old_page);
-			return wp_page_reuse(fe, orig_pte, old_page, 0, 0);
+			return wp_page_reuse(vmf, orig_pte, old_page, 0, 0);
 		}
 		unlock_page(old_page);
 	} else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
 					(VM_WRITE|VM_SHARED))) {
-		return wp_page_shared(fe, orig_pte, old_page);
+		return wp_page_shared(vmf, orig_pte, old_page);
 	}
 
 	/*
@@ -2423,8 +2424,8 @@ static int do_wp_page(struct fault_env *fe, pte_t orig_pte)
 	 */
 	get_page(old_page);
 
-	pte_unmap_unlock(fe->pte, fe->ptl);
-	return wp_page_copy(fe, orig_pte, old_page);
+	pte_unmap_unlock(vmf->pte, vmf->ptl);
+	return wp_page_copy(vmf, orig_pte, old_page);
 }
 
 static void unmap_mapping_range_vma(struct vm_area_struct *vma,
@@ -2512,9 +2513,9 @@ EXPORT_SYMBOL(unmap_mapping_range);
  * We return with the mmap_sem locked or unlocked in the same cases
  * as does filemap_fault().
  */
-int do_swap_page(struct fault_env *fe, pte_t orig_pte)
+int do_swap_page(struct vm_fault *vmf, pte_t orig_pte)
 {
-	struct vm_area_struct *vma = fe->vma;
+	struct vm_area_struct *vma = vmf->vma;
 	struct page *page, *swapcache;
 	struct mem_cgroup *memcg;
 	swp_entry_t entry;
@@ -2523,17 +2524,18 @@ int do_swap_page(struct fault_env *fe, pte_t orig_pte)
 	int exclusive = 0;
 	int ret = 0;
 
-	if (!pte_unmap_same(vma->vm_mm, fe->pmd, fe->pte, orig_pte))
+	if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, orig_pte))
 		goto out;
 
 	entry = pte_to_swp_entry(orig_pte);
 	if (unlikely(non_swap_entry(entry))) {
 		if (is_migration_entry(entry)) {
-			migration_entry_wait(vma->vm_mm, fe->pmd, fe->address);
+			migration_entry_wait(vma->vm_mm, vmf->pmd,
+					     vmf->address);
 		} else if (is_hwpoison_entry(entry)) {
 			ret = VM_FAULT_HWPOISON;
 		} else {
-			print_bad_pte(vma, fe->address, orig_pte, NULL);
+			print_bad_pte(vma, vmf->address, orig_pte, NULL);
 			ret = VM_FAULT_SIGBUS;
 		}
 		goto out;
@@ -2541,16 +2543,16 @@ int do_swap_page(struct fault_env *fe, pte_t orig_pte)
 	delayacct_set_flag(DELAYACCT_PF_SWAPIN);
 	page = lookup_swap_cache(entry);
 	if (!page) {
-		page = swapin_readahead(entry,
-					GFP_HIGHUSER_MOVABLE, vma, fe->address);
+		page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE, vma,
+					vmf->address);
 		if (!page) {
 			/*
 			 * Back out if somebody else faulted in this pte
 			 * while we released the pte lock.
 			 */
-			fe->pte = pte_offset_map_lock(vma->vm_mm, fe->pmd,
-					fe->address, &fe->ptl);
-			if (likely(pte_same(*fe->pte, orig_pte)))
+			vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
+					vmf->address, &vmf->ptl);
+			if (likely(pte_same(*vmf->pte, orig_pte)))
 				ret = VM_FAULT_OOM;
 			delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
 			goto unlock;
@@ -2572,7 +2574,7 @@ int do_swap_page(struct fault_env *fe, pte_t orig_pte)
 	}
 
 	swapcache = page;
-	locked = lock_page_or_retry(page, vma->vm_mm, fe->flags);
+	locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags);
 
 	delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
 	if (!locked) {
@@ -2589,7 +2591,7 @@ int do_swap_page(struct fault_env *fe, pte_t orig_pte)
 	if (unlikely(!PageSwapCache(page) || page_private(page) != entry.val))
 		goto out_page;
 
-	page = ksm_might_need_to_copy(page, vma, fe->address);
+	page = ksm_might_need_to_copy(page, vma, vmf->address);
 	if (unlikely(!page)) {
 		ret = VM_FAULT_OOM;
 		page = swapcache;
@@ -2605,9 +2607,9 @@ int do_swap_page(struct fault_env *fe, pte_t orig_pte)
 	/*
 	 * Back out if somebody else already faulted in this pte.
 	 */
-	fe->pte = pte_offset_map_lock(vma->vm_mm, fe->pmd, fe->address,
-			&fe->ptl);
-	if (unlikely(!pte_same(*fe->pte, orig_pte)))
+	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
+			&vmf->ptl);
+	if (unlikely(!pte_same(*vmf->pte, orig_pte)))
 		goto out_nomap;
 
 	if (unlikely(!PageUptodate(page))) {
@@ -2628,22 +2630,22 @@ int do_swap_page(struct fault_env *fe, pte_t orig_pte)
 	inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
 	dec_mm_counter_fast(vma->vm_mm, MM_SWAPENTS);
 	pte = mk_pte(page, vma->vm_page_prot);
-	if ((fe->flags & FAULT_FLAG_WRITE) && reuse_swap_page(page, NULL)) {
+	if ((vmf->flags & FAULT_FLAG_WRITE) && reuse_swap_page(page, NULL)) {
 		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
-		fe->flags &= ~FAULT_FLAG_WRITE;
+		vmf->flags &= ~FAULT_FLAG_WRITE;
 		ret |= VM_FAULT_WRITE;
 		exclusive = RMAP_EXCLUSIVE;
 	}
 	flush_icache_page(vma, page);
 	if (pte_swp_soft_dirty(orig_pte))
 		pte = pte_mksoft_dirty(pte);
-	set_pte_at(vma->vm_mm, fe->address, fe->pte, pte);
+	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
 	if (page == swapcache) {
-		do_page_add_anon_rmap(page, vma, fe->address, exclusive);
+		do_page_add_anon_rmap(page, vma, vmf->address, exclusive);
 		mem_cgroup_commit_charge(page, memcg, true, false);
 		activate_page(page);
 	} else { /* ksm created a completely new copy */
-		page_add_new_anon_rmap(page, vma, fe->address, false);
+		page_add_new_anon_rmap(page, vma, vmf->address, false);
 		mem_cgroup_commit_charge(page, memcg, false, false);
 		lru_cache_add_active_or_unevictable(page, vma);
 	}
@@ -2666,22 +2668,22 @@ int do_swap_page(struct fault_env *fe, pte_t orig_pte)
 		put_page(swapcache);
 	}
 
-	if (fe->flags & FAULT_FLAG_WRITE) {
-		ret |= do_wp_page(fe, pte);
+	if (vmf->flags & FAULT_FLAG_WRITE) {
+		ret |= do_wp_page(vmf, pte);
 		if (ret & VM_FAULT_ERROR)
 			ret &= VM_FAULT_ERROR;
 		goto out;
 	}
 
 	/* No need to invalidate - it was non-present before */
-	update_mmu_cache(vma, fe->address, fe->pte);
+	update_mmu_cache(vma, vmf->address, vmf->pte);
 unlock:
-	pte_unmap_unlock(fe->pte, fe->ptl);
+	pte_unmap_unlock(vmf->pte, vmf->ptl);
 out:
 	return ret;
 out_nomap:
 	mem_cgroup_cancel_charge(page, memcg, false);
-	pte_unmap_unlock(fe->pte, fe->ptl);
+	pte_unmap_unlock(vmf->pte, vmf->ptl);
 out_page:
 	unlock_page(page);
 out_release:
@@ -2732,9 +2734,9 @@ static inline int check_stack_guard_page(struct vm_area_struct *vma, unsigned lo
  * but allow concurrent faults), and pte mapped but not yet locked.
  * We return with mmap_sem still held, but pte unmapped and unlocked.
  */
-static int do_anonymous_page(struct fault_env *fe)
+static int do_anonymous_page(struct vm_fault *vmf)
 {
-	struct vm_area_struct *vma = fe->vma;
+	struct vm_area_struct *vma = vmf->vma;
 	struct mem_cgroup *memcg;
 	struct page *page;
 	pte_t entry;
@@ -2744,7 +2746,7 @@ static int do_anonymous_page(struct fault_env *fe)
 		return VM_FAULT_SIGBUS;
 
 	/* Check if we need to add a guard page to the stack */
-	if (check_stack_guard_page(vma, fe->address) < 0)
+	if (check_stack_guard_page(vma, vmf->address) < 0)
 		return VM_FAULT_SIGSEGV;
 
 	/*
@@ -2757,26 +2759,26 @@ static int do_anonymous_page(struct fault_env *fe)
 	 *
 	 * Here we only have down_read(mmap_sem).
 	 */
-	if (pte_alloc(vma->vm_mm, fe->pmd, fe->address))
+	if (pte_alloc(vma->vm_mm, vmf->pmd, vmf->address))
 		return VM_FAULT_OOM;
 
 	/* See the comment in pte_alloc_one_map() */
-	if (unlikely(pmd_trans_unstable(fe->pmd)))
+	if (unlikely(pmd_trans_unstable(vmf->pmd)))
 		return 0;
 
 	/* Use the zero-page for reads */
-	if (!(fe->flags & FAULT_FLAG_WRITE) &&
+	if (!(vmf->flags & FAULT_FLAG_WRITE) &&
 			!mm_forbids_zeropage(vma->vm_mm)) {
-		entry = pte_mkspecial(pfn_pte(my_zero_pfn(fe->address),
+		entry = pte_mkspecial(pfn_pte(my_zero_pfn(vmf->address),
 						vma->vm_page_prot));
-		fe->pte = pte_offset_map_lock(vma->vm_mm, fe->pmd, fe->address,
-				&fe->ptl);
-		if (!pte_none(*fe->pte))
+		vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
+				vmf->address, &vmf->ptl);
+		if (!pte_none(*vmf->pte))
 			goto unlock;
 		/* Deliver the page fault to userland, check inside PT lock */
 		if (userfaultfd_missing(vma)) {
-			pte_unmap_unlock(fe->pte, fe->ptl);
-			return handle_userfault(fe, VM_UFFD_MISSING);
+			pte_unmap_unlock(vmf->pte, vmf->ptl);
+			return handle_userfault(vmf, VM_UFFD_MISSING);
 		}
 		goto setpte;
 	}
@@ -2784,7 +2786,7 @@ static int do_anonymous_page(struct fault_env *fe)
 	/* Allocate our own private page. */
 	if (unlikely(anon_vma_prepare(vma)))
 		goto oom;
-	page = alloc_zeroed_user_highpage_movable(vma, fe->address);
+	page = alloc_zeroed_user_highpage_movable(vma, vmf->address);
 	if (!page)
 		goto oom;
 
@@ -2802,30 +2804,30 @@ static int do_anonymous_page(struct fault_env *fe)
 	if (vma->vm_flags & VM_WRITE)
 		entry = pte_mkwrite(pte_mkdirty(entry));
 
-	fe->pte = pte_offset_map_lock(vma->vm_mm, fe->pmd, fe->address,
-			&fe->ptl);
-	if (!pte_none(*fe->pte))
+	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
+			&vmf->ptl);
+	if (!pte_none(*vmf->pte))
 		goto release;
 
 	/* Deliver the page fault to userland, check inside PT lock */
 	if (userfaultfd_missing(vma)) {
-		pte_unmap_unlock(fe->pte, fe->ptl);
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
 		mem_cgroup_cancel_charge(page, memcg, false);
 		put_page(page);
-		return handle_userfault(fe, VM_UFFD_MISSING);
+		return handle_userfault(vmf, VM_UFFD_MISSING);
 	}
 
 	inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
-	page_add_new_anon_rmap(page, vma, fe->address, false);
+	page_add_new_anon_rmap(page, vma, vmf->address, false);
 	mem_cgroup_commit_charge(page, memcg, false, false);
 	lru_cache_add_active_or_unevictable(page, vma);
 setpte:
-	set_pte_at(vma->vm_mm, fe->address, fe->pte, entry);
+	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
 
 	/* No need to invalidate - it was non-present before */
-	update_mmu_cache(vma, fe->address, fe->pte);
+	update_mmu_cache(vma, vmf->address, vmf->pte);
 unlock:
-	pte_unmap_unlock(fe->pte, fe->ptl);
+	pte_unmap_unlock(vmf->pte, vmf->ptl);
 	return 0;
 release:
 	mem_cgroup_cancel_charge(page, memcg, false);
@@ -2842,62 +2844,62 @@ oom:
  * released depending on flags and vma->vm_ops->fault() return value.
  * See filemap_fault() and __lock_page_or_retry().
  */
-static int __do_fault(struct fault_env *fe, pgoff_t pgoff,
+static int __do_fault(struct vm_fault *vmf, pgoff_t pgoff,
 		struct page *cow_page, struct page **page, void **entry)
 {
-	struct vm_area_struct *vma = fe->vma;
-	struct vm_fault vmf;
+	struct vm_area_struct *vma = vmf->vma;
+	struct vm_fault vmf2;
 	int ret;
 
-	vmf.virtual_address = fe->address & PAGE_MASK;
-	vmf.pgoff = pgoff;
-	vmf.flags = fe->flags;
-	vmf.page = NULL;
-	vmf.gfp_mask = __get_fault_gfp_mask(vma);
-	vmf.cow_page = cow_page;
+	vmf2.virtual_address = vmf->address & PAGE_MASK;
+	vmf2.pgoff = pgoff;
+	vmf2.flags = vmf->flags;
+	vmf2.page = NULL;
+	vmf2.gfp_mask = __get_fault_gfp_mask(vma);
+	vmf2.cow_page = cow_page;
 
-	ret = vma->vm_ops->fault(vma, &vmf);
+	ret = vma->vm_ops->fault(vma, &vmf2);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		return ret;
 	if (ret & VM_FAULT_DAX_LOCKED) {
-		*entry = vmf.entry;
+		*entry = vmf2.entry;
 		return ret;
 	}
 
-	if (unlikely(PageHWPoison(vmf.page))) {
+	if (unlikely(PageHWPoison(vmf2.page))) {
 		if (ret & VM_FAULT_LOCKED)
-			unlock_page(vmf.page);
-		put_page(vmf.page);
+			unlock_page(vmf2.page);
+		put_page(vmf2.page);
 		return VM_FAULT_HWPOISON;
 	}
 
 	if (unlikely(!(ret & VM_FAULT_LOCKED)))
-		lock_page(vmf.page);
+		lock_page(vmf2.page);
 	else
-		VM_BUG_ON_PAGE(!PageLocked(vmf.page), vmf.page);
+		VM_BUG_ON_PAGE(!PageLocked(vmf2.page), vmf2.page);
 
-	*page = vmf.page;
+	*page = vmf2.page;
 	return ret;
 }
 
-static int pte_alloc_one_map(struct fault_env *fe)
+static int pte_alloc_one_map(struct vm_fault *vmf)
 {
-	struct vm_area_struct *vma = fe->vma;
+	struct vm_area_struct *vma = vmf->vma;
 
-	if (!pmd_none(*fe->pmd))
+	if (!pmd_none(*vmf->pmd))
 		goto map_pte;
-	if (fe->prealloc_pte) {
-		fe->ptl = pmd_lock(vma->vm_mm, fe->pmd);
-		if (unlikely(!pmd_none(*fe->pmd))) {
-			spin_unlock(fe->ptl);
+	if (vmf->prealloc_pte) {
+		vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
+		if (unlikely(!pmd_none(*vmf->pmd))) {
+			spin_unlock(vmf->ptl);
 			goto map_pte;
 		}
 
 		atomic_long_inc(&vma->vm_mm->nr_ptes);
-		pmd_populate(vma->vm_mm, fe->pmd, fe->prealloc_pte);
-		spin_unlock(fe->ptl);
-		fe->prealloc_pte = 0;
-	} else if (unlikely(pte_alloc(vma->vm_mm, fe->pmd, fe->address))) {
+		pmd_populate(vma->vm_mm, vmf->pmd, vmf->prealloc_pte);
+		spin_unlock(vmf->ptl);
+		vmf->prealloc_pte = 0;
+	} else if (unlikely(pte_alloc(vma->vm_mm, vmf->pmd, vmf->address))) {
 		return VM_FAULT_OOM;
 	}
 map_pte:
@@ -2912,11 +2914,11 @@ map_pte:
 	 * through an atomic read in C, which is what pmd_trans_unstable()
 	 * provides.
 	 */
-	if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))
+	if (pmd_trans_unstable(vmf->pmd) || pmd_devmap(*vmf->pmd))
 		return VM_FAULT_NOPAGE;
 
-	fe->pte = pte_offset_map_lock(vma->vm_mm, fe->pmd, fe->address,
-			&fe->ptl);
+	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
+			&vmf->ptl);
 	return 0;
 }
 
@@ -2934,11 +2936,11 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
 	return true;
 }
 
-static int do_set_pmd(struct fault_env *fe, struct page *page)
+static int do_set_pmd(struct vm_fault *vmf, struct page *page)
 {
-	struct vm_area_struct *vma = fe->vma;
-	bool write = fe->flags & FAULT_FLAG_WRITE;
-	unsigned long haddr = fe->address & HPAGE_PMD_MASK;
+	struct vm_area_struct *vma = vmf->vma;
+	bool write = vmf->flags & FAULT_FLAG_WRITE;
+	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
 	pmd_t entry;
 	int i, ret;
 
@@ -2948,8 +2950,8 @@ static int do_set_pmd(struct fault_env *fe, struct page *page)
 	ret = VM_FAULT_FALLBACK;
 	page = compound_head(page);
 
-	fe->ptl = pmd_lock(vma->vm_mm, fe->pmd);
-	if (unlikely(!pmd_none(*fe->pmd)))
+	vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
+	if (unlikely(!pmd_none(*vmf->pmd)))
 		goto out;
 
 	for (i = 0; i < HPAGE_PMD_NR; i++)
@@ -2962,19 +2964,19 @@ static int do_set_pmd(struct fault_env *fe, struct page *page)
 	add_mm_counter(vma->vm_mm, MM_FILEPAGES, HPAGE_PMD_NR);
 	page_add_file_rmap(page, true);
 
-	set_pmd_at(vma->vm_mm, haddr, fe->pmd, entry);
+	set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry);
 
-	update_mmu_cache_pmd(vma, haddr, fe->pmd);
+	update_mmu_cache_pmd(vma, haddr, vmf->pmd);
 
 	/* fault is handled */
 	ret = 0;
 	count_vm_event(THP_FILE_MAPPED);
 out:
-	spin_unlock(fe->ptl);
+	spin_unlock(vmf->ptl);
 	return ret;
 }
 #else
-static int do_set_pmd(struct fault_env *fe, struct page *page)
+static int do_set_pmd(struct vm_fault *vmf, struct page *page)
 {
 	BUILD_BUG();
 	return 0;
@@ -2985,41 +2987,42 @@ static int do_set_pmd(struct fault_env *fe, struct page *page)
  * alloc_set_pte - setup new PTE entry for given page and add reverse page
  * mapping. If needed, the function allocates a page table or uses a pre-allocated one.
  *
- * @fe: fault environment
+ * @vmf: fault environment
  * @memcg: memcg to charge page (only for private mappings)
  * @page: page to map
  *
- * Caller must take care of unlocking fe->ptl, if fe->pte is non-NULL on return.
+ * Caller must take care of unlocking vmf->ptl, if vmf->pte is non-NULL on
+ * return.
  *
  * Target users are page handler itself and implementations of
  * vm_ops->map_pages.
  */
-int alloc_set_pte(struct fault_env *fe, struct mem_cgroup *memcg,
+int alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
 		struct page *page)
 {
-	struct vm_area_struct *vma = fe->vma;
-	bool write = fe->flags & FAULT_FLAG_WRITE;
+	struct vm_area_struct *vma = vmf->vma;
+	bool write = vmf->flags & FAULT_FLAG_WRITE;
 	pte_t entry;
 	int ret;
 
-	if (pmd_none(*fe->pmd) && PageTransCompound(page) &&
+	if (pmd_none(*vmf->pmd) && PageTransCompound(page) &&
 			IS_ENABLED(CONFIG_TRANSPARENT_HUGE_PAGECACHE)) {
 		/* THP on COW? */
 		VM_BUG_ON_PAGE(memcg, page);
 
-		ret = do_set_pmd(fe, page);
+		ret = do_set_pmd(vmf, page);
 		if (ret != VM_FAULT_FALLBACK)
 			return ret;
 	}
 
-	if (!fe->pte) {
-		ret = pte_alloc_one_map(fe);
+	if (!vmf->pte) {
+		ret = pte_alloc_one_map(vmf);
 		if (ret)
 			return ret;
 	}
 
 	/* Re-check under ptl */
-	if (unlikely(!pte_none(*fe->pte)))
+	if (unlikely(!pte_none(*vmf->pte)))
 		return VM_FAULT_NOPAGE;
 
 	flush_icache_page(vma, page);
@@ -3029,17 +3032,17 @@ int alloc_set_pte(struct fault_env *fe, struct mem_cgroup *memcg,
 	/* copy-on-write page */
 	if (write && !(vma->vm_flags & VM_SHARED)) {
 		inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
-		page_add_new_anon_rmap(page, vma, fe->address, false);
+		page_add_new_anon_rmap(page, vma, vmf->address, false);
 		mem_cgroup_commit_charge(page, memcg, false, false);
 		lru_cache_add_active_or_unevictable(page, vma);
 	} else {
 		inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page));
 		page_add_file_rmap(page, false);
 	}
-	set_pte_at(vma->vm_mm, fe->address, fe->pte, entry);
+	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
 
 	/* no need to invalidate: a not-present page won't be cached */
-	update_mmu_cache(vma, fe->address, fe->pte);
+	update_mmu_cache(vma, vmf->address, vmf->pte);
 
 	return 0;
 }
@@ -3108,17 +3111,17 @@ late_initcall(fault_around_debugfs);
  * fault_around_pages() value (and therefore to page order).  This way it's
  * easier to guarantee that we don't cross page table boundaries.
  */
-static int do_fault_around(struct fault_env *fe, pgoff_t start_pgoff)
+static int do_fault_around(struct vm_fault *vmf, pgoff_t start_pgoff)
 {
-	unsigned long address = fe->address, nr_pages, mask;
+	unsigned long address = vmf->address, nr_pages, mask;
 	pgoff_t end_pgoff;
 	int off, ret = 0;
 
 	nr_pages = READ_ONCE(fault_around_bytes) >> PAGE_SHIFT;
 	mask = ~(nr_pages * PAGE_SIZE - 1) & PAGE_MASK;
 
-	fe->address = max(address & mask, fe->vma->vm_start);
-	off = ((address - fe->address) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
+	vmf->address = max(address & mask, vmf->vma->vm_start);
+	off = ((address - vmf->address) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
 	start_pgoff -= off;
 
 	/*
@@ -3126,49 +3129,51 @@ static int do_fault_around(struct fault_env *fe, pgoff_t start_pgoff)
 	 *  or fault_around_pages() from start_pgoff, depending on what is nearest.
 	 */
 	end_pgoff = start_pgoff -
-		((fe->address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)) +
+		((vmf->address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)) +
 		PTRS_PER_PTE - 1;
-	end_pgoff = min3(end_pgoff, vma_pages(fe->vma) + fe->vma->vm_pgoff - 1,
+	end_pgoff = min3(end_pgoff,
+			vma_pages(vmf->vma) + vmf->vma->vm_pgoff - 1,
 			start_pgoff + nr_pages - 1);
 
-	if (pmd_none(*fe->pmd)) {
-		fe->prealloc_pte = pte_alloc_one(fe->vma->vm_mm, fe->address);
-		if (!fe->prealloc_pte)
+	if (pmd_none(*vmf->pmd)) {
+		vmf->prealloc_pte = pte_alloc_one(vmf->vma->vm_mm,
+						  vmf->address);
+		if (!vmf->prealloc_pte)
 			goto out;
 		smp_wmb(); /* See comment in __pte_alloc() */
 	}
 
-	fe->vma->vm_ops->map_pages(fe, start_pgoff, end_pgoff);
+	vmf->vma->vm_ops->map_pages(vmf, start_pgoff, end_pgoff);
 
 	/* preallocated pagetable is unused: free it */
-	if (fe->prealloc_pte) {
-		pte_free(fe->vma->vm_mm, fe->prealloc_pte);
-		fe->prealloc_pte = 0;
+	if (vmf->prealloc_pte) {
+		pte_free(vmf->vma->vm_mm, vmf->prealloc_pte);
+		vmf->prealloc_pte = 0;
 	}
 	/* Huge page is mapped? Page fault is solved */
-	if (pmd_trans_huge(*fe->pmd)) {
+	if (pmd_trans_huge(*vmf->pmd)) {
 		ret = VM_FAULT_NOPAGE;
 		goto out;
 	}
 
 	/* ->map_pages() hasn't done anything useful. Cold page cache? */
-	if (!fe->pte)
+	if (!vmf->pte)
 		goto out;
 
 	/* check if the page fault is solved */
-	fe->pte -= (fe->address >> PAGE_SHIFT) - (address >> PAGE_SHIFT);
-	if (!pte_none(*fe->pte))
+	vmf->pte -= (vmf->address >> PAGE_SHIFT) - (address >> PAGE_SHIFT);
+	if (!pte_none(*vmf->pte))
 		ret = VM_FAULT_NOPAGE;
-	pte_unmap_unlock(fe->pte, fe->ptl);
+	pte_unmap_unlock(vmf->pte, vmf->ptl);
 out:
-	fe->address = address;
-	fe->pte = NULL;
+	vmf->address = address;
+	vmf->pte = NULL;
 	return ret;
 }
 
-static int do_read_fault(struct fault_env *fe, pgoff_t pgoff)
+static int do_read_fault(struct vm_fault *vmf, pgoff_t pgoff)
 {
-	struct vm_area_struct *vma = fe->vma;
+	struct vm_area_struct *vma = vmf->vma;
 	struct page *fault_page;
 	int ret = 0;
 
@@ -3178,27 +3183,27 @@ static int do_read_fault(struct fault_env *fe, pgoff_t pgoff)
 	 * something).
 	 */
 	if (vma->vm_ops->map_pages && fault_around_bytes >> PAGE_SHIFT > 1) {
-		ret = do_fault_around(fe, pgoff);
+		ret = do_fault_around(vmf, pgoff);
 		if (ret)
 			return ret;
 	}
 
-	ret = __do_fault(fe, pgoff, NULL, &fault_page, NULL);
+	ret = __do_fault(vmf, pgoff, NULL, &fault_page, NULL);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		return ret;
 
-	ret |= alloc_set_pte(fe, NULL, fault_page);
-	if (fe->pte)
-		pte_unmap_unlock(fe->pte, fe->ptl);
+	ret |= alloc_set_pte(vmf, NULL, fault_page);
+	if (vmf->pte)
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
 	unlock_page(fault_page);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		put_page(fault_page);
 	return ret;
 }
 
-static int do_cow_fault(struct fault_env *fe, pgoff_t pgoff)
+static int do_cow_fault(struct vm_fault *vmf, pgoff_t pgoff)
 {
-	struct vm_area_struct *vma = fe->vma;
+	struct vm_area_struct *vma = vmf->vma;
 	struct page *fault_page, *new_page;
 	void *fault_entry;
 	struct mem_cgroup *memcg;
@@ -3207,7 +3212,7 @@ static int do_cow_fault(struct fault_env *fe, pgoff_t pgoff)
 	if (unlikely(anon_vma_prepare(vma)))
 		return VM_FAULT_OOM;
 
-	new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, fe->address);
+	new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vmf->address);
 	if (!new_page)
 		return VM_FAULT_OOM;
 
@@ -3217,17 +3222,17 @@ static int do_cow_fault(struct fault_env *fe, pgoff_t pgoff)
 		return VM_FAULT_OOM;
 	}
 
-	ret = __do_fault(fe, pgoff, new_page, &fault_page, &fault_entry);
+	ret = __do_fault(vmf, pgoff, new_page, &fault_page, &fault_entry);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		goto uncharge_out;
 
 	if (!(ret & VM_FAULT_DAX_LOCKED))
-		copy_user_highpage(new_page, fault_page, fe->address, vma);
+		copy_user_highpage(new_page, fault_page, vmf->address, vma);
 	__SetPageUptodate(new_page);
 
-	ret |= alloc_set_pte(fe, memcg, new_page);
-	if (fe->pte)
-		pte_unmap_unlock(fe->pte, fe->ptl);
+	ret |= alloc_set_pte(vmf, memcg, new_page);
+	if (vmf->pte)
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
 	if (!(ret & VM_FAULT_DAX_LOCKED)) {
 		unlock_page(fault_page);
 		put_page(fault_page);
@@ -3243,15 +3248,15 @@ uncharge_out:
 	return ret;
 }
 
-static int do_shared_fault(struct fault_env *fe, pgoff_t pgoff)
+static int do_shared_fault(struct vm_fault *vmf, pgoff_t pgoff)
 {
-	struct vm_area_struct *vma = fe->vma;
+	struct vm_area_struct *vma = vmf->vma;
 	struct page *fault_page;
 	struct address_space *mapping;
 	int dirtied = 0;
 	int ret, tmp;
 
-	ret = __do_fault(fe, pgoff, NULL, &fault_page, NULL);
+	ret = __do_fault(vmf, pgoff, NULL, &fault_page, NULL);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		return ret;
 
@@ -3261,7 +3266,7 @@ static int do_shared_fault(struct fault_env *fe, pgoff_t pgoff)
 	 */
 	if (vma->vm_ops->page_mkwrite) {
 		unlock_page(fault_page);
-		tmp = do_page_mkwrite(vma, fault_page, fe->address);
+		tmp = do_page_mkwrite(vma, fault_page, vmf->address);
 		if (unlikely(!tmp ||
 				(tmp & (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
 			put_page(fault_page);
@@ -3269,9 +3274,9 @@ static int do_shared_fault(struct fault_env *fe, pgoff_t pgoff)
 		}
 	}
 
-	ret |= alloc_set_pte(fe, NULL, fault_page);
-	if (fe->pte)
-		pte_unmap_unlock(fe->pte, fe->ptl);
+	ret |= alloc_set_pte(vmf, NULL, fault_page);
+	if (vmf->pte)
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE |
 					VM_FAULT_RETRY))) {
 		unlock_page(fault_page);
@@ -3309,19 +3314,19 @@ static int do_shared_fault(struct fault_env *fe, pgoff_t pgoff)
  * The mmap_sem may have been released depending on flags and our
  * return value.  See filemap_fault() and __lock_page_or_retry().
  */
-static int do_fault(struct fault_env *fe)
+static int do_fault(struct vm_fault *vmf)
 {
-	struct vm_area_struct *vma = fe->vma;
-	pgoff_t pgoff = linear_page_index(vma, fe->address);
+	struct vm_area_struct *vma = vmf->vma;
+	pgoff_t pgoff = linear_page_index(vma, vmf->address);
 
 	/* The VMA was not fully populated on mmap() or missing VM_DONTEXPAND */
 	if (!vma->vm_ops->fault)
 		return VM_FAULT_SIGBUS;
-	if (!(fe->flags & FAULT_FLAG_WRITE))
-		return do_read_fault(fe, pgoff);
+	if (!(vmf->flags & FAULT_FLAG_WRITE))
+		return do_read_fault(vmf, pgoff);
 	if (!(vma->vm_flags & VM_SHARED))
-		return do_cow_fault(fe, pgoff);
-	return do_shared_fault(fe, pgoff);
+		return do_cow_fault(vmf, pgoff);
+	return do_shared_fault(vmf, pgoff);
 }
 
 static int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
@@ -3339,9 +3344,9 @@ static int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
 	return mpol_misplaced(page, vma, addr);
 }
 
-static int do_numa_page(struct fault_env *fe, pte_t pte)
+static int do_numa_page(struct vm_fault *vmf, pte_t pte)
 {
-	struct vm_area_struct *vma = fe->vma;
+	struct vm_area_struct *vma = vmf->vma;
 	struct page *page = NULL;
 	int page_nid = -1;
 	int last_cpupid;
@@ -3359,10 +3364,10 @@ static int do_numa_page(struct fault_env *fe, pte_t pte)
 	* page table entry is not accessible, so there would be no
 	* concurrent hardware modifications to the PTE.
 	*/
-	fe->ptl = pte_lockptr(vma->vm_mm, fe->pmd);
-	spin_lock(fe->ptl);
-	if (unlikely(!pte_same(*fe->pte, pte))) {
-		pte_unmap_unlock(fe->pte, fe->ptl);
+	vmf->ptl = pte_lockptr(vma->vm_mm, vmf->pmd);
+	spin_lock(vmf->ptl);
+	if (unlikely(!pte_same(*vmf->pte, pte))) {
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
 		goto out;
 	}
 
@@ -3371,18 +3376,18 @@ static int do_numa_page(struct fault_env *fe, pte_t pte)
 	pte = pte_mkyoung(pte);
 	if (was_writable)
 		pte = pte_mkwrite(pte);
-	set_pte_at(vma->vm_mm, fe->address, fe->pte, pte);
-	update_mmu_cache(vma, fe->address, fe->pte);
+	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
+	update_mmu_cache(vma, vmf->address, vmf->pte);
 
-	page = vm_normal_page(vma, fe->address, pte);
+	page = vm_normal_page(vma, vmf->address, pte);
 	if (!page) {
-		pte_unmap_unlock(fe->pte, fe->ptl);
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
 		return 0;
 	}
 
 	/* TODO: handle PTE-mapped THP */
 	if (PageCompound(page)) {
-		pte_unmap_unlock(fe->pte, fe->ptl);
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
 		return 0;
 	}
 
@@ -3406,9 +3411,9 @@ static int do_numa_page(struct fault_env *fe, pte_t pte)
 
 	last_cpupid = page_cpupid_last(page);
 	page_nid = page_to_nid(page);
-	target_nid = numa_migrate_prep(page, vma, fe->address, page_nid,
+	target_nid = numa_migrate_prep(page, vma, vmf->address, page_nid,
 			&flags);
-	pte_unmap_unlock(fe->pte, fe->ptl);
+	pte_unmap_unlock(vmf->pte, vmf->ptl);
 	if (target_nid == -1) {
 		put_page(page);
 		goto out;
@@ -3428,28 +3433,28 @@ out:
 	return 0;
 }
 
-static int create_huge_pmd(struct fault_env *fe)
+static int create_huge_pmd(struct vm_fault *vmf)
 {
-	struct vm_area_struct *vma = fe->vma;
+	struct vm_area_struct *vma = vmf->vma;
 	if (vma_is_anonymous(vma))
-		return do_huge_pmd_anonymous_page(fe);
+		return do_huge_pmd_anonymous_page(vmf);
 	if (vma->vm_ops->pmd_fault)
-		return vma->vm_ops->pmd_fault(vma, fe->address, fe->pmd,
-				fe->flags);
+		return vma->vm_ops->pmd_fault(vma, vmf->address, vmf->pmd,
+				vmf->flags);
 	return VM_FAULT_FALLBACK;
 }
 
-static int wp_huge_pmd(struct fault_env *fe, pmd_t orig_pmd)
+static int wp_huge_pmd(struct vm_fault *vmf, pmd_t orig_pmd)
 {
-	if (vma_is_anonymous(fe->vma))
-		return do_huge_pmd_wp_page(fe, orig_pmd);
-	if (fe->vma->vm_ops->pmd_fault)
-		return fe->vma->vm_ops->pmd_fault(fe->vma, fe->address, fe->pmd,
-				fe->flags);
+	if (vma_is_anonymous(vmf->vma))
+		return do_huge_pmd_wp_page(vmf, orig_pmd);
+	if (vmf->vma->vm_ops->pmd_fault)
+		return vmf->vma->vm_ops->pmd_fault(vmf->vma, vmf->address,
+				vmf->pmd, vmf->flags);
 
 	/* COW handled on pte level: split pmd */
-	VM_BUG_ON_VMA(fe->vma->vm_flags & VM_SHARED, fe->vma);
-	split_huge_pmd(fe->vma, fe->pmd, fe->address);
+	VM_BUG_ON_VMA(vmf->vma->vm_flags & VM_SHARED, vmf->vma);
+	split_huge_pmd(vmf->vma, vmf->pmd, vmf->address);
 
 	return VM_FAULT_FALLBACK;
 }
@@ -3474,21 +3479,21 @@ static inline bool vma_is_accessible(struct vm_area_struct *vma)
  * The mmap_sem may have been released depending on flags and our return value.
  * See filemap_fault() and __lock_page_or_retry().
  */
-static int handle_pte_fault(struct fault_env *fe)
+static int handle_pte_fault(struct vm_fault *vmf)
 {
 	pte_t entry;
 
-	if (unlikely(pmd_none(*fe->pmd))) {
+	if (unlikely(pmd_none(*vmf->pmd))) {
 		/*
 		 * Leave __pte_alloc() until later: because vm_ops->fault may
 		 * want to allocate huge page, and if we expose page table
 		 * for an instant, it will be difficult to retract from
 		 * concurrent faults and from rmap lookups.
 		 */
-		fe->pte = NULL;
+		vmf->pte = NULL;
 	} else {
 		/* See comment in pte_alloc_one_map() */
-		if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))
+		if (pmd_trans_unstable(vmf->pmd) || pmd_devmap(*vmf->pmd))
 			return 0;
 		/*
 		 * A regular pmd is established and it can't morph into a huge
@@ -3496,9 +3501,9 @@ static int handle_pte_fault(struct fault_env *fe)
 		 * mmap_sem read mode and khugepaged takes it in write mode.
 		 * So now it's safe to run pte_offset_map().
 		 */
-		fe->pte = pte_offset_map(fe->pmd, fe->address);
+		vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
 
-		entry = *fe->pte;
+		entry = *vmf->pte;
 
 		/*
 		 * some architectures can have larger ptes than wordsize,
@@ -3510,37 +3515,37 @@ static int handle_pte_fault(struct fault_env *fe)
 		 */
 		barrier();
 		if (pte_none(entry)) {
-			pte_unmap(fe->pte);
-			fe->pte = NULL;
+			pte_unmap(vmf->pte);
+			vmf->pte = NULL;
 		}
 	}
 
-	if (!fe->pte) {
-		if (vma_is_anonymous(fe->vma))
-			return do_anonymous_page(fe);
+	if (!vmf->pte) {
+		if (vma_is_anonymous(vmf->vma))
+			return do_anonymous_page(vmf);
 		else
-			return do_fault(fe);
+			return do_fault(vmf);
 	}
 
 	if (!pte_present(entry))
-		return do_swap_page(fe, entry);
+		return do_swap_page(vmf, entry);
 
-	if (pte_protnone(entry) && vma_is_accessible(fe->vma))
-		return do_numa_page(fe, entry);
+	if (pte_protnone(entry) && vma_is_accessible(vmf->vma))
+		return do_numa_page(vmf, entry);
 
-	fe->ptl = pte_lockptr(fe->vma->vm_mm, fe->pmd);
-	spin_lock(fe->ptl);
-	if (unlikely(!pte_same(*fe->pte, entry)))
+	vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
+	spin_lock(vmf->ptl);
+	if (unlikely(!pte_same(*vmf->pte, entry)))
 		goto unlock;
-	if (fe->flags & FAULT_FLAG_WRITE) {
+	if (vmf->flags & FAULT_FLAG_WRITE) {
 		if (!pte_write(entry))
-			return do_wp_page(fe, entry);
+			return do_wp_page(vmf, entry);
 		entry = pte_mkdirty(entry);
 	}
 	entry = pte_mkyoung(entry);
-	if (ptep_set_access_flags(fe->vma, fe->address, fe->pte, entry,
-				fe->flags & FAULT_FLAG_WRITE)) {
-		update_mmu_cache(fe->vma, fe->address, fe->pte);
+	if (ptep_set_access_flags(vmf->vma, vmf->address, vmf->pte, entry,
+				vmf->flags & FAULT_FLAG_WRITE)) {
+		update_mmu_cache(vmf->vma, vmf->address, vmf->pte);
 	} else {
 		/*
 		 * This is needed only for protection faults but the arch code
@@ -3548,11 +3553,11 @@ static int handle_pte_fault(struct fault_env *fe)
 		 * This still avoids useless tlb flushes for .text page faults
 		 * with threads.
 		 */
-		if (fe->flags & FAULT_FLAG_WRITE)
-			flush_tlb_fix_spurious_fault(fe->vma, fe->address);
+		if (vmf->flags & FAULT_FLAG_WRITE)
+			flush_tlb_fix_spurious_fault(vmf->vma, vmf->address);
 	}
 unlock:
-	pte_unmap_unlock(fe->pte, fe->ptl);
+	pte_unmap_unlock(vmf->pte, vmf->ptl);
 	return 0;
 }
 
@@ -3565,7 +3570,7 @@ unlock:
 static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 		unsigned int flags)
 {
-	struct fault_env fe = {
+	struct vm_fault vmf = {
 		.vma = vma,
 		.address = address,
 		.flags = flags,
@@ -3578,35 +3583,35 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 	pud = pud_alloc(mm, pgd, address);
 	if (!pud)
 		return VM_FAULT_OOM;
-	fe.pmd = pmd_alloc(mm, pud, address);
-	if (!fe.pmd)
+	vmf.pmd = pmd_alloc(mm, pud, address);
+	if (!vmf.pmd)
 		return VM_FAULT_OOM;
-	if (pmd_none(*fe.pmd) && transparent_hugepage_enabled(vma)) {
-		int ret = create_huge_pmd(&fe);
+	if (pmd_none(*vmf.pmd) && transparent_hugepage_enabled(vma)) {
+		int ret = create_huge_pmd(&vmf);
 		if (!(ret & VM_FAULT_FALLBACK))
 			return ret;
 	} else {
-		pmd_t orig_pmd = *fe.pmd;
+		pmd_t orig_pmd = *vmf.pmd;
 		int ret;
 
 		barrier();
 		if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
 			if (pmd_protnone(orig_pmd) && vma_is_accessible(vma))
-				return do_huge_pmd_numa_page(&fe, orig_pmd);
+				return do_huge_pmd_numa_page(&vmf, orig_pmd);
 
-			if ((fe.flags & FAULT_FLAG_WRITE) &&
+			if ((vmf.flags & FAULT_FLAG_WRITE) &&
 					!pmd_write(orig_pmd)) {
-				ret = wp_huge_pmd(&fe, orig_pmd);
+				ret = wp_huge_pmd(&vmf, orig_pmd);
 				if (!(ret & VM_FAULT_FALLBACK))
 					return ret;
 			} else {
-				huge_pmd_set_accessed(&fe, orig_pmd);
+				huge_pmd_set_accessed(&vmf, orig_pmd);
 				return 0;
 			}
 		}
 	}
 
-	return handle_pte_fault(&fe);
+	return handle_pte_fault(&vmf);
 }
 
 /*
diff --git a/mm/nommu.c b/mm/nommu.c
index 95daf81a4855..24965ffde318 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1809,7 +1809,7 @@ int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 }
 EXPORT_SYMBOL(filemap_fault);
 
-void filemap_map_pages(struct fault_env *fe,
+void filemap_map_pages(struct vm_fault *vmf,
 		pgoff_t start_pgoff, pgoff_t end_pgoff)
 {
 	BUG();
-- 
2.6.6
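
For orientation, the struct vm_fault used throughout this patch is the old
struct fault_env merged with the fields that were already being passed to the
vm_ops handlers. A rough sketch of the consolidated structure, reconstructed
only from the field accesses visible in the hunks above (the authoritative
definition lives in include/linux/mm.h):

	struct vm_fault {
		struct vm_area_struct *vma;	/* Target VMA */
		unsigned int flags;		/* FAULT_FLAG_xxx flags */
		unsigned long address;		/* Faulting virtual address */
		pmd_t *pmd;			/* Pointer to pmd entry */
		pte_t *pte;			/* Pointer to pte entry matching
						 * address, or NULL */
		spinlock_t *ptl;		/* Page table lock */
		pgtable_t prealloc_pte;		/* Pre-allocated pte page table */
		/* ... plus the members handed to vm_ops handlers: page,
		 * pgoff, virtual_address, gfp_mask, cow_page, entry ... */
	};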


* [PATCH 03/20] mm: Use pgoff in struct vm_fault instead of passing it separately
  2016-09-27 16:08 [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches Jan Kara
  2016-09-27 16:08 ` [PATCH 01/20] mm: Change type of vmf->virtual_address Jan Kara
@ 2016-09-27 16:08   ` Jan Kara
                     ` (18 subsequent siblings)
  20 siblings, 0 replies; 130+ messages in thread
From: Jan Kara @ 2016-09-27 16:08 UTC (permalink / raw)
  To: linux-mm; +Cc: Jan Kara, linux-nvdimm, linux-fsdevel, Kirill A. Shutemov

struct vm_fault already has a pgoff entry. Use it instead of passing pgoff
as a separate argument and then assigning it later.
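
Schematically, the calling convention of the fault helpers changes as follows
(a sketch using do_read_fault() as a stand-in; do_cow_fault(),
do_shared_fault(), do_fault_around() and __do_fault() change the same way):

	/* before: do_fault() computed pgoff and threaded it through */
	static int do_read_fault(struct vm_fault *vmf, pgoff_t pgoff);

	/* after: pgoff is filled in once when vmf is initialized in
	 * __handle_mm_fault() and read as vmf->pgoff where needed */
	static int do_read_fault(struct vm_fault *vmf);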

Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/memory.c | 35 ++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)
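
The offset itself is still derived with linear_page_index(), now called once
per fault in __handle_mm_fault(). For ordinary (non-hugetlb) VMAs that helper
boils down to the following - a sketch of its semantics rather than the exact
kernel implementation:

	/* illustrative only; the real linear_page_index() also
	 * special-cases hugetlb VMAs */
	static inline pgoff_t example_linear_page_index(struct vm_area_struct *vma,
							unsigned long address)
	{
		/* offset of the VMA in the file, in pages, plus the page
		 * distance of the faulting address from vma->vm_start */
		return ((address - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
	}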

diff --git a/mm/memory.c b/mm/memory.c
index 447a1ef4a9e3..4c2ec9a9d8af 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2275,7 +2275,7 @@ static int wp_pfn_shared(struct vm_fault *vmf, pte_t orig_pte)
 	if (vma->vm_ops && vma->vm_ops->pfn_mkwrite) {
 		struct vm_fault vmf2 = {
 			.page = NULL,
-			.pgoff = linear_page_index(vma, vmf->address),
+			.pgoff = vmf->pgoff,
 			.virtual_address = vmf->address & PAGE_MASK,
 			.flags = FAULT_FLAG_WRITE | FAULT_FLAG_MKWRITE,
 		};
@@ -2844,15 +2844,15 @@ oom:
  * released depending on flags and vma->vm_ops->fault() return value.
  * See filemap_fault() and __lock_page_or_retry().
  */
-static int __do_fault(struct vm_fault *vmf, pgoff_t pgoff,
-		struct page *cow_page, struct page **page, void **entry)
+static int __do_fault(struct vm_fault *vmf, struct page *cow_page,
+		      struct page **page, void **entry)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct vm_fault vmf2;
 	int ret;
 
 	vmf2.virtual_address = vmf->address & PAGE_MASK;
-	vmf2.pgoff = pgoff;
+	vmf2.pgoff = vmf->pgoff;
 	vmf2.flags = vmf->flags;
 	vmf2.page = NULL;
 	vmf2.gfp_mask = __get_fault_gfp_mask(vma);
@@ -3111,9 +3111,10 @@ late_initcall(fault_around_debugfs);
  * fault_around_pages() value (and therefore to page order).  This way it's
  * easier to guarantee that we don't cross page table boundaries.
  */
-static int do_fault_around(struct vm_fault *vmf, pgoff_t start_pgoff)
+static int do_fault_around(struct vm_fault *vmf)
 {
 	unsigned long address = vmf->address, nr_pages, mask;
+	pgoff_t start_pgoff = vmf->pgoff;
 	pgoff_t end_pgoff;
 	int off, ret = 0;
 
@@ -3171,7 +3172,7 @@ out:
 	return ret;
 }
 
-static int do_read_fault(struct vm_fault *vmf, pgoff_t pgoff)
+static int do_read_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct page *fault_page;
@@ -3183,12 +3184,12 @@ static int do_read_fault(struct vm_fault *vmf, pgoff_t pgoff)
 	 * something).
 	 */
 	if (vma->vm_ops->map_pages && fault_around_bytes >> PAGE_SHIFT > 1) {
-		ret = do_fault_around(vmf, pgoff);
+		ret = do_fault_around(vmf);
 		if (ret)
 			return ret;
 	}
 
-	ret = __do_fault(vmf, pgoff, NULL, &fault_page, NULL);
+	ret = __do_fault(vmf, NULL, &fault_page, NULL);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		return ret;
 
@@ -3201,7 +3202,7 @@ static int do_read_fault(struct vm_fault *vmf, pgoff_t pgoff)
 	return ret;
 }
 
-static int do_cow_fault(struct vm_fault *vmf, pgoff_t pgoff)
+static int do_cow_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct page *fault_page, *new_page;
@@ -3222,7 +3223,7 @@ static int do_cow_fault(struct vm_fault *vmf, pgoff_t pgoff)
 		return VM_FAULT_OOM;
 	}
 
-	ret = __do_fault(vmf, pgoff, new_page, &fault_page, &fault_entry);
+	ret = __do_fault(vmf, new_page, &fault_page, &fault_entry);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		goto uncharge_out;
 
@@ -3237,7 +3238,7 @@ static int do_cow_fault(struct vm_fault *vmf, pgoff_t pgoff)
 		unlock_page(fault_page);
 		put_page(fault_page);
 	} else {
-		dax_unlock_mapping_entry(vma->vm_file->f_mapping, pgoff);
+		dax_unlock_mapping_entry(vma->vm_file->f_mapping, vmf->pgoff);
 	}
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		goto uncharge_out;
@@ -3248,7 +3249,7 @@ uncharge_out:
 	return ret;
 }
 
-static int do_shared_fault(struct vm_fault *vmf, pgoff_t pgoff)
+static int do_shared_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct page *fault_page;
@@ -3256,7 +3257,7 @@ static int do_shared_fault(struct vm_fault *vmf, pgoff_t pgoff)
 	int dirtied = 0;
 	int ret, tmp;
 
-	ret = __do_fault(vmf, pgoff, NULL, &fault_page, NULL);
+	ret = __do_fault(vmf, NULL, &fault_page, NULL);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		return ret;
 
@@ -3317,16 +3318,15 @@ static int do_shared_fault(struct vm_fault *vmf, pgoff_t pgoff)
 static int do_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
-	pgoff_t pgoff = linear_page_index(vma, vmf->address);
 
 	/* The VMA was not fully populated on mmap() or missing VM_DONTEXPAND */
 	if (!vma->vm_ops->fault)
 		return VM_FAULT_SIGBUS;
 	if (!(vmf->flags & FAULT_FLAG_WRITE))
-		return do_read_fault(vmf, pgoff);
+		return do_read_fault(vmf);
 	if (!(vma->vm_flags & VM_SHARED))
-		return do_cow_fault(vmf, pgoff);
-	return do_shared_fault(vmf, pgoff);
+		return do_cow_fault(vmf);
+	return do_shared_fault(vmf);
 }
 
 static int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
@@ -3574,6 +3574,7 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 		.vma = vma,
 		.address = address,
 		.flags = flags,
+		.pgoff = linear_page_index(vma, address),
 	};
 	struct mm_struct *mm = vma->vm_mm;
 	pgd_t *pgd;
-- 
2.6.6

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 03/20] mm: Use pgoff in struct vm_fault instead of passing it separately
@ 2016-09-27 16:08   ` Jan Kara
  0 siblings, 0 replies; 130+ messages in thread
From: Jan Kara @ 2016-09-27 16:08 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-fsdevel, linux-nvdimm, Dan Williams, Ross Zwisler,
	Kirill A. Shutemov, Jan Kara

struct vm_fault has already pgoff entry. Use it instead of passing pgoff
as a separate argument and then assigning it later.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/memory.c | 35 ++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 447a1ef4a9e3..4c2ec9a9d8af 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2275,7 +2275,7 @@ static int wp_pfn_shared(struct vm_fault *vmf, pte_t orig_pte)
 	if (vma->vm_ops && vma->vm_ops->pfn_mkwrite) {
 		struct vm_fault vmf2 = {
 			.page = NULL,
-			.pgoff = linear_page_index(vma, vmf->address),
+			.pgoff = vmf->pgoff,
 			.virtual_address = vmf->address & PAGE_MASK,
 			.flags = FAULT_FLAG_WRITE | FAULT_FLAG_MKWRITE,
 		};
@@ -2844,15 +2844,15 @@ oom:
  * released depending on flags and vma->vm_ops->fault() return value.
  * See filemap_fault() and __lock_page_retry().
  */
-static int __do_fault(struct vm_fault *vmf, pgoff_t pgoff,
-		struct page *cow_page, struct page **page, void **entry)
+static int __do_fault(struct vm_fault *vmf, struct page *cow_page,
+		      struct page **page, void **entry)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct vm_fault vmf2;
 	int ret;
 
 	vmf2.virtual_address = vmf->address & PAGE_MASK;
-	vmf2.pgoff = pgoff;
+	vmf2.pgoff = vmf->pgoff;
 	vmf2.flags = vmf->flags;
 	vmf2.page = NULL;
 	vmf2.gfp_mask = __get_fault_gfp_mask(vma);
@@ -3111,9 +3111,10 @@ late_initcall(fault_around_debugfs);
  * fault_around_pages() value (and therefore to page order).  This way it's
  * easier to guarantee that we don't cross page table boundaries.
  */
-static int do_fault_around(struct vm_fault *vmf, pgoff_t start_pgoff)
+static int do_fault_around(struct vm_fault *vmf)
 {
 	unsigned long address = vmf->address, nr_pages, mask;
+	pgoff_t start_pgoff = vmf->pgoff;
 	pgoff_t end_pgoff;
 	int off, ret = 0;
 
@@ -3171,7 +3172,7 @@ out:
 	return ret;
 }
 
-static int do_read_fault(struct vm_fault *vmf, pgoff_t pgoff)
+static int do_read_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct page *fault_page;
@@ -3183,12 +3184,12 @@ static int do_read_fault(struct vm_fault *vmf, pgoff_t pgoff)
 	 * something).
 	 */
 	if (vma->vm_ops->map_pages && fault_around_bytes >> PAGE_SHIFT > 1) {
-		ret = do_fault_around(vmf, pgoff);
+		ret = do_fault_around(vmf);
 		if (ret)
 			return ret;
 	}
 
-	ret = __do_fault(vmf, pgoff, NULL, &fault_page, NULL);
+	ret = __do_fault(vmf, NULL, &fault_page, NULL);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		return ret;
 
@@ -3201,7 +3202,7 @@ static int do_read_fault(struct vm_fault *vmf, pgoff_t pgoff)
 	return ret;
 }
 
-static int do_cow_fault(struct vm_fault *vmf, pgoff_t pgoff)
+static int do_cow_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct page *fault_page, *new_page;
@@ -3222,7 +3223,7 @@ static int do_cow_fault(struct vm_fault *vmf, pgoff_t pgoff)
 		return VM_FAULT_OOM;
 	}
 
-	ret = __do_fault(vmf, pgoff, new_page, &fault_page, &fault_entry);
+	ret = __do_fault(vmf, new_page, &fault_page, &fault_entry);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		goto uncharge_out;
 
@@ -3237,7 +3238,7 @@ static int do_cow_fault(struct vm_fault *vmf, pgoff_t pgoff)
 		unlock_page(fault_page);
 		put_page(fault_page);
 	} else {
-		dax_unlock_mapping_entry(vma->vm_file->f_mapping, pgoff);
+		dax_unlock_mapping_entry(vma->vm_file->f_mapping, vmf->pgoff);
 	}
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		goto uncharge_out;
@@ -3248,7 +3249,7 @@ uncharge_out:
 	return ret;
 }
 
-static int do_shared_fault(struct vm_fault *vmf, pgoff_t pgoff)
+static int do_shared_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct page *fault_page;
@@ -3256,7 +3257,7 @@ static int do_shared_fault(struct vm_fault *vmf, pgoff_t pgoff)
 	int dirtied = 0;
 	int ret, tmp;
 
-	ret = __do_fault(vmf, pgoff, NULL, &fault_page, NULL);
+	ret = __do_fault(vmf, NULL, &fault_page, NULL);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		return ret;
 
@@ -3317,16 +3318,15 @@ static int do_shared_fault(struct vm_fault *vmf, pgoff_t pgoff)
 static int do_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
-	pgoff_t pgoff = linear_page_index(vma, vmf->address);
 
 	/* The VMA was not fully populated on mmap() or missing VM_DONTEXPAND */
 	if (!vma->vm_ops->fault)
 		return VM_FAULT_SIGBUS;
 	if (!(vmf->flags & FAULT_FLAG_WRITE))
-		return do_read_fault(vmf, pgoff);
+		return do_read_fault(vmf);
 	if (!(vma->vm_flags & VM_SHARED))
-		return do_cow_fault(vmf, pgoff);
-	return do_shared_fault(vmf, pgoff);
+		return do_cow_fault(vmf);
+	return do_shared_fault(vmf);
 }
 
 static int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
@@ -3574,6 +3574,7 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 		.vma = vma,
 		.address = address,
 		.flags = flags,
+		.pgoff = linear_page_index(vma, address),
 	};
 	struct mm_struct *mm = vma->vm_mm;
 	pgd_t *pgd;
-- 
2.6.6


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 04/20] mm: Use passed vm_fault structure in __do_fault()
  2016-09-27 16:08 [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches Jan Kara
@ 2016-09-27 16:08   ` Jan Kara
  2016-09-27 16:08 ` [PATCH 02/20] mm: Join struct fault_env and vm_fault Jan Kara
                     ` (19 subsequent siblings)
  20 siblings, 0 replies; 130+ messages in thread
From: Jan Kara @ 2016-09-27 16:08 UTC (permalink / raw)
  To: linux-mm; +Cc: Jan Kara, linux-nvdimm, linux-fsdevel, Kirill A. Shutemov

Instead of creating another vm_fault structure, use the one passed to
__do_fault() for passing arguments into the fault handler.
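
In other words (an illustrative sketch condensed from the diff below,
with error handling elided -- not part of the patch itself):

	/*
	 * Sketch: __do_fault() now forwards the caller's vm_fault
	 * directly, so ->fault() fills vmf->page / vmf->entry in place
	 * instead of a temporary vmf2 being copied field by field.
	 */
	vmf->cow_page = cow_page;
	ret = vma->vm_ops->fault(vma, vmf);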

Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/memory.c | 26 +++++++++++---------------
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 4c2ec9a9d8af..b7f1f535e079 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2848,37 +2848,31 @@ static int __do_fault(struct vm_fault *vmf, struct page *cow_page,
 		      struct page **page, void **entry)
 {
 	struct vm_area_struct *vma = vmf->vma;
-	struct vm_fault vmf2;
 	int ret;
 
-	vmf2.virtual_address = vmf->address & PAGE_MASK;
-	vmf2.pgoff = vmf->pgoff;
-	vmf2.flags = vmf->flags;
-	vmf2.page = NULL;
-	vmf2.gfp_mask = __get_fault_gfp_mask(vma);
-	vmf2.cow_page = cow_page;
+	vmf->cow_page = cow_page;
 
-	ret = vma->vm_ops->fault(vma, &vmf2);
+	ret = vma->vm_ops->fault(vma, vmf);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		return ret;
 	if (ret & VM_FAULT_DAX_LOCKED) {
-		*entry = vmf2.entry;
+		*entry = vmf->entry;
 		return ret;
 	}
 
-	if (unlikely(PageHWPoison(vmf2.page))) {
+	if (unlikely(PageHWPoison(vmf->page))) {
 		if (ret & VM_FAULT_LOCKED)
-			unlock_page(vmf2.page);
-		put_page(vmf2.page);
+			unlock_page(vmf->page);
+		put_page(vmf->page);
 		return VM_FAULT_HWPOISON;
 	}
 
 	if (unlikely(!(ret & VM_FAULT_LOCKED)))
-		lock_page(vmf2.page);
+		lock_page(vmf->page);
 	else
-		VM_BUG_ON_PAGE(!PageLocked(vmf2.page), vmf2.page);
+		VM_BUG_ON_PAGE(!PageLocked(vmf->page), vmf->page);
 
-	*page = vmf2.page;
+	*page = vmf->page;
 	return ret;
 }
 
@@ -3573,8 +3567,10 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 	struct vm_fault vmf = {
 		.vma = vma,
 		.address = address,
+		.virtual_address = address & PAGE_MASK,
 		.flags = flags,
 		.pgoff = linear_page_index(vma, address),
+		.gfp_mask = __get_fault_gfp_mask(vma),
 	};
 	struct mm_struct *mm = vma->vm_mm;
 	pgd_t *pgd;
-- 
2.6.6

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 05/20] mm: Trim __do_fault() arguments
  2016-09-27 16:08 [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches Jan Kara
@ 2016-09-27 16:08   ` Jan Kara
  2016-09-27 16:08 ` [PATCH 02/20] mm: Join struct fault_env and vm_fault Jan Kara
                     ` (19 subsequent siblings)
  20 siblings, 0 replies; 130+ messages in thread
From: Jan Kara @ 2016-09-27 16:08 UTC (permalink / raw)
  To: linux-mm; +Cc: Jan Kara, linux-nvdimm, linux-fsdevel, Kirill A. Shutemov

Use the vm_fault structure to pass cow_page, page, and entry in and out
of the function. That reduces the number of __do_fault() arguments from
four to one.
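
As a rough sketch of the new calling convention (condensed from the
diff below, error handling elided):

	/* do_cow_fault() now passes everything through the vm_fault: */
	vmf->cow_page = new_page;
	ret = __do_fault(vmf);	/* on success, vmf->page is set and locked */
	/* ... */
	copy_user_highpage(new_page, vmf->page, vmf->address, vma);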

Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/memory.c | 53 +++++++++++++++++++++++------------------------------
 1 file changed, 23 insertions(+), 30 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index b7f1f535e079..ba7760fb7db2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2844,26 +2844,22 @@ oom:
  * released depending on flags and vma->vm_ops->fault() return value.
  * See filemap_fault() and __lock_page_retry().
  */
-static int __do_fault(struct vm_fault *vmf, struct page *cow_page,
-		      struct page **page, void **entry)
+static int __do_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	int ret;
 
-	vmf->cow_page = cow_page;
-
 	ret = vma->vm_ops->fault(vma, vmf);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		return ret;
-	if (ret & VM_FAULT_DAX_LOCKED) {
-		*entry = vmf->entry;
+	if (ret & VM_FAULT_DAX_LOCKED)
 		return ret;
-	}
 
 	if (unlikely(PageHWPoison(vmf->page))) {
 		if (ret & VM_FAULT_LOCKED)
 			unlock_page(vmf->page);
 		put_page(vmf->page);
+		vmf->page = NULL;
 		return VM_FAULT_HWPOISON;
 	}
 
@@ -2872,7 +2868,6 @@ static int __do_fault(struct vm_fault *vmf, struct page *cow_page,
 	else
 		VM_BUG_ON_PAGE(!PageLocked(vmf->page), vmf->page);
 
-	*page = vmf->page;
 	return ret;
 }
 
@@ -3169,7 +3164,6 @@ out:
 static int do_read_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
-	struct page *fault_page;
 	int ret = 0;
 
 	/*
@@ -3183,24 +3177,23 @@ static int do_read_fault(struct vm_fault *vmf)
 			return ret;
 	}
 
-	ret = __do_fault(vmf, NULL, &fault_page, NULL);
+	ret = __do_fault(vmf);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		return ret;
 
-	ret |= alloc_set_pte(vmf, NULL, fault_page);
+	ret |= alloc_set_pte(vmf, NULL, vmf->page);
 	if (vmf->pte)
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
-	unlock_page(fault_page);
+	unlock_page(vmf->page);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
-		put_page(fault_page);
+		put_page(vmf->page);
 	return ret;
 }
 
 static int do_cow_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
-	struct page *fault_page, *new_page;
-	void *fault_entry;
+	struct page *new_page;
 	struct mem_cgroup *memcg;
 	int ret;
 
@@ -3217,20 +3210,21 @@ static int do_cow_fault(struct vm_fault *vmf)
 		return VM_FAULT_OOM;
 	}
 
-	ret = __do_fault(vmf, new_page, &fault_page, &fault_entry);
+	vmf->cow_page = new_page;
+	ret = __do_fault(vmf);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		goto uncharge_out;
 
 	if (!(ret & VM_FAULT_DAX_LOCKED))
-		copy_user_highpage(new_page, fault_page, vmf->address, vma);
+		copy_user_highpage(new_page, vmf->page, vmf->address, vma);
 	__SetPageUptodate(new_page);
 
 	ret |= alloc_set_pte(vmf, memcg, new_page);
 	if (vmf->pte)
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
 	if (!(ret & VM_FAULT_DAX_LOCKED)) {
-		unlock_page(fault_page);
-		put_page(fault_page);
+		unlock_page(vmf->page);
+		put_page(vmf->page);
 	} else {
 		dax_unlock_mapping_entry(vma->vm_file->f_mapping, vmf->pgoff);
 	}
@@ -3246,12 +3240,11 @@ uncharge_out:
 static int do_shared_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
-	struct page *fault_page;
 	struct address_space *mapping;
 	int dirtied = 0;
 	int ret, tmp;
 
-	ret = __do_fault(vmf, NULL, &fault_page, NULL);
+	ret = __do_fault(vmf);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		return ret;
 
@@ -3260,26 +3253,26 @@ static int do_shared_fault(struct vm_fault *vmf)
 	 * about to become writable
 	 */
 	if (vma->vm_ops->page_mkwrite) {
-		unlock_page(fault_page);
-		tmp = do_page_mkwrite(vma, fault_page, vmf->address);
+		unlock_page(vmf->page);
+		tmp = do_page_mkwrite(vma, vmf->page, vmf->address);
 		if (unlikely(!tmp ||
 				(tmp & (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
-			put_page(fault_page);
+			put_page(vmf->page);
 			return tmp;
 		}
 	}
 
-	ret |= alloc_set_pte(vmf, NULL, fault_page);
+	ret |= alloc_set_pte(vmf, NULL, vmf->page);
 	if (vmf->pte)
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE |
 					VM_FAULT_RETRY))) {
-		unlock_page(fault_page);
-		put_page(fault_page);
+		unlock_page(vmf->page);
+		put_page(vmf->page);
 		return ret;
 	}
 
-	if (set_page_dirty(fault_page))
+	if (set_page_dirty(vmf->page))
 		dirtied = 1;
 	/*
 	 * Take a local copy of the address_space - page.mapping may be zeroed
@@ -3287,8 +3280,8 @@ static int do_shared_fault(struct vm_fault *vmf)
 	 * pinned by vma->vm_file's reference.  We rely on unlock_page()'s
 	 * release semantics to prevent the compiler from undoing this copying.
 	 */
-	mapping = page_rmapping(fault_page);
-	unlock_page(fault_page);
+	mapping = page_rmapping(vmf->page);
+	unlock_page(vmf->page);
 	if ((dirtied || vma->vm_ops->page_mkwrite) && mapping) {
 		/*
 		 * Some device drivers do not set page.mapping but still
-- 
2.6.6

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 06/20] mm: Use passed vm_fault structure in wp_pfn_shared()
  2016-09-27 16:08 [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches Jan Kara
  2016-09-27 16:08 ` [PATCH 01/20] mm: Change type of vmf->virtual_address Jan Kara
@ 2016-09-27 16:08   ` Jan Kara
  2016-09-27 16:08   ` Jan Kara
                     ` (18 subsequent siblings)
  20 siblings, 0 replies; 130+ messages in thread
From: Jan Kara @ 2016-09-27 16:08 UTC (permalink / raw)
  To: linux-mm; +Cc: Jan Kara, linux-nvdimm, linux-fsdevel, Kirill A. Shutemov

Instead of creating another vm_fault structure, use the one passed to
wp_pfn_shared() for passing arguments into the pfn_mkwrite handler.
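
The resulting handler call, condensed from the diff below (sketch only,
error handling elided):

	/*
	 * Reuse the caller's vmf instead of building a one-off vmf2;
	 * FAULT_FLAG_WRITE is already set for a write fault, so only
	 * FAULT_FLAG_MKWRITE has to be added before calling the
	 * handler.
	 */
	pte_unmap_unlock(vmf->pte, vmf->ptl);
	vmf->flags |= FAULT_FLAG_MKWRITE;
	ret = vma->vm_ops->pfn_mkwrite(vma, vmf);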

Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/memory.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index ba7760fb7db2..48de8187d7b2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2273,16 +2273,11 @@ static int wp_pfn_shared(struct vm_fault *vmf, pte_t orig_pte)
 	struct vm_area_struct *vma = vmf->vma;
 
 	if (vma->vm_ops && vma->vm_ops->pfn_mkwrite) {
-		struct vm_fault vmf2 = {
-			.page = NULL,
-			.pgoff = vmf->pgoff,
-			.virtual_address = vmf->address & PAGE_MASK,
-			.flags = FAULT_FLAG_WRITE | FAULT_FLAG_MKWRITE,
-		};
 		int ret;
 
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
-		ret = vma->vm_ops->pfn_mkwrite(vma, &vmf2);
+		vmf->flags |= FAULT_FLAG_MKWRITE;
+		ret = vma->vm_ops->pfn_mkwrite(vma, vmf);
 		if (ret & VM_FAULT_ERROR)
 			return ret;
 		vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
-- 
2.6.6

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 07/20] mm: Add orig_pte field into vm_fault
  2016-09-27 16:08 [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches Jan Kara
                   ` (5 preceding siblings ...)
  2016-09-27 16:08   ` Jan Kara
@ 2016-09-27 16:08 ` Jan Kara
  2016-10-17 16:45     ` Ross Zwisler
  2016-09-27 16:08   ` Jan Kara
                   ` (13 subsequent siblings)
  20 siblings, 1 reply; 130+ messages in thread
From: Jan Kara @ 2016-09-27 16:08 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-fsdevel, linux-nvdimm, Dan Williams, Ross Zwisler,
	Kirill A. Shutemov, Jan Kara

Add an orig_pte field to the vm_fault structure so that ->page_mkwrite
handlers can fully handle the fault. This also saves us some passing of
extra arguments around.
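
The pattern, condensed from the diff below (illustrative sketch only):

	/* handle_pte_fault() snapshots the PTE into the vm_fault once: */
	vmf->orig_pte = *vmf->pte;
	barrier();	/* see the comment in handle_pte_fault() */

	/*
	 * ...and helpers such as wp_pfn_shared() recheck against that
	 * snapshot after retaking the PTE lock, instead of receiving
	 * the original PTE as a separate pte_t argument:
	 */
	if (!pte_same(*vmf->pte, vmf->orig_pte)) {
		pte_unmap_unlock(vmf->pte, vmf->ptl);
		return 0;	/* raced with another fault */
	}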

Signed-off-by: Jan Kara <jack@suse.cz>
---
 include/linux/mm.h |  4 +--
 mm/internal.h      |  2 +-
 mm/khugepaged.c    |  5 ++--
 mm/memory.c        | 76 +++++++++++++++++++++++++++---------------------------
 4 files changed, 44 insertions(+), 43 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5fc6daf5242c..c908fd7243ea 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -300,8 +300,8 @@ struct vm_fault {
 	unsigned long virtual_address;	/* Faulting virtual address masked by
 					 * PAGE_MASK */
 	pmd_t *pmd;			/* Pointer to pmd entry matching
-					 * the 'address'
-					 */
+					 * the 'address' */
+	pte_t orig_pte;			/* Value of PTE at the time of fault */
 
 	struct page *cow_page;		/* Handler may choose to COW */
 	struct page *page;		/* ->fault handlers should return a
diff --git a/mm/internal.h b/mm/internal.h
index cc80060914f6..7c7421da5d63 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -36,7 +36,7 @@
 /* Do not use these with a slab allocator */
 #define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK)
 
-int do_swap_page(struct vm_fault *vmf, pte_t orig_pte);
+int do_swap_page(struct vm_fault *vmf);
 
 void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
 		unsigned long floor, unsigned long ceiling);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index f88b2d3810a7..66bc77f2d1d2 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -890,11 +890,12 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
 	vmf.pte = pte_offset_map(pmd, address);
 	for (; vmf.address < address + HPAGE_PMD_NR*PAGE_SIZE;
 			vmf.pte++, vmf.address += PAGE_SIZE) {
-		pteval = *vmf.pte;
+		vmf.orig_pte = *vmf.pte;
+		pteval = vmf.orig_pte;
 		if (!is_swap_pte(pteval))
 			continue;
 		swapped_in++;
-		ret = do_swap_page(&vmf, pteval);
+		ret = do_swap_page(&vmf);
 
 		/* do_swap_page returns VM_FAULT_RETRY with released mmap_sem */
 		if (ret & VM_FAULT_RETRY) {
diff --git a/mm/memory.c b/mm/memory.c
index 48de8187d7b2..0c8779c23925 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2070,8 +2070,8 @@ static int do_page_mkwrite(struct vm_area_struct *vma, struct page *page,
  * case, all we need to do here is to mark the page as writable and update
  * any related book-keeping.
  */
-static inline int wp_page_reuse(struct vm_fault *vmf, pte_t orig_pte,
-			struct page *page, int page_mkwrite, int dirty_shared)
+static inline int wp_page_reuse(struct vm_fault *vmf, struct page *page,
+				int page_mkwrite, int dirty_shared)
 	__releases(vmf->ptl)
 {
 	struct vm_area_struct *vma = vmf->vma;
@@ -2084,8 +2084,8 @@ static inline int wp_page_reuse(struct vm_fault *vmf, pte_t orig_pte,
 	if (page)
 		page_cpupid_xchg_last(page, (1 << LAST_CPUPID_SHIFT) - 1);
 
-	flush_cache_page(vma, vmf->address, pte_pfn(orig_pte));
-	entry = pte_mkyoung(orig_pte);
+	flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
+	entry = pte_mkyoung(vmf->orig_pte);
 	entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 	if (ptep_set_access_flags(vma, vmf->address, vmf->pte, entry, 1))
 		update_mmu_cache(vma, vmf->address, vmf->pte);
@@ -2135,8 +2135,7 @@ static inline int wp_page_reuse(struct vm_fault *vmf, pte_t orig_pte,
  *   held to the old page, as well as updating the rmap.
  * - In any case, unlock the PTL and drop the reference we took to the old page.
  */
-static int wp_page_copy(struct vm_fault *vmf, pte_t orig_pte,
-		struct page *old_page)
+static int wp_page_copy(struct vm_fault *vmf, struct page *old_page)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct mm_struct *mm = vma->vm_mm;
@@ -2150,7 +2149,7 @@ static int wp_page_copy(struct vm_fault *vmf, pte_t orig_pte,
 	if (unlikely(anon_vma_prepare(vma)))
 		goto oom;
 
-	if (is_zero_pfn(pte_pfn(orig_pte))) {
+	if (is_zero_pfn(pte_pfn(vmf->orig_pte))) {
 		new_page = alloc_zeroed_user_highpage_movable(vma,
 							      vmf->address);
 		if (!new_page)
@@ -2174,7 +2173,7 @@ static int wp_page_copy(struct vm_fault *vmf, pte_t orig_pte,
 	 * Re-check the pte - we dropped the lock
 	 */
 	vmf->pte = pte_offset_map_lock(mm, vmf->pmd, vmf->address, &vmf->ptl);
-	if (likely(pte_same(*vmf->pte, orig_pte))) {
+	if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
 		if (old_page) {
 			if (!PageAnon(old_page)) {
 				dec_mm_counter_fast(mm,
@@ -2184,7 +2183,7 @@ static int wp_page_copy(struct vm_fault *vmf, pte_t orig_pte,
 		} else {
 			inc_mm_counter_fast(mm, MM_ANONPAGES);
 		}
-		flush_cache_page(vma, vmf->address, pte_pfn(orig_pte));
+		flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
 		entry = mk_pte(new_page, vma->vm_page_prot);
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 		/*
@@ -2268,7 +2267,7 @@ oom:
  * Handle write page faults for VM_MIXEDMAP or VM_PFNMAP for a VM_SHARED
  * mapping
  */
-static int wp_pfn_shared(struct vm_fault *vmf, pte_t orig_pte)
+static int wp_pfn_shared(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 
@@ -2286,16 +2285,15 @@ static int wp_pfn_shared(struct vm_fault *vmf, pte_t orig_pte)
 		 * We might have raced with another page fault while we
 		 * released the pte_offset_map_lock.
 		 */
-		if (!pte_same(*vmf->pte, orig_pte)) {
+		if (!pte_same(*vmf->pte, vmf->orig_pte)) {
 			pte_unmap_unlock(vmf->pte, vmf->ptl);
 			return 0;
 		}
 	}
-	return wp_page_reuse(vmf, orig_pte, NULL, 0, 0);
+	return wp_page_reuse(vmf, NULL, 0, 0);
 }
 
-static int wp_page_shared(struct vm_fault *vmf, pte_t orig_pte,
-		struct page *old_page)
+static int wp_page_shared(struct vm_fault *vmf, struct page *old_page)
 	__releases(vmf->ptl)
 {
 	struct vm_area_struct *vma = vmf->vma;
@@ -2321,7 +2319,7 @@ static int wp_page_shared(struct vm_fault *vmf, pte_t orig_pte,
 		 */
 		vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
 						vmf->address, &vmf->ptl);
-		if (!pte_same(*vmf->pte, orig_pte)) {
+		if (!pte_same(*vmf->pte, vmf->orig_pte)) {
 			unlock_page(old_page);
 			pte_unmap_unlock(vmf->pte, vmf->ptl);
 			put_page(old_page);
@@ -2330,7 +2328,7 @@ static int wp_page_shared(struct vm_fault *vmf, pte_t orig_pte,
 		page_mkwrite = 1;
 	}
 
-	return wp_page_reuse(vmf, orig_pte, old_page, page_mkwrite, 1);
+	return wp_page_reuse(vmf, old_page, page_mkwrite, 1);
 }
 
 /*
@@ -2351,13 +2349,13 @@ static int wp_page_shared(struct vm_fault *vmf, pte_t orig_pte,
  * but allow concurrent faults), with pte both mapped and locked.
  * We return with mmap_sem still held, but pte unmapped and unlocked.
  */
-static int do_wp_page(struct vm_fault *vmf, pte_t orig_pte)
+static int do_wp_page(struct vm_fault *vmf)
 	__releases(vmf->ptl)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct page *old_page;
 
-	old_page = vm_normal_page(vma, vmf->address, orig_pte);
+	old_page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
 	if (!old_page) {
 		/*
 		 * VM_MIXEDMAP !pfn_valid() case, or VM_SOFTDIRTY clear on a
@@ -2368,10 +2366,10 @@ static int do_wp_page(struct vm_fault *vmf, pte_t orig_pte)
 		 */
 		if ((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
 				     (VM_WRITE|VM_SHARED))
-			return wp_pfn_shared(vmf, orig_pte);
+			return wp_pfn_shared(vmf);
 
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
-		return wp_page_copy(vmf, orig_pte, old_page);
+		return wp_page_copy(vmf, old_page);
 	}
 
 	/*
@@ -2386,7 +2384,7 @@ static int do_wp_page(struct vm_fault *vmf, pte_t orig_pte)
 			lock_page(old_page);
 			vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
 					vmf->address, &vmf->ptl);
-			if (!pte_same(*vmf->pte, orig_pte)) {
+			if (!pte_same(*vmf->pte, vmf->orig_pte)) {
 				unlock_page(old_page);
 				pte_unmap_unlock(vmf->pte, vmf->ptl);
 				put_page(old_page);
@@ -2406,12 +2404,12 @@ static int do_wp_page(struct vm_fault *vmf, pte_t orig_pte)
 				page_move_anon_rmap(old_page, vma);
 			}
 			unlock_page(old_page);
-			return wp_page_reuse(vmf, orig_pte, old_page, 0, 0);
+			return wp_page_reuse(vmf, old_page, 0, 0);
 		}
 		unlock_page(old_page);
 	} else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
 					(VM_WRITE|VM_SHARED))) {
-		return wp_page_shared(vmf, orig_pte, old_page);
+		return wp_page_shared(vmf, old_page);
 	}
 
 	/*
@@ -2420,7 +2418,7 @@ static int do_wp_page(struct vm_fault *vmf, pte_t orig_pte)
 	get_page(old_page);
 
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
-	return wp_page_copy(vmf, orig_pte, old_page);
+	return wp_page_copy(vmf, old_page);
 }
 
 static void unmap_mapping_range_vma(struct vm_area_struct *vma,
@@ -2508,7 +2506,7 @@ EXPORT_SYMBOL(unmap_mapping_range);
  * We return with the mmap_sem locked or unlocked in the same cases
  * as does filemap_fault().
  */
-int do_swap_page(struct vm_fault *vmf, pte_t orig_pte)
+int do_swap_page(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct page *page, *swapcache;
@@ -2519,10 +2517,10 @@ int do_swap_page(struct vm_fault *vmf, pte_t orig_pte)
 	int exclusive = 0;
 	int ret = 0;
 
-	if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, orig_pte))
+	if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf->orig_pte))
 		goto out;
 
-	entry = pte_to_swp_entry(orig_pte);
+	entry = pte_to_swp_entry(vmf->orig_pte);
 	if (unlikely(non_swap_entry(entry))) {
 		if (is_migration_entry(entry)) {
 			migration_entry_wait(vma->vm_mm, vmf->pmd,
@@ -2530,7 +2528,7 @@ int do_swap_page(struct vm_fault *vmf, pte_t orig_pte)
 		} else if (is_hwpoison_entry(entry)) {
 			ret = VM_FAULT_HWPOISON;
 		} else {
-			print_bad_pte(vma, vmf->address, orig_pte, NULL);
+			print_bad_pte(vma, vmf->address, vmf->orig_pte, NULL);
 			ret = VM_FAULT_SIGBUS;
 		}
 		goto out;
@@ -2547,7 +2545,7 @@ int do_swap_page(struct vm_fault *vmf, pte_t orig_pte)
 			 */
 			vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
 					vmf->address, &vmf->ptl);
-			if (likely(pte_same(*vmf->pte, orig_pte)))
+			if (likely(pte_same(*vmf->pte, vmf->orig_pte)))
 				ret = VM_FAULT_OOM;
 			delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
 			goto unlock;
@@ -2604,7 +2602,7 @@ int do_swap_page(struct vm_fault *vmf, pte_t orig_pte)
 	 */
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
 			&vmf->ptl);
-	if (unlikely(!pte_same(*vmf->pte, orig_pte)))
+	if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte)))
 		goto out_nomap;
 
 	if (unlikely(!PageUptodate(page))) {
@@ -2632,9 +2630,10 @@ int do_swap_page(struct vm_fault *vmf, pte_t orig_pte)
 		exclusive = RMAP_EXCLUSIVE;
 	}
 	flush_icache_page(vma, page);
-	if (pte_swp_soft_dirty(orig_pte))
+	if (pte_swp_soft_dirty(vmf->orig_pte))
 		pte = pte_mksoft_dirty(pte);
 	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
+	vmf->orig_pte = pte;
 	if (page == swapcache) {
 		do_page_add_anon_rmap(page, vma, vmf->address, exclusive);
 		mem_cgroup_commit_charge(page, memcg, true, false);
@@ -2664,7 +2663,7 @@ int do_swap_page(struct vm_fault *vmf, pte_t orig_pte)
 	}
 
 	if (vmf->flags & FAULT_FLAG_WRITE) {
-		ret |= do_wp_page(vmf, pte);
+		ret |= do_wp_page(vmf);
 		if (ret & VM_FAULT_ERROR)
 			ret &= VM_FAULT_ERROR;
 		goto out;
@@ -3326,7 +3325,7 @@ static int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
 	return mpol_misplaced(page, vma, addr);
 }
 
-static int do_numa_page(struct vm_fault *vmf, pte_t pte)
+static int do_numa_page(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct page *page = NULL;
@@ -3334,6 +3333,7 @@ static int do_numa_page(struct vm_fault *vmf, pte_t pte)
 	int last_cpupid;
 	int target_nid;
 	bool migrated = false;
+	pte_t pte = vmf->orig_pte;
 	bool was_writable = pte_write(pte);
 	int flags = 0;
 
@@ -3484,8 +3484,7 @@ static int handle_pte_fault(struct vm_fault *vmf)
 		 * So now it's safe to run pte_offset_map().
 		 */
 		vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
-
-		entry = *vmf->pte;
+		vmf->orig_pte = *vmf->pte;
 
 		/*
 		 * some architectures can have larger ptes than wordsize,
@@ -3496,6 +3495,7 @@ static int handle_pte_fault(struct vm_fault *vmf)
 		 * ptl lock held. So here a barrier will do.
 		 */
 		barrier();
+		entry = vmf->orig_pte;
 		if (pte_none(entry)) {
 			pte_unmap(vmf->pte);
 			vmf->pte = NULL;
@@ -3510,10 +3510,10 @@ static int handle_pte_fault(struct vm_fault *vmf)
 	}
 
 	if (!pte_present(entry))
-		return do_swap_page(vmf, entry);
+		return do_swap_page(vmf);
 
 	if (pte_protnone(entry) && vma_is_accessible(vmf->vma))
-		return do_numa_page(vmf, entry);
+		return do_numa_page(vmf);
 
 	vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
 	spin_lock(vmf->ptl);
@@ -3521,7 +3521,7 @@ static int handle_pte_fault(struct vm_fault *vmf)
 		goto unlock;
 	if (vmf->flags & FAULT_FLAG_WRITE) {
 		if (!pte_write(entry))
-			return do_wp_page(vmf, entry);
+			return do_wp_page(vmf);
 		entry = pte_mkdirty(entry);
 	}
 	entry = pte_mkyoung(entry);
-- 
2.6.6

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 08/20] mm: Allow full handling of COW faults in ->fault handlers
  2016-09-27 16:08 [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches Jan Kara
  2016-09-27 16:08 ` [PATCH 01/20] mm: Change type of vmf->virtual_address Jan Kara
@ 2016-09-27 16:08   ` Jan Kara
  2016-09-27 16:08   ` Jan Kara
                     ` (18 subsequent siblings)
  20 siblings, 0 replies; 130+ messages in thread
From: Jan Kara @ 2016-09-27 16:08 UTC (permalink / raw)
  To: linux-mm; +Cc: Jan Kara, linux-nvdimm, linux-fsdevel, Kirill A. Shutemov

To allow full handling of COW faults, add a memcg field to struct
vm_fault and a new ->fault() handler return value meaning that the COW
fault has been fully handled and the memcg charge must not be canceled.
This will allow us to remove the knowledge about special DAX locking
from the generic fault code.
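
A minimal sketch of how a ->fault() handler is expected to use the new
return code (hypothetical handler fragment, not from this patch; the
actual DAX conversion follows later in the series):

	/*
	 * If the handler installs the PTE for vmf->cow_page itself, it
	 * returns VM_FAULT_DONE_COW so that do_cow_fault() neither
	 * copies the page again nor cancels the memcg charge.
	 * my_fault_finishes_cow() is a made-up name for illustration.
	 */
	if (my_fault_finishes_cow(vmf))
		return VM_FAULT_DONE_COW;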

Signed-off-by: Jan Kara <jack@suse.cz>
---
 include/linux/mm.h | 4 +++-
 mm/memory.c        | 8 +++++---
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index c908fd7243ea..faa77b15e9a6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -303,7 +303,8 @@ struct vm_fault {
 					 * the 'address' */
 	pte_t orig_pte;			/* Value of PTE at the time of fault */
 
-	struct page *cow_page;		/* Handler may choose to COW */
+	struct page *cow_page;		/* Page handler may use for COW fault */
+	struct mem_cgroup *memcg;	/* Cgroup cow_page belongs to */
 	struct page *page;		/* ->fault handlers should return a
 					 * page here, unless VM_FAULT_NOPAGE
 					 * is set (which is also implied by
@@ -1117,6 +1118,7 @@ static inline void clear_page_pfmemalloc(struct page *page)
 #define VM_FAULT_RETRY	0x0400	/* ->fault blocked, must retry */
 #define VM_FAULT_FALLBACK 0x0800	/* huge page fault failed, fall back to small */
 #define VM_FAULT_DAX_LOCKED 0x1000	/* ->fault has locked DAX entry */
+#define VM_FAULT_DONE_COW   0x2000	/* ->fault has fully handled COW */
 
 #define VM_FAULT_HWPOISON_LARGE_MASK 0xf000 /* encodes hpage index for large hwpoison */
 
diff --git a/mm/memory.c b/mm/memory.c
index 0c8779c23925..17db88a38e8a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2844,9 +2844,8 @@ static int __do_fault(struct vm_fault *vmf)
 	int ret;
 
 	ret = vma->vm_ops->fault(vma, vmf);
-	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
-		return ret;
-	if (ret & VM_FAULT_DAX_LOCKED)
+	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY |
+			    VM_FAULT_DAX_LOCKED | VM_FAULT_DONE_COW)))
 		return ret;
 
 	if (unlikely(PageHWPoison(vmf->page))) {
@@ -3205,9 +3204,12 @@ static int do_cow_fault(struct vm_fault *vmf)
 	}
 
 	vmf->cow_page = new_page;
+	vmf->memcg = memcg;
 	ret = __do_fault(vmf);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		goto uncharge_out;
+	if (ret & VM_FAULT_DONE_COW)
+		return ret;
 
 	if (!(ret & VM_FAULT_DAX_LOCKED))
 		copy_user_highpage(new_page, vmf->page, vmf->address, vma);
-- 
2.6.6

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 09/20] mm: Factor out functionality to finish page faults
  2016-09-27 16:08 [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches Jan Kara
                   ` (7 preceding siblings ...)
  2016-09-27 16:08   ` Jan Kara
@ 2016-09-27 16:08 ` Jan Kara
  2016-10-17 17:38     ` Ross Zwisler
  2016-10-17 17:40     ` Ross Zwisler
  2016-09-27 16:08 ` [PATCH 10/20] mm: Move handling of COW faults into DAX code Jan Kara
                   ` (11 subsequent siblings)
  20 siblings, 2 replies; 130+ messages in thread
From: Jan Kara @ 2016-09-27 16:08 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-fsdevel, linux-nvdimm, Dan Williams, Ross Zwisler,
	Kirill A. Shutemov, Jan Kara

Introduce finish_fault() as a helper function for finishing page
faults. It is a rather thin wrapper around alloc_set_pte(), but since
we want to call this from DAX code and filesystems, it is still useful
for avoiding some boilerplate code.
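
For example, do_read_fault() after the conversion reduces to the
following (sketch, condensed from the diff below):

	ret = __do_fault(vmf);
	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE |
			    VM_FAULT_RETRY)))
		return ret;

	ret |= finish_fault(vmf);	/* PTE insert, rmap, memcg, LRU */
	unlock_page(vmf->page);
	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE |
			    VM_FAULT_RETRY)))
		put_page(vmf->page);
	return ret;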

Signed-off-by: Jan Kara <jack@suse.cz>
---
 include/linux/mm.h |  1 +
 mm/memory.c        | 42 +++++++++++++++++++++++++++++++++---------
 2 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index faa77b15e9a6..919ebdd27f1e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -622,6 +622,7 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
 
 int alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
 		struct page *page);
+int finish_fault(struct vm_fault *vmf);
 #endif
 
 /*
diff --git a/mm/memory.c b/mm/memory.c
index 17db88a38e8a..f54cfad7fe04 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3029,6 +3029,36 @@ int alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
 	return 0;
 }
 
+
+/**
+ * finish_fault - finish page fault once we have prepared the page to fault
+ *
+ * @vmf: structure describing the fault
+ *
+ * This function handles all that is needed to finish a page fault once the
+ * page to fault in is prepared. It handles locking of PTEs, inserts PTE for
+ * given page, adds reverse page mapping, handles memcg charges and LRU
+ * addition. The function returns 0 on success, VM_FAULT_ code in case of
+ * error.
+ *
+ * The function expects the page to be locked.
+ */
+int finish_fault(struct vm_fault *vmf)
+{
+	struct page *page;
+	int ret;
+
+	/* Did we COW the page? */
+	if (vmf->flags & FAULT_FLAG_WRITE && !(vmf->vma->vm_flags & VM_SHARED))
+		page = vmf->cow_page;
+	else
+		page = vmf->page;
+	ret = alloc_set_pte(vmf, vmf->memcg, page);
+	if (vmf->pte)
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
+	return ret;
+}
+
 static unsigned long fault_around_bytes __read_mostly =
 	rounddown_pow_of_two(65536);
 
@@ -3174,9 +3204,7 @@ static int do_read_fault(struct vm_fault *vmf)
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		return ret;
 
-	ret |= alloc_set_pte(vmf, NULL, vmf->page);
-	if (vmf->pte)
-		pte_unmap_unlock(vmf->pte, vmf->ptl);
+	ret |= finish_fault(vmf);
 	unlock_page(vmf->page);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		put_page(vmf->page);
@@ -3215,9 +3243,7 @@ static int do_cow_fault(struct vm_fault *vmf)
 		copy_user_highpage(new_page, vmf->page, vmf->address, vma);
 	__SetPageUptodate(new_page);
 
-	ret |= alloc_set_pte(vmf, memcg, new_page);
-	if (vmf->pte)
-		pte_unmap_unlock(vmf->pte, vmf->ptl);
+	ret |= finish_fault(vmf);
 	if (!(ret & VM_FAULT_DAX_LOCKED)) {
 		unlock_page(vmf->page);
 		put_page(vmf->page);
@@ -3258,9 +3284,7 @@ static int do_shared_fault(struct vm_fault *vmf)
 		}
 	}
 
-	ret |= alloc_set_pte(vmf, NULL, vmf->page);
-	if (vmf->pte)
-		pte_unmap_unlock(vmf->pte, vmf->ptl);
+	ret |= finish_fault(vmf);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE |
 					VM_FAULT_RETRY))) {
 		unlock_page(vmf->page);
-- 
2.6.6

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 10/20] mm: Move handling of COW faults into DAX code
  2016-09-27 16:08 [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches Jan Kara
                   ` (8 preceding siblings ...)
  2016-09-27 16:08 ` [PATCH 09/20] mm: Factor out functionality to finish page faults Jan Kara
@ 2016-09-27 16:08 ` Jan Kara
  2016-10-17 19:29     ` Ross Zwisler
  2016-09-27 16:08 ` [PATCH 11/20] mm: Remove unnecessary vma->vm_ops check Jan Kara
                   ` (10 subsequent siblings)
  20 siblings, 1 reply; 130+ messages in thread
From: Jan Kara @ 2016-09-27 16:08 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-fsdevel, linux-nvdimm, Dan Williams, Ross Zwisler,
	Kirill A. Shutemov, Jan Kara

Move the final handling of COW faults from the generic code into the
DAX fault handler. That way the generic code doesn't have to be aware
of the peculiarities of DAX locking, so remove that knowledge.
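
The key change, condensed from the fs/dax.c hunk below (sketch only):

	/*
	 * For a COW fault the DAX handler now finishes the fault itself
	 * while still holding the radix tree entry lock, then tells the
	 * generic code that everything has been done:
	 */
	error = finish_fault(vmf);
	put_locked_mapping_entry(mapping, vmf->pgoff, entry);
	return error ? error : VM_FAULT_DONE_COW;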

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/dax.c            | 22 ++++++++++++++++------
 include/linux/dax.h |  7 -------
 include/linux/mm.h  |  9 +--------
 mm/memory.c         | 14 ++++----------
 4 files changed, 21 insertions(+), 31 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 0dc251ca77b8..b1c503930d1d 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -876,10 +876,15 @@ int dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 			goto unlock_entry;
 		if (!radix_tree_exceptional_entry(entry)) {
 			vmf->page = entry;
-			return VM_FAULT_LOCKED;
+			if (unlikely(PageHWPoison(entry))) {
+				put_locked_mapping_entry(mapping, vmf->pgoff,
+							 entry);
+				return VM_FAULT_HWPOISON;
+			}
 		}
-		vmf->entry = entry;
-		return VM_FAULT_DAX_LOCKED;
+		error = finish_fault(vmf);
+		put_locked_mapping_entry(mapping, vmf->pgoff, entry);
+		return error ? error : VM_FAULT_DONE_COW;
 	}
 
 	if (!buffer_mapped(&bh)) {
@@ -1430,10 +1435,15 @@ int iomap_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 			goto unlock_entry;
 		if (!radix_tree_exceptional_entry(entry)) {
 			vmf->page = entry;
-			return VM_FAULT_LOCKED;
+			if (unlikely(PageHWPoison(entry))) {
+				put_locked_mapping_entry(mapping, vmf->pgoff,
+							 entry);
+				return VM_FAULT_HWPOISON;
+			}
 		}
-		vmf->entry = entry;
-		return VM_FAULT_DAX_LOCKED;
+		error = finish_fault(vmf);
+		put_locked_mapping_entry(mapping, vmf->pgoff, entry);
+		return error ? error : VM_FAULT_DONE_COW;
 	}
 
 	switch (iomap.type) {
diff --git a/include/linux/dax.h b/include/linux/dax.h
index add6c4bc568f..b1a1acd10df2 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -26,7 +26,6 @@ void dax_wake_mapping_entry_waiter(struct address_space *mapping,
 
 #ifdef CONFIG_FS_DAX
 struct page *read_dax_sector(struct block_device *bdev, sector_t n);
-void dax_unlock_mapping_entry(struct address_space *mapping, pgoff_t index);
 int __dax_zero_page_range(struct block_device *bdev, sector_t sector,
 		unsigned int offset, unsigned int length);
 #else
@@ -35,12 +34,6 @@ static inline struct page *read_dax_sector(struct block_device *bdev,
 {
 	return ERR_PTR(-ENXIO);
 }
-/* Shouldn't ever be called when dax is disabled. */
-static inline void dax_unlock_mapping_entry(struct address_space *mapping,
-					    pgoff_t index)
-{
-	BUG();
-}
 static inline int __dax_zero_page_range(struct block_device *bdev,
 		sector_t sector, unsigned int offset, unsigned int length)
 {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 919ebdd27f1e..1055f2ece80d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -310,12 +310,6 @@ struct vm_fault {
 					 * is set (which is also implied by
 					 * VM_FAULT_ERROR).
 					 */
-	void *entry;			/* ->fault handler can alternatively
-					 * return locked DAX entry. In that
-					 * case handler should return
-					 * VM_FAULT_DAX_LOCKED and fill in
-					 * entry here.
-					 */
 	/* These three entries are valid only while holding ptl lock */
 	pte_t *pte;			/* Pointer to pte entry matching
 					 * the 'address'. NULL if the page
@@ -1118,8 +1112,7 @@ static inline void clear_page_pfmemalloc(struct page *page)
 #define VM_FAULT_LOCKED	0x0200	/* ->fault locked the returned page */
 #define VM_FAULT_RETRY	0x0400	/* ->fault blocked, must retry */
 #define VM_FAULT_FALLBACK 0x0800	/* huge page fault failed, fall back to small */
-#define VM_FAULT_DAX_LOCKED 0x1000	/* ->fault has locked DAX entry */
-#define VM_FAULT_DONE_COW   0x2000	/* ->fault has fully handled COW */
+#define VM_FAULT_DONE_COW   0x1000	/* ->fault has fully handled COW */
 
 #define VM_FAULT_HWPOISON_LARGE_MASK 0xf000 /* encodes hpage index for large hwpoison */
 
diff --git a/mm/memory.c b/mm/memory.c
index f54cfad7fe04..a4522e8999b2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2845,7 +2845,7 @@ static int __do_fault(struct vm_fault *vmf)
 
 	ret = vma->vm_ops->fault(vma, vmf);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY |
-			    VM_FAULT_DAX_LOCKED | VM_FAULT_DONE_COW)))
+			    VM_FAULT_DONE_COW)))
 		return ret;
 
 	if (unlikely(PageHWPoison(vmf->page))) {
@@ -3239,17 +3239,11 @@ static int do_cow_fault(struct vm_fault *vmf)
 	if (ret & VM_FAULT_DONE_COW)
 		return ret;
 
-	if (!(ret & VM_FAULT_DAX_LOCKED))
-		copy_user_highpage(new_page, vmf->page, vmf->address, vma);
+	copy_user_highpage(new_page, vmf->page, vmf->address, vma);
 	__SetPageUptodate(new_page);
-
 	ret |= finish_fault(vmf);
-	if (!(ret & VM_FAULT_DAX_LOCKED)) {
-		unlock_page(vmf->page);
-		put_page(vmf->page);
-	} else {
-		dax_unlock_mapping_entry(vma->vm_file->f_mapping, vmf->pgoff);
-	}
+	unlock_page(vmf->page);
+	put_page(vmf->page);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		goto uncharge_out;
 	return ret;
-- 
2.6.6

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 11/20] mm: Remove unnecessary vma->vm_ops check
  2016-09-27 16:08 [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches Jan Kara
                   ` (9 preceding siblings ...)
  2016-09-27 16:08 ` [PATCH 10/20] mm: Move handling of COW faults into DAX code Jan Kara
@ 2016-09-27 16:08 ` Jan Kara
  2016-10-17 19:40     ` Ross Zwisler
  2016-09-27 16:08   ` Jan Kara
                   ` (9 subsequent siblings)
  20 siblings, 1 reply; 130+ messages in thread
From: Jan Kara @ 2016-09-27 16:08 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-fsdevel, linux-nvdimm, Dan Williams, Ross Zwisler,
	Kirill A. Shutemov, Jan Kara

We don't check whether vma->vm_ops is NULL in do_shared_fault(), so
there is hardly any point in checking it in wp_page_shared(), which is
likewise called only for shared file mappings.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index a4522e8999b2..63d9c1a54caf 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2301,7 +2301,7 @@ static int wp_page_shared(struct vm_fault *vmf, struct page *old_page)
 
 	get_page(old_page);
 
-	if (vma->vm_ops && vma->vm_ops->page_mkwrite) {
+	if (vma->vm_ops->page_mkwrite) {
 		int tmp;
 
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
-- 
2.6.6

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 12/20] mm: Factor out common parts of write fault handling
  2016-09-27 16:08 [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches Jan Kara
@ 2016-09-27 16:08   ` Jan Kara
  2016-09-27 16:08 ` [PATCH 02/20] mm: Join struct fault_env and vm_fault Jan Kara
                     ` (19 subsequent siblings)
  20 siblings, 0 replies; 130+ messages in thread
From: Jan Kara @ 2016-09-27 16:08 UTC (permalink / raw)
  To: linux-mm; +Cc: Jan Kara, linux-nvdimm, linux-fsdevel, Kirill A. Shutemov

Currently we duplicate the handling of shared write faults in
wp_page_reuse() and do_shared_fault(). Factor the common code out into
a single helper function.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/memory.c | 78 +++++++++++++++++++++++++++++--------------------------------
 1 file changed, 37 insertions(+), 41 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 63d9c1a54caf..0643b3b5a12a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2063,6 +2063,41 @@ static int do_page_mkwrite(struct vm_area_struct *vma, struct page *page,
 }
 
 /*
+ * Handle dirtying of a page in shared file mapping on a write fault.
+ *
+ * The function expects the page to be locked and unlocks it.
+ */
+static void fault_dirty_shared_page(struct vm_area_struct *vma,
+				    struct page *page)
+{
+	struct address_space *mapping;
+	bool dirtied;
+	bool page_mkwrite = vma->vm_ops->page_mkwrite;
+
+	dirtied = set_page_dirty(page);
+	VM_BUG_ON_PAGE(PageAnon(page), page);
+	/*
+	 * Take a local copy of the address_space - page.mapping may be zeroed
+	 * by truncate after unlock_page().   The address_space itself remains
+	 * pinned by vma->vm_file's reference.  We rely on unlock_page()'s
+	 * release semantics to prevent the compiler from undoing this copying.
+	 */
+	mapping = page_rmapping(page);
+	unlock_page(page);
+
+	if ((dirtied || page_mkwrite) && mapping) {
+		/*
+		 * Some device drivers do not set page.mapping
+		 * but still dirty their pages
+		 */
+		balance_dirty_pages_ratelimited(mapping);
+	}
+
+	if (!page_mkwrite)
+		file_update_time(vma->vm_file);
+}
+
+/*
  * Handle write page faults for pages that can be reused in the current vma
  *
  * This can happen either due to the mapping being with the VM_SHARED flag,
@@ -2092,28 +2127,11 @@ static inline int wp_page_reuse(struct vm_fault *vmf, struct page *page,
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
 
 	if (dirty_shared) {
-		struct address_space *mapping;
-		int dirtied;
-
 		if (!page_mkwrite)
 			lock_page(page);
 
-		dirtied = set_page_dirty(page);
-		VM_BUG_ON_PAGE(PageAnon(page), page);
-		mapping = page->mapping;
-		unlock_page(page);
+		fault_dirty_shared_page(vma, page);
 		put_page(page);
-
-		if ((dirtied || page_mkwrite) && mapping) {
-			/*
-			 * Some device drivers do not set page.mapping
-			 * but still dirty their pages
-			 */
-			balance_dirty_pages_ratelimited(mapping);
-		}
-
-		if (!page_mkwrite)
-			file_update_time(vma->vm_file);
 	}
 
 	return VM_FAULT_WRITE;
@@ -3256,8 +3274,6 @@ uncharge_out:
 static int do_shared_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
-	struct address_space *mapping;
-	int dirtied = 0;
 	int ret, tmp;
 
 	ret = __do_fault(vmf);
@@ -3286,27 +3302,7 @@ static int do_shared_fault(struct vm_fault *vmf)
 		return ret;
 	}
 
-	if (set_page_dirty(vmf->page))
-		dirtied = 1;
-	/*
-	 * Take a local copy of the address_space - page.mapping may be zeroed
-	 * by truncate after unlock_page().   The address_space itself remains
-	 * pinned by vma->vm_file's reference.  We rely on unlock_page()'s
-	 * release semantics to prevent the compiler from undoing this copying.
-	 */
-	mapping = page_rmapping(vmf->page);
-	unlock_page(vmf->page);
-	if ((dirtied || vma->vm_ops->page_mkwrite) && mapping) {
-		/*
-		 * Some device drivers do not set page.mapping but still
-		 * dirty their pages
-		 */
-		balance_dirty_pages_ratelimited(mapping);
-	}
-
-	if (!vma->vm_ops->page_mkwrite)
-		file_update_time(vma->vm_file);
-
+	fault_dirty_shared_page(vma, vmf->page);
 	return ret;
 }
 
-- 
2.6.6

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 13/20] mm: Pass vm_fault structure into do_page_mkwrite()
  2016-09-27 16:08 [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches Jan Kara
                   ` (11 preceding siblings ...)
  2016-09-27 16:08   ` Jan Kara
@ 2016-09-27 16:08 ` Jan Kara
  2016-10-17 22:29     ` Ross Zwisler
  2016-09-27 16:08 ` [PATCH 14/20] mm: Use vmf->page during WP faults Jan Kara
                   ` (7 subsequent siblings)
  20 siblings, 1 reply; 130+ messages in thread
From: Jan Kara @ 2016-09-27 16:08 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-fsdevel, linux-nvdimm, Dan Williams, Ross Zwisler,
	Kirill A. Shutemov, Jan Kara

We will need more information in the ->page_mkwrite() helper for DAX
to be able to fully finish faults there. Pass the vm_fault structure
to do_page_mkwrite() and use it there so that information propagates
properly from the upper layers.
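
The change in calling convention, side by side (taken from the hunks
below):

	/* before */
	tmp = do_page_mkwrite(vma, old_page, vmf->address);

	/* after: the caller fills vmf->page first */
	vmf->page = old_page;
	tmp = do_page_mkwrite(vmf);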

Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/memory.c | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 0643b3b5a12a..7c87edaa7a8f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2034,20 +2034,14 @@ static gfp_t __get_fault_gfp_mask(struct vm_area_struct *vma)
  *
  * We do this without the lock held, so that it can sleep if it needs to.
  */
-static int do_page_mkwrite(struct vm_area_struct *vma, struct page *page,
-	       unsigned long address)
+static int do_page_mkwrite(struct vm_fault *vmf)
 {
-	struct vm_fault vmf;
 	int ret;
+	struct page *page = vmf->page;
 
-	vmf.virtual_address = address & PAGE_MASK;
-	vmf.pgoff = page->index;
-	vmf.flags = FAULT_FLAG_WRITE|FAULT_FLAG_MKWRITE;
-	vmf.gfp_mask = __get_fault_gfp_mask(vma);
-	vmf.page = page;
-	vmf.cow_page = NULL;
+	vmf->flags = FAULT_FLAG_WRITE|FAULT_FLAG_MKWRITE;
 
-	ret = vma->vm_ops->page_mkwrite(vma, &vmf);
+	ret = vmf->vma->vm_ops->page_mkwrite(vmf->vma, vmf);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))
 		return ret;
 	if (unlikely(!(ret & VM_FAULT_LOCKED))) {
@@ -2323,7 +2317,8 @@ static int wp_page_shared(struct vm_fault *vmf, struct page *old_page)
 		int tmp;
 
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
-		tmp = do_page_mkwrite(vma, old_page, vmf->address);
+		vmf->page = old_page;
+		tmp = do_page_mkwrite(vmf);
 		if (unlikely(!tmp || (tmp &
 				      (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
 			put_page(old_page);
@@ -3286,7 +3281,7 @@ static int do_shared_fault(struct vm_fault *vmf)
 	 */
 	if (vma->vm_ops->page_mkwrite) {
 		unlock_page(vmf->page);
-		tmp = do_page_mkwrite(vma, vmf->page, vmf->address);
+		tmp = do_page_mkwrite(vmf);
 		if (unlikely(!tmp ||
 				(tmp & (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
 			put_page(vmf->page);
-- 
2.6.6

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 14/20] mm: Use vmf->page during WP faults
  2016-09-27 16:08 [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches Jan Kara
                   ` (12 preceding siblings ...)
  2016-09-27 16:08 ` [PATCH 13/20] mm: Pass vm_fault structure into do_page_mkwrite() Jan Kara
@ 2016-09-27 16:08 ` Jan Kara
  2016-10-18 17:56     ` Ross Zwisler
  2016-09-27 16:08   ` Jan Kara
                   ` (6 subsequent siblings)
  20 siblings, 1 reply; 130+ messages in thread
From: Jan Kara @ 2016-09-27 16:08 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-fsdevel, linux-nvdimm, Dan Williams, Ross Zwisler,
	Kirill A. Shutemov, Jan Kara

So far we set vmf->page during WP faults only when we needed to pass
it to the ->page_mkwrite handler. Set it in all cases now and use it
instead of passing the page pointer around explicitly.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/memory.c | 58 +++++++++++++++++++++++++++++-----------------------------
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 7c87edaa7a8f..98304eb7bff4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2099,11 +2099,12 @@ static void fault_dirty_shared_page(struct vm_area_struct *vma,
  * case, all we need to do here is to mark the page as writable and update
  * any related book-keeping.
  */
-static inline int wp_page_reuse(struct vm_fault *vmf, struct page *page,
+static inline int wp_page_reuse(struct vm_fault *vmf,
 				int page_mkwrite, int dirty_shared)
 	__releases(vmf->ptl)
 {
 	struct vm_area_struct *vma = vmf->vma;
+	struct page *page = vmf->page;
 	pte_t entry;
 	/*
 	 * Clear the pages cpupid information as the existing
@@ -2147,10 +2148,11 @@ static inline int wp_page_reuse(struct vm_fault *vmf, struct page *page,
  *   held to the old page, as well as updating the rmap.
  * - In any case, unlock the PTL and drop the reference we took to the old page.
  */
-static int wp_page_copy(struct vm_fault *vmf, struct page *old_page)
+static int wp_page_copy(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct mm_struct *mm = vma->vm_mm;
+	struct page *old_page = vmf->page;
 	struct page *new_page = NULL;
 	pte_t entry;
 	int page_copied = 0;
@@ -2302,26 +2304,25 @@ static int wp_pfn_shared(struct vm_fault *vmf)
 			return 0;
 		}
 	}
-	return wp_page_reuse(vmf, NULL, 0, 0);
+	return wp_page_reuse(vmf, 0, 0);
 }
 
-static int wp_page_shared(struct vm_fault *vmf, struct page *old_page)
+static int wp_page_shared(struct vm_fault *vmf)
 	__releases(vmf->ptl)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	int page_mkwrite = 0;
 
-	get_page(old_page);
+	get_page(vmf->page);
 
 	if (vma->vm_ops->page_mkwrite) {
 		int tmp;
 
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
-		vmf->page = old_page;
 		tmp = do_page_mkwrite(vmf);
 		if (unlikely(!tmp || (tmp &
 				      (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
-			put_page(old_page);
+			put_page(vmf->page);
 			return tmp;
 		}
 		/*
@@ -2333,15 +2334,15 @@ static int wp_page_shared(struct vm_fault *vmf, struct page *old_page)
 		vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
 						vmf->address, &vmf->ptl);
 		if (!pte_same(*vmf->pte, vmf->orig_pte)) {
-			unlock_page(old_page);
+			unlock_page(vmf->page);
 			pte_unmap_unlock(vmf->pte, vmf->ptl);
-			put_page(old_page);
+			put_page(vmf->page);
 			return 0;
 		}
 		page_mkwrite = 1;
 	}
 
-	return wp_page_reuse(vmf, old_page, page_mkwrite, 1);
+	return wp_page_reuse(vmf, page_mkwrite, 1);
 }
 
 /*
@@ -2366,10 +2367,9 @@ static int do_wp_page(struct vm_fault *vmf)
 	__releases(vmf->ptl)
 {
 	struct vm_area_struct *vma = vmf->vma;
-	struct page *old_page;
 
-	old_page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
-	if (!old_page) {
+	vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
+	if (!vmf->page) {
 		/*
 		 * VM_MIXEDMAP !pfn_valid() case, or VM_SOFTDIRTY clear on a
 		 * VM_PFNMAP VMA.
@@ -2382,30 +2382,30 @@ static int do_wp_page(struct vm_fault *vmf)
 			return wp_pfn_shared(vmf);
 
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
-		return wp_page_copy(vmf, old_page);
+		return wp_page_copy(vmf);
 	}
 
 	/*
 	 * Take out anonymous pages first, anonymous shared vmas are
 	 * not dirty accountable.
 	 */
-	if (PageAnon(old_page) && !PageKsm(old_page)) {
+	if (PageAnon(vmf->page) && !PageKsm(vmf->page)) {
 		int total_mapcount;
-		if (!trylock_page(old_page)) {
-			get_page(old_page);
+		if (!trylock_page(vmf->page)) {
+			get_page(vmf->page);
 			pte_unmap_unlock(vmf->pte, vmf->ptl);
-			lock_page(old_page);
+			lock_page(vmf->page);
 			vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
 					vmf->address, &vmf->ptl);
 			if (!pte_same(*vmf->pte, vmf->orig_pte)) {
-				unlock_page(old_page);
+				unlock_page(vmf->page);
 				pte_unmap_unlock(vmf->pte, vmf->ptl);
-				put_page(old_page);
+				put_page(vmf->page);
 				return 0;
 			}
-			put_page(old_page);
+			put_page(vmf->page);
 		}
-		if (reuse_swap_page(old_page, &total_mapcount)) {
+		if (reuse_swap_page(vmf->page, &total_mapcount)) {
 			if (total_mapcount == 1) {
 				/*
 				 * The page is all ours. Move it to
@@ -2414,24 +2414,24 @@ static int do_wp_page(struct vm_fault *vmf)
 				 * Protected against the rmap code by
 				 * the page lock.
 				 */
-				page_move_anon_rmap(old_page, vma);
+				page_move_anon_rmap(vmf->page, vma);
 			}
-			unlock_page(old_page);
-			return wp_page_reuse(vmf, old_page, 0, 0);
+			unlock_page(vmf->page);
+			return wp_page_reuse(vmf, 0, 0);
 		}
-		unlock_page(old_page);
+		unlock_page(vmf->page);
 	} else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
 					(VM_WRITE|VM_SHARED))) {
-		return wp_page_shared(vmf, old_page);
+		return wp_page_shared(vmf);
 	}
 
 	/*
 	 * Ok, we need to copy. Oh, well..
 	 */
-	get_page(old_page);
+	get_page(vmf->page);
 
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
-	return wp_page_copy(vmf, old_page);
+	return wp_page_copy(vmf);
 }
 
 static void unmap_mapping_range_vma(struct vm_area_struct *vma,
-- 
2.6.6

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 15/20] mm: Move part of wp_page_reuse() into the single call site
  2016-09-27 16:08 [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches Jan Kara
  2016-09-27 16:08 ` [PATCH 01/20] mm: Change type of vmf->virtual_address Jan Kara
@ 2016-09-27 16:08   ` Jan Kara
  2016-09-27 16:08   ` Jan Kara
                     ` (18 subsequent siblings)
  20 siblings, 0 replies; 130+ messages in thread
From: Jan Kara @ 2016-09-27 16:08 UTC (permalink / raw)
  To: linux-mm; +Cc: Jan Kara, linux-nvdimm, linux-fsdevel, Kirill A. Shutemov

wp_page_reuse() handles the shared write fault case, which is needed
only in wp_page_shared(). Move that handling into its single call site
to make wp_page_reuse() simpler and to avoid the odd situation where
we sometimes pass in a locked page and sometimes an unlocked one.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/memory.c | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 98304eb7bff4..f49e736d6a36 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2099,8 +2099,7 @@ static void fault_dirty_shared_page(struct vm_area_struct *vma,
  * case, all we need to do here is to mark the page as writable and update
  * any related book-keeping.
  */
-static inline int wp_page_reuse(struct vm_fault *vmf,
-				int page_mkwrite, int dirty_shared)
+static inline void wp_page_reuse(struct vm_fault *vmf)
 	__releases(vmf->ptl)
 {
 	struct vm_area_struct *vma = vmf->vma;
@@ -2120,16 +2119,6 @@ static inline int wp_page_reuse(struct vm_fault *vmf,
 	if (ptep_set_access_flags(vma, vmf->address, vmf->pte, entry, 1))
 		update_mmu_cache(vma, vmf->address, vmf->pte);
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
-
-	if (dirty_shared) {
-		if (!page_mkwrite)
-			lock_page(page);
-
-		fault_dirty_shared_page(vma, page);
-		put_page(page);
-	}
-
-	return VM_FAULT_WRITE;
 }
 
 /*
@@ -2304,7 +2293,8 @@ static int wp_pfn_shared(struct vm_fault *vmf)
 			return 0;
 		}
 	}
-	return wp_page_reuse(vmf, 0, 0);
+	wp_page_reuse(vmf);
+	return VM_FAULT_WRITE;
 }
 
 static int wp_page_shared(struct vm_fault *vmf)
@@ -2342,7 +2332,13 @@ static int wp_page_shared(struct vm_fault *vmf)
 		page_mkwrite = 1;
 	}
 
-	return wp_page_reuse(vmf, page_mkwrite, 1);
+	wp_page_reuse(vmf);
+	if (!page_mkwrite)
+		lock_page(vmf->page);
+	fault_dirty_shared_page(vma, vmf->page);
+	put_page(vmf->page);
+
+	return VM_FAULT_WRITE;
 }
 
 /*
@@ -2417,7 +2413,8 @@ static int do_wp_page(struct vm_fault *vmf)
 				page_move_anon_rmap(vmf->page, vma);
 			}
 			unlock_page(vmf->page);
-			return wp_page_reuse(vmf, 0, 0);
+			wp_page_reuse(vmf);
+			return VM_FAULT_WRITE;
 		}
 		unlock_page(vmf->page);
 	} else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
-- 
2.6.6

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 16/20] mm: Provide helper for finishing mkwrite faults
  2016-09-27 16:08 [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches Jan Kara
  2016-09-27 16:08 ` [PATCH 01/20] mm: Change type of vmf->virtual_address Jan Kara
@ 2016-09-27 16:08   ` Jan Kara
  2016-09-27 16:08   ` Jan Kara
                     ` (18 subsequent siblings)
  20 siblings, 0 replies; 130+ messages in thread
From: Jan Kara @ 2016-09-27 16:08 UTC (permalink / raw)
  To: linux-mm; +Cc: Jan Kara, linux-nvdimm, linux-fsdevel, Kirill A. Shutemov

Provide a helper function for finishing write faults due to the PTE
being read-only. The helper will be used by DAX to avoid complicating
generic MM code with DAX locking specifics.
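
As an illustrative sketch (not part of this patch; it mirrors how a
later patch in the series uses the helper), a ->pfn_mkwrite handler
that holds its own lock can finish the fault like this:

	/* subsystem lock protecting the pfn is held here */
	finish_mkwrite_fault(vmf);	/* retake PTE lock, mark PTE writeable */
	/* drop subsystem lock */
	return VM_FAULT_NOPAGE;

If the PTE changed while the PTE lock was dropped,
finish_mkwrite_fault() returns 0 and the caller can simply let the
fault be retried.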

Signed-off-by: Jan Kara <jack@suse.cz>
---
 include/linux/mm.h |  1 +
 mm/memory.c        | 65 +++++++++++++++++++++++++++++++-----------------------
 2 files changed, 39 insertions(+), 27 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1055f2ece80d..e5a014be8932 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -617,6 +617,7 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
 int alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
 		struct page *page);
 int finish_fault(struct vm_fault *vmf);
+int finish_mkwrite_fault(struct vm_fault *vmf);
 #endif
 
 /*
diff --git a/mm/memory.c b/mm/memory.c
index f49e736d6a36..8c8cb7f2133e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2266,6 +2266,36 @@ oom:
 	return VM_FAULT_OOM;
 }
 
+/**
+ * finish_mkwrite_fault - finish page fault making PTE writeable once
+ *			   the page is prepared
+ *
+ * @vmf: structure describing the fault
+ *
+ * This function handles all that is needed to finish a write page fault due
+ * to PTE being read-only once the mapped page is prepared. It handles locking
+ * of PTE and modifying it. The function returns VM_FAULT_WRITE on success,
+ * 0 when PTE got changed before we acquired PTE lock.
+ *
+ * The function expects the page to be locked or other protection against
+ * concurrent faults / writeback (such as DAX radix tree locks).
+ */
+int finish_mkwrite_fault(struct vm_fault *vmf)
+{
+	vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd, vmf->address,
+				       &vmf->ptl);
+	/*
+	 * We might have raced with another page fault while we released the
+	 * pte_offset_map_lock.
+	 */
+	if (!pte_same(*vmf->pte, vmf->orig_pte)) {
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
+		return 0;
+	}
+	wp_page_reuse(vmf);
+	return VM_FAULT_WRITE;
+}
+
 /*
  * Handle write page faults for VM_MIXEDMAP or VM_PFNMAP for a VM_SHARED
  * mapping
@@ -2282,16 +2312,7 @@ static int wp_pfn_shared(struct vm_fault *vmf)
 		ret = vma->vm_ops->pfn_mkwrite(vma, vmf);
 		if (ret & VM_FAULT_ERROR)
 			return ret;
-		vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
-				vmf->address, &vmf->ptl);
-		/*
-		 * We might have raced with another page fault while we
-		 * released the pte_offset_map_lock.
-		 */
-		if (!pte_same(*vmf->pte, vmf->orig_pte)) {
-			pte_unmap_unlock(vmf->pte, vmf->ptl);
-			return 0;
-		}
+		return finish_mkwrite_fault(vmf);
 	}
 	wp_page_reuse(vmf);
 	return VM_FAULT_WRITE;
@@ -2301,7 +2322,6 @@ static int wp_page_shared(struct vm_fault *vmf)
 	__releases(vmf->ptl)
 {
 	struct vm_area_struct *vma = vmf->vma;
-	int page_mkwrite = 0;
 
 	get_page(vmf->page);
 
@@ -2315,26 +2335,17 @@ static int wp_page_shared(struct vm_fault *vmf)
 			put_page(vmf->page);
 			return tmp;
 		}
-		/*
-		 * Since we dropped the lock we need to revalidate
-		 * the PTE as someone else may have changed it.  If
-		 * they did, we just return, as we can count on the
-		 * MMU to tell us if they didn't also make it writable.
-		 */
-		vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
-						vmf->address, &vmf->ptl);
-		if (!pte_same(*vmf->pte, vmf->orig_pte)) {
+		tmp = finish_mkwrite_fault(vmf);
+		if (unlikely(!tmp || (tmp &
+				      (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
 			unlock_page(vmf->page);
-			pte_unmap_unlock(vmf->pte, vmf->ptl);
 			put_page(vmf->page);
-			return 0;
+			return tmp;
 		}
-		page_mkwrite = 1;
-	}
-
-	wp_page_reuse(vmf);
-	if (!page_mkwrite)
+	} else {
+		wp_page_reuse(vmf);
 		lock_page(vmf->page);
+	}
 	fault_dirty_shared_page(vma, vmf->page);
 	put_page(vmf->page);
 
-- 
2.6.6

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 17/20] mm: Export follow_pte()
  2016-09-27 16:08 [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches Jan Kara
                   ` (15 preceding siblings ...)
  2016-09-27 16:08   ` Jan Kara
@ 2016-09-27 16:08 ` Jan Kara
  2016-10-18 18:37     ` Ross Zwisler
  2016-09-27 16:08 ` [PATCH 18/20] dax: Make cache flushing protected by entry lock Jan Kara
                   ` (3 subsequent siblings)
  20 siblings, 1 reply; 130+ messages in thread
From: Jan Kara @ 2016-09-27 16:08 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-fsdevel, linux-nvdimm, Dan Williams, Ross Zwisler,
	Kirill A. Shutemov, Jan Kara

DAX will need to implement its own version of page_check_address(). To
avoid duplicating the page table walking code, export follow_pte(),
which does what we need.
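
A minimal usage sketch (illustrative only; the DAX user comes later in
the series):

	pte_t *ptep;
	spinlock_t *ptl;

	if (follow_pte(vma->vm_mm, address, &ptep, &ptl))
		return;			/* no PTE mapped at address */
	/* inspect or modify *ptep while holding ptl */
	pte_unmap_unlock(ptep, ptl);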

Signed-off-by: Jan Kara <jack@suse.cz>
---
 include/linux/mm.h | 2 ++
 mm/memory.c        | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e5a014be8932..133fabe4bb4c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1224,6 +1224,8 @@ int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
 			struct vm_area_struct *vma);
 void unmap_mapping_range(struct address_space *mapping,
 		loff_t const holebegin, loff_t const holelen, int even_cows);
+int follow_pte(struct mm_struct *mm, unsigned long address, pte_t **ptepp,
+	       spinlock_t **ptlp);
 int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 	unsigned long *pfn);
 int follow_phys(struct vm_area_struct *vma, unsigned long address,
diff --git a/mm/memory.c b/mm/memory.c
index 8c8cb7f2133e..e7a4a30a5e88 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3763,8 +3763,8 @@ out:
 	return -EINVAL;
 }
 
-static inline int follow_pte(struct mm_struct *mm, unsigned long address,
-			     pte_t **ptepp, spinlock_t **ptlp)
+int follow_pte(struct mm_struct *mm, unsigned long address, pte_t **ptepp,
+	       spinlock_t **ptlp)
 {
 	int res;
 
-- 
2.6.6

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 18/20] dax: Make cache flushing protected by entry lock
  2016-09-27 16:08 [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches Jan Kara
                   ` (16 preceding siblings ...)
  2016-09-27 16:08 ` [PATCH 17/20] mm: Export follow_pte() Jan Kara
@ 2016-09-27 16:08 ` Jan Kara
  2016-10-18 19:20   ` Ross Zwisler
  2016-09-27 16:08   ` Jan Kara
                   ` (2 subsequent siblings)
  20 siblings, 1 reply; 130+ messages in thread
From: Jan Kara @ 2016-09-27 16:08 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-fsdevel, linux-nvdimm, Dan Williams, Ross Zwisler,
	Kirill A. Shutemov, Jan Kara

Currently, flushing of caches for DAX mappings ignores the radix tree
entry lock. So far this was OK (modulo a bug where a difference in the
entry's lock bit could cause cache flushing to be mistakenly skipped),
but in the following patches we will write-protect PTEs during cache
flushing and clear dirty tags, and for that we will need more
exclusion. So do the cache flushing under an entry lock. As a bonus,
this allows us to remove one lock-unlock pair of mapping->tree_lock.
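
The resulting order of operations in dax_writeback_one(), condensed
from the diff below (error paths elided):

	spin_lock_irq(&mapping->tree_lock);
	entry = lock_slot(mapping, slot);	/* serialize with faults */
	radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_TOWRITE);
	spin_unlock_irq(&mapping->tree_lock);

	wb_cache_pmem(dax.addr, dax.size);	/* flush caches */
	put_locked_mapping_entry(mapping, index, entry);

Clearing the tag before the flush is safe because any concurrent
dax_writeback_one() call for the same index will see the entry locked
under tree_lock and wait for it to unlock.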

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/dax.c | 66 +++++++++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 42 insertions(+), 24 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index b1c503930d1d..c6cadf8413a3 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -672,43 +672,63 @@ static int dax_writeback_one(struct block_device *bdev,
 		struct address_space *mapping, pgoff_t index, void *entry)
 {
 	struct radix_tree_root *page_tree = &mapping->page_tree;
-	int type = RADIX_DAX_TYPE(entry);
-	struct radix_tree_node *node;
 	struct blk_dax_ctl dax;
-	void **slot;
+	void *entry2, **slot;
 	int ret = 0;
+	int type;
 
-	spin_lock_irq(&mapping->tree_lock);
 	/*
-	 * Regular page slots are stabilized by the page lock even
-	 * without the tree itself locked.  These unlocked entries
-	 * need verification under the tree lock.
+	 * A page got tagged dirty in DAX mapping? Something is seriously
+	 * wrong.
 	 */
-	if (!__radix_tree_lookup(page_tree, index, &node, &slot))
-		goto unlock;
-	if (*slot != entry)
-		goto unlock;
-
-	/* another fsync thread may have already written back this entry */
-	if (!radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_TOWRITE))
-		goto unlock;
+	if (WARN_ON(!radix_tree_exceptional_entry(entry)))
+		return -EIO;
 
+	spin_lock_irq(&mapping->tree_lock);
+	entry2 = get_unlocked_mapping_entry(mapping, index, &slot);
+	/* Entry got punched out / reallocated? */
+	if (!entry2 || !radix_tree_exceptional_entry(entry2))
+		goto put_unlock;
+	/*
+	 * Entry got reallocated elsewhere? No need to writeback. We have to
+	 * compare sectors as we must not bail out due to difference in lockbit
+	 * or entry type.
+	 */
+	if (RADIX_DAX_SECTOR(entry2) != RADIX_DAX_SECTOR(entry))
+		goto put_unlock;
+	type = RADIX_DAX_TYPE(entry2);
 	if (WARN_ON_ONCE(type != RADIX_DAX_PTE && type != RADIX_DAX_PMD)) {
 		ret = -EIO;
-		goto unlock;
+		goto put_unlock;
 	}
 
+	/* Another fsync thread may have already written back this entry */
+	if (!radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_TOWRITE))
+		goto put_unlock;
+	/* Lock the entry to serialize with page faults */
+	entry = lock_slot(mapping, slot);
+	/*
+	 * We can clear the tag now but we have to be careful so that concurrent
+	 * dax_writeback_one() calls for the same index cannot finish before we
+	 * actually flush the caches. This is achieved as the calls will look
+	 * at the entry only under tree_lock and once they do that they will
+	 * see the entry locked and wait for it to unlock.
+	 */
+	radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_TOWRITE);
+	spin_unlock_irq(&mapping->tree_lock);
+
 	dax.sector = RADIX_DAX_SECTOR(entry);
 	dax.size = (type == RADIX_DAX_PMD ? PMD_SIZE : PAGE_SIZE);
-	spin_unlock_irq(&mapping->tree_lock);
 
 	/*
 	 * We cannot hold tree_lock while calling dax_map_atomic() because it
 	 * eventually calls cond_resched().
 	 */
 	ret = dax_map_atomic(bdev, &dax);
-	if (ret < 0)
+	if (ret < 0) {
+		put_locked_mapping_entry(mapping, index, entry);
 		return ret;
+	}
 
 	if (WARN_ON_ONCE(ret < dax.size)) {
 		ret = -EIO;
@@ -716,15 +736,13 @@ static int dax_writeback_one(struct block_device *bdev,
 	}
 
 	wb_cache_pmem(dax.addr, dax.size);
-
-	spin_lock_irq(&mapping->tree_lock);
-	radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_TOWRITE);
-	spin_unlock_irq(&mapping->tree_lock);
- unmap:
+unmap:
 	dax_unmap_atomic(bdev, &dax);
+	put_locked_mapping_entry(mapping, index, entry);
 	return ret;
 
- unlock:
+put_unlock:
+	put_unlocked_mapping_entry(mapping, index, entry2);
 	spin_unlock_irq(&mapping->tree_lock);
 	return ret;
 }
-- 
2.6.6

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 19/20] dax: Protect PTE modification on WP fault by radix tree entry lock
  2016-09-27 16:08 [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches Jan Kara
  2016-09-27 16:08 ` [PATCH 01/20] mm: Change type of vmf->virtual_address Jan Kara
@ 2016-09-27 16:08   ` Jan Kara
  2016-09-27 16:08   ` Jan Kara
                     ` (18 subsequent siblings)
  20 siblings, 0 replies; 130+ messages in thread
From: Jan Kara @ 2016-09-27 16:08 UTC (permalink / raw)
  To: linux-mm; +Cc: Jan Kara, linux-nvdimm, linux-fsdevel, Kirill A. Shutemov

Currently the PTE gets updated in wp_pfn_shared() after
dax_pfn_mkwrite() has released the corresponding radix tree entry
lock. When we want to write-protect PTEs on cache flush, the PTE
modification needs to happen under the radix tree entry lock to ensure
consistent updates of the PTE and the radix tree (standard faults use
the page lock to ensure this consistency). So move the PTE update into
dax_pfn_mkwrite().

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/dax.c    | 22 ++++++++++++++++------
 mm/memory.c |  2 +-
 2 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index c6cadf8413a3..a2d3781c9f4e 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1163,17 +1163,27 @@ int dax_pfn_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
 	struct file *file = vma->vm_file;
 	struct address_space *mapping = file->f_mapping;
-	void *entry;
+	void *entry, **slot;
 	pgoff_t index = vmf->pgoff;
 
 	spin_lock_irq(&mapping->tree_lock);
-	entry = get_unlocked_mapping_entry(mapping, index, NULL);
-	if (!entry || !radix_tree_exceptional_entry(entry))
-		goto out;
+	entry = get_unlocked_mapping_entry(mapping, index, &slot);
+	if (!entry || !radix_tree_exceptional_entry(entry)) {
+		if (entry)
+			put_unlocked_mapping_entry(mapping, index, entry);
+		spin_unlock_irq(&mapping->tree_lock);
+		return VM_FAULT_NOPAGE;
+	}
 	radix_tree_tag_set(&mapping->page_tree, index, PAGECACHE_TAG_DIRTY);
-	put_unlocked_mapping_entry(mapping, index, entry);
-out:
+	entry = lock_slot(mapping, slot);
 	spin_unlock_irq(&mapping->tree_lock);
+	/*
+	 * If we race with somebody updating the PTE and finish_mkwrite_fault()
+	 * fails, we don't care. We need to return VM_FAULT_NOPAGE and retry
+	 * the fault in either case.
+	 */
+	finish_mkwrite_fault(vmf);
+	put_locked_mapping_entry(mapping, index, entry);
 	return VM_FAULT_NOPAGE;
 }
 EXPORT_SYMBOL_GPL(dax_pfn_mkwrite);
diff --git a/mm/memory.c b/mm/memory.c
index e7a4a30a5e88..5fa3d0c5196e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2310,7 +2310,7 @@ static int wp_pfn_shared(struct vm_fault *vmf)
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
 		vmf->flags |= FAULT_FLAG_MKWRITE;
 		ret = vma->vm_ops->pfn_mkwrite(vma, vmf);
-		if (ret & VM_FAULT_ERROR)
+		if (ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE))
 			return ret;
 		return finish_mkwrite_fault(vmf);
 	}
-- 
2.6.6

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 20/20] dax: Clear dirty entry tags on cache flush
  2016-09-27 16:08 [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches Jan Kara
@ 2016-09-27 16:08   ` Jan Kara
  2016-09-27 16:08 ` [PATCH 02/20] mm: Join struct fault_env and vm_fault Jan Kara
                     ` (19 subsequent siblings)
  20 siblings, 0 replies; 130+ messages in thread
From: Jan Kara @ 2016-09-27 16:08 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-fsdevel, linux-nvdimm, Dan Williams, Ross Zwisler,
	Kirill A. Shutemov, Jan Kara

Currently we never clear dirty tags in DAX mappings, and thus the address
ranges to flush keep accumulating. Now that we have locking of radix tree
entries, we have all the locking necessary to reliably clear the radix
tree dirty tag when flushing caches for the corresponding address range.
Similarly to page_mkclean(), we also have to write-protect pages to get a
page fault when the page is next written to, so that we can mark the
entry dirty again.
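
Per dirty entry, the flush path below thus becomes, schematically:

	lock radix tree entry
	write-protect and clean all PTEs mapping the pfn
	flush CPU caches for the pfn
	clear the radix tree dirty tag
	unlock radix tree entry

Any later write has to go through a new write fault, which waits for
the entry lock and re-sets the dirty tag.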

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/dax.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/fs/dax.c b/fs/dax.c
index a2d3781c9f4e..233f548d298e 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -31,6 +31,7 @@
 #include <linux/vmstat.h>
 #include <linux/pfn_t.h>
 #include <linux/sizes.h>
+#include <linux/mmu_notifier.h>
 #include <linux/iomap.h>
 #include "internal.h"
 
@@ -668,6 +669,59 @@ static void *dax_insert_mapping_entry(struct address_space *mapping,
 	return new_entry;
 }
 
+static inline unsigned long
+pgoff_address(pgoff_t pgoff, struct vm_area_struct *vma)
+{
+	unsigned long address;
+
+	address = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
+	return address;
+}
+
+/* Walk all mappings of a given index of a file and writeprotect them */
+static void dax_mapping_entry_mkclean(struct address_space *mapping,
+				      pgoff_t index, unsigned long pfn)
+{
+	struct vm_area_struct *vma;
+	pte_t *ptep;
+	pte_t pte;
+	spinlock_t *ptl;
+	bool changed;
+
+	i_mmap_lock_read(mapping);
+	vma_interval_tree_foreach(vma, &mapping->i_mmap, index, index) {
+		unsigned long address;
+
+		cond_resched();
+
+		if (!(vma->vm_flags & VM_SHARED))
+			continue;
+
+		address = pgoff_address(index, vma);
+		changed = false;
+		if (follow_pte(vma->vm_mm, address, &ptep, &ptl))
+			continue;
+		if (pfn != pte_pfn(*ptep))
+			goto unlock;
+		if (!pte_dirty(*ptep) && !pte_write(*ptep))
+			goto unlock;
+
+		flush_cache_page(vma, address, pfn);
+		pte = ptep_clear_flush(vma, address, ptep);
+		pte = pte_wrprotect(pte);
+		pte = pte_mkclean(pte);
+		set_pte_at(vma->vm_mm, address, ptep, pte);
+		changed = true;
+unlock:
+		pte_unmap_unlock(ptep, ptl);
+
+		if (changed)
+			mmu_notifier_invalidate_page(vma->vm_mm, address);
+	}
+	i_mmap_unlock_read(mapping);
+}
+
 static int dax_writeback_one(struct block_device *bdev,
 		struct address_space *mapping, pgoff_t index, void *entry)
 {
@@ -735,7 +789,17 @@ static int dax_writeback_one(struct block_device *bdev,
 		goto unmap;
 	}
 
+	dax_mapping_entry_mkclean(mapping, index, pfn_t_to_pfn(dax.pfn));
 	wb_cache_pmem(dax.addr, dax.size);
+	/*
+	 * After we have flushed the cache, we can clear the dirty tag. There
+	 * cannot be new dirty data in the pfn after the flush has completed as
+	 * the pfn mappings are writeprotected and fault waits for mapping
+	 * entry lock.
+	 */
+	spin_lock_irq(&mapping->tree_lock);
+	radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_DIRTY);
+	spin_unlock_irq(&mapping->tree_lock);
 unmap:
 	dax_unmap_atomic(bdev, &dax);
 	put_locked_mapping_entry(mapping, index, entry);
-- 
2.6.6


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* Re: [PATCH 01/20] mm: Change type of vmf->virtual_address
@ 2016-09-30  9:07       ` Christoph Hellwig
  0 siblings, 0 replies; 130+ messages in thread
From: Christoph Hellwig @ 2016-09-30  9:07 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-mm, linux-fsdevel, linux-nvdimm, Dan Williams,
	Ross Zwisler, Kirill A. Shutemov

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 02/20] mm: Join struct fault_env and vm_fault
@ 2016-09-30  9:10       ` Christoph Hellwig
  0 siblings, 0 replies; 130+ messages in thread
From: Christoph Hellwig @ 2016-09-30  9:10 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-mm, linux-fsdevel, linux-nvdimm, Dan Williams,
	Ross Zwisler, Kirill A. Shutemov

On Tue, Sep 27, 2016 at 06:08:06PM +0200, Jan Kara wrote:
> Currently we have two different structures for passing fault information
> around - struct vm_fault and struct fault_env. DAX will need more
> information in struct vm_fault to handle its faults so the content of
> that structure would become even closer to fault_env. Furthermore it
> would need to generate struct fault_env to be able to call some of the
> generic functions. So at this point I don't think there's much use in
> keeping these two structures separate. Just embed into struct vm_fault
> all that is needed to use it for both purposes.

Looks sensible, and I wonder why it's not been like that from
the start.  But given that you touched all users of the virtual_address
member earlier:  any reason not to move everyone to the unmasked variant
there and avoid having to pass the address twice?


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches
  2016-09-27 16:08 [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches Jan Kara
                   ` (19 preceding siblings ...)
  2016-09-27 16:08   ` Jan Kara
@ 2016-09-30  9:14 ` Christoph Hellwig
  2016-10-03  7:59   ` Jan Kara
  20 siblings, 1 reply; 130+ messages in thread
From: Christoph Hellwig @ 2016-09-30  9:14 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-mm, linux-fsdevel, linux-nvdimm, Dan Williams,
	Ross Zwisler, Kirill A. Shutemov

On Tue, Sep 27, 2016 at 06:08:04PM +0200, Jan Kara wrote:
> Hello,
> 
> this is a third revision of my patches to clear dirty bits from radix tree of
> DAX inodes when caches for corresponding pfns have been flushed. This patch set
> is significantly larger than the previous version because I'm changing how
> ->fault, ->page_mkwrite, and ->pfn_mkwrite handlers may choose to handle the
> fault

Btw, is there any good reason to keep ->fault, ->pmd_fault, ->page_mkwrite
and ->pfn_mkwrite separate these days?  All of them now take a struct
vm_fault, and the differences aren't exactly obvious for callers and
users.
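
For reference, roughly what the four look like once this series is in
(prototypes abridged from my reading, so take with a grain of salt):

	struct vm_operations_struct {
		...
		int (*fault)(struct vm_area_struct *, struct vm_fault *);
		int (*pmd_fault)(struct vm_area_struct *, struct vm_fault *);
		int (*page_mkwrite)(struct vm_area_struct *, struct vm_fault *);
		int (*pfn_mkwrite)(struct vm_area_struct *, struct vm_fault *);
		...
	};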


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 02/20] mm: Join struct fault_env and vm_fault
  2016-09-30  9:10       ` Christoph Hellwig
@ 2016-10-03  7:43         ` Jan Kara
  -1 siblings, 0 replies; 130+ messages in thread
From: Jan Kara @ 2016-10-03  7:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, linux-mm, linux-fsdevel, linux-nvdimm, Dan Williams,
	Ross Zwisler, Kirill A. Shutemov

On Fri 30-09-16 02:10:14, Christoph Hellwig wrote:
> On Tue, Sep 27, 2016 at 06:08:06PM +0200, Jan Kara wrote:
> > Currently we have two different structures for passing fault information
> > around - struct vm_fault and struct fault_env. DAX will need more
> > information in struct vm_fault to handle its faults so the content of
> > that structure would become even closer to fault_env. Furthermore it
> > would need to generate struct fault_env to be able to call some of the
> > generic functions. So at this point I don't think there's much use in
> > keeping these two structures separate. Just embed into struct vm_fault
> > all that is needed to use it for both purposes.
> 
> Looks sensible, and I wonder why it's not been like that from
> the start.  But given that you touched all users of the virtual_address
> member earlier:  any reason not to move everyone to the unmasked variant
> there and avoid having to pass the address twice?

Hum, right, probably makes sense. I'll do that for the next version.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches
  2016-09-30  9:14 ` [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches Christoph Hellwig
@ 2016-10-03  7:59   ` Jan Kara
  2016-10-03  8:03     ` Christoph Hellwig
  0 siblings, 1 reply; 130+ messages in thread
From: Jan Kara @ 2016-10-03  7:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, linux-mm, linux-fsdevel, linux-nvdimm, Dan Williams,
	Ross Zwisler, Kirill A. Shutemov

On Fri 30-09-16 02:14:18, Christoph Hellwig wrote:
> On Tue, Sep 27, 2016 at 06:08:04PM +0200, Jan Kara wrote:
> > Hello,
> > 
> > this is a third revision of my patches to clear dirty bits from radix tree of
> > DAX inodes when caches for corresponding pfns have been flushed. This patch set
> > is significantly larger than the previous version because I'm changing how
> > ->fault, ->page_mkwrite, and ->pfn_mkwrite handlers may choose to handle the
> > fault
> 
> Btw, is there ny good reason to keep ->fault, ->pmd_fault, page->mkwrite
> and pfn_mkwrite separate these days?  All of them now take a struct
> vm_fault, and the differences aren't exactly obvious for callers and
> users.

IMO ->fault and ->pmd_fault can be merged, ->page_mkwrite and ->pfn_mkwrite
can be merged. There were even patches flying around for that. I want to do
that, but it's not a priority now as the patch set is already large enough.

I'm not sure whether merging ->fault and ->page_mkwrite would be really
helpful and it would certainly require some non-trivial changes in the
fault path. For example currently a write fault of a file mapping will
result in first ->fault being called which handles the read part of the
fault and then ->page_mkwrite is called to handle write-enabling of the
PTE. When the handlers would be merged, calling one handler twice would be
really strange.
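
Condensed, the current shared write fault path in mm/memory.c does
something like this (locking and error handling dropped, and the
argument lists shift around in this very series, so just a sketch):

	ret = __do_fault(vmf);			/* calls ->fault */
	if (ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY))
		return ret;
	if (vma->vm_ops->page_mkwrite) {
		ret = do_page_mkwrite(vmf);	/* calls ->page_mkwrite */
		/* bail out on error */
	}
	/* finally install a writeable PTE for vmf->page */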

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches
  2016-10-03  7:59   ` Jan Kara
@ 2016-10-03  8:03     ` Christoph Hellwig
  2016-10-03  8:15         ` Jan Kara
  0 siblings, 1 reply; 130+ messages in thread
From: Christoph Hellwig @ 2016-10-03  8:03 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, linux-mm, linux-fsdevel, linux-nvdimm,
	Dan Williams, Ross Zwisler, Kirill A. Shutemov

On Mon, Oct 03, 2016 at 09:59:02AM +0200, Jan Kara wrote:
> IMO ->fault and ->pmd_fault can be merged, ->page_mkwrite and ->pfn_mkwrite
> can be merged. There were even patches flying around for that. I want to do
> that, but it's not a priority now as the patch set is already large enough.
> 
> I'm not sure whether merging ->fault and ->page_mkwrite would be really
> helpful and it would certainly require some non-trivial changes in the
> fault path. For example currently a write fault of a file mapping will
> result in first ->fault being called which handles the read part of the
> fault and then ->page_mkwrite is called to handle write-enabling of the
> PTE. When the handlers would be merged, calling one handler twice would be
> really strange.

Except for the DAX path, where we apparently need to call out to
the mkwrite handler from ->fault.  Or at least we used to, with some
leftovers in XFS but not in extN.


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches
  2016-10-03  8:03     ` Christoph Hellwig
@ 2016-10-03  8:15         ` Jan Kara
  0 siblings, 0 replies; 130+ messages in thread
From: Jan Kara @ 2016-10-03  8:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, linux-mm, linux-fsdevel, linux-nvdimm, Dan Williams,
	Ross Zwisler, Kirill A. Shutemov

On Mon 03-10-16 01:03:37, Christoph Hellwig wrote:
> On Mon, Oct 03, 2016 at 09:59:02AM +0200, Jan Kara wrote:
> > IMO ->fault and ->pmd_fault can be merged, ->page_mkwrite and ->pfn_mkwrite
> > can be merged. There were even patches flying around for that. I want to do
> > that, but it's not a priority now as the patch set is already large enough.
> > 
> > I'm not sure whether merging ->fault and ->page_mkwrite would be really
> > helpful and it would certainly require some non-trivial changes in the
> > fault path. For example currently a write fault of a file mapping will
> > result in first ->fault being called which handles the read part of the
> > fault and then ->page_mkwrite is called to handle write-enabling of the
> > PTE. When the handlers would be merged, calling one handler twice would be
> > really strange.
> 
> Except for the DAX path, where we apparently need to call out to
> the mkwrite handler from ->fault.  Or at least used to, with some
> leftovers in XFS and not extN.

Yeah, so DAX path is special because it installs its own PTE directly from
the fault handler which we don't do in any other case (only driver fault
handlers commonly do this but those generally don't care about
->page_mkwrite or file mappings for that matter).
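
(Concretely, the DAX fault handler maps the pfn itself via something
like

	vm_insert_mixed(vma, vaddr, dax.pfn);

instead of returning a page for the generic code to install.)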

I don't say there are no simplifications or unifications possible, but I'd
prefer to leave them for a bit later once the current churn with ongoing
work somewhat settles...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches
  2016-10-03  8:15         ` Jan Kara
@ 2016-10-03  9:32           ` Christoph Hellwig
  -1 siblings, 0 replies; 130+ messages in thread
From: Christoph Hellwig @ 2016-10-03  9:32 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, linux-mm, linux-fsdevel, linux-nvdimm,
	Dan Williams, Ross Zwisler, Kirill A. Shutemov

On Mon, Oct 03, 2016 at 10:15:49AM +0200, Jan Kara wrote:
> Yeah, so DAX path is special because it installs its own PTE directly from
> the fault handler which we don't do in any other case (only driver fault
> handlers commonly do this but those generally don't care about
> ->page_mkwrite or file mappings for that matter).
> 
> I don't say there are no simplifications or unifications possible, but I'd
> prefer to leave them for a bit later once the current churn with ongoing
> work somewhat settles...

Alright, let's keep it simple for now.  That being said, this series is
clearly 4.9 material, but any chance of a respin of the invalidate_pages
series, as that might still be 4.8 material?

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches
  2016-10-03  9:32           ` Christoph Hellwig
  (?)
@ 2016-10-03 11:13           ` Jan Kara
       [not found]             ` <20161003111358.GQ6457-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org>
  -1 siblings, 1 reply; 130+ messages in thread
From: Jan Kara @ 2016-10-03 11:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, linux-mm, linux-fsdevel, linux-nvdimm, Dan Williams,
	Ross Zwisler, Kirill A. Shutemov

On Mon 03-10-16 02:32:48, Christoph Hellwig wrote:
> On Mon, Oct 03, 2016 at 10:15:49AM +0200, Jan Kara wrote:
> > Yeah, so DAX path is special because it installs its own PTE directly from
> > the fault handler which we don't do in any other case (only driver fault
> > handlers commonly do this but those generally don't care about
> > ->page_mkwrite or file mappings for that matter).
> > 
> > I don't say there are no simplifications or unifications possible, but I'd
> > prefer to leave them for a bit later once the current churn with ongoing
> > work somewhat settles...
> 
> Allright, let's keep it simple for now.  Being said this series clearly
> is 4.9 material, but any chance to get a respin of the invalidate_pages

Agreed (actually 4.10).

> series as that might still be 4.8 material?

The problem with the invalidate_pages series is that it depends on the
ability to clear the dirty bits in the radix tree of DAX mappings (i.e.
the first series). Otherwise radix tree entries that once get dirty can
never be safely evicted, invalidate_inode_pages2_range() will keep
returning EBUSY, and callers get confused (I tried that a few weeks ago).
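
That is, for a DAX entry the invalidation path effectively ends up
with a check along the lines of (illustrative only, the exact code
lives in that series):

	if (radix_tree_tag_get(&mapping->page_tree, index,
			       PAGECACHE_TAG_DIRTY))
		return -EBUSY;

and without the first series nothing ever clears that tag.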

If I dropped patch 5/6 for the 4.9 merge (i.e., we would still happily
discard dirty radix tree entries from invalidate_inode_pages2_range()),
things would run fine, but fsync() may fail to flush caches for some
pages. I'm not sure that's much better than the current status quo
though. Thoughts?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches
@ 2016-10-13 20:34                 ` Ross Zwisler
  0 siblings, 0 replies; 130+ messages in thread
From: Ross Zwisler @ 2016-10-13 20:34 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, linux-mm, linux-fsdevel, linux-nvdimm,
	Dan Williams, Ross Zwisler, Kirill A. Shutemov

On Mon, Oct 03, 2016 at 01:13:58PM +0200, Jan Kara wrote:
> On Mon 03-10-16 02:32:48, Christoph Hellwig wrote:
> > On Mon, Oct 03, 2016 at 10:15:49AM +0200, Jan Kara wrote:
> > > Yeah, so DAX path is special because it installs its own PTE directly from
> > > the fault handler which we don't do in any other case (only driver fault
> > > handlers commonly do this but those generally don't care about
> > > ->page_mkwrite or file mappings for that matter).
> > > 
> > > I don't say there are no simplifications or unifications possible, but I'd
> > > prefer to leave them for a bit later once the current churn with ongoing
> > > work somewhat settles...
> > 
> > Allright, let's keep it simple for now.  Being said this series clearly
> > is 4.9 material, but any chance to get a respin of the invalidate_pages
> 
> Agreed (actually 4.10).
> 
> > series as that might still be 4.8 material?
> 
> The problem with invalidate_pages series is that it depends on the ability
> to clear the dirty bits in the radix tree of DAX mappings (i.e. the first
> series). Otherwise radix tree entries that get once dirty can never be safely
> evicted, invalidate_inode_pages2_range() will keep returning EBUSY and
> callers get confused (I've tried that few weeks ago).
> 
> If I dropped patch 5/6 for 4.9 merge (i.e., we would still happily discard
> dirty radix tree entries from invalidate_inode_pages2_range()), things
> would run fine, just fsync() may miss to flush caches for some pages. I'm
> not sure that's much better than current status quo though. Thoughts?

I'm not sure if I'm understanding this correctly, but if you're saying that we
might end up in a case where fsync()/msync() would fail to properly flush
pages that are/should be dirty, I think this is a no-go.  That could result in
data corruption if a user calls fsync(), thinks they've achieved a
synchronization point (updating other metadata or whatever), then across a
power loss they lose data they had flushed via that previous fsync() because
it was still in the CPU cache and never really made it out to media.


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 01/20] mm: Change type of vmf->virtual_address
  2016-09-27 16:08 ` [PATCH 01/20] mm: Change type of vmf->virtual_address Jan Kara
@ 2016-10-14 18:02     ` Ross Zwisler
  2016-10-14 18:02     ` Ross Zwisler
  1 sibling, 0 replies; 130+ messages in thread
From: Ross Zwisler @ 2016-10-14 18:02 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Tue, Sep 27, 2016 at 06:08:05PM +0200, Jan Kara wrote:
> Every single user of vmf->virtual_address cast that entry to unsigned
> long before doing anything with it. So just change the type of that
> entry to unsigned long immediately.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 03/20] mm: Use pgoff in struct vm_fault instead of passing it separately
  2016-09-27 16:08   ` Jan Kara
@ 2016-10-14 18:42     ` Ross Zwisler
  -1 siblings, 0 replies; 130+ messages in thread
From: Ross Zwisler @ 2016-10-14 18:42 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Tue, Sep 27, 2016 at 06:08:07PM +0200, Jan Kara wrote:
> struct vm_fault already has a pgoff entry. Use it instead of passing pgoff
> as a separate argument and then assigning it later.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  mm/memory.c | 35 ++++++++++++++++++-----------------
>  1 file changed, 18 insertions(+), 17 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 447a1ef4a9e3..4c2ec9a9d8af 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2275,7 +2275,7 @@ static int wp_pfn_shared(struct vm_fault *vmf, pte_t orig_pte)
>  	if (vma->vm_ops && vma->vm_ops->pfn_mkwrite) {
>  		struct vm_fault vmf2 = {
>  			.page = NULL,
> -			.pgoff = linear_page_index(vma, vmf->address),
> +			.pgoff = vmf->pgoff,

I think there is one path where vmf->pgoff isn't set here.  Here's the path:

__collapse_huge_page_swapin()
  do_swap_page()
    do_wp_page()
      wp_pfn_shared()

We then use an uninitialized vmf->pgoff to set up vmf2->pgoff, which we pass
to vm_ops->pfn_mkwrite().
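
For reference, the vm_fault initializer there currently looks roughly
like this (abridged, field names per this series, so adjust as needed):

	struct vm_fault vmf = {
		.vma = vma,
		.address = address,
		.flags = FAULT_FLAG_ALLOW_RETRY,
		.pmd = pmd,
		/* note: .pgoff is never set */
	};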

I think all we need to do to fix this is initialize .pgoff in
__collapse_huge_page_swapin().  With this one change:

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 04/20] mm: Use passed vm_fault structure in __do_fault()
  2016-09-27 16:08   ` Jan Kara
@ 2016-10-14 19:05     ` Ross Zwisler
  -1 siblings, 0 replies; 130+ messages in thread
From: Ross Zwisler @ 2016-10-14 19:05 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Tue, Sep 27, 2016 at 06:08:08PM +0200, Jan Kara wrote:
> Instead of creating another vm_fault structure, use the one passed to
> __do_fault() for passing arguments into fault handler.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 05/20] mm: Trim __do_fault() arguments
  2016-09-27 16:08   ` Jan Kara
  (?)
@ 2016-10-14 20:31   ` Ross Zwisler
  2016-10-17  9:04       ` Jan Kara
  -1 siblings, 1 reply; 130+ messages in thread
From: Ross Zwisler @ 2016-10-14 20:31 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-mm, linux-fsdevel, linux-nvdimm, Dan Williams,
	Ross Zwisler, Kirill A. Shutemov

On Tue, Sep 27, 2016 at 06:08:09PM +0200, Jan Kara wrote:
> Use vm_fault structure to pass cow_page, page, and entry in and out of
> the function. That reduces number of __do_fault() arguments from 4 to 1.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

In looking at this I realized that vmf->entry is actually unused, as is the
entry we used to return back via __do_fault().  I guess they must have been in
there because at one point they were needed for dax_unlock_mapping_entry()?
Anyway, looking ahead I see patch 10 removes vmf->entry altogether. :)

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 06/20] mm: Use passed vm_fault structure in wp_pfn_shared()
  2016-09-27 16:08   ` Jan Kara
@ 2016-10-14 21:04     ` Ross Zwisler
  -1 siblings, 0 replies; 130+ messages in thread
From: Ross Zwisler @ 2016-10-14 21:04 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Tue, Sep 27, 2016 at 06:08:10PM +0200, Jan Kara wrote:
> Instead of creating another vm_fault structure, use the one passed to
> wp_pfn_shared() for passing arguments into pfn_mkwrite handler.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches
  2016-10-13 20:34                 ` Ross Zwisler
@ 2016-10-17  8:47                   ` Jan Kara
  -1 siblings, 0 replies; 130+ messages in thread
From: Jan Kara @ 2016-10-17  8:47 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Jan Kara, Christoph Hellwig, linux-mm, linux-fsdevel,
	linux-nvdimm, Dan Williams, Kirill A. Shutemov

On Thu 13-10-16 14:34:34, Ross Zwisler wrote:
> On Mon, Oct 03, 2016 at 01:13:58PM +0200, Jan Kara wrote:
> > On Mon 03-10-16 02:32:48, Christoph Hellwig wrote:
> > > On Mon, Oct 03, 2016 at 10:15:49AM +0200, Jan Kara wrote:
> > > > Yeah, so DAX path is special because it installs its own PTE directly from
> > > > the fault handler which we don't do in any other case (only driver fault
> > > > handlers commonly do this but those generally don't care about
> > > > ->page_mkwrite or file mappings for that matter).
> > > > 
> > > > I don't say there are no simplifications or unifications possible, but I'd
> > > > prefer to leave them for a bit later once the current churn with ongoing
> > > > work somewhat settles...
> > > 
> > > Allright, let's keep it simple for now.  Being said this series clearly
> > > is 4.9 material, but any chance to get a respin of the invalidate_pages
> > 
> > Agreed (actually 4.10).
> > 
> > > series as that might still be 4.8 material?
> > 
> > The problem with invalidate_pages series is that it depends on the ability
> > to clear the dirty bits in the radix tree of DAX mappings (i.e. the first
> > series). Otherwise radix tree entries that get once dirty can never be safely
> > evicted, invalidate_inode_pages2_range() will keep returning EBUSY and
> > callers get confused (I've tried that few weeks ago).
> > 
> > If I dropped patch 5/6 for 4.9 merge (i.e., we would still happily discard
> > dirty radix tree entries from invalidate_inode_pages2_range()), things
> > would run fine, just fsync() may miss to flush caches for some pages. I'm
> > not sure that's much better than current status quo though. Thoughts?
> 
> I'm not sure if I'm understanding this correctly, but if you're saying
> that we might end up in a case where fsync()/msync() would fail to
> properly flush pages that are/should be dirty, I think this is a no-go.
> That could result in data corruption if a user calls fsync(), thinks
> they've achieved a synchronization point (updating other metadata or
> whatever), then via power loss they lose data they had flushed via that
> previous fsync() because it was still in the CPU cache and never really
> made it out to media.

I know; the current code is actually buggy in that way as well, and this
patch set fixes it. But I was arguing that applying only part of the
fixes, so that the main problem remains unfixed, would not be very
beneficial anyway.

This week I plan to rebase both series on top of rc1 + your THP patches so
that we can move on with merging the stuff.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 03/20] mm: Use pgoff in struct vm_fault instead of passing it separately
  2016-10-14 18:42     ` Ross Zwisler
  (?)
@ 2016-10-17  9:01     ` Jan Kara
  -1 siblings, 0 replies; 130+ messages in thread
From: Jan Kara @ 2016-10-17  9:01 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Jan Kara, linux-mm, linux-fsdevel, linux-nvdimm, Dan Williams,
	Kirill A. Shutemov

On Fri 14-10-16 12:42:51, Ross Zwisler wrote:
> On Tue, Sep 27, 2016 at 06:08:07PM +0200, Jan Kara wrote:
> > struct vm_fault already has a pgoff entry. Use it instead of passing pgoff
> > as a separate argument and then assigning it later.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  mm/memory.c | 35 ++++++++++++++++++-----------------
> >  1 file changed, 18 insertions(+), 17 deletions(-)
> > 
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 447a1ef4a9e3..4c2ec9a9d8af 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -2275,7 +2275,7 @@ static int wp_pfn_shared(struct vm_fault *vmf, pte_t orig_pte)
> >  	if (vma->vm_ops && vma->vm_ops->pfn_mkwrite) {
> >  		struct vm_fault vmf2 = {
> >  			.page = NULL,
> > -			.pgoff = linear_page_index(vma, vmf->address),
> > +			.pgoff = vmf->pgoff,
> 
> I think there is one path where vmf->pgoff isn't set here.  Here's the path:
> 
> __collapse_huge_page_swapin()
>   do_swap_page()
>     do_wp_page()
>       wp_pfn_shared()
> 
> We then use an uninitialized vmf->pgoff to set up vmf2->pgoff, which we pass
> to vm_ops->pfn_mkwrite().
> 
> I think all we need to do to fix this is initialize .pgoff in
> __collapse_huge_page_swapin().  With this one change:
> 
> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

Thanks for catching this. I don't think that bug had any visible effect,
since for anonymous pages (which is what do_swap_page() handles) we won't
enter wp_pfn_shared(), but it is definitely good to fix this.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 05/20] mm: Trim __do_fault() arguments
  2016-10-14 20:31   ` Ross Zwisler
@ 2016-10-17  9:04       ` Jan Kara
  0 siblings, 0 replies; 130+ messages in thread
From: Jan Kara @ 2016-10-17  9:04 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Jan Kara, linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Fri 14-10-16 14:31:47, Ross Zwisler wrote:
> On Tue, Sep 27, 2016 at 06:08:09PM +0200, Jan Kara wrote:
> > Use vm_fault structure to pass cow_page, page, and entry in and out of
> > the function. That reduces the number of __do_fault() arguments from 4 to 1.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
While looking at this I realized that vmf->entry is actually unused, as is
the entry we used to return via __do_fault().  I guess they must have been in
> there because at one point they were needed for dax_unlock_mapping_entry()?
> Anyway, looking ahead I see patch 10 removes vmf->entry altogether. :)

Yes :).

> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

Thanks.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 07/20] mm: Add orig_pte field into vm_fault
  2016-09-27 16:08 ` [PATCH 07/20] mm: Add orig_pte field into vm_fault Jan Kara
@ 2016-10-17 16:45     ` Ross Zwisler
  0 siblings, 0 replies; 130+ messages in thread
From: Ross Zwisler @ 2016-10-17 16:45 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Tue, Sep 27, 2016 at 06:08:11PM +0200, Jan Kara wrote:
> Add orig_pte field to vm_fault structure to allow ->page_mkwrite
> handlers to fully handle the fault. This also allows us to save some
> passing of extra arguments around.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---

> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index f88b2d3810a7..66bc77f2d1d2 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -890,11 +890,12 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
>  	vmf.pte = pte_offset_map(pmd, address);
>  	for (; vmf.address < address + HPAGE_PMD_NR*PAGE_SIZE;
>  			vmf.pte++, vmf.address += PAGE_SIZE) {
> -		pteval = *vmf.pte;
> +		vmf.orig_pte = *vmf.pte;
> +		pteval = vmf.orig_pte;
>  		if (!is_swap_pte(pteval))
>  			continue;

'pteval' is now only used once.  It's probably cleaner to just remove it and
use vmf.orig_pte for the is_swap_pte() check.
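
Something like this (a sketch of that simplification):

	vmf.orig_pte = *vmf.pte;
	if (!is_swap_pte(vmf.orig_pte))
		continue;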

> @@ -3484,8 +3484,7 @@ static int handle_pte_fault(struct vm_fault *vmf)
>  		 * So now it's safe to run pte_offset_map().
>  		 */
>  		vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
> -
> -		entry = *vmf->pte;
> +		vmf->orig_pte = *vmf->pte;
>  
>  		/*
>  		 * some architectures can have larger ptes than wordsize,
> @@ -3496,6 +3495,7 @@ static int handle_pte_fault(struct vm_fault *vmf)
>  		 * ptl lock held. So here a barrier will do.
>  		 */
>  		barrier();
> +		entry = vmf->orig_pte;

This assignment to 'entry' is now on the other side of the barrier().  I'll admit
that I don't fully grok the need for the barrier. Does it apply to only the
setting of vmf->pte and vmf->orig_pte, or does 'entry' also matter because it
too is of type pte_t, and thus could be bigger than the architecture's word
size?

My guess is that 'entry' matters, too, and should remain before the barrier()
call.  If not, can you help me understand why?

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 08/20] mm: Allow full handling of COW faults in ->fault handlers
  2016-09-27 16:08   ` Jan Kara
@ 2016-10-17 16:50   ` Ross Zwisler
  -1 siblings, 0 replies; 130+ messages in thread
From: Ross Zwisler @ 2016-10-17 16:50 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-mm, linux-fsdevel, linux-nvdimm, Dan Williams,
	Ross Zwisler, Kirill A. Shutemov

On Tue, Sep 27, 2016 at 06:08:12PM +0200, Jan Kara wrote:
> To allow full handling of COW faults add memcg field to struct vm_fault
> and a return value of ->fault() handler meaning that COW fault is fully
> handled and memcg charge must not be canceled. This will allow us to
> remove knowledge about special DAX locking from the generic fault code.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 09/20] mm: Factor out functionality to finish page faults
  2016-09-27 16:08 ` [PATCH 09/20] mm: Factor out functionality to finish page faults Jan Kara
@ 2016-10-17 17:38     ` Ross Zwisler
  2016-10-17 17:40     ` Ross Zwisler
  1 sibling, 0 replies; 130+ messages in thread
From: Ross Zwisler @ 2016-10-17 17:38 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Tue, Sep 27, 2016 at 06:08:13PM +0200, Jan Kara wrote:
> Introduce function finish_fault() as a helper function for finishing
> page faults. It is a rather thin wrapper around alloc_set_pte() but since
> we'd want to call this from DAX code or filesystems, it is still useful
> to avoid some boilerplate code.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 09/20] mm: Factor out functionality to finish page faults
  2016-09-27 16:08 ` [PATCH 09/20] mm: Factor out functionality to finish page faults Jan Kara
@ 2016-10-17 17:40     ` Ross Zwisler
  2016-10-17 17:40     ` Ross Zwisler
  1 sibling, 0 replies; 130+ messages in thread
From: Ross Zwisler @ 2016-10-17 17:40 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Tue, Sep 27, 2016 at 06:08:13PM +0200, Jan Kara wrote:
> Introduce function finish_fault() as a helper function for finishing
> page faults. It is a rather thin wrapper around alloc_set_pte() but since
> we'd want to call this from DAX code or filesystems, it is still useful
> to avoid some boilerplate code.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---

> diff --git a/mm/memory.c b/mm/memory.c
> index 17db88a38e8a..f54cfad7fe04 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3029,6 +3029,36 @@ int alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
>  	return 0;
>  }
>  
> +
> +/**
> + * finish_fault - finish page fault once we have prepared the page to fault
> + *
> + * @vmf: structure describing the fault
> + *
> + * This function handles all that is needed to finish a page fault once the
> + * page to fault in is prepared. It handles locking of PTEs, inserts PTE for
> + * given page, adds reverse page mapping, handles memcg charges and LRU
> + * addition. The function returns 0 on success, VM_FAULT_ code in case of
> + * error.
> + *
> + * The function expects the page to be locked.
> + */
> +int finish_fault(struct vm_fault *vmf)
> +{
> +	struct page *page;
> +	int ret;
> +
> +	/* Did we COW the page? */
> +	if (vmf->flags & FAULT_FLAG_WRITE && !(vmf->vma->vm_flags & VM_SHARED))

Oh, sorry, I did have one bit of feedback.  Maybe add parens around the flag
check for readability:

	if ((vmf->flags & FAULT_FLAG_WRITE) && !(vmf->vma->vm_flags & VM_SHARED))

Aside from that one nit:

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches
  2016-10-17  8:47                   ` Jan Kara
@ 2016-10-17 18:59                       ` Ross Zwisler
  -1 siblings, 0 replies; 130+ messages in thread
From: Ross Zwisler @ 2016-10-17 18:59 UTC (permalink / raw)
  To: Jan Kara, Dave Chinner, Andrew Morton
  Cc: linux-nvdimm, Christoph Hellwig, linux-mm, linux-fsdevel,
	Kirill A. Shutemov

On Mon, Oct 17, 2016 at 10:47:32AM +0200, Jan Kara wrote:

> This week I plan to rebase both series on top of rc1 + your THP patches so
> that we can move on with merging the stuff.

Yea...so how are we going to coordinate merging of these series for the v4.10
merge window?  My series mostly changes DAX, but it also changes XFS, ext2 and
ext4.  I think the plan right now is to have Dave Chinner take it through his
XFS tree.

Your first series is mostly mm changes with some DAX sprinkled in, and your
second series touches dax, mm and all 3 DAX filesystems.  

What is the best way to handle all this?  Have it go through one central tree
(-MM?), even though the changes touch code that exists outside of that tree's
normal domain (like the FS code)?  Have my series go through the XFS tree and
yours through -MM, and give Linus a merge resolution patch?  Something else?

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 10/20] mm: Move handling of COW faults into DAX code
  2016-09-27 16:08 ` [PATCH 10/20] mm: Move handling of COW faults into DAX code Jan Kara
@ 2016-10-17 19:29     ` Ross Zwisler
  0 siblings, 0 replies; 130+ messages in thread
From: Ross Zwisler @ 2016-10-17 19:29 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Tue, Sep 27, 2016 at 06:08:14PM +0200, Jan Kara wrote:
> Move final handling of COW faults from generic code into DAX fault
> handler. That way generic code doesn't have to be aware of peculiarities
> of DAX locking so remove that knowledge.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/dax.c            | 22 ++++++++++++++++------
>  include/linux/dax.h |  7 -------
>  include/linux/mm.h  |  9 +--------
>  mm/memory.c         | 14 ++++----------
>  4 files changed, 21 insertions(+), 31 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 0dc251ca77b8..b1c503930d1d 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -876,10 +876,15 @@ int dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
>  			goto unlock_entry;
>  		if (!radix_tree_exceptional_entry(entry)) {
>  			vmf->page = entry;
> -			return VM_FAULT_LOCKED;
> +			if (unlikely(PageHWPoison(entry))) {
> +				put_locked_mapping_entry(mapping, vmf->pgoff,
> +							 entry);
> +				return VM_FAULT_HWPOISON;
> +			}
>  		}
> -		vmf->entry = entry;
> -		return VM_FAULT_DAX_LOCKED;
> +		error = finish_fault(vmf);
> +		put_locked_mapping_entry(mapping, vmf->pgoff, entry);
> +		return error ? error : VM_FAULT_DONE_COW;
>  	}
>  
>  	if (!buffer_mapped(&bh)) {
> @@ -1430,10 +1435,15 @@ int iomap_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
>  			goto unlock_entry;
>  		if (!radix_tree_exceptional_entry(entry)) {
>  			vmf->page = entry;

In __do_fault() we explicitly clear vmf->page in the case where PageHWPoison()
is set.  I think we can get the same behavior here by moving the assignment of
vmf->page to after the PageHWPoison() check.
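
I.e., something along these lines (just a sketch, reusing the code from the
hunk above):

	if (!radix_tree_exceptional_entry(entry)) {
		if (unlikely(PageHWPoison(entry))) {
			put_locked_mapping_entry(mapping, vmf->pgoff,
						 entry);
			return VM_FAULT_HWPOISON;
		}
		/* only publish the page once we know it is usable */
		vmf->page = entry;
	}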

> -			return VM_FAULT_LOCKED;
> +			if (unlikely(PageHWPoison(entry))) {
> +				put_locked_mapping_entry(mapping, vmf->pgoff,
> +							 entry);
> +				return VM_FAULT_HWPOISON;
> +			}
>  		}
> -		vmf->entry = entry;
> -		return VM_FAULT_DAX_LOCKED;

I think we're missing a call to 

	__SetPageUptodate(new_page);

before finish_fault()?  This call currently lives in do_cow_fault(), and
is part of the path that we don't skip as part of the VM_FAULT_DAX_LOCKED
logic.

Both of these comments apply equally to the iomap_dax_fault() code.
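
For the __SetPageUptodate() point, the shape I have in mind is roughly this
(a sketch; vmf->cow_page is the field this series already routes through
struct vm_fault):

	/* mark the freshly written COW page initialized before mapping it */
	__SetPageUptodate(vmf->cow_page);
	error = finish_fault(vmf);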

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 11/20] mm: Remove unnecessary vma->vm_ops check
  2016-09-27 16:08 ` [PATCH 11/20] mm: Remove unnecessary vma->vm_ops check Jan Kara
@ 2016-10-17 19:40     ` Ross Zwisler
  0 siblings, 0 replies; 130+ messages in thread
From: Ross Zwisler @ 2016-10-17 19:40 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Tue, Sep 27, 2016 at 06:08:15PM +0200, Jan Kara wrote:
> We don't check whether vma->vm_ops is NULL in do_shared_fault() so
> there's hardly any point in checking it in wp_page_shared() which gets
> called only for shared file mappings as well.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  mm/memory.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index a4522e8999b2..63d9c1a54caf 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2301,7 +2301,7 @@ static int wp_page_shared(struct vm_fault *vmf, struct page *old_page)
>  
>  	get_page(old_page);
>  
> -	if (vma->vm_ops && vma->vm_ops->page_mkwrite) {
> +	if (vma->vm_ops->page_mkwrite) {
>  		int tmp;
>  
>  		pte_unmap_unlock(vmf->pte, vmf->ptl);
> -- 
> 2.6.6

Does this apply equally to the check in wp_pfn_shared()?  Both
wp_page_shared() and wp_pfn_shared() are called for shared file mappings via
do_wp_page().

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 12/20] mm: Factor out common parts of write fault handling
  2016-09-27 16:08   ` Jan Kara
@ 2016-10-17 22:08     ` Ross Zwisler
  -1 siblings, 0 replies; 130+ messages in thread
From: Ross Zwisler @ 2016-10-17 22:08 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Tue, Sep 27, 2016 at 06:08:16PM +0200, Jan Kara wrote:
> Currently we duplicate handling of shared write faults in
> wp_page_reuse() and do_shared_fault(). Factor them out into a common
> function.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  mm/memory.c | 78 +++++++++++++++++++++++++++++--------------------------------
>  1 file changed, 37 insertions(+), 41 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 63d9c1a54caf..0643b3b5a12a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2063,6 +2063,41 @@ static int do_page_mkwrite(struct vm_area_struct *vma, struct page *page,
>  }
>  
>  /*
> + * Handle dirtying of a page in shared file mapping on a write fault.
> + *
> + * The function expects the page to be locked and unlocks it.
> + */
> +static void fault_dirty_shared_page(struct vm_area_struct *vma,
> +				    struct page *page)
> +{
> +	struct address_space *mapping;
> +	bool dirtied;
> +	bool page_mkwrite = vma->vm_ops->page_mkwrite;

I think you may need to pass in a 'page_mkwrite' parameter if you don't want
to change behavior.  Just checking to see if vma->vm_ops->page_mkwrite is
non-NULL works fine for this path:

do_shared_fault()
	fault_dirty_shared_page()

and for

wp_page_shared()
	wp_page_reuse()
		fault_dirty_shared_page()

But for these paths:

wp_pfn_shared()
	wp_page_reuse()
		fault_dirty_shared_page()

and

do_wp_page()
	wp_page_reuse()
		fault_dirty_shared_page()

we unconditionally pass 0 for the 'page_mkwrite' parameter, even though from
the logic in wp_pfn_shared() especially you can see that
vma->vm_ops->pfn_mkwrite() must be defined some of the time.
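
In other words, something like this (a sketch of the signature change I
mean; callers would then pass the flag explicitly):

	static void fault_dirty_shared_page(struct vm_area_struct *vma,
					    struct page *page, bool page_mkwrite);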

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 13/20] mm: Pass vm_fault structure into do_page_mkwrite()
  2016-09-27 16:08 ` [PATCH 13/20] mm: Pass vm_fault structure into do_page_mkwrite() Jan Kara
@ 2016-10-17 22:29     ` Ross Zwisler
  0 siblings, 0 replies; 130+ messages in thread
From: Ross Zwisler @ 2016-10-17 22:29 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Tue, Sep 27, 2016 at 06:08:17PM +0200, Jan Kara wrote:
> We will need more information in the ->page_mkwrite() helper for DAX to
> be able to fully finish faults there. Pass vm_fault structure to
> do_page_mkwrite() and use it there so that information propagates
> properly from upper layers.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 09/20] mm: Factor out functionality to finish page faults
  2016-10-17 17:40     ` Ross Zwisler
@ 2016-10-18  9:44     ` Jan Kara
  -1 siblings, 0 replies; 130+ messages in thread
From: Jan Kara @ 2016-10-18  9:44 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Jan Kara, linux-mm, linux-fsdevel, linux-nvdimm, Dan Williams,
	Kirill A. Shutemov

On Mon 17-10-16 11:40:42, Ross Zwisler wrote:
> On Tue, Sep 27, 2016 at 06:08:13PM +0200, Jan Kara wrote:
> > +	/* Did we COW the page? */
> > +	if (vmf->flags & FAULT_FLAG_WRITE && !(vmf->vma->vm_flags & VM_SHARED))
> 
> Oh, sorry, I did have one bit of feedback.  Maybe add parens around the flag
> check for readability:
> 
> 	if ((vmf->flags & FAULT_FLAG_WRITE) && !(vmf->vma->vm_flags & VM_SHARED))

Fixed.

> Aside from that one nit:
> 
> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

Thanks!

								Honza 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches
  2016-10-17 18:59                       ` Ross Zwisler
@ 2016-10-18  9:49                         ` Jan Kara
  -1 siblings, 0 replies; 130+ messages in thread
From: Jan Kara @ 2016-10-18  9:49 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Jan Kara, Dave Chinner, Andrew Morton, Christoph Hellwig,
	linux-mm, linux-fsdevel, linux-nvdimm, Dan Williams,
	Kirill A. Shutemov

On Mon 17-10-16 12:59:55, Ross Zwisler wrote:
> On Mon, Oct 17, 2016 at 10:47:32AM +0200, Jan Kara wrote:
> 
> > This week I plan to rebase both series on top of rc1 + your THP patches so
> > that we can move on with merging the stuff.
> 
> Yea...so how are we going to coordinate merging of these series for the v4.10
> merge window?  My series mostly changes DAX, but it also changes XFS, ext2 and
> ext4.  I think the plan right now is to have Dave Chinner take it through his
> XFS tree.
> 
> Your first series is mostly mm changes with some DAX sprinkled in, and your
> second series touches dax, mm and all 3 DAX filesystems.  
> 
> What is the best way to handle all this?  Have it go through one central tree
> (-MM?), even though the changes touch code that exists outside of that trees
> normal domain (like the FS code)?  Have my series go through the XFS tree and
> yours through -MM, and give Linus a merge resolution patch?  Something else?

For your changes to go through the XFS tree is IMO fine (the changes outside
of XFS & DAX are easy). Let me do the rebase first and then discuss how to
merge my patches after that...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 07/20] mm: Add orig_pte field into vm_fault
  2016-10-17 16:45     ` Ross Zwisler
@ 2016-10-18 10:13       ` Jan Kara
  -1 siblings, 0 replies; 130+ messages in thread
From: Jan Kara @ 2016-10-18 10:13 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Jan Kara, linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Mon 17-10-16 10:45:12, Ross Zwisler wrote:
> On Tue, Sep 27, 2016 at 06:08:11PM +0200, Jan Kara wrote:
> > Add orig_pte field to vm_fault structure to allow ->page_mkwrite
> > handlers to fully handle the fault. This also allows us to save some
> > passing of extra arguments around.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> 
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index f88b2d3810a7..66bc77f2d1d2 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -890,11 +890,12 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
> >  	vmf.pte = pte_offset_map(pmd, address);
> >  	for (; vmf.address < address + HPAGE_PMD_NR*PAGE_SIZE;
> >  			vmf.pte++, vmf.address += PAGE_SIZE) {
> > -		pteval = *vmf.pte;
> > +		vmf.orig_pte = *vmf.pte;
> > +		pteval = vmf.orig_pte;
> >  		if (!is_swap_pte(pteval))
> >  			continue;
> 
> 'pteval' is now only used once.  It's probably cleaner to just remove it and
> use vmf.orig_pte for the is_swap_pte() check.

Yes, fixed.

> > @@ -3484,8 +3484,7 @@ static int handle_pte_fault(struct vm_fault *vmf)
> >  		 * So now it's safe to run pte_offset_map().
> >  		 */
> >  		vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
> > -
> > -		entry = *vmf->pte;
> > +		vmf->orig_pte = *vmf->pte;
> >  
> >  		/*
> >  		 * some architectures can have larger ptes than wordsize,
> > @@ -3496,6 +3495,7 @@ static int handle_pte_fault(struct vm_fault *vmf)
> >  		 * ptl lock held. So here a barrier will do.
> >  		 */
> >  		barrier();
> > +		entry = vmf->orig_pte;
> 
> This assignment to 'entry' is now on the other side of the barrier().  I'll admit
> that I don't fully grok the need for the barrier. Does it apply to only the
> setting of vmf->pte and vmf->orig_pte, or does 'entry' also matter because it
> too is of type pte_t, and thus could be bigger than the architecture's word
> size?
> 
> My guess is that 'entry' matters, too, and should remain before the barrier()
> call.  If not, can you help me understand why?

Sure; actually the comment just above the barrier() explains it: we care
about sampling the *vmf->pte value only once - so we want the value stored in
'entry' (vmf->orig_pte after the patch) to be used, and to avoid compiler
optimizations leading to refetching the value at *vmf->pte. The way I've
written the code achieves this. Actually, I've moved the 'entry' assignment
even further down, where it makes more sense with the new code layout.
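
So the intended pattern, sketched from the hunks above, is:

	vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
	vmf->orig_pte = *vmf->pte;	/* sample the PTE exactly once */
	barrier();			/* no refetch of *vmf->pte past here */
	entry = vmf->orig_pte;		/* reads the sampled copy, not *vmf->pte */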

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 10/20] mm: Move handling of COW faults into DAX code
  2016-10-17 19:29     ` Ross Zwisler
@ 2016-10-18 10:32       ` Jan Kara
  -1 siblings, 0 replies; 130+ messages in thread
From: Jan Kara @ 2016-10-18 10:32 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Jan Kara, linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Mon 17-10-16 13:29:49, Ross Zwisler wrote:
> On Tue, Sep 27, 2016 at 06:08:14PM +0200, Jan Kara wrote:
> > Move final handling of COW faults from generic code into DAX fault
> > handler. That way generic code doesn't have to be aware of peculiarities
> > of DAX locking so remove that knowledge.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  fs/dax.c            | 22 ++++++++++++++++------
> >  include/linux/dax.h |  7 -------
> >  include/linux/mm.h  |  9 +--------
> >  mm/memory.c         | 14 ++++----------
> >  4 files changed, 21 insertions(+), 31 deletions(-)
> > 
> > diff --git a/fs/dax.c b/fs/dax.c
> > index 0dc251ca77b8..b1c503930d1d 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -876,10 +876,15 @@ int dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
> >  			goto unlock_entry;
> >  		if (!radix_tree_exceptional_entry(entry)) {
> >  			vmf->page = entry;
> > -			return VM_FAULT_LOCKED;
> > +			if (unlikely(PageHWPoison(entry))) {
> > +				put_locked_mapping_entry(mapping, vmf->pgoff,
> > +							 entry);
> > +				return VM_FAULT_HWPOISON;
> > +			}
> >  		}
> > -		vmf->entry = entry;
> > -		return VM_FAULT_DAX_LOCKED;
> > +		error = finish_fault(vmf);
> > +		put_locked_mapping_entry(mapping, vmf->pgoff, entry);
> > +		return error ? error : VM_FAULT_DONE_COW;
> >  	}
> >  
> >  	if (!buffer_mapped(&bh)) {
> > @@ -1430,10 +1435,15 @@ int iomap_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
> >  			goto unlock_entry;
> >  		if (!radix_tree_exceptional_entry(entry)) {
> >  			vmf->page = entry;
> 
> In __do_fault() we explicitly clear vmf->page in the case where PageHWPoison()
> is set.  I think we can get the same behavior here by moving the assignment of
> vmf->page to after the PageHWPoison() check.

Actually, the whole HWPoison check was nonsensical for DAX. We check for
HWPoison to avoid reading from poisoned pages. However, for DAX we either
use copy_user_dax(), which takes care of IO errors / poisoning itself, or we
use clear_user_highpage(), which doesn't touch the source page. So we don't
have to check for HWPoison at all. Fixed.
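
Schematically, the COW copy step then reads like this (a sketch only;
copy_user_dax() is internal to this series, so its argument list is elided):

	if (buffer_mapped(&bh))
		/* reads the source via copy_user_dax(), which handles
		 * media errors / poison internally */
		error = copy_user_dax(...);
	else
		/* a hole: no source page is touched at all */
		clear_user_highpage(vmf->cow_page, vmf->address);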

> > -			return VM_FAULT_LOCKED;
> > +			if (unlikely(PageHWPoison(entry))) {
> > +				put_locked_mapping_entry(mapping, vmf->pgoff,
> > +							 entry);
> > +				return VM_FAULT_HWPOISON;
> > +			}
> >  		}
> > -		vmf->entry = entry;
> > -		return VM_FAULT_DAX_LOCKED;
> 
> I think we're missing a call to 
> 
> 	__SetPageUptodate(new_page);

> before finish_fault()?  This call currently lives in do_cow_fault(), and
> is part of the path that we don't skip as part of the VM_FAULT_DAX_LOCKED
> logic.

Ah, great catch. I wonder how the DAX COW test could have passed with this?
Maybe PageUptodate is not used much for anon pages... Anyway, thanks for
spotting this.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 11/20] mm: Remove unnecessary vma->vm_ops check
  2016-10-17 19:40     ` Ross Zwisler
@ 2016-10-18 10:37       ` Jan Kara
  -1 siblings, 0 replies; 130+ messages in thread
From: Jan Kara @ 2016-10-18 10:37 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Jan Kara, linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Mon 17-10-16 13:40:41, Ross Zwisler wrote:
> On Tue, Sep 27, 2016 at 06:08:15PM +0200, Jan Kara wrote:
> > We don't check whether vma->vm_ops is NULL in do_shared_fault() so
> > there's hardly any point in checking it in wp_page_shared() which gets
> > called only for shared file mappings as well.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  mm/memory.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/mm/memory.c b/mm/memory.c
> > index a4522e8999b2..63d9c1a54caf 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -2301,7 +2301,7 @@ static int wp_page_shared(struct vm_fault *vmf, struct page *old_page)
> >  
> >  	get_page(old_page);
> >  
> > -	if (vma->vm_ops && vma->vm_ops->page_mkwrite) {
> > +	if (vma->vm_ops->page_mkwrite) {
> >  		int tmp;
> >  
> >  		pte_unmap_unlock(vmf->pte, vmf->ptl);
> > -- 
> > 2.6.6
> 
> Does this apply equally to the check in wp_pfn_shared()?  Both
> wp_page_shared() and wp_pfn_shared() are called for shared file mappings via
> do_wp_page().

Yes, it does apply there as well. Added to the commit. There are actually
more places with these checks that don't seem necessary, but I didn't want
to do more cleanups than I need... But at least these two logically belong
together.
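
For completeness, the analogous hunk would simply be (sketched in the same
diff style as the one quoted above):

-	if (vma->vm_ops && vma->vm_ops->pfn_mkwrite) {
+	if (vma->vm_ops->pfn_mkwrite) {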

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 12/20] mm: Factor out common parts of write fault handling
  2016-10-17 22:08     ` Ross Zwisler
@ 2016-10-18 10:50     ` Jan Kara
  2016-10-18 17:32         ` Ross Zwisler
  -1 siblings, 1 reply; 130+ messages in thread
From: Jan Kara @ 2016-10-18 10:50 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Jan Kara, linux-mm, linux-fsdevel, linux-nvdimm, Dan Williams,
	Kirill A. Shutemov

On Mon 17-10-16 16:08:51, Ross Zwisler wrote:
> On Tue, Sep 27, 2016 at 06:08:16PM +0200, Jan Kara wrote:
> > Currently we duplicate handling of shared write faults in
> > wp_page_reuse() and do_shared_fault(). Factor them out into a common
> > function.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  mm/memory.c | 78 +++++++++++++++++++++++++++++--------------------------------
> >  1 file changed, 37 insertions(+), 41 deletions(-)
> > 
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 63d9c1a54caf..0643b3b5a12a 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -2063,6 +2063,41 @@ static int do_page_mkwrite(struct vm_area_struct *vma, struct page *page,
> >  }
> >  
> >  /*
> > + * Handle dirtying of a page in shared file mapping on a write fault.
> > + *
> > + * The function expects the page to be locked and unlocks it.
> > + */
> > +static void fault_dirty_shared_page(struct vm_area_struct *vma,
> > +				    struct page *page)
> > +{
> > +	struct address_space *mapping;
> > +	bool dirtied;
> > +	bool page_mkwrite = vma->vm_ops->page_mkwrite;
> 
> I think you may need to pass in a 'page_mkwrite' parameter if you don't want
> to change behavior.  Just checking to see if vma->vm_ops->page_mkwrite is
> non-NULL works fine for this path:
> 
> do_shared_fault()
> 	fault_dirty_shared_page()
> 
> and for
> 
> wp_page_shared()
> 	wp_page_reuse()
> 		fault_dirty_shared_page()
> 
> But for these paths:
> 
> wp_pfn_shared()
> 	wp_page_reuse()
> 		fault_dirty_shared_page()
> 
> and
> 
> do_wp_page()
> 	wp_page_reuse()
> 		fault_dirty_shared_page()
> 
> we unconditionally pass 0 for the 'page_mkwrite' parameter, even though from
> the logic in wp_pfn_shared() especially you can see that
> vma->vm_ops->pfn_mkwrite() must be defined some of the time.

The trick that makes this work is that for fault_dirty_shared_page() to be
called at all, the caller has to set the 'dirty_shared' argument of
wp_page_reuse(), and that does not happen on the wp_pfn_shared() and
do_wp_page() paths. So things work as they should. If you look somewhat
later in the series, the patch "mm: Move part of wp_page_reuse() into the
single call site" cleans this up to make things more obvious.
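
To make the gating concrete, the shape under discussion is roughly the
following (a sketch with simplified signatures, not the exact code of this
series):

static int wp_page_reuse(struct vm_fault *vmf, struct page *page,
			 int page_mkwrite, int dirty_shared)
{
	/* ... reinstall a writable PTE for the existing page ... */
	if (dirty_shared)
		/* Only wp_page_shared() passes dirty_shared != 0. */
		fault_dirty_shared_page(vmf->vma, page);
	return VM_FAULT_WRITE;
}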

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 12/20] mm: Factor out common parts of write fault handling
  2016-10-18 10:50     ` Jan Kara
@ 2016-10-18 17:32         ` Ross Zwisler
  0 siblings, 0 replies; 130+ messages in thread
From: Ross Zwisler @ 2016-10-18 17:32 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Tue, Oct 18, 2016 at 12:50:00PM +0200, Jan Kara wrote:
> On Mon 17-10-16 16:08:51, Ross Zwisler wrote:
> > On Tue, Sep 27, 2016 at 06:08:16PM +0200, Jan Kara wrote:
> > > Currently we duplicate handling of shared write faults in
> > > wp_page_reuse() and do_shared_fault(). Factor them out into a common
> > > function.
> > > 
> > > Signed-off-by: Jan Kara <jack@suse.cz>
> > > ---
> > >  mm/memory.c | 78 +++++++++++++++++++++++++++++--------------------------------
> > >  1 file changed, 37 insertions(+), 41 deletions(-)
> > > 
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index 63d9c1a54caf..0643b3b5a12a 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -2063,6 +2063,41 @@ static int do_page_mkwrite(struct vm_area_struct *vma, struct page *page,
> > >  }
> > >  
> > >  /*
> > > + * Handle dirtying of a page in shared file mapping on a write fault.
> > > + *
> > > + * The function expects the page to be locked and unlocks it.
> > > + */
> > > +static void fault_dirty_shared_page(struct vm_area_struct *vma,
> > > +				    struct page *page)
> > > +{
> > > +	struct address_space *mapping;
> > > +	bool dirtied;
> > > +	bool page_mkwrite = vma->vm_ops->page_mkwrite;
> > 
> > I think you may need to pass in a 'page_mkwrite' parameter if you don't want
> > to change behavior.  Just checking to see if vma->vm_ops->page_mkwrite is
> > non-NULL works fine for this path:
> > 
> > do_shared_fault()
> > 	fault_dirty_shared_page()
> > 
> > and for
> > 
> > wp_page_shared()
> > 	wp_page_reuse()
> > 		fault_dirty_shared_page()
> > 
> > But for these paths:
> > 
> > wp_pfn_shared()
> > 	wp_page_reuse()
> > 		fault_dirty_shared_page()
> > 
> > and
> > 
> > do_wp_page()
> > 	wp_page_reuse()
> > 		fault_dirty_shared_page()
> > 
> > we unconditionally pass 0 for the 'page_mkwrite' parameter, even though from
> > the logic in wp_pfn_shared() especially you can see that
> > vma->vm_ops->pfn_mkwrite() must be defined some of the time.
> 
> The trick which makes this work is that for fault_dirty_shared_page() to be
> called at all, you have to set 'dirty_shared' argument to wp_page_reuse()
> and that does not happen from wp_pfn_shared() and do_wp_page() paths. So
> things work as they should. If you look somewhat later into the series,
> the patch "mm: Move part of wp_page_reuse() into the single call site"
> cleans this up to make things more obvious.
> 
> 								Honza

Ah, cool, that makes sense.

You can add:

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

* Re: [PATCH 14/20] mm: Use vmf->page during WP faults
  2016-09-27 16:08 ` [PATCH 14/20] mm: Use vmf->page during WP faults Jan Kara
@ 2016-10-18 17:56     ` Ross Zwisler
  0 siblings, 0 replies; 130+ messages in thread
From: Ross Zwisler @ 2016-10-18 17:56 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Tue, Sep 27, 2016 at 06:08:18PM +0200, Jan Kara wrote:
> So far we set vmf->page during WP faults only when we needed to pass it
> to the ->page_mkwrite handler. Set it in all the cases now and use that
> instead of passing page pointer explicitely around.
				  explicitly

> Signed-off-by: Jan Kara <jack@suse.cz>

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

* Re: [PATCH 15/20] mm: Move part of wp_page_reuse() into the single call site
  2016-09-27 16:08   ` Jan Kara
@ 2016-10-18 17:59   ` Ross Zwisler
  -1 siblings, 0 replies; 130+ messages in thread
From: Ross Zwisler @ 2016-10-18 17:59 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-mm, linux-fsdevel, linux-nvdimm, Dan Williams,
	Ross Zwisler, Kirill A. Shutemov

On Tue, Sep 27, 2016 at 06:08:19PM +0200, Jan Kara wrote:
> wp_page_reuse() handles write shared faults which is needed only in
> wp_page_shared(). Move the handling only into that location to make
> wp_page_reuse() simpler and avoid a strange situation when we sometimes
> pass in locked page, sometimes unlocked etc.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>


* Re: [PATCH 16/20] mm: Provide helper for finishing mkwrite faults
  2016-09-27 16:08   ` Jan Kara
@ 2016-10-18 18:35     ` Ross Zwisler
  -1 siblings, 0 replies; 130+ messages in thread
From: Ross Zwisler @ 2016-10-18 18:35 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Tue, Sep 27, 2016 at 06:08:20PM +0200, Jan Kara wrote:
> Provide a helper function for finishing write faults due to PTE being
> read-only. The helper will be used by DAX to avoid the need of
> complicating generic MM code with DAX locking specifics.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  include/linux/mm.h |  1 +
>  mm/memory.c        | 65 +++++++++++++++++++++++++++++++-----------------------
>  2 files changed, 39 insertions(+), 27 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 1055f2ece80d..e5a014be8932 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -617,6 +617,7 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
>  int alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
>  		struct page *page);
>  int finish_fault(struct vm_fault *vmf);
> +int finish_mkwrite_fault(struct vm_fault *vmf);
>  #endif
>  
>  /*
> diff --git a/mm/memory.c b/mm/memory.c
> index f49e736d6a36..8c8cb7f2133e 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2266,6 +2266,36 @@ oom:
>  	return VM_FAULT_OOM;
>  }
>  
> +/**
> + * finish_mkrite_fault - finish page fault making PTE writeable once the page
      finish_mkwrite_fault

> @@ -2315,26 +2335,17 @@ static int wp_page_shared(struct vm_fault *vmf)
>  			put_page(vmf->page);
>  			return tmp;
>  		}
> -		/*
> -		 * Since we dropped the lock we need to revalidate
> -		 * the PTE as someone else may have changed it.  If
> -		 * they did, we just return, as we can count on the
> -		 * MMU to tell us if they didn't also make it writable.
> -		 */
> -		vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
> -						vmf->address, &vmf->ptl);
> -		if (!pte_same(*vmf->pte, vmf->orig_pte)) {
> +		tmp = finish_mkwrite_fault(vmf);
> +		if (unlikely(!tmp || (tmp &
> +				      (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {

The 'tmp' return from finish_mkwrite_fault() can only be 0 or VM_FAULT_WRITE.
I think this test should just be 

		if (unlikely(!tmp)) {
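
For context, in this version of the series the helper boils down to the
following shape (paraphrased, not quoted from the patch):

int finish_mkwrite_fault(struct vm_fault *vmf)
{
	/*
	 * Re-take the PTE lock dropped for ->page_mkwrite and revalidate;
	 * a changed PTE means someone else handled the fault meanwhile.
	 */
	vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd,
				       vmf->address, &vmf->ptl);
	if (!pte_same(*vmf->pte, vmf->orig_pte)) {
		pte_unmap_unlock(vmf->pte, vmf->ptl);
		return 0;	/* caller must bail out and retry */
	}
	wp_page_reuse(vmf);
	return VM_FAULT_WRITE;
}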

With that and the small spelling fix:

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

* Re: [PATCH 17/20] mm: Export follow_pte()
  2016-09-27 16:08 ` [PATCH 17/20] mm: Export follow_pte() Jan Kara
@ 2016-10-18 18:37     ` Ross Zwisler
  0 siblings, 0 replies; 130+ messages in thread
From: Ross Zwisler @ 2016-10-18 18:37 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Tue, Sep 27, 2016 at 06:08:21PM +0200, Jan Kara wrote:
> DAX will need to implement its own version of page_check_address(). To
> avoid duplicating page table walking code, export follow_pte() which
> does what we need.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

* Re: [PATCH 18/20] dax: Make cache flushing protected by entry lock
  2016-09-27 16:08 ` [PATCH 18/20] dax: Make cache flushing protected by entry lock Jan Kara
@ 2016-10-18 19:20   ` Ross Zwisler
  2016-10-19  7:19       ` Jan Kara
  2016-10-19 18:25       ` Ross Zwisler
  0 siblings, 2 replies; 130+ messages in thread
From: Ross Zwisler @ 2016-10-18 19:20 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-mm, linux-fsdevel, linux-nvdimm, Dan Williams,
	Ross Zwisler, Kirill A. Shutemov

On Tue, Sep 27, 2016 at 06:08:22PM +0200, Jan Kara wrote:
> Currently, flushing of caches for DAX mappings was ignoring entry lock.
> So far this was ok (modulo a bug that a difference in entry lock could
> cause cache flushing to be mistakenly skipped) but in the following
> patches we will write-protect PTEs on cache flushing and clear dirty
> tags. For that we will need more exclusion. So do cache flushing under
> an entry lock. This allows us to remove one lock-unlock pair of
> mapping->tree_lock as a bonus.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

> @@ -716,15 +736,13 @@ static int dax_writeback_one(struct block_device *bdev,
>  	}
>  
>  	wb_cache_pmem(dax.addr, dax.size);
> -
> -	spin_lock_irq(&mapping->tree_lock);
> -	radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_TOWRITE);
> -	spin_unlock_irq(&mapping->tree_lock);
> - unmap:
> +unmap:
>  	dax_unmap_atomic(bdev, &dax);
> +	put_locked_mapping_entry(mapping, index, entry);
>  	return ret;
>  
> - unlock:
> +put_unlock:

I know there's an ongoing debate about this, but can you please stick a space
in front of the labels to make the patches pretty & to be consistent with the
rest of the DAX code?

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>


* Re: [PATCH 19/20] dax: Protect PTE modification on WP fault by radix tree entry lock
  2016-09-27 16:08   ` Jan Kara
@ 2016-10-18 19:53     ` Ross Zwisler
  -1 siblings, 0 replies; 130+ messages in thread
From: Ross Zwisler @ 2016-10-18 19:53 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Tue, Sep 27, 2016 at 06:08:23PM +0200, Jan Kara wrote:
> Currently PTE gets updated in wp_pfn_shared() after dax_pfn_mkwrite()
> has released corresponding radix tree entry lock. When we want to
> writeprotect PTE on cache flush, we need PTE modification to happen
> under radix tree entry lock to ensure consisten updates of PTE and radix
					consistent

> tree (standard faults use page lock to ensure this consistency). So move
> update of PTE bit into dax_pfn_mkwrite().
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/dax.c    | 22 ++++++++++++++++------
>  mm/memory.c |  2 +-
>  2 files changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index c6cadf8413a3..a2d3781c9f4e 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1163,17 +1163,27 @@ int dax_pfn_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
>  {
>  	struct file *file = vma->vm_file;
>  	struct address_space *mapping = file->f_mapping;
> -	void *entry;
> +	void *entry, **slot;
>  	pgoff_t index = vmf->pgoff;
>  
>  	spin_lock_irq(&mapping->tree_lock);
> -	entry = get_unlocked_mapping_entry(mapping, index, NULL);
> -	if (!entry || !radix_tree_exceptional_entry(entry))
> -		goto out;
> +	entry = get_unlocked_mapping_entry(mapping, index, &slot);
> +	if (!entry || !radix_tree_exceptional_entry(entry)) {
> +		if (entry)
> +			put_unlocked_mapping_entry(mapping, index, entry);

I don't think you need this call to put_unlocked_mapping_entry().  If we get
in here we know that 'entry' is a page cache page, in which case
put_unlocked_mapping_entry() will just return without doing any work.
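
The no-op case referred to here: put_unlocked_mapping_entry() begins with
an early return for non-exceptional entries, roughly as below (paraphrased;
the exact body and argument list in fs/dax.c may differ):

static void put_unlocked_mapping_entry(struct address_space *mapping,
				       pgoff_t index, void *entry)
{
	if (!radix_tree_exceptional_entry(entry))
		return;		/* page cache page: nothing to drop */

	/* Otherwise wake the next waiter queued on the entry lock. */
	dax_wake_mapping_entry_waiter(mapping, index, false);
}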

With that nit & the spelling error above:

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

* Re: [PATCH 20/20] dax: Clear dirty entry tags on cache flush
  2016-09-27 16:08   ` Jan Kara
@ 2016-10-18 22:12     ` Ross Zwisler
  -1 siblings, 0 replies; 130+ messages in thread
From: Ross Zwisler @ 2016-10-18 22:12 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Tue, Sep 27, 2016 at 06:08:24PM +0200, Jan Kara wrote:
> Currently we never clear dirty tags in DAX mappings and thus address
> ranges to flush accumulate. Now that we have locking of radix tree
> entries, we have all the locking necessary to reliably clear the radix
> tree dirty tag when flushing caches for corresponding address range.
> Similarly to page_mkclean() we also have to write-protect pages to get a
> page fault when the page is next written to so that we can mark the
> entry dirty again.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
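
The write-protect step described above amounts to a page_mkclean()-style
walk over all VMAs mapping the index; schematically (a sketch assuming the
follow_pte() export from patch 17; address_of() stands in for the pgoff to
virtual address arithmetic):

	pte_t *ptep, pte;
	spinlock_t *ptl;

	i_mmap_lock_read(mapping);
	vma_interval_tree_foreach(vma, &mapping->i_mmap, index, index) {
		unsigned long address = address_of(vma, index);

		if (follow_pte(vma->vm_mm, address, &ptep, &ptl))
			continue;
		if (pte_dirty(*ptep) || pte_write(*ptep)) {
			flush_cache_page(vma, address, pte_pfn(*ptep));
			pte = ptep_clear_flush(vma, address, ptep);
			pte = pte_wrprotect(pte);
			pte = pte_mkclean(pte);
			set_pte_at(vma->vm_mm, address, ptep, pte);
		}
		pte_unmap_unlock(ptep, ptl);
	}
	i_mmap_unlock_read(mapping);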

Looks great. 

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

* Re: [PATCH 16/20] mm: Provide helper for finishing mkwrite faults
  2016-10-18 18:35     ` Ross Zwisler
@ 2016-10-19  7:16       ` Jan Kara
  -1 siblings, 0 replies; 130+ messages in thread
From: Jan Kara @ 2016-10-19  7:16 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Jan Kara, linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Tue 18-10-16 12:35:25, Ross Zwisler wrote:
> On Tue, Sep 27, 2016 at 06:08:20PM +0200, Jan Kara wrote:
> > Provide a helper function for finishing write faults due to PTE being
> > read-only. The helper will be used by DAX to avoid the need of
> > complicating generic MM code with DAX locking specifics.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  include/linux/mm.h |  1 +
> >  mm/memory.c        | 65 +++++++++++++++++++++++++++++++-----------------------
> >  2 files changed, 39 insertions(+), 27 deletions(-)
> > 
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 1055f2ece80d..e5a014be8932 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -617,6 +617,7 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
> >  int alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
> >  		struct page *page);
> >  int finish_fault(struct vm_fault *vmf);
> > +int finish_mkwrite_fault(struct vm_fault *vmf);
> >  #endif
> >  
> >  /*
> > diff --git a/mm/memory.c b/mm/memory.c
> > index f49e736d6a36..8c8cb7f2133e 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -2266,6 +2266,36 @@ oom:
> >  	return VM_FAULT_OOM;
> >  }
> >  
> > +/**
> > + * finish_mkrite_fault - finish page fault making PTE writeable once the page
>       finish_mkwrite_fault

Fixed, thanks.

> > @@ -2315,26 +2335,17 @@ static int wp_page_shared(struct vm_fault *vmf)
> >  			put_page(vmf->page);
> >  			return tmp;
> >  		}
> > -		/*
> > -		 * Since we dropped the lock we need to revalidate
> > -		 * the PTE as someone else may have changed it.  If
> > -		 * they did, we just return, as we can count on the
> > -		 * MMU to tell us if they didn't also make it writable.
> > -		 */
> > -		vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
> > -						vmf->address, &vmf->ptl);
> > -		if (!pte_same(*vmf->pte, vmf->orig_pte)) {
> > +		tmp = finish_mkwrite_fault(vmf);
> > +		if (unlikely(!tmp || (tmp &
> > +				      (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
> 
> The 'tmp' return from finish_mkwrite_fault() can only be 0 or VM_FAULT_WRITE.
> I think this test should just be 
> 
> 		if (unlikely(!tmp)) {

Right, finish_mkwrite_fault() currently cannot return errors other than
"retry needed", which is indicated by tmp == 0. However, I'd prefer to keep
symmetry with the finish_fault() handler, which can return other errors, and
have the caller prepared to handle them from finish_mkwrite_fault() as well.
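
Concretely, the caller keeps the defensive shape from the patch (a sketch;
the unlock/put cleanup on the error path is assumed, following the usual
wp_page_shared() pattern):

		tmp = finish_mkwrite_fault(vmf);
		/*
		 * tmp is 0 ("PTE changed, retry") or VM_FAULT_WRITE today;
		 * the mask also tolerates error bits in case the helper
		 * grows them, as finish_fault() already has.
		 */
		if (unlikely(!tmp || (tmp &
				      (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
			unlock_page(vmf->page);
			put_page(vmf->page);
			return tmp;
		}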

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH 18/20] dax: Make cache flushing protected by entry lock
  2016-10-18 19:20   ` Ross Zwisler
@ 2016-10-19  7:19       ` Jan Kara
  2016-10-19 18:25       ` Ross Zwisler
  1 sibling, 0 replies; 130+ messages in thread
From: Jan Kara @ 2016-10-19  7:19 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Jan Kara, linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Tue 18-10-16 13:20:13, Ross Zwisler wrote:
> On Tue, Sep 27, 2016 at 06:08:22PM +0200, Jan Kara wrote:
> > Currently, flushing of caches for DAX mappings was ignoring entry lock.
> > So far this was ok (modulo a bug that a difference in entry lock could
> > cause cache flushing to be mistakenly skipped) but in the following
> > patches we will write-protect PTEs on cache flushing and clear dirty
> > tags. For that we will need more exclusion. So do cache flushing under
> > an entry lock. This allows us to remove one lock-unlock pair of
> > mapping->tree_lock as a bonus.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> > @@ -716,15 +736,13 @@ static int dax_writeback_one(struct block_device *bdev,
> >  	}
> >  
> >  	wb_cache_pmem(dax.addr, dax.size);
> > -
> > -	spin_lock_irq(&mapping->tree_lock);
> > -	radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_TOWRITE);
> > -	spin_unlock_irq(&mapping->tree_lock);
> > - unmap:
> > +unmap:
> >  	dax_unmap_atomic(bdev, &dax);
> > +	put_locked_mapping_entry(mapping, index, entry);
> >  	return ret;
> >  
> > - unlock:
> > +put_unlock:
> 
> I know there's an ongoing debate about this, but can you please stick a space
> in front of the labels to make the patches pretty & to be consistent with the
> rest of the DAX code?

OK, done.

> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

Thanks!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH 19/20] dax: Protect PTE modification on WP fault by radix tree entry lock
  2016-10-18 19:53     ` Ross Zwisler
@ 2016-10-19  7:25       ` Jan Kara
  -1 siblings, 0 replies; 130+ messages in thread
From: Jan Kara @ 2016-10-19  7:25 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Jan Kara, linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Tue 18-10-16 13:53:32, Ross Zwisler wrote:
> On Tue, Sep 27, 2016 at 06:08:23PM +0200, Jan Kara wrote:
> > -	void *entry;
> > +	void *entry, **slot;
> >  	pgoff_t index = vmf->pgoff;
> >  
> >  	spin_lock_irq(&mapping->tree_lock);
> > -	entry = get_unlocked_mapping_entry(mapping, index, NULL);
> > -	if (!entry || !radix_tree_exceptional_entry(entry))
> > -		goto out;
> > +	entry = get_unlocked_mapping_entry(mapping, index, &slot);
> > +	if (!entry || !radix_tree_exceptional_entry(entry)) {
> > +		if (entry)
> > +			put_unlocked_mapping_entry(mapping, index, entry);
> 
> I don't think you need this call to put_unlocked_mapping_entry().  If we get
> in here we know that 'entry' is a page cache page, in which case
> put_unlocked_mapping_entry() will just return without doing any work.

Right, but that is just an implementation detail internal to how the
locking works. The rules are kept simple to avoid issues, and the invariant
is: once you call get_unlocked_mapping_entry(), you either have to lock the
entry and then call put_locked_mapping_entry(), or you have to drop it with
put_unlocked_mapping_entry(). Once you add arguments about entry types
etc., errors become much easier to make...
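
Spelled out as the usage pattern callers follow (a sketch using the
fs/dax.c names from this series; error handling trimmed):

	spin_lock_irq(&mapping->tree_lock);
	entry = get_unlocked_mapping_entry(mapping, index, &slot);
	if (!entry || !radix_tree_exceptional_entry(entry)) {
		/* Not going to lock it: drop with the "unlocked" variant. */
		if (entry)
			put_unlocked_mapping_entry(mapping, index, entry);
		spin_unlock_irq(&mapping->tree_lock);
		return;
	}
	/* Lock the entry; from here the "locked" variant releases it. */
	entry = lock_slot(mapping, slot);
	spin_unlock_irq(&mapping->tree_lock);

	/* ... operate on the locked entry ... */

	put_locked_mapping_entry(mapping, index, entry);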

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH 20/20] dax: Clear dirty entry tags on cache flush
  2016-10-18 22:12     ` Ross Zwisler
@ 2016-10-19  7:30       ` Jan Kara
  -1 siblings, 0 replies; 130+ messages in thread
From: Jan Kara @ 2016-10-19  7:30 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Jan Kara, linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Tue 18-10-16 16:12:54, Ross Zwisler wrote:
> On Tue, Sep 27, 2016 at 06:08:24PM +0200, Jan Kara wrote:
> > Currently we never clear dirty tags in DAX mappings and thus address
> > ranges to flush accumulate. Now that we have locking of radix tree
> > entries, we have all the locking necessary to reliably clear the radix
> > tree dirty tag when flushing caches for corresponding address range.
> > Similarly to page_mkclean() we also have to write-protect pages to get a
> > page fault when the page is next written to so that we can mark the
> > entry dirty again.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> Looks great. 
> 
> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

Thanks for the review, Ross! I've rebased the series on top of rc1. Do you
have your PMD series somewhere rebased on top of rc1 so that I can rebase my
patches on top of that as well? Then I'd post another version of the
series...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH 20/20] dax: Clear dirty entry tags on cache flush
  2016-10-19  7:30       ` Jan Kara
@ 2016-10-19 16:38         ` Ross Zwisler
  -1 siblings, 0 replies; 130+ messages in thread
From: Ross Zwisler @ 2016-10-19 16:38 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Wed, Oct 19, 2016 at 09:30:16AM +0200, Jan Kara wrote:
> On Tue 18-10-16 16:12:54, Ross Zwisler wrote:
> > On Tue, Sep 27, 2016 at 06:08:24PM +0200, Jan Kara wrote:
> > > Currently we never clear dirty tags in DAX mappings and thus address
> > > ranges to flush accumulate. Now that we have locking of radix tree
> > > entries, we have all the locking necessary to reliably clear the radix
> > > tree dirty tag when flushing caches for corresponding address range.
> > > Similarly to page_mkclean() we also have to write-protect pages to get a
> > > page fault when the page is next written to so that we can mark the
> > > entry dirty again.
> > > 
> > > Signed-off-by: Jan Kara <jack@suse.cz>
> > 
> > Looks great. 
> > 
> > Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> 
> Thanks for review Ross! I've rebased the series on top of rc1. Do you have
> your PMD series somewhere rebased on top of rc1 so that I can rebase my
> patches on top of that as well? Then I'd post another version of the
> series...

Sure, I'll rebase & post a new version of my series today.

* Re: [PATCH 16/20] mm: Provide helper for finishing mkwrite faults
  2016-10-19  7:16       ` Jan Kara
@ 2016-10-19 17:21         ` Ross Zwisler
  -1 siblings, 0 replies; 130+ messages in thread
From: Ross Zwisler @ 2016-10-19 17:21 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Wed, Oct 19, 2016 at 09:16:00AM +0200, Jan Kara wrote:
> On Tue 18-10-16 12:35:25, Ross Zwisler wrote:
> > On Tue, Sep 27, 2016 at 06:08:20PM +0200, Jan Kara wrote:
> > > Provide a helper function for finishing write faults due to PTE being
> > > read-only. The helper will be used by DAX to avoid the need of
> > > complicating generic MM code with DAX locking specifics.
> > > 
> > > Signed-off-by: Jan Kara <jack@suse.cz>
> > > ---
> > >  include/linux/mm.h |  1 +
> > >  mm/memory.c        | 65 +++++++++++++++++++++++++++++++-----------------------
> > >  2 files changed, 39 insertions(+), 27 deletions(-)
> > > 
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index 1055f2ece80d..e5a014be8932 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -617,6 +617,7 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
> > >  int alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
> > >  		struct page *page);
> > >  int finish_fault(struct vm_fault *vmf);
> > > +int finish_mkwrite_fault(struct vm_fault *vmf);
> > >  #endif
> > >  
> > >  /*
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index f49e736d6a36..8c8cb7f2133e 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -2266,6 +2266,36 @@ oom:
> > >  	return VM_FAULT_OOM;
> > >  }
> > >  
> > > +/**
> > > + * finish_mkrite_fault - finish page fault making PTE writeable once the page
> >       finish_mkwrite_fault
> 
> Fixed, thanks.
> 
> > > @@ -2315,26 +2335,17 @@ static int wp_page_shared(struct vm_fault *vmf)
> > >  			put_page(vmf->page);
> > >  			return tmp;
> > >  		}
> > > -		/*
> > > -		 * Since we dropped the lock we need to revalidate
> > > -		 * the PTE as someone else may have changed it.  If
> > > -		 * they did, we just return, as we can count on the
> > > -		 * MMU to tell us if they didn't also make it writable.
> > > -		 */
> > > -		vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
> > > -						vmf->address, &vmf->ptl);
> > > -		if (!pte_same(*vmf->pte, vmf->orig_pte)) {
> > > +		tmp = finish_mkwrite_fault(vmf);
> > > +		if (unlikely(!tmp || (tmp &
> > > +				      (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
> > 
> > The 'tmp' return from finish_mkwrite_fault() can only be 0 or VM_FAULT_WRITE.
> > I think this test should just be 
> > 
> > 		if (unlikely(!tmp)) {
> 
> Right, finish_mkwrite_fault() cannot currently throw other errors than
> "retry needed" which is indicated by tmp == 0. However I'd prefer to keep
> symmetry with finish_fault() handler which can throw other errors and
> better be prepared to handle them from finish_mkwrite_fault() as well.

Fair enough.  You can add:

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

* Re: [PATCH 19/20] dax: Protect PTE modification on WP fault by radix tree entry lock
  2016-10-19  7:25       ` Jan Kara
@ 2016-10-19 17:25         ` Ross Zwisler
  -1 siblings, 0 replies; 130+ messages in thread
From: Ross Zwisler @ 2016-10-19 17:25 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Wed, Oct 19, 2016 at 09:25:05AM +0200, Jan Kara wrote:
> On Tue 18-10-16 13:53:32, Ross Zwisler wrote:
> > On Tue, Sep 27, 2016 at 06:08:23PM +0200, Jan Kara wrote:
> > > -	void *entry;
> > > +	void *entry, **slot;
> > >  	pgoff_t index = vmf->pgoff;
> > >  
> > >  	spin_lock_irq(&mapping->tree_lock);
> > > -	entry = get_unlocked_mapping_entry(mapping, index, NULL);
> > > -	if (!entry || !radix_tree_exceptional_entry(entry))
> > > -		goto out;
> > > +	entry = get_unlocked_mapping_entry(mapping, index, &slot);
> > > +	if (!entry || !radix_tree_exceptional_entry(entry)) {
> > > +		if (entry)
> > > +			put_unlocked_mapping_entry(mapping, index, entry);
> > 
> > I don't think you need this call to put_unlocked_mapping_entry().  If we get
> > in here we know that 'entry' is a page cache page, in which case
> > put_unlocked_mapping_entry() will just return without doing any work.
> 
> Right, but that is just an implementation detail internal to how the
> locking works. The rules are simple to avoid issues and thus the invariant
> is: Once you call get_unlocked_mapping_entry() you either have to lock the
> entry and then call put_locked_mapping_entry() or you have to drop it with
> put_unlocked_mapping_entry(). Once you add arguments about entry types
> etc., errors are much easier to make...

Makes sense.  You can add:

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

* Re: [PATCH 18/20] dax: Make cache flushing protected by entry lock
  2016-10-18 19:20   ` Ross Zwisler
@ 2016-10-19 18:25       ` Ross Zwisler
  0 siblings, 0 replies; 130+ messages in thread
From: Ross Zwisler @ 2016-10-19 18:25 UTC (permalink / raw)
  To: Ross Zwisler, Jan Kara, linux-mm, linux-fsdevel, linux-nvdimm,
	Dan Williams, Kirill A. Shutemov, Jean Delvare

On Tue, Oct 18, 2016 at 01:20:13PM -0600, Ross Zwisler wrote:
> On Tue, Sep 27, 2016 at 06:08:22PM +0200, Jan Kara wrote:
> > Currently, flushing of caches for DAX mappings ignores the entry lock.
> > So far this was OK (modulo a bug where a difference in the entry lock
> > could cause cache flushing to be mistakenly skipped) but in the following
> > patches we will write-protect PTEs on cache flushing and clear dirty
> > tags. For that we will need more exclusion, so do the cache flushing
> > under an entry lock. This also allows us to remove one lock-unlock pair
> > of mapping->tree_lock as a bonus.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> > @@ -716,15 +736,13 @@ static int dax_writeback_one(struct block_device *bdev,
> >  	}
> >  
> >  	wb_cache_pmem(dax.addr, dax.size);
> > -
> > -	spin_lock_irq(&mapping->tree_lock);
> > -	radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_TOWRITE);
> > -	spin_unlock_irq(&mapping->tree_lock);
> > - unmap:
> > +unmap:
> >  	dax_unmap_atomic(bdev, &dax);
> > +	put_locked_mapping_entry(mapping, index, entry);
> >  	return ret;
> >  
> > - unlock:
> > +put_unlock:
> 
> I know there's an ongoing debate about this, but can you please stick a space
> in front of the labels to make the patches pretty & to be consistent with the
> rest of the DAX code?

Never mind, it looks like the need for spaces before labels has been addressed by this commit:

commit 218dd85887da (".gitattributes: set git diff driver for C source code
files")

With this, my git at least generates diffs that don't use labels in the hunk
headers, even without the leading spaces.
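
For reference, that commit switches C source files over to git's built-in
"cpp" diff driver via .gitattributes entries along the lines of (paraphrased
from the commit, so treat the exact lines as a sketch):

	*.c   diff=cpp
	*.h   diff=cpp

The cpp driver's hunk-header rules explicitly skip jump targets, which is why
labels no longer show up in the "@@" lines.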

Feel free to leave out the spaces, and we can remove the stragglers from
fs/dax.c at some point.

Thanks, Jean, for fixing this!

* Re: [PATCH 16/20] mm: Provide helper for finishing mkwrite faults
  2016-10-19 17:21         ` Ross Zwisler
@ 2016-10-20  8:48           ` Jan Kara
  0 siblings, 0 replies; 130+ messages in thread
From: Jan Kara @ 2016-10-20  8:48 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Jan Kara, linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov

On Wed 19-10-16 11:21:52, Ross Zwisler wrote:
> On Wed, Oct 19, 2016 at 09:16:00AM +0200, Jan Kara wrote:
> > > > @@ -2315,26 +2335,17 @@ static int wp_page_shared(struct vm_fault *vmf)
> > > >  			put_page(vmf->page);
> > > >  			return tmp;
> > > >  		}
> > > > -		/*
> > > > -		 * Since we dropped the lock we need to revalidate
> > > > -		 * the PTE as someone else may have changed it.  If
> > > > -		 * they did, we just return, as we can count on the
> > > > -		 * MMU to tell us if they didn't also make it writable.
> > > > -		 */
> > > > -		vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
> > > > -						vmf->address, &vmf->ptl);
> > > > -		if (!pte_same(*vmf->pte, vmf->orig_pte)) {
> > > > +		tmp = finish_mkwrite_fault(vmf);
> > > > +		if (unlikely(!tmp || (tmp &
> > > > +				      (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
> > > 
> > > The 'tmp' return from finish_mkwrite_fault() can only be 0 or VM_FAULT_WRITE.
> > > I think this test should just be 
> > > 
> > > 		if (unlikely(!tmp)) {
> > 
> > Right, finish_mkwrite_fault() currently cannot return any error other
> > than "retry needed", which is indicated by tmp == 0. However, I'd prefer
> > to keep symmetry with the finish_fault() handler, which can return other
> > errors, and to be prepared to handle them from finish_mkwrite_fault() as
> > well.
> 
> Fair enough.  You can add:
> 
> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

Thanks. Actually, your question made me take a harder look at the return
values of finish_mkwrite_fault(), and I've added one more commit switching
them so that finish_mkwrite_fault() returns 0 on success and VM_FAULT_NOPAGE
if the PTE changed. That is less confusing and even more consistent with
what finish_fault() returns.
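
With that convention, the wp_page_shared() caller discussed above can treat
the helper just like finish_fault() - roughly as follows (an illustrative
sketch of the revised return handling, not the final committed code):

	tmp = finish_mkwrite_fault(vmf);
	if (unlikely(tmp & (VM_FAULT_ERROR | VM_FAULT_NOPAGE))) {
		/* The PTE changed under us, or a (future) error occurred. */
		unlock_page(vmf->page);
		put_page(vmf->page);
		return tmp;
	}
	/* A return of 0 now means success: the PTE was made writable. */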

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 10/20] mm: Move handling of COW faults into DAX code
  2016-11-18  9:17   ` Jan Kara
@ 2016-11-21  4:39     ` Ross Zwisler
  0 siblings, 0 replies; 130+ messages in thread
From: Ross Zwisler @ 2016-11-21  4:39 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-nvdimm, linux-mm, linux-fsdevel, Kirill A. Shutemov, Andrew Morton

On Fri, Nov 18, 2016 at 10:17:14AM +0100, Jan Kara wrote:
> Move the final handling of COW faults from the generic code into the DAX
> fault handler. That way the generic code doesn't have to be aware of the
> peculiarities of DAX locking, so remove that knowledge and make the
> locking functions private to fs/dax.c.
> 
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: Jan Kara <jack@suse.cz>

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>


* [PATCH 10/20] mm: Move handling of COW faults into DAX code
  2016-11-18  9:17 [PATCH 0/20 v5] " Jan Kara
@ 2016-11-18  9:17   ` Jan Kara
  0 siblings, 0 replies; 130+ messages in thread
From: Jan Kara @ 2016-11-18  9:17 UTC (permalink / raw)
  To: linux-mm
  Cc: Kirill A. Shutemov, Ross Zwisler, Andrew Morton, linux-fsdevel,
	linux-nvdimm, Jan Kara

Move the final handling of COW faults from the generic code into the DAX
fault handler. That way the generic code doesn't have to be aware of the
peculiarities of DAX locking, so remove that knowledge and make the
locking functions private to fs/dax.c.

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/dax.c            | 61 +++++++++++++++++++++++++----------------------------
 include/linux/dax.h |  7 ------
 include/linux/mm.h  |  9 +-------
 mm/memory.c         | 13 ++++--------
 4 files changed, 34 insertions(+), 56 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 56e05af1d2bf..9be1464d1a7e 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -240,6 +240,23 @@ static void *get_unlocked_mapping_entry(struct address_space *mapping,
 	}
 }
 
+static void dax_unlock_mapping_entry(struct address_space *mapping,
+				     pgoff_t index)
+{
+	void *entry, **slot;
+
+	spin_lock_irq(&mapping->tree_lock);
+	entry = __radix_tree_lookup(&mapping->page_tree, index, NULL, &slot);
+	if (WARN_ON_ONCE(!entry || !radix_tree_exceptional_entry(entry) ||
+			 !slot_locked(mapping, slot))) {
+		spin_unlock_irq(&mapping->tree_lock);
+		return;
+	}
+	unlock_slot(mapping, slot);
+	spin_unlock_irq(&mapping->tree_lock);
+	dax_wake_mapping_entry_waiter(mapping, index, entry, false);
+}
+
 static void put_locked_mapping_entry(struct address_space *mapping,
 				     pgoff_t index, void *entry)
 {
@@ -434,22 +451,6 @@ void dax_wake_mapping_entry_waiter(struct address_space *mapping,
 		__wake_up(wq, TASK_NORMAL, wake_all ? 0 : 1, &key);
 }
 
-void dax_unlock_mapping_entry(struct address_space *mapping, pgoff_t index)
-{
-	void *entry, **slot;
-
-	spin_lock_irq(&mapping->tree_lock);
-	entry = __radix_tree_lookup(&mapping->page_tree, index, NULL, &slot);
-	if (WARN_ON_ONCE(!entry || !radix_tree_exceptional_entry(entry) ||
-			 !slot_locked(mapping, slot))) {
-		spin_unlock_irq(&mapping->tree_lock);
-		return;
-	}
-	unlock_slot(mapping, slot);
-	spin_unlock_irq(&mapping->tree_lock);
-	dax_wake_mapping_entry_waiter(mapping, index, entry, false);
-}
-
 /*
  * Delete exceptional DAX entry at @index from @mapping. Wait for radix tree
  * entry to get unlocked before deleting it.
@@ -501,10 +502,8 @@ static int dax_load_hole(struct address_space *mapping, void *entry,
 	/* This will replace locked radix tree entry with a hole page */
 	page = find_or_create_page(mapping, vmf->pgoff,
 				   vmf->gfp_mask | __GFP_ZERO);
-	if (!page) {
-		put_locked_mapping_entry(mapping, vmf->pgoff, entry);
+	if (!page)
 		return VM_FAULT_OOM;
-	}
 	vmf->page = page;
 	return VM_FAULT_LOCKED;
 }
@@ -953,7 +952,7 @@ int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 	struct iomap iomap = { 0 };
 	unsigned flags = IOMAP_FAULT;
 	int error, major = 0;
-	int locked_status = 0;
+	int vmf_ret = 0;
 	void *entry;
 
 	/*
@@ -1006,13 +1005,11 @@ int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 
 		if (error)
 			goto finish_iomap;
-		if (!radix_tree_exceptional_entry(entry)) {
-			vmf->page = entry;
-			locked_status = VM_FAULT_LOCKED;
-		} else {
-			vmf->entry = entry;
-			locked_status = VM_FAULT_DAX_LOCKED;
-		}
+
+		__SetPageUptodate(vmf->cow_page);
+		vmf_ret = finish_fault(vmf);
+		if (!vmf_ret)
+			vmf_ret = VM_FAULT_DONE_COW;
 		goto finish_iomap;
 	}
 
@@ -1029,7 +1026,7 @@ int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 	case IOMAP_UNWRITTEN:
 	case IOMAP_HOLE:
 		if (!(vmf->flags & FAULT_FLAG_WRITE)) {
-			locked_status = dax_load_hole(mapping, entry, vmf);
+			vmf_ret = dax_load_hole(mapping, entry, vmf);
 			break;
 		}
 		/*FALLTHRU*/
@@ -1041,7 +1038,7 @@ int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 
  finish_iomap:
 	if (ops->iomap_end) {
-		if (error) {
+		if (error || (vmf_ret & VM_FAULT_ERROR)) {
 			/* keep previous error */
 			ops->iomap_end(inode, pos, PAGE_SIZE, 0, flags,
 					&iomap);
@@ -1051,7 +1048,7 @@ int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 		}
 	}
  unlock_entry:
-	if (!locked_status || error)
+	if (vmf_ret != VM_FAULT_LOCKED || error)
 		put_locked_mapping_entry(mapping, vmf->pgoff, entry);
  out:
 	if (error == -ENOMEM)
@@ -1059,9 +1056,9 @@ int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 	/* -EBUSY is fine, somebody else faulted on the same PTE */
 	if (error < 0 && error != -EBUSY)
 		return VM_FAULT_SIGBUS | major;
-	if (locked_status) {
+	if (vmf_ret) {
 		WARN_ON_ONCE(error); /* -EBUSY from ops->iomap_end? */
-		return locked_status;
+		return vmf_ret;
 	}
 	return VM_FAULT_NOPAGE | major;
 }
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 0afade8bd3d7..f97bcfe79472 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -46,7 +46,6 @@ void dax_wake_mapping_entry_waiter(struct address_space *mapping,
 
 #ifdef CONFIG_FS_DAX
 struct page *read_dax_sector(struct block_device *bdev, sector_t n);
-void dax_unlock_mapping_entry(struct address_space *mapping, pgoff_t index);
 int __dax_zero_page_range(struct block_device *bdev, sector_t sector,
 		unsigned int offset, unsigned int length);
 #else
@@ -55,12 +54,6 @@ static inline struct page *read_dax_sector(struct block_device *bdev,
 {
 	return ERR_PTR(-ENXIO);
 }
-/* Shouldn't ever be called when dax is disabled. */
-static inline void dax_unlock_mapping_entry(struct address_space *mapping,
-					    pgoff_t index)
-{
-	BUG();
-}
 static inline int __dax_zero_page_range(struct block_device *bdev,
 		sector_t sector, unsigned int offset, unsigned int length)
 {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 482455952f03..fb128beecdac 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -308,12 +308,6 @@ struct vm_fault {
 					 * is set (which is also implied by
 					 * VM_FAULT_ERROR).
 					 */
-	void *entry;			/* ->fault handler can alternatively
-					 * return locked DAX entry. In that
-					 * case handler should return
-					 * VM_FAULT_DAX_LOCKED and fill in
-					 * entry here.
-					 */
 	/* These three entries are valid only while holding ptl lock */
 	pte_t *pte;			/* Pointer to pte entry matching
 					 * the 'address'. NULL if the page
@@ -1104,8 +1098,7 @@ static inline void clear_page_pfmemalloc(struct page *page)
 #define VM_FAULT_LOCKED	0x0200	/* ->fault locked the returned page */
 #define VM_FAULT_RETRY	0x0400	/* ->fault blocked, must retry */
 #define VM_FAULT_FALLBACK 0x0800	/* huge page fault failed, fall back to small */
-#define VM_FAULT_DAX_LOCKED 0x1000	/* ->fault has locked DAX entry */
-#define VM_FAULT_DONE_COW   0x2000	/* ->fault has fully handled COW */
+#define VM_FAULT_DONE_COW   0x1000	/* ->fault has fully handled COW */
 
 #define VM_FAULT_HWPOISON_LARGE_MASK 0xf000 /* encodes hpage index for large hwpoison */
 
diff --git a/mm/memory.c b/mm/memory.c
index ba49e5bacf17..e9e9224264da 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2849,7 +2849,7 @@ static int __do_fault(struct vm_fault *vmf)
 
 	ret = vma->vm_ops->fault(vma, vmf);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY |
-			    VM_FAULT_DAX_LOCKED | VM_FAULT_DONE_COW)))
+			    VM_FAULT_DONE_COW)))
 		return ret;
 
 	if (unlikely(PageHWPoison(vmf->page))) {
@@ -3241,17 +3241,12 @@ static int do_cow_fault(struct vm_fault *vmf)
 	if (ret & VM_FAULT_DONE_COW)
 		return ret;
 
-	if (!(ret & VM_FAULT_DAX_LOCKED))
-		copy_user_highpage(vmf->cow_page, vmf->page, vmf->address, vma);
+	copy_user_highpage(vmf->cow_page, vmf->page, vmf->address, vma);
 	__SetPageUptodate(vmf->cow_page);
 
 	ret |= finish_fault(vmf);
-	if (!(ret & VM_FAULT_DAX_LOCKED)) {
-		unlock_page(vmf->page);
-		put_page(vmf->page);
-	} else {
-		dax_unlock_mapping_entry(vma->vm_file->f_mapping, vmf->pgoff);
-	}
+	unlock_page(vmf->page);
+	put_page(vmf->page);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		goto uncharge_out;
 	return ret;
-- 
2.6.6
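
Putting the two sides of the patch together, the handshake it introduces
looks roughly like this (condensed from the hunks above as an illustrative
summary; not standalone code):

	/* DAX ->fault handler, COW branch (fs/dax.c) */
	__SetPageUptodate(vmf->cow_page);	/* data has already been copied in */
	vmf_ret = finish_fault(vmf);		/* map the prepared COW page */
	if (!vmf_ret)
		vmf_ret = VM_FAULT_DONE_COW;	/* fault fully handled here */

	/* generic fault path, do_cow_fault() (mm/memory.c) */
	ret = __do_fault(vmf);
	if (ret & VM_FAULT_DONE_COW)
		return ret;			/* handler finished everything */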



end of thread

Thread overview: 130+ messages
2016-09-27 16:08 [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches Jan Kara
2016-09-27 16:08 ` [PATCH 01/20] mm: Change type of vmf->virtual_address Jan Kara
     [not found]   ` <1474992504-20133-2-git-send-email-jack-AlSwsSmVLrQ@public.gmane.org>
2016-09-30  9:07     ` Christoph Hellwig
2016-10-14 18:02   ` Ross Zwisler
2016-09-27 16:08 ` [PATCH 02/20] mm: Join struct fault_env and vm_fault Jan Kara
     [not found]   ` <1474992504-20133-3-git-send-email-jack-AlSwsSmVLrQ@public.gmane.org>
2016-09-30  9:10     ` Christoph Hellwig
2016-10-03  7:43       ` Jan Kara
2016-09-27 16:08 ` [PATCH 03/20] mm: Use pgoff in struct vm_fault instead of passing it separately Jan Kara
2016-10-14 18:42   ` Ross Zwisler
2016-10-17  9:01     ` Jan Kara
2016-09-27 16:08 ` [PATCH 04/20] mm: Use passed vm_fault structure in __do_fault() Jan Kara
2016-10-14 19:05   ` Ross Zwisler
2016-09-27 16:08 ` [PATCH 05/20] mm: Trim __do_fault() arguments Jan Kara
2016-10-14 20:31   ` Ross Zwisler
2016-10-17  9:04     ` Jan Kara
2016-09-27 16:08 ` [PATCH 06/20] mm: Use pass vm_fault structure for in wp_pfn_shared() Jan Kara
2016-10-14 21:04   ` Ross Zwisler
2016-09-27 16:08 ` [PATCH 07/20] mm: Add orig_pte field into vm_fault Jan Kara
2016-10-17 16:45   ` Ross Zwisler
2016-10-18 10:13     ` Jan Kara
2016-09-27 16:08 ` [PATCH 08/20] mm: Allow full handling of COW faults in ->fault handlers Jan Kara
2016-10-17 16:50   ` Ross Zwisler
2016-09-27 16:08 ` [PATCH 09/20] mm: Factor out functionality to finish page faults Jan Kara
2016-10-17 17:38   ` Ross Zwisler
2016-10-17 17:40   ` Ross Zwisler
2016-10-18  9:44     ` Jan Kara
2016-09-27 16:08 ` [PATCH 10/20] mm: Move handling of COW faults into DAX code Jan Kara
2016-10-17 19:29   ` Ross Zwisler
2016-10-18 10:32     ` Jan Kara
2016-09-27 16:08 ` [PATCH 11/20] mm: Remove unnecessary vma->vm_ops check Jan Kara
2016-10-17 19:40   ` Ross Zwisler
2016-10-18 10:37     ` Jan Kara
2016-09-27 16:08 ` [PATCH 12/20] mm: Factor out common parts of write fault handling Jan Kara
2016-10-17 22:08   ` Ross Zwisler
2016-10-18 10:50     ` Jan Kara
2016-10-18 17:32       ` Ross Zwisler
2016-09-27 16:08 ` [PATCH 13/20] mm: Pass vm_fault structure into do_page_mkwrite() Jan Kara
2016-10-17 22:29   ` Ross Zwisler
2016-09-27 16:08 ` [PATCH 14/20] mm: Use vmf->page during WP faults Jan Kara
2016-10-18 17:56   ` Ross Zwisler
2016-09-27 16:08 ` [PATCH 15/20] mm: Move part of wp_page_reuse() into the single call site Jan Kara
2016-10-18 17:59   ` Ross Zwisler
2016-09-27 16:08 ` [PATCH 16/20] mm: Provide helper for finishing mkwrite faults Jan Kara
2016-10-18 18:35   ` Ross Zwisler
2016-10-19  7:16     ` Jan Kara
2016-10-19 17:21       ` Ross Zwisler
2016-10-20  8:48         ` Jan Kara
2016-09-27 16:08 ` [PATCH 17/20] mm: Export follow_pte() Jan Kara
2016-10-18 18:37   ` Ross Zwisler
2016-09-27 16:08 ` [PATCH 18/20] dax: Make cache flushing protected by entry lock Jan Kara
2016-10-18 19:20   ` Ross Zwisler
2016-10-19  7:19     ` Jan Kara
2016-10-19 18:25     ` Ross Zwisler
2016-09-27 16:08 ` [PATCH 19/20] dax: Protect PTE modification on WP fault by radix tree " Jan Kara
2016-10-18 19:53   ` Ross Zwisler
2016-10-19  7:25     ` Jan Kara
2016-10-19 17:25       ` Ross Zwisler
2016-09-27 16:08 ` [PATCH 20/20] dax: Clear dirty entry tags on cache flush Jan Kara
2016-10-18 22:12   ` Ross Zwisler
2016-10-19  7:30     ` Jan Kara
2016-10-19 16:38       ` Ross Zwisler
2016-09-30  9:14 ` [PATCH 0/20 v3] dax: Clear dirty bits after flushing caches Christoph Hellwig
2016-10-03  7:59   ` Jan Kara
2016-10-03  8:03     ` Christoph Hellwig
2016-10-03  8:15       ` Jan Kara
2016-10-03  9:32         ` Christoph Hellwig
2016-10-03 11:13           ` Jan Kara
     [not found]             ` <20161003111358.GQ6457-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org>
2016-10-13 20:34               ` Ross Zwisler
2016-10-17  8:47                 ` Jan Kara
     [not found]                   ` <20161017084732.GD3359-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org>
2016-10-17 18:59                     ` Ross Zwisler
2016-10-18  9:49                       ` Jan Kara
2016-11-18  9:17 [PATCH 0/20 v5] " Jan Kara
2016-11-18  9:17 ` [PATCH 10/20] mm: Move handling of COW faults into DAX code Jan Kara
2016-11-21  4:39   ` Ross Zwisler
