* [PATCH 0/6] DAX cleanups
@ 2016-01-31 12:19 ` Matthew Wilcox
  0 siblings, 0 replies; 20+ messages in thread
From: Matthew Wilcox @ 2016-01-31 12:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Wilcox, linux-mm, linux-nvdimm, linux-fsdevel,
	linux-kernel, willy

Very little exciting in here.  This is all based on the PUD support code
that I just sent, mostly addressing things that came up during review of
the PUD code but couldn't really be justified as part of adding PUD
support itself.

Matthew Wilcox (6):
  dax: Use vmf->gfp_mask
  dax: Remove unnecessary rechecking of i_size
  dax: Use vmf->pgoff in fault handlers
  dax: Use PAGE_CACHE_SIZE where appropriate
  dax: Factor dax_insert_pmd_mapping out of dax_pmd_fault
  dax: Factor dax_insert_pud_mapping out of dax_pud_fault

 fs/dax.c | 395 ++++++++++++++++++++++++++-------------------------------------
 1 file changed, 164 insertions(+), 231 deletions(-)

-- 
2.7.0.rc3

* [PATCH 1/6] dax: Use vmf->gfp_mask
  2016-01-31 12:19 ` Matthew Wilcox
@ 2016-01-31 12:19   ` Matthew Wilcox
  -1 siblings, 0 replies; 20+ messages in thread
From: Matthew Wilcox @ 2016-01-31 12:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Wilcox, linux-mm, linux-nvdimm, linux-fsdevel,
	linux-kernel, willy

We were assuming that it was OK to do a GFP_KERNEL allocation in page
fault context.  That is largely true, but filesystems are permitted to
override it by setting mapping->gfp_mask, which the VM then massages
into vmf->gfp_mask.  There is no practical difference for now, but there
may come a day when we would have surprised a filesystem.
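
For background, here is a minimal sketch of how the VM derives the fault
gfp mask from the mapping for a file-backed VMA; the helper name and the
extra flags are from my reading of the fault path around this kernel
version, so treat it as illustrative rather than authoritative:

	/*
	 * Illustrative: the VM builds vmf->gfp_mask from the mapping's
	 * gfp_mask for file-backed VMAs, adding __GFP_FS and __GFP_IO;
	 * anonymous and special mappings just get GFP_KERNEL.
	 */
	static gfp_t __get_fault_gfp_mask(struct vm_area_struct *vma)
	{
		struct file *vm_file = vma->vm_file;

		if (vm_file)
			return mapping_gfp_mask(vm_file->f_mapping) |
				__GFP_FS | __GFP_IO;

		return GFP_KERNEL;
	}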

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
---
 fs/dax.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/dax.c b/fs/dax.c
index 2f9bb89..11be8c7 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -292,7 +292,7 @@ static int dax_load_hole(struct address_space *mapping, struct page *page,
 	struct inode *inode = mapping->host;
 	if (!page)
 		page = find_or_create_page(mapping, vmf->pgoff,
-						GFP_KERNEL | __GFP_ZERO);
+						vmf->gfp_mask | __GFP_ZERO);
 	if (!page)
 		return VM_FAULT_OOM;
 	/* Recheck i_size under page lock to avoid truncate race */
-- 
2.7.0.rc3

* [PATCH 2/6] dax: Remove unnecessary rechecking of i_size
  2016-01-31 12:19 ` Matthew Wilcox
@ 2016-01-31 12:19   ` Matthew Wilcox
  -1 siblings, 0 replies; 20+ messages in thread
From: Matthew Wilcox @ 2016-01-31 12:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Wilcox, linux-mm, linux-nvdimm, linux-fsdevel,
	linux-kernel, willy

When i_mmap_lock (or the page lock) was the only protection against
truncate, we checked i_size at the beginning of the fault handler,
then rechecked it after acquiring the lock.  Since the filesystems now
exclude truncate from racing with the fault handler, we no longer need
to recheck i_size.  We do, of course, still need to check i_size on
entry to the fault handler.

Also remove the now-unnecessary acquisitions of i_mmap_lock.  One of
the acquisitions is still needed, so put a big fat comment beside it to
prevent the well-intentioned from removing it.
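
For reference, the entry-time check that remains is the existing one, as
seen in the fault handlers later in this series:

	/* Still checked once, on entry to the fault handler */
	size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
	if (vmf->pgoff >= size)
		return VM_FAULT_SIGBUS;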

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
---
 fs/dax.c | 98 +++++++---------------------------------------------------------
 1 file changed, 10 insertions(+), 88 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 11be8c7..696ff90 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -288,21 +288,11 @@ EXPORT_SYMBOL_GPL(dax_do_io);
 static int dax_load_hole(struct address_space *mapping, struct page *page,
 							struct vm_fault *vmf)
 {
-	unsigned long size;
-	struct inode *inode = mapping->host;
 	if (!page)
 		page = find_or_create_page(mapping, vmf->pgoff,
 						vmf->gfp_mask | __GFP_ZERO);
 	if (!page)
 		return VM_FAULT_OOM;
-	/* Recheck i_size under page lock to avoid truncate race */
-	size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
-	if (vmf->pgoff >= size) {
-		unlock_page(page);
-		page_cache_release(page);
-		return VM_FAULT_SIGBUS;
-	}
-
 	vmf->page = page;
 	return VM_FAULT_LOCKED;
 }
@@ -529,24 +519,8 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
 		.sector = to_sector(bh, inode),
 		.size = bh->b_size,
 	};
-	pgoff_t size;
 	int error;
 
-	i_mmap_lock_read(mapping);
-
-	/*
-	 * Check truncate didn't happen while we were allocating a block.
-	 * If it did, this block may or may not be still allocated to the
-	 * file.  We can't tell the filesystem to free it because we can't
-	 * take i_mutex here.  In the worst case, the file still has blocks
-	 * allocated past the end of the file.
-	 */
-	size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
-	if (unlikely(vmf->pgoff >= size)) {
-		error = -EIO;
-		goto out;
-	}
-
 	if (dax_map_atomic(bdev, &dax) < 0) {
 		error = PTR_ERR(dax.addr);
 		goto out;
@@ -566,8 +540,6 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
 	error = vm_insert_mixed(vma, vaddr, dax.pfn);
 
  out:
-	i_mmap_unlock_read(mapping);
-
 	return error;
 }
 
@@ -607,15 +579,6 @@ static int dax_pte_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 			page_cache_release(page);
 			goto repeat;
 		}
-		size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
-		if (unlikely(vmf->pgoff >= size)) {
-			/*
-			 * We have a struct page covering a hole in the file
-			 * from a read fault and we've raced with a truncate
-			 */
-			error = -EIO;
-			goto unlock_page;
-		}
 	}
 
 	error = get_block(inode, block, &bh, 0);
@@ -648,17 +611,17 @@ static int dax_pte_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 		if (error)
 			goto unlock_page;
 		vmf->page = page;
-		if (!page) {
+
+		/*
+		 * A truncate must remove COWs of pages that are removed
+		 * from the file.  If we have a struct page, the normal
+		 * page lock mechanism prevents truncate from missing the
+		 * COWed page.  If not, the i_mmap_lock can provide the
+		 * same guarantee.  It is dropped by the caller after the
+		 * page is safely in the page tables.
+		 */
+		if (!page)
 			i_mmap_lock_read(mapping);
-			/* Check we didn't race with truncate */
-			size = (i_size_read(inode) + PAGE_SIZE - 1) >>
-								PAGE_SHIFT;
-			if (vmf->pgoff >= size) {
-				i_mmap_unlock_read(mapping);
-				error = -EIO;
-				goto out;
-			}
-		}
 		return VM_FAULT_LOCKED;
 	}
 
@@ -820,25 +783,6 @@ static int dax_pmd_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 		truncate_pagecache_range(inode, lstart, lend);
 	}
 
-	i_mmap_lock_read(mapping);
-
-	/*
-	 * If a truncate happened while we were allocating blocks, we may
-	 * leave blocks allocated to the file that are beyond EOF.  We can't
-	 * take i_mutex here, so just leave them hanging; they'll be freed
-	 * when the file is deleted.
-	 */
-	size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
-	if (pgoff >= size) {
-		result = VM_FAULT_SIGBUS;
-		goto out;
-	}
-	if ((pgoff | PG_PMD_COLOUR) >= size) {
-		dax_pmd_dbg(&bh, address,
-				"offset + huge page size > file size");
-		goto fallback;
-	}
-
 	if (!write && !buffer_mapped(&bh) && buffer_uptodate(&bh)) {
 		spinlock_t *ptl;
 		pmd_t entry, *pmd = vmf->pmd;
@@ -938,8 +882,6 @@ static int dax_pmd_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 	}
 
  out:
-	i_mmap_unlock_read(mapping);
-
 	if (buffer_unwritten(&bh))
 		complete_unwritten(&bh, !(result & VM_FAULT_ERROR));
 
@@ -1053,24 +995,6 @@ static int dax_pud_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 		truncate_pagecache_range(inode, lstart, lend);
 	}
 
-	i_mmap_lock_read(mapping);
-
-	/*
-	 * If a truncate happened while we were allocating blocks, we may
-	 * leave blocks allocated to the file that are beyond EOF.  We can't
-	 * take i_mutex here, so just leave them hanging; they'll be freed
-	 * when the file is deleted.
-	 */
-	size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
-	if (pgoff >= size) {
-		result = VM_FAULT_SIGBUS;
-		goto out;
-	}
-	if ((pgoff | PG_PUD_COLOUR) >= size) {
-		dax_pud_dbg(&bh, address, "page extends outside VMA");
-		goto fallback;
-	}
-
 	if (!write && !buffer_mapped(&bh) && buffer_uptodate(&bh)) {
 		dax_pud_dbg(&bh, address, "no zero page");
 		goto fallback;
@@ -1121,8 +1045,6 @@ static int dax_pud_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 	}
 
  out:
-	i_mmap_unlock_read(mapping);
-
 	if (buffer_unwritten(&bh))
 		complete_unwritten(&bh, !(result & VM_FAULT_ERROR));
 
-- 
2.7.0.rc3

* [PATCH 3/6] dax: Use vmf->pgoff in fault handlers
  2016-01-31 12:19 ` Matthew Wilcox
@ 2016-01-31 12:19   ` Matthew Wilcox
  -1 siblings, 0 replies; 20+ messages in thread
From: Matthew Wilcox @ 2016-01-31 12:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Wilcox, linux-mm, linux-nvdimm, linux-fsdevel,
	linux-kernel, willy

Now that the PMD and PUD fault handlers are passed pgoff, there's no
need for them to calculate it themselves.
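
As a minimal sketch of what the removed computation did (based on the
usual definition of linear_page_index() in <linux/pagemap.h>, ignoring
the hugetlb special case, so treat it as illustrative):

	/*
	 * What pgoff = linear_page_index(vma, pmd_addr) boiled down to;
	 * the PMD/PUD fault path now passes this in vmf->pgoff, so the
	 * handlers no longer compute it.
	 */
	pgoff = ((pmd_addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;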

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
---
 fs/dax.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 696ff90..d0e1334 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -709,7 +709,7 @@ static int dax_pmd_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 	unsigned long pmd_addr = address & PMD_MASK;
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
 	struct block_device *bdev;
-	pgoff_t size, pgoff;
+	pgoff_t size;
 	sector_t block;
 	int error, result = 0;
 	bool alloc = false;
@@ -734,12 +734,11 @@ static int dax_pmd_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 		return VM_FAULT_FALLBACK;
 	}
 
-	pgoff = linear_page_index(vma, pmd_addr);
 	size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
-	if (pgoff >= size)
+	if (vmf->pgoff >= size)
 		return VM_FAULT_SIGBUS;
 	/* If the PMD would cover blocks out of the file */
-	if ((pgoff | PG_PMD_COLOUR) >= size) {
+	if ((vmf->pgoff | PG_PMD_COLOUR) >= size) {
 		dax_pmd_dbg(NULL, address,
 				"offset + huge page size > file size");
 		return VM_FAULT_FALLBACK;
@@ -747,7 +746,7 @@ static int dax_pmd_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 
 	memset(&bh, 0, sizeof(bh));
 	bh.b_bdev = inode->i_sb->s_bdev;
-	block = (sector_t)pgoff << (PAGE_SHIFT - blkbits);
+	block = (sector_t)vmf->pgoff << (PAGE_SHIFT - blkbits);
 
 	bh.b_size = PMD_SIZE;
 
@@ -777,7 +776,7 @@ static int dax_pmd_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 	 * zero pages covering this hole
 	 */
 	if (alloc) {
-		loff_t lstart = pgoff << PAGE_SHIFT;
+		loff_t lstart = vmf->pgoff << PAGE_SHIFT;
 		loff_t lend = lstart + PMD_SIZE - 1; /* inclusive */
 
 		truncate_pagecache_range(inode, lstart, lend);
@@ -863,8 +862,8 @@ static int dax_pmd_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 		 * the write to insert a dirty entry.
 		 */
 		if (write) {
-			error = dax_radix_entry(mapping, pgoff, dax.sector,
-					true, true);
+			error = dax_radix_entry(mapping, vmf->pgoff,
+						dax.sector, true, true);
 			if (error) {
 				dax_pmd_dbg(&bh, address,
 						"PMD radix insertion failed");
@@ -921,7 +920,7 @@ static int dax_pud_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 	unsigned long pud_addr = address & PUD_MASK;
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
 	struct block_device *bdev;
-	pgoff_t size, pgoff;
+	pgoff_t size;
 	sector_t block;
 	int result = 0;
 	bool alloc = false;
@@ -946,12 +945,11 @@ static int dax_pud_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 		return VM_FAULT_FALLBACK;
 	}
 
-	pgoff = linear_page_index(vma, pud_addr);
 	size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
-	if (pgoff >= size)
+	if (vmf->pgoff >= size)
 		return VM_FAULT_SIGBUS;
 	/* If the PUD would cover blocks out of the file */
-	if ((pgoff | PG_PUD_COLOUR) >= size) {
+	if ((vmf->pgoff | PG_PUD_COLOUR) >= size) {
 		dax_pud_dbg(NULL, address,
 				"offset + huge page size > file size");
 		return VM_FAULT_FALLBACK;
@@ -959,7 +957,7 @@ static int dax_pud_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 
 	memset(&bh, 0, sizeof(bh));
 	bh.b_bdev = inode->i_sb->s_bdev;
-	block = (sector_t)pgoff << (PAGE_SHIFT - blkbits);
+	block = (sector_t)vmf->pgoff << (PAGE_SHIFT - blkbits);
 
 	bh.b_size = PUD_SIZE;
 
@@ -989,7 +987,7 @@ static int dax_pud_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 	 * zero pages covering this hole
 	 */
 	if (alloc) {
-		loff_t lstart = pgoff << PAGE_SHIFT;
+		loff_t lstart = vmf->pgoff << PAGE_SHIFT;
 		loff_t lend = lstart + PUD_SIZE - 1; /* inclusive */
 
 		truncate_pagecache_range(inode, lstart, lend);
-- 
2.7.0.rc3

* [PATCH 4/6] dax: Use PAGE_CACHE_SIZE where appropriate
  2016-01-31 12:19 ` Matthew Wilcox
@ 2016-01-31 12:19   ` Matthew Wilcox
  -1 siblings, 0 replies; 20+ messages in thread
From: Matthew Wilcox @ 2016-01-31 12:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Wilcox, linux-mm, linux-nvdimm, linux-fsdevel,
	linux-kernel, willy

We were a little sloppy about using PAGE_SIZE instead of PAGE_CACHE_SIZE.
The important thing to remember is that the VM is giving us a pgoff_t
and asking us to populate that page.  If PAGE_CACHE_SIZE were larger than
PAGE_SIZE, then we would not successfully fill in the PTEs for faults
that occurred in the upper portions of a PAGE_CACHE_SIZE page.

Of course, we actually only fill in one PTE, so this still doesn't solve
the problem.  I have my doubts we will ever increase PAGE_CACHE_SIZE
now that we have map_pages.
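
For context, this is why the change is a no-op in practice: as far as I
recall, the page-cache macros are currently plain aliases in
include/linux/pagemap.h, roughly:

	/*
	 * The page-cache granularity is simply the base page size today,
	 * so PAGE_SIZE and PAGE_CACHE_SIZE are currently interchangeable.
	 */
	#define PAGE_CACHE_SHIFT	PAGE_SHIFT
	#define PAGE_CACHE_SIZE		PAGE_SIZE
	#define PAGE_CACHE_MASK		PAGE_MASK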

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
---
 fs/dax.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index d0e1334..f0c204d 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -558,14 +558,14 @@ static int dax_pte_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 	int error;
 	int major = 0;
 
-	size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
 	if (vmf->pgoff >= size)
 		return VM_FAULT_SIGBUS;
 
 	memset(&bh, 0, sizeof(bh));
-	block = (sector_t)vmf->pgoff << (PAGE_SHIFT - blkbits);
+	block = (sector_t)vmf->pgoff << (PAGE_CACHE_SHIFT - blkbits);
 	bh.b_bdev = inode->i_sb->s_bdev;
-	bh.b_size = PAGE_SIZE;
+	bh.b_size = PAGE_CACHE_SIZE;
 
  repeat:
 	page = find_get_page(mapping, vmf->pgoff);
@@ -582,7 +582,7 @@ static int dax_pte_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 	}
 
 	error = get_block(inode, block, &bh, 0);
-	if (!error && (bh.b_size < PAGE_SIZE))
+	if (!error && (bh.b_size < PAGE_CACHE_SIZE))
 		error = -EIO;		/* fs corruption? */
 	if (error)
 		goto unlock_page;
@@ -593,7 +593,7 @@ static int dax_pte_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 			count_vm_event(PGMAJFAULT);
 			mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
 			major = VM_FAULT_MAJOR;
-			if (!error && (bh.b_size < PAGE_SIZE))
+			if (!error && (bh.b_size < PAGE_CACHE_SIZE))
 				error = -EIO;
 			if (error)
 				goto unlock_page;
@@ -630,7 +630,7 @@ static int dax_pte_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 		page = find_lock_page(mapping, vmf->pgoff);
 
 	if (page) {
-		unmap_mapping_range(mapping, vmf->pgoff << PAGE_SHIFT,
+		unmap_mapping_range(mapping, vmf->pgoff << PAGE_CACHE_SHIFT,
 							PAGE_CACHE_SIZE, 0);
 		delete_from_page_cache(page);
 		unlock_page(page);
@@ -677,7 +677,7 @@ static int dax_pte_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
  * The 'colour' (ie low bits) within a PMD of a page offset.  This comes up
  * more often than one might expect in the below function.
  */
-#define PG_PMD_COLOUR	((PMD_SIZE >> PAGE_SHIFT) - 1)
+#define PG_PMD_COLOUR	((PMD_SIZE >> PAGE_CACHE_SHIFT) - 1)
 
 static void __dax_dbg(struct buffer_head *bh, unsigned long address,
 		const char *reason, const char *fn)
@@ -734,7 +734,7 @@ static int dax_pmd_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 		return VM_FAULT_FALLBACK;
 	}
 
-	size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
 	if (vmf->pgoff >= size)
 		return VM_FAULT_SIGBUS;
 	/* If the PMD would cover blocks out of the file */
@@ -746,7 +746,7 @@ static int dax_pmd_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 
 	memset(&bh, 0, sizeof(bh));
 	bh.b_bdev = inode->i_sb->s_bdev;
-	block = (sector_t)vmf->pgoff << (PAGE_SHIFT - blkbits);
+	block = (sector_t)vmf->pgoff << (PAGE_CACHE_SHIFT - blkbits);
 
 	bh.b_size = PMD_SIZE;
 
@@ -776,7 +776,7 @@ static int dax_pmd_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 	 * zero pages covering this hole
 	 */
 	if (alloc) {
-		loff_t lstart = vmf->pgoff << PAGE_SHIFT;
+		loff_t lstart = vmf->pgoff << PAGE_CACHE_SHIFT;
 		loff_t lend = lstart + PMD_SIZE - 1; /* inclusive */
 
 		truncate_pagecache_range(inode, lstart, lend);
@@ -904,7 +904,7 @@ static int dax_pmd_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
  * The 'colour' (ie low bits) within a PUD of a page offset.  This comes up
  * more often than one might expect in the below function.
  */
-#define PG_PUD_COLOUR	((PUD_SIZE >> PAGE_SHIFT) - 1)
+#define PG_PUD_COLOUR	((PUD_SIZE >> PAGE_CACHE_SHIFT) - 1)
 
 #define dax_pud_dbg(bh, address, reason)	__dax_dbg(bh, address, reason, "dax_pud")
 
@@ -945,7 +945,7 @@ static int dax_pud_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 		return VM_FAULT_FALLBACK;
 	}
 
-	size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
 	if (vmf->pgoff >= size)
 		return VM_FAULT_SIGBUS;
 	/* If the PUD would cover blocks out of the file */
@@ -957,7 +957,7 @@ static int dax_pud_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 
 	memset(&bh, 0, sizeof(bh));
 	bh.b_bdev = inode->i_sb->s_bdev;
-	block = (sector_t)vmf->pgoff << (PAGE_SHIFT - blkbits);
+	block = (sector_t)vmf->pgoff << (PAGE_CACHE_SHIFT - blkbits);
 
 	bh.b_size = PUD_SIZE;
 
@@ -987,7 +987,7 @@ static int dax_pud_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 	 * zero pages covering this hole
 	 */
 	if (alloc) {
-		loff_t lstart = vmf->pgoff << PAGE_SHIFT;
+		loff_t lstart = vmf->pgoff << PAGE_CACHE_SHIFT;
 		loff_t lend = lstart + PUD_SIZE - 1; /* inclusive */
 
 		truncate_pagecache_range(inode, lstart, lend);
-- 
2.7.0.rc3

* [PATCH 5/6] dax: Factor dax_insert_pmd_mapping out of dax_pmd_fault
  2016-01-31 12:19 ` Matthew Wilcox
@ 2016-01-31 12:19   ` Matthew Wilcox
  -1 siblings, 0 replies; 20+ messages in thread
From: Matthew Wilcox @ 2016-01-31 12:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Wilcox, linux-mm, linux-nvdimm, linux-fsdevel,
	linux-kernel, willy

These two functions are still large, but they're no longer quite so
ludicrously large.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
---
 fs/dax.c | 153 +++++++++++++++++++++++++++++++++------------------------------
 1 file changed, 80 insertions(+), 73 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index f0c204d..ec31e6e 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -697,6 +697,83 @@ static void __dax_dbg(struct buffer_head *bh, unsigned long address,
 
 #define dax_pmd_dbg(bh, address, reason)	__dax_dbg(bh, address, reason, "dax_pmd")
 
+static int dax_insert_pmd_mapping(struct inode *inode, struct buffer_head *bh,
+			struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	int major = 0;
+	struct blk_dax_ctl dax = {
+		.sector = to_sector(bh, inode),
+		.size = PMD_SIZE,
+	};
+	struct block_device *bdev = bh->b_bdev;
+	bool write = vmf->flags & FAULT_FLAG_WRITE;
+	unsigned long address = (unsigned long)vmf->virtual_address;
+	long length = dax_map_atomic(bdev, &dax);
+
+	if (length < 0)
+		return VM_FAULT_SIGBUS;
+	if (length < PMD_SIZE) {
+		dax_pmd_dbg(bh, address, "dax-length too small");
+		goto unmap;
+	}
+
+	if (pfn_t_to_pfn(dax.pfn) & PG_PMD_COLOUR) {
+		dax_pmd_dbg(bh, address, "pfn unaligned");
+		goto unmap;
+	}
+
+	if (!pfn_t_devmap(dax.pfn)) {
+		dax_pmd_dbg(bh, address, "pfn not in memmap");
+		goto unmap;
+	}
+
+	if (buffer_unwritten(bh) || buffer_new(bh)) {
+		clear_pmem(dax.addr, PMD_SIZE);
+		wmb_pmem();
+		count_vm_event(PGMAJFAULT);
+		mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
+		major = VM_FAULT_MAJOR;
+	}
+	dax_unmap_atomic(bdev, &dax);
+
+	/*
+	 * For PTE faults we insert a radix tree entry for reads, and leave
+	 * it clean.  Then on the first write we dirty the radix tree entry
+	 * via the dax_pfn_mkwrite() path.  This sequence allows the
+	 * dax_pfn_mkwrite() call to be simpler and avoid a call into
+	 * get_block() to translate the pgoff to a sector in order to be able
+	 * to create a new radix tree entry.
+	 *
+	 * The PMD path doesn't have an equivalent to dax_pfn_mkwrite(),
+	 * though, so for a read followed by a write we traverse all the way
+	 * through dax_pmd_fault() twice.  This means we can just skip
+	 * inserting a radix tree entry completely on the initial read and
+	 * just wait until the write to insert a dirty entry.
+	 */
+	if (write) {
+		int error = dax_radix_entry(vma->vm_file->f_mapping, vmf->pgoff,
+						dax.sector, true, true);
+		if (error) {
+			dax_pmd_dbg(bh, address,
+					"PMD radix insertion failed");
+			goto fallback;
+		}
+	}
+
+	dev_dbg(part_to_dev(bdev->bd_part),
+			"%s: %s addr: %lx pfn: %lx sect: %llx\n",
+			__func__, current->comm, address,
+			pfn_t_to_pfn(dax.pfn),
+			(unsigned long long) dax.sector);
+	return major | vmf_insert_pfn_pmd(vma, address, vmf->pmd,
+						dax.pfn, write);
+ unmap:
+	dax_unmap_atomic(bdev, &dax);
+ fallback:
+	count_vm_event(THP_FAULT_FALLBACK);
+	return VM_FAULT_FALLBACK;
+}
+
 static int dax_pmd_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 		get_block_t get_block, dax_iodone_t complete_unwritten)
 {
@@ -708,10 +785,9 @@ static int dax_pmd_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 	unsigned long address = (unsigned long)vmf->virtual_address;
 	unsigned long pmd_addr = address & PMD_MASK;
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
-	struct block_device *bdev;
 	pgoff_t size;
 	sector_t block;
-	int error, result = 0;
+	int result;
 	bool alloc = false;
 
 	/* dax pmd mappings require pfn_t_devmap() */
@@ -759,8 +835,6 @@ static int dax_pmd_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 		alloc = true;
 	}
 
-	bdev = bh.b_bdev;
-
 	/*
 	 * If the filesystem isn't willing to tell us the length of a hole,
 	 * just fall back to PTEs.  Calling get_block 512 times in a loop
@@ -799,7 +873,7 @@ static int dax_pmd_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 			goto fallback;
 		}
 
-		dev_dbg(part_to_dev(bdev->bd_part),
+		dev_dbg(part_to_dev(bh.b_bdev->bd_part),
 				"%s: %s addr: %lx pfn: <zero> sect: %llx\n",
 				__func__, current->comm, address,
 				(unsigned long long) to_sector(&bh, inode));
@@ -810,74 +884,7 @@ static int dax_pmd_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 		result = VM_FAULT_NOPAGE;
 		spin_unlock(ptl);
 	} else {
-		struct blk_dax_ctl dax = {
-			.sector = to_sector(&bh, inode),
-			.size = PMD_SIZE,
-		};
-		long length = dax_map_atomic(bdev, &dax);
-
-		if (length < 0) {
-			result = VM_FAULT_SIGBUS;
-			goto out;
-		}
-		if (length < PMD_SIZE) {
-			dax_pmd_dbg(&bh, address, "dax-length too small");
-			dax_unmap_atomic(bdev, &dax);
-			goto fallback;
-		}
-		if (pfn_t_to_pfn(dax.pfn) & PG_PMD_COLOUR) {
-			dax_pmd_dbg(&bh, address, "pfn unaligned");
-			dax_unmap_atomic(bdev, &dax);
-			goto fallback;
-		}
-
-		if (!pfn_t_devmap(dax.pfn)) {
-			dax_unmap_atomic(bdev, &dax);
-			dax_pmd_dbg(&bh, address, "pfn not in memmap");
-			goto fallback;
-		}
-
-		if (buffer_unwritten(&bh) || buffer_new(&bh)) {
-			clear_pmem(dax.addr, PMD_SIZE);
-			wmb_pmem();
-			count_vm_event(PGMAJFAULT);
-			mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
-			result |= VM_FAULT_MAJOR;
-		}
-		dax_unmap_atomic(bdev, &dax);
-
-		/*
-		 * For PTE faults we insert a radix tree entry for reads, and
-		 * leave it clean.  Then on the first write we dirty the radix
-		 * tree entry via the dax_pfn_mkwrite() path.  This sequence
-		 * allows the dax_pfn_mkwrite() call to be simpler and avoid a
-		 * call into get_block() to translate the pgoff to a sector in
-		 * order to be able to create a new radix tree entry.
-		 *
-		 * The PMD path doesn't have an equivalent to
-		 * dax_pfn_mkwrite(), though, so for a read followed by a
-		 * write we traverse all the way through dax_pmd_fault()
-		 * twice.  This means we can just skip inserting a radix tree
-		 * entry completely on the initial read and just wait until
-		 * the write to insert a dirty entry.
-		 */
-		if (write) {
-			error = dax_radix_entry(mapping, vmf->pgoff,
-						dax.sector, true, true);
-			if (error) {
-				dax_pmd_dbg(&bh, address,
-						"PMD radix insertion failed");
-				goto fallback;
-			}
-		}
-
-		dev_dbg(part_to_dev(bdev->bd_part),
-				"%s: %s addr: %lx pfn: %lx sect: %llx\n",
-				__func__, current->comm, address,
-				pfn_t_to_pfn(dax.pfn),
-				(unsigned long long) dax.sector);
-		result |= vmf_insert_pfn_pmd(vma, address, vmf->pmd,
-				dax.pfn, write);
+		result = dax_insert_pmd_mapping(inode, &bh, vma, vmf);
 	}
 
  out:
-- 
2.7.0.rc3

* [PATCH 6/6] dax: Factor dax_insert_pud_mapping out of dax_pud_fault
  2016-01-31 12:19 ` Matthew Wilcox
@ 2016-01-31 12:19   ` Matthew Wilcox
  -1 siblings, 0 replies; 20+ messages in thread
From: Matthew Wilcox @ 2016-01-31 12:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Wilcox, linux-mm, linux-nvdimm, linux-fsdevel,
	linux-kernel, willy

Follow the factoring done for dax_pmd_fault.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
---
 fs/dax.c | 100 +++++++++++++++++++++++++++++++++------------------------------
 1 file changed, 53 insertions(+), 47 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index ec31e6e..e9701d6 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -915,6 +915,57 @@ static int dax_pmd_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 
 #define dax_pud_dbg(bh, address, reason)	__dax_dbg(bh, address, reason, "dax_pud")
 
+static int dax_insert_pud_mapping(struct inode *inode, struct buffer_head *bh,
+			struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	int major = 0;
+	struct blk_dax_ctl dax = {
+		.sector = to_sector(bh, inode),
+		.size = PUD_SIZE,
+	};
+	struct block_device *bdev = bh->b_bdev;
+	bool write = vmf->flags & FAULT_FLAG_WRITE;
+	unsigned long address = (unsigned long)vmf->virtual_address;
+	long length = dax_map_atomic(bdev, &dax);
+
+	if (length < 0)
+		return VM_FAULT_SIGBUS;
+	if (length < PUD_SIZE) {
+		dax_pud_dbg(bh, address, "dax-length too small");
+		goto unmap;
+	}
+	if (pfn_t_to_pfn(dax.pfn) & PG_PUD_COLOUR) {
+		dax_pud_dbg(bh, address, "pfn unaligned");
+		goto unmap;
+	}
+
+	if (!pfn_t_devmap(dax.pfn)) {
+		dax_pud_dbg(bh, address, "pfn not in memmap");
+		goto unmap;
+	}
+
+	if (buffer_unwritten(bh) || buffer_new(bh)) {
+		clear_pmem(dax.addr, PUD_SIZE);
+		wmb_pmem();
+		count_vm_event(PGMAJFAULT);
+		mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
+		major = VM_FAULT_MAJOR;
+	}
+	dax_unmap_atomic(bdev, &dax);
+
+	dev_dbg(part_to_dev(bdev->bd_part),
+			"%s: %s addr: %lx pfn: %lx sect: %llx\n",
+			__func__, current->comm, address,
+			pfn_t_to_pfn(dax.pfn),
+			(unsigned long long) dax.sector);
+	return major | vmf_insert_pfn_pud(vma, address, vmf->pud,
+						dax.pfn, write);
+ unmap:
+	dax_unmap_atomic(bdev, &dax);
+	count_vm_event(THP_FAULT_FALLBACK);
+	return VM_FAULT_FALLBACK;
+}
+
 static int dax_pud_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 		get_block_t get_block, dax_iodone_t complete_unwritten)
 {
@@ -926,10 +977,9 @@ static int dax_pud_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 	unsigned long address = (unsigned long)vmf->virtual_address;
 	unsigned long pud_addr = address & PUD_MASK;
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
-	struct block_device *bdev;
 	pgoff_t size;
 	sector_t block;
-	int result = 0;
+	int result;
 	bool alloc = false;
 
 	/* dax pud mappings require pfn_t_devmap() */
@@ -977,8 +1027,6 @@ static int dax_pud_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 		alloc = true;
 	}
 
-	bdev = bh.b_bdev;
-
 	/*
 	 * If the filesystem isn't willing to tell us the length of a hole,
 	 * just fall back to PMDs.  Calling get_block 512 times in a loop
@@ -1004,49 +1052,7 @@ static int dax_pud_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 		dax_pud_dbg(&bh, address, "no zero page");
 		goto fallback;
 	} else {
-		struct blk_dax_ctl dax = {
-			.sector = to_sector(&bh, inode),
-			.size = PUD_SIZE,
-		};
-		long length = dax_map_atomic(bdev, &dax);
-
-		if (length < 0) {
-			result = VM_FAULT_SIGBUS;
-			goto out;
-		}
-		if (length < PUD_SIZE) {
-			dax_pud_dbg(&bh, address, "dax-length too small");
-			dax_unmap_atomic(bdev, &dax);
-			goto fallback;
-		}
-		if (pfn_t_to_pfn(dax.pfn) & PG_PUD_COLOUR) {
-			dax_pud_dbg(&bh, address, "pfn unaligned");
-			dax_unmap_atomic(bdev, &dax);
-			goto fallback;
-		}
-
-		if (!pfn_t_devmap(dax.pfn)) {
-			dax_unmap_atomic(bdev, &dax);
-			dax_pud_dbg(&bh, address, "pfn not in memmap");
-			goto fallback;
-		}
-
-		if (buffer_unwritten(&bh) || buffer_new(&bh)) {
-			clear_pmem(dax.addr, PUD_SIZE);
-			wmb_pmem();
-			count_vm_event(PGMAJFAULT);
-			mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
-			result |= VM_FAULT_MAJOR;
-		}
-		dax_unmap_atomic(bdev, &dax);
-
-		dev_dbg(part_to_dev(bdev->bd_part),
-				"%s: %s addr: %lx pfn: %lx sect: %llx\n",
-				__func__, current->comm, address,
-				pfn_t_to_pfn(dax.pfn),
-				(unsigned long long) dax.sector);
-		result |= vmf_insert_pfn_pud(vma, address, vmf->pud,
-				dax.pfn, write);
+		result = dax_insert_pud_mapping(inode, &bh, vma, vmf);
 	}
 
  out:
-- 
2.7.0.rc3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/6] dax: Use PAGE_CACHE_SIZE where appropriate
  2016-01-31 12:19   ` Matthew Wilcox
@ 2016-02-01 13:10     ` Kirill A. Shutemov
  -1 siblings, 0 replies; 20+ messages in thread
From: Kirill A. Shutemov @ 2016-02-01 13:10 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, linux-mm, linux-nvdimm, linux-fsdevel,
	linux-kernel, willy

On Sun, Jan 31, 2016 at 11:19:53PM +1100, Matthew Wilcox wrote:
> We were a little sloppy about using PAGE_SIZE instead of PAGE_CACHE_SIZE.

PAGE_CACHE_SIZE is nonsense. It has never had any meaning, at least in
upstream, and it only leads to confusion on the border between the VFS and
the MM.

We should just drop it.
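
For reference, a rough sketch from memory of the include/linux/pagemap.h
definitions (exact formatting may differ, but the macros are nothing more
than aliases for the plain page macros):

	#define PAGE_CACHE_SHIFT	PAGE_SHIFT
	#define PAGE_CACHE_SIZE		PAGE_SIZE
	#define PAGE_CACHE_MASK		PAGE_MASK
	#define PAGE_CACHE_ALIGN(addr)	(((addr) + PAGE_CACHE_SIZE - 1) & PAGE_CACHE_MASK)

so converting every PAGE_CACHE_SIZE user over to PAGE_SIZE should be purely
mechanical.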

I need to find time at some point to prepare a patchset...

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/6] dax: Use PAGE_CACHE_SIZE where appropriate
  2016-02-01 13:10     ` Kirill A. Shutemov
@ 2016-02-01 20:43       ` Matthew Wilcox
  -1 siblings, 0 replies; 20+ messages in thread
From: Matthew Wilcox @ 2016-02-01 20:43 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Matthew Wilcox, Andrew Morton, linux-mm, linux-nvdimm,
	linux-fsdevel, linux-kernel

On Mon, Feb 01, 2016 at 03:10:19PM +0200, Kirill A. Shutemov wrote:
> On Sun, Jan 31, 2016 at 11:19:53PM +1100, Matthew Wilcox wrote:
> > We were a little sloppy about using PAGE_SIZE instead of PAGE_CACHE_SIZE.
> 
> PAGE_CACHE_SIZE is nonsense. It has never had any meaning, at least in
> upstream, and it only leads to confusion on the border between the VFS and
> the MM.
> 
> We should just drop it.
> 
> I need to find time at some point to prepare a patchset...

I argued in favour of this at the last LSF/MM and people were ... reluctant.
I think with your map_pages work, the PAGE_CACHE_SIZE idea now has no
potential performance win left.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/6] dax: Use vmf->gfp_mask
  2016-01-31 12:19   ` Matthew Wilcox
@ 2016-02-23 23:37     ` Ross Zwisler
  -1 siblings, 0 replies; 20+ messages in thread
From: Ross Zwisler @ 2016-02-23 23:37 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, linux-nvdimm, linux-kernel, linux-mm, linux-fsdevel

On Sun, Jan 31, 2016 at 11:19:50PM +1100, Matthew Wilcox wrote:
> We were assuming that it was OK to do a GFP_KERNEL allocation in page
> fault context.  That appears to be largely true, but filesystems are
> permitted to override that in their setting of mapping->gfp_flags, which
> the VM then massages into vmf->gfp_flags.  No practical difference for
> now, but there may come a day when we would have surprised a filesystem.
> 
> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>

Sure, this seems right.

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
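
For context, the mask the fault handler sees here is derived from the file's
mapping by the generic fault path; roughly (a sketch from memory of the
mm/memory.c helper as of this kernel, so the details may differ):

	static gfp_t __get_fault_gfp_mask(struct vm_area_struct *vma)
	{
		struct file *vm_file = vma->vm_file;

		/* File-backed faults inherit the mapping's mask plus FS/IO */
		if (vm_file)
			return mapping_gfp_mask(vm_file->f_mapping) |
				__GFP_FS | __GFP_IO;

		/* Special mappings (e.g. the VDSO) have no file; use GFP_KERNEL */
		return GFP_KERNEL;
	}

so a filesystem that restricts its mapping's gfp mask should now have that
restriction honoured by the DAX hole path as well.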

> ---
>  fs/dax.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 2f9bb89..11be8c7 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -292,7 +292,7 @@ static int dax_load_hole(struct address_space *mapping, struct page *page,
>  	struct inode *inode = mapping->host;
>  	if (!page)
>  		page = find_or_create_page(mapping, vmf->pgoff,
> -						GFP_KERNEL | __GFP_ZERO);
> +						vmf->gfp_mask | __GFP_ZERO);
>  	if (!page)
>  		return VM_FAULT_OOM;
>  	/* Recheck i_size under page lock to avoid truncate race */
> -- 
> 2.7.0.rc3
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-02-23 23:38 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-31 12:19 [PATCH 0/6] DAX cleanups Matthew Wilcox
2016-01-31 12:19 ` Matthew Wilcox
2016-01-31 12:19 ` [PATCH 1/6] dax: Use vmf->gfp_mask Matthew Wilcox
2016-01-31 12:19   ` Matthew Wilcox
2016-02-23 23:37   ` Ross Zwisler
2016-02-23 23:37     ` Ross Zwisler
2016-01-31 12:19 ` [PATCH 2/6] dax: Remove unnecessary rechecking of i_size Matthew Wilcox
2016-01-31 12:19   ` Matthew Wilcox
2016-01-31 12:19 ` [PATCH 3/6] dax: Use vmf->pgoff in fault handlers Matthew Wilcox
2016-01-31 12:19   ` Matthew Wilcox
2016-01-31 12:19 ` [PATCH 4/6] dax: Use PAGE_CACHE_SIZE where appropriate Matthew Wilcox
2016-01-31 12:19   ` Matthew Wilcox
2016-02-01 13:10   ` Kirill A. Shutemov
2016-02-01 13:10     ` Kirill A. Shutemov
2016-02-01 20:43     ` Matthew Wilcox
2016-02-01 20:43       ` Matthew Wilcox
2016-01-31 12:19 ` [PATCH 5/6] dax: Factor dax_insert_pmd_mapping out of dax_pmd_fault Matthew Wilcox
2016-01-31 12:19   ` Matthew Wilcox
2016-01-31 12:19 ` [PATCH 6/6] dax: Factor dax_insert_pud_mapping out of dax_pud_fault Matthew Wilcox
2016-01-31 12:19   ` Matthew Wilcox

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.