linux-mm.kvack.org archive mirror
* [PATCH v4 0/2] Revert locking changes in DAX for v4.3
@ 2015-10-06 22:28 Ross Zwisler
  2015-10-06 22:28 ` [PATCH v4 1/2] Revert "mm: take i_mmap_lock in unmap_mapping_range() for DAX" Ross Zwisler
  2015-10-06 22:28 ` [PATCH v4 2/2] Revert "dax: fix race between simultaneous faults" Ross Zwisler
  0 siblings, 2 replies; 5+ messages in thread
From: Ross Zwisler @ 2015-10-06 22:28 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ross Zwisler, Alexander Viro, Matthew Wilcox, linux-fsdevel,
	linux-mm, Andrew Morton, Dan Williams, Dave Chinner, Jan Kara,
	Kirill A. Shutemov, linux-nvdimm, Matthew Wilcox

This series reverts recent changes to the DAX locking scheme that were
introduced by these two commits:

commit 843172978bb9 ("dax: fix race between simultaneous faults")
commit 46c043ede471 ("mm: take i_mmap_lock in unmap_mapping_range() for DAX")

Changes from v3:
 - reduced the revert of 46c043ede471 in patch 1 so that we still drop the
   mapping->i_mmap_rwsem before calling unmap_mapping_range(), as sketched
   below.  This prevents the deadlock in the __dax_pmd_fault() path, so
   there is no longer any need to temporarily disable DAX PMD faults.
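
For reference, the pattern kept by patch 1 (the full hunk appears in its
diff) drops the read lock around the unmap call and retakes it afterwards:

	/* __dax_pmd_fault(): newly allocated storage may still have zero
	 * pages mapped over it.  unmap_mapping_range() takes the write
	 * lock itself once the full series is applied, so drop our read
	 * lock around the call. */
	if (buffer_new(&bh)) {
		i_mmap_unlock_read(mapping);
		unmap_mapping_range(mapping, pgoff << PAGE_SHIFT, PMD_SIZE, 0);
		i_mmap_lock_read(mapping);
	}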

Ross Zwisler (2):
  Revert "mm: take i_mmap_lock in unmap_mapping_range() for DAX"
  Revert "dax: fix race between simultaneous faults"

 fs/dax.c    | 70 +++++++++++++++++++++++++------------------------------------
 mm/memory.c |  2 ++
 2 files changed, 31 insertions(+), 41 deletions(-)

-- 
2.1.0


* [PATCH v4 1/2] Revert "mm: take i_mmap_lock in unmap_mapping_range() for DAX"
  2015-10-06 22:28 [PATCH v4 0/2] Revert locking changes in DAX for v4.3 Ross Zwisler
@ 2015-10-06 22:28 ` Ross Zwisler
  2015-10-07 16:19   ` Dan Williams
  2015-10-06 22:28 ` [PATCH v4 2/2] Revert "dax: fix race between simultaneous faults" Ross Zwisler
  1 sibling, 1 reply; 5+ messages in thread
From: Ross Zwisler @ 2015-10-06 22:28 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ross Zwisler, Alexander Viro, Matthew Wilcox, linux-fsdevel,
	linux-mm, Andrew Morton, Dan Williams, Dave Chinner, Jan Kara,
	Kirill A. Shutemov, linux-nvdimm, Matthew Wilcox

This reverts commits 46c043ede4711e8d598b9d63c5616c1fedb0605e
and 8346c416d17bf5b4ea1508662959bb62e73fd6a5.

The following two locking commits in the DAX code:

commit 843172978bb9 ("dax: fix race between simultaneous faults")
commit 46c043ede471 ("mm: take i_mmap_lock in unmap_mapping_range() for DAX")

introduced a number of deadlocks and other issues, and need to be
reverted for the v4.3 kernel. The list of issues in DAX after these
commits (some newly introduced by the commits, some preexisting) can be
found here:

https://lkml.org/lkml/2015/9/25/602

This revert keeps the PMEM API changes to the zeroing code in
__dax_pmd_fault(), which were added by this commit:

commit d77e92e270ed ("dax: update PMD fault handler with PMEM API")

It also keeps the code dropping mapping->i_mmap_rwsem before calling
unmap_mapping_range(), but converts it to a read lock since that's what is
now used by the rest of __dax_pmd_fault().  This is needed to avoid
recursively acquiring mapping->i_mmap_rwsem, once with a read lock in
__dax_pmd_fault() and once with a write lock in unmap_mapping_range().
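
To see why the recursive acquisition deadlocks, here is a minimal
userspace analogue using POSIX rwlocks (purely illustrative, not part of
the patch): a thread holding the lock for read that then requests it for
write can never be granted the write lock, because the writer must wait
for all readers, including itself, to drain.

	#include <pthread.h>
	#include <stdio.h>
	#include <errno.h>

	int main(void)
	{
		pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;

		/* Analogue of __dax_pmd_fault() holding the read lock. */
		pthread_rwlock_rdlock(&lock);

		/* Analogue of unmap_mapping_range() wanting the write lock.
		 * A blocking wrlock() here would hang forever; trywrlock()
		 * demonstrates the conflict without hanging the demo. */
		if (pthread_rwlock_trywrlock(&lock) == EBUSY)
			printf("writer blocked by our own read lock\n");

		pthread_rwlock_unlock(&lock);
		return 0;
	}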

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
---
 fs/dax.c    | 37 +++++++++++++------------------------
 mm/memory.c | 11 +++++++++--
 2 files changed, 22 insertions(+), 26 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index bcfb14b..f665bc9 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -569,36 +569,14 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 	if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE)
 		goto fallback;
 
-	sector = bh.b_blocknr << (blkbits - 9);
-
-	if (buffer_unwritten(&bh) || buffer_new(&bh)) {
-		int i;
-
-		length = bdev_direct_access(bh.b_bdev, sector, &kaddr, &pfn,
-						bh.b_size);
-		if (length < 0) {
-			result = VM_FAULT_SIGBUS;
-			goto out;
-		}
-		if ((length < PMD_SIZE) || (pfn & PG_PMD_COLOUR))
-			goto fallback;
-
-		for (i = 0; i < PTRS_PER_PMD; i++)
-			clear_pmem(kaddr + i * PAGE_SIZE, PAGE_SIZE);
-		wmb_pmem();
-		count_vm_event(PGMAJFAULT);
-		mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
-		result |= VM_FAULT_MAJOR;
-	}
-
 	/*
 	 * If we allocated new storage, make sure no process has any
 	 * zero pages covering this hole
 	 */
 	if (buffer_new(&bh)) {
-		i_mmap_unlock_write(mapping);
+		i_mmap_unlock_read(mapping);
 		unmap_mapping_range(mapping, pgoff << PAGE_SHIFT, PMD_SIZE, 0);
-		i_mmap_lock_write(mapping);
+		i_mmap_lock_read(mapping);
 	}
 
 	/*
@@ -635,6 +613,7 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 		result = VM_FAULT_NOPAGE;
 		spin_unlock(ptl);
 	} else {
+		sector = bh.b_blocknr << (blkbits - 9);
 		length = bdev_direct_access(bh.b_bdev, sector, &kaddr, &pfn,
 						bh.b_size);
 		if (length < 0) {
@@ -644,6 +623,16 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 		if ((length < PMD_SIZE) || (pfn & PG_PMD_COLOUR))
 			goto fallback;
 
+		if (buffer_unwritten(&bh) || buffer_new(&bh)) {
+			int i;
+			for (i = 0; i < PTRS_PER_PMD; i++)
+				clear_pmem(kaddr + i * PAGE_SIZE, PAGE_SIZE);
+			wmb_pmem();
+			count_vm_event(PGMAJFAULT);
+			mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
+			result |= VM_FAULT_MAJOR;
+		}
+
 		result |= vmf_insert_pfn_pmd(vma, address, pmd, pfn, write);
 	}
 
diff --git a/mm/memory.c b/mm/memory.c
index 9cb2747..5ec066f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2426,10 +2426,17 @@ void unmap_mapping_range(struct address_space *mapping,
 	if (details.last_index < details.first_index)
 		details.last_index = ULONG_MAX;
 
-	i_mmap_lock_write(mapping);
+
+	/*
+	 * DAX already holds i_mmap_lock to serialise file truncate vs
+	 * page fault and page fault vs page fault.
+	 */
+	if (!IS_DAX(mapping->host))
+		i_mmap_lock_write(mapping);
 	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap)))
 		unmap_mapping_range_tree(&mapping->i_mmap, &details);
-	i_mmap_unlock_write(mapping);
+	if (!IS_DAX(mapping->host))
+		i_mmap_unlock_write(mapping);
 }
 EXPORT_SYMBOL(unmap_mapping_range);
 
-- 
2.1.0


* [PATCH v4 2/2] Revert "dax: fix race between simultaneous faults"
  2015-10-06 22:28 [PATCH v4 0/2] Revert locking changes in DAX for v4.3 Ross Zwisler
  2015-10-06 22:28 ` [PATCH v4 1/2] Revert "mm: take i_mmap_lock in unmap_mapping_range() for DAX" Ross Zwisler
@ 2015-10-06 22:28 ` Ross Zwisler
  1 sibling, 0 replies; 5+ messages in thread
From: Ross Zwisler @ 2015-10-06 22:28 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ross Zwisler, Alexander Viro, Matthew Wilcox, linux-fsdevel,
	linux-mm, Andrew Morton, Dan Williams, Dave Chinner, Jan Kara,
	Kirill A. Shutemov, linux-nvdimm, Matthew Wilcox

This reverts commit 843172978bb92997310d2f7fbc172ece423cfc02.

The following two locking commits in the DAX code:

commit 843172978bb9 ("dax: fix race between simultaneous faults")
commit 46c043ede471 ("mm: take i_mmap_lock in unmap_mapping_range() for DAX")

introduced a number of deadlocks and other issues, and need to be
reverted for the v4.3 kernel.  The list of issues in DAX after these
commits (some newly introduced by the commits, some preexisting) can be
found here:

https://lkml.org/lkml/2015/9/25/602

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
---
 fs/dax.c    | 33 ++++++++++++++++-----------------
 mm/memory.c | 11 +++--------
 2 files changed, 19 insertions(+), 25 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index f665bc9..a86d3cc 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -285,6 +285,7 @@ static int copy_user_bh(struct page *to, struct buffer_head *bh,
 static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
 			struct vm_area_struct *vma, struct vm_fault *vmf)
 {
+	struct address_space *mapping = inode->i_mapping;
 	sector_t sector = bh->b_blocknr << (inode->i_blkbits - 9);
 	unsigned long vaddr = (unsigned long)vmf->virtual_address;
 	void __pmem *addr;
@@ -292,6 +293,8 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
 	pgoff_t size;
 	int error;
 
+	i_mmap_lock_read(mapping);
+
 	/*
 	 * Check truncate didn't happen while we were allocating a block.
 	 * If it did, this block may or may not be still allocated to the
@@ -321,6 +324,8 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
 	error = vm_insert_mixed(vma, vaddr, pfn);
 
  out:
+	i_mmap_unlock_read(mapping);
+
 	return error;
 }
 
@@ -382,17 +387,15 @@ int __dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 			 * from a read fault and we've raced with a truncate
 			 */
 			error = -EIO;
-			goto unlock;
+			goto unlock_page;
 		}
-	} else {
-		i_mmap_lock_write(mapping);
 	}
 
 	error = get_block(inode, block, &bh, 0);
 	if (!error && (bh.b_size < PAGE_SIZE))
 		error = -EIO;		/* fs corruption? */
 	if (error)
-		goto unlock;
+		goto unlock_page;
 
 	if (!buffer_mapped(&bh) && !buffer_unwritten(&bh) && !vmf->cow_page) {
 		if (vmf->flags & FAULT_FLAG_WRITE) {
@@ -403,9 +406,8 @@ int __dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 			if (!error && (bh.b_size < PAGE_SIZE))
 				error = -EIO;
 			if (error)
-				goto unlock;
+				goto unlock_page;
 		} else {
-			i_mmap_unlock_write(mapping);
 			return dax_load_hole(mapping, page, vmf);
 		}
 	}
@@ -417,15 +419,17 @@ int __dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 		else
 			clear_user_highpage(new_page, vaddr);
 		if (error)
-			goto unlock;
+			goto unlock_page;
 		vmf->page = page;
 		if (!page) {
+			i_mmap_lock_read(mapping);
 			/* Check we didn't race with truncate */
 			size = (i_size_read(inode) + PAGE_SIZE - 1) >>
 								PAGE_SHIFT;
 			if (vmf->pgoff >= size) {
+				i_mmap_unlock_read(mapping);
 				error = -EIO;
-				goto unlock;
+				goto out;
 			}
 		}
 		return VM_FAULT_LOCKED;
@@ -461,8 +465,6 @@ int __dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 			WARN_ON_ONCE(!(vmf->flags & FAULT_FLAG_WRITE));
 	}
 
-	if (!page)
-		i_mmap_unlock_write(mapping);
  out:
 	if (error == -ENOMEM)
 		return VM_FAULT_OOM | major;
@@ -471,14 +473,11 @@ int __dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 		return VM_FAULT_SIGBUS | major;
 	return VM_FAULT_NOPAGE | major;
 
- unlock:
+ unlock_page:
 	if (page) {
 		unlock_page(page);
 		page_cache_release(page);
-	} else {
-		i_mmap_unlock_write(mapping);
 	}
-
 	goto out;
 }
 EXPORT_SYMBOL(__dax_fault);
@@ -556,10 +555,10 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 	block = (sector_t)pgoff << (PAGE_SHIFT - blkbits);
 
 	bh.b_size = PMD_SIZE;
-	i_mmap_lock_write(mapping);
 	length = get_block(inode, block, &bh, write);
 	if (length)
 		return VM_FAULT_SIGBUS;
+	i_mmap_lock_read(mapping);
 
 	/*
 	 * If the filesystem isn't willing to tell us the length of a hole,
@@ -637,11 +636,11 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 	}
 
  out:
+	i_mmap_unlock_read(mapping);
+
 	if (buffer_unwritten(&bh))
 		complete_unwritten(&bh, !(result & VM_FAULT_ERROR));
 
-	i_mmap_unlock_write(mapping);
-
 	return result;
 
  fallback:
diff --git a/mm/memory.c b/mm/memory.c
index 5ec066f..deb679c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2427,16 +2427,11 @@ void unmap_mapping_range(struct address_space *mapping,
 		details.last_index = ULONG_MAX;
 
 
-	/*
-	 * DAX already holds i_mmap_lock to serialise file truncate vs
-	 * page fault and page fault vs page fault.
-	 */
-	if (!IS_DAX(mapping->host))
-		i_mmap_lock_write(mapping);
+	/* DAX uses i_mmap_lock to serialise file truncate vs page fault */
+	i_mmap_lock_write(mapping);
 	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap)))
 		unmap_mapping_range_tree(&mapping->i_mmap, &details);
-	if (!IS_DAX(mapping->host))
-		i_mmap_unlock_write(mapping);
+	i_mmap_unlock_write(mapping);
 }
 EXPORT_SYMBOL(unmap_mapping_range);
 
-- 
2.1.0


* Re: [PATCH v4 1/2] Revert "mm: take i_mmap_lock in unmap_mapping_range() for DAX"
  2015-10-06 22:28 ` [PATCH v4 1/2] Revert "mm: take i_mmap_lock in unmap_mapping_range() for DAX" Ross Zwisler
@ 2015-10-07 16:19   ` Dan Williams
  2015-10-07 21:39     ` Ross Zwisler
  0 siblings, 1 reply; 5+ messages in thread
From: Dan Williams @ 2015-10-07 16:19 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: linux-kernel, Alexander Viro, Matthew Wilcox, linux-fsdevel,
	Linux MM, Andrew Morton, Dave Chinner, Jan Kara,
	Kirill A. Shutemov, linux-nvdimm, Matthew Wilcox

On Tue, Oct 6, 2015 at 3:28 PM, Ross Zwisler
<ross.zwisler@linux.intel.com> wrote:
> This reverts commits 46c043ede4711e8d598b9d63c5616c1fedb0605e
> and 8346c416d17bf5b4ea1508662959bb62e73fd6a5.
>
> The following two locking commits in the DAX code:
>
> commit 843172978bb9 ("dax: fix race between simultaneous faults")
> commit 46c043ede471 ("mm: take i_mmap_lock in unmap_mapping_range() for DAX")
>
> introduced a number of deadlocks and other issues, and need to be
> reverted for the v4.3 kernel. The list of issues in DAX after these
> commits (some newly introduced by the commits, some preexisting) can be
> found here:
>
> https://lkml.org/lkml/2015/9/25/602
>
> This revert keeps the PMEM API changes to the zeroing code in
> __dax_pmd_fault(), which were added by this commit:
>
> commit d77e92e270ed ("dax: update PMD fault handler with PMEM API")
>
> It also keeps the code dropping mapping->i_mmap_rwsem before calling
> unmap_mapping_range(), but converts it to a read lock since that's what is
> now used by the rest of __dax_pmd_fault().  This is needed to avoid
> recursively acquiring mapping->i_mmap_rwsem, once with a read lock in
> __dax_pmd_fault() and once with a write lock in unmap_mapping_range().

I think it is safe to say that this has now morphed into a full-blown
fix and the "revert" label no longer applies.  But I'll let Andrew
weigh in on whether he wants that fixed up, or whether he will replace
these patches in -mm:

revert-mm-take-i_mmap_lock-in-unmap_mapping_range-for-dax.patch
revert-dax-fix-race-between-simultaneous-faults.patch
dax-temporarily-disable-dax-pmd-fault-path.patch

...with this new series.

However, a question below:

> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> ---
>  fs/dax.c    | 37 +++++++++++++------------------------
>  mm/memory.c | 11 +++++++++--
>  2 files changed, 22 insertions(+), 26 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index bcfb14b..f665bc9 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -569,36 +569,14 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
>         if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE)
>                 goto fallback;
>
> -       sector = bh.b_blocknr << (blkbits - 9);
> -
> -       if (buffer_unwritten(&bh) || buffer_new(&bh)) {
> -               int i;
> -
> -               length = bdev_direct_access(bh.b_bdev, sector, &kaddr, &pfn,
> -                                               bh.b_size);
> -               if (length < 0) {
> -                       result = VM_FAULT_SIGBUS;
> -                       goto out;
> -               }
> -               if ((length < PMD_SIZE) || (pfn & PG_PMD_COLOUR))
> -                       goto fallback;
> -
> -               for (i = 0; i < PTRS_PER_PMD; i++)
> -                       clear_pmem(kaddr + i * PAGE_SIZE, PAGE_SIZE);
> -               wmb_pmem();
> -               count_vm_event(PGMAJFAULT);
> -               mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
> -               result |= VM_FAULT_MAJOR;
> -       }
> -
>         /*
>          * If we allocated new storage, make sure no process has any
>          * zero pages covering this hole
>          */
>         if (buffer_new(&bh)) {
> -               i_mmap_unlock_write(mapping);
> +               i_mmap_unlock_read(mapping);
>                 unmap_mapping_range(mapping, pgoff << PAGE_SHIFT, PMD_SIZE, 0);
> -               i_mmap_lock_write(mapping);
> +               i_mmap_lock_read(mapping);
>         }
>
>         /*
> @@ -635,6 +613,7 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
>                 result = VM_FAULT_NOPAGE;
>                 spin_unlock(ptl);
>         } else {
> +               sector = bh.b_blocknr << (blkbits - 9);
>                 length = bdev_direct_access(bh.b_bdev, sector, &kaddr, &pfn,
>                                                 bh.b_size);
>                 if (length < 0) {
> @@ -644,6 +623,16 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
>                 if ((length < PMD_SIZE) || (pfn & PG_PMD_COLOUR))
>                         goto fallback;
>
> +               if (buffer_unwritten(&bh) || buffer_new(&bh)) {
> +                       int i;
> +                       for (i = 0; i < PTRS_PER_PMD; i++)
> +                               clear_pmem(kaddr + i * PAGE_SIZE, PAGE_SIZE);
> +                       wmb_pmem();
> +                       count_vm_event(PGMAJFAULT);
> +                       mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
> +                       result |= VM_FAULT_MAJOR;
> +               }
> +
>                 result |= vmf_insert_pfn_pmd(vma, address, pmd, pfn, write);
>         }
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 9cb2747..5ec066f 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2426,10 +2426,17 @@ void unmap_mapping_range(struct address_space *mapping,
>         if (details.last_index < details.first_index)
>                 details.last_index = ULONG_MAX;
>
> -       i_mmap_lock_write(mapping);
> +
> +       /*
> +        * DAX already holds i_mmap_lock to serialise file truncate vs
> +        * page fault and page fault vs page fault.
> +        */
> +       if (!IS_DAX(mapping->host))
> +               i_mmap_lock_write(mapping);
>         if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap)))
>                 unmap_mapping_range_tree(&mapping->i_mmap, &details);
> -       i_mmap_unlock_write(mapping);
> +       if (!IS_DAX(mapping->host))
> +               i_mmap_unlock_write(mapping);
>  }
>  EXPORT_SYMBOL(unmap_mapping_range);

What about cases where unmap_mapping_range() is called without an fs
lock?  For the get_user_pages() and ZONE_DEVICE implementation I'm
looking to call truncate_pagecache() from the driver shutdown path to
revoke usage of the struct pages that were allocated by
devm_memremap_pages().
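
A rough sketch of that shutdown path (hypothetical; the function name and
context here are illustrative only, not code from this thread):

	/* Hypothetical ZONE_DEVICE teardown: revoke all userspace mappings
	 * of the device inode before the pages from devm_memremap_pages()
	 * go away.  truncate_pagecache() itself calls
	 * unmap_mapping_range(), which is why the locking context this
	 * patch assumes matters here. */
	static void dax_dev_shutdown(struct inode *inode)
	{
		truncate_pagecache(inode, 0);
	}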

Likely I'm introducing a path through unmap_mapping_range() that does
not exist today, but I don't like that unmap_mapping_range() with this
change presumes a given locking context.  It's not clear to me how
this routine is safe when it only conditionally takes
i_mmap_lock_write(); at a minimum this needs documenting, and possibly
assertions if the locking assumptions are violated.
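
One way to make the assumption explicit, sketched under the assumption
that lockdep is enabled (this is not part of the series):

	/* Hypothetical hardening of unmap_mapping_range(): rather than
	 * silently skipping the lock for DAX, assert that the caller
	 * already holds it. */
	if (IS_DAX(mapping->host))
		lockdep_assert_held(&mapping->i_mmap_rwsem);
	else
		i_mmap_lock_write(mapping);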

invalidate_inode_pages2_range() seems to call unmap_mapping_range()
without the correct locking, but this was just a quick scan.


* Re: [PATCH v4 1/2] Revert "mm: take i_mmap_lock in unmap_mapping_range() for DAX"
  2015-10-07 16:19   ` Dan Williams
@ 2015-10-07 21:39     ` Ross Zwisler
  0 siblings, 0 replies; 5+ messages in thread
From: Ross Zwisler @ 2015-10-07 21:39 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ross Zwisler, linux-kernel, Alexander Viro, Matthew Wilcox,
	linux-fsdevel, Linux MM, Andrew Morton, Dave Chinner, Jan Kara,
	Kirill A. Shutemov, linux-nvdimm, Matthew Wilcox

On Wed, Oct 07, 2015 at 09:19:28AM -0700, Dan Williams wrote:
> On Tue, Oct 6, 2015 at 3:28 PM, Ross Zwisler
> <ross.zwisler@linux.intel.com> wrote:
<snip>
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 9cb2747..5ec066f 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -2426,10 +2426,17 @@ void unmap_mapping_range(struct address_space *mapping,
> >         if (details.last_index < details.first_index)
> >                 details.last_index = ULONG_MAX;
> >
> > -       i_mmap_lock_write(mapping);
> > +
> > +       /*
> > +        * DAX already holds i_mmap_lock to serialise file truncate vs
> > +        * page fault and page fault vs page fault.
> > +        */
> > +       if (!IS_DAX(mapping->host))
> > +               i_mmap_lock_write(mapping);
> >         if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap)))
> >                 unmap_mapping_range_tree(&mapping->i_mmap, &details);
> > -       i_mmap_unlock_write(mapping);
> > +       if (!IS_DAX(mapping->host))
> > +               i_mmap_unlock_write(mapping);
> >  }
> >  EXPORT_SYMBOL(unmap_mapping_range);
> 
> What about cases where unmap_mapping_range() is called without an fs
> lock?  For the get_user_pages() and ZONE_DEVICE implementation I'm
> looking to call truncate_pagecache() from the driver shutdown path to
> revoke usage of the struct page's that were allocated by
> devm_memremap_pages().
> 
> Likely I'm introducing a path through unmap_mapping_range() that does
> not exist today, but I don't like that unmap_mapping_range() with this
> change is presuming a given locking context.  It's not clear to me how
> this routine is safe when it optionally takes i_mmap_lock_write(), at
> a minimum this needs documenting, and possibly assertions if the
> locking assumptions are violated.

Yep, this is very confusing - these changes were undone by the second revert
in the series (they were done and then undone by separate patches, both of
which are getting reverted).  After the series is applied in total,
unmap_mapping_range() takes the lock unconditionally:

		/* DAX uses i_mmap_lock to serialise file truncate vs page fault */
		i_mmap_lock_write(mapping);
		if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap)))
			unmap_mapping_range_tree(&mapping->i_mmap, &details);
		i_mmap_unlock_write(mapping);
	}
	EXPORT_SYMBOL(unmap_mapping_range);

Yes, I totally agree this is confusing - I'll just bite the bullet, collapse
the two reverts together, and call it "dax locking fixes" or something.

