From: Davidlohr Bueso <dave@stgolabs.net>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: akpm@linux-foundation.org, willy@infradead.org,
	mhocko@kernel.org, mgorman@techsingularity.net,
	jglisse@redhat.com, ldufour@linux.vnet.ibm.com,
	dave@stgolabs.net, Davidlohr Bueso <dbueso@suse.de>
Subject: [PATCH 05/14] mm: remove some BUG checks wrt mmap_sem
Date: Mon, 20 May 2019 21:52:33 -0700
Message-ID: <20190521045242.24378-6-dave@stgolabs.net>
In-Reply-To: <20190521045242.24378-1-dave@stgolabs.net>

This patch is a collection of hacks that shamelessly remove mmap_sem
state checks in order to not have to teach file_operations about range
locking.

For thp and huge pagecache: by dropping the rwsem_is_locked() checks in
zap_pmd_range() and zap_pud_range() we avoid having to teach
file_operations about mmrange. For example, in xfs, iomap_dio_rw() is
called from the .read_iter file callbacks.
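
To make this concrete, below is the class of check being dropped,
annotated (sketch only, mirroring the huge_mm.h hunk further down;
comments are mine). With mmap_sem as a plain rwsem the assertion needs
nothing but the mm, whereas a range lock would additionally need the
mmrange the caller locked, which these file_operations-driven paths do
not carry:

  static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd,
                struct vm_area_struct *vma)
  {
          /*
           * "Is mmap_sem locked?" can be answered from the rwsem alone.
           * A range lock has no such global predicate: answering the
           * same question would need the caller's mmrange, threaded
           * down from e.g. .read_iter -> iomap_dio_rw().
           */
          VM_BUG_ON_VMA(!rwsem_is_locked(&vma->vm_mm->mmap_sem), vma);
          if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd))
                  return __pmd_trans_huge_lock(pmd, vma);
          return NULL;
  }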

We also avoid the mmap_sem trylock in vm_insert_page(). The rules for
this function state that mmap_sem must be acquired by the caller:

- for write if used in f_op->mmap() (by far the most common case)
- for read if used from vm_ops->fault() (with VM_MIXEDMAP)

The only exception is:
  mmap_vmcore()
    remap_vmalloc_range_partial()
      vm_insert_page()

But there is no concurrency here, thus mmap_sem is not held.
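
For reference, the trylock check in question is quoted below with
annotations (excerpt of vm_insert_page(); comments are mine). It only
works because mmap_sem is a single rwsem whose acquisition can be
probed, which is exactly what goes away with a range lock:

  if (!(vma->vm_flags & VM_MIXEDMAP)) {
          /*
           * First insertion into this vma: per the rules above the
           * caller should be f_op->mmap() holding mmap_sem for write,
           * so a read trylock is expected to fail.  If it succeeds,
           * no writer holds mmap_sem and the caller broke the
           * contract (the vmcore chain above being the tolerated
           * exception).
           */
          BUG_ON(down_read_trylock(&vma->vm_mm->mmap_sem));
          BUG_ON(vma->vm_flags & VM_PFNMAP);
          vma->vm_flags |= VM_MIXEDMAP;
  }
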
After auditing the kernel, the following drivers use the fault path
and correctly set VM_MIXEDMAP:

.fault = etnaviv_gem_fault
.fault = udl_gem_fault
.fault = tegra_bo_fault

As such, drop the reader trylock BUG_ON() for the common case. This
avoids having file_operations know about mmranges, as mmap_sem is
already held during ->mmap(), for example.
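
For illustration, the read-side pattern those drivers follow looks
roughly like the sketch below (hypothetical foo_drv driver, made-up
types and fields, not taken from any of the three above): VM_MIXEDMAP
is set up front in ->mmap() with mmap_sem held for write, so the
fault-time vm_insert_page() calls, made with mmap_sem held for read,
never reach the (now removed) trylock check at all:

  /* Hypothetical example only -- foo_drv is not an in-tree driver. */
  static vm_fault_t foo_drv_fault(struct vm_fault *vmf)
  {
          struct foo_drv_object *obj = vmf->vma->vm_private_data;
          pgoff_t pgoff = vmf->pgoff;
          int err;

          if (pgoff >= obj->num_pages)
                  return VM_FAULT_SIGBUS;

          /* vm_ops->fault: mmap_sem held for read, VM_MIXEDMAP set */
          err = vm_insert_page(vmf->vma, vmf->address, obj->pages[pgoff]);
          switch (err) {
          case 0:
                  return VM_FAULT_NOPAGE;
          case -ENOMEM:
                  return VM_FAULT_OOM;
          default:
                  return VM_FAULT_SIGBUS;
          }
  }

  static const struct vm_operations_struct foo_drv_vm_ops = {
          .fault = foo_drv_fault,
  };

  static int foo_drv_mmap(struct file *file, struct vm_area_struct *vma)
  {
          /* f_op->mmap: mmap_sem held for write */
          vma->vm_flags |= VM_MIXEDMAP;
          vma->vm_ops = &foo_drv_vm_ops;
          vma->vm_private_data = file->private_data;
          return 0;
  }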

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 include/linux/huge_mm.h | 2 --
 mm/memory.c             | 2 --
 mm/mmap.c               | 4 ++--
 mm/pagewalk.c           | 3 ---
 4 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7cd5c150c21d..a4a9cfa78d8f 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -194,7 +194,6 @@ static inline int is_swap_pmd(pmd_t pmd)
 static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd,
 		struct vm_area_struct *vma)
 {
-	VM_BUG_ON_VMA(!rwsem_is_locked(&vma->vm_mm->mmap_sem), vma);
 	if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd))
 		return __pmd_trans_huge_lock(pmd, vma);
 	else
@@ -203,7 +202,6 @@ static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd,
 static inline spinlock_t *pud_trans_huge_lock(pud_t *pud,
 		struct vm_area_struct *vma)
 {
-	VM_BUG_ON_VMA(!rwsem_is_locked(&vma->vm_mm->mmap_sem), vma);
 	if (pud_trans_huge(*pud) || pud_devmap(*pud))
 		return __pud_trans_huge_lock(pud, vma);
 	else
diff --git a/mm/memory.c b/mm/memory.c
index 9516c95108a1..73971f859035 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1212,7 +1212,6 @@ static inline unsigned long zap_pud_range(struct mmu_gather *tlb,
 		next = pud_addr_end(addr, end);
 		if (pud_trans_huge(*pud) || pud_devmap(*pud)) {
 			if (next - addr != HPAGE_PUD_SIZE) {
-				VM_BUG_ON_VMA(!rwsem_is_locked(&tlb->mm->mmap_sem), vma);
 				split_huge_pud(vma, pud, addr);
 			} else if (zap_huge_pud(tlb, vma, pud, addr))
 				goto next;
@@ -1519,7 +1518,6 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
 	if (!page_count(page))
 		return -EINVAL;
 	if (!(vma->vm_flags & VM_MIXEDMAP)) {
-		BUG_ON(down_read_trylock(&vma->vm_mm->mmap_sem));
 		BUG_ON(vma->vm_flags & VM_PFNMAP);
 		vma->vm_flags |= VM_MIXEDMAP;
 	}
diff --git a/mm/mmap.c b/mm/mmap.c
index af228ae3508d..a03ded49f9eb 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3466,7 +3466,7 @@ static void vm_lock_anon_vma(struct mm_struct *mm, struct anon_vma *anon_vma)
 		 * The LSB of head.next can't change from under us
 		 * because we hold the mm_all_locks_mutex.
 		 */
-		down_write_nest_lock(&anon_vma->root->rwsem, &mm->mmap_sem);
+		down_write(&anon_vma->root->rwsem);
 		/*
 		 * We can safely modify head.next after taking the
 		 * anon_vma->root->rwsem. If some other vma in this mm shares
@@ -3496,7 +3496,7 @@ static void vm_lock_mapping(struct mm_struct *mm, struct address_space *mapping)
 		 */
 		if (test_and_set_bit(AS_MM_ALL_LOCKS, &mapping->flags))
 			BUG();
-		down_write_nest_lock(&mapping->i_mmap_rwsem, &mm->mmap_sem);
+		down_write(&mapping->i_mmap_rwsem);
 	}
 }
 
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index c3084ff2569d..6246acf17054 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -303,8 +303,6 @@ int walk_page_range(unsigned long start, unsigned long end,
 	if (!walk->mm)
 		return -EINVAL;
 
-	VM_BUG_ON_MM(!rwsem_is_locked(&walk->mm->mmap_sem), walk->mm);
-
 	vma = find_vma(walk->mm, start);
 	do {
 		if (!vma) { /* after the last vma */
@@ -346,7 +344,6 @@ int walk_page_vma(struct vm_area_struct *vma, struct mm_walk *walk)
 	if (!walk->mm)
 		return -EINVAL;
 
-	VM_BUG_ON(!rwsem_is_locked(&walk->mm->mmap_sem));
 	VM_BUG_ON(!vma);
 	walk->vma = vma;
 	err = walk_page_test(vma->vm_start, vma->vm_end, walk);
-- 
2.16.4

