linux-mm.kvack.org archive mirror
* [RFC PATCH 00/24] Fine grained MM locking
@ 2020-02-24 20:30 Michel Lespinasse
  2020-02-24 20:30 ` [RFC PATCH 01/24] MM locking API: initial implementation as rwsem wrappers Michel Lespinasse
                   ` (24 more replies)
  0 siblings, 25 replies; 28+ messages in thread
From: Michel Lespinasse @ 2020-02-24 20:30 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Morton, Laurent Dufour, Vlastimil Babka,
	Matthew Wilcox, Liam R . Howlett, Jerome Glisse, Davidlohr Bueso,
	David Rientjes
  Cc: linux-mm, Michel Lespinasse

Hi,

This is the first version of my work towards fine grained MM locking.
This is still early work - I am happy with my page fault changes,
but want to expand on the mmap/munmap side of things before I send the
next version. I have previously shared this with some of the copied folks
(for those who received that, there are no additional changes in this
public resend). Please expect a v2 within a few weeks, with further
changes for fine grained range locking in the mmap and munmap paths.

This work originated in discussions at LSF/MM 2019; it is intended to
address the latency issues that are caused by false conflicts between
threads working on separate parts of their address space.
The priorities are to keep things as simple as possible,
and to allow for progressive conversion of the code base to
finer grained MM locks.

The general approach is to replace the mmap_sem rwsem with a range lock.
Initially, all lock/unlock sites are automatically converted to lock the
entire address space through a new API. Then, the API is extended to
support range locking. Locking sites can then be progressively converted
to use range locking, while unconverted sites keep working with no code
changes.
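
For example, a typical read-side call site goes from the current rwsem
calls to the new coarse API (this is exactly what the coccinelle
conversion in patch 2 produces), and can later be narrowed to a range.
The range calls below are only an illustration - the exact names and
signatures are the ones introduced in patches 4-6:

	/* Today: coarse rwsem locking of the whole address space. */
	down_read(&mm->mmap_sem);
	vma = find_vma(mm, address);
	/* ... fault handling ... */
	up_read(&mm->mmap_sem);

	/* After patch 2: same coarse locking, through the new API. */
	mm_read_lock(mm);
	vma = find_vma(mm, address);
	/* ... fault handling ... */
	mm_read_unlock(mm);

	/* Eventually: lock only the range being operated on
	 * (illustrative names; see patches 4-6 for the real API). */
	mm_read_range_lock(mm, &range);
	/* ... fault handling within the locked range ... */
	mm_read_range_unlock(mm, &range);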

When using a range lock (as opposed to a coarse lock), the following
rules apply:
- Some structures (notably the vma rbtree and associated statistics)
  are per-mm. They need to be locked separately using a new mm_vma_lock.
  The entire point of this patch set is to reduce false sharing latencies,
  so the mm_vma_lock must only be held for short times. We expect to do
  O(log N) operations while holding the lock (for example, walking or
  updating the vma rbtree), but no O(N) operations (such as iterating over
  all vmas within a range or over all mapped pages within a range).
- Code holding the mm_vma_lock should only update vma attributes for the
  range it has a write lock for. However, range locks only protect the
  vma's attributes, not the vmas themselves - vmas can still be split or
  merged with their neighbors if they have compatible attributes.
- Code holding a range lock but not the mm_vma_lock must be prepared for
  the vmas at both ends of the locked range to be merged with their
  neighbors outside of the locked range. The easiest way to handle that is
  to copy the vma of record into a pseudo-vma before releasing the
  mm_vma_lock, as sketched after this list (this is a bit kludgy and I
  would prefer to copy only the necessary VMA attributes, but using a
  pseudo-vma makes it easier to maintain this patchset out of mainline
  for the moment).
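
To make these rules more concrete, a fault path is expected to look
roughly like the sketch below. Apart from mm_vma_lock itself, the helper
names here are made up for illustration and do not necessarily match the
patches:

	/* Lock only the range covering the faulting address. */
	mm_read_range_lock(mm, &range);

	/* Short, O(log N) critical section under the per-mm lock. */
	mm_vma_lock(mm);
	vma = find_vma(mm, address);	/* vma rbtree walk */
	pvma = *vma;			/* snapshot into a pseudo-vma */
	mm_vma_unlock(mm);

	/*
	 * Operate on the snapshot: the vma of record may still get
	 * merged or split, but its attributes within the locked range
	 * cannot change under us.
	 */
	handle_fault(&pvma, address);

	mm_read_range_unlock(mm, &range);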

Call sites that take a range lock usually take the mm_vma_lock
immediately afterwards - it would probably be more efficient to collapse
mm_vma_lock with the mutex that protects the range lock
structures. This isn't done yet, as I wanted to keep the initial
implementation simple.

In the future I would also like to remove the various workarounds we have
been using to limit mmap_sem hold times (i.e. FAULT_FLAG_ALLOW_RETRY,
vm_populate and munmap downgrading to a read lock, ...), which shouldn't
be necessary once the lock only covers the memory ranges affected by each
operation.


The included changes apply on top of upstream kernel v5.5.
Please apply with git am -p0 - I'm not sure why my git format-patch
setup requires that.


Commits 1 to 6 implement a range locking API:
- 1 implements coarse locking as wrappers around rwsem;
- 2 converts most mmap_sem locking sites to use the new coarse locking API
  (using coccinelle to automate the conversion);
- 3 converts remaining mmap_sem locking sites which were missed by coccinelle;
- 4 extends the API to support range locking. The initial implementation
  still uses coarse locking (ignoring the range); but it validates that the
  callers use matching ranges in lock and unlock calls;
- 5 prepares callers to allow for sleeping during unlock;
- 6 actually implements the range locking functions.

Commits 7 to 12 allow the x86 fault handler to specify a range
that may be released while handling the fault:
- 7 adds a range field to struct vm_fault (sketched after this list);
- 8 makes handle_mm_fault() populate that field;
- 9 and 10 honor it when dropping mmap_sem during fault handling;
- 11 is a cleanup to the x86 fault handler to prepare for 12;
- 12 changes the x86 fault handler to use an explicit lock range.
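
Conceptually, patches 7 and 8 just thread the held lock range through the
fault path, so that code which needs to drop the MM lock mid-fault (swap,
userfaultfd, and later filemap) releases exactly what was taken. Something
along these lines, with the exact field type being whatever patch 7
defines:

	struct vm_fault {
		struct vm_area_struct *vma;	/* target vma */
		unsigned long address;		/* faulting address */
		/* ... existing fields ... */
		struct mm_lock_range *range;	/* MM lock range held across
						 * the fault (sketch) */
	};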

Commits 13 to 15 prepare for operating on a pseudo-vma during faults:
- 13 adds a prepare_mm_fault() function which may update the vma of record
  (specifically, allocate an anon_vma) before creating the pseudo-vma;
- 14 disables swap vma readahead as its implementation keeps stats in the vma;
- 15 changes the x86 fault handler to use pseudo-vmas when handling anon vmas.

Commits 16 and 17 implement range locking in x86 anonymous vma faults:
- Commit 16 adds the vma locking API used to manipulate vmas while
  holding a fine grained range lock;
- Commit 17 converts the x86 fault handler to use a pmd sized range lock
  when operating on anon vmas (see the sketch after this list).
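
The "pmd sized" range in commit 17 simply covers the pmd-aligned region
(2MB with 4K pages on x86-64) around the faulting address, so that
concurrent faults landing in different pmds never contend. As a sketch
(the range-defining helper is hypothetical):

	unsigned long start = address & PMD_MASK;

	mm_range_init(&range, start, start + PMD_SIZE);	/* hypothetical */
	mm_read_range_lock(mm, &range);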

Commits 18 to 20 extend the above to also work on filemap based files:
- Commit 18 makes sure we release the correct range when dropping mmap_sem
  during filemap file access;
- Commit 19 tags vm_operations that support range locking (see the sketch
  after this list);
- Commit 20 makes the x86 fault handler use fine grained ranges when
  faulting on supported file mappings.
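
The vm_operations annotation in commit 19 is just a capability flag that a
filesystem opts into once its fault path is known to be safe under a range
lock; for ext4 that would look something like this (the flag name is
illustrative, not necessarily what the patch uses):

	static const struct vm_operations_struct ext4_file_vm_ops = {
		.fault		= ext4_filemap_fault,
		.map_pages	= filemap_map_pages,
		.page_mkwrite	= ext4_page_mkwrite,
		.allow_range_locking = true,	/* illustrative flag name */
	};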

Commits 21 to 24 implement range locking for the most basic mmap() case:
- 21 adds a locked argument to do_mmap();
- 22 makes do_mmap() acquire the mmap_sem itself when locked is false
  (see the sketch after this list);
- 23 converts some easy call sites to pass locked=false;
- 24 changes do_mmap() to acquire a fine grained lock in the easiest case
  (anonymous mapping, known address, no prior existing mapping).
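
The locked argument works roughly as follows (a sketch, not the literal
diff):

	unsigned long do_mmap(struct file *file, unsigned long addr,
			      unsigned long len, /* ..., */ bool locked)
	{
		if (!locked)
			mm_write_lock(current->mm);
		/* ... existing mapping work ... */
		if (!locked)
			mm_write_unlock(current->mm);
		return addr;
	}

Callers that already hold mmap_sem keep passing locked=true and see no
behavior change; converted callers such as vm_mmap_pgoff() pass
locked=false, which is what lets patch 24 substitute a fine grained lock
for the coarse write lock in the easy anonymous mapping case.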


Michel Lespinasse (24):
  MM locking API: initial implementation as rwsem wrappers
  MM locking API: use coccinelle to convert mmap_sem rwsem call sites
  MM locking API: manual conversion of mmap_sem call sites missed by
    coccinelle
  MM locking API: add range arguments
  MM locking API: allow for sleeping during unlock
  MM locking API: implement fine grained range locks
  mm/memory: add range field to struct vm_fault
  mm/memory: allow specifying MM lock range to handle_mm_fault()
  do_swap_page: use the vmf->range field when dropping mmap_sem
  handle_userfault: use the vmf->range field when dropping mmap_sem
  x86 fault handler: merge bad_area() functions
  x86 fault handler: use an explicit MM lock range
  mm/memory: add prepare_mm_fault() function
  mm/swap_state: disable swap vma readahead
  x86 fault handler: use a pseudo-vma when operating on anonymous vmas.
  MM locking API: add vma locking API
  x86 fault handler: implement range locking
  shared file mappings: use the vmf->range field when dropping mmap_sem
  mm: add field to annotate vm_operations that support range locking
  x86 fault handler: extend range locking to supported file vmas
  do_mmap: add locked argument
  do_mmap: implement locked argument
  do_mmap: use locked=false in vm_mmap_pgoff() and aio_setup_ring()
  do_mmap: implement easiest cases of fine grained locking

 arch/alpha/kernel/traps.c                     |   4 +-
 arch/alpha/mm/fault.c                         |  10 +-
 arch/arc/kernel/process.c                     |   4 +-
 arch/arc/kernel/troubleshoot.c                |   4 +-
 arch/arc/mm/fault.c                           |   4 +-
 arch/arm/kernel/process.c                     |   4 +-
 arch/arm/kernel/swp_emulate.c                 |   4 +-
 arch/arm/lib/uaccess_with_memcpy.c            |  16 +-
 arch/arm/mm/fault.c                           |   6 +-
 arch/arm64/kernel/traps.c                     |   4 +-
 arch/arm64/kernel/vdso.c                      |   8 +-
 arch/arm64/mm/fault.c                         |   8 +-
 arch/csky/kernel/vdso.c                       |   4 +-
 arch/csky/mm/fault.c                          |   8 +-
 arch/hexagon/kernel/vdso.c                    |   4 +-
 arch/hexagon/mm/vm_fault.c                    |   8 +-
 arch/ia64/kernel/perfmon.c                    |   8 +-
 arch/ia64/mm/fault.c                          |   8 +-
 arch/ia64/mm/init.c                           |  12 +-
 arch/m68k/kernel/sys_m68k.c                   |  14 +-
 arch/m68k/mm/fault.c                          |   8 +-
 arch/microblaze/mm/fault.c                    |  12 +-
 arch/mips/kernel/traps.c                      |   4 +-
 arch/mips/kernel/vdso.c                       |   4 +-
 arch/mips/mm/fault.c                          |  10 +-
 arch/nds32/kernel/vdso.c                      |   6 +-
 arch/nds32/mm/fault.c                         |  12 +-
 arch/nios2/mm/fault.c                         |  12 +-
 arch/nios2/mm/init.c                          |   4 +-
 arch/openrisc/mm/fault.c                      |  10 +-
 arch/parisc/kernel/traps.c                    |   6 +-
 arch/parisc/mm/fault.c                        |   8 +-
 arch/powerpc/kernel/vdso.c                    |   6 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c           |   4 +-
 arch/powerpc/kvm/book3s_hv.c                  |   6 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c            |  12 +-
 arch/powerpc/kvm/e500_mmu_host.c              |   4 +-
 arch/powerpc/mm/book3s64/iommu_api.c          |   4 +-
 arch/powerpc/mm/book3s64/subpage_prot.c       |  12 +-
 arch/powerpc/mm/copro_fault.c                 |   4 +-
 arch/powerpc/mm/fault.c                       |  12 +-
 arch/powerpc/oprofile/cell/spu_task_sync.c    |   6 +-
 arch/powerpc/platforms/cell/spufs/file.c      |   4 +-
 arch/riscv/kernel/vdso.c                      |   4 +-
 arch/riscv/mm/fault.c                         |  10 +-
 arch/s390/kernel/vdso.c                       |   4 +-
 arch/s390/kvm/gaccess.c                       |   4 +-
 arch/s390/kvm/kvm-s390.c                      |  24 +-
 arch/s390/kvm/priv.c                          |  32 +-
 arch/s390/mm/fault.c                          |   6 +-
 arch/s390/mm/gmap.c                           |  40 +-
 arch/s390/pci/pci_mmio.c                      |   4 +-
 arch/sh/kernel/sys_sh.c                       |   6 +-
 arch/sh/kernel/vsyscall/vsyscall.c            |   4 +-
 arch/sh/mm/fault.c                            |  14 +-
 arch/sparc/mm/fault_32.c                      |  18 +-
 arch/sparc/mm/fault_64.c                      |  12 +-
 arch/sparc/vdso/vma.c                         |   4 +-
 arch/um/include/asm/mmu_context.h             |   6 +-
 arch/um/kernel/tlb.c                          |   2 +-
 arch/um/kernel/trap.c                         |   6 +-
 arch/unicore32/mm/fault.c                     |   6 +-
 arch/x86/entry/vdso/vma.c                     |  10 +-
 arch/x86/kernel/tboot.c                       |   2 +-
 arch/x86/kernel/vm86_32.c                     |   4 +-
 arch/x86/kvm/mmu/paging_tmpl.h                |   8 +-
 arch/x86/mm/debug_pagetables.c                |   8 +-
 arch/x86/mm/fault.c                           | 110 ++-
 arch/x86/mm/mpx.c                             |  15 +-
 arch/x86/um/vdso/vma.c                        |   4 +-
 arch/xtensa/mm/fault.c                        |  10 +-
 drivers/android/binder_alloc.c                |  10 +-
 drivers/firmware/efi/efi.c                    |   2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |  10 +-
 drivers/gpu/drm/amd/amdkfd/kfd_events.c       |   4 +-
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |   4 +-
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   8 +-
 drivers/gpu/drm/nouveau/nouveau_svm.c         |  20 +-
 drivers/gpu/drm/radeon/radeon_cs.c            |   4 +-
 drivers/gpu/drm/radeon/radeon_gem.c           |   6 +-
 drivers/gpu/drm/ttm/ttm_bo_vm.c               |   4 +-
 drivers/infiniband/core/umem.c                |   6 +-
 drivers/infiniband/core/umem_odp.c            |  10 +-
 drivers/infiniband/core/uverbs_main.c         |   4 +-
 drivers/infiniband/hw/mlx4/mr.c               |   4 +-
 drivers/infiniband/hw/qib/qib_user_pages.c    |   6 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c      |   4 +-
 drivers/infiniband/sw/siw/siw_mem.c           |   4 +-
 drivers/iommu/amd_iommu_v2.c                  |   4 +-
 drivers/iommu/intel-svm.c                     |   4 +-
 drivers/media/v4l2-core/videobuf-core.c       |   4 +-
 drivers/media/v4l2-core/videobuf-dma-contig.c |   4 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c     |   4 +-
 drivers/misc/cxl/cxllib.c                     |   4 +-
 drivers/misc/cxl/fault.c                      |   4 +-
 drivers/misc/sgi-gru/grufault.c               |  16 +-
 drivers/misc/sgi-gru/grufile.c                |   4 +-
 drivers/oprofile/buffer_sync.c                |  10 +-
 drivers/staging/kpc2000/kpc_dma/fileops.c     |   4 +-
 drivers/tee/optee/call.c                      |   4 +-
 drivers/vfio/vfio_iommu_type1.c               |  12 +-
 drivers/xen/gntdev.c                          |   4 +-
 drivers/xen/privcmd.c                         |  14 +-
 fs/aio.c                                      |  16 +-
 fs/coredump.c                                 |   4 +-
 fs/exec.c                                     |  16 +-
 fs/ext4/file.c                                |   1 +
 fs/io_uring.c                                 |   4 +-
 fs/proc/base.c                                |  18 +-
 fs/proc/task_mmu.c                            |  28 +-
 fs/proc/task_nommu.c                          |  18 +-
 fs/userfaultfd.c                              |  28 +-
 include/linux/hugetlb.h                       |   5 +-
 include/linux/mm.h                            |  56 +-
 include/linux/mm_lock.h                       | 285 ++++++++
 include/linux/mm_types.h                      |  22 +
 include/linux/mm_types_task.h                 |  21 +
 include/linux/mmu_notifier.h                  |   5 +-
 include/linux/pagemap.h                       |   7 +-
 include/linux/sched.h                         |   2 +
 init/init_task.c                              |   1 +
 ipc/shm.c                                     |  11 +-
 kernel/acct.c                                 |   4 +-
 kernel/bpf/stackmap.c                         |  32 +-
 kernel/events/core.c                          |   4 +-
 kernel/events/uprobes.c                       |  16 +-
 kernel/exit.c                                 |   8 +-
 kernel/fork.c                                 |  17 +-
 kernel/futex.c                                |   4 +-
 kernel/sched/fair.c                           |   4 +-
 kernel/sys.c                                  |  18 +-
 kernel/trace/trace_output.c                   |   4 +-
 mm/Kconfig                                    |  25 +
 mm/Makefile                                   |   2 +
 mm/filemap.c                                  |  10 +-
 mm/frame_vector.c                             |   4 +-
 mm/gup.c                                      |  20 +-
 mm/hugetlb.c                                  |  13 +-
 mm/init-mm.c                                  |   3 +-
 mm/internal.h                                 |   2 +-
 mm/khugepaged.c                               |  37 +-
 mm/ksm.c                                      |  34 +-
 mm/madvise.c                                  |  18 +-
 mm/memcontrol.c                               |   8 +-
 mm/memory.c                                   |  55 +-
 mm/mempolicy.c                                |  22 +-
 mm/migrate.c                                  |   8 +-
 mm/mincore.c                                  |   4 +-
 mm/mlock.c                                    |  16 +-
 mm/mm_lock_range.c                            | 691 ++++++++++++++++++
 mm/mm_lock_rwsem_checked.c                    | 134 ++++
 mm/mmap.c                                     | 170 +++--
 mm/mmu_notifier.c                             |   4 +-
 mm/mprotect.c                                 |  12 +-
 mm/mremap.c                                   |   6 +-
 mm/msync.c                                    |   8 +-
 mm/nommu.c                                    |  36 +-
 mm/oom_kill.c                                 |   4 +-
 mm/process_vm_access.c                        |   4 +-
 mm/shmem.c                                    |   1 +
 mm/swap_state.c                               |   6 +
 mm/swapfile.c                                 |   4 +-
 mm/userfaultfd.c                              |  14 +-
 mm/util.c                                     |  14 +-
 net/ipv4/tcp.c                                |   4 +-
 net/xdp/xdp_umem.c                            |   4 +-
 virt/kvm/arm/mmu.c                            |  14 +-
 virt/kvm/async_pf.c                           |   4 +-
 virt/kvm/kvm_main.c                           |   8 +-
 170 files changed, 2183 insertions(+), 798 deletions(-)
 create mode 100644 include/linux/mm_lock.h
 create mode 100644 mm/mm_lock_range.c
 create mode 100644 mm/mm_lock_rwsem_checked.c

-- 
2.25.0.341.g760bfbb309-goog




* [RFC PATCH 01/24] MM locking API: initial implementation as rwsem wrappers
  2020-02-24 20:30 [RFC PATCH 00/24] Fine grained MM locking Michel Lespinasse
@ 2020-02-24 20:30 ` Michel Lespinasse
  2020-02-24 20:30 ` [RFC PATCH 02/24] MM locking API: use coccinelle to convert mmap_sem rwsem call sites Michel Lespinasse
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Michel Lespinasse @ 2020-02-24 20:30 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Morton, Laurent Dufour, Vlastimil Babka,
	Matthew Wilcox, Liam R . Howlett, Jerome Glisse, Davidlohr Bueso,
	David Rientjes
  Cc: linux-mm, Michel Lespinasse

This change wraps the existing mmap_sem related rwsem calls into a new
MM locking API. This is in preparation for extending that API to support
locking fine grained memory ranges.

Signed-off-by: Michel Lespinasse <walken@google.com>
---
 include/linux/mm.h      |  1 +
 include/linux/mm_lock.h | 59 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+)
 create mode 100644 include/linux/mm_lock.h

diff --git include/linux/mm.h include/linux/mm.h
index cfaa8feecfe8..052f423d7f67 100644
--- include/linux/mm.h
+++ include/linux/mm.h
@@ -15,6 +15,7 @@
 #include <linux/atomic.h>
 #include <linux/debug_locks.h>
 #include <linux/mm_types.h>
+#include <linux/mm_lock.h>
 #include <linux/range.h>
 #include <linux/pfn.h>
 #include <linux/percpu-refcount.h>
diff --git include/linux/mm_lock.h include/linux/mm_lock.h
new file mode 100644
index 000000000000..b5f134285e53
--- /dev/null
+++ include/linux/mm_lock.h
@@ -0,0 +1,59 @@
+#ifndef _LINUX_MM_LOCK_H
+#define _LINUX_MM_LOCK_H
+
+static inline void mm_init_lock(struct mm_struct *mm)
+{
+	init_rwsem(&mm->mmap_sem);
+}
+
+static inline void mm_write_lock(struct mm_struct *mm)
+{
+	down_write(&mm->mmap_sem);
+}
+
+static inline int mm_write_lock_killable(struct mm_struct *mm)
+{
+	return down_write_killable(&mm->mmap_sem);
+}
+
+static inline bool mm_write_trylock(struct mm_struct *mm)
+{
+	return down_write_trylock(&mm->mmap_sem) != 0;
+}
+
+static inline void mm_write_unlock(struct mm_struct *mm)
+{
+	up_write(&mm->mmap_sem);
+}
+
+static inline void mm_downgrade_write_lock(struct mm_struct *mm)
+{
+	downgrade_write(&mm->mmap_sem);
+}
+
+static inline void mm_read_lock(struct mm_struct *mm)
+{
+	down_read(&mm->mmap_sem);
+}
+
+static inline int mm_read_lock_killable(struct mm_struct *mm)
+{
+	return down_read_killable(&mm->mmap_sem);
+}
+
+static inline bool mm_read_trylock(struct mm_struct *mm)
+{
+	return down_read_trylock(&mm->mmap_sem) != 0;
+}
+
+static inline void mm_read_unlock(struct mm_struct *mm)
+{
+	up_read(&mm->mmap_sem);
+}
+
+static inline bool mm_is_locked(struct mm_struct *mm)
+{
+	return rwsem_is_locked(&mm->mmap_sem) != 0;
+}
+
+#endif /* _LINUX_MM_LOCK_H */
-- 
2.25.0.341.g760bfbb309-goog




* [RFC PATCH 02/24] MM locking API: use coccinelle to convert mmap_sem rwsem call sites
  2020-02-24 20:30 [RFC PATCH 00/24] Fine grained MM locking Michel Lespinasse
  2020-02-24 20:30 ` [RFC PATCH 01/24] MM locking API: initial implementation as rwsem wrappers Michel Lespinasse
@ 2020-02-24 20:30 ` Michel Lespinasse
  2020-02-24 20:30 ` [RFC PATCH 03/24] MM locking API: manual conversion of mmap_sem call sites missed by coccinelle Michel Lespinasse
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Michel Lespinasse @ 2020-02-24 20:30 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Morton, Laurent Dufour, Vlastimil Babka,
	Matthew Wilcox, Liam R . Howlett, Jerome Glisse, Davidlohr Bueso,
	David Rientjes
  Cc: linux-mm, Michel Lespinasse

This change converts the existing mmap_sem rwsem calls to use the new
MM locking API instead.

The change is generated using coccinelle with the following rules:

// spatch --sp-file foo.cocci --in-place --include-headers --dir .

@@
expression mm;
@@
-init_rwsem(&mm->mmap_sem)
+mm_init_lock(mm)

@@
expression mm;
@@
-down_write(&mm->mmap_sem)
+mm_write_lock(mm)

@@
expression mm;
@@
-down_write_killable(&mm->mmap_sem)
+mm_write_lock_killable(mm)

@@
expression mm;
@@
-down_write_trylock(&mm->mmap_sem)
+mm_write_trylock(mm)

@@
expression mm;
@@
-up_write(&mm->mmap_sem)
+mm_write_unlock(mm)

@@
expression mm;
@@
-downgrade_write(&mm->mmap_sem)
+mm_downgrade_write_lock(mm)

@@
expression mm;
@@
-down_read(&mm->mmap_sem)
+mm_read_lock(mm)

@@
expression mm;
@@
-down_read_killable(&mm->mmap_sem)
+mm_read_lock_killable(mm)

@@
expression mm;
@@
-down_read_trylock(&mm->mmap_sem)
+mm_read_trylock(mm)

@@
expression mm;
@@
-up_read(&mm->mmap_sem)
+mm_read_unlock(mm)

@@
expression mm;
@@
-rwsem_is_locked(&mm->mmap_sem)
+mm_is_locked(mm)

Signed-off-by: Michel Lespinasse <walken@google.com>
---
 arch/alpha/kernel/traps.c                     |  4 +-
 arch/alpha/mm/fault.c                         | 10 ++---
 arch/arc/kernel/process.c                     |  4 +-
 arch/arc/kernel/troubleshoot.c                |  4 +-
 arch/arc/mm/fault.c                           |  4 +-
 arch/arm/kernel/process.c                     |  4 +-
 arch/arm/kernel/swp_emulate.c                 |  4 +-
 arch/arm/lib/uaccess_with_memcpy.c            | 16 ++++----
 arch/arm/mm/fault.c                           |  6 +--
 arch/arm64/kernel/traps.c                     |  4 +-
 arch/arm64/kernel/vdso.c                      |  8 ++--
 arch/arm64/mm/fault.c                         |  8 ++--
 arch/csky/kernel/vdso.c                       |  4 +-
 arch/csky/mm/fault.c                          |  8 ++--
 arch/hexagon/kernel/vdso.c                    |  4 +-
 arch/hexagon/mm/vm_fault.c                    |  8 ++--
 arch/ia64/kernel/perfmon.c                    |  8 ++--
 arch/ia64/mm/fault.c                          |  8 ++--
 arch/ia64/mm/init.c                           | 12 +++---
 arch/m68k/kernel/sys_m68k.c                   | 14 +++----
 arch/m68k/mm/fault.c                          |  8 ++--
 arch/microblaze/mm/fault.c                    | 12 +++---
 arch/mips/kernel/traps.c                      |  4 +-
 arch/mips/kernel/vdso.c                       |  4 +-
 arch/nds32/kernel/vdso.c                      |  6 +--
 arch/nds32/mm/fault.c                         | 12 +++---
 arch/nios2/mm/fault.c                         | 12 +++---
 arch/nios2/mm/init.c                          |  4 +-
 arch/openrisc/mm/fault.c                      | 10 ++---
 arch/parisc/kernel/traps.c                    |  6 +--
 arch/parisc/mm/fault.c                        |  8 ++--
 arch/powerpc/kernel/vdso.c                    |  6 +--
 arch/powerpc/kvm/book3s_64_mmu_hv.c           |  4 +-
 arch/powerpc/kvm/book3s_hv.c                  |  6 +--
 arch/powerpc/kvm/book3s_hv_uvmem.c            | 12 +++---
 arch/powerpc/kvm/e500_mmu_host.c              |  4 +-
 arch/powerpc/mm/book3s64/iommu_api.c          |  4 +-
 arch/powerpc/mm/book3s64/subpage_prot.c       | 12 +++---
 arch/powerpc/mm/copro_fault.c                 |  4 +-
 arch/powerpc/mm/fault.c                       | 12 +++---
 arch/powerpc/oprofile/cell/spu_task_sync.c    |  6 +--
 arch/powerpc/platforms/cell/spufs/file.c      |  4 +-
 arch/riscv/kernel/vdso.c                      |  4 +-
 arch/riscv/mm/fault.c                         | 10 ++---
 arch/s390/kernel/vdso.c                       |  4 +-
 arch/s390/kvm/gaccess.c                       |  4 +-
 arch/s390/kvm/kvm-s390.c                      | 24 +++++------
 arch/s390/kvm/priv.c                          | 32 +++++++--------
 arch/s390/mm/fault.c                          |  6 +--
 arch/s390/mm/gmap.c                           | 40 +++++++++----------
 arch/s390/pci/pci_mmio.c                      |  4 +-
 arch/sh/kernel/sys_sh.c                       |  6 +--
 arch/sh/kernel/vsyscall/vsyscall.c            |  4 +-
 arch/sh/mm/fault.c                            | 14 +++----
 arch/sparc/mm/fault_32.c                      | 18 ++++-----
 arch/sparc/mm/fault_64.c                      | 12 +++---
 arch/sparc/vdso/vma.c                         |  4 +-
 arch/um/include/asm/mmu_context.h             |  2 +-
 arch/um/kernel/tlb.c                          |  2 +-
 arch/um/kernel/trap.c                         |  6 +--
 arch/unicore32/mm/fault.c                     |  6 +--
 arch/x86/entry/vdso/vma.c                     | 10 ++---
 arch/x86/kernel/vm86_32.c                     |  4 +-
 arch/x86/mm/debug_pagetables.c                |  8 ++--
 arch/x86/mm/fault.c                           |  8 ++--
 arch/x86/mm/mpx.c                             | 12 +++---
 arch/x86/um/vdso/vma.c                        |  4 +-
 arch/xtensa/mm/fault.c                        | 10 ++---
 drivers/android/binder_alloc.c                |  6 +--
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       | 10 ++---
 drivers/gpu/drm/amd/amdkfd/kfd_events.c       |  4 +-
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  4 +-
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |  8 ++--
 drivers/gpu/drm/nouveau/nouveau_svm.c         | 20 +++++-----
 drivers/gpu/drm/radeon/radeon_cs.c            |  4 +-
 drivers/gpu/drm/radeon/radeon_gem.c           |  6 +--
 drivers/gpu/drm/ttm/ttm_bo_vm.c               |  4 +-
 drivers/infiniband/core/umem.c                |  6 +--
 drivers/infiniband/core/umem_odp.c            | 10 ++---
 drivers/infiniband/core/uverbs_main.c         |  4 +-
 drivers/infiniband/hw/mlx4/mr.c               |  4 +-
 drivers/infiniband/hw/qib/qib_user_pages.c    |  6 +--
 drivers/infiniband/hw/usnic/usnic_uiom.c      |  4 +-
 drivers/infiniband/sw/siw/siw_mem.c           |  4 +-
 drivers/iommu/amd_iommu_v2.c                  |  4 +-
 drivers/iommu/intel-svm.c                     |  4 +-
 drivers/media/v4l2-core/videobuf-core.c       |  4 +-
 drivers/media/v4l2-core/videobuf-dma-contig.c |  4 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c     |  4 +-
 drivers/misc/cxl/cxllib.c                     |  4 +-
 drivers/misc/cxl/fault.c                      |  4 +-
 drivers/misc/sgi-gru/grufault.c               | 16 ++++----
 drivers/misc/sgi-gru/grufile.c                |  4 +-
 drivers/oprofile/buffer_sync.c                | 10 ++---
 drivers/staging/kpc2000/kpc_dma/fileops.c     |  4 +-
 drivers/tee/optee/call.c                      |  4 +-
 drivers/vfio/vfio_iommu_type1.c               | 12 +++---
 drivers/xen/gntdev.c                          |  4 +-
 drivers/xen/privcmd.c                         | 14 +++----
 fs/aio.c                                      |  4 +-
 fs/coredump.c                                 |  4 +-
 fs/exec.c                                     | 16 ++++----
 fs/io_uring.c                                 |  4 +-
 fs/proc/base.c                                | 12 +++---
 fs/proc/task_mmu.c                            | 28 ++++++-------
 fs/proc/task_nommu.c                          | 18 ++++-----
 fs/userfaultfd.c                              | 28 ++++++-------
 include/linux/mmu_notifier.h                  |  5 ++-
 ipc/shm.c                                     |  8 ++--
 kernel/acct.c                                 |  4 +-
 kernel/bpf/stackmap.c                         |  4 +-
 kernel/events/core.c                          |  4 +-
 kernel/events/uprobes.c                       | 16 ++++----
 kernel/exit.c                                 |  8 ++--
 kernel/fork.c                                 | 12 +++---
 kernel/futex.c                                |  4 +-
 kernel/sched/fair.c                           |  4 +-
 kernel/sys.c                                  | 18 ++++-----
 kernel/trace/trace_output.c                   |  4 +-
 mm/filemap.c                                  |  6 +--
 mm/frame_vector.c                             |  4 +-
 mm/gup.c                                      | 20 +++++-----
 mm/internal.h                                 |  2 +-
 mm/khugepaged.c                               | 36 ++++++++---------
 mm/ksm.c                                      | 34 ++++++++--------
 mm/madvise.c                                  | 18 ++++-----
 mm/memcontrol.c                               |  8 ++--
 mm/memory.c                                   | 12 +++---
 mm/mempolicy.c                                | 22 +++++-----
 mm/migrate.c                                  |  8 ++--
 mm/mincore.c                                  |  4 +-
 mm/mlock.c                                    | 16 ++++----
 mm/mmap.c                                     | 32 +++++++--------
 mm/mmu_notifier.c                             |  4 +-
 mm/mprotect.c                                 | 12 +++---
 mm/mremap.c                                   |  6 +--
 mm/msync.c                                    |  8 ++--
 mm/nommu.c                                    | 16 ++++----
 mm/oom_kill.c                                 |  4 +-
 mm/process_vm_access.c                        |  4 +-
 mm/swapfile.c                                 |  4 +-
 mm/userfaultfd.c                              | 14 +++----
 mm/util.c                                     |  8 ++--
 net/ipv4/tcp.c                                |  4 +-
 net/xdp/xdp_umem.c                            |  4 +-
 virt/kvm/arm/mmu.c                            | 14 +++----
 virt/kvm/async_pf.c                           |  4 +-
 virt/kvm/kvm_main.c                           |  8 ++--
 149 files changed, 653 insertions(+), 652 deletions(-)

diff --git arch/alpha/kernel/traps.c arch/alpha/kernel/traps.c
index f6b9664ac504..5f650ffe8db6 100644
--- arch/alpha/kernel/traps.c
+++ arch/alpha/kernel/traps.c
@@ -957,12 +957,12 @@ do_entUnaUser(void __user * va, unsigned long opcode,
 		si_code = SEGV_ACCERR;
 	else {
 		struct mm_struct *mm = current->mm;
-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm);
 		if (find_vma(mm, (unsigned long)va))
 			si_code = SEGV_ACCERR;
 		else
 			si_code = SEGV_MAPERR;
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 	}
 	send_sig_fault(SIGSEGV, si_code, va, 0, current);
 	return;
diff --git arch/alpha/mm/fault.c arch/alpha/mm/fault.c
index 741e61ef9d3f..074a3ed78f4c 100644
--- arch/alpha/mm/fault.c
+++ arch/alpha/mm/fault.c
@@ -117,7 +117,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
 	if (user_mode(regs))
 		flags |= FAULT_FLAG_USER;
 retry:
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	vma = find_vma(mm, address);
 	if (!vma)
 		goto bad_area;
@@ -180,14 +180,14 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
 		}
 	}
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	return;
 
 	/* Something tried to access memory that isn't in our memory map.
 	   Fix it, but check if it's kernel or user first.  */
  bad_area:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	if (user_mode(regs))
 		goto do_sigsegv;
@@ -211,14 +211,14 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
 	/* We ran out of memory, or some other thing happened to us that
 	   made us unable to handle the page fault gracefully.  */
  out_of_memory:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	if (!user_mode(regs))
 		goto no_context;
 	pagefault_out_of_memory();
 	return;
 
  do_sigbus:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	/* Send a sigbus, regardless of whether we were in kernel
 	   or user mode.  */
 	force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *) address, 0);
diff --git arch/arc/kernel/process.c arch/arc/kernel/process.c
index e1889ce3faf9..adcacc94500d 100644
--- arch/arc/kernel/process.c
+++ arch/arc/kernel/process.c
@@ -88,10 +88,10 @@ SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new)
 	if (unlikely(ret != -EFAULT))
 		 goto fail;
 
-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm);
 	ret = fixup_user_fault(current, current->mm, (unsigned long) uaddr,
 			       FAULT_FLAG_WRITE, NULL);
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm);
 
 	if (likely(!ret))
 		 goto again;
diff --git arch/arc/kernel/troubleshoot.c arch/arc/kernel/troubleshoot.c
index b79886a6cec8..e7d2d5fb0bb2 100644
--- arch/arc/kernel/troubleshoot.c
+++ arch/arc/kernel/troubleshoot.c
@@ -89,7 +89,7 @@ static void show_faulting_vma(unsigned long address)
 	/* can't use print_vma_addr() yet as it doesn't check for
 	 * non-inclusive vma
 	 */
-	down_read(&active_mm->mmap_sem);
+	mm_read_lock(active_mm);
 	vma = find_vma(active_mm, address);
 
 	/* check against the find_vma( ) behaviour which returns the next VMA
@@ -112,7 +112,7 @@ static void show_faulting_vma(unsigned long address)
 	} else
 		pr_info("    @No matching VMA found\n");
 
-	up_read(&active_mm->mmap_sem);
+	mm_read_unlock(active_mm);
 }
 
 static void show_ecr_verbose(struct pt_regs *regs)
diff --git arch/arc/mm/fault.c arch/arc/mm/fault.c
index fb86bc3e9b35..c6a1a39eb92d 100644
--- arch/arc/mm/fault.c
+++ arch/arc/mm/fault.c
@@ -107,7 +107,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
 		flags |= FAULT_FLAG_WRITE;
 
 retry:
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 
 	vma = find_vma(mm, address);
 	if (!vma)
@@ -159,7 +159,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
 	}
 
 bad_area:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	/*
 	 * Major/minor page fault accounting
diff --git arch/arm/kernel/process.c arch/arm/kernel/process.c
index 46e478fb5ea2..e50ef99fbf26 100644
--- arch/arm/kernel/process.c
+++ arch/arm/kernel/process.c
@@ -431,7 +431,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	npages = 1; /* for sigpage */
 	npages += vdso_total_pages;
 
-	if (down_write_killable(&mm->mmap_sem))
+	if (mm_write_lock_killable(mm))
 		return -EINTR;
 	hint = sigpage_addr(mm, npages);
 	addr = get_unmapped_area(NULL, hint, npages << PAGE_SHIFT, 0, 0);
@@ -458,7 +458,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	arm_install_vdso(mm, addr + PAGE_SIZE);
 
  up_fail:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	return ret;
 }
 #endif
diff --git arch/arm/kernel/swp_emulate.c arch/arm/kernel/swp_emulate.c
index e640871328c1..937ea5587add 100644
--- arch/arm/kernel/swp_emulate.c
+++ arch/arm/kernel/swp_emulate.c
@@ -97,12 +97,12 @@ static void set_segfault(struct pt_regs *regs, unsigned long addr)
 {
 	int si_code;
 
-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm);
 	if (find_vma(current->mm, addr) == NULL)
 		si_code = SEGV_MAPERR;
 	else
 		si_code = SEGV_ACCERR;
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm);
 
 	pr_debug("SWP{B} emulation: access caused memory abort!\n");
 	arm_notify_die("Illegal memory access", regs,
diff --git arch/arm/lib/uaccess_with_memcpy.c arch/arm/lib/uaccess_with_memcpy.c
index c9450982a155..c215fb5fa7ea 100644
--- arch/arm/lib/uaccess_with_memcpy.c
+++ arch/arm/lib/uaccess_with_memcpy.c
@@ -96,7 +96,7 @@ __copy_to_user_memcpy(void __user *to, const void *from, unsigned long n)
 	atomic = faulthandler_disabled();
 
 	if (!atomic)
-		down_read(&current->mm->mmap_sem);
+		mm_read_lock(current->mm);
 	while (n) {
 		pte_t *pte;
 		spinlock_t *ptl;
@@ -104,11 +104,11 @@ __copy_to_user_memcpy(void __user *to, const void *from, unsigned long n)
 
 		while (!pin_page_for_write(to, &pte, &ptl)) {
 			if (!atomic)
-				up_read(&current->mm->mmap_sem);
+				mm_read_unlock(current->mm);
 			if (__put_user(0, (char __user *)to))
 				goto out;
 			if (!atomic)
-				down_read(&current->mm->mmap_sem);
+				mm_read_lock(current->mm);
 		}
 
 		tocopy = (~(unsigned long)to & ~PAGE_MASK) + 1;
@@ -128,7 +128,7 @@ __copy_to_user_memcpy(void __user *to, const void *from, unsigned long n)
 			spin_unlock(ptl);
 	}
 	if (!atomic)
-		up_read(&current->mm->mmap_sem);
+		mm_read_unlock(current->mm);
 
 out:
 	return n;
@@ -165,17 +165,17 @@ __clear_user_memset(void __user *addr, unsigned long n)
 		return 0;
 	}
 
-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm);
 	while (n) {
 		pte_t *pte;
 		spinlock_t *ptl;
 		int tocopy;
 
 		while (!pin_page_for_write(addr, &pte, &ptl)) {
-			up_read(&current->mm->mmap_sem);
+			mm_read_unlock(current->mm);
 			if (__put_user(0, (char __user *)addr))
 				goto out;
-			down_read(&current->mm->mmap_sem);
+			mm_read_lock(current->mm);
 		}
 
 		tocopy = (~(unsigned long)addr & ~PAGE_MASK) + 1;
@@ -193,7 +193,7 @@ __clear_user_memset(void __user *addr, unsigned long n)
 		else
 			spin_unlock(ptl);
 	}
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm);
 
 out:
 	return n;
diff --git arch/arm/mm/fault.c arch/arm/mm/fault.c
index bd0f4821f7e1..a0c0d80c3180 100644
--- arch/arm/mm/fault.c
+++ arch/arm/mm/fault.c
@@ -270,11 +270,11 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 	 * validly references user space from well defined areas of the code,
 	 * we can bug out early if this is from code which shouldn't.
 	 */
-	if (!down_read_trylock(&mm->mmap_sem)) {
+	if (!mm_read_trylock(mm)) {
 		if (!user_mode(regs) && !search_exception_tables(regs->ARM_pc))
 			goto no_context;
 retry:
-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm);
 	} else {
 		/*
 		 * The above down_read_trylock() might have succeeded in
@@ -327,7 +327,7 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 		}
 	}
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	/*
 	 * Handle the "normal" case first - VM_FAULT_MAJOR
diff --git arch/arm64/kernel/traps.c arch/arm64/kernel/traps.c
index 73caf35c2262..5aaa302dd6dd 100644
--- arch/arm64/kernel/traps.c
+++ arch/arm64/kernel/traps.c
@@ -384,12 +384,12 @@ void arm64_notify_segfault(unsigned long addr)
 {
 	int code;
 
-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm);
 	if (find_vma(current->mm, addr) == NULL)
 		code = SEGV_MAPERR;
 	else
 		code = SEGV_ACCERR;
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm);
 
 	force_signal_inject(SIGSEGV, code, addr);
 }
diff --git arch/arm64/kernel/vdso.c arch/arm64/kernel/vdso.c
index 354b11e27c07..2fe5523ef7b8 100644
--- arch/arm64/kernel/vdso.c
+++ arch/arm64/kernel/vdso.c
@@ -357,7 +357,7 @@ int aarch32_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	struct mm_struct *mm = current->mm;
 	int ret;
 
-	if (down_write_killable(&mm->mmap_sem))
+	if (mm_write_lock_killable(mm))
 		return -EINTR;
 
 	ret = aarch32_kuser_helpers_setup(mm);
@@ -374,7 +374,7 @@ int aarch32_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 #endif /* CONFIG_COMPAT_VDSO */
 
 out:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	return ret;
 }
 #endif /* CONFIG_COMPAT */
@@ -418,7 +418,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
 	struct mm_struct *mm = current->mm;
 	int ret;
 
-	if (down_write_killable(&mm->mmap_sem))
+	if (mm_write_lock_killable(mm))
 		return -EINTR;
 
 	ret = __setup_additional_pages(ARM64_VDSO,
@@ -426,7 +426,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
 				       bprm,
 				       uses_interp);
 
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 
 	return ret;
 }
diff --git arch/arm64/mm/fault.c arch/arm64/mm/fault.c
index 85566d32958f..e1afd506340b 100644
--- arch/arm64/mm/fault.c
+++ arch/arm64/mm/fault.c
@@ -491,11 +491,11 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 	 * validly references user space from well defined areas of the code,
 	 * we can bug out early if this is from code which shouldn't.
 	 */
-	if (!down_read_trylock(&mm->mmap_sem)) {
+	if (!mm_read_trylock(mm)) {
 		if (!user_mode(regs) && !search_exception_tables(regs->pc))
 			goto no_context;
 retry:
-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm);
 	} else {
 		/*
 		 * The above down_read_trylock() might have succeeded in which
@@ -504,7 +504,7 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 		might_sleep();
 #ifdef CONFIG_DEBUG_VM
 		if (!user_mode(regs) && !search_exception_tables(regs->pc)) {
-			up_read(&mm->mmap_sem);
+			mm_read_unlock(mm);
 			goto no_context;
 		}
 #endif
@@ -536,7 +536,7 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 			goto retry;
 		}
 	}
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	/*
 	 * Handle the "normal" (no error) case first.
diff --git arch/csky/kernel/vdso.c arch/csky/kernel/vdso.c
index 60ff7adfad1d..f4f3831a0974 100644
--- arch/csky/kernel/vdso.c
+++ arch/csky/kernel/vdso.c
@@ -50,7 +50,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	unsigned long addr;
 	struct mm_struct *mm = current->mm;
 
-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm);
 
 	addr = get_unmapped_area(NULL, STACK_TOP, PAGE_SIZE, 0, 0);
 	if (IS_ERR_VALUE(addr)) {
@@ -70,7 +70,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	mm->context.vdso = (void *)addr;
 
 up_fail:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	return ret;
 }
 
diff --git arch/csky/mm/fault.c arch/csky/mm/fault.c
index f76618b630f9..d86f90b95f70 100644
--- arch/csky/mm/fault.c
+++ arch/csky/mm/fault.c
@@ -116,7 +116,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long write,
 	if (in_atomic() || !mm)
 		goto bad_area_nosemaphore;
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	vma = find_vma(mm, address);
 	if (!vma)
 		goto bad_area;
@@ -166,7 +166,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long write,
 			      address);
 	}
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	return;
 
 	/*
@@ -174,7 +174,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long write,
 	 * Fix it, but check if it's kernel or user first..
 	 */
 bad_area:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 bad_area_nosemaphore:
 	/* User mode accesses just cause a SIGSEGV */
@@ -206,7 +206,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long write,
 	return;
 
 do_sigbus:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	/* Kernel mode? Handle exceptions or die */
 	if (!user_mode(regs))
diff --git arch/hexagon/kernel/vdso.c arch/hexagon/kernel/vdso.c
index 25a1d9cfd4cc..55e0d55f8152 100644
--- arch/hexagon/kernel/vdso.c
+++ arch/hexagon/kernel/vdso.c
@@ -52,7 +52,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	unsigned long vdso_base;
 	struct mm_struct *mm = current->mm;
 
-	if (down_write_killable(&mm->mmap_sem))
+	if (mm_write_lock_killable(mm))
 		return -EINTR;
 
 	/* Try to get it loaded right near ld.so/glibc. */
@@ -76,7 +76,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	mm->context.vdso = (void *)vdso_base;
 
 up_fail:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	return ret;
 }
 
diff --git arch/hexagon/mm/vm_fault.c arch/hexagon/mm/vm_fault.c
index b3bc71680ae4..14040908faee 100644
--- arch/hexagon/mm/vm_fault.c
+++ arch/hexagon/mm/vm_fault.c
@@ -55,7 +55,7 @@ void do_page_fault(unsigned long address, long cause, struct pt_regs *regs)
 	if (user_mode(regs))
 		flags |= FAULT_FLAG_USER;
 retry:
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	vma = find_vma(mm, address);
 	if (!vma)
 		goto bad_area;
@@ -108,11 +108,11 @@ void do_page_fault(unsigned long address, long cause, struct pt_regs *regs)
 			}
 		}
 
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 		return;
 	}
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	/* Handle copyin/out exception cases */
 	if (!user_mode(regs))
@@ -139,7 +139,7 @@ void do_page_fault(unsigned long address, long cause, struct pt_regs *regs)
 	return;
 
 bad_area:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	if (user_mode(regs)) {
 		force_sig_fault(SIGSEGV, si_code, (void __user *)address);
diff --git arch/ia64/kernel/perfmon.c arch/ia64/kernel/perfmon.c
index a23c3938a1c4..a1da9e6435d8 100644
--- arch/ia64/kernel/perfmon.c
+++ arch/ia64/kernel/perfmon.c
@@ -2258,13 +2258,13 @@ pfm_smpl_buffer_alloc(struct task_struct *task, struct file *filp, pfm_context_t
 	 * now we atomically find some area in the address space and
 	 * remap the buffer in it.
 	 */
-	down_write(&task->mm->mmap_sem);
+	mm_write_lock(task->mm);
 
 	/* find some free area in address space, must have mmap sem held */
 	vma->vm_start = get_unmapped_area(NULL, 0, size, 0, MAP_PRIVATE|MAP_ANONYMOUS);
 	if (IS_ERR_VALUE(vma->vm_start)) {
 		DPRINT(("Cannot find unmapped area for size %ld\n", size));
-		up_write(&task->mm->mmap_sem);
+		mm_write_unlock(task->mm);
 		goto error;
 	}
 	vma->vm_end = vma->vm_start + size;
@@ -2275,7 +2275,7 @@ pfm_smpl_buffer_alloc(struct task_struct *task, struct file *filp, pfm_context_t
 	/* can only be applied to current task, need to have the mm semaphore held when called */
 	if (pfm_remap_buffer(vma, (unsigned long)smpl_buf, vma->vm_start, size)) {
 		DPRINT(("Can't remap buffer\n"));
-		up_write(&task->mm->mmap_sem);
+		mm_write_unlock(task->mm);
 		goto error;
 	}
 
@@ -2286,7 +2286,7 @@ pfm_smpl_buffer_alloc(struct task_struct *task, struct file *filp, pfm_context_t
 	insert_vm_struct(mm, vma);
 
 	vm_stat_account(vma->vm_mm, vma->vm_flags, vma_pages(vma));
-	up_write(&task->mm->mmap_sem);
+	mm_write_unlock(task->mm);
 
 	/*
 	 * keep track of user level virtual address
diff --git arch/ia64/mm/fault.c arch/ia64/mm/fault.c
index c2f299fe9e04..595487df86a8 100644
--- arch/ia64/mm/fault.c
+++ arch/ia64/mm/fault.c
@@ -102,7 +102,7 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
 	if (mask & VM_WRITE)
 		flags |= FAULT_FLAG_WRITE;
 retry:
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 
 	vma = find_vma_prev(mm, address, &prev_vma);
 	if (!vma && !prev_vma )
@@ -179,7 +179,7 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
 		}
 	}
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	return;
 
   check_expansion:
@@ -210,7 +210,7 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
 	goto good_area;
 
   bad_area:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 #ifdef CONFIG_VIRTUAL_MEM_MAP
   bad_area_no_up:
 #endif
@@ -276,7 +276,7 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
 	return;
 
   out_of_memory:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	if (!user_mode(regs))
 		goto no_context;
 	pagefault_out_of_memory();
diff --git arch/ia64/mm/init.c arch/ia64/mm/init.c
index b01d68a2d5d9..42409924a028 100644
--- arch/ia64/mm/init.c
+++ arch/ia64/mm/init.c
@@ -118,13 +118,13 @@ ia64_init_addr_space (void)
 		vma->vm_end = vma->vm_start + PAGE_SIZE;
 		vma->vm_flags = VM_DATA_DEFAULT_FLAGS|VM_GROWSUP|VM_ACCOUNT;
 		vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
-		down_write(&current->mm->mmap_sem);
+		mm_write_lock(current->mm);
 		if (insert_vm_struct(current->mm, vma)) {
-			up_write(&current->mm->mmap_sem);
+			mm_write_unlock(current->mm);
 			vm_area_free(vma);
 			return;
 		}
-		up_write(&current->mm->mmap_sem);
+		mm_write_unlock(current->mm);
 	}
 
 	/* map NaT-page at address zero to speed up speculative dereferencing of NULL: */
@@ -136,13 +136,13 @@ ia64_init_addr_space (void)
 			vma->vm_page_prot = __pgprot(pgprot_val(PAGE_READONLY) | _PAGE_MA_NAT);
 			vma->vm_flags = VM_READ | VM_MAYREAD | VM_IO |
 					VM_DONTEXPAND | VM_DONTDUMP;
-			down_write(&current->mm->mmap_sem);
+			mm_write_lock(current->mm);
 			if (insert_vm_struct(current->mm, vma)) {
-				up_write(&current->mm->mmap_sem);
+				mm_write_unlock(current->mm);
 				vm_area_free(vma);
 				return;
 			}
-			up_write(&current->mm->mmap_sem);
+			mm_write_unlock(current->mm);
 		}
 	}
 }
diff --git arch/m68k/kernel/sys_m68k.c arch/m68k/kernel/sys_m68k.c
index 18a4de7d5934..ba7e522d47db 100644
--- arch/m68k/kernel/sys_m68k.c
+++ arch/m68k/kernel/sys_m68k.c
@@ -399,7 +399,7 @@ sys_cacheflush (unsigned long addr, int scope, int cache, unsigned long len)
 		 * Verify that the specified address region actually belongs
 		 * to this process.
 		 */
-		down_read(&current->mm->mmap_sem);
+		mm_read_lock(current->mm);
 		vma = find_vma(current->mm, addr);
 		if (!vma || addr < vma->vm_start || addr + len > vma->vm_end)
 			goto out_unlock;
@@ -450,7 +450,7 @@ sys_cacheflush (unsigned long addr, int scope, int cache, unsigned long len)
 	    }
 	}
 out_unlock:
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm);
 out:
 	return ret;
 }
@@ -472,7 +472,7 @@ sys_atomic_cmpxchg_32(unsigned long newval, int oldval, int d3, int d4, int d5,
 		spinlock_t *ptl;
 		unsigned long mem_value;
 
-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm);
 		pgd = pgd_offset(mm, (unsigned long)mem);
 		if (!pgd_present(*pgd))
 			goto bad_access;
@@ -501,11 +501,11 @@ sys_atomic_cmpxchg_32(unsigned long newval, int oldval, int d3, int d4, int d5,
 			__put_user(newval, mem);
 
 		pte_unmap_unlock(pte, ptl);
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 		return mem_value;
 
 	      bad_access:
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 		/* This is not necessarily a bad access, we can get here if
 		   a memory we're trying to write to should be copied-on-write.
 		   Make the kernel do the necessary page stuff, then re-iterate.
@@ -545,13 +545,13 @@ sys_atomic_cmpxchg_32(unsigned long newval, int oldval, int d3, int d4, int d5,
 	struct mm_struct *mm = current->mm;
 	unsigned long mem_value;
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 
 	mem_value = *mem;
 	if (mem_value == oldval)
 		*mem = newval;
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	return mem_value;
 }
 
diff --git arch/m68k/mm/fault.c arch/m68k/mm/fault.c
index e9b1d7585b43..fd65a9103d54 100644
--- arch/m68k/mm/fault.c
+++ arch/m68k/mm/fault.c
@@ -86,7 +86,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
 	if (user_mode(regs))
 		flags |= FAULT_FLAG_USER;
 retry:
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 
 	vma = find_vma(mm, address);
 	if (!vma)
@@ -177,7 +177,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
 		}
 	}
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	return 0;
 
 /*
@@ -185,7 +185,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
  * us unable to handle the page fault gracefully.
  */
 out_of_memory:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	if (!user_mode(regs))
 		goto no_context;
 	pagefault_out_of_memory();
@@ -214,6 +214,6 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
 	current->thread.faddr = address;
 
 send_sig:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	return send_fault_sig(regs);
 }
diff --git arch/microblaze/mm/fault.c arch/microblaze/mm/fault.c
index e6a810b0c7ad..83e9147a71d6 100644
--- arch/microblaze/mm/fault.c
+++ arch/microblaze/mm/fault.c
@@ -137,12 +137,12 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
 	 * source.  If this is invalid we can skip the address space check,
 	 * thus avoiding the deadlock.
 	 */
-	if (unlikely(!down_read_trylock(&mm->mmap_sem))) {
+	if (unlikely(!mm_read_trylock(mm))) {
 		if (kernel_mode(regs) && !search_exception_tables(regs->pc))
 			goto bad_area_nosemaphore;
 
 retry:
-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm);
 	}
 
 	vma = find_vma(mm, address);
@@ -249,7 +249,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
 		}
 	}
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	/*
 	 * keep track of tlb+htab misses that are good addrs but
@@ -260,7 +260,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
 	return;
 
 bad_area:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 bad_area_nosemaphore:
 	pte_errors++;
@@ -279,7 +279,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
  * us unable to handle the page fault gracefully.
  */
 out_of_memory:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	if (!user_mode(regs))
 		bad_page_fault(regs, address, SIGKILL);
 	else
@@ -287,7 +287,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
 	return;
 
 do_sigbus:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	if (user_mode(regs)) {
 		force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *)address);
 		return;
diff --git arch/mips/kernel/traps.c arch/mips/kernel/traps.c
index 83f2a437d9e2..a7a2acf41d5d 100644
--- arch/mips/kernel/traps.c
+++ arch/mips/kernel/traps.c
@@ -754,13 +754,13 @@ int process_fpemu_return(int sig, void __user *fault_addr, unsigned long fcr31)
 		return 1;
 
 	case SIGSEGV:
-		down_read(&current->mm->mmap_sem);
+		mm_read_lock(current->mm);
 		vma = find_vma(current->mm, (unsigned long)fault_addr);
 		if (vma && (vma->vm_start <= (unsigned long)fault_addr))
 			si_code = SEGV_ACCERR;
 		else
 			si_code = SEGV_MAPERR;
-		up_read(&current->mm->mmap_sem);
+		mm_read_unlock(current->mm);
 		force_sig_fault(SIGSEGV, si_code, fault_addr);
 		return 1;
 
diff --git arch/mips/kernel/vdso.c arch/mips/kernel/vdso.c
index bc35f8499111..5b4025fc6b85 100644
--- arch/mips/kernel/vdso.c
+++ arch/mips/kernel/vdso.c
@@ -92,7 +92,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	struct vm_area_struct *vma;
 	int ret;
 
-	if (down_write_killable(&mm->mmap_sem))
+	if (mm_write_lock_killable(mm))
 		return -EINTR;
 
 	/* Map delay slot emulation page */
@@ -183,6 +183,6 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	ret = 0;
 
 out:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	return ret;
 }
diff --git arch/nds32/kernel/vdso.c arch/nds32/kernel/vdso.c
index 90bcae6f8554..f74873e51715 100644
--- arch/nds32/kernel/vdso.c
+++ arch/nds32/kernel/vdso.c
@@ -130,7 +130,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	vdso_mapping_len += L1_cache_info[DCACHE].aliasing_num - 1;
 #endif
 
-	if (down_write_killable(&mm->mmap_sem))
+	if (mm_write_lock_killable(mm))
 		return -EINTR;
 
 	addr = vdso_random_addr(vdso_mapping_len);
@@ -185,12 +185,12 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 		goto up_fail;
 	}
 
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	return 0;
 
 up_fail:
 	mm->context.vdso = NULL;
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	return ret;
 }
 
diff --git arch/nds32/mm/fault.c arch/nds32/mm/fault.c
index 906dfb25353c..a53b70d062a4 100644
--- arch/nds32/mm/fault.c
+++ arch/nds32/mm/fault.c
@@ -127,12 +127,12 @@ void do_page_fault(unsigned long entry, unsigned long addr,
 	 * validly references user space from well defined areas of the code,
 	 * we can bug out early if this is from code which shouldn't.
 	 */
-	if (unlikely(!down_read_trylock(&mm->mmap_sem))) {
+	if (unlikely(!mm_read_trylock(mm))) {
 		if (!user_mode(regs) &&
 		    !search_exception_tables(instruction_pointer(regs)))
 			goto no_context;
 retry:
-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm);
 	} else {
 		/*
 		 * The above down_read_trylock() might have succeeded in which
@@ -257,7 +257,7 @@ void do_page_fault(unsigned long entry, unsigned long addr,
 		}
 	}
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	return;
 
 	/*
@@ -265,7 +265,7 @@ void do_page_fault(unsigned long entry, unsigned long addr,
 	 * Fix it, but check if it's kernel or user first..
 	 */
 bad_area:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 bad_area_nosemaphore:
 
@@ -325,14 +325,14 @@ void do_page_fault(unsigned long entry, unsigned long addr,
 	 */
 
 out_of_memory:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	if (!user_mode(regs))
 		goto no_context;
 	pagefault_out_of_memory();
 	return;
 
 do_sigbus:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	/* Kernel mode? Handle exceptions or die */
 	if (!user_mode(regs))
diff --git arch/nios2/mm/fault.c arch/nios2/mm/fault.c
index 6a2e716b959f..3e4c7aa33b55 100644
--- arch/nios2/mm/fault.c
+++ arch/nios2/mm/fault.c
@@ -83,11 +83,11 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
 	if (user_mode(regs))
 		flags |= FAULT_FLAG_USER;
 
-	if (!down_read_trylock(&mm->mmap_sem)) {
+	if (!mm_read_trylock(mm)) {
 		if (!user_mode(regs) && !search_exception_tables(regs->ea))
 			goto bad_area_nosemaphore;
 retry:
-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm);
 	}
 
 	vma = find_vma(mm, address);
@@ -172,7 +172,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
 		}
 	}
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	return;
 
 /*
@@ -180,7 +180,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
  * Fix it, but check if it's kernel or user first..
  */
 bad_area:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 bad_area_nosemaphore:
 	/* User mode accesses just cause a SIGSEGV */
@@ -218,14 +218,14 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
  * us unable to handle the page fault gracefully.
  */
 out_of_memory:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	if (!user_mode(regs))
 		goto no_context;
 	pagefault_out_of_memory();
 	return;
 
 do_sigbus:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	/* Kernel mode? Handle exceptions or die */
 	if (!user_mode(regs))
diff --git arch/nios2/mm/init.c arch/nios2/mm/init.c
index 2c609c2516b2..334b8134b8fd 100644
--- arch/nios2/mm/init.c
+++ arch/nios2/mm/init.c
@@ -112,14 +112,14 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	struct mm_struct *mm = current->mm;
 	int ret;
 
-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm);
 
 	/* Map kuser helpers to user space address */
 	ret = install_special_mapping(mm, KUSER_BASE, KUSER_SIZE,
 				      VM_READ | VM_EXEC | VM_MAYREAD |
 				      VM_MAYEXEC, kuser_page);
 
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 
 	return ret;
 }
diff --git arch/openrisc/mm/fault.c arch/openrisc/mm/fault.c
index 5d4d3a9691d0..57d486743a1a 100644
--- arch/openrisc/mm/fault.c
+++ arch/openrisc/mm/fault.c
@@ -104,7 +104,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
 		goto no_context;
 
 retry:
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	vma = find_vma(mm, address);
 
 	if (!vma)
@@ -193,7 +193,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
 		}
 	}
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	return;
 
 	/*
@@ -202,7 +202,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
 	 */
 
 bad_area:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 bad_area_nosemaphore:
 
@@ -261,14 +261,14 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
 	__asm__ __volatile__("l.nop 42");
 	__asm__ __volatile__("l.nop 1");
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	if (!user_mode(regs))
 		goto no_context;
 	pagefault_out_of_memory();
 	return;
 
 do_sigbus:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	/*
 	 * Send a sigbus, regardless of whether we were in kernel
diff --git arch/parisc/kernel/traps.c arch/parisc/kernel/traps.c
index 82fc01189488..5ecffd3420aa 100644
--- arch/parisc/kernel/traps.c
+++ arch/parisc/kernel/traps.c
@@ -717,7 +717,7 @@ void notrace handle_interruption(int code, struct pt_regs *regs)
 		if (user_mode(regs)) {
 			struct vm_area_struct *vma;
 
-			down_read(&current->mm->mmap_sem);
+			mm_read_lock(current->mm);
 			vma = find_vma(current->mm,regs->iaoq[0]);
 			if (vma && (regs->iaoq[0] >= vma->vm_start)
 				&& (vma->vm_flags & VM_EXEC)) {
@@ -725,10 +725,10 @@ void notrace handle_interruption(int code, struct pt_regs *regs)
 				fault_address = regs->iaoq[0];
 				fault_space = regs->iasq[0];
 
-				up_read(&current->mm->mmap_sem);
+				mm_read_unlock(current->mm);
 				break; /* call do_page_fault() */
 			}
-			up_read(&current->mm->mmap_sem);
+			mm_read_unlock(current->mm);
 		}
 		/* Fall Through */
 	case 27: 
diff --git arch/parisc/mm/fault.c arch/parisc/mm/fault.c
index adbd5e2144a3..344de2018465 100644
--- arch/parisc/mm/fault.c
+++ arch/parisc/mm/fault.c
@@ -282,7 +282,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
 	if (acc_type & VM_WRITE)
 		flags |= FAULT_FLAG_WRITE;
 retry:
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	vma = find_vma_prev(mm, address, &prev_vma);
 	if (!vma || address < vma->vm_start)
 		goto check_expansion;
@@ -339,7 +339,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
 			goto retry;
 		}
 	}
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	return;
 
 check_expansion:
@@ -351,7 +351,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
  * Something tried to access memory that isn't in our memory map..
  */
 bad_area:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	if (user_mode(regs)) {
 		int signo, si_code;
@@ -423,7 +423,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
 	parisc_terminate("Bad Address (null pointer deref?)", regs, code, address);
 
   out_of_memory:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	if (!user_mode(regs))
 		goto no_context;
 	pagefault_out_of_memory();
diff --git arch/powerpc/kernel/vdso.c arch/powerpc/kernel/vdso.c
index eae9ddaecbcf..9592c988b0e3 100644
--- arch/powerpc/kernel/vdso.c
+++ arch/powerpc/kernel/vdso.c
@@ -171,7 +171,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	 * and end up putting it elsewhere.
 	 * Add enough to the size so that the result can be aligned.
 	 */
-	if (down_write_killable(&mm->mmap_sem))
+	if (mm_write_lock_killable(mm))
 		return -EINTR;
 	vdso_base = get_unmapped_area(NULL, vdso_base,
 				      (vdso_pages << PAGE_SHIFT) +
@@ -211,11 +211,11 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 		goto fail_mmapsem;
 	}
 
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	return 0;
 
  fail_mmapsem:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	return rc;
 }
 
diff --git arch/powerpc/kvm/book3s_64_mmu_hv.c arch/powerpc/kvm/book3s_64_mmu_hv.c
index d381526c5c9b..a04352030f88 100644
--- arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -582,7 +582,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	npages = get_user_pages_fast(hva, 1, writing ? FOLL_WRITE : 0, pages);
 	if (npages < 1) {
 		/* Check if it's an I/O mapping */
-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm);
 		vma = find_vma(mm, hva);
 		if (vma && vma->vm_start <= hva && hva + psize <= vma->vm_end &&
 		    (vma->vm_flags & VM_PFNMAP)) {
@@ -592,7 +592,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 			is_ci = pte_ci(__pte((pgprot_val(vma->vm_page_prot))));
 			write_ok = vma->vm_flags & VM_WRITE;
 		}
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 		if (!pfn)
 			goto out_put;
 	} else {
diff --git arch/powerpc/kvm/book3s_hv.c arch/powerpc/kvm/book3s_hv.c
index 6ff3f896d908..e566b80185f4 100644
--- arch/powerpc/kvm/book3s_hv.c
+++ arch/powerpc/kvm/book3s_hv.c
@@ -4640,14 +4640,14 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
 
 	/* Look up the VMA for the start of this memory slot */
 	hva = memslot->userspace_addr;
-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm);
 	vma = find_vma(current->mm, hva);
 	if (!vma || vma->vm_start > hva || (vma->vm_flags & VM_IO))
 		goto up_out;
 
 	psize = vma_kernel_pagesize(vma);
 
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm);
 
 	/* We can handle 4k, 64k or 16M pages in the VRMA */
 	if (psize >= 0x1000000)
@@ -4680,7 +4680,7 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
 	return err;
 
  up_out:
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm);
 	goto out_srcu;
 }
 
diff --git arch/powerpc/kvm/book3s_hv_uvmem.c arch/powerpc/kvm/book3s_hv_uvmem.c
index 2de264fc3156..df02fc08a4ee 100644
--- arch/powerpc/kvm/book3s_hv_uvmem.c
+++ arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -366,7 +366,7 @@ kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
 	 */
 	ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
 			  MADV_UNMERGEABLE, &vma->vm_flags);
-	downgrade_write(&kvm->mm->mmap_sem);
+	mm_downgrade_write_lock(kvm->mm);
 	*downgrade = true;
 	if (ret)
 		return ret;
@@ -483,7 +483,7 @@ kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
 
 	ret = H_PARAMETER;
 	srcu_idx = srcu_read_lock(&kvm->srcu);
-	down_write(&kvm->mm->mmap_sem);
+	mm_write_lock(kvm->mm);
 
 	start = gfn_to_hva(kvm, gfn);
 	if (kvm_is_error_hva(start))
@@ -506,9 +506,9 @@ kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
 	mutex_unlock(&kvm->arch.uvmem_lock);
 out:
 	if (downgrade)
-		up_read(&kvm->mm->mmap_sem);
+		mm_read_unlock(kvm->mm);
 	else
-		up_write(&kvm->mm->mmap_sem);
+		mm_write_unlock(kvm->mm);
 	srcu_read_unlock(&kvm->srcu, srcu_idx);
 	return ret;
 }
@@ -660,7 +660,7 @@ kvmppc_h_svm_page_out(struct kvm *kvm, unsigned long gpa,
 
 	ret = H_PARAMETER;
 	srcu_idx = srcu_read_lock(&kvm->srcu);
-	down_read(&kvm->mm->mmap_sem);
+	mm_read_lock(kvm->mm);
 	start = gfn_to_hva(kvm, gfn);
 	if (kvm_is_error_hva(start))
 		goto out;
@@ -673,7 +673,7 @@ kvmppc_h_svm_page_out(struct kvm *kvm, unsigned long gpa,
 	if (!kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa))
 		ret = H_SUCCESS;
 out:
-	up_read(&kvm->mm->mmap_sem);
+	mm_read_unlock(kvm->mm);
 	srcu_read_unlock(&kvm->srcu, srcu_idx);
 	return ret;
 }
diff --git arch/powerpc/kvm/e500_mmu_host.c arch/powerpc/kvm/e500_mmu_host.c
index 425d13806645..6e0785b3515d 100644
--- arch/powerpc/kvm/e500_mmu_host.c
+++ arch/powerpc/kvm/e500_mmu_host.c
@@ -355,7 +355,7 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
 
 	if (tlbsel == 1) {
 		struct vm_area_struct *vma;
-		down_read(&kvm->mm->mmap_sem);
+		mm_read_lock(kvm->mm);
 
 		vma = find_vma(kvm->mm, hva);
 		if (vma && hva >= vma->vm_start &&
@@ -441,7 +441,7 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
 			tsize = max(BOOK3E_PAGESZ_4K, tsize & ~1);
 		}
 
-		up_read(&kvm->mm->mmap_sem);
+		mm_read_unlock(kvm->mm);
 	}
 
 	if (likely(!pfnmap)) {
diff --git arch/powerpc/mm/book3s64/iommu_api.c arch/powerpc/mm/book3s64/iommu_api.c
index 56cc84520577..90ef878d7d91 100644
--- arch/powerpc/mm/book3s64/iommu_api.c
+++ arch/powerpc/mm/book3s64/iommu_api.c
@@ -96,7 +96,7 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
 		goto unlock_exit;
 	}
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	chunk = (1UL << (PAGE_SHIFT + MAX_ORDER - 1)) /
 			sizeof(struct vm_area_struct *);
 	chunk = min(chunk, entries);
@@ -114,7 +114,7 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
 			pinned += ret;
 		break;
 	}
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	if (pinned != entries) {
 		if (!ret)
 			ret = -EFAULT;
diff --git arch/powerpc/mm/book3s64/subpage_prot.c arch/powerpc/mm/book3s64/subpage_prot.c
index 2ef24a53f4c9..e025e50798ff 100644
--- arch/powerpc/mm/book3s64/subpage_prot.c
+++ arch/powerpc/mm/book3s64/subpage_prot.c
@@ -92,7 +92,7 @@ static void subpage_prot_clear(unsigned long addr, unsigned long len)
 	size_t nw;
 	unsigned long next, limit;
 
-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm);
 
 	spt = mm_ctx_subpage_prot(&mm->context);
 	if (!spt)
@@ -127,7 +127,7 @@ static void subpage_prot_clear(unsigned long addr, unsigned long len)
 	}
 
 err_out:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
@@ -217,7 +217,7 @@ SYSCALL_DEFINE3(subpage_prot, unsigned long, addr,
 	if (!access_ok(map, (len >> PAGE_SHIFT) * sizeof(u32)))
 		return -EFAULT;
 
-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm);
 
 	spt = mm_ctx_subpage_prot(&mm->context);
 	if (!spt) {
@@ -267,11 +267,11 @@ SYSCALL_DEFINE3(subpage_prot, unsigned long, addr,
 		if (addr + (nw << PAGE_SHIFT) > next)
 			nw = (next - addr) >> PAGE_SHIFT;
 
-		up_write(&mm->mmap_sem);
+		mm_write_unlock(mm);
 		if (__copy_from_user(spp, map, nw * sizeof(u32)))
 			return -EFAULT;
 		map += nw;
-		down_write(&mm->mmap_sem);
+		mm_write_lock(mm);
 
 		/* now flush any existing HPTEs for the range */
 		hpte_flush_range(mm, addr, nw);
@@ -280,6 +280,6 @@ SYSCALL_DEFINE3(subpage_prot, unsigned long, addr,
 		spt->maxaddr = limit;
 	err = 0;
  out:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	return err;
 }
diff --git arch/powerpc/mm/copro_fault.c arch/powerpc/mm/copro_fault.c
index beb060b96632..b84d5046f052 100644
--- arch/powerpc/mm/copro_fault.c
+++ arch/powerpc/mm/copro_fault.c
@@ -33,7 +33,7 @@ int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
 	if (mm->pgd == NULL)
 		return -EFAULT;
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	ret = -EFAULT;
 	vma = find_vma(mm, ea);
 	if (!vma)
@@ -82,7 +82,7 @@ int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
 		current->min_flt++;
 
 out_unlock:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(copro_handle_mm_fault);
diff --git arch/powerpc/mm/fault.c arch/powerpc/mm/fault.c
index b5047f9b5dec..2288f1d195dd 100644
--- arch/powerpc/mm/fault.c
+++ arch/powerpc/mm/fault.c
@@ -108,7 +108,7 @@ static int __bad_area(struct pt_regs *regs, unsigned long address, int si_code)
 	 * Something tried to access memory that isn't in our memory map..
 	 * Fix it, but check if it's kernel or user first..
 	 */
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	return __bad_area_nosemaphore(regs, address, si_code);
 }
@@ -515,12 +515,12 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	 * source.  If this is invalid we can skip the address space check,
 	 * thus avoiding the deadlock.
 	 */
-	if (unlikely(!down_read_trylock(&mm->mmap_sem))) {
+	if (unlikely(!mm_read_trylock(mm))) {
 		if (!is_user && !search_exception_tables(regs->nip))
 			return bad_area_nosemaphore(regs, address);
 
 retry:
-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm);
 	} else {
 		/*
 		 * The above down_read_trylock() might have succeeded in
@@ -544,7 +544,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 		if (!must_retry)
 			return bad_area(regs, address);
 
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 		if (fault_in_pages_readable((const char __user *)regs->nip,
 					    sizeof(unsigned int)))
 			return bad_area_nosemaphore(regs, address);
@@ -576,7 +576,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 
 		int pkey = vma_pkey(vma);
 
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 		return bad_key_fault_exception(regs, address, pkey);
 	}
 #endif /* CONFIG_PPC_MEM_KEYS */
@@ -607,7 +607,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 		return is_user ? 0 : SIGBUS;
 	}
 
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm);
 
 	if (unlikely(fault & VM_FAULT_ERROR))
 		return mm_fault_error(regs, address, fault);
diff --git arch/powerpc/oprofile/cell/spu_task_sync.c arch/powerpc/oprofile/cell/spu_task_sync.c
index 0caec3d8d436..7dad5c398bf3 100644
--- arch/powerpc/oprofile/cell/spu_task_sync.c
+++ arch/powerpc/oprofile/cell/spu_task_sync.c
@@ -332,7 +332,7 @@ get_exec_dcookie_and_offset(struct spu *spu, unsigned int *offsetp,
 		fput(exe_file);
 	}
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	for (vma = mm->mmap; vma; vma = vma->vm_next) {
 		if (vma->vm_start > spu_ref || vma->vm_end <= spu_ref)
 			continue;
@@ -349,13 +349,13 @@ get_exec_dcookie_and_offset(struct spu *spu, unsigned int *offsetp,
 	*spu_bin_dcookie = fast_get_dcookie(&vma->vm_file->f_path);
 	pr_debug("got dcookie for %pD\n", vma->vm_file);
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 out:
 	return app_cookie;
 
 fail_no_image_cookie:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	printk(KERN_ERR "SPU_PROF: "
 		"%s, line %d: Cannot find dcookie for SPU binary\n",
diff --git arch/powerpc/platforms/cell/spufs/file.c arch/powerpc/platforms/cell/spufs/file.c
index c0f950a3f4e1..fed452f5db84 100644
--- arch/powerpc/platforms/cell/spufs/file.c
+++ arch/powerpc/platforms/cell/spufs/file.c
@@ -336,11 +336,11 @@ static vm_fault_t spufs_ps_fault(struct vm_fault *vmf,
 		goto refault;
 
 	if (ctx->state == SPU_STATE_SAVED) {
-		up_read(&current->mm->mmap_sem);
+		mm_read_unlock(current->mm);
 		spu_context_nospu_trace(spufs_ps_fault__sleep, ctx);
 		err = spufs_wait(ctx->run_wq, ctx->state == SPU_STATE_RUNNABLE);
 		spu_context_trace(spufs_ps_fault__wake, ctx, ctx->spu);
-		down_read(&current->mm->mmap_sem);
+		mm_read_lock(current->mm);
 	} else {
 		area = ctx->spu->problem_phys + ps_offs;
 		ret = vmf_insert_pfn(vmf->vma, vmf->address,
diff --git arch/riscv/kernel/vdso.c arch/riscv/kernel/vdso.c
index 484d95a70907..5b4fad784795 100644
--- arch/riscv/kernel/vdso.c
+++ arch/riscv/kernel/vdso.c
@@ -61,7 +61,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
 
 	vdso_len = (vdso_pages + 1) << PAGE_SHIFT;
 
-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm);
 	vdso_base = get_unmapped_area(NULL, 0, vdso_len, 0, 0);
 	if (IS_ERR_VALUE(vdso_base)) {
 		ret = vdso_base;
@@ -83,7 +83,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
 		mm->context.vdso = NULL;
 
 end:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	return ret;
 }
 
diff --git arch/riscv/mm/fault.c arch/riscv/mm/fault.c
index cf7248e07f43..eb1a278a52c3 100644
--- arch/riscv/mm/fault.c
+++ arch/riscv/mm/fault.c
@@ -69,7 +69,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
 
 retry:
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	vma = find_vma(mm, addr);
 	if (unlikely(!vma))
 		goto bad_area;
@@ -160,7 +160,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
 		}
 	}
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	return;
 
 	/*
@@ -168,7 +168,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
 	 * Fix it, but check if it's kernel or user first.
 	 */
 bad_area:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	/* User mode accesses just cause a SIGSEGV */
 	if (user_mode(regs)) {
 		do_trap(regs, SIGSEGV, code, addr);
@@ -196,14 +196,14 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
 	 * (which will retry the fault, or kill us if we got oom-killed).
 	 */
 out_of_memory:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	if (!user_mode(regs))
 		goto no_context;
 	pagefault_out_of_memory();
 	return;
 
 do_sigbus:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	/* Kernel mode? Handle exceptions or die */
 	if (!user_mode(regs))
 		goto no_context;
diff --git arch/s390/kernel/vdso.c arch/s390/kernel/vdso.c
index bcc9bdb39ba2..7e27f81eefd0 100644
--- arch/s390/kernel/vdso.c
+++ arch/s390/kernel/vdso.c
@@ -208,7 +208,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	 * it at vdso_base which is the "natural" base for it, but we might
 	 * fail and end up putting it elsewhere.
 	 */
-	if (down_write_killable(&mm->mmap_sem))
+	if (mm_write_lock_killable(mm))
 		return -EINTR;
 	vdso_base = get_unmapped_area(NULL, 0, vdso_pages << PAGE_SHIFT, 0, 0);
 	if (IS_ERR_VALUE(vdso_base)) {
@@ -239,7 +239,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	rc = 0;
 
 out_up:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	return rc;
 }
 
diff --git arch/s390/kvm/gaccess.c arch/s390/kvm/gaccess.c
index 07d30ffcfa41..25d0760993eb 100644
--- arch/s390/kvm/gaccess.c
+++ arch/s390/kvm/gaccess.c
@@ -1170,7 +1170,7 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
 	int dat_protection, fake;
 	int rc;
 
-	down_read(&sg->mm->mmap_sem);
+	mm_read_lock(sg->mm);
 	/*
 	 * We don't want any guest-2 tables to change - so the parent
 	 * tables/pointers we read stay valid - unshadowing is however
@@ -1199,6 +1199,6 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
 	if (!rc)
 		rc = gmap_shadow_page(sg, saddr, __pte(pte.val));
 	ipte_unlock(vcpu);
-	up_read(&sg->mm->mmap_sem);
+	mm_read_unlock(sg->mm);
 	return rc;
 }
diff --git arch/s390/kvm/kvm-s390.c arch/s390/kvm/kvm-s390.c
index d9e6bf3d54f0..77026ec47470 100644
--- arch/s390/kvm/kvm-s390.c
+++ arch/s390/kvm/kvm-s390.c
@@ -753,9 +753,9 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
 			r = -EINVAL;
 		else {
 			r = 0;
-			down_write(&kvm->mm->mmap_sem);
+			mm_write_lock(kvm->mm);
 			kvm->mm->context.allow_gmap_hpage_1m = 1;
-			up_write(&kvm->mm->mmap_sem);
+			mm_write_unlock(kvm->mm);
 			/*
 			 * We might have to create fake 4k page
 			 * tables. To avoid that the hardware works on
@@ -1805,7 +1805,7 @@ static long kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
 	if (!keys)
 		return -ENOMEM;
 
-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm);
 	srcu_idx = srcu_read_lock(&kvm->srcu);
 	for (i = 0; i < args->count; i++) {
 		hva = gfn_to_hva(kvm, args->start_gfn + i);
@@ -1819,7 +1819,7 @@ static long kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
 			break;
 	}
 	srcu_read_unlock(&kvm->srcu, srcu_idx);
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm);
 
 	if (!r) {
 		r = copy_to_user((uint8_t __user *)args->skeydata_addr, keys,
@@ -1863,7 +1863,7 @@ static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
 		goto out;
 
 	i = 0;
-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm);
 	srcu_idx = srcu_read_lock(&kvm->srcu);
         while (i < args->count) {
 		unlocked = false;
@@ -1890,7 +1890,7 @@ static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
 			i++;
 	}
 	srcu_read_unlock(&kvm->srcu, srcu_idx);
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm);
 out:
 	kvfree(keys);
 	return r;
@@ -2073,14 +2073,14 @@ static int kvm_s390_get_cmma_bits(struct kvm *kvm,
 	if (!values)
 		return -ENOMEM;
 
-	down_read(&kvm->mm->mmap_sem);
+	mm_read_lock(kvm->mm);
 	srcu_idx = srcu_read_lock(&kvm->srcu);
 	if (peek)
 		ret = kvm_s390_peek_cmma(kvm, args, values, bufsize);
 	else
 		ret = kvm_s390_get_cmma(kvm, args, values, bufsize);
 	srcu_read_unlock(&kvm->srcu, srcu_idx);
-	up_read(&kvm->mm->mmap_sem);
+	mm_read_unlock(kvm->mm);
 
 	if (kvm->arch.migration_mode)
 		args->remaining = atomic64_read(&kvm->arch.cmma_dirty_pages);
@@ -2130,7 +2130,7 @@ static int kvm_s390_set_cmma_bits(struct kvm *kvm,
 		goto out;
 	}
 
-	down_read(&kvm->mm->mmap_sem);
+	mm_read_lock(kvm->mm);
 	srcu_idx = srcu_read_lock(&kvm->srcu);
 	for (i = 0; i < args->count; i++) {
 		hva = gfn_to_hva(kvm, args->start_gfn + i);
@@ -2145,12 +2145,12 @@ static int kvm_s390_set_cmma_bits(struct kvm *kvm,
 		set_pgste_bits(kvm->mm, hva, mask, pgstev);
 	}
 	srcu_read_unlock(&kvm->srcu, srcu_idx);
-	up_read(&kvm->mm->mmap_sem);
+	mm_read_unlock(kvm->mm);
 
 	if (!kvm->mm->context.uses_cmm) {
-		down_write(&kvm->mm->mmap_sem);
+		mm_write_lock(kvm->mm);
 		kvm->mm->context.uses_cmm = 1;
-		up_write(&kvm->mm->mmap_sem);
+		mm_write_unlock(kvm->mm);
 	}
 out:
 	vfree(bits);
diff --git arch/s390/kvm/priv.c arch/s390/kvm/priv.c
index ed52ffa8d5d4..f9b4013e2f4e 100644
--- arch/s390/kvm/priv.c
+++ arch/s390/kvm/priv.c
@@ -270,18 +270,18 @@ static int handle_iske(struct kvm_vcpu *vcpu)
 		return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
 retry:
 	unlocked = false;
-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm);
 	rc = get_guest_storage_key(current->mm, vmaddr, &key);
 
 	if (rc) {
 		rc = fixup_user_fault(current, current->mm, vmaddr,
 				      FAULT_FLAG_WRITE, &unlocked);
 		if (!rc) {
-			up_read(&current->mm->mmap_sem);
+			mm_read_unlock(current->mm);
 			goto retry;
 		}
 	}
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm);
 	if (rc == -EFAULT)
 		return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
 	if (rc < 0)
@@ -317,17 +317,17 @@ static int handle_rrbe(struct kvm_vcpu *vcpu)
 		return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
 retry:
 	unlocked = false;
-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm);
 	rc = reset_guest_reference_bit(current->mm, vmaddr);
 	if (rc < 0) {
 		rc = fixup_user_fault(current, current->mm, vmaddr,
 				      FAULT_FLAG_WRITE, &unlocked);
 		if (!rc) {
-			up_read(&current->mm->mmap_sem);
+			mm_read_unlock(current->mm);
 			goto retry;
 		}
 	}
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm);
 	if (rc == -EFAULT)
 		return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
 	if (rc < 0)
@@ -385,7 +385,7 @@ static int handle_sske(struct kvm_vcpu *vcpu)
 		if (kvm_is_error_hva(vmaddr))
 			return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
 
-		down_read(&current->mm->mmap_sem);
+		mm_read_lock(current->mm);
 		rc = cond_set_guest_storage_key(current->mm, vmaddr, key, &oldkey,
 						m3 & SSKE_NQ, m3 & SSKE_MR,
 						m3 & SSKE_MC);
@@ -395,7 +395,7 @@ static int handle_sske(struct kvm_vcpu *vcpu)
 					      FAULT_FLAG_WRITE, &unlocked);
 			rc = !rc ? -EAGAIN : rc;
 		}
-		up_read(&current->mm->mmap_sem);
+		mm_read_unlock(current->mm);
 		if (rc == -EFAULT)
 			return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
 		if (rc < 0)
@@ -1084,7 +1084,7 @@ static int handle_pfmf(struct kvm_vcpu *vcpu)
 
 			if (rc)
 				return rc;
-			down_read(&current->mm->mmap_sem);
+			mm_read_lock(current->mm);
 			rc = cond_set_guest_storage_key(current->mm, vmaddr,
 							key, NULL, nq, mr, mc);
 			if (rc < 0) {
@@ -1092,7 +1092,7 @@ static int handle_pfmf(struct kvm_vcpu *vcpu)
 						      FAULT_FLAG_WRITE, &unlocked);
 				rc = !rc ? -EAGAIN : rc;
 			}
-			up_read(&current->mm->mmap_sem);
+			mm_read_unlock(current->mm);
 			if (rc == -EFAULT)
 				return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
 			if (rc == -EAGAIN)
@@ -1213,9 +1213,9 @@ static int handle_essa(struct kvm_vcpu *vcpu)
 		 * already correct, we do nothing and avoid the lock.
 		 */
 		if (vcpu->kvm->mm->context.uses_cmm == 0) {
-			down_write(&vcpu->kvm->mm->mmap_sem);
+			mm_write_lock(vcpu->kvm->mm);
 			vcpu->kvm->mm->context.uses_cmm = 1;
-			up_write(&vcpu->kvm->mm->mmap_sem);
+			mm_write_unlock(vcpu->kvm->mm);
 		}
 		/*
 		 * If we are here, we are supposed to have CMMA enabled in
@@ -1232,11 +1232,11 @@ static int handle_essa(struct kvm_vcpu *vcpu)
 	} else {
 		int srcu_idx;
 
-		down_read(&vcpu->kvm->mm->mmap_sem);
+		mm_read_lock(vcpu->kvm->mm);
 		srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
 		i = __do_essa(vcpu, orc);
 		srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
-		up_read(&vcpu->kvm->mm->mmap_sem);
+		mm_read_unlock(vcpu->kvm->mm);
 		if (i < 0)
 			return i;
 		/* Account for the possible extra cbrl entry */
@@ -1244,10 +1244,10 @@ static int handle_essa(struct kvm_vcpu *vcpu)
 	}
 	vcpu->arch.sie_block->cbrlo &= PAGE_MASK;	/* reset nceo */
 	cbrlo = phys_to_virt(vcpu->arch.sie_block->cbrlo);
-	down_read(&gmap->mm->mmap_sem);
+	mm_read_lock(gmap->mm);
 	for (i = 0; i < entries; ++i)
 		__gmap_zap(gmap, cbrlo[i]);
-	up_read(&gmap->mm->mmap_sem);
+	mm_read_unlock(gmap->mm);
 	return 0;
 }
 
diff --git arch/s390/mm/fault.c arch/s390/mm/fault.c
index 7b0bb475c166..757793fbbb2b 100644
--- arch/s390/mm/fault.c
+++ arch/s390/mm/fault.c
@@ -434,7 +434,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
 		flags |= FAULT_FLAG_USER;
 	if (access == VM_WRITE || (trans_exc_code & store_indication) == 0x400)
 		flags |= FAULT_FLAG_WRITE;
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 
 	gmap = NULL;
 	if (IS_ENABLED(CONFIG_PGSTE) && type == GMAP_FAULT) {
@@ -519,7 +519,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
 			flags &= ~(FAULT_FLAG_ALLOW_RETRY |
 				   FAULT_FLAG_RETRY_NOWAIT);
 			flags |= FAULT_FLAG_TRIED;
-			down_read(&mm->mmap_sem);
+			mm_read_lock(mm);
 			goto retry;
 		}
 	}
@@ -537,7 +537,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
 	}
 	fault = 0;
 out_up:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 out:
 	return fault;
 }
diff --git arch/s390/mm/gmap.c arch/s390/mm/gmap.c
index edcdca97e85e..523f27a7ccaa 100644
--- arch/s390/mm/gmap.c
+++ arch/s390/mm/gmap.c
@@ -405,10 +405,10 @@ int gmap_unmap_segment(struct gmap *gmap, unsigned long to, unsigned long len)
 		return -EINVAL;
 
 	flush = 0;
-	down_write(&gmap->mm->mmap_sem);
+	mm_write_lock(gmap->mm);
 	for (off = 0; off < len; off += PMD_SIZE)
 		flush |= __gmap_unmap_by_gaddr(gmap, to + off);
-	up_write(&gmap->mm->mmap_sem);
+	mm_write_unlock(gmap->mm);
 	if (flush)
 		gmap_flush_tlb(gmap);
 	return 0;
@@ -438,7 +438,7 @@ int gmap_map_segment(struct gmap *gmap, unsigned long from,
 		return -EINVAL;
 
 	flush = 0;
-	down_write(&gmap->mm->mmap_sem);
+	mm_write_lock(gmap->mm);
 	for (off = 0; off < len; off += PMD_SIZE) {
 		/* Remove old translation */
 		flush |= __gmap_unmap_by_gaddr(gmap, to + off);
@@ -448,7 +448,7 @@ int gmap_map_segment(struct gmap *gmap, unsigned long from,
 				      (void *) from + off))
 			break;
 	}
-	up_write(&gmap->mm->mmap_sem);
+	mm_write_unlock(gmap->mm);
 	if (flush)
 		gmap_flush_tlb(gmap);
 	if (off >= len)
@@ -495,9 +495,9 @@ unsigned long gmap_translate(struct gmap *gmap, unsigned long gaddr)
 {
 	unsigned long rc;
 
-	down_read(&gmap->mm->mmap_sem);
+	mm_read_lock(gmap->mm);
 	rc = __gmap_translate(gmap, gaddr);
-	up_read(&gmap->mm->mmap_sem);
+	mm_read_unlock(gmap->mm);
 	return rc;
 }
 EXPORT_SYMBOL_GPL(gmap_translate);
@@ -640,7 +640,7 @@ int gmap_fault(struct gmap *gmap, unsigned long gaddr,
 	int rc;
 	bool unlocked;
 
-	down_read(&gmap->mm->mmap_sem);
+	mm_read_lock(gmap->mm);
 
 retry:
 	unlocked = false;
@@ -663,7 +663,7 @@ int gmap_fault(struct gmap *gmap, unsigned long gaddr,
 
 	rc = __gmap_link(gmap, gaddr, vmaddr);
 out_up:
-	up_read(&gmap->mm->mmap_sem);
+	mm_read_unlock(gmap->mm);
 	return rc;
 }
 EXPORT_SYMBOL_GPL(gmap_fault);
@@ -696,7 +696,7 @@ void gmap_discard(struct gmap *gmap, unsigned long from, unsigned long to)
 	unsigned long gaddr, vmaddr, size;
 	struct vm_area_struct *vma;
 
-	down_read(&gmap->mm->mmap_sem);
+	mm_read_lock(gmap->mm);
 	for (gaddr = from; gaddr < to;
 	     gaddr = (gaddr + PMD_SIZE) & PMD_MASK) {
 		/* Find the vm address for the guest address */
@@ -719,7 +719,7 @@ void gmap_discard(struct gmap *gmap, unsigned long from, unsigned long to)
 		size = min(to - gaddr, PMD_SIZE - (gaddr & ~PMD_MASK));
 		zap_page_range(vma, vmaddr, size);
 	}
-	up_read(&gmap->mm->mmap_sem);
+	mm_read_unlock(gmap->mm);
 }
 EXPORT_SYMBOL_GPL(gmap_discard);
 
@@ -1102,9 +1102,9 @@ int gmap_mprotect_notify(struct gmap *gmap, unsigned long gaddr,
 		return -EINVAL;
 	if (!MACHINE_HAS_ESOP && prot == PROT_READ)
 		return -EINVAL;
-	down_read(&gmap->mm->mmap_sem);
+	mm_read_lock(gmap->mm);
 	rc = gmap_protect_range(gmap, gaddr, len, prot, GMAP_NOTIFY_MPROT);
-	up_read(&gmap->mm->mmap_sem);
+	mm_read_unlock(gmap->mm);
 	return rc;
 }
 EXPORT_SYMBOL_GPL(gmap_mprotect_notify);
@@ -1692,11 +1692,11 @@ struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce,
 	}
 	spin_unlock(&parent->shadow_lock);
 	/* protect after insertion, so it will get properly invalidated */
-	down_read(&parent->mm->mmap_sem);
+	mm_read_lock(parent->mm);
 	rc = gmap_protect_range(parent, asce & _ASCE_ORIGIN,
 				((asce & _ASCE_TABLE_LENGTH) + 1) * PAGE_SIZE,
 				PROT_READ, GMAP_NOTIFY_SHADOW);
-	up_read(&parent->mm->mmap_sem);
+	mm_read_unlock(parent->mm);
 	spin_lock(&parent->shadow_lock);
 	new->initialized = true;
 	if (rc) {
@@ -2538,12 +2538,12 @@ int s390_enable_sie(void)
 	/* Fail if the page tables are 2K */
 	if (!mm_alloc_pgste(mm))
 		return -EINVAL;
-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm);
 	mm->context.has_pgste = 1;
 	/* split thp mappings and disable thp for future mappings */
 	thp_split_mm(mm);
 	walk_page_range(mm, 0, TASK_SIZE, &zap_zero_walk_ops, NULL);
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(s390_enable_sie);
@@ -2596,7 +2596,7 @@ int s390_enable_skey(void)
 	struct vm_area_struct *vma;
 	int rc = 0;
 
-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm);
 	if (mm_uses_skeys(mm))
 		goto out_up;
 
@@ -2614,7 +2614,7 @@ int s390_enable_skey(void)
 	walk_page_range(mm, 0, TASK_SIZE, &enable_skey_walk_ops, NULL);
 
 out_up:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	return rc;
 }
 EXPORT_SYMBOL_GPL(s390_enable_skey);
@@ -2635,8 +2635,8 @@ static const struct mm_walk_ops reset_cmma_walk_ops = {
 
 void s390_reset_cmma(struct mm_struct *mm)
 {
-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm);
 	walk_page_range(mm, 0, TASK_SIZE, &reset_cmma_walk_ops, NULL);
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 }
 EXPORT_SYMBOL_GPL(s390_reset_cmma);
diff --git arch/s390/pci/pci_mmio.c arch/s390/pci/pci_mmio.c
index 7d42a8794f10..0124284c0374 100644
--- arch/s390/pci/pci_mmio.c
+++ arch/s390/pci/pci_mmio.c
@@ -18,7 +18,7 @@ static long get_pfn(unsigned long user_addr, unsigned long access,
 	struct vm_area_struct *vma;
 	long ret;
 
-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm);
 	ret = -EINVAL;
 	vma = find_vma(current->mm, user_addr);
 	if (!vma)
@@ -28,7 +28,7 @@ static long get_pfn(unsigned long user_addr, unsigned long access,
 		goto out;
 	ret = follow_pfn(vma, user_addr, pfn);
 out:
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm);
 	return ret;
 }
 
diff --git arch/sh/kernel/sys_sh.c arch/sh/kernel/sys_sh.c
index f8afc014e084..70d0a8c2e42e 100644
--- arch/sh/kernel/sys_sh.c
+++ arch/sh/kernel/sys_sh.c
@@ -69,10 +69,10 @@ asmlinkage int sys_cacheflush(unsigned long addr, unsigned long len, int op)
 	if (addr + len < addr)
 		return -EFAULT;
 
-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm);
 	vma = find_vma (current->mm, addr);
 	if (vma == NULL || addr < vma->vm_start || addr + len > vma->vm_end) {
-		up_read(&current->mm->mmap_sem);
+		mm_read_unlock(current->mm);
 		return -EFAULT;
 	}
 
@@ -91,6 +91,6 @@ asmlinkage int sys_cacheflush(unsigned long addr, unsigned long len, int op)
 	if (op & CACHEFLUSH_I)
 		flush_icache_range(addr, addr+len);
 
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm);
 	return 0;
 }
diff --git arch/sh/kernel/vsyscall/vsyscall.c arch/sh/kernel/vsyscall/vsyscall.c
index 98494480f048..ad30993141a6 100644
--- arch/sh/kernel/vsyscall/vsyscall.c
+++ arch/sh/kernel/vsyscall/vsyscall.c
@@ -61,7 +61,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	unsigned long addr;
 	int ret;
 
-	if (down_write_killable(&mm->mmap_sem))
+	if (mm_write_lock_killable(mm))
 		return -EINTR;
 
 	addr = get_unmapped_area(NULL, 0, PAGE_SIZE, 0, 0);
@@ -80,7 +80,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	current->mm->context.vdso = (void *)addr;
 
 up_fail:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	return ret;
 }
 
diff --git arch/sh/mm/fault.c arch/sh/mm/fault.c
index 5f51456f4fc7..63f3aec63972 100644
--- arch/sh/mm/fault.c
+++ arch/sh/mm/fault.c
@@ -261,7 +261,7 @@ __bad_area(struct pt_regs *regs, unsigned long error_code,
 	 * Something tried to access memory that isn't in our memory map..
 	 * Fix it, but check if it's kernel or user first..
 	 */
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	__bad_area_nosemaphore(regs, error_code, address, si_code);
 }
@@ -285,7 +285,7 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address)
 	struct task_struct *tsk = current;
 	struct mm_struct *mm = tsk->mm;
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	/* Kernel mode? Handle exceptions or die: */
 	if (!user_mode(regs))
@@ -304,7 +304,7 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
 	 */
 	if (fatal_signal_pending(current)) {
 		if (!(fault & VM_FAULT_RETRY))
-			up_read(&current->mm->mmap_sem);
+			mm_read_unlock(current->mm);
 		if (!user_mode(regs))
 			no_context(regs, error_code, address);
 		return 1;
@@ -316,11 +316,11 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
 	if (fault & VM_FAULT_OOM) {
 		/* Kernel mode? Handle exceptions or die: */
 		if (!user_mode(regs)) {
-			up_read(&current->mm->mmap_sem);
+			mm_read_unlock(current->mm);
 			no_context(regs, error_code, address);
 			return 1;
 		}
-		up_read(&current->mm->mmap_sem);
+		mm_read_unlock(current->mm);
 
 		/*
 		 * We ran out of memory, call the OOM killer, and return the
@@ -424,7 +424,7 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
 	}
 
 retry:
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 
 	vma = find_vma(mm, address);
 	if (unlikely(!vma)) {
@@ -493,5 +493,5 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
 		}
 	}
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 }
diff --git arch/sparc/mm/fault_32.c arch/sparc/mm/fault_32.c
index 89976c9b936c..2435daad854e 100644
--- arch/sparc/mm/fault_32.c
+++ arch/sparc/mm/fault_32.c
@@ -196,7 +196,7 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
 
 retry:
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 
 	if (!from_user && address >= PAGE_OFFSET)
 		goto bad_area;
@@ -273,7 +273,7 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
 		}
 	}
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	return;
 
 	/*
@@ -281,7 +281,7 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
 	 * Fix it, but check if it's kernel or user first..
 	 */
 bad_area:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 bad_area_nosemaphore:
 	/* User mode accesses just cause a SIGSEGV */
@@ -330,7 +330,7 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
  * us unable to handle the page fault gracefully.
  */
 out_of_memory:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	if (from_user) {
 		pagefault_out_of_memory();
 		return;
@@ -338,7 +338,7 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
 	goto no_context;
 
 do_sigbus:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	do_fault_siginfo(BUS_ADRERR, SIGBUS, regs, text_fault);
 	if (!from_user)
 		goto no_context;
@@ -392,7 +392,7 @@ static void force_user_fault(unsigned long address, int write)
 
 	code = SEGV_MAPERR;
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	vma = find_vma(mm, address);
 	if (!vma)
 		goto bad_area;
@@ -417,15 +417,15 @@ static void force_user_fault(unsigned long address, int write)
 	case VM_FAULT_OOM:
 		goto do_sigbus;
 	}
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	return;
 bad_area:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	__do_fault_siginfo(code, SIGSEGV, tsk->thread.kregs, address);
 	return;
 
 do_sigbus:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	__do_fault_siginfo(BUS_ADRERR, SIGBUS, tsk->thread.kregs, address);
 }
 
diff --git arch/sparc/mm/fault_64.c arch/sparc/mm/fault_64.c
index 2371fb6b97e4..918b42b6467f 100644
--- arch/sparc/mm/fault_64.c
+++ arch/sparc/mm/fault_64.c
@@ -315,7 +315,7 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
 
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
 
-	if (!down_read_trylock(&mm->mmap_sem)) {
+	if (!mm_read_trylock(mm)) {
 		if ((regs->tstate & TSTATE_PRIV) &&
 		    !search_exception_tables(regs->tpc)) {
 			insn = get_fault_insn(regs, insn);
@@ -323,7 +323,7 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
 		}
 
 retry:
-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm);
 	}
 
 	if (fault_code & FAULT_CODE_BAD_RA)
@@ -456,7 +456,7 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
 			goto retry;
 		}
 	}
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	mm_rss = get_mm_rss(mm);
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE)
@@ -487,7 +487,7 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
 	 */
 bad_area:
 	insn = get_fault_insn(regs, insn);
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 handle_kernel_fault:
 	do_kernel_fault(regs, si_code, fault_code, insn, address);
@@ -499,7 +499,7 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
  */
 out_of_memory:
 	insn = get_fault_insn(regs, insn);
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	if (!(regs->tstate & TSTATE_PRIV)) {
 		pagefault_out_of_memory();
 		goto exit_exception;
@@ -512,7 +512,7 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
 
 do_sigbus:
 	insn = get_fault_insn(regs, insn);
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	/*
 	 * Send a sigbus, regardless of whether we were in kernel
diff --git arch/sparc/vdso/vma.c arch/sparc/vdso/vma.c
index 9961b0f81693..a2db050aba7a 100644
--- arch/sparc/vdso/vma.c
+++ arch/sparc/vdso/vma.c
@@ -366,7 +366,7 @@ static int map_vdso(const struct vdso_image *image,
 	unsigned long text_start, addr = 0;
 	int ret = 0;
 
-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm);
 
 	/*
 	 * First, get an unmapped region: then randomize it, and make sure that
@@ -422,7 +422,7 @@ static int map_vdso(const struct vdso_image *image,
 	if (ret)
 		current->mm->context.vdso = NULL;
 
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	return ret;
 }
 
diff --git arch/um/include/asm/mmu_context.h arch/um/include/asm/mmu_context.h
index 5aee0626e390..7bd591231e2d 100644
--- arch/um/include/asm/mmu_context.h
+++ arch/um/include/asm/mmu_context.h
@@ -54,7 +54,7 @@ static inline void activate_mm(struct mm_struct *old, struct mm_struct *new)
 	__switch_mm(&new->context.id);
 	down_write_nested(&new->mmap_sem, 1);
 	uml_setup_stubs(new);
-	up_write(&new->mmap_sem);
+	mm_write_unlock(new);
 }
 
 static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next, 
diff --git arch/um/kernel/tlb.c arch/um/kernel/tlb.c
index 80a358c6d652..1c9b198a6278 100644
--- arch/um/kernel/tlb.c
+++ arch/um/kernel/tlb.c
@@ -350,7 +350,7 @@ void fix_range_common(struct mm_struct *mm, unsigned long start_addr,
 		printk(KERN_ERR "fix_range_common: failed, killing current "
 		       "process: %d\n", task_tgid_vnr(current));
 		/* We are under mmap_sem, release it such that current can terminate */
-		up_write(&current->mm->mmap_sem);
+		mm_write_unlock(current->mm);
 		force_sig(SIGKILL);
 		do_signal(&current->thread.regs);
 	}
diff --git arch/um/kernel/trap.c arch/um/kernel/trap.c
index 818553064f04..8c0f0882ca8f 100644
--- arch/um/kernel/trap.c
+++ arch/um/kernel/trap.c
@@ -47,7 +47,7 @@ int handle_page_fault(unsigned long address, unsigned long ip,
 	if (is_user)
 		flags |= FAULT_FLAG_USER;
 retry:
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	vma = find_vma(mm, address);
 	if (!vma)
 		goto out;
@@ -124,7 +124,7 @@ int handle_page_fault(unsigned long address, unsigned long ip,
 #endif
 	flush_tlb_page(vma, address);
 out:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 out_nosemaphore:
 	return err;
 
@@ -133,7 +133,7 @@ int handle_page_fault(unsigned long address, unsigned long ip,
 	 * We ran out of memory, call the OOM killer, and return the userspace
 	 * (which will retry the fault, or kill us if we got oom-killed).
 	 */
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	if (!is_user)
 		goto out_nosemaphore;
 	pagefault_out_of_memory();
diff --git arch/unicore32/mm/fault.c arch/unicore32/mm/fault.c
index 76342de9cf8c..2f20cdac675f 100644
--- arch/unicore32/mm/fault.c
+++ arch/unicore32/mm/fault.c
@@ -224,12 +224,12 @@ static int do_pf(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 	 * validly references user space from well defined areas of the code,
 	 * we can bug out early if this is from code which shouldn't.
 	 */
-	if (!down_read_trylock(&mm->mmap_sem)) {
+	if (!mm_read_trylock(mm)) {
 		if (!user_mode(regs)
 		    && !search_exception_tables(regs->UCreg_pc))
 			goto no_context;
 retry:
-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm);
 	} else {
 		/*
 		 * The above down_read_trylock() might have succeeded in
@@ -266,7 +266,7 @@ static int do_pf(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 		}
 	}
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	/*
 	 * Handle the "normal" case first - VM_FAULT_MAJOR
diff --git arch/x86/entry/vdso/vma.c arch/x86/entry/vdso/vma.c
index f5937742b290..87278df00a46 100644
--- arch/x86/entry/vdso/vma.c
+++ arch/x86/entry/vdso/vma.c
@@ -150,7 +150,7 @@ static int map_vdso(const struct vdso_image *image, unsigned long addr)
 	unsigned long text_start;
 	int ret = 0;
 
-	if (down_write_killable(&mm->mmap_sem))
+	if (mm_write_lock_killable(mm))
 		return -EINTR;
 
 	addr = get_unmapped_area(NULL, addr,
@@ -193,7 +193,7 @@ static int map_vdso(const struct vdso_image *image, unsigned long addr)
 	}
 
 up_fail:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	return ret;
 }
 
@@ -255,7 +255,7 @@ int map_vdso_once(const struct vdso_image *image, unsigned long addr)
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 
-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm);
 	/*
 	 * Check if we have already mapped vdso blob - fail to prevent
 	 * abusing from userspace install_speciall_mapping, which may
@@ -266,11 +266,11 @@ int map_vdso_once(const struct vdso_image *image, unsigned long addr)
 	for (vma = mm->mmap; vma; vma = vma->vm_next) {
 		if (vma_is_special_mapping(vma, &vdso_mapping) ||
 				vma_is_special_mapping(vma, &vvar_mapping)) {
-			up_write(&mm->mmap_sem);
+			mm_write_unlock(mm);
 			return -EEXIST;
 		}
 	}
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 
 	return map_vdso(image, addr);
 }
diff --git arch/x86/kernel/vm86_32.c arch/x86/kernel/vm86_32.c
index a76c12b38e92..6a05cf416e78 100644
--- arch/x86/kernel/vm86_32.c
+++ arch/x86/kernel/vm86_32.c
@@ -172,7 +172,7 @@ static void mark_screen_rdonly(struct mm_struct *mm)
 	pte_t *pte;
 	int i;
 
-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm);
 	pgd = pgd_offset(mm, 0xA0000);
 	if (pgd_none_or_clear_bad(pgd))
 		goto out;
@@ -198,7 +198,7 @@ static void mark_screen_rdonly(struct mm_struct *mm)
 	}
 	pte_unmap_unlock(pte, ptl);
 out:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	flush_tlb_mm_range(mm, 0xA0000, 0xA0000 + 32*PAGE_SIZE, PAGE_SHIFT, false);
 }
 
diff --git arch/x86/mm/debug_pagetables.c arch/x86/mm/debug_pagetables.c
index 39001a401eff..6e3f19779385 100644
--- arch/x86/mm/debug_pagetables.c
+++ arch/x86/mm/debug_pagetables.c
@@ -16,9 +16,9 @@ DEFINE_SHOW_ATTRIBUTE(ptdump);
 static int ptdump_curknl_show(struct seq_file *m, void *v)
 {
 	if (current->mm->pgd) {
-		down_read(&current->mm->mmap_sem);
+		mm_read_lock(current->mm);
 		ptdump_walk_pgd_level_debugfs(m, current->mm->pgd, false);
-		up_read(&current->mm->mmap_sem);
+		mm_read_unlock(current->mm);
 	}
 	return 0;
 }
@@ -29,9 +29,9 @@ DEFINE_SHOW_ATTRIBUTE(ptdump_curknl);
 static int ptdump_curusr_show(struct seq_file *m, void *v)
 {
 	if (current->mm->pgd) {
-		down_read(&current->mm->mmap_sem);
+		mm_read_lock(current->mm);
 		ptdump_walk_pgd_level_debugfs(m, current->mm->pgd, true);
-		up_read(&current->mm->mmap_sem);
+		mm_read_unlock(current->mm);
 	}
 	return 0;
 }
diff --git arch/x86/mm/fault.c arch/x86/mm/fault.c
index 304d31d8cbbc..a8ce9e160b72 100644
--- arch/x86/mm/fault.c
+++ arch/x86/mm/fault.c
@@ -928,7 +928,7 @@ __bad_area(struct pt_regs *regs, unsigned long error_code,
 	 * Something tried to access memory that isn't in our memory map..
 	 * Fix it, but check if it's kernel or user first..
 	 */
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	__bad_area_nosemaphore(regs, error_code, address, pkey, si_code);
 }
@@ -1379,7 +1379,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 	 * 1. Failed to acquire mmap_sem, and
 	 * 2. The access did not originate in userspace.
 	 */
-	if (unlikely(!down_read_trylock(&mm->mmap_sem))) {
+	if (unlikely(!mm_read_trylock(mm))) {
 		if (!user_mode(regs) && !search_exception_tables(regs->ip)) {
 			/*
 			 * Fault from code in kernel from
@@ -1389,7 +1389,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 			return;
 		}
 retry:
-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm);
 	} else {
 		/*
 		 * The above down_read_trylock() might have succeeded in
@@ -1464,7 +1464,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 		return;
 	}
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	if (unlikely(fault & VM_FAULT_ERROR)) {
 		mm_fault_error(regs, hw_error_code, address, fault);
 		return;
diff --git arch/x86/mm/mpx.c arch/x86/mm/mpx.c
index 895fb7a9294d..3835c18020b8 100644
--- arch/x86/mm/mpx.c
+++ arch/x86/mm/mpx.c
@@ -52,10 +52,10 @@ static unsigned long mpx_mmap(unsigned long len)
 	if (len != mpx_bt_size_bytes(mm))
 		return -EINVAL;
 
-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm);
 	addr = do_mmap(NULL, 0, len, PROT_READ | PROT_WRITE,
 		       MAP_ANONYMOUS | MAP_PRIVATE, VM_MPX, 0, &populate, NULL);
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	if (populate)
 		mm_populate(addr, populate);
 
@@ -227,7 +227,7 @@ int mpx_enable_management(void)
 	 * unmap path; we can just use mm->context.bd_addr instead.
 	 */
 	bd_base = mpx_get_bounds_dir();
-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm);
 
 	/* MPX doesn't support addresses above 47 bits yet. */
 	if (find_vma(mm, DEFAULT_MAP_WINDOW)) {
@@ -241,7 +241,7 @@ int mpx_enable_management(void)
 	if (mm->context.bd_addr == MPX_INVALID_BOUNDS_DIR)
 		ret = -ENXIO;
 out:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	return ret;
 }
 
@@ -252,9 +252,9 @@ int mpx_disable_management(void)
 	if (!cpu_feature_enabled(X86_FEATURE_MPX))
 		return -ENXIO;
 
-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm);
 	mm->context.bd_addr = MPX_INVALID_BOUNDS_DIR;
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	return 0;
 }
 
diff --git arch/x86/um/vdso/vma.c arch/x86/um/vdso/vma.c
index 9e7c4aba6c3a..16f50eca50e3 100644
--- arch/x86/um/vdso/vma.c
+++ arch/x86/um/vdso/vma.c
@@ -58,7 +58,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	if (!vdso_enabled)
 		return 0;
 
-	if (down_write_killable(&mm->mmap_sem))
+	if (mm_write_lock_killable(mm))
 		return -EINTR;
 
 	err = install_special_mapping(mm, um_vdso_addr, PAGE_SIZE,
@@ -66,7 +66,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 		VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
 		vdsop);
 
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 
 	return err;
 }
diff --git arch/xtensa/mm/fault.c arch/xtensa/mm/fault.c
index bee30a77cd70..b6f1a86eea7f 100644
--- arch/xtensa/mm/fault.c
+++ arch/xtensa/mm/fault.c
@@ -74,7 +74,7 @@ void do_page_fault(struct pt_regs *regs)
 	if (user_mode(regs))
 		flags |= FAULT_FLAG_USER;
 retry:
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	vma = find_vma(mm, address);
 
 	if (!vma)
@@ -140,7 +140,7 @@ void do_page_fault(struct pt_regs *regs)
 		}
 	}
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
 	if (flags & VM_FAULT_MAJOR)
 		perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
@@ -153,7 +153,7 @@ void do_page_fault(struct pt_regs *regs)
 	 * Fix it, but check if it's kernel or user first..
 	 */
 bad_area:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	if (user_mode(regs)) {
 		current->thread.bad_vaddr = address;
 		current->thread.error_code = is_write;
@@ -168,7 +168,7 @@ void do_page_fault(struct pt_regs *regs)
 	 * us unable to handle the page fault gracefully.
 	 */
 out_of_memory:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	if (!user_mode(regs))
 		bad_page_fault(regs, address, SIGKILL);
 	else
@@ -176,7 +176,7 @@ void do_page_fault(struct pt_regs *regs)
 	return;
 
 do_sigbus:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	/* Send a sigbus, regardless of whether we were in kernel
 	 * or user mode.
diff --git drivers/android/binder_alloc.c drivers/android/binder_alloc.c
index 2d8b9b91dee0..caddf155fcab 100644
--- drivers/android/binder_alloc.c
+++ drivers/android/binder_alloc.c
@@ -212,7 +212,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
 		mm = alloc->vma_vm_mm;
 
 	if (mm) {
-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm);
 		vma = alloc->vma;
 	}
 
@@ -270,7 +270,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
 		trace_binder_alloc_page_end(alloc, index);
 	}
 	if (mm) {
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 		mmput(mm);
 	}
 	return 0;
@@ -303,7 +303,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
 	}
 err_no_vma:
 	if (mm) {
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 		mmput(mm);
 	}
 	return vma ? -ENOMEM : -ESRCH;
diff --git drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 888209eb8cec..4ad4a09cf588 100644
--- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1343,9 +1343,9 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 	 * concurrently and the queues are actually stopped
 	 */
 	if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) {
-		down_write(&current->mm->mmap_sem);
+		mm_write_lock(current->mm);
 		is_invalid_userptr = atomic_read(&mem->invalid);
-		up_write(&current->mm->mmap_sem);
+		mm_write_unlock(current->mm);
 	}
 
 	mutex_lock(&mem->lock);
diff --git drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 2616e2eafdeb..d6d57c247ac6 100644
--- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -838,7 +838,7 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
 		goto out_free_ranges;
 	}
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	vma = find_vma(mm, start);
 	if (unlikely(!vma || start < vma->vm_start)) {
 		r = -EFAULT;
@@ -849,15 +849,15 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
 		r = -EPERM;
 		goto out_unlock;
 	}
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	timeout = jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
 
 retry:
 	range->notifier_seq = mmu_interval_read_begin(&bo->notifier);
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	r = hmm_range_fault(range, 0);
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	if (unlikely(r <= 0)) {
 		/*
 		 * FIXME: This timeout should encompass the retry from
@@ -886,7 +886,7 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
 	return 0;
 
 out_unlock:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 out_free_pfns:
 	kvfree(range->pfns);
 out_free_ranges:
diff --git drivers/gpu/drm/amd/amdkfd/kfd_events.c drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 908081c85de1..96e299d0f2a7 100644
--- drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -902,7 +902,7 @@ void kfd_signal_iommu_event(struct kfd_dev *dev, unsigned int pasid,
 
 	memset(&memory_exception_data, 0, sizeof(memory_exception_data));
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	vma = find_vma(mm, address);
 
 	memory_exception_data.gpu_id = dev->id;
@@ -925,7 +925,7 @@ void kfd_signal_iommu_event(struct kfd_dev *dev, unsigned int pasid,
 			memory_exception_data.failure.NoExecute = 0;
 	}
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	mmput(mm);
 
 	pr_debug("notpresent %d, noexecute %d, readonly %d\n",
diff --git drivers/gpu/drm/i915/gem/i915_gem_mman.c drivers/gpu/drm/i915/gem/i915_gem_mman.c
index e3002849844b..f1e3ae6bf3c3 100644
--- drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -89,7 +89,7 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
 		struct mm_struct *mm = current->mm;
 		struct vm_area_struct *vma;
 
-		if (down_write_killable(&mm->mmap_sem)) {
+		if (mm_write_lock_killable(mm)) {
 			addr = -EINTR;
 			goto err;
 		}
@@ -99,7 +99,7 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
 				pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
 		else
 			addr = -ENOMEM;
-		up_write(&mm->mmap_sem);
+		mm_write_unlock(mm);
 		if (IS_ERR_VALUE(addr))
 			goto err;
 	}
diff --git drivers/gpu/drm/i915/gem/i915_gem_userptr.c drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 0dbb44d30885..7646d4e77c63 100644
--- drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -201,7 +201,7 @@ i915_mmu_notifier_find(struct i915_mm_struct *mm)
 	if (IS_ERR(mn))
 		err = PTR_ERR(mn);
 
-	down_write(&mm->mm->mmap_sem);
+	mm_write_lock(mm->mm);
 	mutex_lock(&mm->i915->mm_lock);
 	if (mm->mn == NULL && !err) {
 		/* Protected by mmap_sem (write-lock) */
@@ -218,7 +218,7 @@ i915_mmu_notifier_find(struct i915_mm_struct *mm)
 		err = 0;
 	}
 	mutex_unlock(&mm->i915->mm_lock);
-	up_write(&mm->mm->mmap_sem);
+	mm_write_unlock(mm->mm);
 
 	if (mn && !IS_ERR(mn))
 		kfree(mn);
@@ -466,7 +466,7 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
 
 		ret = -EFAULT;
 		if (mmget_not_zero(mm)) {
-			down_read(&mm->mmap_sem);
+			mm_read_lock(mm);
 			while (pinned < npages) {
 				ret = get_user_pages_remote
 					(work->task, mm,
@@ -479,7 +479,7 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
 
 				pinned += ret;
 			}
-			up_read(&mm->mmap_sem);
+			mm_read_unlock(mm);
 			mmput(mm);
 		}
 	}
diff --git drivers/gpu/drm/nouveau/nouveau_svm.c drivers/gpu/drm/nouveau/nouveau_svm.c
index df9bf1fd1bc0..2a56b3623e81 100644
--- drivers/gpu/drm/nouveau/nouveau_svm.c
+++ drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -169,7 +169,7 @@ nouveau_svmm_bind(struct drm_device *dev, void *data,
 	 */
 
 	mm = get_task_mm(current);
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 
 	for (addr = args->va_start, end = args->va_start + size; addr < end;) {
 		struct vm_area_struct *vma;
@@ -192,7 +192,7 @@ nouveau_svmm_bind(struct drm_device *dev, void *data,
 	 */
 	args->result = 0;
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	mmput(mm);
 
 	return 0;
@@ -342,7 +342,7 @@ nouveau_svmm_init(struct drm_device *dev, void *data,
 	if (ret)
 		goto out_free;
 
-	down_write(&current->mm->mmap_sem);
+	mm_write_lock(current->mm);
 	svmm->notifier.ops = &nouveau_mn_ops;
 	ret = __mmu_notifier_register(&svmm->notifier, current->mm);
 	if (ret)
@@ -351,12 +351,12 @@ nouveau_svmm_init(struct drm_device *dev, void *data,
 
 	cli->svm.svmm = svmm;
 	cli->svm.cli = cli;
-	up_write(&current->mm->mmap_sem);
+	mm_write_unlock(current->mm);
 	mutex_unlock(&cli->mutex);
 	return 0;
 
 out_mm_unlock:
-	up_write(&current->mm->mmap_sem);
+	mm_write_unlock(current->mm);
 out_free:
 	mutex_unlock(&cli->mutex);
 	kfree(svmm);
@@ -540,9 +540,9 @@ static int nouveau_range_fault(struct nouveau_svmm *svmm,
 		range.notifier_seq = mmu_interval_read_begin(range.notifier);
 		range.default_flags = 0;
 		range.pfn_flags_mask = -1UL;
-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm);
 		ret = hmm_range_fault(&range, 0);
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 		if (ret <= 0) {
 			if (ret == 0 || ret == -EBUSY)
 				continue;
@@ -671,18 +671,18 @@ nouveau_svm_fault(struct nvif_notify *notify)
 		/* Intersect fault window with the CPU VMA, cancelling
 		 * the fault if the address is invalid.
 		 */
-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm);
 		vma = find_vma_intersection(mm, start, limit);
 		if (!vma) {
 			SVMM_ERR(svmm, "wndw %016llx-%016llx", start, limit);
-			up_read(&mm->mmap_sem);
+			mm_read_unlock(mm);
 			mmput(mm);
 			nouveau_svm_fault_cancel_fault(svm, buffer->fault[fi]);
 			continue;
 		}
 		start = max_t(u64, start, vma->vm_start);
 		limit = min_t(u64, limit, vma->vm_end);
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 		SVMM_DBG(svmm, "wndw %016llx-%016llx", start, limit);
 
 		if (buffer->fault[fi]->addr != start) {
diff --git drivers/gpu/drm/radeon/radeon_cs.c drivers/gpu/drm/radeon/radeon_cs.c
index 7b5460678382..2486adcf7d91 100644
--- drivers/gpu/drm/radeon/radeon_cs.c
+++ drivers/gpu/drm/radeon/radeon_cs.c
@@ -196,12 +196,12 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
 		p->vm_bos = radeon_vm_get_bos(p->rdev, p->ib.vm,
 					      &p->validated);
 	if (need_mmap_lock)
-		down_read(&current->mm->mmap_sem);
+		mm_read_lock(current->mm);
 
 	r = radeon_bo_list_validate(p->rdev, &p->ticket, &p->validated, p->ring);
 
 	if (need_mmap_lock)
-		up_read(&current->mm->mmap_sem);
+		mm_read_unlock(current->mm);
 
 	return r;
 }
diff --git drivers/gpu/drm/radeon/radeon_gem.c drivers/gpu/drm/radeon/radeon_gem.c
index 67298a0739cb..ffcc2dcb41b6 100644
--- drivers/gpu/drm/radeon/radeon_gem.c
+++ drivers/gpu/drm/radeon/radeon_gem.c
@@ -341,17 +341,17 @@ int radeon_gem_userptr_ioctl(struct drm_device *dev, void *data,
 	}
 
 	if (args->flags & RADEON_GEM_USERPTR_VALIDATE) {
-		down_read(&current->mm->mmap_sem);
+		mm_read_lock(current->mm);
 		r = radeon_bo_reserve(bo, true);
 		if (r) {
-			up_read(&current->mm->mmap_sem);
+			mm_read_unlock(current->mm);
 			goto release_object;
 		}
 
 		radeon_ttm_placement_from_domain(bo, RADEON_GEM_DOMAIN_GTT);
 		r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
 		radeon_bo_unreserve(bo);
-		up_read(&current->mm->mmap_sem);
+		mm_read_unlock(current->mm);
 		if (r)
 			goto release_object;
 	}
diff --git drivers/gpu/drm/ttm/ttm_bo_vm.c drivers/gpu/drm/ttm/ttm_bo_vm.c
index 11863fbdd5d6..652f125919d2 100644
--- drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -67,7 +67,7 @@ static vm_fault_t ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo,
 			goto out_unlock;
 
 		ttm_bo_get(bo);
-		up_read(&vmf->vma->vm_mm->mmap_sem);
+		mm_read_unlock(vmf->vma->vm_mm);
 		(void) dma_fence_wait(bo->moving, true);
 		dma_resv_unlock(bo->base.resv);
 		ttm_bo_put(bo);
@@ -138,7 +138,7 @@ vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo,
 		if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) {
 			if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {
 				ttm_bo_get(bo);
-				up_read(&vmf->vma->vm_mm->mmap_sem);
+				mm_read_unlock(vmf->vma->vm_mm);
 				(void) ttm_bo_wait_unreserved(bo);
 				ttm_bo_put(bo);
 			}
diff --git drivers/infiniband/core/umem.c drivers/infiniband/core/umem.c
index 7a3b99597ead..d16230ff81db 100644
--- drivers/infiniband/core/umem.c
+++ drivers/infiniband/core/umem.c
@@ -266,14 +266,14 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,
 	sg = umem->sg_head.sgl;
 
 	while (npages) {
-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm);
 		ret = get_user_pages(cur_base,
 				     min_t(unsigned long, npages,
 					   PAGE_SIZE / sizeof (struct page *)),
 				     gup_flags | FOLL_LONGTERM,
 				     page_list, NULL);
 		if (ret < 0) {
-			up_read(&mm->mmap_sem);
+			mm_read_unlock(mm);
 			goto umem_release;
 		}
 
@@ -284,7 +284,7 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,
 			dma_get_max_seg_size(context->device->dma_device),
 			&umem->sg_nents);
 
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 	}
 
 	sg_mark_end(sg);
diff --git drivers/infiniband/core/umem_odp.c drivers/infiniband/core/umem_odp.c
index e42d44e501fd..d1118f56539f 100644
--- drivers/infiniband/core/umem_odp.c
+++ drivers/infiniband/core/umem_odp.c
@@ -246,16 +246,16 @@ struct ib_umem_odp *ib_umem_odp_get(struct ib_udata *udata, unsigned long addr,
 		struct vm_area_struct *vma;
 		struct hstate *h;
 
-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm);
 		vma = find_vma(mm, ib_umem_start(umem_odp));
 		if (!vma || !is_vm_hugetlb_page(vma)) {
-			up_read(&mm->mmap_sem);
+			mm_read_unlock(mm);
 			ret = -EINVAL;
 			goto err_free;
 		}
 		h = hstate_vma(vma);
 		umem_odp->page_shift = huge_page_shift(h);
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 	}
 
 	umem_odp->tgid = get_task_pid(current->group_leader, PIDTYPE_PID);
@@ -443,7 +443,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
 				(bcnt + BIT(page_shift) - 1) >> page_shift,
 				PAGE_SIZE / sizeof(struct page *));
 
-		down_read(&owning_mm->mmap_sem);
+		mm_read_lock(owning_mm);
 		/*
 		 * Note: this might result in redundent page getting. We can
 		 * avoid this by checking dma_list to be 0 before calling
@@ -454,7 +454,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
 		npages = get_user_pages_remote(owning_process, owning_mm,
 				user_virt, gup_num_pages,
 				flags, local_page_list, NULL, NULL);
-		up_read(&owning_mm->mmap_sem);
+		mm_read_unlock(owning_mm);
 
 		if (npages < 0) {
 			if (npages != -EAGAIN)
diff --git drivers/infiniband/core/uverbs_main.c drivers/infiniband/core/uverbs_main.c
index 970d8e31dd65..cc5e25314930 100644
--- drivers/infiniband/core/uverbs_main.c
+++ drivers/infiniband/core/uverbs_main.c
@@ -939,7 +939,7 @@ void uverbs_user_mmap_disassociate(struct ib_uverbs_file *ufile)
 		 * at a time to get the lock ordering right. Typically there
 		 * will only be one mm, so no big deal.
 		 */
-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm);
 		if (!mmget_still_valid(mm))
 			goto skip_mm;
 		mutex_lock(&ufile->umap_lock);
@@ -961,7 +961,7 @@ void uverbs_user_mmap_disassociate(struct ib_uverbs_file *ufile)
 		}
 		mutex_unlock(&ufile->umap_lock);
 	skip_mm:
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 		mmput(mm);
 	}
 }
diff --git drivers/infiniband/hw/mlx4/mr.c drivers/infiniband/hw/mlx4/mr.c
index dfa17bcdcdbc..3a0a872ab868 100644
--- drivers/infiniband/hw/mlx4/mr.c
+++ drivers/infiniband/hw/mlx4/mr.c
@@ -380,7 +380,7 @@ static struct ib_umem *mlx4_get_umem_mr(struct ib_udata *udata, u64 start,
 		unsigned long untagged_start = untagged_addr(start);
 		struct vm_area_struct *vma;
 
-		down_read(&current->mm->mmap_sem);
+		mm_read_lock(current->mm);
 		/*
 		 * FIXME: Ideally this would iterate over all the vmas that
 		 * cover the memory, but for now it requires a single vma to
@@ -395,7 +395,7 @@ static struct ib_umem *mlx4_get_umem_mr(struct ib_udata *udata, u64 start,
 			access_flags |= IB_ACCESS_LOCAL_WRITE;
 		}
 
-		up_read(&current->mm->mmap_sem);
+		mm_read_unlock(current->mm);
 	}
 
 	return ib_umem_get(udata, start, length, access_flags);
diff --git drivers/infiniband/hw/qib/qib_user_pages.c drivers/infiniband/hw/qib/qib_user_pages.c
index 6bf764e41891..754f4fa3fc7a 100644
--- drivers/infiniband/hw/qib/qib_user_pages.c
+++ drivers/infiniband/hw/qib/qib_user_pages.c
@@ -106,18 +106,18 @@ int qib_get_user_pages(unsigned long start_page, size_t num_pages,
 		goto bail;
 	}
 
-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm);
 	for (got = 0; got < num_pages; got += ret) {
 		ret = get_user_pages(start_page + got * PAGE_SIZE,
 				     num_pages - got,
 				     FOLL_LONGTERM | FOLL_WRITE | FOLL_FORCE,
 				     p + got, NULL);
 		if (ret < 0) {
-			up_read(&current->mm->mmap_sem);
+			mm_read_unlock(current->mm);
 			goto bail_release;
 		}
 	}
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm);
 
 	return 0;
 bail_release:
diff --git drivers/infiniband/hw/usnic/usnic_uiom.c drivers/infiniband/hw/usnic/usnic_uiom.c
index 62e6ffa9ad78..a739e74efa73 100644
--- drivers/infiniband/hw/usnic/usnic_uiom.c
+++ drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -123,7 +123,7 @@ static int usnic_uiom_get_pages(unsigned long addr, size_t size, int writable,
 	npages = PAGE_ALIGN(size + (addr & ~PAGE_MASK)) >> PAGE_SHIFT;
 
 	uiomr->owning_mm = mm = current->mm;
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 
 	locked = atomic64_add_return(npages, &current->mm->pinned_vm);
 	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
@@ -187,7 +187,7 @@ static int usnic_uiom_get_pages(unsigned long addr, size_t size, int writable,
 	} else
 		mmgrab(uiomr->owning_mm);
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	free_page((unsigned long) page_list);
 	return ret;
 }
diff --git drivers/infiniband/sw/siw/siw_mem.c drivers/infiniband/sw/siw/siw_mem.c
index e99983f07663..3b4ddd6758c5 100644
--- drivers/infiniband/sw/siw/siw_mem.c
+++ drivers/infiniband/sw/siw/siw_mem.c
@@ -397,7 +397,7 @@ struct siw_umem *siw_umem_get(u64 start, u64 len, bool writable)
 	if (!writable)
 		foll_flags |= FOLL_FORCE;
 
-	down_read(&mm_s->mmap_sem);
+	mm_read_lock(mm_s);
 
 	mlock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 
@@ -441,7 +441,7 @@ struct siw_umem *siw_umem_get(u64 start, u64 len, bool writable)
 		num_pages -= got;
 	}
 out_sem_up:
-	up_read(&mm_s->mmap_sem);
+	mm_read_unlock(mm_s);
 
 	if (rv > 0)
 		return umem;
diff --git drivers/iommu/amd_iommu_v2.c drivers/iommu/amd_iommu_v2.c
index d6d85debd01b..3cd2e96c83a9 100644
--- drivers/iommu/amd_iommu_v2.c
+++ drivers/iommu/amd_iommu_v2.c
@@ -487,7 +487,7 @@ static void do_fault(struct work_struct *work)
 		flags |= FAULT_FLAG_WRITE;
 	flags |= FAULT_FLAG_REMOTE;
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	vma = find_extend_vma(mm, address);
 	if (!vma || address < vma->vm_start)
 		/* failed to get a vma in the right range */
@@ -499,7 +499,7 @@ static void do_fault(struct work_struct *work)
 
 	ret = handle_mm_fault(vma, address, flags);
 out:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	if (ret & VM_FAULT_ERROR)
 		/* failed to service fault */
diff --git drivers/iommu/intel-svm.c drivers/iommu/intel-svm.c
index dca88f9fdf29..ad57bb178fee 100644
--- drivers/iommu/intel-svm.c
+++ drivers/iommu/intel-svm.c
@@ -591,7 +591,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 		if (!is_canonical_address(address))
 			goto bad_req;
 
-		down_read(&svm->mm->mmap_sem);
+		mm_read_lock(svm->mm);
 		vma = find_extend_vma(svm->mm, address);
 		if (!vma || address < vma->vm_start)
 			goto invalid;
@@ -606,7 +606,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 
 		result = QI_RESP_SUCCESS;
 	invalid:
-		up_read(&svm->mm->mmap_sem);
+		mm_read_unlock(svm->mm);
 		mmput(svm->mm);
 	bad_req:
 		/* Accounting for major/minor faults? */
diff --git drivers/media/v4l2-core/videobuf-core.c drivers/media/v4l2-core/videobuf-core.c
index 939fc11cf080..8e15ef51b1c3 100644
--- drivers/media/v4l2-core/videobuf-core.c
+++ drivers/media/v4l2-core/videobuf-core.c
@@ -534,7 +534,7 @@ int videobuf_qbuf(struct videobuf_queue *q, struct v4l2_buffer *b)
 	MAGIC_CHECK(q->int_ops->magic, MAGIC_QTYPE_OPS);
 
 	if (b->memory == V4L2_MEMORY_MMAP)
-		down_read(&current->mm->mmap_sem);
+		mm_read_lock(current->mm);
 
 	videobuf_queue_lock(q);
 	retval = -EBUSY;
@@ -621,7 +621,7 @@ int videobuf_qbuf(struct videobuf_queue *q, struct v4l2_buffer *b)
 	videobuf_queue_unlock(q);
 
 	if (b->memory == V4L2_MEMORY_MMAP)
-		up_read(&current->mm->mmap_sem);
+		mm_read_unlock(current->mm);
 
 	return retval;
 }
diff --git drivers/media/v4l2-core/videobuf-dma-contig.c drivers/media/v4l2-core/videobuf-dma-contig.c
index aeb2f497c683..1421a0b4d909 100644
--- drivers/media/v4l2-core/videobuf-dma-contig.c
+++ drivers/media/v4l2-core/videobuf-dma-contig.c
@@ -169,7 +169,7 @@ static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory *mem,
 	mem->size = PAGE_ALIGN(vb->size + offset);
 	ret = -EINVAL;
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 
 	vma = find_vma(mm, untagged_baddr);
 	if (!vma)
@@ -201,7 +201,7 @@ static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory *mem,
 	}
 
 out_up:
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm);
 
 	return ret;
 }
diff --git drivers/media/v4l2-core/videobuf-dma-sg.c drivers/media/v4l2-core/videobuf-dma-sg.c
index 66a6c6c236a7..57422766ba6f 100644
--- drivers/media/v4l2-core/videobuf-dma-sg.c
+++ drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -200,9 +200,9 @@ static int videobuf_dma_init_user(struct videobuf_dmabuf *dma, int direction,
 {
 	int ret;
 
-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm);
 	ret = videobuf_dma_init_user_locked(dma, direction, data, size);
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm);
 
 	return ret;
 }
diff --git drivers/misc/cxl/cxllib.c drivers/misc/cxl/cxllib.c
index 258c43a95ac3..68764a8a4b89 100644
--- drivers/misc/cxl/cxllib.c
+++ drivers/misc/cxl/cxllib.c
@@ -207,7 +207,7 @@ static int get_vma_info(struct mm_struct *mm, u64 addr,
 	struct vm_area_struct *vma = NULL;
 	int rc = 0;
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 
 	vma = find_vma(mm, addr);
 	if (!vma) {
@@ -218,7 +218,7 @@ static int get_vma_info(struct mm_struct *mm, u64 addr,
 	*vma_start = vma->vm_start;
 	*vma_end = vma->vm_end;
 out:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	return rc;
 }
 
diff --git drivers/misc/cxl/fault.c drivers/misc/cxl/fault.c
index 2297e6fc1544..960fdf881478 100644
--- drivers/misc/cxl/fault.c
+++ drivers/misc/cxl/fault.c
@@ -321,7 +321,7 @@ static void cxl_prefault_vma(struct cxl_context *ctx)
 		return;
 	}
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	for (vma = mm->mmap; vma; vma = vma->vm_next) {
 		for (ea = vma->vm_start; ea < vma->vm_end;
 				ea = next_segment(ea, slb.vsid)) {
@@ -336,7 +336,7 @@ static void cxl_prefault_vma(struct cxl_context *ctx)
 			last_esid = slb.esid;
 		}
 	}
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	mmput(mm);
 }
diff --git drivers/misc/sgi-gru/grufault.c drivers/misc/sgi-gru/grufault.c
index 4b713a80b572..1f865d980680 100644
--- drivers/misc/sgi-gru/grufault.c
+++ drivers/misc/sgi-gru/grufault.c
@@ -69,14 +69,14 @@ static struct gru_thread_state *gru_find_lock_gts(unsigned long vaddr)
 	struct vm_area_struct *vma;
 	struct gru_thread_state *gts = NULL;
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	vma = gru_find_vma(vaddr);
 	if (vma)
 		gts = gru_find_thread_state(vma, TSID(vaddr, vma));
 	if (gts)
 		mutex_lock(&gts->ts_ctxlock);
 	else
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 	return gts;
 }
 
@@ -86,7 +86,7 @@ static struct gru_thread_state *gru_alloc_locked_gts(unsigned long vaddr)
 	struct vm_area_struct *vma;
 	struct gru_thread_state *gts = ERR_PTR(-EINVAL);
 
-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm);
 	vma = gru_find_vma(vaddr);
 	if (!vma)
 		goto err;
@@ -95,11 +95,11 @@ static struct gru_thread_state *gru_alloc_locked_gts(unsigned long vaddr)
 	if (IS_ERR(gts))
 		goto err;
 	mutex_lock(&gts->ts_ctxlock);
-	downgrade_write(&mm->mmap_sem);
+	mm_downgrade_write_lock(mm);
 	return gts;
 
 err:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	return gts;
 }
 
@@ -109,7 +109,7 @@ static struct gru_thread_state *gru_alloc_locked_gts(unsigned long vaddr)
 static void gru_unlock_gts(struct gru_thread_state *gts)
 {
 	mutex_unlock(&gts->ts_ctxlock);
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm);
 }
 
 /*
@@ -575,9 +575,9 @@ static irqreturn_t gru_intr(int chiplet, int blade)
 		 */
 		gts->ustats.fmm_tlbmiss++;
 		if (!gts->ts_force_cch_reload &&
-					down_read_trylock(&gts->ts_mm->mmap_sem)) {
+					mm_read_trylock(gts->ts_mm)) {
 			gru_try_dropin(gru, gts, tfh, NULL);
-			up_read(&gts->ts_mm->mmap_sem);
+			mm_read_unlock(gts->ts_mm);
 		} else {
 			tfh_user_polling_mode(tfh);
 			STAT(intr_mm_lock_failed);
diff --git drivers/misc/sgi-gru/grufile.c drivers/misc/sgi-gru/grufile.c
index 9d042310214f..173a020d0acc 100644
--- drivers/misc/sgi-gru/grufile.c
+++ drivers/misc/sgi-gru/grufile.c
@@ -135,7 +135,7 @@ static int gru_create_new_context(unsigned long arg)
 	if (!(req.options & GRU_OPT_MISS_MASK))
 		req.options |= GRU_OPT_MISS_FMM_INTR;
 
-	down_write(&current->mm->mmap_sem);
+	mm_write_lock(current->mm);
 	vma = gru_find_vma(req.gseg);
 	if (vma) {
 		vdata = vma->vm_private_data;
@@ -146,7 +146,7 @@ static int gru_create_new_context(unsigned long arg)
 		vdata->vd_tlb_preload_count = req.tlb_preload_count;
 		ret = 0;
 	}
-	up_write(&current->mm->mmap_sem);
+	mm_write_unlock(current->mm);
 
 	return ret;
 }
diff --git drivers/oprofile/buffer_sync.c drivers/oprofile/buffer_sync.c
index ac27f3d3fbb4..7971649d2ea1 100644
--- drivers/oprofile/buffer_sync.c
+++ drivers/oprofile/buffer_sync.c
@@ -91,11 +91,11 @@ munmap_notify(struct notifier_block *self, unsigned long val, void *data)
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *mpnt;
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 
 	mpnt = find_vma(mm, addr);
 	if (mpnt && mpnt->vm_file && (mpnt->vm_flags & VM_EXEC)) {
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 		/* To avoid latency problems, we only process the current CPU,
 		 * hoping that most samples for the task are on this CPU
 		 */
@@ -103,7 +103,7 @@ munmap_notify(struct notifier_block *self, unsigned long val, void *data)
 		return 0;
 	}
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	return 0;
 }
 
@@ -256,7 +256,7 @@ lookup_dcookie(struct mm_struct *mm, unsigned long addr, off_t *offset)
 	unsigned long cookie = NO_COOKIE;
 	struct vm_area_struct *vma;
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
 
 		if (addr < vma->vm_start || addr >= vma->vm_end)
@@ -276,7 +276,7 @@ lookup_dcookie(struct mm_struct *mm, unsigned long addr, off_t *offset)
 
 	if (!vma)
 		cookie = INVALID_COOKIE;
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	return cookie;
 }
diff --git drivers/staging/kpc2000/kpc_dma/fileops.c drivers/staging/kpc2000/kpc_dma/fileops.c
index cb52bd9a6d2f..38b02b61fba7 100644
--- drivers/staging/kpc2000/kpc_dma/fileops.c
+++ drivers/staging/kpc2000/kpc_dma/fileops.c
@@ -76,9 +76,9 @@ static int kpc_dma_transfer(struct dev_private_data *priv,
 	}
 
 	// Lock the user buffer pages in memory, and hold on to the page pointers (for the sglist)
-	down_read(&current->mm->mmap_sem);      /*  get memory map semaphore */
+	mm_read_lock(current->mm);      /*  get memory map semaphore */
 	rv = get_user_pages(iov_base, acd->page_count, FOLL_TOUCH | FOLL_WRITE | FOLL_GET, acd->user_pages, NULL);
-	up_read(&current->mm->mmap_sem);        /*  release the semaphore */
+	mm_read_unlock(current->mm);        /*  release the semaphore */
 	if (rv != acd->page_count) {
 		dev_err(&priv->ldev->pldev->dev, "Couldn't get_user_pages (%ld)\n", rv);
 		goto err_get_user_pages;
diff --git drivers/tee/optee/call.c drivers/tee/optee/call.c
index cf2367ba08d6..f97c1c392749 100644
--- drivers/tee/optee/call.c
+++ drivers/tee/optee/call.c
@@ -561,10 +561,10 @@ static int check_mem_type(unsigned long start, size_t num_pages)
 	if (virt_addr_valid(start))
 		return 0;
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	rc = __check_mem_type(find_vma(mm, start),
 			      start + num_pages * PAGE_SIZE);
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	return rc;
 }
diff --git drivers/vfio/vfio_iommu_type1.c drivers/vfio/vfio_iommu_type1.c
index 2ada8e6cdb88..255fb6268c24 100644
--- drivers/vfio/vfio_iommu_type1.c
+++ drivers/vfio/vfio_iommu_type1.c
@@ -277,11 +277,11 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
 	if (!mm)
 		return -ESRCH; /* process exited */
 
-	ret = down_write_killable(&mm->mmap_sem);
+	ret = mm_write_lock_killable(mm);
 	if (!ret) {
 		ret = __account_locked_vm(mm, abs(npage), npage > 0, dma->task,
 					  dma->lock_cap);
-		up_write(&mm->mmap_sem);
+		mm_write_unlock(mm);
 	}
 
 	if (async)
@@ -329,7 +329,7 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
 	if (prot & IOMMU_WRITE)
 		flags |= FOLL_WRITE;
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	if (mm == current->mm) {
 		ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page,
 				     vmas);
@@ -348,14 +348,14 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
 			put_page(page[0]);
 		}
 	}
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	if (ret == 1) {
 		*pfn = page_to_pfn(page[0]);
 		return 0;
 	}
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 
 	vaddr = untagged_addr(vaddr);
 
@@ -367,7 +367,7 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
 			ret = 0;
 	}
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	return ret;
 }
 
diff --git drivers/xen/gntdev.c drivers/xen/gntdev.c
index 4fc83e3f5ad3..c698d7f5400a 100644
--- drivers/xen/gntdev.c
+++ drivers/xen/gntdev.c
@@ -625,7 +625,7 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv,
 		return -EFAULT;
 	pr_debug("priv %p, offset for vaddr %lx\n", priv, (unsigned long)op.vaddr);
 
-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm);
 	vma = find_vma(current->mm, op.vaddr);
 	if (!vma || vma->vm_ops != &gntdev_vmops)
 		goto out_unlock;
@@ -639,7 +639,7 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv,
 	rv = 0;
 
  out_unlock:
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm);
 
 	if (rv == 0 && copy_to_user(u, &op, sizeof(op)) != 0)
 		return -EFAULT;
diff --git drivers/xen/privcmd.c drivers/xen/privcmd.c
index c6070e70dd73..56a6ba529407 100644
--- drivers/xen/privcmd.c
+++ drivers/xen/privcmd.c
@@ -278,7 +278,7 @@ static long privcmd_ioctl_mmap(struct file *file, void __user *udata)
 	if (rc || list_empty(&pagelist))
 		goto out;
 
-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm);
 
 	{
 		struct page *page = list_first_entry(&pagelist,
@@ -303,7 +303,7 @@ static long privcmd_ioctl_mmap(struct file *file, void __user *udata)
 
 
 out_up:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 
 out:
 	free_page_list(&pagelist);
@@ -499,7 +499,7 @@ static long privcmd_ioctl_mmap_batch(
 		}
 	}
 
-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm);
 
 	vma = find_vma(mm, m.addr);
 	if (!vma ||
@@ -555,7 +555,7 @@ static long privcmd_ioctl_mmap_batch(
 	BUG_ON(traverse_pages_block(m.num, sizeof(xen_pfn_t),
 				    &pagelist, mmap_batch_fn, &state));
 
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 
 	if (state.global_error) {
 		/* Write back errors in second pass. */
@@ -576,7 +576,7 @@ static long privcmd_ioctl_mmap_batch(
 	return ret;
 
 out_unlock:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	goto out;
 }
 
@@ -741,7 +741,7 @@ static long privcmd_ioctl_mmap_resource(struct file *file, void __user *udata)
 	if (data->domid != DOMID_INVALID && data->domid != kdata.dom)
 		return -EPERM;
 
-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm);
 
 	vma = find_vma(mm, kdata.addr);
 	if (!vma || vma->vm_ops != &privcmd_vm_ops) {
@@ -820,7 +820,7 @@ static long privcmd_ioctl_mmap_resource(struct file *file, void __user *udata)
 	}
 
 out:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	kfree(pfns);
 
 	return rc;
diff --git fs/aio.c fs/aio.c
index a9fbad2ce5e6..704766588df4 100644
--- fs/aio.c
+++ fs/aio.c
@@ -519,7 +519,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
 	ctx->mmap_size = nr_pages * PAGE_SIZE;
 	pr_debug("attempting mmap of %lu bytes\n", ctx->mmap_size);
 
-	if (down_write_killable(&mm->mmap_sem)) {
+	if (mm_write_lock_killable(mm)) {
 		ctx->mmap_size = 0;
 		aio_free_ring(ctx);
 		return -EINTR;
@@ -528,7 +528,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
 	ctx->mmap_base = do_mmap_pgoff(ctx->aio_ring_file, 0, ctx->mmap_size,
 				       PROT_READ | PROT_WRITE,
 				       MAP_SHARED, 0, &unused, NULL);
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	if (IS_ERR((void *)ctx->mmap_base)) {
 		ctx->mmap_size = 0;
 		aio_free_ring(ctx);
diff --git fs/coredump.c fs/coredump.c
index b1ea7dfbd149..c88c618da0d2 100644
--- fs/coredump.c
+++ fs/coredump.c
@@ -443,12 +443,12 @@ static int coredump_wait(int exit_code, struct core_state *core_state)
 	core_state->dumper.task = tsk;
 	core_state->dumper.next = NULL;
 
-	if (down_write_killable(&mm->mmap_sem))
+	if (mm_write_lock_killable(mm))
 		return -EINTR;
 
 	if (!mm->core_state)
 		core_waiters = zap_threads(tsk, mm, core_state, exit_code);
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 
 	if (core_waiters > 0) {
 		struct core_thread *ptr;
diff --git fs/exec.c fs/exec.c
index 74d88dab98dd..6d1e2687072d 100644
--- fs/exec.c
+++ fs/exec.c
@@ -250,7 +250,7 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
 		return -ENOMEM;
 	vma_set_anonymous(vma);
 
-	if (down_write_killable(&mm->mmap_sem)) {
+	if (mm_write_lock_killable(mm)) {
 		err = -EINTR;
 		goto err_free;
 	}
@@ -273,11 +273,11 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
 
 	mm->stack_vm = mm->total_vm = 1;
 	arch_bprm_mm_init(mm, vma);
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	bprm->p = vma->vm_end - sizeof(void *);
 	return 0;
 err:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 err_free:
 	bprm->vma = NULL;
 	vm_area_free(vma);
@@ -738,7 +738,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
 		bprm->loader -= stack_shift;
 	bprm->exec -= stack_shift;
 
-	if (down_write_killable(&mm->mmap_sem))
+	if (mm_write_lock_killable(mm))
 		return -EINTR;
 
 	vm_flags = VM_STACK_FLAGS;
@@ -795,7 +795,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
 		ret = -EFAULT;
 
 out_unlock:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	return ret;
 }
 EXPORT_SYMBOL(setup_arg_pages);
@@ -1024,9 +1024,9 @@ static int exec_mmap(struct mm_struct *mm)
 		 * through with the exec.  We must hold mmap_sem around
 		 * checking core_state and changing tsk->mm.
 		 */
-		down_read(&old_mm->mmap_sem);
+		mm_read_lock(old_mm);
 		if (unlikely(old_mm->core_state)) {
-			up_read(&old_mm->mmap_sem);
+			mm_read_unlock(old_mm);
 			return -EINTR;
 		}
 	}
@@ -1040,7 +1040,7 @@ static int exec_mmap(struct mm_struct *mm)
 	vmacache_flush(tsk);
 	task_unlock(tsk);
 	if (old_mm) {
-		up_read(&old_mm->mmap_sem);
+		mm_read_unlock(old_mm);
 		BUG_ON(active_mm != old_mm);
 		setmax_mm_hiwater_rss(&tsk->signal->maxrss, old_mm);
 		mm_update_next_owner(old_mm);
diff --git fs/io_uring.c fs/io_uring.c
index e54556b0fcc6..8aa9d7263e83 100644
--- fs/io_uring.c
+++ fs/io_uring.c
@@ -4822,7 +4822,7 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg,
 		}
 
 		ret = 0;
-		down_read(&current->mm->mmap_sem);
+		mm_read_lock(current->mm);
 		pret = get_user_pages(ubuf, nr_pages,
 				      FOLL_WRITE | FOLL_LONGTERM,
 				      pages, vmas);
@@ -4840,7 +4840,7 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg,
 		} else {
 			ret = pret < 0 ? pret : -EFAULT;
 		}
-		up_read(&current->mm->mmap_sem);
+		mm_read_unlock(current->mm);
 		if (ret) {
 			/*
 			 * if we did partial map, or found file backed vmas,
diff --git fs/proc/base.c fs/proc/base.c
index ebea9501afb8..31c56a08af0f 100644
--- fs/proc/base.c
+++ fs/proc/base.c
@@ -1979,11 +1979,11 @@ static int map_files_d_revalidate(struct dentry *dentry, unsigned int flags)
 		goto out;
 
 	if (!dname_to_vma_addr(dentry, &vm_start, &vm_end)) {
-		status = down_read_killable(&mm->mmap_sem);
+		status = mm_read_lock_killable(mm);
 		if (!status) {
 			exact_vma_exists = !!find_exact_vma(mm, vm_start,
 							    vm_end);
-			up_read(&mm->mmap_sem);
+			mm_read_unlock(mm);
 		}
 	}
 
@@ -2030,7 +2030,7 @@ static int map_files_get_link(struct dentry *dentry, struct path *path)
 	if (rc)
 		goto out_mmput;
 
-	rc = down_read_killable(&mm->mmap_sem);
+	rc = mm_read_lock_killable(mm);
 	if (rc)
 		goto out_mmput;
 
@@ -2041,7 +2041,7 @@ static int map_files_get_link(struct dentry *dentry, struct path *path)
 		path_get(path);
 		rc = 0;
 	}
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 out_mmput:
 	mmput(mm);
@@ -2131,7 +2131,7 @@ static struct dentry *proc_map_files_lookup(struct inode *dir,
 		goto out_put_task;
 
 	result = ERR_PTR(-EINTR);
-	if (down_read_killable(&mm->mmap_sem))
+	if (mm_read_lock_killable(mm))
 		goto out_put_mm;
 
 	result = ERR_PTR(-ENOENT);
@@ -2144,7 +2144,7 @@ static struct dentry *proc_map_files_lookup(struct inode *dir,
 				(void *)(unsigned long)vma->vm_file->f_mode);
 
 out_no_vma:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 out_put_mm:
 	mmput(mm);
 out_put_task:
diff --git fs/proc/task_mmu.c fs/proc/task_mmu.c
index 9442631fd4af..9de244eb6fbc 100644
--- fs/proc/task_mmu.c
+++ fs/proc/task_mmu.c
@@ -128,7 +128,7 @@ static void vma_stop(struct proc_maps_private *priv)
 	struct mm_struct *mm = priv->mm;
 
 	release_task_mempolicy(priv);
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	mmput(mm);
 }
 
@@ -166,7 +166,7 @@ static void *m_start(struct seq_file *m, loff_t *ppos)
 	if (!mm || !mmget_not_zero(mm))
 		return NULL;
 
-	if (down_read_killable(&mm->mmap_sem)) {
+	if (mm_read_lock_killable(mm)) {
 		mmput(mm);
 		return ERR_PTR(-EINTR);
 	}
@@ -873,7 +873,7 @@ static int show_smaps_rollup(struct seq_file *m, void *v)
 
 	memset(&mss, 0, sizeof(mss));
 
-	ret = down_read_killable(&mm->mmap_sem);
+	ret = mm_read_lock_killable(mm);
 	if (ret)
 		goto out_put_mm;
 
@@ -892,7 +892,7 @@ static int show_smaps_rollup(struct seq_file *m, void *v)
 	__show_smap(m, &mss, true);
 
 	release_task_mempolicy(priv);
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 out_put_mm:
 	mmput(mm);
@@ -1166,7 +1166,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
 		};
 
 		if (type == CLEAR_REFS_MM_HIWATER_RSS) {
-			if (down_write_killable(&mm->mmap_sem)) {
+			if (mm_write_lock_killable(mm)) {
 				count = -EINTR;
 				goto out_mm;
 			}
@@ -1176,11 +1176,11 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
 			 * resident set size to this mm's current rss value.
 			 */
 			reset_mm_hiwater_rss(mm);
-			up_write(&mm->mmap_sem);
+			mm_write_unlock(mm);
 			goto out_mm;
 		}
 
-		if (down_read_killable(&mm->mmap_sem)) {
+		if (mm_read_lock_killable(mm)) {
 			count = -EINTR;
 			goto out_mm;
 		}
@@ -1189,8 +1189,8 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
 			for (vma = mm->mmap; vma; vma = vma->vm_next) {
 				if (!(vma->vm_flags & VM_SOFTDIRTY))
 					continue;
-				up_read(&mm->mmap_sem);
-				if (down_write_killable(&mm->mmap_sem)) {
+				mm_read_unlock(mm);
+				if (mm_write_lock_killable(mm)) {
 					count = -EINTR;
 					goto out_mm;
 				}
@@ -1209,14 +1209,14 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
 					 * failed like if
 					 * get_proc_task() fails?
 					 */
-					up_write(&mm->mmap_sem);
+					mm_write_unlock(mm);
 					goto out_mm;
 				}
 				for (vma = mm->mmap; vma; vma = vma->vm_next) {
 					vma->vm_flags &= ~VM_SOFTDIRTY;
 					vma_set_page_prot(vma);
 				}
-				downgrade_write(&mm->mmap_sem);
+				mm_downgrade_write_lock(mm);
 				break;
 			}
 
@@ -1229,7 +1229,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
 		if (type == CLEAR_REFS_SOFT_DIRTY)
 			mmu_notifier_invalidate_range_end(&range);
 		tlb_finish_mmu(&tlb, 0, -1);
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 out_mm:
 		mmput(mm);
 	}
@@ -1590,11 +1590,11 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
 		/* overflow ? */
 		if (end < start_vaddr || end > end_vaddr)
 			end = end_vaddr;
-		ret = down_read_killable(&mm->mmap_sem);
+		ret = mm_read_lock_killable(mm);
 		if (ret)
 			goto out_free;
 		ret = walk_page_range(mm, start_vaddr, end, &pagemap_ops, &pm);
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 		start_vaddr = end;
 
 		len = min(count, PM_ENTRY_BYTES * pm.pos);
diff --git fs/proc/task_nommu.c fs/proc/task_nommu.c
index 7907e6419e57..2a6efd126cf8 100644
--- fs/proc/task_nommu.c
+++ fs/proc/task_nommu.c
@@ -25,7 +25,7 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
 	struct rb_node *p;
 	unsigned long bytes = 0, sbytes = 0, slack = 0, size;
         
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	for (p = rb_first(&mm->mm_rb); p; p = rb_next(p)) {
 		vma = rb_entry(p, struct vm_area_struct, vm_rb);
 
@@ -77,7 +77,7 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
 		"Shared:\t%8lu bytes\n",
 		bytes, slack, sbytes);
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 }
 
 unsigned long task_vsize(struct mm_struct *mm)
@@ -86,12 +86,12 @@ unsigned long task_vsize(struct mm_struct *mm)
 	struct rb_node *p;
 	unsigned long vsize = 0;
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	for (p = rb_first(&mm->mm_rb); p; p = rb_next(p)) {
 		vma = rb_entry(p, struct vm_area_struct, vm_rb);
 		vsize += vma->vm_end - vma->vm_start;
 	}
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	return vsize;
 }
 
@@ -104,7 +104,7 @@ unsigned long task_statm(struct mm_struct *mm,
 	struct rb_node *p;
 	unsigned long size = kobjsize(mm);
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	for (p = rb_first(&mm->mm_rb); p; p = rb_next(p)) {
 		vma = rb_entry(p, struct vm_area_struct, vm_rb);
 		size += kobjsize(vma);
@@ -119,7 +119,7 @@ unsigned long task_statm(struct mm_struct *mm,
 		>> PAGE_SHIFT;
 	*data = (PAGE_ALIGN(mm->start_stack) - (mm->start_data & PAGE_MASK))
 		>> PAGE_SHIFT;
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	size >>= PAGE_SHIFT;
 	size += *text + *data;
 	*resident = size;
@@ -211,7 +211,7 @@ static void *m_start(struct seq_file *m, loff_t *pos)
 	if (!mm || !mmget_not_zero(mm))
 		return NULL;
 
-	if (down_read_killable(&mm->mmap_sem)) {
+	if (mm_read_lock_killable(mm)) {
 		mmput(mm);
 		return ERR_PTR(-EINTR);
 	}
@@ -221,7 +221,7 @@ static void *m_start(struct seq_file *m, loff_t *pos)
 		if (n-- == 0)
 			return p;
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	mmput(mm);
 	return NULL;
 }
@@ -231,7 +231,7 @@ static void m_stop(struct seq_file *m, void *_vml)
 	struct proc_maps_private *priv = m->private;
 
 	if (!IS_ERR_OR_NULL(_vml)) {
-		up_read(&priv->mm->mmap_sem);
+		mm_read_unlock(priv->mm);
 		mmput(priv->mm);
 	}
 	if (priv->task) {
diff --git fs/userfaultfd.c fs/userfaultfd.c
index 37df7c9eedb1..f38095a7ebcd 100644
--- fs/userfaultfd.c
+++ fs/userfaultfd.c
@@ -234,7 +234,7 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 	pte_t *ptep, pte;
 	bool ret = true;
 
-	VM_BUG_ON(!rwsem_is_locked(&mm->mmap_sem));
+	VM_BUG_ON(!mm_is_locked(mm));
 
 	ptep = huge_pte_offset(mm, address, vma_mmu_pagesize(vma));
 
@@ -286,7 +286,7 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
 	pte_t *pte;
 	bool ret = true;
 
-	VM_BUG_ON(!rwsem_is_locked(&mm->mmap_sem));
+	VM_BUG_ON(!mm_is_locked(mm));
 
 	pgd = pgd_offset(mm, address);
 	if (!pgd_present(*pgd))
@@ -376,7 +376,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 	 * Coredumping runs without mmap_sem so we can only check that
 	 * the mmap_sem is held, if PF_DUMPCORE was not set.
 	 */
-	WARN_ON_ONCE(!rwsem_is_locked(&mm->mmap_sem));
+	WARN_ON_ONCE(!mm_is_locked(mm));
 
 	ctx = vmf->vma->vm_userfaultfd_ctx.ctx;
 	if (!ctx)
@@ -489,7 +489,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 		must_wait = userfaultfd_huge_must_wait(ctx, vmf->vma,
 						       vmf->address,
 						       vmf->flags, reason);
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	if (likely(must_wait && !READ_ONCE(ctx->released) &&
 		   (return_to_userland ? !signal_pending(current) :
@@ -543,7 +543,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 			 * and there's no need to retake the mmap_sem
 			 * in such case.
 			 */
-			down_read(&mm->mmap_sem);
+			mm_read_lock(mm);
 			ret = VM_FAULT_NOPAGE;
 		}
 	}
@@ -638,7 +638,7 @@ static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx,
 		struct mm_struct *mm = release_new_ctx->mm;
 
 		/* the various vma->vm_userfaultfd_ctx still points to it */
-		down_write(&mm->mmap_sem);
+		mm_write_lock(mm);
 		/* no task can run (and in turn coredump) yet */
 		VM_WARN_ON(!mmget_still_valid(mm));
 		for (vma = mm->mmap; vma; vma = vma->vm_next)
@@ -646,7 +646,7 @@ static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx,
 				vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
 				vma->vm_flags &= ~(VM_UFFD_WP | VM_UFFD_MISSING);
 			}
-		up_write(&mm->mmap_sem);
+		mm_write_unlock(mm);
 
 		userfaultfd_ctx_put(release_new_ctx);
 	}
@@ -800,7 +800,7 @@ bool userfaultfd_remove(struct vm_area_struct *vma,
 
 	userfaultfd_ctx_get(ctx);
 	WRITE_ONCE(ctx->mmap_changing, true);
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	msg_init(&ewq.msg);
 
@@ -895,7 +895,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
 	 * it's critical that released is set to true (above), before
 	 * taking the mmap_sem for writing.
 	 */
-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm);
 	still_valid = mmget_still_valid(mm);
 	prev = NULL;
 	for (vma = mm->mmap; vma; vma = vma->vm_next) {
@@ -921,7 +921,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
 		vma->vm_flags = new_flags;
 		vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
 	}
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	mmput(mm);
 wakeup:
 	/*
@@ -1350,7 +1350,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 	if (!mmget_not_zero(mm))
 		goto out;
 
-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm);
 	if (!mmget_still_valid(mm))
 		goto out_unlock;
 	vma = find_vma_prev(mm, start, &prev);
@@ -1495,7 +1495,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		vma = vma->vm_next;
 	} while (vma && vma->vm_start < end);
 out_unlock:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	mmput(mm);
 	if (!ret) {
 		/*
@@ -1540,7 +1540,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 	if (!mmget_not_zero(mm))
 		goto out;
 
-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm);
 	if (!mmget_still_valid(mm))
 		goto out_unlock;
 	vma = find_vma_prev(mm, start, &prev);
@@ -1657,7 +1657,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 		vma = vma->vm_next;
 	} while (vma && vma->vm_start < end);
 out_unlock:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	mmput(mm);
 out:
 	return ret;
diff --git include/linux/mmu_notifier.h include/linux/mmu_notifier.h
index 9e6caa8ecd19..316927a91f88 100644
--- include/linux/mmu_notifier.h
+++ include/linux/mmu_notifier.h
@@ -5,6 +5,7 @@
 #include <linux/list.h>
 #include <linux/spinlock.h>
 #include <linux/mm_types.h>
+#include <linux/mm_lock.h>
 #include <linux/srcu.h>
 #include <linux/interval_tree.h>
 
@@ -275,9 +276,9 @@ mmu_notifier_get(const struct mmu_notifier_ops *ops, struct mm_struct *mm)
 {
 	struct mmu_notifier *ret;
 
-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm);
 	ret = mmu_notifier_get_locked(ops, mm);
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	return ret;
 }
 void mmu_notifier_put(struct mmu_notifier *mn);
diff --git ipc/shm.c ipc/shm.c
index ce1ca9f7c6e9..c04fc21cbe46 100644
--- ipc/shm.c
+++ ipc/shm.c
@@ -1544,7 +1544,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
 	if (err)
 		goto out_fput;
 
-	if (down_write_killable(&current->mm->mmap_sem)) {
+	if (mm_write_lock_killable(current->mm)) {
 		err = -EINTR;
 		goto out_fput;
 	}
@@ -1564,7 +1564,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
 	if (IS_ERR_VALUE(addr))
 		err = (long)addr;
 invalid:
-	up_write(&current->mm->mmap_sem);
+	mm_write_unlock(current->mm);
 	if (populate)
 		mm_populate(addr, populate);
 
@@ -1638,7 +1638,7 @@ long ksys_shmdt(char __user *shmaddr)
 	if (addr & ~PAGE_MASK)
 		return retval;
 
-	if (down_write_killable(&mm->mmap_sem))
+	if (mm_write_lock_killable(mm))
 		return -EINTR;
 
 	/*
@@ -1726,7 +1726,7 @@ long ksys_shmdt(char __user *shmaddr)
 
 #endif
 
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	return retval;
 }
 
diff --git kernel/acct.c kernel/acct.c
index 81f9831a7859..32257bd2b38f 100644
--- kernel/acct.c
+++ kernel/acct.c
@@ -539,13 +539,13 @@ void acct_collect(long exitcode, int group_dead)
 	if (group_dead && current->mm) {
 		struct vm_area_struct *vma;
 
-		down_read(&current->mm->mmap_sem);
+		mm_read_lock(current->mm);
 		vma = current->mm->mmap;
 		while (vma) {
 			vsize += vma->vm_end - vma->vm_start;
 			vma = vma->vm_next;
 		}
-		up_read(&current->mm->mmap_sem);
+		mm_read_unlock(current->mm);
 	}
 
 	spin_lock_irq(&current->sighand->siglock);
diff --git kernel/bpf/stackmap.c kernel/bpf/stackmap.c
index 3f958b90d914..8087d31b6471 100644
--- kernel/bpf/stackmap.c
+++ kernel/bpf/stackmap.c
@@ -305,7 +305,7 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
 	 * with build_id.
 	 */
 	if (!user || !current || !current->mm || irq_work_busy ||
-	    down_read_trylock(&current->mm->mmap_sem) == 0) {
+	    mm_read_trylock(current->mm) == 0) {
 		/* cannot access current->mm, fall back to ips */
 		for (i = 0; i < trace_nr; i++) {
 			id_offs[i].status = BPF_STACK_BUILD_ID_IP;
@@ -330,7 +330,7 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
 	}
 
 	if (!work) {
-		up_read(&current->mm->mmap_sem);
+		mm_read_unlock(current->mm);
 	} else {
 		work->sem = &current->mm->mmap_sem;
 		irq_work_queue(&work->irq_work);
diff --git kernel/events/core.c kernel/events/core.c
index 2173c23c25b4..4921a5e48931 100644
--- kernel/events/core.c
+++ kernel/events/core.c
@@ -9467,7 +9467,7 @@ static void perf_event_addr_filters_apply(struct perf_event *event)
 		if (!mm)
 			goto restart;
 
-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm);
 	}
 
 	raw_spin_lock_irqsave(&ifh->lock, flags);
@@ -9493,7 +9493,7 @@ static void perf_event_addr_filters_apply(struct perf_event *event)
 	raw_spin_unlock_irqrestore(&ifh->lock, flags);
 
 	if (ifh->nr_file_filters) {
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 
 		mmput(mm);
 	}
diff --git kernel/events/uprobes.c kernel/events/uprobes.c
index ece7e13f6e4a..8c1417f4ee48 100644
--- kernel/events/uprobes.c
+++ kernel/events/uprobes.c
@@ -1064,7 +1064,7 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
 		if (err && is_register)
 			goto free;
 
-		down_write(&mm->mmap_sem);
+		mm_write_lock(mm);
 		vma = find_vma(mm, info->vaddr);
 		if (!vma || !valid_vma(vma, is_register) ||
 		    file_inode(vma->vm_file) != uprobe->inode)
@@ -1086,7 +1086,7 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
 		}
 
  unlock:
-		up_write(&mm->mmap_sem);
+		mm_write_unlock(mm);
  free:
 		mmput(mm);
 		info = free_map_info(info);
@@ -1241,7 +1241,7 @@ static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
 	struct vm_area_struct *vma;
 	int err = 0;
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	for (vma = mm->mmap; vma; vma = vma->vm_next) {
 		unsigned long vaddr;
 		loff_t offset;
@@ -1258,7 +1258,7 @@ static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
 		vaddr = offset_to_vaddr(vma, uprobe->offset);
 		err |= remove_breakpoint(uprobe, mm, vaddr);
 	}
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	return err;
 }
@@ -1445,7 +1445,7 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
 	struct vm_area_struct *vma;
 	int ret;
 
-	if (down_write_killable(&mm->mmap_sem))
+	if (mm_write_lock_killable(mm))
 		return -EINTR;
 
 	if (mm->uprobes_state.xol_area) {
@@ -1475,7 +1475,7 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
 	/* pairs with get_xol_area() */
 	smp_store_release(&mm->uprobes_state.xol_area, area); /* ^^^ */
  fail:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 
 	return ret;
 }
@@ -2045,7 +2045,7 @@ static struct uprobe *find_active_uprobe(unsigned long bp_vaddr, int *is_swbp)
 	struct uprobe *uprobe = NULL;
 	struct vm_area_struct *vma;
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	vma = find_vma(mm, bp_vaddr);
 	if (vma && vma->vm_start <= bp_vaddr) {
 		if (valid_vma(vma, false)) {
@@ -2063,7 +2063,7 @@ static struct uprobe *find_active_uprobe(unsigned long bp_vaddr, int *is_swbp)
 
 	if (!uprobe && test_and_clear_bit(MMF_RECALC_UPROBES, &mm->flags))
 		mmf_recalc_uprobes(mm);
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	return uprobe;
 }
diff --git kernel/exit.c kernel/exit.c
index 2833ffb0c211..9a0b72562adb 100644
--- kernel/exit.c
+++ kernel/exit.c
@@ -448,12 +448,12 @@ static void exit_mm(void)
 	 * will increment ->nr_threads for each thread in the
 	 * group with ->mm != NULL.
 	 */
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	core_state = mm->core_state;
 	if (core_state) {
 		struct core_thread self;
 
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 
 		self.task = current;
 		self.next = xchg(&core_state->dumper.next, &self);
@@ -471,14 +471,14 @@ static void exit_mm(void)
 			freezable_schedule();
 		}
 		__set_current_state(TASK_RUNNING);
-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm);
 	}
 	mmgrab(mm);
 	BUG_ON(mm != current->active_mm);
 	/* more a memory barrier than a real lock */
 	task_lock(current);
 	current->mm = NULL;
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	enter_lazy_tlb(mm, current);
 	task_unlock(current);
 	mm_update_next_owner(mm);
diff --git kernel/fork.c kernel/fork.c
index 080809560072..d598f56e4b1e 100644
--- kernel/fork.c
+++ kernel/fork.c
@@ -488,7 +488,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	LIST_HEAD(uf);
 
 	uprobe_start_dup_mmap();
-	if (down_write_killable(&oldmm->mmap_sem)) {
+	if (mm_write_lock_killable(oldmm)) {
 		retval = -EINTR;
 		goto fail_uprobe_end;
 	}
@@ -612,9 +612,9 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	/* a new mm has just been created */
 	retval = arch_dup_mmap(oldmm, mm);
 out:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	flush_tlb_mm(oldmm);
-	up_write(&oldmm->mmap_sem);
+	mm_write_unlock(oldmm);
 	dup_userfaultfd_complete(&uf);
 fail_uprobe_end:
 	uprobe_end_dup_mmap();
@@ -644,9 +644,9 @@ static inline void mm_free_pgd(struct mm_struct *mm)
 #else
 static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
 {
-	down_write(&oldmm->mmap_sem);
+	mm_write_lock(oldmm);
 	RCU_INIT_POINTER(mm->exe_file, get_mm_exe_file(oldmm));
-	up_write(&oldmm->mmap_sem);
+	mm_write_unlock(oldmm);
 	return 0;
 }
 #define mm_alloc_pgd(mm)	(0)
@@ -1011,7 +1011,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	mm->vmacache_seqnum = 0;
 	atomic_set(&mm->mm_users, 1);
 	atomic_set(&mm->mm_count, 1);
-	init_rwsem(&mm->mmap_sem);
+	mm_init_lock(mm);
 	INIT_LIST_HEAD(&mm->mmlist);
 	mm->core_state = NULL;
 	mm_pgtables_bytes_init(mm);
diff --git kernel/futex.c kernel/futex.c
index 0cf84c8664f2..0081f1b8530f 100644
--- kernel/futex.c
+++ kernel/futex.c
@@ -753,10 +753,10 @@ static int fault_in_user_writeable(u32 __user *uaddr)
 	struct mm_struct *mm = current->mm;
 	int ret;
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	ret = fixup_user_fault(current, mm, (unsigned long)uaddr,
 			       FAULT_FLAG_WRITE, NULL);
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	return ret < 0 ? ret : 0;
 }
diff --git kernel/sched/fair.c kernel/sched/fair.c
index ba749f579714..ed309e43c39d 100644
--- kernel/sched/fair.c
+++ kernel/sched/fair.c
@@ -2545,7 +2545,7 @@ static void task_numa_work(struct callback_head *work)
 		return;
 
 
-	if (!down_read_trylock(&mm->mmap_sem))
+	if (!mm_read_trylock(mm))
 		return;
 	vma = find_vma(mm, start);
 	if (!vma) {
@@ -2613,7 +2613,7 @@ static void task_numa_work(struct callback_head *work)
 		mm->numa_scan_offset = start;
 	else
 		reset_ptenuma_scan(p);
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	/*
 	 * Make sure tasks use at least 32x as much time to run other code
diff --git kernel/sys.c kernel/sys.c
index a9331f101883..55413c799735 100644
--- kernel/sys.c
+++ kernel/sys.c
@@ -1845,7 +1845,7 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
 	if (exe_file) {
 		struct vm_area_struct *vma;
 
-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm);
 		for (vma = mm->mmap; vma; vma = vma->vm_next) {
 			if (!vma->vm_file)
 				continue;
@@ -1854,7 +1854,7 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
 				goto exit_err;
 		}
 
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 		fput(exe_file);
 	}
 
@@ -1868,7 +1868,7 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
 	fdput(exe);
 	return err;
 exit_err:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	fput(exe_file);
 	goto exit;
 }
@@ -2009,7 +2009,7 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
 	 * arg_lock protects concurent updates but we still need mmap_sem for
 	 * read to exclude races with sys_brk.
 	 */
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 
 	/*
 	 * We don't validate if these members are pointing to
@@ -2048,7 +2048,7 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
 	if (prctl_map.auxv_size)
 		memcpy(mm->saved_auxv, user_auxv, sizeof(user_auxv));
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	return 0;
 }
 #endif /* CONFIG_CHECKPOINT_RESTORE */
@@ -2124,7 +2124,7 @@ static int prctl_set_mm(int opt, unsigned long addr,
 	 * mmap_sem for a) concurrent sys_brk, b) finding VMA for addr
 	 * validation.
 	 */
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	vma = find_vma(mm, addr);
 
 	spin_lock(&mm->arg_lock);
@@ -2216,7 +2216,7 @@ static int prctl_set_mm(int opt, unsigned long addr,
 	error = 0;
 out:
 	spin_unlock(&mm->arg_lock);
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	return error;
 }
 
@@ -2439,13 +2439,13 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 	case PR_SET_THP_DISABLE:
 		if (arg3 || arg4 || arg5)
 			return -EINVAL;
-		if (down_write_killable(&me->mm->mmap_sem))
+		if (mm_write_lock_killable(me->mm))
 			return -EINTR;
 		if (arg2)
 			set_bit(MMF_DISABLE_THP, &me->mm->flags);
 		else
 			clear_bit(MMF_DISABLE_THP, &me->mm->flags);
-		up_write(&me->mm->mmap_sem);
+		mm_write_unlock(me->mm);
 		break;
 	case PR_MPX_ENABLE_MANAGEMENT:
 	case PR_MPX_DISABLE_MANAGEMENT:
diff --git kernel/trace/trace_output.c kernel/trace/trace_output.c
index d9b4b7c22db4..c715cd737476 100644
--- kernel/trace/trace_output.c
+++ kernel/trace/trace_output.c
@@ -393,7 +393,7 @@ static int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
 	if (mm) {
 		const struct vm_area_struct *vma;
 
-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm);
 		vma = find_vma(mm, ip);
 		if (vma) {
 			file = vma->vm_file;
@@ -405,7 +405,7 @@ static int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
 				trace_seq_printf(s, "[+0x%lx]",
 						 ip - vmstart);
 		}
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 	}
 	if (ret && ((sym_flags & TRACE_ITER_SYM_ADDR) || !file))
 		trace_seq_printf(s, " <" IP_FMT ">", ip);
diff --git mm/filemap.c mm/filemap.c
index bf6aa30be58d..eb6487065ca0 100644
--- mm/filemap.c
+++ mm/filemap.c
@@ -1416,7 +1416,7 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 		if (flags & FAULT_FLAG_RETRY_NOWAIT)
 			return 0;
 
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 		if (flags & FAULT_FLAG_KILLABLE)
 			wait_on_page_locked_killable(page);
 		else
@@ -1428,7 +1428,7 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 
 			ret = __lock_page_killable(page);
 			if (ret) {
-				up_read(&mm->mmap_sem);
+				mm_read_unlock(mm);
 				return 0;
 			}
 		} else
@@ -2364,7 +2364,7 @@ static int lock_page_maybe_drop_mmap(struct vm_fault *vmf, struct page *page,
 			 * mmap_sem here and return 0 if we don't have a fpin.
 			 */
 			if (*fpin == NULL)
-				up_read(&vmf->vma->vm_mm->mmap_sem);
+				mm_read_unlock(vmf->vma->vm_mm);
 			return 0;
 		}
 	} else
diff --git mm/frame_vector.c mm/frame_vector.c
index c431ca81dad5..d0a0355e3456 100644
--- mm/frame_vector.c
+++ mm/frame_vector.c
@@ -48,7 +48,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 
 	start = untagged_addr(start);
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	locked = 1;
 	vma = find_vma_intersection(mm, start, start + 1);
 	if (!vma) {
@@ -102,7 +102,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 	} while (vma && vma->vm_flags & (VM_IO | VM_PFNMAP));
 out:
 	if (locked)
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 	if (!ret)
 		ret = -EFAULT;
 	if (ret > 0)
diff --git mm/gup.c mm/gup.c
index 7646bf993b25..48b5ec82b1c2 100644
--- mm/gup.c
+++ mm/gup.c
@@ -982,7 +982,7 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
 	}
 
 	if (ret & VM_FAULT_RETRY) {
-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm);
 		if (!(fault_flags & FAULT_FLAG_TRIED)) {
 			*unlocked = true;
 			fault_flags &= ~FAULT_FLAG_ALLOW_RETRY;
@@ -1068,7 +1068,7 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
 		 */
 		*locked = 1;
 		lock_dropped = true;
-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm);
 		ret = __get_user_pages(tsk, mm, start, 1, flags | FOLL_TRIED,
 				       pages, NULL, NULL);
 		if (ret != 1) {
@@ -1090,7 +1090,7 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
 		 * We must let the caller know we temporarily dropped the lock
 		 * and so the critical section protected by it was lost.
 		 */
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 		*locked = 0;
 	}
 	return pages_done;
@@ -1208,7 +1208,7 @@ long populate_vma_page_range(struct vm_area_struct *vma,
 	VM_BUG_ON(end   & ~PAGE_MASK);
 	VM_BUG_ON_VMA(start < vma->vm_start, vma);
 	VM_BUG_ON_VMA(end   > vma->vm_end, vma);
-	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_sem), mm);
+	VM_BUG_ON_MM(!mm_is_locked(mm), mm);
 
 	gup_flags = FOLL_TOUCH | FOLL_POPULATE | FOLL_MLOCK;
 	if (vma->vm_flags & VM_LOCKONFAULT)
@@ -1260,7 +1260,7 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
 		 */
 		if (!locked) {
 			locked = 1;
-			down_read(&mm->mmap_sem);
+			mm_read_lock(mm);
 			vma = find_vma(mm, nstart);
 		} else if (nstart >= vma->vm_end)
 			vma = vma->vm_next;
@@ -1292,7 +1292,7 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
 		ret = 0;
 	}
 	if (locked)
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 	return ret;	/* 0 or negative error code */
 }
 
@@ -1698,11 +1698,11 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
 	if (WARN_ON_ONCE(gup_flags & FOLL_LONGTERM))
 		return -EINVAL;
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	ret = __get_user_pages_locked(current, mm, start, nr_pages, pages, NULL,
 				      &locked, gup_flags | FOLL_TOUCH);
 	if (locked)
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 	return ret;
 }
 EXPORT_SYMBOL(get_user_pages_unlocked);
@@ -2380,11 +2380,11 @@ static int __gup_longterm_unlocked(unsigned long start, int nr_pages,
 	 * get_user_pages_unlocked() (see comments in that function)
 	 */
 	if (gup_flags & FOLL_LONGTERM) {
-		down_read(&current->mm->mmap_sem);
+		mm_read_lock(current->mm);
 		ret = __gup_longterm_locked(current, current->mm,
 					    start, nr_pages,
 					    pages, NULL, gup_flags);
-		up_read(&current->mm->mmap_sem);
+		mm_read_unlock(current->mm);
 	} else {
 		ret = get_user_pages_unlocked(start, nr_pages,
 					      pages, gup_flags);
diff --git mm/internal.h mm/internal.h
index 3cf20ab3ca01..22f361a1e284 100644
--- mm/internal.h
+++ mm/internal.h
@@ -382,7 +382,7 @@ static inline struct file *maybe_unlock_mmap_for_io(struct vm_fault *vmf,
 	if ((flags & (FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT)) ==
 	    FAULT_FLAG_ALLOW_RETRY) {
 		fpin = get_file(vmf->vma->vm_file);
-		up_read(&vmf->vma->vm_mm->mmap_sem);
+		mm_read_unlock(vmf->vma->vm_mm);
 	}
 	return fpin;
 }
diff --git mm/khugepaged.c mm/khugepaged.c
index b679908743cb..7ee8ae64824b 100644
--- mm/khugepaged.c
+++ mm/khugepaged.c
@@ -508,8 +508,8 @@ void __khugepaged_exit(struct mm_struct *mm)
 		 * khugepaged has finished working on the pagetables
 		 * under the mmap_sem.
 		 */
-		down_write(&mm->mmap_sem);
-		up_write(&mm->mmap_sem);
+		mm_write_lock(mm);
+		mm_write_unlock(mm);
 	}
 }
 
@@ -918,7 +918,7 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
 
 		/* do_swap_page returns VM_FAULT_RETRY with released mmap_sem */
 		if (ret & VM_FAULT_RETRY) {
-			down_read(&mm->mmap_sem);
+			mm_read_lock(mm);
 			if (hugepage_vma_revalidate(mm, address, &vmf.vma)) {
 				/* vma is no longer available, don't continue to swapin */
 				trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
@@ -970,7 +970,7 @@ static void collapse_huge_page(struct mm_struct *mm,
 	 * sync compaction, and we do not need to hold the mmap_sem during
 	 * that. We will recheck the vma after taking it again in write mode.
 	 */
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	new_page = khugepaged_alloc_page(hpage, gfp, node);
 	if (!new_page) {
 		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
@@ -982,11 +982,11 @@ static void collapse_huge_page(struct mm_struct *mm,
 		goto out_nolock;
 	}
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	result = hugepage_vma_revalidate(mm, address, &vma);
 	if (result) {
 		mem_cgroup_cancel_charge(new_page, memcg, true);
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 		goto out_nolock;
 	}
 
@@ -994,7 +994,7 @@ static void collapse_huge_page(struct mm_struct *mm,
 	if (!pmd) {
 		result = SCAN_PMD_NULL;
 		mem_cgroup_cancel_charge(new_page, memcg, true);
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 		goto out_nolock;
 	}
 
@@ -1005,17 +1005,17 @@ static void collapse_huge_page(struct mm_struct *mm,
 	 */
 	if (!__collapse_huge_page_swapin(mm, vma, address, pmd, referenced)) {
 		mem_cgroup_cancel_charge(new_page, memcg, true);
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 		goto out_nolock;
 	}
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	/*
 	 * Prevent all access to pagetables with the exception of
 	 * gup_fast later handled by the ptep_clear_flush and the VM
 	 * handled by the anon_vma lock + PG_lock.
 	 */
-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm);
 	result = SCAN_ANY_PROCESS;
 	if (!mmget_still_valid(mm))
 		goto out;
@@ -1103,7 +1103,7 @@ static void collapse_huge_page(struct mm_struct *mm,
 	khugepaged_pages_collapsed++;
 	result = SCAN_SUCCEED;
 out_up_write:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 out_nolock:
 	trace_mm_collapse_huge_page(mm, isolated, result);
 	return;
@@ -1399,7 +1399,7 @@ static int khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot)
 	if (likely(mm_slot->nr_pte_mapped_thp == 0))
 		return 0;
 
-	if (!down_write_trylock(&mm->mmap_sem))
+	if (!mm_write_trylock(mm))
 		return -EBUSY;
 
 	if (unlikely(khugepaged_test_exit(mm)))
@@ -1410,7 +1410,7 @@ static int khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot)
 
 out:
 	mm_slot->nr_pte_mapped_thp = 0;
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	return 0;
 }
 
@@ -1455,12 +1455,12 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 		 * mmap_sem while holding page lock. Fault path does it in
 		 * reverse order. Trylock is a way to avoid deadlock.
 		 */
-		if (down_write_trylock(&vma->vm_mm->mmap_sem)) {
+		if (mm_write_trylock(vma->vm_mm)) {
 			spinlock_t *ptl = pmd_lock(vma->vm_mm, pmd);
 			/* assume page table is clear */
 			_pmd = pmdp_collapse_flush(vma, addr, pmd);
 			spin_unlock(ptl);
-			up_write(&vma->vm_mm->mmap_sem);
+			mm_write_unlock(vma->vm_mm);
 			mm_dec_nr_ptes(vma->vm_mm);
 			pte_free(vma->vm_mm, pmd_pgtable(_pmd));
 		} else {
@@ -1947,7 +1947,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 	 * the next mm on the list.
 	 */
 	vma = NULL;
-	if (unlikely(!down_read_trylock(&mm->mmap_sem)))
+	if (unlikely(!mm_read_trylock(mm)))
 		goto breakouterloop_mmap_sem;
 	if (likely(!khugepaged_test_exit(mm)))
 		vma = find_vma(mm, khugepaged_scan.address);
@@ -1994,7 +1994,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 				    && !shmem_huge_enabled(vma))
 					goto skip;
 				file = get_file(vma->vm_file);
-				up_read(&mm->mmap_sem);
+				mm_read_unlock(mm);
 				ret = 1;
 				khugepaged_scan_file(mm, file, pgoff, hpage);
 				fput(file);
@@ -2014,7 +2014,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 		}
 	}
 breakouterloop:
-	up_read(&mm->mmap_sem); /* exit_mmap will destroy ptes after this */
+	mm_read_unlock(mm); /* exit_mmap will destroy ptes after this */
 breakouterloop_mmap_sem:
 
 	spin_lock(&khugepaged_mm_lock);
diff --git mm/ksm.c mm/ksm.c
index d17c7d57d0d8..65592828f40f 100644
--- mm/ksm.c
+++ mm/ksm.c
@@ -542,11 +542,11 @@ static void break_cow(struct rmap_item *rmap_item)
 	 */
 	put_anon_vma(rmap_item->anon_vma);
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	vma = find_mergeable_vma(mm, addr);
 	if (vma)
 		break_ksm(vma, addr);
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 }
 
 static struct page *get_mergeable_page(struct rmap_item *rmap_item)
@@ -556,7 +556,7 @@ static struct page *get_mergeable_page(struct rmap_item *rmap_item)
 	struct vm_area_struct *vma;
 	struct page *page;
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	vma = find_mergeable_vma(mm, addr);
 	if (!vma)
 		goto out;
@@ -572,7 +572,7 @@ static struct page *get_mergeable_page(struct rmap_item *rmap_item)
 out:
 		page = NULL;
 	}
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	return page;
 }
 
@@ -976,7 +976,7 @@ static int unmerge_and_remove_all_rmap_items(void)
 	for (mm_slot = ksm_scan.mm_slot;
 			mm_slot != &ksm_mm_head; mm_slot = ksm_scan.mm_slot) {
 		mm = mm_slot->mm;
-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm);
 		for (vma = mm->mmap; vma; vma = vma->vm_next) {
 			if (ksm_test_exit(mm))
 				break;
@@ -989,7 +989,7 @@ static int unmerge_and_remove_all_rmap_items(void)
 		}
 
 		remove_trailing_rmap_items(mm_slot, &mm_slot->rmap_list);
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 
 		spin_lock(&ksm_mmlist_lock);
 		ksm_scan.mm_slot = list_entry(mm_slot->mm_list.next,
@@ -1012,7 +1012,7 @@ static int unmerge_and_remove_all_rmap_items(void)
 	return 0;
 
 error:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	spin_lock(&ksm_mmlist_lock);
 	ksm_scan.mm_slot = &ksm_mm_head;
 	spin_unlock(&ksm_mmlist_lock);
@@ -1280,7 +1280,7 @@ static int try_to_merge_with_ksm_page(struct rmap_item *rmap_item,
 	struct vm_area_struct *vma;
 	int err = -EFAULT;
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	vma = find_mergeable_vma(mm, rmap_item->address);
 	if (!vma)
 		goto out;
@@ -1296,7 +1296,7 @@ static int try_to_merge_with_ksm_page(struct rmap_item *rmap_item,
 	rmap_item->anon_vma = vma->anon_vma;
 	get_anon_vma(vma->anon_vma);
 out:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	return err;
 }
 
@@ -2110,11 +2110,11 @@ static void cmp_and_merge_page(struct page *page, struct rmap_item *rmap_item)
 	if (ksm_use_zero_pages && (checksum == zero_checksum)) {
 		struct vm_area_struct *vma;
 
-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm);
 		vma = find_mergeable_vma(mm, rmap_item->address);
 		err = try_to_merge_one_page(vma, page,
 					    ZERO_PAGE(rmap_item->address));
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 		/*
 		 * In case of failure, the page was not really empty, so we
 		 * need to continue. Otherwise we're done.
@@ -2277,7 +2277,7 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
 	}
 
 	mm = slot->mm;
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	if (ksm_test_exit(mm))
 		vma = NULL;
 	else
@@ -2311,7 +2311,7 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
 					ksm_scan.address += PAGE_SIZE;
 				} else
 					put_page(*page);
-				up_read(&mm->mmap_sem);
+				mm_read_unlock(mm);
 				return rmap_item;
 			}
 			put_page(*page);
@@ -2349,10 +2349,10 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
 
 		free_mm_slot(slot);
 		clear_bit(MMF_VM_MERGEABLE, &mm->flags);
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 		mmdrop(mm);
 	} else {
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 		/*
 		 * up_read(&mm->mmap_sem) first because after
 		 * spin_unlock(&ksm_mmlist_lock) run, the "mm" may
@@ -2552,8 +2552,8 @@ void __ksm_exit(struct mm_struct *mm)
 		clear_bit(MMF_VM_MERGEABLE, &mm->flags);
 		mmdrop(mm);
 	} else if (mm_slot) {
-		down_write(&mm->mmap_sem);
-		up_write(&mm->mmap_sem);
+		mm_write_lock(mm);
+		mm_write_unlock(mm);
 	}
 }
 
diff --git mm/madvise.c mm/madvise.c
index bcdb6a042787..2c48ea26eb8a 100644
--- mm/madvise.c
+++ mm/madvise.c
@@ -288,12 +288,12 @@ static long madvise_willneed(struct vm_area_struct *vma,
 	 */
 	*prev = NULL;	/* tell sys_madvise we drop mmap_sem */
 	get_file(file);
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm);
 	offset = (loff_t)(start - vma->vm_start)
 			+ ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
 	vfs_fadvise(file, offset, end - start, POSIX_FADV_WILLNEED);
 	fput(file);
-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm);
 	return 0;
 }
 
@@ -763,7 +763,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 	if (!userfaultfd_remove(vma, start, end)) {
 		*prev = NULL; /* mmap_sem has been dropped, prev is stale */
 
-		down_read(&current->mm->mmap_sem);
+		mm_read_lock(current->mm);
 		vma = find_vma(current->mm, start);
 		if (!vma)
 			return -ENOMEM;
@@ -845,13 +845,13 @@ static long madvise_remove(struct vm_area_struct *vma,
 	get_file(f);
 	if (userfaultfd_remove(vma, start, end)) {
 		/* mmap_sem was not released by userfaultfd_remove() */
-		up_read(&current->mm->mmap_sem);
+		mm_read_unlock(current->mm);
 	}
 	error = vfs_fallocate(f,
 				FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
 				offset, end - start);
 	fput(f);
-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm);
 	return error;
 }
 
@@ -1082,10 +1082,10 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
 
 	write = madvise_need_mmap_write(behavior);
 	if (write) {
-		if (down_write_killable(&current->mm->mmap_sem))
+		if (mm_write_lock_killable(current->mm))
 			return -EINTR;
 	} else {
-		down_read(&current->mm->mmap_sem);
+		mm_read_lock(current->mm);
 	}
 
 	/*
@@ -1135,9 +1135,9 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
 out:
 	blk_finish_plug(&plug);
 	if (write)
-		up_write(&current->mm->mmap_sem);
+		mm_write_unlock(current->mm);
 	else
-		up_read(&current->mm->mmap_sem);
+		mm_read_unlock(current->mm);
 
 	return error;
 }
diff --git mm/memcontrol.c mm/memcontrol.c
index 6c83cf4ed970..98ededf5c764 100644
--- mm/memcontrol.c
+++ mm/memcontrol.c
@@ -5533,9 +5533,9 @@ static unsigned long mem_cgroup_count_precharge(struct mm_struct *mm)
 {
 	unsigned long precharge;
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	walk_page_range(mm, 0, mm->highest_vm_end, &precharge_walk_ops, NULL);
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	precharge = mc.precharge;
 	mc.precharge = 0;
@@ -5818,7 +5818,7 @@ static void mem_cgroup_move_charge(void)
 	atomic_inc(&mc.from->moving_account);
 	synchronize_rcu();
 retry:
-	if (unlikely(!down_read_trylock(&mc.mm->mmap_sem))) {
+	if (unlikely(!mm_read_trylock(mc.mm))) {
 		/*
 		 * Someone who are holding the mmap_sem might be waiting in
 		 * waitq. So we cancel all extra charges, wake up all waiters,
@@ -5837,7 +5837,7 @@ static void mem_cgroup_move_charge(void)
 	walk_page_range(mc.mm, 0, mc.mm->highest_vm_end, &charge_walk_ops,
 			NULL);
 
-	up_read(&mc.mm->mmap_sem);
+	mm_read_unlock(mc.mm);
 	atomic_dec(&mc.from->moving_account);
 }
 
diff --git mm/memory.c mm/memory.c
index 45442d9a4f52..45b42fa02a2e 100644
--- mm/memory.c
+++ mm/memory.c
@@ -1202,7 +1202,7 @@ static inline unsigned long zap_pud_range(struct mmu_gather *tlb,
 		next = pud_addr_end(addr, end);
 		if (pud_trans_huge(*pud) || pud_devmap(*pud)) {
 			if (next - addr != HPAGE_PUD_SIZE) {
-				VM_BUG_ON_VMA(!rwsem_is_locked(&tlb->mm->mmap_sem), vma);
+				VM_BUG_ON_VMA(!mm_is_locked(tlb->mm), vma);
 				split_huge_pud(vma, pud, addr);
 			} else if (zap_huge_pud(tlb, vma, pud, addr))
 				goto next;
@@ -1507,7 +1507,7 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
 	if (!page_count(page))
 		return -EINVAL;
 	if (!(vma->vm_flags & VM_MIXEDMAP)) {
-		BUG_ON(down_read_trylock(&vma->vm_mm->mmap_sem));
+		BUG_ON(mm_read_trylock(vma->vm_mm));
 		BUG_ON(vma->vm_flags & VM_PFNMAP);
 		vma->vm_flags |= VM_MIXEDMAP;
 	}
@@ -4447,7 +4447,7 @@ int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
 	void *old_buf = buf;
 	int write = gup_flags & FOLL_WRITE;
 
-	if (down_read_killable(&mm->mmap_sem))
+	if (mm_read_lock_killable(mm))
 		return 0;
 
 	/* ignore errors, just check how much was successfully transferred */
@@ -4498,7 +4498,7 @@ int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
 		buf += bytes;
 		addr += bytes;
 	}
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	return buf - old_buf;
 }
@@ -4555,7 +4555,7 @@ void print_vma_addr(char *prefix, unsigned long ip)
 	/*
 	 * we might be running from an atomic context so we cannot sleep
 	 */
-	if (!down_read_trylock(&mm->mmap_sem))
+	if (!mm_read_trylock(mm))
 		return;
 
 	vma = find_vma(mm, ip);
@@ -4574,7 +4574,7 @@ void print_vma_addr(char *prefix, unsigned long ip)
 			free_page((unsigned long)buf);
 		}
 	}
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 }
 
 #if defined(CONFIG_PROVE_LOCKING) || defined(CONFIG_DEBUG_ATOMIC_SLEEP)
diff --git mm/mempolicy.c mm/mempolicy.c
index b2920ae87a61..d64031767f8a 100644
--- mm/mempolicy.c
+++ mm/mempolicy.c
@@ -379,10 +379,10 @@ void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new)
 {
 	struct vm_area_struct *vma;
 
-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm);
 	for (vma = mm->mmap; vma; vma = vma->vm_next)
 		mpol_rebind_policy(vma->vm_policy, new);
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 }
 
 static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {
@@ -878,7 +878,7 @@ static int lookup_node(struct mm_struct *mm, unsigned long addr)
 		put_page(p);
 	}
 	if (locked)
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 	return err;
 }
 
@@ -911,10 +911,10 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,
 		 * vma/shared policy at addr is NULL.  We
 		 * want to return MPOL_DEFAULT in this case.
 		 */
-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm);
 		vma = find_vma_intersection(mm, addr, addr+1);
 		if (!vma) {
-			up_read(&mm->mmap_sem);
+			mm_read_unlock(mm);
 			return -EFAULT;
 		}
 		if (vma->vm_ops && vma->vm_ops->get_policy)
@@ -973,7 +973,7 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,
  out:
 	mpol_cond_put(pol);
 	if (vma)
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 	if (pol_refcount)
 		mpol_put(pol_refcount);
 	return err;
@@ -1082,7 +1082,7 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
 	if (err)
 		return err;
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 
 	/*
 	 * Find a 'source' bit set in 'tmp' whose corresponding 'dest'
@@ -1163,7 +1163,7 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
 		if (err < 0)
 			break;
 	}
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	if (err < 0)
 		return err;
 	return busy;
@@ -1286,12 +1286,12 @@ static long do_mbind(unsigned long start, unsigned long len,
 	{
 		NODEMASK_SCRATCH(scratch);
 		if (scratch) {
-			down_write(&mm->mmap_sem);
+			mm_write_lock(mm);
 			task_lock(current);
 			err = mpol_set_nodemask(new, nmask, scratch);
 			task_unlock(current);
 			if (err)
-				up_write(&mm->mmap_sem);
+				mm_write_unlock(mm);
 		} else
 			err = -ENOMEM;
 		NODEMASK_SCRATCH_FREE(scratch);
@@ -1328,7 +1328,7 @@ static long do_mbind(unsigned long start, unsigned long len,
 			putback_movable_pages(&pagelist);
 	}
 
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 mpol_out:
 	mpol_put(new);
 	return err;
diff --git mm/migrate.c mm/migrate.c
index 86873b6f38a7..3f3f22e9551e 100644
--- mm/migrate.c
+++ mm/migrate.c
@@ -1526,7 +1526,7 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
 	unsigned int follflags;
 	int err;
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	err = -EFAULT;
 	vma = find_vma(mm, addr);
 	if (!vma || addr < vma->vm_start || !vma_migratable(vma))
@@ -1579,7 +1579,7 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
 	 */
 	put_page(page);
 out:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	return err;
 }
 
@@ -1690,7 +1690,7 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
 {
 	unsigned long i;
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 
 	for (i = 0; i < nr_pages; i++) {
 		unsigned long addr = (unsigned long)(*pages);
@@ -1717,7 +1717,7 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
 		status++;
 	}
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 }
 
 /*
diff --git mm/mincore.c mm/mincore.c
index 49b6fa2f6aa1..2bf0b3a0fff9 100644
--- mm/mincore.c
+++ mm/mincore.c
@@ -283,9 +283,9 @@ SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len,
 		 * Do at most PAGE_SIZE entries per iteration, due to
 		 * the temporary buffer size.
 		 */
-		down_read(&current->mm->mmap_sem);
+		mm_read_lock(current->mm);
 		retval = do_mincore(start, min(pages, PAGE_SIZE), tmp);
-		up_read(&current->mm->mmap_sem);
+		mm_read_unlock(current->mm);
 
 		if (retval <= 0)
 			break;
diff --git mm/mlock.c mm/mlock.c
index a72c1eeded77..2a435eb98a58 100644
--- mm/mlock.c
+++ mm/mlock.c
@@ -686,7 +686,7 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla
 	lock_limit >>= PAGE_SHIFT;
 	locked = len >> PAGE_SHIFT;
 
-	if (down_write_killable(&current->mm->mmap_sem))
+	if (mm_write_lock_killable(current->mm))
 		return -EINTR;
 
 	locked += current->mm->locked_vm;
@@ -705,7 +705,7 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla
 	if ((locked <= lock_limit) || capable(CAP_IPC_LOCK))
 		error = apply_vma_lock_flags(start, len, flags);
 
-	up_write(&current->mm->mmap_sem);
+	mm_write_unlock(current->mm);
 	if (error)
 		return error;
 
@@ -742,10 +742,10 @@ SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
 	len = PAGE_ALIGN(len + (offset_in_page(start)));
 	start &= PAGE_MASK;
 
-	if (down_write_killable(&current->mm->mmap_sem))
+	if (mm_write_lock_killable(current->mm))
 		return -EINTR;
 	ret = apply_vma_lock_flags(start, len, 0);
-	up_write(&current->mm->mmap_sem);
+	mm_write_unlock(current->mm);
 
 	return ret;
 }
@@ -811,14 +811,14 @@ SYSCALL_DEFINE1(mlockall, int, flags)
 	lock_limit = rlimit(RLIMIT_MEMLOCK);
 	lock_limit >>= PAGE_SHIFT;
 
-	if (down_write_killable(&current->mm->mmap_sem))
+	if (mm_write_lock_killable(current->mm))
 		return -EINTR;
 
 	ret = -ENOMEM;
 	if (!(flags & MCL_CURRENT) || (current->mm->total_vm <= lock_limit) ||
 	    capable(CAP_IPC_LOCK))
 		ret = apply_mlockall_flags(flags);
-	up_write(&current->mm->mmap_sem);
+	mm_write_unlock(current->mm);
 	if (!ret && (flags & MCL_CURRENT))
 		mm_populate(0, TASK_SIZE);
 
@@ -829,10 +829,10 @@ SYSCALL_DEFINE0(munlockall)
 {
 	int ret;
 
-	if (down_write_killable(&current->mm->mmap_sem))
+	if (mm_write_lock_killable(current->mm))
 		return -EINTR;
 	ret = apply_mlockall_flags(0);
-	up_write(&current->mm->mmap_sem);
+	mm_write_unlock(current->mm);
 	return ret;
 }
 
diff --git mm/mmap.c mm/mmap.c
index 71e4ffc83bcd..0f95300c2788 100644
--- mm/mmap.c
+++ mm/mmap.c
@@ -197,7 +197,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
 
 	brk = untagged_addr(brk);
 
-	if (down_write_killable(&mm->mmap_sem))
+	if (mm_write_lock_killable(mm))
 		return -EINTR;
 
 	origbrk = mm->brk;
@@ -271,9 +271,9 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
 success:
 	populate = newbrk > oldbrk && (mm->def_flags & VM_LOCKED) != 0;
 	if (downgraded)
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 	else
-		up_write(&mm->mmap_sem);
+		mm_write_unlock(mm);
 	userfaultfd_unmap_complete(mm, &uf);
 	if (populate)
 		mm_populate(oldbrk, newbrk - oldbrk);
@@ -281,7 +281,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
 
 out:
 	retval = origbrk;
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	return retval;
 }
 
@@ -2812,7 +2812,7 @@ int __do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
 	detach_vmas_to_be_unmapped(mm, vma, prev, end);
 
 	if (downgrade)
-		downgrade_write(&mm->mmap_sem);
+		mm_downgrade_write_lock(mm);
 
 	unmap_region(mm, vma, prev, start, end);
 
@@ -2834,7 +2834,7 @@ static int __vm_munmap(unsigned long start, size_t len, bool downgrade)
 	struct mm_struct *mm = current->mm;
 	LIST_HEAD(uf);
 
-	if (down_write_killable(&mm->mmap_sem))
+	if (mm_write_lock_killable(mm))
 		return -EINTR;
 
 	ret = __do_munmap(mm, start, len, &uf, downgrade);
@@ -2844,10 +2844,10 @@ static int __vm_munmap(unsigned long start, size_t len, bool downgrade)
 	 * it to 0 before return.
 	 */
 	if (ret == 1) {
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 		ret = 0;
 	} else
-		up_write(&mm->mmap_sem);
+		mm_write_unlock(mm);
 
 	userfaultfd_unmap_complete(mm, &uf);
 	return ret;
@@ -2895,7 +2895,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 	if (pgoff + (size >> PAGE_SHIFT) < pgoff)
 		return ret;
 
-	if (down_write_killable(&mm->mmap_sem))
+	if (mm_write_lock_killable(mm))
 		return -EINTR;
 
 	vma = find_vma(mm, start);
@@ -2958,7 +2958,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 			prot, flags, pgoff, &populate, NULL);
 	fput(file);
 out:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	if (populate)
 		mm_populate(ret, populate);
 	if (!IS_ERR_VALUE(ret))
@@ -3058,12 +3058,12 @@ int vm_brk_flags(unsigned long addr, unsigned long request, unsigned long flags)
 	if (!len)
 		return 0;
 
-	if (down_write_killable(&mm->mmap_sem))
+	if (mm_write_lock_killable(mm))
 		return -EINTR;
 
 	ret = do_brk_flags(addr, len, flags, &uf);
 	populate = ((mm->def_flags & VM_LOCKED) != 0);
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	userfaultfd_unmap_complete(mm, &uf);
 	if (populate && !ret)
 		mm_populate(addr, len);
@@ -3107,8 +3107,8 @@ void exit_mmap(struct mm_struct *mm)
 		(void)__oom_reap_task_mm(mm);
 
 		set_bit(MMF_OOM_SKIP, &mm->flags);
-		down_write(&mm->mmap_sem);
-		up_write(&mm->mmap_sem);
+		mm_write_lock(mm);
+		mm_write_unlock(mm);
 	}
 
 	if (mm->locked_vm) {
@@ -3532,7 +3532,7 @@ int mm_take_all_locks(struct mm_struct *mm)
 	struct vm_area_struct *vma;
 	struct anon_vma_chain *avc;
 
-	BUG_ON(down_read_trylock(&mm->mmap_sem));
+	BUG_ON(mm_read_trylock(mm));
 
 	mutex_lock(&mm_all_locks_mutex);
 
@@ -3612,7 +3612,7 @@ void mm_drop_all_locks(struct mm_struct *mm)
 	struct vm_area_struct *vma;
 	struct anon_vma_chain *avc;
 
-	BUG_ON(down_read_trylock(&mm->mmap_sem));
+	BUG_ON(mm_read_trylock(mm));
 	BUG_ON(!mutex_is_locked(&mm_all_locks_mutex));
 
 	for (vma = mm->mmap; vma; vma = vma->vm_next) {
diff --git mm/mmu_notifier.c mm/mmu_notifier.c
index f76ea05b1cb0..fcfaddc2d2f0 100644
--- mm/mmu_notifier.c
+++ mm/mmu_notifier.c
@@ -665,9 +665,9 @@ int mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm)
 {
 	int ret;
 
-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm);
 	ret = __mmu_notifier_register(mn, mm);
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(mmu_notifier_register);
diff --git mm/mprotect.c mm/mprotect.c
index 7a8e84f86831..fce136e67c6b 100644
--- mm/mprotect.c
+++ mm/mprotect.c
@@ -478,7 +478,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 
 	reqprot = prot;
 
-	if (down_write_killable(&current->mm->mmap_sem))
+	if (mm_write_lock_killable(current->mm))
 		return -EINTR;
 
 	/*
@@ -568,7 +568,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 		prot = reqprot;
 	}
 out:
-	up_write(&current->mm->mmap_sem);
+	mm_write_unlock(current->mm);
 	return error;
 }
 
@@ -598,7 +598,7 @@ SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, unsigned long, init_val)
 	if (init_val & ~PKEY_ACCESS_MASK)
 		return -EINVAL;
 
-	down_write(&current->mm->mmap_sem);
+	mm_write_lock(current->mm);
 	pkey = mm_pkey_alloc(current->mm);
 
 	ret = -ENOSPC;
@@ -612,7 +612,7 @@ SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, unsigned long, init_val)
 	}
 	ret = pkey;
 out:
-	up_write(&current->mm->mmap_sem);
+	mm_write_unlock(current->mm);
 	return ret;
 }
 
@@ -620,9 +620,9 @@ SYSCALL_DEFINE1(pkey_free, int, pkey)
 {
 	int ret;
 
-	down_write(&current->mm->mmap_sem);
+	mm_write_lock(current->mm);
 	ret = mm_pkey_free(current->mm, pkey);
-	up_write(&current->mm->mmap_sem);
+	mm_write_unlock(current->mm);
 
 	/*
 	 * We could provie warnings or errors if any VMA still
diff --git mm/mremap.c mm/mremap.c
index 122938dcec15..7793d6c51ac2 100644
--- mm/mremap.c
+++ mm/mremap.c
@@ -629,7 +629,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 	if (!new_len)
 		return ret;
 
-	if (down_write_killable(&current->mm->mmap_sem))
+	if (mm_write_lock_killable(current->mm))
 		return -EINTR;
 
 	if (flags & MREMAP_FIXED) {
@@ -720,9 +720,9 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 		locked = 0;
 	}
 	if (downgraded)
-		up_read(&current->mm->mmap_sem);
+		mm_read_unlock(current->mm);
 	else
-		up_write(&current->mm->mmap_sem);
+		mm_write_unlock(current->mm);
 	if (locked && new_len > old_len)
 		mm_populate(new_addr + old_len, new_len - old_len);
 	userfaultfd_unmap_complete(mm, &uf_unmap_early);
diff --git mm/msync.c mm/msync.c
index c3bd3e75f687..f7c3acd3a69f 100644
--- mm/msync.c
+++ mm/msync.c
@@ -57,7 +57,7 @@ SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags)
 	 * If the interval [start,end) covers some unmapped address ranges,
 	 * just ignore them, but return -ENOMEM at the end.
 	 */
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	vma = find_vma(mm, start);
 	for (;;) {
 		struct file *file;
@@ -88,12 +88,12 @@ SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags)
 		if ((flags & MS_SYNC) && file &&
 				(vma->vm_flags & VM_SHARED)) {
 			get_file(file);
-			up_read(&mm->mmap_sem);
+			mm_read_unlock(mm);
 			error = vfs_fsync_range(file, fstart, fend, 1);
 			fput(file);
 			if (error || start >= end)
 				goto out;
-			down_read(&mm->mmap_sem);
+			mm_read_lock(mm);
 			vma = find_vma(mm, start);
 		} else {
 			if (start >= end) {
@@ -104,7 +104,7 @@ SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags)
 		}
 	}
 out_unlock:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 out:
 	return error ? : unmapped_error;
 }
diff --git mm/nommu.c mm/nommu.c
index bd2b4e5ef144..c137db1923bd 100644
--- mm/nommu.c
+++ mm/nommu.c
@@ -163,11 +163,11 @@ static void *__vmalloc_user_flags(unsigned long size, gfp_t flags)
 	if (ret) {
 		struct vm_area_struct *vma;
 
-		down_write(&current->mm->mmap_sem);
+		mm_write_lock(current->mm);
 		vma = find_vma(current->mm, (unsigned long)ret);
 		if (vma)
 			vma->vm_flags |= VM_USERMAP;
-		up_write(&current->mm->mmap_sem);
+		mm_write_unlock(current->mm);
 	}
 
 	return ret;
@@ -1548,9 +1548,9 @@ int vm_munmap(unsigned long addr, size_t len)
 	struct mm_struct *mm = current->mm;
 	int ret;
 
-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm);
 	ret = do_munmap(mm, addr, len, NULL);
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 	return ret;
 }
 EXPORT_SYMBOL(vm_munmap);
@@ -1637,9 +1637,9 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 {
 	unsigned long ret;
 
-	down_write(&current->mm->mmap_sem);
+	mm_write_lock(current->mm);
 	ret = do_mremap(addr, old_len, new_len, flags, new_addr);
-	up_write(&current->mm->mmap_sem);
+	mm_write_unlock(current->mm);
 	return ret;
 }
 
@@ -1711,7 +1711,7 @@ int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
 	struct vm_area_struct *vma;
 	int write = gup_flags & FOLL_WRITE;
 
-	if (down_read_killable(&mm->mmap_sem))
+	if (mm_read_lock_killable(mm))
 		return 0;
 
 	/* the access must start within one of the target process's mappings */
@@ -1734,7 +1734,7 @@ int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
 		len = 0;
 	}
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	return len;
 }
diff --git mm/oom_kill.c mm/oom_kill.c
index d58c481b3df8..2acc196cac84 100644
--- mm/oom_kill.c
+++ mm/oom_kill.c
@@ -568,7 +568,7 @@ static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 {
 	bool ret = true;
 
-	if (!down_read_trylock(&mm->mmap_sem)) {
+	if (!mm_read_trylock(mm)) {
 		trace_skip_task_reaping(tsk->pid);
 		return false;
 	}
@@ -599,7 +599,7 @@ static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 out_finish:
 	trace_finish_task_reaping(tsk->pid);
 out_unlock:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	return ret;
 }
diff --git mm/process_vm_access.c mm/process_vm_access.c
index 357aa7bef6c0..12f3a7631682 100644
--- mm/process_vm_access.c
+++ mm/process_vm_access.c
@@ -105,11 +105,11 @@ static int process_vm_rw_single_vec(unsigned long addr,
 		 * access remotely because task/mm might not
 		 * current/current->mm
 		 */
-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm);
 		pages = get_user_pages_remote(task, mm, pa, pages, flags,
 					      process_pages, NULL, &locked);
 		if (locked)
-			up_read(&mm->mmap_sem);
+			mm_read_unlock(mm);
 		if (pages <= 0)
 			return -EFAULT;
 
diff --git mm/swapfile.c mm/swapfile.c
index bb3261d45b6a..899f51e11ec5 100644
--- mm/swapfile.c
+++ mm/swapfile.c
@@ -2070,7 +2070,7 @@ static int unuse_mm(struct mm_struct *mm, unsigned int type,
 	struct vm_area_struct *vma;
 	int ret = 0;
 
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	for (vma = mm->mmap; vma; vma = vma->vm_next) {
 		if (vma->anon_vma) {
 			ret = unuse_vma(vma, type, frontswap,
@@ -2080,7 +2080,7 @@ static int unuse_mm(struct mm_struct *mm, unsigned int type,
 		}
 		cond_resched();
 	}
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	return ret;
 }
 
diff --git mm/userfaultfd.c mm/userfaultfd.c
index 1b0d7abad1d4..2e27f1276a92 100644
--- mm/userfaultfd.c
+++ mm/userfaultfd.c
@@ -226,7 +226,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 	 * feature is not supported.
 	 */
 	if (zeropage) {
-		up_read(&dst_mm->mmap_sem);
+		mm_read_unlock(dst_mm);
 		return -EINVAL;
 	}
 
@@ -306,7 +306,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		cond_resched();
 
 		if (unlikely(err == -ENOENT)) {
-			up_read(&dst_mm->mmap_sem);
+			mm_read_unlock(dst_mm);
 			BUG_ON(!page);
 
 			err = copy_huge_page_from_user(page,
@@ -317,7 +317,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 				err = -EFAULT;
 				goto out;
 			}
-			down_read(&dst_mm->mmap_sem);
+			mm_read_lock(dst_mm);
 
 			dst_vma = NULL;
 			goto retry;
@@ -337,7 +337,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 	}
 
 out_unlock:
-	up_read(&dst_mm->mmap_sem);
+	mm_read_unlock(dst_mm);
 out:
 	if (page) {
 		/*
@@ -471,7 +471,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 	copied = 0;
 	page = NULL;
 retry:
-	down_read(&dst_mm->mmap_sem);
+	mm_read_lock(dst_mm);
 
 	/*
 	 * If memory mappings are changing because of non-cooperative
@@ -561,7 +561,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 		if (unlikely(err == -ENOENT)) {
 			void *page_kaddr;
 
-			up_read(&dst_mm->mmap_sem);
+			mm_read_unlock(dst_mm);
 			BUG_ON(!page);
 
 			page_kaddr = kmap(page);
@@ -590,7 +590,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 	}
 
 out_unlock:
-	up_read(&dst_mm->mmap_sem);
+	mm_read_unlock(dst_mm);
 out:
 	if (page)
 		put_page(page);
diff --git mm/util.c mm/util.c
index 988d11e6c17c..511e442e7329 100644
--- mm/util.c
+++ mm/util.c
@@ -481,10 +481,10 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc)
 	if (pages == 0 || !mm)
 		return 0;
 
-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm);
 	ret = __account_locked_vm(mm, pages, inc, current,
 				  capable(CAP_IPC_LOCK));
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm);
 
 	return ret;
 }
@@ -501,11 +501,11 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
 
 	ret = security_mmap_file(file, prot, flag);
 	if (!ret) {
-		if (down_write_killable(&mm->mmap_sem))
+		if (mm_write_lock_killable(mm))
 			return -EINTR;
 		ret = do_mmap_pgoff(file, addr, len, prot, flag, pgoff,
 				    &populate, &uf);
-		up_write(&mm->mmap_sem);
+		mm_write_unlock(mm);
 		userfaultfd_unmap_complete(mm, &uf);
 		if (populate)
 			mm_populate(ret, populate);
diff --git net/ipv4/tcp.c net/ipv4/tcp.c
index a7d766e6390e..041d6585f97d 100644
--- net/ipv4/tcp.c
+++ net/ipv4/tcp.c
@@ -1755,7 +1755,7 @@ static int tcp_zerocopy_receive(struct sock *sk,
 
 	sock_rps_record_flow(sk);
 
-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm);
 
 	ret = -EINVAL;
 	vma = find_vma(current->mm, address);
@@ -1817,7 +1817,7 @@ static int tcp_zerocopy_receive(struct sock *sk,
 		frags++;
 	}
 out:
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm);
 	if (length) {
 		WRITE_ONCE(tp->copied_seq, seq);
 		tcp_rcv_space_adjust(sk);
diff --git net/xdp/xdp_umem.c net/xdp/xdp_umem.c
index 3049af269fbf..93d6c717987b 100644
--- net/xdp/xdp_umem.c
+++ net/xdp/xdp_umem.c
@@ -290,10 +290,10 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem)
 	if (!umem->pgs)
 		return -ENOMEM;
 
-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm);
 	npgs = get_user_pages(umem->address, umem->npgs,
 			      gup_flags | FOLL_LONGTERM, &umem->pgs[0], NULL);
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm);
 
 	if (npgs != umem->npgs) {
 		if (npgs >= 0) {
diff --git virt/kvm/arm/mmu.c virt/kvm/arm/mmu.c
index 0b32a904a1bb..1b3923a6f199 100644
--- virt/kvm/arm/mmu.c
+++ virt/kvm/arm/mmu.c
@@ -975,7 +975,7 @@ void stage2_unmap_vm(struct kvm *kvm)
 	int idx;
 
 	idx = srcu_read_lock(&kvm->srcu);
-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm);
 	spin_lock(&kvm->mmu_lock);
 
 	slots = kvm_memslots(kvm);
@@ -983,7 +983,7 @@ void stage2_unmap_vm(struct kvm *kvm)
 		stage2_unmap_memslot(kvm, memslot);
 
 	spin_unlock(&kvm->mmu_lock);
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm);
 	srcu_read_unlock(&kvm->srcu, idx);
 }
 
@@ -1693,11 +1693,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	}
 
 	/* Let's check if we will get back a huge page backed by hugetlbfs */
-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm);
 	vma = find_vma_intersection(current->mm, hva, hva + 1);
 	if (unlikely(!vma)) {
 		kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
-		up_read(&current->mm->mmap_sem);
+		mm_read_unlock(current->mm);
 		return -EFAULT;
 	}
 
@@ -1719,7 +1719,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (vma_pagesize == PMD_SIZE ||
 	    (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm)))
 		gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm);
 
 	/* We need minimum second+third level pages */
 	ret = mmu_topup_memory_cache(memcache, kvm_mmu_cache_min_pages(kvm),
@@ -2294,7 +2294,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 	    (kvm_phys_size(kvm) >> PAGE_SHIFT))
 		return -EFAULT;
 
-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm);
 	/*
 	 * A memory region could potentially cover multiple VMAs, and any holes
 	 * between them, so iterate over all of them to find out if we can map
@@ -2353,7 +2353,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 		stage2_flush_memslot(kvm, memslot);
 	spin_unlock(&kvm->mmu_lock);
 out:
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm);
 	return ret;
 }
 
diff --git virt/kvm/async_pf.c virt/kvm/async_pf.c
index 35305d6e68cc..ab72a1a5ac0b 100644
--- virt/kvm/async_pf.c
+++ virt/kvm/async_pf.c
@@ -74,11 +74,11 @@ static void async_pf_execute(struct work_struct *work)
 	 * mm and might be done in another context, so we must
 	 * access remotely.
 	 */
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	get_user_pages_remote(NULL, mm, addr, 1, FOLL_WRITE, NULL, NULL,
 			&locked);
 	if (locked)
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm);
 
 	kvm_async_page_present_sync(vcpu, apf);
 
diff --git virt/kvm/kvm_main.c virt/kvm/kvm_main.c
index 887051ded021..b1bb96c72efb 100644
--- virt/kvm/kvm_main.c
+++ virt/kvm/kvm_main.c
@@ -1417,7 +1417,7 @@ unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn)
 	if (kvm_is_error_hva(addr))
 		return PAGE_SIZE;
 
-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm);
 	vma = find_vma(current->mm, addr);
 	if (!vma)
 		goto out;
@@ -1425,7 +1425,7 @@ unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn)
 	size = vma_kernel_pagesize(vma);
 
 out:
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm);
 
 	return size;
 }
@@ -1680,7 +1680,7 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
 	if (npages == 1)
 		return pfn;
 
-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm);
 	if (npages == -EHWPOISON ||
 	      (!async && check_user_page_hwpoison(addr))) {
 		pfn = KVM_PFN_ERR_HWPOISON;
@@ -1704,7 +1704,7 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
 		pfn = KVM_PFN_ERR_FAULT;
 	}
 exit:
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm);
 	return pfn;
 }
 
-- 
2.25.0.341.g760bfbb309-goog



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 03/24] MM locking API: manual conversion of mmap_sem call sites missed by coccinelle
  2020-02-24 20:30 [RFC PATCH 00/24] Fine grained MM locking Michel Lespinasse
  2020-02-24 20:30 ` [RFC PATCH 01/24] MM locking API: initial implementation as rwsem wrappers Michel Lespinasse
  2020-02-24 20:30 ` [RFC PATCH 02/24] MM locking API: use coccinelle to convert mmap_sem rwsem call sites Michel Lespinasse
@ 2020-02-24 20:30 ` Michel Lespinasse
  2020-02-24 20:30 ` [RFC PATCH 04/24] MM locking API: add range arguments Michel Lespinasse
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Michel Lespinasse @ 2020-02-24 20:30 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Morton, Laurent Dufour, Vlastimil Babka,
	Matthew Wilcox, Liam R . Howlett, Jerome Glisse, Davidlohr Bueso,
	David Rientjes
  Cc: linux-mm, Michel Lespinasse

Convert the last few remaining mmap_sem rwsem calls to use the new
MM locking API. These were missed by coccinelle, presumably because
it does not support some of the preprocessor constructs used in
these files.
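
For illustration only (not part of this patch), the manual conversions
follow the same one-to-one mapping as the coccinelle-generated ones;
example_walk_vmas() below is a hypothetical caller showing the pattern:

	static int example_walk_vmas(struct mm_struct *mm)
	{
		struct vm_area_struct *vma;

		/* was: if (down_read_killable(&mm->mmap_sem)) */
		if (mm_read_lock_killable(mm))
			return -EINTR;
		for (vma = mm->mmap; vma; vma = vma->vm_next)
			; /* inspect vma while holding the mm read lock */
		/* was: up_read(&mm->mmap_sem); */
		mm_read_unlock(mm);
		return 0;
	}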

Signed-off-by: Michel Lespinasse <walken@google.com>
---
 arch/mips/mm/fault.c           | 10 +++++-----
 arch/x86/kvm/mmu/paging_tmpl.h |  8 ++++----
 drivers/android/binder_alloc.c |  4 ++--
 fs/proc/base.c                 |  6 +++---
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git arch/mips/mm/fault.c arch/mips/mm/fault.c
index 1e8d00793784..58cfc3f5f659 100644
--- arch/mips/mm/fault.c
+++ arch/mips/mm/fault.c
@@ -97,7 +97,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write,
 	if (user_mode(regs))
 		flags |= FAULT_FLAG_USER;
 retry:
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm);
 	vma = find_vma(mm, address);
 	if (!vma)
 		goto bad_area;
@@ -191,7 +191,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write,
 		}
 	}
 
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	return;
 
 /*
@@ -199,7 +199,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write,
  * Fix it, but check if it's kernel or user first..
  */
 bad_area:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 bad_area_nosemaphore:
 	/* User mode accesses just cause a SIGSEGV */
@@ -251,14 +251,14 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write,
 	 * We ran out of memory, call the OOM killer, and return the userspace
 	 * (which will retry the fault, or kill us if we got oom-killed).
 	 */
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	if (!user_mode(regs))
 		goto no_context;
 	pagefault_out_of_memory();
 	return;
 
 do_sigbus:
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 
 	/* Kernel mode? Handle exceptions or die */
 	if (!user_mode(regs))
diff --git arch/x86/kvm/mmu/paging_tmpl.h arch/x86/kvm/mmu/paging_tmpl.h
index 97b21e7fd013..01b633e800b9 100644
--- arch/x86/kvm/mmu/paging_tmpl.h
+++ arch/x86/kvm/mmu/paging_tmpl.h
@@ -150,22 +150,22 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 		unsigned long pfn;
 		unsigned long paddr;
 
-		down_read(&current->mm->mmap_sem);
+		mm_read_lock(current->mm);
 		vma = find_vma_intersection(current->mm, vaddr, vaddr + PAGE_SIZE);
 		if (!vma || !(vma->vm_flags & VM_PFNMAP)) {
-			up_read(&current->mm->mmap_sem);
+			mm_read_unlock(current->mm);
 			return -EFAULT;
 		}
 		pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
 		paddr = pfn << PAGE_SHIFT;
 		table = memremap(paddr, PAGE_SIZE, MEMREMAP_WB);
 		if (!table) {
-			up_read(&current->mm->mmap_sem);
+			mm_read_unlock(current->mm);
 			return -EFAULT;
 		}
 		ret = CMPXCHG(&table[index], orig_pte, new_pte);
 		memunmap(table);
-		up_read(&current->mm->mmap_sem);
+		mm_read_unlock(current->mm);
 	}
 
 	return (ret != orig_pte);
diff --git drivers/android/binder_alloc.c drivers/android/binder_alloc.c
index caddf155fcab..f607fa2d00c3 100644
--- drivers/android/binder_alloc.c
+++ drivers/android/binder_alloc.c
@@ -932,7 +932,7 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
 	mm = alloc->vma_vm_mm;
 	if (!mmget_not_zero(mm))
 		goto err_mmget;
-	if (!down_read_trylock(&mm->mmap_sem))
+	if (!mm_read_trylock(mm))
 		goto err_down_read_mmap_sem_failed;
 	vma = binder_alloc_get_vma(alloc);
 
@@ -946,7 +946,7 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
 
 		trace_binder_unmap_user_end(alloc, index);
 	}
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	mmput(mm);
 
 	trace_binder_unmap_kernel_start(alloc, index);
diff --git fs/proc/base.c fs/proc/base.c
index 31c56a08af0f..33ab92802834 100644
--- fs/proc/base.c
+++ fs/proc/base.c
@@ -2189,7 +2189,7 @@ proc_map_files_readdir(struct file *file, struct dir_context *ctx)
 	if (!mm)
 		goto out_put_task;
 
-	ret = down_read_killable(&mm->mmap_sem);
+	ret = mm_read_lock_killable(mm);
 	if (ret) {
 		mmput(mm);
 		goto out_put_task;
@@ -2216,7 +2216,7 @@ proc_map_files_readdir(struct file *file, struct dir_context *ctx)
 		p = genradix_ptr_alloc(&fa, nr_files++, GFP_KERNEL);
 		if (!p) {
 			ret = -ENOMEM;
-			up_read(&mm->mmap_sem);
+			mm_read_unlock(mm);
 			mmput(mm);
 			goto out_put_task;
 		}
@@ -2225,7 +2225,7 @@ proc_map_files_readdir(struct file *file, struct dir_context *ctx)
 		p->end = vma->vm_end;
 		p->mode = vma->vm_file->f_mode;
 	}
-	up_read(&mm->mmap_sem);
+	mm_read_unlock(mm);
 	mmput(mm);
 
 	for (i = 0; i < nr_files; i++) {
-- 
2.25.0.341.g760bfbb309-goog



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 04/24] MM locking API: add range arguments
  2020-02-24 20:30 [RFC PATCH 00/24] Fine grained MM locking Michel Lespinasse
                   ` (2 preceding siblings ...)
  2020-02-24 20:30 ` [RFC PATCH 03/24] MM locking API: manual conversion of mmap_sem call sites missed by coccinelle Michel Lespinasse
@ 2020-02-24 20:30 ` Michel Lespinasse
  2020-02-24 20:30 ` [RFC PATCH 05/24] MM locking API: allow for sleeping during unlock Michel Lespinasse
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Michel Lespinasse @ 2020-02-24 20:30 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Morton, Laurent Dufour, Vlastimil Babka,
	Matthew Wilcox, Liam R . Howlett, Jerome Glisse, Davidlohr Bueso,
	David Rientjes
  Cc: linux-mm, Michel Lespinasse

This change extends the MM locking API to pass ranges.
The ranges will be used to implement range locking, but for now
we only check that the passed ranges match between lock and unlock calls.

Add a new CONFIG_MM_LOCK_RWSEM_CHECKED config option to verify that
ranges are correctly paired across lock/unlock function calls.

To ease the transition, the existing coarse MM locking calls use a
default range, represented by a per-task structure. This allows a
task's paired coarse lock/unlock calls to be translated into correctly
paired struct mm_lock_range lock and unlock operations.

Small additional changes are needed in kernel/fork.c (dup_mmap has a
single task locking two MMs at once, so it must explicitly manage the
corresponding struct mm_lock_range) and in kernel/bpf/stackmap.c
(dumping user stacks from interrupt context requires explicit tracking
of the struct mm_lock_range).

Signed-off-by: Michel Lespinasse <walken@google.com>
---
 arch/um/include/asm/mmu_context.h |   4 +-
 include/linux/mm_lock.h           | 148 ++++++++++++++++++++++++++++--
 include/linux/mm_types_task.h     |   6 ++
 include/linux/sched.h             |   2 +
 init/init_task.c                  |   1 +
 kernel/bpf/stackmap.c             |  26 ++++--
 kernel/fork.c                     |   7 +-
 mm/Kconfig                        |  18 ++++
 mm/Makefile                       |   1 +
 mm/mm_lock_rwsem_checked.c        | 131 ++++++++++++++++++++++++++
 10 files changed, 322 insertions(+), 22 deletions(-)
 create mode 100644 mm/mm_lock_rwsem_checked.c

diff --git arch/um/include/asm/mmu_context.h arch/um/include/asm/mmu_context.h
index 7bd591231e2d..2e84e7d98141 100644
--- arch/um/include/asm/mmu_context.h
+++ arch/um/include/asm/mmu_context.h
@@ -47,12 +47,14 @@ extern void force_flush_all(void);
 
 static inline void activate_mm(struct mm_struct *old, struct mm_struct *new)
 {
+	struct mm_lock_range mm_range = MM_COARSE_LOCK_RANGE_INITIALIZER;
+
 	/*
 	 * This is called by fs/exec.c and sys_unshare()
 	 * when the new ->mm is used for the first time.
 	 */
 	__switch_mm(&new->context.id);
-	down_write_nested(&new->mmap_sem, 1);
+	mm_write_range_lock_nested(new, &mm_range, 1);
 	uml_setup_stubs(new);
 	mm_write_unlock(new);
 }
diff --git include/linux/mm_lock.h include/linux/mm_lock.h
index b5f134285e53..8ed92ebe58a1 100644
--- include/linux/mm_lock.h
+++ include/linux/mm_lock.h
@@ -1,56 +1,186 @@
 #ifndef _LINUX_MM_LOCK_H
 #define _LINUX_MM_LOCK_H
 
+#include <linux/sched.h>
+
 static inline void mm_init_lock(struct mm_struct *mm)
 {
 	init_rwsem(&mm->mmap_sem);
 }
 
-static inline void mm_write_lock(struct mm_struct *mm)
+#ifdef CONFIG_MM_LOCK_RWSEM_INLINE
+
+#define MM_COARSE_LOCK_RANGE_INITIALIZER {}
+
+static inline void mm_init_coarse_lock_range(struct mm_lock_range *range) {}
+
+static inline void mm_write_range_lock(struct mm_struct *mm,
+				       struct mm_lock_range *range)
 {
 	down_write(&mm->mmap_sem);
 }
 
-static inline int mm_write_lock_killable(struct mm_struct *mm)
+static inline void mm_write_range_lock_nested(struct mm_struct *mm,
+					      struct mm_lock_range *range,
+					      int subclass)
+{
+	down_write_nested(&mm->mmap_sem, subclass);
+}
+
+static inline int mm_write_range_lock_killable(struct mm_struct *mm,
+					       struct mm_lock_range *range)
 {
 	return down_write_killable(&mm->mmap_sem);
 }
 
-static inline bool mm_write_trylock(struct mm_struct *mm)
+static inline bool mm_write_range_trylock(struct mm_struct *mm,
+					  struct mm_lock_range *range)
 {
 	return down_write_trylock(&mm->mmap_sem) != 0;
 }
 
-static inline void mm_write_unlock(struct mm_struct *mm)
+static inline void mm_write_range_unlock(struct mm_struct *mm,
+					 struct mm_lock_range *range)
 {
 	up_write(&mm->mmap_sem);
 }
 
-static inline void mm_downgrade_write_lock(struct mm_struct *mm)
+static inline void mm_downgrade_write_range_lock(struct mm_struct *mm,
+						 struct mm_lock_range *range)
 {
 	downgrade_write(&mm->mmap_sem);
 }
 
-static inline void mm_read_lock(struct mm_struct *mm)
+static inline void mm_read_range_lock(struct mm_struct *mm,
+				      struct mm_lock_range *range)
 {
 	down_read(&mm->mmap_sem);
 }
 
-static inline int mm_read_lock_killable(struct mm_struct *mm)
+static inline int mm_read_range_lock_killable(struct mm_struct *mm,
+					      struct mm_lock_range *range)
 {
 	return down_read_killable(&mm->mmap_sem);
 }
 
-static inline bool mm_read_trylock(struct mm_struct *mm)
+static inline bool mm_read_range_trylock(struct mm_struct *mm,
+					 struct mm_lock_range *range)
 {
 	return down_read_trylock(&mm->mmap_sem) != 0;
 }
 
-static inline void mm_read_unlock(struct mm_struct *mm)
+static inline void mm_read_range_unlock(struct mm_struct *mm,
+					struct mm_lock_range *range)
 {
 	up_read(&mm->mmap_sem);
 }
 
+static inline void mm_read_range_unlock_non_owner(struct mm_struct *mm,
+						  struct mm_lock_range *range)
+{
+	up_read_non_owner(&mm->mmap_sem);
+}
+
+static inline struct mm_lock_range *mm_coarse_lock_range(void)
+{
+	return NULL;
+}
+
+#else /* CONFIG_MM_LOCK_RWSEM_CHECKED */
+
+#define MM_COARSE_LOCK_RANGE_INITIALIZER { .mm = NULL }
+
+static inline void mm_init_coarse_lock_range(struct mm_lock_range *range)
+{
+	range->mm = NULL;
+}
+
+extern void mm_write_range_lock(struct mm_struct *mm,
+				struct mm_lock_range *range);
+#ifdef CONFIG_LOCKDEP
+extern void mm_write_range_lock_nested(struct mm_struct *mm,
+				       struct mm_lock_range *range,
+				       int subclass);
+#else
+#define mm_write_range_lock_nested(mm, range, subclass) \
+	mm_write_range_lock(mm, range)
+#endif
+extern int mm_write_range_lock_killable(struct mm_struct *mm,
+					struct mm_lock_range *range);
+extern bool mm_write_range_trylock(struct mm_struct *mm,
+				   struct mm_lock_range *range);
+extern void mm_write_range_unlock(struct mm_struct *mm,
+				  struct mm_lock_range *range);
+extern void mm_downgrade_write_range_lock(struct mm_struct *mm,
+					  struct mm_lock_range *range);
+extern void mm_read_range_lock(struct mm_struct *mm,
+			       struct mm_lock_range *range);
+extern int mm_read_range_lock_killable(struct mm_struct *mm,
+				       struct mm_lock_range *range);
+extern bool mm_read_range_trylock(struct mm_struct *mm,
+				  struct mm_lock_range *range);
+extern void mm_read_range_unlock(struct mm_struct *mm,
+				 struct mm_lock_range *range);
+extern void mm_read_range_unlock_non_owner(struct mm_struct *mm,
+					   struct mm_lock_range *range);
+
+static inline struct mm_lock_range *mm_coarse_lock_range(void)
+{
+	return &current->mm_coarse_lock_range;
+}
+
+#endif
+
+static inline void mm_read_release(struct mm_struct *mm, unsigned long ip)
+{
+	rwsem_release(&mm->mmap_sem.dep_map, ip);
+}
+
+static inline void mm_write_lock(struct mm_struct *mm)
+{
+	mm_write_range_lock(mm, mm_coarse_lock_range());
+}
+
+static inline int mm_write_lock_killable(struct mm_struct *mm)
+{
+	return mm_write_range_lock_killable(mm, mm_coarse_lock_range());
+}
+
+static inline bool mm_write_trylock(struct mm_struct *mm)
+{
+	return mm_write_range_trylock(mm, mm_coarse_lock_range());
+}
+
+static inline void mm_write_unlock(struct mm_struct *mm)
+{
+	mm_write_range_unlock(mm, mm_coarse_lock_range());
+}
+
+static inline void mm_downgrade_write_lock(struct mm_struct *mm)
+{
+	mm_downgrade_write_range_lock(mm, mm_coarse_lock_range());
+}
+
+static inline void mm_read_lock(struct mm_struct *mm)
+{
+	mm_read_range_lock(mm, mm_coarse_lock_range());
+}
+
+static inline int mm_read_lock_killable(struct mm_struct *mm)
+{
+	return mm_read_range_lock_killable(mm, mm_coarse_lock_range());
+}
+
+static inline bool mm_read_trylock(struct mm_struct *mm)
+{
+	return mm_read_range_trylock(mm, mm_coarse_lock_range());
+}
+
+static inline void mm_read_unlock(struct mm_struct *mm)
+{
+	mm_read_range_unlock(mm, mm_coarse_lock_range());
+}
+
 static inline bool mm_is_locked(struct mm_struct *mm)
 {
 	return rwsem_is_locked(&mm->mmap_sem) != 0;
diff --git include/linux/mm_types_task.h include/linux/mm_types_task.h
index c1bc6731125c..d98c2a2293c1 100644
--- include/linux/mm_types_task.h
+++ include/linux/mm_types_task.h
@@ -96,4 +96,10 @@ struct tlbflush_unmap_batch {
 #endif
 };
 
+struct mm_lock_range {
+#ifdef CONFIG_MM_LOCK_RWSEM_CHECKED
+	struct mm_struct *mm;
+#endif
+};
+
 #endif /* _LINUX_MM_TYPES_TASK_H */
diff --git include/linux/sched.h include/linux/sched.h
index 716ad1d8d95e..c573590076e1 100644
--- include/linux/sched.h
+++ include/linux/sched.h
@@ -1281,6 +1281,8 @@ struct task_struct {
 	unsigned long			prev_lowest_stack;
 #endif
 
+	struct mm_lock_range		mm_coarse_lock_range;
+
 	/*
 	 * New fields for task_struct should be added above here, so that
 	 * they are included in the randomized portion of task_struct.
diff --git init/init_task.c init/init_task.c
index 9e5cbe5eab7b..ae54f69092a2 100644
--- init/init_task.c
+++ init/init_task.c
@@ -181,6 +181,7 @@ struct task_struct init_task
 #ifdef CONFIG_SECURITY
 	.security	= NULL,
 #endif
+	.mm_coarse_lock_range	= MM_COARSE_LOCK_RANGE_INITIALIZER,
 };
 EXPORT_SYMBOL(init_task);
 
diff --git kernel/bpf/stackmap.c kernel/bpf/stackmap.c
index 8087d31b6471..ba2399ce00e4 100644
--- kernel/bpf/stackmap.c
+++ kernel/bpf/stackmap.c
@@ -33,7 +33,8 @@ struct bpf_stack_map {
 /* irq_work to run up_read() for build_id lookup in nmi context */
 struct stack_map_irq_work {
 	struct irq_work irq_work;
-	struct rw_semaphore *sem;
+	struct mm_struct *mm;
+	struct mm_lock_range mm_range;
 };
 
 static void do_up_read(struct irq_work *entry)
@@ -41,8 +42,7 @@ static void do_up_read(struct irq_work *entry)
 	struct stack_map_irq_work *work;
 
 	work = container_of(entry, struct stack_map_irq_work, irq_work);
-	up_read_non_owner(work->sem);
-	work->sem = NULL;
+	mm_read_range_unlock_non_owner(work->mm, &work->mm_range);
 }
 
 static DEFINE_PER_CPU(struct stack_map_irq_work, up_read_work);
@@ -286,12 +286,17 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
 	struct vm_area_struct *vma;
 	bool irq_work_busy = false;
 	struct stack_map_irq_work *work = NULL;
+	struct mm_lock_range mm_range = MM_COARSE_LOCK_RANGE_INITIALIZER;
+	struct mm_lock_range *mm_range_ptr = &mm_range;
 
 	if (irqs_disabled()) {
 		work = this_cpu_ptr(&up_read_work);
-		if (atomic_read(&work->irq_work.flags) & IRQ_WORK_BUSY)
+		if (atomic_read(&work->irq_work.flags) & IRQ_WORK_BUSY) {
 			/* cannot queue more up_read, fallback */
 			irq_work_busy = true;
+		} else {
+			mm_range_ptr = &work->mm_range;
+		}
 	}
 
 	/*
@@ -305,7 +310,7 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
 	 * with build_id.
 	 */
 	if (!user || !current || !current->mm || irq_work_busy ||
-	    mm_read_trylock(current->mm) == 0) {
+	    !mm_read_range_trylock(current->mm, mm_range_ptr)) {
 		/* cannot access current->mm, fall back to ips */
 		for (i = 0; i < trace_nr; i++) {
 			id_offs[i].status = BPF_STACK_BUILD_ID_IP;
@@ -330,16 +335,16 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
 	}
 
 	if (!work) {
-		mm_read_unlock(current->mm);
+		mm_read_range_unlock(current->mm, mm_range_ptr);
 	} else {
-		work->sem = &current->mm->mmap_sem;
+		work->mm = current->mm;
 		irq_work_queue(&work->irq_work);
 		/*
 		 * The irq_work will release the mmap_sem with
-		 * up_read_non_owner(). The rwsem_release() is called
-		 * here to release the lock from lockdep's perspective.
+		 * mm_read_range_unlock_non_owner(). mm_read_release() is
+		 * called here to release the lock from lockdep's perspective.
 		 */
-		rwsem_release(&current->mm->mmap_sem.dep_map, _RET_IP_);
+		mm_read_release(current->mm, _RET_IP_);
 	}
 }
 
@@ -626,6 +631,7 @@ static int __init stack_map_init(void)
 	for_each_possible_cpu(cpu) {
 		work = per_cpu_ptr(&up_read_work, cpu);
 		init_irq_work(&work->irq_work, do_up_read);
+		mm_init_coarse_lock_range(&work->mm_range);
 	}
 	return 0;
 }
diff --git kernel/fork.c kernel/fork.c
index d598f56e4b1e..3db694381ef5 100644
--- kernel/fork.c
+++ kernel/fork.c
@@ -486,6 +486,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	int retval;
 	unsigned long charge;
 	LIST_HEAD(uf);
+	struct mm_lock_range mm_range = MM_COARSE_LOCK_RANGE_INITIALIZER;
 
 	uprobe_start_dup_mmap();
 	if (mm_write_lock_killable(oldmm)) {
@@ -497,7 +498,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	/*
 	 * Not linked in yet - no deadlock potential:
 	 */
-	down_write_nested(&mm->mmap_sem, SINGLE_DEPTH_NESTING);
+	mm_write_range_lock_nested(mm, &mm_range, SINGLE_DEPTH_NESTING);
 
 	/* No ordering required: file already has been exposed. */
 	RCU_INIT_POINTER(mm->exe_file, get_mm_exe_file(oldmm));
@@ -612,7 +613,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	/* a new mm has just been created */
 	retval = arch_dup_mmap(oldmm, mm);
 out:
-	mm_write_unlock(mm);
+	mm_write_range_unlock(mm, &mm_range);
 	flush_tlb_mm(oldmm);
 	mm_write_unlock(oldmm);
 	dup_userfaultfd_complete(&uf);
@@ -947,6 +948,8 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 #ifdef CONFIG_MEMCG
 	tsk->active_memcg = NULL;
 #endif
+
+	mm_init_coarse_lock_range(&tsk->mm_coarse_lock_range);
 	return tsk;
 
 free_stack:
diff --git mm/Kconfig mm/Kconfig
index ab80933be65f..574fb51789a5 100644
--- mm/Kconfig
+++ mm/Kconfig
@@ -739,4 +739,22 @@ config ARCH_HAS_HUGEPD
 config MAPPING_DIRTY_HELPERS
         bool
 
+choice
+	prompt "MM lock implementation (mmap_sem)"
+	default MM_LOCK_RWSEM_CHECKED
+
+config MM_LOCK_RWSEM_INLINE
+	bool "rwsem, inline"
+	help
+	  This option preserves the traditional MM lock implementation as
+	  inline read-write semaphore operations.
+
+config MM_LOCK_RWSEM_CHECKED
+	bool "rwsem, checked"
+	help
+	  This option implements the MM lock using a read-write semaphore,
+	  ignoring the passed address range but checking its validity.
+
+endchoice
+
 endmenu
diff --git mm/Makefile mm/Makefile
index 1937cc251883..9f46376c6407 100644
--- mm/Makefile
+++ mm/Makefile
@@ -108,3 +108,4 @@ obj-$(CONFIG_ZONE_DEVICE) += memremap.o
 obj-$(CONFIG_HMM_MIRROR) += hmm.o
 obj-$(CONFIG_MEMFD_CREATE) += memfd.o
 obj-$(CONFIG_MAPPING_DIRTY_HELPERS) += mapping_dirty_helpers.o
+obj-$(CONFIG_MM_LOCK_RWSEM_CHECKED) += mm_lock_rwsem_checked.o
diff --git mm/mm_lock_rwsem_checked.c mm/mm_lock_rwsem_checked.c
new file mode 100644
index 000000000000..3551deb85e3d
--- /dev/null
+++ mm/mm_lock_rwsem_checked.c
@@ -0,0 +1,131 @@
+#include <linux/mm_lock.h>
+#include <linux/printk.h>
+
+static int mm_lock_debug = 1;
+
+static void mm_lock_dump(char *msg) {
+	if (!mm_lock_debug) {
+		return;
+	}
+	mm_lock_debug = 0;
+	pr_err("mm_lock_dump: %s\n", msg);
+	dump_stack();
+	pr_err("mm_lock_dump: done\n");
+}
+
+void mm_write_range_lock(struct mm_struct *mm, struct mm_lock_range *range)
+{
+	if (range->mm != NULL)
+		mm_lock_dump("mm_write_range_lock");
+	down_write(&mm->mmap_sem);
+	range->mm = mm;
+}
+EXPORT_SYMBOL(mm_write_range_lock);
+
+#ifdef CONFIG_LOCKDEP
+void mm_write_range_lock_nested(struct mm_struct *mm,
+				struct mm_lock_range *range, int subclass)
+{
+	if (range->mm != NULL)
+		mm_lock_dump("mm_write_range_lock_nested");
+	down_write_nested(&mm->mmap_sem, subclass);
+	range->mm = mm;
+}
+EXPORT_SYMBOL(mm_write_range_lock_nested);
+#endif
+
+int mm_write_range_lock_killable(struct mm_struct *mm,
+				 struct mm_lock_range *range)
+{
+	int ret;
+	if (range->mm != NULL)
+		mm_lock_dump("mm_write_range_lock_killable");
+	ret = down_write_killable(&mm->mmap_sem);
+	if (!ret)
+		range->mm = mm;
+	return ret;
+}
+EXPORT_SYMBOL(mm_write_range_lock_killable);
+
+bool mm_write_range_trylock(struct mm_struct *mm, struct mm_lock_range *range)
+{
+	bool ret = down_write_trylock(&mm->mmap_sem) != 0;
+	if (ret) {
+		if (range->mm != NULL)
+			mm_lock_dump("mm_write_range_trylock");
+		range->mm = mm;
+	}
+	return ret;
+}
+EXPORT_SYMBOL(mm_write_range_trylock);
+
+void mm_write_range_unlock(struct mm_struct *mm, struct mm_lock_range *range)
+{
+	if (range->mm != mm)
+		mm_lock_dump("mm_write_range_unlock");
+	range->mm = NULL;
+	up_write(&mm->mmap_sem);
+}
+EXPORT_SYMBOL(mm_write_range_unlock);
+
+void mm_downgrade_write_range_lock(struct mm_struct *mm,
+				   struct mm_lock_range *range)
+{
+	if (range->mm != mm)
+		mm_lock_dump("mm_downgrade_write_range_lock");
+	downgrade_write(&mm->mmap_sem);
+}
+EXPORT_SYMBOL(mm_downgrade_write_range_lock);
+
+void mm_read_range_lock(struct mm_struct *mm, struct mm_lock_range *range)
+{
+	if (range->mm != NULL)
+		mm_lock_dump("mm_read_range_lock");
+	down_read(&mm->mmap_sem);
+	range->mm = mm;
+}
+EXPORT_SYMBOL(mm_read_range_lock);
+
+int mm_read_range_lock_killable(struct mm_struct *mm,
+				struct mm_lock_range *range)
+{
+	int ret;
+	if (range->mm != NULL)
+		mm_lock_dump("mm_read_range_lock_killable");
+	ret = down_read_killable(&mm->mmap_sem);
+	if (!ret)
+		range->mm = mm;
+	return ret;
+}
+EXPORT_SYMBOL(mm_read_range_lock_killable);
+
+bool mm_read_range_trylock(struct mm_struct *mm, struct mm_lock_range *range)
+{
+	bool ret;
+	if (range->mm != NULL)
+		mm_lock_dump("mm_read_range_trylock");
+	ret = down_read_trylock(&mm->mmap_sem) != 0;
+	if (ret)
+		range->mm = mm;
+	return ret;
+}
+EXPORT_SYMBOL(mm_read_range_trylock);
+
+void mm_read_range_unlock(struct mm_struct *mm, struct mm_lock_range *range)
+{
+	if (range->mm != mm)
+		mm_lock_dump("mm_read_range_unlock");
+	range->mm = NULL;
+	up_read(&mm->mmap_sem);
+}
+EXPORT_SYMBOL(mm_read_range_unlock);
+
+void mm_read_range_unlock_non_owner(struct mm_struct *mm,
+				    struct mm_lock_range *range)
+{
+	if (range->mm != mm)
+		mm_lock_dump("mm_read_range_unlock_non_owner");
+	range->mm = NULL;
+	up_read_non_owner(&mm->mmap_sem);
+}
+EXPORT_SYMBOL(mm_read_range_unlock_non_owner);
-- 
2.25.0.341.g760bfbb309-goog



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 05/24] MM locking API: allow for sleeping during unlock
  2020-02-24 20:30 [RFC PATCH 00/24] Fine grained MM locking Michel Lespinasse
                   ` (3 preceding siblings ...)
  2020-02-24 20:30 ` [RFC PATCH 04/24] MM locking API: add range arguments Michel Lespinasse
@ 2020-02-24 20:30 ` Michel Lespinasse
  2020-02-24 20:30 ` [RFC PATCH 06/24] MM locking API: implement fine grained range locks Michel Lespinasse
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Michel Lespinasse @ 2020-02-24 20:30 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Morton, Laurent Dufour, Vlastimil Babka,
	Matthew Wilcox, Liam R . Howlett, Jerome Glisse, Davidlohr Bueso,
	David Rientjes
  Cc: linux-mm, Michel Lespinasse

The following changes will implement fine-grained range locks for the
MM locking API, using a data structure to represent the existing
locks and a mutex to protect that data structure.

As a result, we need to prepare for the possibility that unlocking
a memory range may need to sleep when acquiring that mutex.
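
One immediate consequence, visible in the kernel/exit.c hunk below, is
that callers can no longer hold a spinlock across the unlock. The
resulting ordering in exit_mm() is (sketch of the code after this patch):

	task_lock(current);		/* spinlock - atomic context */
	current->mm = NULL;
	enter_lazy_tlb(mm, current);
	task_unlock(current);
	mm_read_unlock(mm);		/* may now sleep, so after task_unlock() */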

Signed-off-by: Michel Lespinasse <walken@google.com>
---
 kernel/bpf/stackmap.c      | 6 +++++-
 kernel/exit.c              | 2 +-
 mm/mm_lock_rwsem_checked.c | 3 +++
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git kernel/bpf/stackmap.c kernel/bpf/stackmap.c
index ba2399ce00e4..0f483abeb94c 100644
--- kernel/bpf/stackmap.c
+++ kernel/bpf/stackmap.c
@@ -308,8 +308,12 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
 	 *
 	 * Same fallback is used for kernel stack (!user) on a stackmap
 	 * with build_id.
+	 *
+	 * FIXME - currently disabling the build_id lookup feature
+	 * as mm_read_range_unlock() may block, which is not always
+	 * possible to do here.
 	 */
-	if (!user || !current || !current->mm || irq_work_busy ||
+	if (true || !user || !current || !current->mm || irq_work_busy ||
 	    !mm_read_range_trylock(current->mm, mm_range_ptr)) {
 		/* cannot access current->mm, fall back to ips */
 		for (i = 0; i < trace_nr; i++) {
diff --git kernel/exit.c kernel/exit.c
index 9a0b72562adb..60ec6efb4e2c 100644
--- kernel/exit.c
+++ kernel/exit.c
@@ -478,9 +478,9 @@ static void exit_mm(void)
 	/* more a memory barrier than a real lock */
 	task_lock(current);
 	current->mm = NULL;
-	mm_read_unlock(mm);
 	enter_lazy_tlb(mm, current);
 	task_unlock(current);
+	mm_read_unlock(mm);
 	mm_update_next_owner(mm);
 	mmput(mm);
 	if (test_thread_flag(TIF_MEMDIE))
diff --git mm/mm_lock_rwsem_checked.c mm/mm_lock_rwsem_checked.c
index 3551deb85e3d..e45d1a598c87 100644
--- mm/mm_lock_rwsem_checked.c
+++ mm/mm_lock_rwsem_checked.c
@@ -61,6 +61,7 @@ EXPORT_SYMBOL(mm_write_range_trylock);
 
 void mm_write_range_unlock(struct mm_struct *mm, struct mm_lock_range *range)
 {
+	might_sleep();
 	if (range->mm != mm)
 		mm_lock_dump("mm_write_range_unlock");
 	range->mm = NULL;
@@ -113,6 +114,7 @@ EXPORT_SYMBOL(mm_read_range_trylock);
 
 void mm_read_range_unlock(struct mm_struct *mm, struct mm_lock_range *range)
 {
+	might_sleep();
 	if (range->mm != mm)
 		mm_lock_dump("mm_read_range_unlock");
 	range->mm = NULL;
@@ -123,6 +125,7 @@ EXPORT_SYMBOL(mm_read_range_unlock);
 void mm_read_range_unlock_non_owner(struct mm_struct *mm,
 				    struct mm_lock_range *range)
 {
+	might_sleep();
 	if (range->mm != mm)
 		mm_lock_dump("mm_read_range_unlock_non_owner");
 	range->mm = NULL;
-- 
2.25.0.341.g760bfbb309-goog



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 06/24] MM locking API: implement fine grained range locks
  2020-02-24 20:30 [RFC PATCH 00/24] Fine grained MM locking Michel Lespinasse
                   ` (4 preceding siblings ...)
  2020-02-24 20:30 ` [RFC PATCH 05/24] MM locking API: allow for sleeping during unlock Michel Lespinasse
@ 2020-02-24 20:30 ` Michel Lespinasse
  2020-02-24 20:30 ` [RFC PATCH 07/24] mm/memory: add range field to struct vm_fault Michel Lespinasse
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Michel Lespinasse @ 2020-02-24 20:30 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Morton, Laurent Dufour, Vlastimil Babka,
	Matthew Wilcox, Liam R . Howlett, Jerome Glisse, Davidlohr Bueso,
	David Rientjes
  Cc: linux-mm, Michel Lespinasse

This change implements fine grained reader-writer range locks.

Existing locked ranges are represented as an augmented rbtree
protected by a mutex. Each locked range carries augmented information
for two overlapping interval trees, representing the reader and writer
locks respectively. This data structure allows quick searches for
existing readers, writers, or both, intersecting a given address range.

When locking a range, a count of all existing conflicting ranges
(either already locked, or still queued) is stored in the mm_lock_range
struct. If the count is non-zero, the locking task is put to sleep
until all conflicting lock ranges have been released.

When unlocking a range, the conflict count of every queued conflicting
range is decremented. When a count reaches zero, the corresponding
locker task is woken up - it now holds a lock on its desired address range.
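
As a concrete example of the counting, using the constants defined in
mm_lock_range.c (MM_LOCK_RANGE_WRITE == 1, MM_LOCK_RANGE_COUNT_ONE == 2):
a writer queued over a range currently held by two readers starts with

	flags_count = MM_LOCK_RANGE_WRITE - 2 * MM_LOCK_RANGE_COUNT_ONE;

which is negative. Each reader unlock adds MM_LOCK_RANGE_COUNT_ONE back;
once flags_count stops being negative the writer is woken up and owns
the range.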

The general approach for this range locking implementation was first
proposed by Jan Kara back in 2013, and later worked on by at least
Laurent Dufour and Davidlohr Bueso. I have extended the approach
by using separate indexes for the reader and writer range locks.
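
From a caller's point of view, the resulting API usage looks like the
following sketch (start/end are whatever range the caller needs):

	struct mm_lock_range range;

	mm_init_lock_range(&range, start, end);	/* covers [start; end) */
	mm_read_range_lock(mm, &range);
	/* work on [start; end) of the address space */
	mm_read_range_unlock(mm, &range);

Writers use mm_write_range_lock() / mm_write_range_unlock() the same
way; only ranges that actually intersect, with at least one of them
being a write lock, block each other.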

Signed-off-by: Michel Lespinasse <walken@google.com>
---
 arch/x86/kernel/tboot.c       |   2 +-
 drivers/firmware/efi/efi.c    |   2 +-
 include/linux/mm_lock.h       |  96 ++++-
 include/linux/mm_types.h      |  20 +
 include/linux/mm_types_task.h |  15 +
 mm/Kconfig                    |   9 +-
 mm/Makefile                   |   1 +
 mm/init-mm.c                  |   3 +-
 mm/mm_lock_range.c            | 691 ++++++++++++++++++++++++++++++++++
 9 files changed, 827 insertions(+), 12 deletions(-)
 create mode 100644 mm/mm_lock_range.c

diff --git arch/x86/kernel/tboot.c arch/x86/kernel/tboot.c
index 4c61f0713832..68bb5e9b0324 100644
--- arch/x86/kernel/tboot.c
+++ arch/x86/kernel/tboot.c
@@ -90,7 +90,7 @@ static struct mm_struct tboot_mm = {
 	.pgd            = swapper_pg_dir,
 	.mm_users       = ATOMIC_INIT(2),
 	.mm_count       = ATOMIC_INIT(1),
-	.mmap_sem       = __RWSEM_INITIALIZER(init_mm.mmap_sem),
+	.mmap_sem       = MM_LOCK_INITIALIZER(init_mm.mmap_sem),
 	.page_table_lock =  __SPIN_LOCK_UNLOCKED(init_mm.page_table_lock),
 	.mmlist         = LIST_HEAD_INIT(init_mm.mmlist),
 };
diff --git drivers/firmware/efi/efi.c drivers/firmware/efi/efi.c
index 2b02cb165f16..fb5c9d53ceb2 100644
--- drivers/firmware/efi/efi.c
+++ drivers/firmware/efi/efi.c
@@ -60,7 +60,7 @@ struct mm_struct efi_mm = {
 	.mm_rb			= RB_ROOT,
 	.mm_users		= ATOMIC_INIT(2),
 	.mm_count		= ATOMIC_INIT(1),
-	.mmap_sem		= __RWSEM_INITIALIZER(efi_mm.mmap_sem),
+	.mmap_sem		= MM_LOCK_INITIALIZER(efi_mm.mmap_sem),
 	.page_table_lock	= __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock),
 	.mmlist			= LIST_HEAD_INIT(efi_mm.mmlist),
 	.cpu_bitmap		= { [BITS_TO_LONGS(NR_CPUS)] = 0},
diff --git include/linux/mm_lock.h include/linux/mm_lock.h
index 8ed92ebe58a1..a4d60bd56899 100644
--- include/linux/mm_lock.h
+++ include/linux/mm_lock.h
@@ -2,17 +2,26 @@
 #define _LINUX_MM_LOCK_H
 
 #include <linux/sched.h>
-
-static inline void mm_init_lock(struct mm_struct *mm)
-{
-	init_rwsem(&mm->mmap_sem);
-}
+#include <linux/lockdep.h>
 
 #ifdef CONFIG_MM_LOCK_RWSEM_INLINE
 
+#define MM_LOCK_INITIALIZER __RWSEM_INITIALIZER
 #define MM_COARSE_LOCK_RANGE_INITIALIZER {}
 
+static inline void mm_init_lock(struct mm_struct *mm)
+{
+	init_rwsem(&mm->mmap_sem);
+}
+
 static inline void mm_init_coarse_lock_range(struct mm_lock_range *range) {}
+static inline void mm_init_lock_range(struct mm_lock_range *range,
+		unsigned long start, unsigned long end) {}
+
+static inline bool mm_range_is_coarse(struct mm_lock_range *range)
+{
+	return true;
+}
 
 static inline void mm_write_range_lock(struct mm_struct *mm,
 				       struct mm_lock_range *range)
@@ -86,15 +95,80 @@ static inline struct mm_lock_range *mm_coarse_lock_range(void)
 	return NULL;
 }
 
-#else /* CONFIG_MM_LOCK_RWSEM_CHECKED */
+#else	/* !CONFIG_MM_LOCK_RWSEM_INLINE */
+
+#ifdef CONFIG_MM_LOCK_RWSEM_CHECKED
 
+#define MM_LOCK_INITIALIZER __RWSEM_INITIALIZER
 #define MM_COARSE_LOCK_RANGE_INITIALIZER { .mm = NULL }
 
+static inline void mm_init_lock(struct mm_struct *mm)
+{
+	init_rwsem(&mm->mmap_sem);
+}
+
 static inline void mm_init_coarse_lock_range(struct mm_lock_range *range)
 {
 	range->mm = NULL;
 }
 
+static inline void mm_init_lock_range(struct mm_lock_range *range,
+		unsigned long start, unsigned long end) {
+	mm_init_coarse_lock_range(range);
+}
+
+static inline bool mm_range_is_coarse(struct mm_lock_range *range)
+{
+	return true;
+}
+
+#else	/* CONFIG_MM_LOCK_RANGE */
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#define __DEP_MAP_MM_LOCK_INITIALIZER(lockname)		\
+	.dep_map = { .name = #lockname },
+#else
+#define __DEP_MAP_MM_LOCK_INITIALIZER(lockname)
+#endif
+
+#define MM_LOCK_INITIALIZER(name) {			\
+	.mutex = __MUTEX_INITIALIZER(name.mutex),	\
+	.rb_root = RB_ROOT,				\
+	__DEP_MAP_MM_LOCK_INITIALIZER(name)		\
+}
+
+#define MM_COARSE_LOCK_RANGE_INITIALIZER {		\
+	.start = 0,					\
+	.end = ~0UL,					\
+}
+
+static inline void mm_init_lock(struct mm_struct *mm)
+{
+	static struct lock_class_key __key;
+
+	mutex_init(&mm->mmap_sem.mutex);
+	mm->mmap_sem.rb_root = RB_ROOT;
+	lockdep_init_map(&mm->mmap_sem.dep_map, "&mm->mmap_sem", &__key, 0);
+}
+
+static inline void mm_init_lock_range(struct mm_lock_range *range,
+		unsigned long start, unsigned long end) {
+	range->start = start;
+	range->end = end;
+}
+
+static inline void mm_init_coarse_lock_range(struct mm_lock_range *range)
+{
+	mm_init_lock_range(range, 0, ~0UL);
+}
+
+static inline bool mm_range_is_coarse(struct mm_lock_range *range)
+{
+	return range->start == 0 && range->end == ~0UL;
+}
+
+#endif	/* CONFIG_MM_LOCK_RANGE */
+
 extern void mm_write_range_lock(struct mm_struct *mm,
 				struct mm_lock_range *range);
 #ifdef CONFIG_LOCKDEP
@@ -129,11 +203,11 @@ static inline struct mm_lock_range *mm_coarse_lock_range(void)
 	return &current->mm_coarse_lock_range;
 }
 
-#endif
+#endif	/* !CONFIG_MM_LOCK_RWSEM_INLINE */
 
 static inline void mm_read_release(struct mm_struct *mm, unsigned long ip)
 {
-	rwsem_release(&mm->mmap_sem.dep_map, ip);
+	lock_release(&mm->mmap_sem.dep_map, ip);
 }
 
 static inline void mm_write_lock(struct mm_struct *mm)
@@ -183,7 +257,13 @@ static inline void mm_read_unlock(struct mm_struct *mm)
 
 static inline bool mm_is_locked(struct mm_struct *mm)
 {
+#ifndef CONFIG_MM_LOCK_RANGE
 	return rwsem_is_locked(&mm->mmap_sem) != 0;
+#elif defined(CONFIG_LOCKDEP)
+	return lockdep_is_held(&mm->mmap_sem);	/* Close enough for asserts */
+#else
+	return true;
+#endif
 }
 
 #endif /* _LINUX_MM_LOCK_H */
diff --git include/linux/mm_types.h include/linux/mm_types.h
index 270aa8fd2800..941610c906b3 100644
--- include/linux/mm_types.h
+++ include/linux/mm_types.h
@@ -283,6 +283,21 @@ struct vm_userfaultfd_ctx {
 struct vm_userfaultfd_ctx {};
 #endif /* CONFIG_USERFAULTFD */
 
+/*
+ * struct mm_lock stores locked address ranges for a given mm,
+ * implementing a fine-grained replacement for the mmap_sem rwsem.
+ */
+#ifdef CONFIG_MM_LOCK_RANGE
+struct mm_lock {
+	struct mutex mutex;
+	struct rb_root rb_root;
+	unsigned long seq;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	struct lockdep_map dep_map;
+#endif
+};
+#endif
+
 /*
  * This struct defines a memory VMM memory area. There is one of these
  * per VM-area/task.  A VM area is any part of the process virtual memory
@@ -426,7 +441,12 @@ struct mm_struct {
 		spinlock_t page_table_lock; /* Protects page tables and some
 					     * counters
 					     */
+
+#ifndef CONFIG_MM_LOCK_RANGE
 		struct rw_semaphore mmap_sem;
+#else
+		struct mm_lock mmap_sem;
+#endif
 
 		struct list_head mmlist; /* List of maybe swapped mm's.	These
 					  * are globally strung together off
diff --git include/linux/mm_types_task.h include/linux/mm_types_task.h
index d98c2a2293c1..e5652fe6a53c 100644
--- include/linux/mm_types_task.h
+++ include/linux/mm_types_task.h
@@ -12,6 +12,7 @@
 #include <linux/threads.h>
 #include <linux/atomic.h>
 #include <linux/cpumask.h>
+#include <linux/rbtree.h>
 
 #include <asm/page.h>
 
@@ -100,6 +101,20 @@ struct mm_lock_range {
 #ifdef CONFIG_MM_LOCK_RWSEM_CHECKED
 	struct mm_struct *mm;
 #endif
+#ifdef CONFIG_MM_LOCK_RANGE
+	/* First cache line - used in insert / remove / iter */
+	struct rb_node rb;
+	long flags_count;
+	unsigned long start;		/* First address of the range. */
+	unsigned long end;		/* First address after the range. */
+	struct {
+		unsigned long read_end;	  /* Largest end in reader nodes. */
+		unsigned long write_end;  /* Largest end in writer nodes. */
+	} __subtree;			/* Subtree augmented information. */
+	/* Second cache line - used in wait and wake. */
+	unsigned long seq;		/* Killable wait sequence number. */
+	struct task_struct *task;	/* Task trying to lock this range. */
+#endif
 };
 
 #endif /* _LINUX_MM_TYPES_TASK_H */
diff --git mm/Kconfig mm/Kconfig
index 574fb51789a5..3273ddb5839f 100644
--- mm/Kconfig
+++ mm/Kconfig
@@ -741,7 +741,7 @@ config MAPPING_DIRTY_HELPERS
 
 choice
 	prompt "MM lock implementation (mmap_sem)"
-	default MM_LOCK_RWSEM_CHECKED
+	default MM_LOCK_RANGE
 
 config MM_LOCK_RWSEM_INLINE
 	bool "rwsem, inline"
@@ -755,6 +755,13 @@ config MM_LOCK_RWSEM_CHECKED
 	  This option implements the MM lock using a read-write semaphore,
 	  ignoring the passed address range but checking its validity.
 
+config MM_LOCK_RANGE
+	bool "range lock"
+	help
+	  This option implements the MM lock as a read-write range lock,
+	  thus avoiding false conflicts between operations that operate
+	  on non-overlapping address ranges.
+
 endchoice
 
 endmenu
diff --git mm/Makefile mm/Makefile
index 9f46376c6407..71197fc20eda 100644
--- mm/Makefile
+++ mm/Makefile
@@ -109,3 +109,4 @@ obj-$(CONFIG_HMM_MIRROR) += hmm.o
 obj-$(CONFIG_MEMFD_CREATE) += memfd.o
 obj-$(CONFIG_MAPPING_DIRTY_HELPERS) += mapping_dirty_helpers.o
 obj-$(CONFIG_MM_LOCK_RWSEM_CHECKED) += mm_lock_rwsem_checked.o
+obj-$(CONFIG_MM_LOCK_RANGE) += mm_lock_range.o
diff --git mm/init-mm.c mm/init-mm.c
index 19603302a77f..0ba8ba5c07f4 100644
--- mm/init-mm.c
+++ mm/init-mm.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <linux/mm_types.h>
+#include <linux/mm_lock.h>
 #include <linux/rbtree.h>
 #include <linux/rwsem.h>
 #include <linux/spinlock.h>
@@ -31,7 +32,7 @@ struct mm_struct init_mm = {
 	.pgd		= swapper_pg_dir,
 	.mm_users	= ATOMIC_INIT(2),
 	.mm_count	= ATOMIC_INIT(1),
-	.mmap_sem	= __RWSEM_INITIALIZER(init_mm.mmap_sem),
+	.mmap_sem	= MM_LOCK_INITIALIZER(init_mm.mmap_sem),
 	.page_table_lock =  __SPIN_LOCK_UNLOCKED(init_mm.page_table_lock),
 	.arg_lock	=  __SPIN_LOCK_UNLOCKED(init_mm.arg_lock),
 	.mmlist		= LIST_HEAD_INIT(init_mm.mmlist),
diff --git mm/mm_lock_range.c mm/mm_lock_range.c
new file mode 100644
index 000000000000..da3c70e0809a
--- /dev/null
+++ mm/mm_lock_range.c
@@ -0,0 +1,691 @@
+#include <linux/mm_lock.h>
+#include <linux/rbtree_augmented.h>
+#include <linux/mutex.h>
+#include <linux/lockdep.h>
+#include <linux/sched.h>
+#include <linux/sched/signal.h>
+#include <linux/sched/wake_q.h>
+
+/* range->flags_count definitions */
+#define MM_LOCK_RANGE_WRITE 1
+#define MM_LOCK_RANGE_COUNT_ONE 2
+
+static inline bool rbcompute(struct mm_lock_range *range, bool exit)
+{
+	struct mm_lock_range *child;
+	unsigned long subtree_read_end = range->end, subtree_write_end = 0;
+	if (range->flags_count & MM_LOCK_RANGE_WRITE) {
+		subtree_read_end = 0;
+		subtree_write_end = range->end;
+	}
+	if (range->rb.rb_left) {
+		child = rb_entry(range->rb.rb_left, struct mm_lock_range, rb);
+		if (child->__subtree.read_end > subtree_read_end)
+			subtree_read_end = child->__subtree.read_end;
+		if (child->__subtree.write_end > subtree_write_end)
+			subtree_write_end = child->__subtree.write_end;
+	}
+	if (range->rb.rb_right) {
+		child = rb_entry(range->rb.rb_right, struct mm_lock_range, rb);
+		if (child->__subtree.read_end > subtree_read_end)
+			subtree_read_end = child->__subtree.read_end;
+		if (child->__subtree.write_end > subtree_write_end)
+			subtree_write_end = child->__subtree.write_end;
+	}
+	if (exit && range->__subtree.read_end == subtree_read_end &&
+		range->__subtree.write_end == subtree_write_end)
+		return true;
+	range->__subtree.read_end = subtree_read_end;
+	range->__subtree.write_end = subtree_write_end;
+	return false;
+}
+
+RB_DECLARE_CALLBACKS(static, augment, struct mm_lock_range, rb,
+		     __subtree, rbcompute);
+
+static void insert_read(struct mm_lock_range *range, struct rb_root *root)
+{
+	struct rb_node **link = &root->rb_node, *rb_parent = NULL;
+	unsigned long start = range->start, end = range->end;
+	struct mm_lock_range *parent;
+
+	while (*link) {
+		rb_parent = *link;
+		parent = rb_entry(rb_parent, struct mm_lock_range, rb);
+		if (parent->__subtree.read_end < end)
+			parent->__subtree.read_end = end;
+		if (start < parent->start)
+			link = &parent->rb.rb_left;
+		else
+			link = &parent->rb.rb_right;
+	}
+
+	range->__subtree.read_end = end;
+	range->__subtree.write_end = 0;
+	rb_link_node(&range->rb, rb_parent, link);
+	rb_insert_augmented(&range->rb, root, &augment);
+}
+
+static void insert_write(struct mm_lock_range *range, struct rb_root *root)
+{
+	struct rb_node **link = &root->rb_node, *rb_parent = NULL;
+	unsigned long start = range->start, end = range->end;
+	struct mm_lock_range *parent;
+
+	while (*link) {
+		rb_parent = *link;
+		parent = rb_entry(rb_parent, struct mm_lock_range, rb);
+		if (parent->__subtree.write_end < end)
+			parent->__subtree.write_end = end;
+		if (start < parent->start)
+			link = &parent->rb.rb_left;
+		else
+			link = &parent->rb.rb_right;
+	}
+
+	range->__subtree.read_end = 0;
+	range->__subtree.write_end = end;
+	rb_link_node(&range->rb, rb_parent, link);
+	rb_insert_augmented(&range->rb, root, &augment);
+}
+
+static void remove(struct mm_lock_range *range, struct rb_root *root)
+{
+	rb_erase_augmented(&range->rb, root, &augment);
+}
+
+/*
+ * Iterate over ranges intersecting [start;end)
+ *
+ * Note that a range intersects [start;end) iff:
+ *   Cond1: range->start < end
+ * and
+ *   Cond2: start < range->end
+ */
+
+static struct mm_lock_range *
+subtree_search(struct mm_lock_range *range,
+	       unsigned long start, unsigned long end)
+{
+	while (true) {
+		/*
+		 * Loop invariant: start < range->__subtree.read_end
+		 *              or start < range->__subtree.write_end
+		 * (Cond2 is satisfied by one of the subtree ranges)
+		 */
+		if (range->rb.rb_left) {
+			struct mm_lock_range *left = rb_entry(
+				range->rb.rb_left, struct mm_lock_range, rb);
+			if (start < left->__subtree.read_end ||
+			    start < left->__subtree.write_end) {
+				/*
+				 * Some ranges in left subtree satisfy Cond2.
+				 * Iterate to find the leftmost such range R.
+				 * If it also satisfies Cond1, that's the
+				 * match we are looking for. Otherwise, there
+				 * is no matching interval as ranges to the
+				 * right of R can't satisfy Cond1 either.
+				 */
+				range = left;
+				continue;
+			}
+		}
+		if (range->start < end) {		/* Cond1 */
+			if (start < range->end)		/* Cond2 */
+				return range;	/* range is leftmost match */
+			if (range->rb.rb_right) {
+				range = rb_entry(range->rb.rb_right,
+						 struct mm_lock_range, rb);
+				if (start < range->__subtree.read_end ||
+				    start < range->__subtree.write_end)
+					continue;
+			}
+		}
+		return NULL;	/* No match */
+	}
+}
+
+static struct mm_lock_range *
+iter_first(struct rb_root *root, unsigned long start, unsigned long end)
+{
+	struct mm_lock_range *range;
+
+	if (!root->rb_node)
+		return NULL;
+	range = rb_entry(root->rb_node, struct mm_lock_range, rb);
+	if (range->__subtree.read_end <= start &&
+	    range->__subtree.write_end <= start)
+		return NULL;
+	return subtree_search(range, start, end);
+}
+
+static struct mm_lock_range *
+iter_next(struct mm_lock_range *range, unsigned long start, unsigned long end)
+{
+	struct rb_node *rb = range->rb.rb_right, *prev;
+
+	while (true) {
+		/*
+		 * Loop invariants:
+		 *   Cond1: range->start < end
+		 *   rb == range->rb.rb_right
+		 *
+		 * First, search right subtree if suitable
+		 */
+		if (rb) {
+			struct mm_lock_range *right = rb_entry(
+				rb, struct mm_lock_range, rb);
+			if (start < right->__subtree.read_end ||
+			    start < right->__subtree.write_end)
+				return subtree_search(right, start, end);
+		}
+
+		/* Move up the tree until we come from a range's left child */
+		do {
+			rb = rb_parent(&range->rb);
+			if (!rb)
+				return NULL;
+			prev = &range->rb;
+			range = rb_entry(rb, struct mm_lock_range, rb);
+			rb = range->rb.rb_right;
+		} while (prev == rb);
+
+		/* Check if the range intersects [start;end) */
+		if (end <= range->start)		/* !Cond1 */
+			return NULL;
+		else if (start < range->end)		/* Cond2 */
+			return range;
+	}
+}
+
+#define FOR_EACH_RANGE(mm, start, end, tmp)				\
+for (tmp = iter_first(&mm->mmap_sem.rb_root, start, end); tmp;		\
+     tmp = iter_next(tmp, start, end))
+
+static struct mm_lock_range *
+subtree_search_read(struct mm_lock_range *range,
+		    unsigned long start, unsigned long end)
+{
+	while (true) {
+		/*
+		 * Loop invariant: start < range->__subtree.read_end
+		 * (Cond2 is satisfied by one of the subtree ranges)
+		 */
+		if (range->rb.rb_left) {
+			struct mm_lock_range *left = rb_entry(
+				range->rb.rb_left, struct mm_lock_range, rb);
+			if (start < left->__subtree.read_end) {
+				/*
+				 * Some ranges in left subtree satisfy Cond2.
+				 * Iterate to find the leftmost such range R.
+				 * If it also satisfies Cond1, that's the
+				 * match we are looking for. Otherwise, there
+				 * is no matching interval as ranges to the
+				 * right of R can't satisfy Cond1 either.
+				 */
+				range = left;
+				continue;
+			}
+		}
+		if (range->start < end) {		/* Cond1 */
+			if (start < range->end &&	/* Cond2 */
+			    !(range->flags_count & MM_LOCK_RANGE_WRITE))
+				return range;	/* range is leftmost match */
+			if (range->rb.rb_right) {
+				range = rb_entry(range->rb.rb_right,
+						 struct mm_lock_range, rb);
+				if (start < range->__subtree.read_end)
+					continue;
+			}
+		}
+		return NULL;	/* No match */
+	}
+}
+
+static struct mm_lock_range *
+iter_first_read(struct rb_root *root, unsigned long start, unsigned long end)
+{
+	struct mm_lock_range *range;
+
+	if (!root->rb_node)
+		return NULL;
+	range = rb_entry(root->rb_node, struct mm_lock_range, rb);
+	if (range->__subtree.read_end <= start)
+		return NULL;
+	return subtree_search_read(range, start, end);
+}
+
+static struct mm_lock_range *
+iter_next_read(struct mm_lock_range *range,
+	       unsigned long start, unsigned long end)
+{
+	struct rb_node *rb = range->rb.rb_right, *prev;
+
+	while (true) {
+		/*
+		 * Loop invariants:
+		 *   Cond1: range->start < end
+		 *   rb == range->rb.rb_right
+		 *
+		 * First, search right subtree if suitable
+		 */
+		if (rb) {
+			struct mm_lock_range *right = rb_entry(
+				rb, struct mm_lock_range, rb);
+			if (start < right->__subtree.read_end)
+				return subtree_search_read(right, start, end);
+		}
+
+		/* Move up the tree until we come from a range's left child */
+		do {
+			rb = rb_parent(&range->rb);
+			if (!rb)
+				return NULL;
+			prev = &range->rb;
+			range = rb_entry(rb, struct mm_lock_range, rb);
+			rb = range->rb.rb_right;
+		} while (prev == rb);
+
+		/* Check if the range intersects [start;end) */
+		if (end <= range->start)		/* !Cond1 */
+			return NULL;
+		else if (start < range->end &&		/* Cond2 */
+			 !(range->flags_count & MM_LOCK_RANGE_WRITE))
+			return range;
+	}
+}
+
+#define FOR_EACH_RANGE_READ(mm, start, end, tmp)			\
+for (tmp = iter_first_read(&mm->mmap_sem.rb_root, start, end); tmp;	\
+     tmp = iter_next_read(tmp, start, end))
+
+static struct mm_lock_range *
+subtree_search_write(struct mm_lock_range *range,
+		     unsigned long start, unsigned long end)
+{
+	while (true) {
+		/*
+		 * Loop invariant: start < range->__subtree.write_end
+		 * (Cond2 is satisfied by one of the subtree ranges)
+		 */
+		if (range->rb.rb_left) {
+			struct mm_lock_range *left = rb_entry(
+				range->rb.rb_left, struct mm_lock_range, rb);
+			if (start < left->__subtree.write_end) {
+				/*
+				 * Some ranges in left subtree satisfy Cond2.
+				 * Iterate to find the leftmost such range R.
+				 * If it also satisfies Cond1, that's the
+				 * match we are looking for. Otherwise, there
+				 * is no matching interval as ranges to the
+				 * right of R can't satisfy Cond1 either.
+				 */
+				range = left;
+				continue;
+			}
+		}
+		if (range->start < end) {		/* Cond1 */
+			if (start < range->end &&	/* Cond2 */
+			    range->flags_count & MM_LOCK_RANGE_WRITE)
+				return range;	/* range is leftmost match */
+			if (range->rb.rb_right) {
+				range = rb_entry(range->rb.rb_right,
+						 struct mm_lock_range, rb);
+				if (start < range->__subtree.write_end)
+					continue;
+			}
+		}
+		return NULL;	/* No match */
+	}
+}
+
+static struct mm_lock_range *
+iter_first_write(struct rb_root *root, unsigned long start, unsigned long end)
+{
+	struct mm_lock_range *range;
+
+	if (!root->rb_node)
+		return NULL;
+	range = rb_entry(root->rb_node, struct mm_lock_range, rb);
+	if (range->__subtree.write_end <= start)
+		return NULL;
+	return subtree_search_write(range, start, end);
+}
+
+static struct mm_lock_range *
+iter_next_write(struct mm_lock_range *range,
+		unsigned long start, unsigned long end)
+{
+	struct rb_node *rb = range->rb.rb_right, *prev;
+
+	while (true) {
+		/*
+		 * Loop invariants:
+		 *   Cond1: range->start < end
+		 *   rb == range->rb.rb_right
+		 *
+		 * First, search right subtree if suitable
+		 */
+		if (rb) {
+			struct mm_lock_range *right = rb_entry(
+				rb, struct mm_lock_range, rb);
+			if (start < right->__subtree.write_end)
+				return subtree_search_write(right, start, end);
+		}
+
+		/* Move up the tree until we come from a range's left child */
+		do {
+			rb = rb_parent(&range->rb);
+			if (!rb)
+				return NULL;
+			prev = &range->rb;
+			range = rb_entry(rb, struct mm_lock_range, rb);
+			rb = range->rb.rb_right;
+		} while (prev == rb);
+
+		/* Check if the range intersects [start;end) */
+		if (end <= range->start)		/* !Cond1 */
+			return NULL;
+		else if (start < range->end &&		/* Cond2 */
+			 range->flags_count & MM_LOCK_RANGE_WRITE)
+			return range;
+	}
+}
+
+#define FOR_EACH_RANGE_WRITE(mm, start, end, tmp)			\
+for (tmp = iter_first_write(&mm->mmap_sem.rb_root, start, end); tmp;	\
+     tmp = iter_next_write(tmp, start, end))
+
+static bool queue_read(struct mm_struct *mm, struct mm_lock_range *range)
+{
+	struct mm_lock_range *conflict;
+	long flags_count = 0;
+
+	FOR_EACH_RANGE_WRITE(mm, range->start, range->end, conflict)
+		flags_count -= MM_LOCK_RANGE_COUNT_ONE;
+	range->flags_count = flags_count;
+	insert_read(range, &mm->mmap_sem.rb_root);
+	return flags_count < 0;
+}
+
+static bool queue_write(struct mm_struct *mm, struct mm_lock_range *range)
+{
+	struct mm_lock_range *conflict;
+	long flags_count = MM_LOCK_RANGE_WRITE;
+
+	FOR_EACH_RANGE(mm, range->start, range->end, conflict)
+		flags_count -= MM_LOCK_RANGE_COUNT_ONE;
+	range->flags_count = flags_count;
+	insert_write(range, &mm->mmap_sem.rb_root);
+	return flags_count < 0;
+}
+
+static inline void prepare_wait(struct mm_lock_range *range, unsigned long seq)
+{
+	range->seq = seq;
+	range->task = current;
+}
+
+static void wait(struct mm_lock_range *range)
+{
+	while (true) {
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		if (range->flags_count >= 0)
+			break;
+		schedule();
+	}
+	__set_current_state(TASK_RUNNING);
+}
+
+static bool wait_killable(struct mm_lock_range *range)
+{
+	while (true) {
+		set_current_state(TASK_INTERRUPTIBLE);
+		if (range->flags_count >= 0) {
+			__set_current_state(TASK_RUNNING);
+			return true;
+		}
+		if (signal_pending(current)) {
+			__set_current_state(TASK_RUNNING);
+			return false;
+		}
+		schedule();
+	}
+}
+
+static inline void unlock_conflict(struct mm_lock_range *range,
+				   struct wake_q_head *wake_q)
+{
+	if ((range->flags_count += MM_LOCK_RANGE_COUNT_ONE) >= 0)
+		wake_q_add(wake_q, range->task);
+}
+
+void mm_write_range_lock(struct mm_struct *mm, struct mm_lock_range *range)
+{
+	bool contended;
+
+	lock_acquire_exclusive(&mm->mmap_sem.dep_map, 0, 0, NULL, _RET_IP_);
+
+	mutex_lock(&mm->mmap_sem.mutex);
+	if ((contended = queue_write(mm, range)))
+		prepare_wait(range, mm->mmap_sem.seq);
+	mutex_unlock(&mm->mmap_sem.mutex);
+
+	if (contended) {
+		lock_contended(&mm->mmap_sem.dep_map, _RET_IP_);
+		wait(range);
+	}
+	lock_acquired(&mm->mmap_sem.dep_map, _RET_IP_);
+}
+EXPORT_SYMBOL(mm_write_range_lock);
+
+#ifdef CONFIG_LOCKDEP
+void mm_write_range_lock_nested(struct mm_struct *mm,
+				struct mm_lock_range *range, int subclass)
+{
+	bool contended;
+
+	lock_acquire_exclusive(&mm->mmap_sem.dep_map, subclass, 0, NULL,
+			       _RET_IP_);
+
+	mutex_lock(&mm->mmap_sem.mutex);
+	if ((contended = queue_write(mm, range)))
+		prepare_wait(range, mm->mmap_sem.seq);
+	mutex_unlock(&mm->mmap_sem.mutex);
+
+	if (contended) {
+		lock_contended(&mm->mmap_sem.dep_map, _RET_IP_);
+		wait(range);
+	}
+	lock_acquired(&mm->mmap_sem.dep_map, _RET_IP_);
+}
+EXPORT_SYMBOL(mm_write_range_lock_nested);
+#endif
+
+int mm_write_range_lock_killable(struct mm_struct *mm,
+				 struct mm_lock_range *range)
+{
+	bool contended;
+
+	lock_acquire_exclusive(&mm->mmap_sem.dep_map, 0, 0, NULL, _RET_IP_);
+
+	mutex_lock(&mm->mmap_sem.mutex);
+	if ((contended = queue_write(mm, range)))
+		prepare_wait(range, ++(mm->mmap_sem.seq));
+	mutex_unlock(&mm->mmap_sem.mutex);
+
+	if (contended) {
+		lock_contended(&mm->mmap_sem.dep_map, _RET_IP_);
+		if (!wait_killable(range)) {
+			struct mm_lock_range *conflict;
+			DEFINE_WAKE_Q(wake_q);
+
+			mutex_lock(&mm->mmap_sem.mutex);
+			remove(range, &mm->mmap_sem.rb_root);
+			FOR_EACH_RANGE(mm, range->start, range->end, conflict)
+				if (conflict->flags_count < 0 &&
+				    conflict->seq - range->seq <= (~0UL >> 1))
+					unlock_conflict(conflict, &wake_q);
+			mutex_unlock(&mm->mmap_sem.mutex);
+
+			wake_up_q(&wake_q);
+			lock_release(&mm->mmap_sem.dep_map, _RET_IP_);
+			return -EINTR;
+		}
+	}
+	lock_acquired(&mm->mmap_sem.dep_map, _RET_IP_);
+	return 0;
+}
+EXPORT_SYMBOL(mm_write_range_lock_killable);
+
+bool mm_write_range_trylock(struct mm_struct *mm, struct mm_lock_range *range)
+{
+	bool locked = false;
+
+	if (!mutex_trylock(&mm->mmap_sem.mutex))
+		goto exit;
+	if (iter_first(&mm->mmap_sem.rb_root, range->start, range->end))
+		goto unlock;
+	lock_acquire_exclusive(&mm->mmap_sem.dep_map, 0, 1, NULL,
+			       _RET_IP_);
+	range->flags_count = MM_LOCK_RANGE_WRITE;
+	insert_write(range, &mm->mmap_sem.rb_root);
+	locked = true;
+unlock:
+	mutex_unlock(&mm->mmap_sem.mutex);
+exit:
+	return locked;
+}
+EXPORT_SYMBOL(mm_write_range_trylock);
+
+void mm_write_range_unlock(struct mm_struct *mm, struct mm_lock_range *range)
+{
+	struct mm_lock_range *conflict;
+	DEFINE_WAKE_Q(wake_q);
+
+	mutex_lock(&mm->mmap_sem.mutex);
+	remove(range, &mm->mmap_sem.rb_root);
+	FOR_EACH_RANGE(mm, range->start, range->end, conflict)
+		unlock_conflict(conflict, &wake_q);
+	mutex_unlock(&mm->mmap_sem.mutex);
+
+	wake_up_q(&wake_q);
+	lock_release(&mm->mmap_sem.dep_map, _RET_IP_);
+}
+EXPORT_SYMBOL(mm_write_range_unlock);
+
+void mm_downgrade_write_range_lock(struct mm_struct *mm,
+				   struct mm_lock_range *range)
+{
+	struct mm_lock_range *conflict;
+	DEFINE_WAKE_Q(wake_q);
+
+	mutex_lock(&mm->mmap_sem.mutex);
+	FOR_EACH_RANGE_READ(mm, range->start, range->end, conflict)
+		unlock_conflict(conflict, &wake_q);
+	range->flags_count -= MM_LOCK_RANGE_WRITE;
+	augment_propagate(&range->rb, NULL);
+	mutex_unlock(&mm->mmap_sem.mutex);
+
+	wake_up_q(&wake_q);
+	lock_downgrade(&mm->mmap_sem.dep_map, _RET_IP_);
+}
+EXPORT_SYMBOL(mm_downgrade_write_range_lock);
+
+void mm_read_range_lock(struct mm_struct *mm, struct mm_lock_range *range)
+{
+	bool contended;
+
+	lock_acquire_shared(&mm->mmap_sem.dep_map, 0, 0, NULL, _RET_IP_);
+
+	mutex_lock(&mm->mmap_sem.mutex);
+	if ((contended = queue_read(mm, range)))
+		prepare_wait(range, mm->mmap_sem.seq);
+	mutex_unlock(&mm->mmap_sem.mutex);
+
+	if (contended) {
+		lock_contended(&mm->mmap_sem.dep_map, _RET_IP_);
+		wait(range);
+	}
+	lock_acquired(&mm->mmap_sem.dep_map, _RET_IP_);
+}
+EXPORT_SYMBOL(mm_read_range_lock);
+
+int mm_read_range_lock_killable(struct mm_struct *mm,
+				struct mm_lock_range *range)
+{
+	bool contended;
+
+	lock_acquire_shared(&mm->mmap_sem.dep_map, 0, 0, NULL, _RET_IP_);
+
+	mutex_lock(&mm->mmap_sem.mutex);
+	if ((contended = queue_read(mm, range)))
+		prepare_wait(range, ++(mm->mmap_sem.seq));
+	mutex_unlock(&mm->mmap_sem.mutex);
+
+	if (contended) {
+		lock_contended(&mm->mmap_sem.dep_map, _RET_IP_);
+		if (!wait_killable(range)) {
+			struct mm_lock_range *conflict;
+			DEFINE_WAKE_Q(wake_q);
+
+			mutex_lock(&mm->mmap_sem.mutex);
+			remove(range, &mm->mmap_sem.rb_root);
+			FOR_EACH_RANGE_WRITE(mm, range->start, range->end,
+					     conflict)
+				if (conflict->flags_count < 0 &&
+				    conflict->seq - range->seq <= (~0UL >> 1))
+					unlock_conflict(conflict, &wake_q);
+			mutex_unlock(&mm->mmap_sem.mutex);
+
+			wake_up_q(&wake_q);
+			lock_release(&mm->mmap_sem.dep_map, _RET_IP_);
+			return -EINTR;
+		}
+	}
+	lock_acquired(&mm->mmap_sem.dep_map, _RET_IP_);
+	return 0;
+}
+EXPORT_SYMBOL(mm_read_range_lock_killable);
+
+bool mm_read_range_trylock(struct mm_struct *mm, struct mm_lock_range *range)
+{
+	bool locked = false;
+
+	if (!mutex_trylock(&mm->mmap_sem.mutex))
+		goto exit;
+	if (iter_first_write(&mm->mmap_sem.rb_root, range->start, range->end))
+		goto unlock;
+	lock_acquire_shared(&mm->mmap_sem.dep_map, 0, 1, NULL, _RET_IP_);
+	range->flags_count = 0;
+	insert_read(range, &mm->mmap_sem.rb_root);
+	locked = true;
+unlock:
+	mutex_unlock(&mm->mmap_sem.mutex);
+exit:
+	return locked;
+}
+EXPORT_SYMBOL(mm_read_range_trylock);
+
+void mm_read_range_unlock_non_owner(struct mm_struct *mm,
+				    struct mm_lock_range *range)
+{
+	struct mm_lock_range *conflict;
+	DEFINE_WAKE_Q(wake_q);
+
+	mutex_lock(&mm->mmap_sem.mutex);
+	remove(range, &mm->mmap_sem.rb_root);
+	FOR_EACH_RANGE_WRITE(mm, range->start, range->end, conflict)
+		unlock_conflict(conflict, &wake_q);
+	mutex_unlock(&mm->mmap_sem.mutex);
+
+	wake_up_q(&wake_q);
+}
+EXPORT_SYMBOL(mm_read_range_unlock_non_owner);
+
+void mm_read_range_unlock(struct mm_struct *mm, struct mm_lock_range *range)
+{
+	mm_read_range_unlock_non_owner(mm, range);
+	lock_release(&mm->mmap_sem.dep_map, _RET_IP_);
+}
+EXPORT_SYMBOL(mm_read_range_unlock);
-- 
2.25.0.341.g760bfbb309-goog



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 07/24] mm/memory: add range field to struct vm_fault
  2020-02-24 20:30 [RFC PATCH 00/24] Fine grained MM locking Michel Lespinasse
                   ` (5 preceding siblings ...)
  2020-02-24 20:30 ` [RFC PATCH 06/24] MM locking API: implement fine grained range locks Michel Lespinasse
@ 2020-02-24 20:30 ` Michel Lespinasse
  2020-02-24 20:30 ` [RFC PATCH 08/24] mm/memory: allow specifying MM lock range to handle_mm_fault() Michel Lespinasse
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Michel Lespinasse @ 2020-02-24 20:30 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Morton, Laurent Dufour, Vlastimil Babka,
	Matthew Wilcox, Liam R . Howlett, Jerome Glisse, Davidlohr Bueso,
	David Rientjes
  Cc: linux-mm, Michel Lespinasse

Add a range field to struct vm_fault. This carries the range that was
locked for the given fault.

Fault handlers that drop the mmap_sem should release the range
specified here rather than assuming the whole address space was locked.
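
For now every fault is initialized with the coarse range, roughly as in
the __handle_mm_fault() hunk below:

	struct vm_fault vmf = {
		.vma = vma,
		.address = address & PAGE_MASK,
		.flags = flags,
		.pgoff = linear_page_index(vma, address),
		.gfp_mask = __get_fault_gfp_mask(vma),
		.range = mm_coarse_lock_range(),
	};

Code that drops the mmap_sem while handling the fault (do_swap_page,
handle_userfault, ...) can then release vmf->range instead of assuming
the whole address space was locked.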

Signed-off-by: Michel Lespinasse <walken@google.com>
---
 include/linux/mm.h | 1 +
 mm/hugetlb.c       | 1 +
 mm/khugepaged.c    | 1 +
 mm/memory.c        | 1 +
 4 files changed, 4 insertions(+)

diff --git include/linux/mm.h include/linux/mm.h
index 052f423d7f67..a1c9a0aa898b 100644
--- include/linux/mm.h
+++ include/linux/mm.h
@@ -451,6 +451,7 @@ struct vm_fault {
 					 * page table to avoid allocation from
 					 * atomic context.
 					 */
+	struct mm_lock_range *range;	/* MM read lock range. */
 };
 
 /* page entry size for vm->huge_fault() */
diff --git mm/hugetlb.c mm/hugetlb.c
index dd8737a94bec..662f34b6c869 100644
--- mm/hugetlb.c
+++ mm/hugetlb.c
@@ -3831,6 +3831,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 				.vma = vma,
 				.address = haddr,
 				.flags = flags,
+				.range = mm_coarse_lock_range(),
 				/*
 				 * Hard to debug if it ends up being
 				 * used by a callee that assumes
diff --git mm/khugepaged.c mm/khugepaged.c
index 7ee8ae64824b..a7807bb0d631 100644
--- mm/khugepaged.c
+++ mm/khugepaged.c
@@ -900,6 +900,7 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
 		.flags = FAULT_FLAG_ALLOW_RETRY,
 		.pmd = pmd,
 		.pgoff = linear_page_index(vma, address),
+		.range = mm_coarse_lock_range(),
 	};
 
 	/* we only decide to swapin, if there is enough young ptes */
diff --git mm/memory.c mm/memory.c
index 45b42fa02a2e..6cb3359f0857 100644
--- mm/memory.c
+++ mm/memory.c
@@ -4047,6 +4047,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 		.flags = flags,
 		.pgoff = linear_page_index(vma, address),
 		.gfp_mask = __get_fault_gfp_mask(vma),
+		.range = mm_coarse_lock_range(),
 	};
 	unsigned int dirty = flags & FAULT_FLAG_WRITE;
 	struct mm_struct *mm = vma->vm_mm;
-- 
2.25.0.341.g760bfbb309-goog



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 08/24] mm/memory: allow specifying MM lock range to handle_mm_fault()
  2020-02-24 20:30 [RFC PATCH 00/24] Fine grained MM locking Michel Lespinasse
                   ` (6 preceding siblings ...)
  2020-02-24 20:30 ` [RFC PATCH 07/24] mm/memory: add range field to struct vm_fault Michel Lespinasse
@ 2020-02-24 20:30 ` Michel Lespinasse
  2020-02-24 20:30 ` [RFC PATCH 09/24] do_swap_page: use the vmf->range field when dropping mmap_sem Michel Lespinasse
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Michel Lespinasse @ 2020-02-24 20:30 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Morton, Laurent Dufour, Vlastimil Babka,
	Matthew Wilcox, Liam R . Howlett, Jerome Glisse, Davidlohr Bueso,
	David Rientjes
  Cc: linux-mm, Michel Lespinasse

This change adds a new handle_mm_fault_range() function, which behaves
like handle_mm_fault() but takes an explicit MM lock range argument.

handle_mm_fault() remains as an inline wrapper which passes the default
coarse locking range.
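
A purely illustrative sketch of the intended calling convention follows;
the actual range policy for the x86 fault handler is introduced later in
the series ("use an explicit MM lock range"):

	struct mm_lock_range range;

	mm_init_lock_range(&range, start, end);
	mm_read_range_lock(mm, &range);
	vma = ...;	/* look up the vma covering the faulting address */
	fault = handle_mm_fault_range(vma, address, flags, &range);
	mm_read_range_unlock(mm, &range);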

Signed-off-by: Michel Lespinasse <walken@google.com>
---
 include/linux/hugetlb.h |  5 +++--
 include/linux/mm.h      | 11 +++++++++--
 mm/hugetlb.c            | 14 +++++++++-----
 mm/memory.c             | 16 +++++++++-------
 4 files changed, 30 insertions(+), 16 deletions(-)

diff --git include/linux/hugetlb.h include/linux/hugetlb.h
index 31d4920994b9..75992d78289e 100644
--- include/linux/hugetlb.h
+++ include/linux/hugetlb.h
@@ -88,7 +88,8 @@ int hugetlb_report_node_meminfo(int, char *);
 void hugetlb_show_meminfo(void);
 unsigned long hugetlb_total_pages(void);
 vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
-			unsigned long address, unsigned int flags);
+			unsigned long address, unsigned int flags,
+			struct mm_lock_range *range);
 int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
 				struct vm_area_struct *dst_vma,
 				unsigned long dst_addr,
@@ -307,7 +308,7 @@ static inline void __unmap_hugepage_range(struct mmu_gather *tlb,
 
 static inline vm_fault_t hugetlb_fault(struct mm_struct *mm,
 			struct vm_area_struct *vma, unsigned long address,
-			unsigned int flags)
+			unsigned int flags, struct mm_lock_range *range)
 {
 	BUG();
 	return 0;
diff --git include/linux/mm.h include/linux/mm.h
index a1c9a0aa898b..1b6b022064b4 100644
--- include/linux/mm.h
+++ include/linux/mm.h
@@ -1460,8 +1460,15 @@ int generic_error_remove_page(struct address_space *mapping, struct page *page);
 int invalidate_inode_page(struct page *page);
 
 #ifdef CONFIG_MMU
-extern vm_fault_t handle_mm_fault(struct vm_area_struct *vma,
-			unsigned long address, unsigned int flags);
+extern vm_fault_t handle_mm_fault_range(struct vm_area_struct *vma,
+			unsigned long address, unsigned int flags,
+			struct mm_lock_range *range);
+static inline vm_fault_t handle_mm_fault(struct vm_area_struct *vma,
+			unsigned long address, unsigned int flags)
+{
+	return handle_mm_fault_range(vma, address, flags,
+				     mm_coarse_lock_range());
+}
 extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
 			    unsigned long address, unsigned int fault_flags,
 			    bool *unlocked);
diff --git mm/hugetlb.c mm/hugetlb.c
index 662f34b6c869..9d6fe9f291a7 100644
--- mm/hugetlb.c
+++ mm/hugetlb.c
@@ -3788,7 +3788,8 @@ int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
 static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			struct vm_area_struct *vma,
 			struct address_space *mapping, pgoff_t idx,
-			unsigned long address, pte_t *ptep, unsigned int flags)
+			unsigned long address, pte_t *ptep, unsigned int flags,
+			struct mm_lock_range *range)
 {
 	struct hstate *h = hstate_vma(vma);
 	vm_fault_t ret = VM_FAULT_SIGBUS;
@@ -3831,7 +3832,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 				.vma = vma,
 				.address = haddr,
 				.flags = flags,
-				.range = mm_coarse_lock_range(),
+				.range = range,
 				/*
 				 * Hard to debug if it ends up being
 				 * used by a callee that assumes
@@ -3997,7 +3998,8 @@ u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx)
 #endif
 
 vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
-			unsigned long address, unsigned int flags)
+			unsigned long address, unsigned int flags,
+			struct mm_lock_range *range)
 {
 	pte_t *ptep, entry;
 	spinlock_t *ptl;
@@ -4039,7 +4041,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	entry = huge_ptep_get(ptep);
 	if (huge_pte_none(entry)) {
-		ret = hugetlb_no_page(mm, vma, mapping, idx, address, ptep, flags);
+		ret = hugetlb_no_page(mm, vma, mapping, idx, address, ptep,
+				      flags, range);
 		goto out_mutex;
 	}
 
@@ -4348,7 +4351,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 						FAULT_FLAG_ALLOW_RETRY);
 				fault_flags |= FAULT_FLAG_TRIED;
 			}
-			ret = hugetlb_fault(mm, vma, vaddr, fault_flags);
+			ret = hugetlb_fault(mm, vma, vaddr, fault_flags,
+					    mm_coarse_lock_range());
 			if (ret & VM_FAULT_ERROR) {
 				err = vm_fault_to_errno(ret, flags);
 				remainder = 0;
diff --git mm/memory.c mm/memory.c
index 6cb3359f0857..bc24a6bdaa06 100644
--- mm/memory.c
+++ mm/memory.c
@@ -4039,7 +4039,8 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
  * return value.  See filemap_fault() and __lock_page_or_retry().
  */
 static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
-		unsigned long address, unsigned int flags)
+		unsigned long address, unsigned int flags,
+		struct mm_lock_range *range)
 {
 	struct vm_fault vmf = {
 		.vma = vma,
@@ -4047,7 +4048,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 		.flags = flags,
 		.pgoff = linear_page_index(vma, address),
 		.gfp_mask = __get_fault_gfp_mask(vma),
-		.range = mm_coarse_lock_range(),
+		.range = range,
 	};
 	unsigned int dirty = flags & FAULT_FLAG_WRITE;
 	struct mm_struct *mm = vma->vm_mm;
@@ -4134,8 +4135,9 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
  * The mmap_sem may have been released depending on flags and our
  * return value.  See filemap_fault() and __lock_page_or_retry().
  */
-vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
-		unsigned int flags)
+vm_fault_t handle_mm_fault_range(struct vm_area_struct *vma,
+		unsigned long address, unsigned int flags,
+		struct mm_lock_range *range)
 {
 	vm_fault_t ret;
 
@@ -4160,9 +4162,9 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 		mem_cgroup_enter_user_fault();
 
 	if (unlikely(is_vm_hugetlb_page(vma)))
-		ret = hugetlb_fault(vma->vm_mm, vma, address, flags);
+		ret = hugetlb_fault(vma->vm_mm, vma, address, flags, range);
 	else
-		ret = __handle_mm_fault(vma, address, flags);
+		ret = __handle_mm_fault(vma, address, flags, range);
 
 	if (flags & FAULT_FLAG_USER) {
 		mem_cgroup_exit_user_fault();
@@ -4178,7 +4180,7 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(handle_mm_fault);
+EXPORT_SYMBOL_GPL(handle_mm_fault_range);
 
 #ifndef __PAGETABLE_P4D_FOLDED
 /*
-- 
2.25.0.341.g760bfbb309-goog



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 09/24] do_swap_page: use the vmf->range field when dropping mmap_sem
  2020-02-24 20:30 [RFC PATCH 00/24] Fine grained MM locking Michel Lespinasse
                   ` (7 preceding siblings ...)
  2020-02-24 20:30 ` [RFC PATCH 08/24] mm/memory: allow specifying MM lock range to handle_mm_fault() Michel Lespinasse
@ 2020-02-24 20:30 ` Michel Lespinasse
  2020-02-24 20:30 ` [RFC PATCH 10/24] handle_userfault: " Michel Lespinasse
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Michel Lespinasse @ 2020-02-24 20:30 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Morton, Laurent Dufour, Vlastimil Babka,
	Matthew Wilcox, Liam R . Howlett, Jerome Glisse, Davidlohr Bueso,
	David Rientjes
  Cc: linux-mm, Michel Lespinasse

Change do_swap_page() and lock_page_or_retry() so that the proper range
will be released when swapping in.

Signed-off-by: Michel Lespinasse <walken@google.com>
---
 include/linux/pagemap.h | 7 ++++---
 mm/filemap.c            | 6 +++---
 mm/memory.c             | 2 +-
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git include/linux/pagemap.h include/linux/pagemap.h
index 37a4d9e32cd3..93520477c481 100644
--- include/linux/pagemap.h
+++ include/linux/pagemap.h
@@ -458,7 +458,7 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
 extern void __lock_page(struct page *page);
 extern int __lock_page_killable(struct page *page);
 extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
-				unsigned int flags);
+		unsigned int flags, struct mm_lock_range *range);
 extern void unlock_page(struct page *page);
 
 /*
@@ -501,10 +501,11 @@ static inline int lock_page_killable(struct page *page)
  * __lock_page_or_retry().
  */
 static inline int lock_page_or_retry(struct page *page, struct mm_struct *mm,
-				     unsigned int flags)
+		unsigned int flags, struct mm_lock_range *range)
 {
 	might_sleep();
-	return trylock_page(page) || __lock_page_or_retry(page, mm, flags);
+	return trylock_page(page) || __lock_page_or_retry(page, mm, flags,
+							  range);
 }
 
 /*
diff --git mm/filemap.c mm/filemap.c
index eb6487065ca0..3afb5a3f0b9c 100644
--- mm/filemap.c
+++ mm/filemap.c
@@ -1406,7 +1406,7 @@ EXPORT_SYMBOL_GPL(__lock_page_killable);
  * with the page locked and the mmap_sem unperturbed.
  */
 int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
-			 unsigned int flags)
+			 unsigned int flags, struct mm_lock_range *range)
 {
 	if (flags & FAULT_FLAG_ALLOW_RETRY) {
 		/*
@@ -1416,7 +1416,7 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 		if (flags & FAULT_FLAG_RETRY_NOWAIT)
 			return 0;
 
-		mm_read_unlock(mm);
+		mm_read_range_unlock(mm, range);
 		if (flags & FAULT_FLAG_KILLABLE)
 			wait_on_page_locked_killable(page);
 		else
@@ -1428,7 +1428,7 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
 
 			ret = __lock_page_killable(page);
 			if (ret) {
-				mm_read_unlock(mm);
+				mm_read_range_unlock(mm, range);
 				return 0;
 			}
 		} else
diff --git mm/memory.c mm/memory.c
index bc24a6bdaa06..3da4ae504957 100644
--- mm/memory.c
+++ mm/memory.c
@@ -2964,7 +2964,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		goto out_release;
 	}
 
-	locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags);
+	locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags, vmf->range);
 
 	delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
 	if (!locked) {
-- 
2.25.0.341.g760bfbb309-goog



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 10/24] handle_userfault: use the vmf->range field when dropping mmap_sem
  2020-02-24 20:30 [RFC PATCH 00/24] Fine grained MM locking Michel Lespinasse
                   ` (8 preceding siblings ...)
  2020-02-24 20:30 ` [RFC PATCH 09/24] do_swap_page: use the vmf->range field when dropping mmap_sem Michel Lespinasse
@ 2020-02-24 20:30 ` Michel Lespinasse
  2020-02-24 20:30 ` [RFC PATCH 11/24] x86 fault handler: merge bad_area() functions Michel Lespinasse
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Michel Lespinasse @ 2020-02-24 20:30 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Morton, Laurent Dufour, Vlastimil Babka,
	Matthew Wilcox, Liam R . Howlett, Jerome Glisse, Davidlohr Bueso,
	David Rientjes
  Cc: linux-mm, Michel Lespinasse

Change handle_userfault() to drop the proper memory range
as indicated in the vmf.

Signed-off-by: Michel Lespinasse <walken@google.com>
---
 fs/userfaultfd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git fs/userfaultfd.c fs/userfaultfd.c
index f38095a7ebcd..2b8ee3eaacd7 100644
--- fs/userfaultfd.c
+++ fs/userfaultfd.c
@@ -489,7 +489,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 		must_wait = userfaultfd_huge_must_wait(ctx, vmf->vma,
 						       vmf->address,
 						       vmf->flags, reason);
-	mm_read_unlock(mm);
+	mm_read_range_unlock(mm, vmf->range);
 
 	if (likely(must_wait && !READ_ONCE(ctx->released) &&
 		   (return_to_userland ? !signal_pending(current) :
@@ -543,7 +543,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 			 * and there's no need to retake the mmap_sem
 			 * in such case.
 			 */
-			mm_read_lock(mm);
+			mm_read_range_lock(mm, vmf->range);
 			ret = VM_FAULT_NOPAGE;
 		}
 	}
-- 
2.25.0.341.g760bfbb309-goog



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 11/24] x86 fault handler: merge bad_area() functions
  2020-02-24 20:30 [RFC PATCH 00/24] Fine grained MM locking Michel Lespinasse
                   ` (9 preceding siblings ...)
  2020-02-24 20:30 ` [RFC PATCH 10/24] handle_userfault: " Michel Lespinasse
@ 2020-02-24 20:30 ` Michel Lespinasse
  2020-02-24 20:30 ` [RFC PATCH 12/24] x86 fault handler: use an explicit MM lock range Michel Lespinasse
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Michel Lespinasse @ 2020-02-24 20:30 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Morton, Laurent Dufour, Vlastimil Babka,
	Matthew Wilcox, Liam R . Howlett, Jerome Glisse, Davidlohr Bueso,
	David Rientjes
  Cc: linux-mm, Michel Lespinasse

This merges the bad_area(), bad_area_access_error() and the underlying
__bad_area() functions into a single unified function.

Passing a NULL vma triggers the prior bad_area() behavior, while
passing a non-NULL vma triggers the prior bad_area_access_error() behavior.

The control flow is very similar in all cases, and we now release the
mmap_sem read lock in a single place rather than three.

Text size is reduced by 356 bytes here.

Signed-off-by: Michel Lespinasse <walken@google.com>
---
 arch/x86/mm/fault.c | 54 ++++++++++++++++++++-------------------------
 1 file changed, 24 insertions(+), 30 deletions(-)

diff --git arch/x86/mm/fault.c arch/x86/mm/fault.c
index a8ce9e160b72..adbd2b03fcf9 100644
--- arch/x86/mm/fault.c
+++ arch/x86/mm/fault.c
@@ -919,26 +919,6 @@ bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 	__bad_area_nosemaphore(regs, error_code, address, 0, SEGV_MAPERR);
 }
 
-static void
-__bad_area(struct pt_regs *regs, unsigned long error_code,
-	   unsigned long address, u32 pkey, int si_code)
-{
-	struct mm_struct *mm = current->mm;
-	/*
-	 * Something tried to access memory that isn't in our memory map..
-	 * Fix it, but check if it's kernel or user first..
-	 */
-	mm_read_unlock(mm);
-
-	__bad_area_nosemaphore(regs, error_code, address, pkey, si_code);
-}
-
-static noinline void
-bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address)
-{
-	__bad_area(regs, error_code, address, 0, SEGV_MAPERR);
-}
-
 static inline bool bad_area_access_from_pkeys(unsigned long error_code,
 		struct vm_area_struct *vma)
 {
@@ -957,9 +937,15 @@ static inline bool bad_area_access_from_pkeys(unsigned long error_code,
 }
 
 static noinline void
-bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
-		      unsigned long address, struct vm_area_struct *vma)
+bad_area(struct pt_regs *regs, unsigned long error_code,
+	 unsigned long address, struct vm_area_struct *vma)
 {
+	u32 pkey = 0;
+	int si_code = SEGV_MAPERR;
+
+	if (!vma)
+		goto unlock;
+
 	/*
 	 * This OSPKE check is not strictly necessary at runtime.
 	 * But, doing it this way allows compiler optimizations
@@ -986,12 +972,20 @@ bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
 		 * 6. T1   : reaches here, sees vma_pkey(vma)=5, when we really
 		 *	     faulted on a pte with its pkey=4.
 		 */
-		u32 pkey = vma_pkey(vma);
-
-		__bad_area(regs, error_code, address, pkey, SEGV_PKUERR);
+		pkey = vma_pkey(vma);
+		si_code = SEGV_PKUERR;
 	} else {
-		__bad_area(regs, error_code, address, 0, SEGV_ACCERR);
+		si_code = SEGV_ACCERR;
 	}
+
+unlock:
+	/*
+	 * Something tried to access memory that isn't in our memory map..
+	 * Fix it, but check if it's kernel or user first..
+	 */
+	mm_read_unlock(current->mm);
+
+	__bad_area_nosemaphore(regs, error_code, address, pkey, si_code);
 }
 
 static void
@@ -1401,17 +1395,17 @@ void do_user_addr_fault(struct pt_regs *regs,
 
 	vma = find_vma(mm, address);
 	if (unlikely(!vma)) {
-		bad_area(regs, hw_error_code, address);
+		bad_area(regs, hw_error_code, address, NULL);
 		return;
 	}
 	if (likely(vma->vm_start <= address))
 		goto good_area;
 	if (unlikely(!(vma->vm_flags & VM_GROWSDOWN))) {
-		bad_area(regs, hw_error_code, address);
+		bad_area(regs, hw_error_code, address, NULL);
 		return;
 	}
 	if (unlikely(expand_stack(vma, address))) {
-		bad_area(regs, hw_error_code, address);
+		bad_area(regs, hw_error_code, address, NULL);
 		return;
 	}
 
@@ -1421,7 +1415,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 	 */
 good_area:
 	if (unlikely(access_error(hw_error_code, vma))) {
-		bad_area_access_error(regs, hw_error_code, address, vma);
+		bad_area(regs, hw_error_code, address, vma);
 		return;
 	}
 
-- 
2.25.0.341.g760bfbb309-goog



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 12/24] x86 fault handler: use an explicit MM lock range
  2020-02-24 20:30 [RFC PATCH 00/24] Fine grained MM locking Michel Lespinasse
                   ` (10 preceding siblings ...)
  2020-02-24 20:30 ` [RFC PATCH 11/24] x86 fault handler: merge bad_area() functions Michel Lespinasse
@ 2020-02-24 20:30 ` Michel Lespinasse
  2020-02-24 20:30 ` [RFC PATCH 13/24] mm/memory: add prepare_mm_fault() function Michel Lespinasse
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Michel Lespinasse @ 2020-02-24 20:30 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Morton, Laurent Dufour, Vlastimil Babka,
	Matthew Wilcox, Liam R . Howlett, Jerome Glisse, Davidlohr Bueso,
	David Rientjes
  Cc: linux-mm, Michel Lespinasse

Use an explicit MM lock range through the fault handler and any called functions.
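
Illustrative sketch (not part of the patch): the range is obtained once and
threaded through the locking and fault handling calls, roughly as follows:

	struct mm_lock_range *range = mm_coarse_lock_range();

	mm_read_range_lock(mm, range);		/* still covers the whole mm */
	vma = find_vma(mm, address);
	if (vma)
		fault = handle_mm_fault_range(vma, address, flags, range);
	mm_read_range_unlock(mm, range);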

Signed-off-by: Michel Lespinasse <walken@google.com>
---
 arch/x86/mm/fault.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git arch/x86/mm/fault.c arch/x86/mm/fault.c
index adbd2b03fcf9..700da3cc3db9 100644
--- arch/x86/mm/fault.c
+++ arch/x86/mm/fault.c
@@ -938,7 +938,8 @@ static inline bool bad_area_access_from_pkeys(unsigned long error_code,
 
 static noinline void
 bad_area(struct pt_regs *regs, unsigned long error_code,
-	 unsigned long address, struct vm_area_struct *vma)
+	 unsigned long address, struct vm_area_struct *vma,
+	 struct mm_lock_range *range)
 {
 	u32 pkey = 0;
 	int si_code = SEGV_MAPERR;
@@ -983,7 +984,7 @@ bad_area(struct pt_regs *regs, unsigned long error_code,
 	 * Something tried to access memory that isn't in our memory map..
 	 * Fix it, but check if it's kernel or user first..
 	 */
-	mm_read_unlock(current->mm);
+	mm_read_range_unlock(current->mm, range);
 
 	__bad_area_nosemaphore(regs, error_code, address, pkey, si_code);
 }
@@ -1277,6 +1278,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 			unsigned long hw_error_code,
 			unsigned long address)
 {
+	struct mm_lock_range *range;
 	struct vm_area_struct *vma;
 	struct task_struct *tsk;
 	struct mm_struct *mm;
@@ -1361,6 +1363,8 @@ void do_user_addr_fault(struct pt_regs *regs,
 	}
 #endif
 
+	range = mm_coarse_lock_range();
+
 	/*
 	 * Kernel-mode access to the user address space should only occur
 	 * on well-defined single instructions listed in the exception
@@ -1373,7 +1377,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 	 * 1. Failed to acquire mmap_sem, and
 	 * 2. The access did not originate in userspace.
 	 */
-	if (unlikely(!mm_read_trylock(mm))) {
+	if (unlikely(!mm_read_range_trylock(mm, range))) {
 		if (!user_mode(regs) && !search_exception_tables(regs->ip)) {
 			/*
 			 * Fault from code in kernel from
@@ -1383,7 +1387,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 			return;
 		}
 retry:
-		mm_read_lock(mm);
+		mm_read_range_lock(mm, range);
 	} else {
 		/*
 		 * The above down_read_trylock() might have succeeded in
@@ -1395,17 +1399,17 @@ void do_user_addr_fault(struct pt_regs *regs,
 
 	vma = find_vma(mm, address);
 	if (unlikely(!vma)) {
-		bad_area(regs, hw_error_code, address, NULL);
+		bad_area(regs, hw_error_code, address, NULL, range);
 		return;
 	}
 	if (likely(vma->vm_start <= address))
 		goto good_area;
 	if (unlikely(!(vma->vm_flags & VM_GROWSDOWN))) {
-		bad_area(regs, hw_error_code, address, NULL);
+		bad_area(regs, hw_error_code, address, NULL, range);
 		return;
 	}
 	if (unlikely(expand_stack(vma, address))) {
-		bad_area(regs, hw_error_code, address, NULL);
+		bad_area(regs, hw_error_code, address, NULL, range);
 		return;
 	}
 
@@ -1415,7 +1419,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 	 */
 good_area:
 	if (unlikely(access_error(hw_error_code, vma))) {
-		bad_area(regs, hw_error_code, address, vma);
+		bad_area(regs, hw_error_code, address, vma, range);
 		return;
 	}
 
@@ -1432,7 +1436,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 	 * userland). The return to userland is identified whenever
 	 * FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in flags.
 	 */
-	fault = handle_mm_fault(vma, address, flags);
+	fault = handle_mm_fault_range(vma, address, flags, range);
 	major |= fault & VM_FAULT_MAJOR;
 
 	/*
@@ -1458,7 +1462,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 		return;
 	}
 
-	mm_read_unlock(mm);
+	mm_read_range_unlock(mm, range);
 	if (unlikely(fault & VM_FAULT_ERROR)) {
 		mm_fault_error(regs, hw_error_code, address, fault);
 		return;
-- 
2.25.0.341.g760bfbb309-goog



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 13/24] mm/memory: add prepare_mm_fault() function
  2020-02-24 20:30 [RFC PATCH 00/24] Fine grained MM locking Michel Lespinasse
                   ` (11 preceding siblings ...)
  2020-02-24 20:30 ` [RFC PATCH 12/24] x86 fault handler: use an explicit MM lock range Michel Lespinasse
@ 2020-02-24 20:30 ` Michel Lespinasse
  2020-02-24 20:30 ` [RFC PATCH 14/24] mm/swap_state: disable swap vma readahead Michel Lespinasse
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Michel Lespinasse @ 2020-02-24 20:30 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Morton, Laurent Dufour, Vlastimil Babka,
	Matthew Wilcox, Liam R . Howlett, Jerome Glisse, Davidlohr Bueso,
	David Rientjes
  Cc: linux-mm, Michel Lespinasse

Add a prepare_mm_fault() function, which may allocate an anon_vma if
required for the incoming fault.

This is because the anon_vma must be allocated in the vma of record,
while in the range locked case, the fault will operate on a pseudo-vma.
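
Sketch of the intended ordering (illustrative only; the actual caller is
added in a later patch of this series):

	fault = prepare_mm_fault(vma, flags);	/* on the vma of record */
	if (fault)
		return fault;			/* typically VM_FAULT_OOM */

	pvma = *vma;				/* copy attributes into a pseudo-vma */
	vma = &pvma;				/* the fault itself uses the copy */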

Signed-off-by: Michel Lespinasse <walken@google.com>
---
 include/linux/mm.h | 14 ++++++++++++++
 mm/memory.c        | 26 ++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git include/linux/mm.h include/linux/mm.h
index 1b6b022064b4..43b7121ae005 100644
--- include/linux/mm.h
+++ include/linux/mm.h
@@ -1460,6 +1460,15 @@ int generic_error_remove_page(struct address_space *mapping, struct page *page);
 int invalidate_inode_page(struct page *page);
 
 #ifdef CONFIG_MMU
+extern vm_fault_t __prepare_mm_fault(struct vm_area_struct *vma,
+		unsigned int flags);
+static inline vm_fault_t prepare_mm_fault(struct vm_area_struct *vma,
+		unsigned int flags)
+{
+	if (likely(vma->anon_vma))
+		return 0;
+	return __prepare_mm_fault(vma, flags);
+}
 extern vm_fault_t handle_mm_fault_range(struct vm_area_struct *vma,
 			unsigned long address, unsigned int flags,
 			struct mm_lock_range *range);
@@ -1477,6 +1486,11 @@ void unmap_mapping_pages(struct address_space *mapping,
 void unmap_mapping_range(struct address_space *mapping,
 		loff_t const holebegin, loff_t const holelen, int even_cows);
 #else
+static inline vm_fault_t prepare_mm_fault(struct vm_area_struct *vma,
+		unsigned int flags)
+{
+	return 0;
+}
 static inline vm_fault_t handle_mm_fault(struct vm_area_struct *vma,
 		unsigned long address, unsigned int flags)
 {
diff --git mm/memory.c mm/memory.c
index 3da4ae504957..9d0b761833fe 100644
--- mm/memory.c
+++ mm/memory.c
@@ -4129,6 +4129,32 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 	return handle_pte_fault(&vmf);
 }
 
+vm_fault_t __prepare_mm_fault(struct vm_area_struct *vma, unsigned int flags)
+{
+	vm_fault_t ret = 0;
+
+	if (vma_is_anonymous(vma) ||
+	    ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) ||
+	    (is_vm_hugetlb_page(vma) && !(vma->vm_flags & VM_MAYSHARE))) {
+		if (flags & FAULT_FLAG_USER)
+			mem_cgroup_enter_user_fault();
+		if (unlikely(__anon_vma_prepare(vma)))
+			ret = VM_FAULT_OOM;
+		if (flags & FAULT_FLAG_USER) {
+			mem_cgroup_exit_user_fault();
+			/*
+			 * The task may have entered a memcg OOM situation but
+			 * if the allocation error was handled gracefully (no
+			 * VM_FAULT_OOM), there is no need to kill anything.
+			 * Just clean up the OOM state peacefully.
+			 */
+			if (task_in_memcg_oom(current) && !(ret & VM_FAULT_OOM))
+				mem_cgroup_oom_synchronize(false);
+		}
+	}
+	return ret;
+}
+
 /*
  * By the time we get here, we already hold the mm semaphore
  *
-- 
2.25.0.341.g760bfbb309-goog



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 14/24] mm/swap_state: disable swap vma readahead
  2020-02-24 20:30 [RFC PATCH 00/24] Fine grained MM locking Michel Lespinasse
                   ` (12 preceding siblings ...)
  2020-02-24 20:30 ` [RFC PATCH 13/24] mm/memory: add prepare_mm_fault() function Michel Lespinasse
@ 2020-02-24 20:30 ` Michel Lespinasse
  2020-02-24 20:30 ` [RFC PATCH 15/24] x86 fault handler: use a pseudo-vma when operating on anonymous vmas Michel Lespinasse
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Michel Lespinasse @ 2020-02-24 20:30 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Morton, Laurent Dufour, Vlastimil Babka,
	Matthew Wilcox, Liam R . Howlett, Jerome Glisse, Davidlohr Bueso,
	David Rientjes
  Cc: linux-mm, Michel Lespinasse

This change disables swap vma readahead. This is because swap_ra_info()
updates vma->swap_readahead_info, which is not feasible when operating
on pseudo-vmas.

This is a crude temporary solution. It may be possible to use a per-mm
swap_readahead_info instead, or if not, to explicitly fetch the vma of
record when updating the swap readahead statistics.

Signed-off-by: Michel Lespinasse <walken@google.com>
---
 mm/swap_state.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git mm/swap_state.c mm/swap_state.c
index 8e7ce9a9bc5e..c9cdfd9c785e 100644
--- mm/swap_state.c
+++ mm/swap_state.c
@@ -298,6 +298,12 @@ void free_pages_and_swap_cache(struct page **pages, int nr)
 
 static inline bool swap_use_vma_readahead(void)
 {
+	/*
+	 * vma readahead overwrites vma->swap_readahead_info,
+	 * which requires some form of vma locking...
+	 */
+	return false;
+
 	return READ_ONCE(enable_vma_readahead) && !atomic_read(&nr_rotate_swap);
 }
 
-- 
2.25.0.341.g760bfbb309-goog



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 15/24] x86 fault handler: use a pseudo-vma when operating on anonymous vmas.
  2020-02-24 20:30 [RFC PATCH 00/24] Fine grained MM locking Michel Lespinasse
                   ` (13 preceding siblings ...)
  2020-02-24 20:30 ` [RFC PATCH 14/24] mm/swap_state: disable swap vma readahead Michel Lespinasse
@ 2020-02-24 20:30 ` Michel Lespinasse
  2020-02-24 20:30 ` [RFC PATCH 16/24] MM locking API: add vma locking API Michel Lespinasse
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Michel Lespinasse @ 2020-02-24 20:30 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Morton, Laurent Dufour, Vlastimil Babka,
	Matthew Wilcox, Liam R . Howlett, Jerome Glisse, Davidlohr Bueso,
	David Rientjes
  Cc: linux-mm, Michel Lespinasse

Update the fault handler to use a pseudo-vma when the original vma is
anonymous. This is in preparation to handling such faults with a fine
grained range lock in a later change.

Signed-off-by: Michel Lespinasse <walken@google.com>
---
 arch/x86/mm/fault.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git arch/x86/mm/fault.c arch/x86/mm/fault.c
index 700da3cc3db9..52333272e14e 100644
--- arch/x86/mm/fault.c
+++ arch/x86/mm/fault.c
@@ -1279,7 +1279,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 			unsigned long address)
 {
 	struct mm_lock_range *range;
-	struct vm_area_struct *vma;
+	struct vm_area_struct pvma, *vma;
 	struct task_struct *tsk;
 	struct mm_struct *mm;
 	vm_fault_t fault, major = 0;
@@ -1423,6 +1423,23 @@ void do_user_addr_fault(struct pt_regs *regs,
 		return;
 	}
 
+	if (vma_is_anonymous(vma)) {
+		/*
+		 * Allocate anon_vma if needed.
+		 * This needs to operate on the vma of record.
+		 */
+		fault = prepare_mm_fault(vma, flags);
+		if (fault)
+			goto got_fault;
+
+		/*
+		 * Copy vma attributes into a pseudo-vma.
+		 * This will be required when using fine grained locks.
+		 */
+		pvma = *vma;
+		vma = &pvma;
+	}
+
 	/*
 	 * If for any reason at all we couldn't handle the fault,
 	 * make sure we exit gracefully rather than endlessly redo
@@ -1437,6 +1454,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 	 * FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in flags.
 	 */
 	fault = handle_mm_fault_range(vma, address, flags, range);
+got_fault:
 	major |= fault & VM_FAULT_MAJOR;
 
 	/*
-- 
2.25.0.341.g760bfbb309-goog



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 16/24] MM locking API: add vma locking API
  2020-02-24 20:30 [RFC PATCH 00/24] Fine grained MM locking Michel Lespinasse
                   ` (14 preceding siblings ...)
  2020-02-24 20:30 ` [RFC PATCH 15/24] x86 fault handler: use a pseudo-vma when operating on anonymous vmas Michel Lespinasse
@ 2020-02-24 20:30 ` Michel Lespinasse
  2020-02-24 20:30 ` [RFC PATCH 17/24] x86 fault handler: implement range locking Michel Lespinasse
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Michel Lespinasse @ 2020-02-24 20:30 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Morton, Laurent Dufour, Vlastimil Babka,
	Matthew Wilcox, Liam R . Howlett, Jerome Glisse, Davidlohr Bueso,
	David Rientjes
  Cc: linux-mm, Michel Lespinasse

This change adds the mm_vma_lock() and mm_vma_unlock() functions,
which are to be used to protect per-mm global structures (such as the
vma rbtree) when writers only hold a range lock.

The functions are no-ops when CONFIG_MM_LOCK_RANGE is not enabled,
as mmap_sem already protects such structures in that case.
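
Sketch of the expected usage (illustrative only): hold the lock just long
enough for an O(log N) lookup, and copy the result into a pseudo-vma before
dropping it:

	mm_vma_lock(mm);
	vma = find_vma(mm, address);	/* O(log N) rbtree walk */
	if (vma)
		pvma = *vma;		/* snapshot into a pseudo-vma */
	mm_vma_unlock(mm);
	/* past this point, only the pseudo-vma copy may be used */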

Signed-off-by: Michel Lespinasse <walken@google.com>
---
 include/linux/mm_lock.h  | 24 ++++++++++++++++++++----
 include/linux/mm_types.h |  2 ++
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git include/linux/mm_lock.h include/linux/mm_lock.h
index a4d60bd56899..ebcc46bba211 100644
--- include/linux/mm_lock.h
+++ include/linux/mm_lock.h
@@ -14,6 +14,9 @@ static inline void mm_init_lock(struct mm_struct *mm)
        init_rwsem(&mm->mmap_sem);
 }
 
+static inline void mm_vma_lock(struct mm_struct *mm) {}
+static inline void mm_vma_unlock(struct mm_struct *mm) {}
+
 static inline void mm_init_coarse_lock_range(struct mm_lock_range *range) {}
 static inline void mm_init_lock_range(struct mm_lock_range *range,
 		unsigned long start, unsigned long end) {}
@@ -107,6 +110,9 @@ static inline void mm_init_lock(struct mm_struct *mm)
        init_rwsem(&mm->mmap_sem);
 }
 
+static inline void mm_vma_lock(struct mm_struct *mm) {}
+static inline void mm_vma_unlock(struct mm_struct *mm) {}
+
 static inline void mm_init_coarse_lock_range(struct mm_lock_range *range)
 {
 	range->mm = NULL;
@@ -131,10 +137,11 @@ static inline bool mm_range_is_coarse(struct mm_lock_range *range)
 #define __DEP_MAP_MM_LOCK_INITIALIZER(lockname)
 #endif
 
-#define MM_LOCK_INITIALIZER(name) {			\
-	.mutex = __MUTEX_INITIALIZER(name.mutex),	\
-	.rb_root = RB_ROOT,				\
-	__DEP_MAP_MM_LOCK_INITIALIZER(name)		\
+#define MM_LOCK_INITIALIZER(name) {				\
+	.mutex = __MUTEX_INITIALIZER(name.mutex),		\
+	.rb_root = RB_ROOT,					\
+	.vma_mutex = __MUTEX_INITIALIZER(name.vma_mutex),	\
+	__DEP_MAP_MM_LOCK_INITIALIZER(name)			\
 }
 
 #define MM_COARSE_LOCK_RANGE_INITIALIZER {		\
@@ -148,9 +155,18 @@ static inline void mm_init_lock(struct mm_struct *mm)
 
 	mutex_init(&mm->mmap_sem.mutex);
 	mm->mmap_sem.rb_root = RB_ROOT;
+	mutex_init(&mm->mmap_sem.vma_mutex);
 	lockdep_init_map(&mm->mmap_sem.dep_map, "&mm->mmap_sem", &__key, 0);
 }
 
+static inline void mm_vma_lock(struct mm_struct *mm) {
+	mutex_lock(&mm->mmap_sem.vma_mutex);
+}
+
+static inline void mm_vma_unlock(struct mm_struct *mm) {
+	mutex_unlock(&mm->mmap_sem.vma_mutex);
+}
+
 static inline void mm_init_lock_range(struct mm_lock_range *range,
 		unsigned long start, unsigned long end) {
 	range->start = start;
diff --git include/linux/mm_types.h include/linux/mm_types.h
index 941610c906b3..c40341d851cb 100644
--- include/linux/mm_types.h
+++ include/linux/mm_types.h
@@ -292,9 +292,11 @@ struct mm_lock {
 	struct mutex mutex;
 	struct rb_root rb_root;
 	unsigned long seq;
+	struct mutex vma_mutex;
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	struct lockdep_map dep_map;
 #endif
+
 };
 #endif
 
-- 
2.25.0.341.g760bfbb309-goog



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 17/24] x86 fault handler: implement range locking
  2020-02-24 20:30 [RFC PATCH 00/24] Fine grained MM locking Michel Lespinasse
                   ` (15 preceding siblings ...)
  2020-02-24 20:30 ` [RFC PATCH 16/24] MM locking API: add vma locking API Michel Lespinasse
@ 2020-02-24 20:30 ` Michel Lespinasse
  2020-02-24 20:30 ` [RFC PATCH 18/24] shared file mappings: use the vmf->range field when dropping mmap_sem Michel Lespinasse
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Michel Lespinasse @ 2020-02-24 20:30 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Morton, Laurent Dufour, Vlastimil Babka,
	Matthew Wilcox, Liam R . Howlett, Jerome Glisse, Davidlohr Bueso,
	David Rientjes
  Cc: linux-mm, Michel Lespinasse

Change the x86 fault handler to implement range locking.

Initially we try to lock a pmd sized range around the faulting address,
which is appropriate for anon vmas. After finding the correct vma for
the faulting address, we verify that it is anonymous and fall back to
a coarse grained lock if necessary. If a fine grained lock is workable,
we copy the vma of record into a pseudo-vma and release the mm_vma_lock
before handling the fault.
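
Simplified sketch of the resulting flow (retries and error paths omitted;
see the diff for the exact code):

	mm_init_lock_range(&pmd_range, address & PMD_MASK,
			   (address & PMD_MASK) + PMD_SIZE);
	mm_read_range_lock(mm, &pmd_range);

	mm_vma_lock(mm);
	vma = find_vma(mm, address);
	fault = prepare_mm_fault(vma, flags);	/* anon_vma on the vma of record */
	pvma = *vma;				/* pseudo-vma copy of the attributes */
	mm_vma_unlock(mm);

	if (!fault && vma_is_anonymous(&pvma))
		fault = handle_mm_fault_range(&pvma, address, flags, &pmd_range);
	/* non-anonymous vmas fall back to mm_coarse_lock_range() and retry */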

Signed-off-by: Michel Lespinasse <walken@google.com>
---
 arch/x86/mm/fault.c | 40 ++++++++++++++++++++++++++++++++--------
 1 file changed, 32 insertions(+), 8 deletions(-)

diff --git arch/x86/mm/fault.c arch/x86/mm/fault.c
index 52333272e14e..1e37284d373c 100644
--- arch/x86/mm/fault.c
+++ arch/x86/mm/fault.c
@@ -941,6 +941,7 @@ bad_area(struct pt_regs *regs, unsigned long error_code,
 	 unsigned long address, struct vm_area_struct *vma,
 	 struct mm_lock_range *range)
 {
+	struct mm_struct *mm;
 	u32 pkey = 0;
 	int si_code = SEGV_MAPERR;
 
@@ -984,7 +985,10 @@ bad_area(struct pt_regs *regs, unsigned long error_code,
 	 * Something tried to access memory that isn't in our memory map..
 	 * Fix it, but check if it's kernel or user first..
 	 */
-	mm_read_range_unlock(current->mm, range);
+	mm = current->mm;
+	if (!mm_range_is_coarse(range))
+		mm_vma_unlock(mm);
+	mm_read_range_unlock(mm, range);
 
 	__bad_area_nosemaphore(regs, error_code, address, pkey, si_code);
 }
@@ -1278,7 +1282,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 			unsigned long hw_error_code,
 			unsigned long address)
 {
-	struct mm_lock_range *range;
+	struct mm_lock_range pmd_range, *range;
 	struct vm_area_struct pvma, *vma;
 	struct task_struct *tsk;
 	struct mm_struct *mm;
@@ -1363,7 +1367,10 @@ void do_user_addr_fault(struct pt_regs *regs,
 	}
 #endif
 
-	range = mm_coarse_lock_range();
+	mm_init_lock_range(&pmd_range,
+			   address & PMD_MASK,
+			   (address & PMD_MASK) + PMD_SIZE);
+	range = &pmd_range;
 
 	/*
 	 * Kernel-mode access to the user address space should only occur
@@ -1397,6 +1404,8 @@ void do_user_addr_fault(struct pt_regs *regs,
 		might_sleep();
 	}
 
+	if (!mm_range_is_coarse(range))
+		mm_vma_lock(mm);
 	vma = find_vma(mm, address);
 	if (unlikely(!vma)) {
 		bad_area(regs, hw_error_code, address, NULL, range);
@@ -1408,6 +1417,10 @@ void do_user_addr_fault(struct pt_regs *regs,
 		bad_area(regs, hw_error_code, address, NULL, range);
 		return;
 	}
+	/*
+	 * Note that if range is fine grained, we can still safely call
+	 * expand_stack() as we are protected by mm_vma_lock().
+	 */
 	if (unlikely(expand_stack(vma, address))) {
 		bad_area(regs, hw_error_code, address, NULL, range);
 		return;
@@ -1423,23 +1436,34 @@ void do_user_addr_fault(struct pt_regs *regs,
 		return;
 	}
 
-	if (vma_is_anonymous(vma)) {
+	if (!mm_range_is_coarse(range)) {
 		/*
 		 * Allocate anon_vma if needed.
 		 * This needs to operate on the vma of record.
 		 */
 		fault = prepare_mm_fault(vma, flags);
-		if (fault)
-			goto got_fault;
 
 		/*
 		 * Copy vma attributes into a pseudo-vma.
-		 * This will be required when using fine grained locks.
+		 * The vma of record is only valid until mm_vma_unlock().
 		 */
 		pvma = *vma;
 		vma = &pvma;
-	}
+		mm_vma_unlock(mm);
 
+		if (fault)
+			goto got_fault;
+
+		/*
+		 * Fall back to locking the entire MM
+		 * when operating on file vma.
+		 */
+		if (!vma_is_anonymous(vma)) {
+			mm_read_range_unlock(mm, range);
+			range = mm_coarse_lock_range();
+			goto retry;
+		}
+	}
 	/*
 	 * If for any reason at all we couldn't handle the fault,
 	 * make sure we exit gracefully rather than endlessly redo
-- 
2.25.0.341.g760bfbb309-goog



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 18/24] shared file mappings: use the vmf->range field when dropping mmap_sem
  2020-02-24 20:30 [RFC PATCH 00/24] Fine grained MM locking Michel Lespinasse
                   ` (16 preceding siblings ...)
  2020-02-24 20:30 ` [RFC PATCH 17/24] x86 fault handler: implement range locking Michel Lespinasse
@ 2020-02-24 20:30 ` Michel Lespinasse
  2020-02-24 20:30 ` [RFC PATCH 19/24] mm: add field to annotate vm_operations that support range locking Michel Lespinasse
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Michel Lespinasse @ 2020-02-24 20:30 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Morton, Laurent Dufour, Vlastimil Babka,
	Matthew Wilcox, Liam R . Howlett, Jerome Glisse, Davidlohr Bueso,
	David Rientjes
  Cc: linux-mm, Michel Lespinasse

Modify lock_page_maybe_drop_mmap() and maybe_unlock_mmap_for_io()
to use the vmf->range field when dropping mmap_sem.

This covers dropping mmap_sem during:
- filemap_fault()
- shmem_fault()
- do_fault() write to shared file mapping
  [ through do_shared_fault and fault_dirty_shared_page() ]
- do_wp_page() write to shared file mapping
  [ through wp_page_shared() and fault_dirty_shared_page() ]

Signed-off-by: Michel Lespinasse <walken@google.com>
---
 mm/filemap.c  | 3 ++-
 mm/internal.h | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git mm/filemap.c mm/filemap.c
index 3afb5a3f0b9c..7827de7b356c 100644
--- mm/filemap.c
+++ mm/filemap.c
@@ -2364,7 +2364,8 @@ static int lock_page_maybe_drop_mmap(struct vm_fault *vmf, struct page *page,
 			 * mmap_sem here and return 0 if we don't have a fpin.
 			 */
 			if (*fpin == NULL)
-				mm_read_unlock(vmf->vma->vm_mm);
+				mm_read_range_unlock(vmf->vma->vm_mm,
+						     vmf->range);
 			return 0;
 		}
 	} else
diff --git mm/internal.h mm/internal.h
index 22f361a1e284..9bfff428c5da 100644
--- mm/internal.h
+++ mm/internal.h
@@ -382,7 +382,7 @@ static inline struct file *maybe_unlock_mmap_for_io(struct vm_fault *vmf,
 	if ((flags & (FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT)) ==
 	    FAULT_FLAG_ALLOW_RETRY) {
 		fpin = get_file(vmf->vma->vm_file);
-		mm_read_unlock(vmf->vma->vm_mm);
+		mm_read_range_unlock(vmf->vma->vm_mm, vmf->range);
 	}
 	return fpin;
 }
-- 
2.25.0.341.g760bfbb309-goog



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 19/24] mm: add field to annotate vm_operations that support range locking
  2020-02-24 20:30 [RFC PATCH 00/24] Fine grained MM locking Michel Lespinasse
                   ` (17 preceding siblings ...)
  2020-02-24 20:30 ` [RFC PATCH 18/24] shared file mappings: use the vmf->range field when dropping mmap_sem Michel Lespinasse
@ 2020-02-24 20:30 ` Michel Lespinasse
  2020-02-24 20:30 ` [RFC PATCH 20/24] x86 fault handler: extend range locking to supported file vmas Michel Lespinasse
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Michel Lespinasse @ 2020-02-24 20:30 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Morton, Laurent Dufour, Vlastimil Babka,
	Matthew Wilcox, Liam R . Howlett, Jerome Glisse, Davidlohr Bueso,
	David Rientjes
  Cc: linux-mm, Michel Lespinasse

Add a fine_grained field to struct vm_operations_struct,
and set it in the filesystems we have converted to support range locking.
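
For illustration, a sketch of both sides of the new field (example_vm_ops is
hypothetical; the real filesystem hooks are in the diff below, and the fault
handler check is added in a later patch):

	static const struct vm_operations_struct example_vm_ops = {
		.fault		= filemap_fault,
		.map_pages	= filemap_map_pages,
		.fine_grained	= true,		/* opt in to fine grained MM locking */
	};

	/* consumer side, in the fault handler: */
	if (!vma_is_anonymous(vma) && !vma->vm_ops->fine_grained)
		/* fall back to a coarse MM lock */;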

Signed-off-by: Michel Lespinasse <walken@google.com>
---
 fs/ext4/file.c     |  1 +
 include/linux/mm.h | 16 ++++++++++++++++
 mm/filemap.c       |  1 +
 mm/shmem.c         |  1 +
 4 files changed, 19 insertions(+)

diff --git fs/ext4/file.c fs/ext4/file.c
index 6a7293a5cda2..8167fc7cc6ca 100644
--- fs/ext4/file.c
+++ fs/ext4/file.c
@@ -626,6 +626,7 @@ static const struct vm_operations_struct ext4_file_vm_ops = {
 	.fault		= ext4_filemap_fault,
 	.map_pages	= filemap_map_pages,
 	.page_mkwrite   = ext4_page_mkwrite,
+	.fine_grained	= true,
 };
 
 static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
diff --git include/linux/mm.h include/linux/mm.h
index 43b7121ae005..28b6af200214 100644
--- include/linux/mm.h
+++ include/linux/mm.h
@@ -526,6 +526,22 @@ struct vm_operations_struct {
 	 */
 	struct page *(*find_special_page)(struct vm_area_struct *vma,
 					  unsigned long addr);
+
+	/*
+	 * fine_grained indicates that the vm_operations support
+	 * fine grained mm locking.
+	 * - The methods may be called with a fine grained range lock
+	 *   covering a PMD sized region around the fault address;
+	 * - The range lock does not protect against concurrent access
+	 *   to per-mm structures, so an appropriate lock must be used
+	 *   for such cases
+	 *   (such as mm_vma_lock() for accessing the vma rbtree);
+	 * - if dropping mmap_sem, the vmf->range must be used
+	 *   to release the specific locked range only;
+	 * - vmf->vma only holds a copy of the original vma.
+	 *   Any persistent vma updates must first look up the actual vma.
+	 */
+	bool fine_grained;
 };
 
 static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
diff --git mm/filemap.c mm/filemap.c
index 7827de7b356c..c9f95ca5737c 100644
--- mm/filemap.c
+++ mm/filemap.c
@@ -2699,6 +2699,7 @@ const struct vm_operations_struct generic_file_vm_ops = {
 	.fault		= filemap_fault,
 	.map_pages	= filemap_map_pages,
 	.page_mkwrite	= filemap_page_mkwrite,
+	.fine_grained	= true,
 };
 
 /* This is used for a general mmap of a disk file */
diff --git mm/shmem.c mm/shmem.c
index 8793e8cc1a48..32ec4ad05df5 100644
--- mm/shmem.c
+++ mm/shmem.c
@@ -3865,6 +3865,7 @@ static const struct vm_operations_struct shmem_vm_ops = {
 	.set_policy     = shmem_set_policy,
 	.get_policy     = shmem_get_policy,
 #endif
+	.fine_grained	= true,
 };
 
 int shmem_init_fs_context(struct fs_context *fc)
-- 
2.25.0.341.g760bfbb309-goog



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 20/24] x86 fault handler: extend range locking to supported file vmas
  2020-02-24 20:30 [RFC PATCH 00/24] Fine grained MM locking Michel Lespinasse
                   ` (18 preceding siblings ...)
  2020-02-24 20:30 ` [RFC PATCH 19/24] mm: add field to annotate vm_operations that support range locking Michel Lespinasse
@ 2020-02-24 20:30 ` Michel Lespinasse
  2020-02-24 20:30 ` [RFC PATCH 21/24] do_mmap: add locked argument Michel Lespinasse
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Michel Lespinasse @ 2020-02-24 20:30 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Morton, Laurent Dufour, Vlastimil Babka,
	Matthew Wilcox, Liam R . Howlett, Jerome Glisse, Davidlohr Bueso,
	David Rientjes
  Cc: linux-mm, Michel Lespinasse

Change the fault handler to use a fine grained range lock when operating
on any of the explicitly supported file types.

Signed-off-by: Michel Lespinasse <walken@google.com>
---
 arch/x86/mm/fault.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git arch/x86/mm/fault.c arch/x86/mm/fault.c
index 1e37284d373c..ca30952896e1 100644
--- arch/x86/mm/fault.c
+++ arch/x86/mm/fault.c
@@ -1456,9 +1456,9 @@ void do_user_addr_fault(struct pt_regs *regs,
 
 		/*
 		 * Fall back to locking the entire MM
-		 * when operating on file vma.
+		 * when the vm_ops do not support fine grained range locking.
 		 */
-		if (!vma_is_anonymous(vma)) {
+		if (!vma_is_anonymous(vma) && !vma->vm_ops->fine_grained) {
 			mm_read_range_unlock(mm, range);
 			range = mm_coarse_lock_range();
 			goto retry;
-- 
2.25.0.341.g760bfbb309-goog



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 21/24] do_mmap: add locked argument
  2020-02-24 20:30 [RFC PATCH 00/24] Fine grained MM locking Michel Lespinasse
                   ` (19 preceding siblings ...)
  2020-02-24 20:30 ` [RFC PATCH 20/24] x86 fault handler: extend range locking to supported file vmas Michel Lespinasse
@ 2020-02-24 20:30 ` Michel Lespinasse
  2020-02-24 20:30 ` [RFC PATCH 22/24] do_mmap: implement " Michel Lespinasse
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Michel Lespinasse @ 2020-02-24 20:30 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Morton, Laurent Dufour, Vlastimil Babka,
	Matthew Wilcox, Liam R . Howlett, Jerome Glisse, Davidlohr Bueso,
	David Rientjes
  Cc: linux-mm, Michel Lespinasse

Change the do_mmap() prototype to add a "locked" boolean argument.
For now all call sites set it to true.

Also remove the do_mmap_pgoff() API, which was just wrapping do_mmap()
with a forced vm_flags == 0 argument.

Signed-off-by: Michel Lespinasse <walken@google.com>
---
 arch/x86/mm/mpx.c  |  3 ++-
 fs/aio.c           |  6 +++---
 include/linux/mm.h | 13 ++-----------
 ipc/shm.c          |  3 ++-
 mm/mmap.c          |  8 ++++----
 mm/nommu.c         |  1 +
 mm/util.c          |  4 ++--
 7 files changed, 16 insertions(+), 22 deletions(-)

diff --git arch/x86/mm/mpx.c arch/x86/mm/mpx.c
index 3835c18020b8..f83cdf80f210 100644
--- arch/x86/mm/mpx.c
+++ arch/x86/mm/mpx.c
@@ -54,7 +54,8 @@ static unsigned long mpx_mmap(unsigned long len)
 
 	mm_write_lock(mm);
 	addr = do_mmap(NULL, 0, len, PROT_READ | PROT_WRITE,
-		       MAP_ANONYMOUS | MAP_PRIVATE, VM_MPX, 0, &populate, NULL);
+			MAP_ANONYMOUS | MAP_PRIVATE, VM_MPX, 0,
+			true, &populate, NULL);
 	mm_write_unlock(mm);
 	if (populate)
 		mm_populate(addr, populate);
diff --git fs/aio.c fs/aio.c
index 704766588df4..018bd24d6204 100644
--- fs/aio.c
+++ fs/aio.c
@@ -525,9 +525,9 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
 		return -EINTR;
 	}
 
-	ctx->mmap_base = do_mmap_pgoff(ctx->aio_ring_file, 0, ctx->mmap_size,
-				       PROT_READ | PROT_WRITE,
-				       MAP_SHARED, 0, &unused, NULL);
+	ctx->mmap_base = do_mmap(ctx->aio_ring_file, 0, ctx->mmap_size,
+				PROT_READ | PROT_WRITE,
+				MAP_SHARED, 0, 0, true, &unused, NULL);
 	mm_write_unlock(mm);
 	if (IS_ERR((void *)ctx->mmap_base)) {
 		ctx->mmap_size = 0;
diff --git include/linux/mm.h include/linux/mm.h
index 28b6af200214..8427e1d07b59 100644
--- include/linux/mm.h
+++ include/linux/mm.h
@@ -2361,22 +2361,13 @@ extern unsigned long mmap_region(struct file *file, unsigned long addr,
 	struct list_head *uf);
 extern unsigned long do_mmap(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot, unsigned long flags,
-	vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate,
-	struct list_head *uf);
+	vm_flags_t vm_flags, unsigned long pgoff, bool locked,
+	unsigned long *populate, struct list_head *uf);
 extern int __do_munmap(struct mm_struct *, unsigned long, size_t,
 		       struct list_head *uf, bool downgrade);
 extern int do_munmap(struct mm_struct *, unsigned long, size_t,
 		     struct list_head *uf);
 
-static inline unsigned long
-do_mmap_pgoff(struct file *file, unsigned long addr,
-	unsigned long len, unsigned long prot, unsigned long flags,
-	unsigned long pgoff, unsigned long *populate,
-	struct list_head *uf)
-{
-	return do_mmap(file, addr, len, prot, flags, 0, pgoff, populate, uf);
-}
-
 #ifdef CONFIG_MMU
 extern int __mm_populate(unsigned long addr, unsigned long len,
 			 int ignore_errors);
diff --git ipc/shm.c ipc/shm.c
index c04fc21cbe46..90d24c4960b9 100644
--- ipc/shm.c
+++ ipc/shm.c
@@ -1558,7 +1558,8 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
 			goto invalid;
 	}
 
-	addr = do_mmap_pgoff(file, addr, size, prot, flags, 0, &populate, NULL);
+	addr = do_mmap(file, addr, size, prot, flags, 0, 0,
+			true, &populate, NULL);
 	*raddr = addr;
 	err = 0;
 	if (IS_ERR_VALUE(addr))
diff --git mm/mmap.c mm/mmap.c
index 0f95300c2788..2868e61927a1 100644
--- mm/mmap.c
+++ mm/mmap.c
@@ -1369,8 +1369,8 @@ static inline bool file_mmap_ok(struct file *file, struct inode *inode,
 unsigned long do_mmap(struct file *file, unsigned long addr,
 			unsigned long len, unsigned long prot,
 			unsigned long flags, vm_flags_t vm_flags,
-			unsigned long pgoff, unsigned long *populate,
-			struct list_head *uf)
+			unsigned long pgoff, bool locked,
+			unsigned long *populate, struct list_head *uf)
 {
 	struct mm_struct *mm = current->mm;
 	int pkey = 0;
@@ -2954,8 +2954,8 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 	}
 
 	file = get_file(vma->vm_file);
-	ret = do_mmap_pgoff(vma->vm_file, start, size,
-			prot, flags, pgoff, &populate, NULL);
+	ret = do_mmap(vma->vm_file, start, size,
+			prot, flags, 0, pgoff, true, &populate, NULL);
 	fput(file);
 out:
 	mm_write_unlock(mm);
diff --git mm/nommu.c mm/nommu.c
index c137db1923bd..a2c2bf8d7676 100644
--- mm/nommu.c
+++ mm/nommu.c
@@ -1102,6 +1102,7 @@ unsigned long do_mmap(struct file *file,
 			unsigned long prot,
 			unsigned long flags,
 			vm_flags_t vm_flags,
 			unsigned long pgoff,
+			bool locked,
 			unsigned long *populate,
 			struct list_head *uf)
diff --git mm/util.c mm/util.c
index 511e442e7329..337b006aef6d 100644
--- mm/util.c
+++ mm/util.c
@@ -503,8 +503,8 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
 	if (!ret) {
 		if (mm_write_lock_killable(mm))
 			return -EINTR;
-		ret = do_mmap_pgoff(file, addr, len, prot, flag, pgoff,
-				    &populate, &uf);
+		ret = do_mmap(file, addr, len, prot, flag, 0, pgoff,
+				true, &populate, &uf);
 		mm_write_unlock(mm);
 		userfaultfd_unmap_complete(mm, &uf);
 		if (populate)
-- 
2.25.0.341.g760bfbb309-goog



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 22/24] do_mmap: implement locked argument
  2020-02-24 20:30 [RFC PATCH 00/24] Fine grained MM locking Michel Lespinasse
                   ` (20 preceding siblings ...)
  2020-02-24 20:30 ` [RFC PATCH 21/24] do_mmap: add locked argument Michel Lespinasse
@ 2020-02-24 20:30 ` Michel Lespinasse
  2020-02-24 20:30 ` [RFC PATCH 23/24] do_mmap: use locked=false in vm_mmap_pgoff() and aio_setup_ring() Michel Lespinasse
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: Michel Lespinasse @ 2020-02-24 20:30 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Morton, Laurent Dufour, Vlastimil Babka,
	Matthew Wilcox, Liam R . Howlett, Jerome Glisse, Davidlohr Bueso,
	David Rientjes
  Cc: linux-mm, Michel Lespinasse

When locked is true, preserve the current behavior - the do_mmap()
caller is expected to already hold a coarse write lock on current->mm's
mmap_sem.

When locked is false, change do_mmap() to acquire the appropriate
MM locks itself. do_mmap() still acquires a coarse lock in this change,
but can now be locally changed to acquire a fine grained lock in the future.
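
The two calling conventions, sketched for illustration (arguments
abbreviated, not taken from a real call site):

	/* locked == true: caller already holds the mm write lock */
	mm_write_lock(mm);
	addr = do_mmap(file, addr, len, prot, flags, 0, pgoff,
		       true, &populate, uf);
	mm_write_unlock(mm);

	/* locked == false: do_mmap() acquires and releases the lock itself */
	addr = do_mmap(file, addr, len, prot, flags, 0, pgoff,
		       false, &populate, uf);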

Signed-off-by: Michel Lespinasse <walken@google.com>
---
 mm/mmap.c  | 106 ++++++++++++++++++++++++++++++++++++-----------------
 mm/nommu.c |  19 +++++++++-
 2 files changed, 89 insertions(+), 36 deletions(-)

diff --git mm/mmap.c mm/mmap.c
index 2868e61927a1..75755f1cbd0b 100644
--- mm/mmap.c
+++ mm/mmap.c
@@ -1406,22 +1406,29 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 	if ((pgoff + (len >> PAGE_SHIFT)) < pgoff)
 		return -EOVERFLOW;
 
+	if (!locked && mm_write_lock_killable(mm))
+		return -EINTR;
+
 	/* Too many mappings? */
-	if (mm->map_count > sysctl_max_map_count)
-		return -ENOMEM;
+	if (mm->map_count > sysctl_max_map_count) {
+		addr = -ENOMEM;
+		goto unlock;
+	}
 
 	/* Obtain the address to map to. we verify (or select) it and ensure
 	 * that it represents a valid section of the address space.
 	 */
 	addr = get_unmapped_area(file, addr, len, pgoff, flags);
 	if (IS_ERR_VALUE(addr))
-		return addr;
+		goto unlock;
 
 	if (flags & MAP_FIXED_NOREPLACE) {
 		struct vm_area_struct *vma = find_vma(mm, addr);
 
-		if (vma && vma->vm_start < addr + len)
-			return -EEXIST;
+		if (vma && vma->vm_start < addr + len) {
+			addr = -EEXIST;
+			goto unlock;
+		}
 	}
 
 	if (prot == PROT_EXEC) {
@@ -1437,19 +1444,24 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 	vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) |
 			mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
 
-	if (flags & MAP_LOCKED)
-		if (!can_do_mlock())
-			return -EPERM;
+	if ((flags & MAP_LOCKED) && !can_do_mlock()) {
+		addr = -EPERM;
+		goto unlock;
+	}
 
-	if (mlock_future_check(mm, vm_flags, len))
-		return -EAGAIN;
+	if (mlock_future_check(mm, vm_flags, len)) {
+		addr = -EAGAIN;
+		goto unlock;
+	}
 
 	if (file) {
 		struct inode *inode = file_inode(file);
 		unsigned long flags_mask;
 
-		if (!file_mmap_ok(file, inode, pgoff, len))
-			return -EOVERFLOW;
+		if (!file_mmap_ok(file, inode, pgoff, len)) {
+			addr = -EOVERFLOW;
+			goto unlock;
+		}
 
 		flags_mask = LEGACY_MAP_MASK | file->f_op->mmap_supported_flags;
 
@@ -1465,27 +1477,37 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 			flags &= LEGACY_MAP_MASK;
 			/* fall through */
 		case MAP_SHARED_VALIDATE:
-			if (flags & ~flags_mask)
-				return -EOPNOTSUPP;
+			if (flags & ~flags_mask) {
+				addr = -EOPNOTSUPP;
+				goto unlock;
+			}
 			if (prot & PROT_WRITE) {
-				if (!(file->f_mode & FMODE_WRITE))
-					return -EACCES;
-				if (IS_SWAPFILE(file->f_mapping->host))
-					return -ETXTBSY;
+				if (!(file->f_mode & FMODE_WRITE)) {
+					addr = -EACCES;
+					goto unlock;
+				}
+				if (IS_SWAPFILE(file->f_mapping->host)) {
+					addr = -ETXTBSY;
+					goto unlock;
+				}
 			}
 
 			/*
 			 * Make sure we don't allow writing to an append-only
 			 * file..
 			 */
-			if (IS_APPEND(inode) && (file->f_mode & FMODE_WRITE))
-				return -EACCES;
+			if (IS_APPEND(inode) && (file->f_mode & FMODE_WRITE)) {
+				addr = -EACCES;
+				goto unlock;
+			}
 
 			/*
 			 * Make sure there are no mandatory locks on the file.
 			 */
-			if (locks_verify_locked(file))
-				return -EAGAIN;
+			if (locks_verify_locked(file)) {
+				addr = -EAGAIN;
+				goto unlock;
+			}
 
 			vm_flags |= VM_SHARED | VM_MAYSHARE;
 			if (!(file->f_mode & FMODE_WRITE))
@@ -1493,28 +1515,39 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 
 			/* fall through */
 		case MAP_PRIVATE:
-			if (!(file->f_mode & FMODE_READ))
-				return -EACCES;
+			if (!(file->f_mode & FMODE_READ)) {
+				addr = -EACCES;
+				goto unlock;
+			}
 			if (path_noexec(&file->f_path)) {
-				if (vm_flags & VM_EXEC)
-					return -EPERM;
+				if (vm_flags & VM_EXEC) {
+					addr = -EPERM;
+					goto unlock;
+				}
 				vm_flags &= ~VM_MAYEXEC;
 			}
 
-			if (!file->f_op->mmap)
-				return -ENODEV;
-			if (vm_flags & (VM_GROWSDOWN|VM_GROWSUP))
-				return -EINVAL;
+			if (!file->f_op->mmap) {
+				addr = -ENODEV;
+				goto unlock;
+			}
+			if (vm_flags & (VM_GROWSDOWN|VM_GROWSUP)) {
+				addr = -EINVAL;
+				goto unlock;
+			}
 			break;
 
 		default:
-			return -EINVAL;
+			addr = -EINVAL;
+			goto unlock;
 		}
 	} else {
 		switch (flags & MAP_TYPE) {
 		case MAP_SHARED:
-			if (vm_flags & (VM_GROWSDOWN|VM_GROWSUP))
-				return -EINVAL;
+			if (vm_flags & (VM_GROWSDOWN|VM_GROWSUP)) {
+				addr = -EINVAL;
+				goto unlock;
+			}
 			/*
 			 * Ignore pgoff.
 			 */
@@ -1528,7 +1561,8 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 			pgoff = addr >> PAGE_SHIFT;
 			break;
 		default:
-			return -EINVAL;
+			addr = -EINVAL;
+			goto unlock;
 		}
 	}
 
@@ -1551,6 +1585,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 	    ((vm_flags & VM_LOCKED) ||
 	     (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE))
 		*populate = len;
+
+unlock:
+	if (!locked)
+		mm_write_unlock(mm);
 	return addr;
 }
 
diff --git mm/nommu.c mm/nommu.c
index a2c2bf8d7676..7fb1db89d4f8 100644
--- mm/nommu.c
+++ mm/nommu.c
@@ -1107,6 +1107,7 @@ unsigned long do_mmap(struct file *file,
 			unsigned long *populate,
 			struct list_head *uf)
 {
+	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	struct vm_region *region;
 	struct rb_node *rb;
@@ -1115,12 +1116,18 @@ unsigned long do_mmap(struct file *file,
 
 	*populate = 0;
 
+	if (!locked && mm_write_lock_killable(mm))
+		return -EINTR;
+
 	/* decide whether we should attempt the mapping, and if so what sort of
 	 * mapping */
 	ret = validate_mmap_request(file, addr, len, prot, flags, pgoff,
 				    &capabilities);
-	if (ret < 0)
+	if (ret < 0) {
+		if (!locked)
+			mm_write_unlock(mm);
 		return ret;
+	}
 
 	/* we ignore the address hint */
 	addr = 0;
@@ -1135,7 +1142,7 @@ unsigned long do_mmap(struct file *file,
 	if (!region)
 		goto error_getting_region;
 
-	vma = vm_area_alloc(current->mm);
+	vma = vm_area_alloc(mm);
 	if (!vma)
 		goto error_getting_vma;
 
@@ -1289,6 +1296,8 @@ unsigned long do_mmap(struct file *file,
 	}
 
 	up_write(&nommu_region_sem);
+	if (!locked)
+		mm_write_unlock(mm);
 
 	return result;
 
@@ -1301,6 +1310,8 @@ unsigned long do_mmap(struct file *file,
 	if (vma->vm_file)
 		fput(vma->vm_file);
 	vm_area_free(vma);
+	if (!locked)
+		mm_write_unlock(mm);
 	return ret;
 
 sharing_violation:
@@ -1314,12 +1325,16 @@ unsigned long do_mmap(struct file *file,
 	pr_warn("Allocation of vma for %lu byte allocation from process %d failed\n",
 			len, current->pid);
 	show_free_areas(0, NULL);
+	if (!locked)
+		mm_write_unlock(mm);
 	return -ENOMEM;
 
 error_getting_region:
 	pr_warn("Allocation of vm region for %lu byte allocation from process %d failed\n",
 			len, current->pid);
 	show_free_areas(0, NULL);
+	if (!locked)
+		mm_write_unlock(mm);
 	return -ENOMEM;
 }
 
-- 
2.25.0.341.g760bfbb309-goog



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 23/24] do_mmap: use locked=false in vm_mmap_pgoff() and aio_setup_ring()
  2020-02-24 20:30 [RFC PATCH 00/24] Fine grained MM locking Michel Lespinasse
                   ` (21 preceding siblings ...)
  2020-02-24 20:30 ` [RFC PATCH 22/24] do_mmap: implement " Michel Lespinasse
@ 2020-02-24 20:30 ` Michel Lespinasse
  2020-02-24 20:30 ` [RFC PATCH 24/24] do_mmap: implement easiest cases of fine grained locking Michel Lespinasse
  2022-03-20 22:08 ` [RFC PATCH 00/24] Fine grained MM locking Barry Song
  24 siblings, 0 replies; 28+ messages in thread
From: Michel Lespinasse @ 2020-02-24 20:30 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Morton, Laurent Dufour, Vlastimil Babka,
	Matthew Wilcox, Liam R . Howlett, Jerome Glisse, Davidlohr Bueso,
	David Rientjes
  Cc: linux-mm, Michel Lespinasse

Change vm_mmap_pgoff() and aio_setup_ring() to call do_mmap()
with locked=false.

Moving the mmap_sem acquisition to within do_mmap()
enables it to acquire a fine grained lock in the future.

Signed-off-by: Michel Lespinasse <walken@google.com>
---
 fs/aio.c  | 12 ++----------
 mm/util.c |  8 +++-----
 2 files changed, 5 insertions(+), 15 deletions(-)

diff --git fs/aio.c fs/aio.c
index 018bd24d6204..0092855326eb 100644
--- fs/aio.c
+++ fs/aio.c
@@ -460,7 +460,6 @@ static const struct address_space_operations aio_ctx_aops = {
 static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
 {
 	struct aio_ring *ring;
-	struct mm_struct *mm = current->mm;
 	unsigned long size, unused;
 	int nr_pages;
 	int i;
@@ -519,20 +518,13 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
 	ctx->mmap_size = nr_pages * PAGE_SIZE;
 	pr_debug("attempting mmap of %lu bytes\n", ctx->mmap_size);
 
-	if (mm_write_lock_killable(mm)) {
-		ctx->mmap_size = 0;
-		aio_free_ring(ctx);
-		return -EINTR;
-	}
-
 	ctx->mmap_base = do_mmap(ctx->aio_ring_file, 0, ctx->mmap_size,
 				PROT_READ | PROT_WRITE,
-				MAP_SHARED, 0, 0, true, &unused, NULL);
-	mm_write_unlock(mm);
+				MAP_SHARED, 0, 0, false, &unused, NULL);
 	if (IS_ERR((void *)ctx->mmap_base)) {
 		ctx->mmap_size = 0;
 		aio_free_ring(ctx);
-		return -ENOMEM;
+		return (ctx->mmap_base == -EINTR) ? -EINTR : -ENOMEM;
 	}
 
 	pr_debug("mmap address: 0x%08lx\n", ctx->mmap_base);
diff --git mm/util.c mm/util.c
index 337b006aef6d..916bc7ac9bf2 100644
--- mm/util.c
+++ mm/util.c
@@ -501,12 +501,10 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
 
 	ret = security_mmap_file(file, prot, flag);
 	if (!ret) {
-		if (mm_write_lock_killable(mm))
-			return -EINTR;
 		ret = do_mmap(file, addr, len, prot, flag, 0, pgoff,
-				true, &populate, &uf);
-		mm_write_unlock(mm);
-		userfaultfd_unmap_complete(mm, &uf);
+				false, &populate, &uf);
+		if (ret != -EINTR)
+			userfaultfd_unmap_complete(mm, &uf);
 		if (populate)
 			mm_populate(ret, populate);
 	}
-- 
2.25.0.341.g760bfbb309-goog



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 24/24] do_mmap: implement easiest cases of fine grained locking
  2020-02-24 20:30 [RFC PATCH 00/24] Fine grained MM locking Michel Lespinasse
                   ` (22 preceding siblings ...)
  2020-02-24 20:30 ` [RFC PATCH 23/24] do_mmap: use locked=false in vm_mmap_pgoff() and aio_setup_ring() Michel Lespinasse
@ 2020-02-24 20:30 ` Michel Lespinasse
  2022-03-20 22:08 ` [RFC PATCH 00/24] Fine grained MM locking Barry Song
  24 siblings, 0 replies; 28+ messages in thread
From: Michel Lespinasse @ 2020-02-24 20:30 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Morton, Laurent Dufour, Vlastimil Babka,
	Matthew Wilcox, Liam R . Howlett, Jerome Glisse, Davidlohr Bueso,
	David Rientjes
  Cc: linux-mm, Michel Lespinasse

Use a range lock in the easiest possible mmap case:
- the mmap address is known;
- there are no existing vmas within the mmap range;
- there is no file being mapped.

When these conditions are met, we can trivially support a fine grained
range lock by just holding the mm_vma_lock across the entire mmap
operation. This is safe because the mmap only registers the new
mapping using O(log N) operations, and does not have to call back into
arbitrary code (such as file mmap handlers) or iterate over existing
vmas and mapped pages.
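
Schematically (a sketch of the locking decision, simplified from the code
below):

	if (addr && !file) {
		/* known address, anonymous mapping: lock only [addr, addr + len) */
		mm_init_lock_range(&mmap_range, addr, addr + len);
		range = &mmap_range;
	} else
		range = mm_coarse_lock_range();

	if (mm_write_range_lock_killable(mm, range))
		return -EINTR;
	if (!mm_range_is_coarse(range))
		mm_vma_lock(mm);	/* held across the whole vma registration */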

Signed-off-by: Michel Lespinasse <walken@google.com>
---
 mm/mmap.c | 36 +++++++++++++++++++++++++++++-------
 1 file changed, 29 insertions(+), 7 deletions(-)

diff --git mm/mmap.c mm/mmap.c
index 75755f1cbd0b..5fa23f300e72 100644
--- mm/mmap.c
+++ mm/mmap.c
@@ -1372,6 +1372,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 			unsigned long pgoff, bool locked,
 			unsigned long *populate, struct list_head *uf)
 {
+	struct mm_lock_range mmap_range, *range = NULL;
 	struct mm_struct *mm = current->mm;
 	int pkey = 0;
 
@@ -1406,8 +1407,18 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 	if ((pgoff + (len >> PAGE_SHIFT)) < pgoff)
 		return -EOVERFLOW;
 
-	if (!locked && mm_write_lock_killable(mm))
-		return -EINTR;
+	if (!locked) {
+		if (addr && !file) {
+			mm_init_lock_range(&mmap_range, addr, addr + len);
+			range = &mmap_range;
+		} else
+			range = mm_coarse_lock_range();
+	retry:
+		if (mm_write_range_lock_killable(mm, range))
+			return -EINTR;
+		if (!mm_range_is_coarse(range))
+			mm_vma_lock(mm);
+	}
 
 	/* Too many mappings? */
 	if (mm->map_count > sysctl_max_map_count) {
@@ -1422,12 +1433,20 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 	if (IS_ERR_VALUE(addr))
 		goto unlock;
 
-	if (flags & MAP_FIXED_NOREPLACE) {
+	if ((flags & MAP_FIXED_NOREPLACE) ||
+	    (!locked && !mm_range_is_coarse(range))) {
 		struct vm_area_struct *vma = find_vma(mm, addr);
 
 		if (vma && vma->vm_start < addr + len) {
-			addr = -EEXIST;
-			goto unlock;
+			if (flags & MAP_FIXED_NOREPLACE) {
+				addr = -EEXIST;
+				goto unlock;
+			} else {
+				mm_vma_unlock(mm);
+				mm_write_range_unlock(mm, range);
+				range = mm_coarse_lock_range();
+				goto retry;
+			}
 		}
 	}
 
@@ -1587,8 +1606,11 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 		*populate = len;
 
 unlock:
-	if (!locked)
-		mm_write_unlock(mm);
+	if (!locked) {
+		if (!mm_range_is_coarse(range))
+			mm_vma_unlock(mm);
+		mm_write_range_unlock(mm, range);
+	}
 	return addr;
 }
 
-- 
2.25.0.341.g760bfbb309-goog



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 00/24] Fine grained MM locking
  2020-02-24 20:30 [RFC PATCH 00/24] Fine grained MM locking Michel Lespinasse
                   ` (23 preceding siblings ...)
  2020-02-24 20:30 ` [RFC PATCH 24/24] do_mmap: implement easiest cases of fine grained locking Michel Lespinasse
@ 2022-03-20 22:08 ` Barry Song
  2022-03-20 23:14   ` Matthew Wilcox
  24 siblings, 1 reply; 28+ messages in thread
From: Barry Song @ 2022-03-20 22:08 UTC (permalink / raw)
  To: walken
  Cc: Liam.Howlett, akpm, dave, jglisse, ldufour, linux-mm, peterz,
	rientjes, vbabka, willy

> Hi,
> 
> This is the first version of my work towards fine grained MM locking.
> This is still early work - I am happy with my page fault changes,
> but want to expand on the mmap/munmap side of things before I send the
> next version. I have previously shared this with some of the copied folks
> (for those who received that, there are no additional changes in this
> public resend). Please expect a v2 within a few weeks, with further
> changes for fine grained range locking in the mmap and munmap paths.

hello, Michel. I noticed rwsem has been renamed to mmap_lock and
some apis were created for taking the lock.
but is the original fine grained mm locking series still under
development? maybe i missed something but i failed to find v2
for it.

Thanks
Barry


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 00/24] Fine grained MM locking
  2022-03-20 22:08 ` [RFC PATCH 00/24] Fine grained MM locking Barry Song
@ 2022-03-20 23:14   ` Matthew Wilcox
  2022-03-21  0:20     ` Barry Song
  0 siblings, 1 reply; 28+ messages in thread
From: Matthew Wilcox @ 2022-03-20 23:14 UTC (permalink / raw)
  To: Barry Song
  Cc: walken, Liam.Howlett, akpm, dave, jglisse, ldufour, linux-mm,
	peterz, rientjes, vbabka

On Mon, Mar 21, 2022 at 11:08:48AM +1300, Barry Song wrote:
> hello, Michel. I noticed rwsem has been renamed to mmap_lock and
> some apis were created for taking the lock.
> but is the original fine grained mm locking series still under
> development? maybe i missed something but i failed to find v2
> for it.

Most recently posted as 20220128131006.67712-1-michel@lespinasse.org


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC PATCH 00/24] Fine grained MM locking
  2022-03-20 23:14   ` Matthew Wilcox
@ 2022-03-21  0:20     ` Barry Song
  0 siblings, 0 replies; 28+ messages in thread
From: Barry Song @ 2022-03-21  0:20 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: walken, Liam.Howlett, Andrew Morton, dave, jglisse, ldufour,
	Linux-MM, Peter Zijlstra, rientjes, Vlastimil Babka

On Mon, Mar 21, 2022 at 12:14 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Mar 21, 2022 at 11:08:48AM +1300, Barry Song wrote:
> > Hello, Michel. I noticed the mmap_sem rwsem has been renamed to
> > mmap_lock and some APIs were created for taking the lock.
> > But is the original fine-grained MM locking series still under
> > development? Maybe I missed something, but I failed to find a v2
> > for it.
>
> Most recently posted as 20220128131006.67712-1-michel@lespinasse.org

Thanks!



Thread overview: 28+ messages
2020-02-24 20:30 [RFC PATCH 00/24] Fine grained MM locking Michel Lespinasse
2020-02-24 20:30 ` [RFC PATCH 01/24] MM locking API: initial implementation as rwsem wrappers Michel Lespinasse
2020-02-24 20:30 ` [RFC PATCH 02/24] MM locking API: use coccinelle to convert mmap_sem rwsem call sites Michel Lespinasse
2020-02-24 20:30 ` [RFC PATCH 03/24] MM locking API: manual conversion of mmap_sem call sites missed by coccinelle Michel Lespinasse
2020-02-24 20:30 ` [RFC PATCH 04/24] MM locking API: add range arguments Michel Lespinasse
2020-02-24 20:30 ` [RFC PATCH 05/24] MM locking API: allow for sleeping during unlock Michel Lespinasse
2020-02-24 20:30 ` [RFC PATCH 06/24] MM locking API: implement fine grained range locks Michel Lespinasse
2020-02-24 20:30 ` [RFC PATCH 07/24] mm/memory: add range field to struct vm_fault Michel Lespinasse
2020-02-24 20:30 ` [RFC PATCH 08/24] mm/memory: allow specifying MM lock range to handle_mm_fault() Michel Lespinasse
2020-02-24 20:30 ` [RFC PATCH 09/24] do_swap_page: use the vmf->range field when dropping mmap_sem Michel Lespinasse
2020-02-24 20:30 ` [RFC PATCH 10/24] handle_userfault: " Michel Lespinasse
2020-02-24 20:30 ` [RFC PATCH 11/24] x86 fault handler: merge bad_area() functions Michel Lespinasse
2020-02-24 20:30 ` [RFC PATCH 12/24] x86 fault handler: use an explicit MM lock range Michel Lespinasse
2020-02-24 20:30 ` [RFC PATCH 13/24] mm/memory: add prepare_mm_fault() function Michel Lespinasse
2020-02-24 20:30 ` [RFC PATCH 14/24] mm/swap_state: disable swap vma readahead Michel Lespinasse
2020-02-24 20:30 ` [RFC PATCH 15/24] x86 fault handler: use a pseudo-vma when operating on anonymous vmas Michel Lespinasse
2020-02-24 20:30 ` [RFC PATCH 16/24] MM locking API: add vma locking API Michel Lespinasse
2020-02-24 20:30 ` [RFC PATCH 17/24] x86 fault handler: implement range locking Michel Lespinasse
2020-02-24 20:30 ` [RFC PATCH 18/24] shared file mappings: use the vmf->range field when dropping mmap_sem Michel Lespinasse
2020-02-24 20:30 ` [RFC PATCH 19/24] mm: add field to annotate vm_operations that support range locking Michel Lespinasse
2020-02-24 20:30 ` [RFC PATCH 20/24] x86 fault handler: extend range locking to supported file vmas Michel Lespinasse
2020-02-24 20:30 ` [RFC PATCH 21/24] do_mmap: add locked argument Michel Lespinasse
2020-02-24 20:30 ` [RFC PATCH 22/24] do_mmap: implement " Michel Lespinasse
2020-02-24 20:30 ` [RFC PATCH 23/24] do_mmap: use locked=false in vm_mmap_pgoff() and aio_setup_ring() Michel Lespinasse
2020-02-24 20:30 ` [RFC PATCH 24/24] do_mmap: implement easiest cases of fine grained locking Michel Lespinasse
2022-03-20 22:08 ` [RFC PATCH 00/24] Fine grained MM locking Barry Song
2022-03-20 23:14   ` Matthew Wilcox
2022-03-21  0:20     ` Barry Song
