* [PATCH 0/6] convert locked_vm from unsigned long to atomic64_t
From: Daniel Jordan @ 2019-04-02 20:41 UTC
  To: akpm
  Cc: daniel.m.jordan, Alan Tull, Alexey Kardashevskiy,
	Alex Williamson, Benjamin Herrenschmidt, Christoph Lameter,
	Davidlohr Bueso, Michael Ellerman, Moritz Fischer,
	Paul Mackerras, Wu Hao, linux-mm, kvm, kvm-ppc, linuxppc-dev,
	linux-fpga, linux-kernel

Hi,

From patch 1:

  Taking and dropping mmap_sem to modify a single counter, locked_vm, is
  overkill when the counter could be synchronized separately.
  
  Make mmap_sem a little less coarse by changing locked_vm to an atomic,
  the 64-bit variety to avoid issues with overflow on 32-bit systems.
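
As a sketch of the pattern (paraphrasing the accounting sites in the
diffs below, not code lifted verbatim from them), a typical site goes
from

  down_write(&mm->mmap_sem);
  locked = mm->locked_vm + npages;
  lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
  if (locked > lock_limit && !capable(CAP_IPC_LOCK))
          ret = -ENOMEM;
  else
          mm->locked_vm += npages;
  up_write(&mm->mmap_sem);

to a speculative add that is backed out on failure, needing no mmap_sem
for the counter itself once the later patches drop it:

  locked = atomic64_add_return(npages, &mm->locked_vm);
  lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
  if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
          atomic64_sub(npages, &mm->locked_vm);
          ret = -ENOMEM;
  }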

This is a more conservative alternative to [1] with no user-visible
effects.  Thanks to Alexey Kardashevskiy for pointing out the racy
atomics and to Alex Williamson, Christoph Lameter, Ira Weiny, and Jason
Gunthorpe for their comments on [1].

Davidlohr Bueso recently did a similar conversion for pinned_vm[2].

Testing
 1. passes LTP mlock[all], munlock[all], fork, mmap, and mremap tests in an
    x86 kvm guest
 2. a VFIO-enabled x86 kvm guest shows the same VmLck in
    /proc/pid/status before and after this change
 3. cross-compiles on powerpc

The series is based on v5.1-rc3.  Please consider for 5.2.

Daniel

[1] https://lore.kernel.org/linux-mm/20190211224437.25267-1-daniel.m.jordan@oracle.com/
[2] https://lore.kernel.org/linux-mm/20190206175920.31082-1-dave@stgolabs.net/

Daniel Jordan (6):
  mm: change locked_vm's type from unsigned long to atomic64_t
  vfio/type1: drop mmap_sem now that locked_vm is atomic
  vfio/spapr_tce: drop mmap_sem now that locked_vm is atomic
  fpga/dfl/afu: drop mmap_sem now that locked_vm is atomic
  powerpc/mmu: drop mmap_sem now that locked_vm is atomic
  kvm/book3s: drop mmap_sem now that locked_vm is atomic

 arch/powerpc/kvm/book3s_64_vio.c    | 34 ++++++++++--------------
 arch/powerpc/mm/mmu_context_iommu.c | 28 +++++++++-----------
 drivers/fpga/dfl-afu-dma-region.c   | 40 ++++++++++++-----------------
 drivers/vfio/vfio_iommu_spapr_tce.c | 37 ++++++++++++--------------
 drivers/vfio/vfio_iommu_type1.c     | 31 +++++++++-------------
 fs/proc/task_mmu.c                  |  2 +-
 include/linux/mm_types.h            |  2 +-
 kernel/fork.c                       |  2 +-
 mm/debug.c                          |  5 ++--
 mm/mlock.c                          |  4 +--
 mm/mmap.c                           | 18 ++++++-------
 mm/mremap.c                         |  6 ++---
 12 files changed, 89 insertions(+), 120 deletions(-)


base-commit: 79a3aaa7b82e3106be97842dedfd8429248896e6
-- 
2.21.0


* [PATCH 1/6] mm: change locked_vm's type from unsigned long to atomic64_t
From: Daniel Jordan @ 2019-04-02 20:41 UTC
  To: akpm
  Cc: daniel.m.jordan, Alan Tull, Alexey Kardashevskiy,
	Alex Williamson, Benjamin Herrenschmidt, Christoph Lameter,
	Davidlohr Bueso, Michael Ellerman, Moritz Fischer,
	Paul Mackerras, Wu Hao, linux-mm, kvm, kvm-ppc, linuxppc-dev,
	linux-fpga, linux-kernel

Taking and dropping mmap_sem to modify a single counter, locked_vm, is
overkill when the counter could be synchronized separately.

Make mmap_sem a little less coarse by changing locked_vm to an atomic,
the 64-bit variety to avoid issues with overflow on 32-bit systems.

Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Alan Tull <atull@kernel.org>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Moritz Fischer <mdf@kernel.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Wu Hao <hao.wu@intel.com>
Cc: <linux-mm@kvack.org>
Cc: <kvm@vger.kernel.org>
Cc: <kvm-ppc@vger.kernel.org>
Cc: <linuxppc-dev@lists.ozlabs.org>
Cc: <linux-fpga@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>
---
 arch/powerpc/kvm/book3s_64_vio.c    | 14 ++++++++------
 arch/powerpc/mm/mmu_context_iommu.c | 15 ++++++++-------
 drivers/fpga/dfl-afu-dma-region.c   | 18 ++++++++++--------
 drivers/vfio/vfio_iommu_spapr_tce.c | 17 +++++++++--------
 drivers/vfio/vfio_iommu_type1.c     | 10 ++++++----
 fs/proc/task_mmu.c                  |  2 +-
 include/linux/mm_types.h            |  2 +-
 kernel/fork.c                       |  2 +-
 mm/debug.c                          |  5 +++--
 mm/mlock.c                          |  4 ++--
 mm/mmap.c                           | 18 +++++++++---------
 mm/mremap.c                         |  6 +++---
 12 files changed, 61 insertions(+), 52 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index f02b04973710..e7fdb6d10eeb 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -59,32 +59,34 @@ static unsigned long kvmppc_stt_pages(unsigned long tce_pages)
 static long kvmppc_account_memlimit(unsigned long stt_pages, bool inc)
 {
 	long ret = 0;
+	s64 locked_vm;
 
 	if (!current || !current->mm)
 		return ret; /* process exited */
 
 	down_write(&current->mm->mmap_sem);
 
+	locked_vm = atomic64_read(&current->mm->locked_vm);
 	if (inc) {
 		unsigned long locked, lock_limit;
 
-		locked = current->mm->locked_vm + stt_pages;
+		locked = locked_vm + stt_pages;
 		lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 		if (locked > lock_limit && !capable(CAP_IPC_LOCK))
 			ret = -ENOMEM;
 		else
-			current->mm->locked_vm += stt_pages;
+			atomic64_add(stt_pages, &current->mm->locked_vm);
 	} else {
-		if (WARN_ON_ONCE(stt_pages > current->mm->locked_vm))
-			stt_pages = current->mm->locked_vm;
+		if (WARN_ON_ONCE(stt_pages > locked_vm))
+			stt_pages = locked_vm;
 
-		current->mm->locked_vm -= stt_pages;
+		atomic64_sub(stt_pages, &current->mm->locked_vm);
 	}
 
 	pr_debug("[%d] RLIMIT_MEMLOCK KVM %c%ld %ld/%ld%s\n", current->pid,
 			inc ? '+' : '-',
 			stt_pages << PAGE_SHIFT,
-			current->mm->locked_vm << PAGE_SHIFT,
+			atomic64_read(&current->mm->locked_vm) << PAGE_SHIFT,
 			rlimit(RLIMIT_MEMLOCK),
 			ret ? " - exceeded" : "");
 
diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index e7a9c4f6bfca..8038ac24a312 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -55,30 +55,31 @@ static long mm_iommu_adjust_locked_vm(struct mm_struct *mm,
 		unsigned long npages, bool incr)
 {
 	long ret = 0, locked, lock_limit;
+	s64 locked_vm;
 
 	if (!npages)
 		return 0;
 
 	down_write(&mm->mmap_sem);
-
+	locked_vm = atomic64_read(&mm->locked_vm);
 	if (incr) {
-		locked = mm->locked_vm + npages;
+		locked = locked_vm + npages;
 		lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 		if (locked > lock_limit && !capable(CAP_IPC_LOCK))
 			ret = -ENOMEM;
 		else
-			mm->locked_vm += npages;
+			atomic64_add(npages, &mm->locked_vm);
 	} else {
-		if (WARN_ON_ONCE(npages > mm->locked_vm))
-			npages = mm->locked_vm;
-		mm->locked_vm -= npages;
+		if (WARN_ON_ONCE(npages > locked_vm))
+			npages = locked_vm;
+		atomic64_sub(npages, &mm->locked_vm);
 	}
 
 	pr_debug("[%d] RLIMIT_MEMLOCK HASH64 %c%ld %ld/%ld\n",
 			current ? current->pid : 0,
 			incr ? '+' : '-',
 			npages << PAGE_SHIFT,
-			mm->locked_vm << PAGE_SHIFT,
+			atomic64_read(&mm->locked_vm) << PAGE_SHIFT,
 			rlimit(RLIMIT_MEMLOCK));
 	up_write(&mm->mmap_sem);
 
diff --git a/drivers/fpga/dfl-afu-dma-region.c b/drivers/fpga/dfl-afu-dma-region.c
index e18a786fc943..08132fd9b6b7 100644
--- a/drivers/fpga/dfl-afu-dma-region.c
+++ b/drivers/fpga/dfl-afu-dma-region.c
@@ -45,6 +45,7 @@ void afu_dma_region_init(struct dfl_feature_platform_data *pdata)
 static int afu_dma_adjust_locked_vm(struct device *dev, long npages, bool incr)
 {
 	unsigned long locked, lock_limit;
+	s64 locked_vm;
 	int ret = 0;
 
 	/* the task is exiting. */
@@ -53,24 +54,25 @@ static int afu_dma_adjust_locked_vm(struct device *dev, long npages, bool incr)
 
 	down_write(&current->mm->mmap_sem);
 
+	locked_vm = atomic64_read(&current->mm->locked_vm);
 	if (incr) {
-		locked = current->mm->locked_vm + npages;
+		locked = locked_vm + npages;
 		lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 
 		if (locked > lock_limit && !capable(CAP_IPC_LOCK))
 			ret = -ENOMEM;
 		else
-			current->mm->locked_vm += npages;
+			atomic64_add(npages, &current->mm->locked_vm);
 	} else {
-		if (WARN_ON_ONCE(npages > current->mm->locked_vm))
-			npages = current->mm->locked_vm;
-		current->mm->locked_vm -= npages;
+		if (WARN_ON_ONCE(npages > locked_vm))
+			npages = locked_vm;
+		atomic64_sub(npages, &current->mm->locked_vm);
 	}
 
-	dev_dbg(dev, "[%d] RLIMIT_MEMLOCK %c%ld %ld/%ld%s\n", current->pid,
+	dev_dbg(dev, "[%d] RLIMIT_MEMLOCK %c%ld %lld/%lu%s\n", current->pid,
 		incr ? '+' : '-', npages << PAGE_SHIFT,
-		current->mm->locked_vm << PAGE_SHIFT, rlimit(RLIMIT_MEMLOCK),
-		ret ? "- exceeded" : "");
+		(s64)atomic64_read(&current->mm->locked_vm) << PAGE_SHIFT,
+		rlimit(RLIMIT_MEMLOCK), ret ? "- exceeded" : "");
 
 	up_write(&current->mm->mmap_sem);
 
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 8dbb270998f4..e7d787e5d839 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -36,7 +36,8 @@ static void tce_iommu_detach_group(void *iommu_data,
 
 static long try_increment_locked_vm(struct mm_struct *mm, long npages)
 {
-	long ret = 0, locked, lock_limit;
+	long ret = 0, lock_limit;
+	s64 locked;
 
 	if (WARN_ON_ONCE(!mm))
 		return -EPERM;
@@ -45,16 +46,16 @@ static long try_increment_locked_vm(struct mm_struct *mm, long npages)
 		return 0;
 
 	down_write(&mm->mmap_sem);
-	locked = mm->locked_vm + npages;
+	locked = atomic64_read(&mm->locked_vm) + npages;
 	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 	if (locked > lock_limit && !capable(CAP_IPC_LOCK))
 		ret = -ENOMEM;
 	else
-		mm->locked_vm += npages;
+		atomic64_add(npages, &mm->locked_vm);
 
 	pr_debug("[%d] RLIMIT_MEMLOCK +%ld %ld/%ld%s\n", current->pid,
 			npages << PAGE_SHIFT,
-			mm->locked_vm << PAGE_SHIFT,
+			atomic64_read(&mm->locked_vm) << PAGE_SHIFT,
 			rlimit(RLIMIT_MEMLOCK),
 			ret ? " - exceeded" : "");
 
@@ -69,12 +70,12 @@ static void decrement_locked_vm(struct mm_struct *mm, long npages)
 		return;
 
 	down_write(&mm->mmap_sem);
-	if (WARN_ON_ONCE(npages > mm->locked_vm))
-		npages = mm->locked_vm;
-	mm->locked_vm -= npages;
+	if (WARN_ON_ONCE(npages > atomic64_read(&mm->locked_vm)))
+		npages = atomic64_read(&mm->locked_vm);
+	atomic64_sub(npages, &mm->locked_vm);
 	pr_debug("[%d] RLIMIT_MEMLOCK -%ld %ld/%ld\n", current->pid,
 			npages << PAGE_SHIFT,
-			mm->locked_vm << PAGE_SHIFT,
+			atomic64_read(&mm->locked_vm) << PAGE_SHIFT,
 			rlimit(RLIMIT_MEMLOCK));
 	up_write(&mm->mmap_sem);
 }
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 73652e21efec..5b2878697286 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -270,18 +270,19 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
 	if (!ret) {
 		if (npage > 0) {
 			if (!dma->lock_cap) {
+				s64 locked_vm = atomic64_read(&mm->locked_vm);
 				unsigned long limit;
 
 				limit = task_rlimit(dma->task,
 						RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 
-				if (mm->locked_vm + npage > limit)
+				if (locked_vm + npage > limit)
 					ret = -ENOMEM;
 			}
 		}
 
 		if (!ret)
-			mm->locked_vm += npage;
+			atomic64_add(npage, &mm->locked_vm);
 
 		up_write(&mm->mmap_sem);
 	}
@@ -401,6 +402,7 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 	long ret, pinned = 0, lock_acct = 0;
 	bool rsvd;
 	dma_addr_t iova = vaddr - dma->vaddr + dma->iova;
+	atomic64_t *locked_vm = &current->mm->locked_vm;
 
 	/* This code path is only user initiated */
 	if (!current->mm)
@@ -418,7 +420,7 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 	 * pages are already counted against the user.
 	 */
 	if (!rsvd && !vfio_find_vpfn(dma, iova)) {
-		if (!dma->lock_cap && current->mm->locked_vm + 1 > limit) {
+		if (!dma->lock_cap && atomic64_read(locked_vm) + 1 > limit) {
 			put_pfn(*pfn_base, dma->prot);
 			pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n", __func__,
 					limit << PAGE_SHIFT);
@@ -445,7 +447,7 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 
 		if (!rsvd && !vfio_find_vpfn(dma, iova)) {
 			if (!dma->lock_cap &&
-			    current->mm->locked_vm + lock_acct + 1 > limit) {
+			    atomic64_read(locked_vm) + lock_acct + 1 > limit) {
 				put_pfn(pfn, dma->prot);
 				pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n",
 					__func__, limit << PAGE_SHIFT);
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 92a91e7816d8..61da4b24d0e0 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -58,7 +58,7 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
 	swap = get_mm_counter(mm, MM_SWAPENTS);
 	SEQ_PUT_DEC("VmPeak:\t", hiwater_vm);
 	SEQ_PUT_DEC(" kB\nVmSize:\t", total_vm);
-	SEQ_PUT_DEC(" kB\nVmLck:\t", mm->locked_vm);
+	SEQ_PUT_DEC(" kB\nVmLck:\t", atomic64_read(&mm->locked_vm));
 	SEQ_PUT_DEC(" kB\nVmPin:\t", atomic64_read(&mm->pinned_vm));
 	SEQ_PUT_DEC(" kB\nVmHWM:\t", hiwater_rss);
 	SEQ_PUT_DEC(" kB\nVmRSS:\t", total_rss);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 7eade9132f02..5059b99a0827 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -410,7 +410,7 @@ struct mm_struct {
 		unsigned long hiwater_vm;  /* High-water virtual memory usage */
 
 		unsigned long total_vm;	   /* Total pages mapped */
-		unsigned long locked_vm;   /* Pages that have PG_mlocked set */
+		atomic64_t    locked_vm;   /* Pages that have PG_mlocked set */
 		atomic64_t    pinned_vm;   /* Refcount permanently increased */
 		unsigned long data_vm;	   /* VM_WRITE & ~VM_SHARED & ~VM_STACK */
 		unsigned long exec_vm;	   /* VM_EXEC & ~VM_WRITE & ~VM_STACK */
diff --git a/kernel/fork.c b/kernel/fork.c
index 9dcd18aa210b..56be8cdc7b4a 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -979,7 +979,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	mm->core_state = NULL;
 	mm_pgtables_bytes_init(mm);
 	mm->map_count = 0;
-	mm->locked_vm = 0;
+	atomic64_set(&mm->locked_vm, 0);
 	atomic64_set(&mm->pinned_vm, 0);
 	memset(&mm->rss_stat, 0, sizeof(mm->rss_stat));
 	spin_lock_init(&mm->page_table_lock);
diff --git a/mm/debug.c b/mm/debug.c
index eee9c221280c..b9cd71927d3c 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -136,7 +136,7 @@ void dump_mm(const struct mm_struct *mm)
 #endif
 		"mmap_base %lu mmap_legacy_base %lu highest_vm_end %lu\n"
 		"pgd %px mm_users %d mm_count %d pgtables_bytes %lu map_count %d\n"
-		"hiwater_rss %lx hiwater_vm %lx total_vm %lx locked_vm %lx\n"
+		"hiwater_rss %lx hiwater_vm %lx total_vm %lx locked_vm %llx\n"
 		"pinned_vm %llx data_vm %lx exec_vm %lx stack_vm %lx\n"
 		"start_code %lx end_code %lx start_data %lx end_data %lx\n"
 		"start_brk %lx brk %lx start_stack %lx\n"
@@ -167,7 +167,8 @@ void dump_mm(const struct mm_struct *mm)
 		atomic_read(&mm->mm_count),
 		mm_pgtables_bytes(mm),
 		mm->map_count,
-		mm->hiwater_rss, mm->hiwater_vm, mm->total_vm, mm->locked_vm,
+		mm->hiwater_rss, mm->hiwater_vm, mm->total_vm,
+		(u64)atomic64_read(&mm->locked_vm),
 		(u64)atomic64_read(&mm->pinned_vm),
 		mm->data_vm, mm->exec_vm, mm->stack_vm,
 		mm->start_code, mm->end_code, mm->start_data, mm->end_data,
diff --git a/mm/mlock.c b/mm/mlock.c
index 080f3b36415b..e492a155c51a 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -562,7 +562,7 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
 		nr_pages = -nr_pages;
 	else if (old_flags & VM_LOCKED)
 		nr_pages = 0;
-	mm->locked_vm += nr_pages;
+	atomic64_add(nr_pages, &mm->locked_vm);
 
 	/*
 	 * vm_flags is protected by the mmap_sem held in write mode.
@@ -687,7 +687,7 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla
 	if (down_write_killable(&current->mm->mmap_sem))
 		return -EINTR;
 
-	locked += current->mm->locked_vm;
+	locked += atomic64_read(&current->mm->locked_vm);
 	if ((locked > lock_limit) && (!capable(CAP_IPC_LOCK))) {
 		/*
 		 * It is possible that the regions requested intersect with
diff --git a/mm/mmap.c b/mm/mmap.c
index 41eb48d9b527..03576c1d530c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1339,7 +1339,7 @@ static inline int mlock_future_check(struct mm_struct *mm,
 	/*  mlock MCL_FUTURE? */
 	if (flags & VM_LOCKED) {
 		locked = len >> PAGE_SHIFT;
-		locked += mm->locked_vm;
+		locked += atomic64_read(&mm->locked_vm);
 		lock_limit = rlimit(RLIMIT_MEMLOCK);
 		lock_limit >>= PAGE_SHIFT;
 		if (locked > lock_limit && !capable(CAP_IPC_LOCK))
@@ -1825,7 +1825,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 					vma == get_gate_vma(current->mm))
 			vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
 		else
-			mm->locked_vm += (len >> PAGE_SHIFT);
+			atomic64_add(len >> PAGE_SHIFT, &mm->locked_vm);
 	}
 
 	if (file)
@@ -2301,7 +2301,7 @@ static int acct_stack_growth(struct vm_area_struct *vma,
 	if (vma->vm_flags & VM_LOCKED) {
 		unsigned long locked;
 		unsigned long limit;
-		locked = mm->locked_vm + grow;
+		locked = atomic64_read(&mm->locked_vm) + grow;
 		limit = rlimit(RLIMIT_MEMLOCK);
 		limit >>= PAGE_SHIFT;
 		if (locked > limit && !capable(CAP_IPC_LOCK))
@@ -2395,7 +2395,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
 				 */
 				spin_lock(&mm->page_table_lock);
 				if (vma->vm_flags & VM_LOCKED)
-					mm->locked_vm += grow;
+					atomic64_add(grow, &mm->locked_vm);
 				vm_stat_account(mm, vma->vm_flags, grow);
 				anon_vma_interval_tree_pre_update_vma(vma);
 				vma->vm_end = address;
@@ -2475,7 +2475,7 @@ int expand_downwards(struct vm_area_struct *vma,
 				 */
 				spin_lock(&mm->page_table_lock);
 				if (vma->vm_flags & VM_LOCKED)
-					mm->locked_vm += grow;
+					atomic64_add(grow, &mm->locked_vm);
 				vm_stat_account(mm, vma->vm_flags, grow);
 				anon_vma_interval_tree_pre_update_vma(vma);
 				vma->vm_start = address;
@@ -2796,11 +2796,11 @@ int __do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
 	/*
 	 * unlock any mlock()ed ranges before detaching vmas
 	 */
-	if (mm->locked_vm) {
+	if (atomic64_read(&mm->locked_vm)) {
 		struct vm_area_struct *tmp = vma;
 		while (tmp && tmp->vm_start < end) {
 			if (tmp->vm_flags & VM_LOCKED) {
-				mm->locked_vm -= vma_pages(tmp);
+				atomic64_sub(vma_pages(tmp), &mm->locked_vm);
 				munlock_vma_pages_all(tmp);
 			}
 
@@ -3043,7 +3043,7 @@ static int do_brk_flags(unsigned long addr, unsigned long len, unsigned long fla
 	mm->total_vm += len >> PAGE_SHIFT;
 	mm->data_vm += len >> PAGE_SHIFT;
 	if (flags & VM_LOCKED)
-		mm->locked_vm += (len >> PAGE_SHIFT);
+		atomic64_add(len >> PAGE_SHIFT, &mm->locked_vm);
 	vma->vm_flags |= VM_SOFTDIRTY;
 	return 0;
 }
@@ -3115,7 +3115,7 @@ void exit_mmap(struct mm_struct *mm)
 		up_write(&mm->mmap_sem);
 	}
 
-	if (mm->locked_vm) {
+	if (atomic64_read(&mm->locked_vm)) {
 		vma = mm->mmap;
 		while (vma) {
 			if (vma->vm_flags & VM_LOCKED)
diff --git a/mm/mremap.c b/mm/mremap.c
index e3edef6b7a12..9a4046bb2875 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -422,7 +422,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 	}
 
 	if (vm_flags & VM_LOCKED) {
-		mm->locked_vm += new_len >> PAGE_SHIFT;
+		atomic64_add(new_len >> PAGE_SHIFT, &mm->locked_vm);
 		*locked = true;
 	}
 
@@ -473,7 +473,7 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
 
 	if (vma->vm_flags & VM_LOCKED) {
 		unsigned long locked, lock_limit;
-		locked = mm->locked_vm << PAGE_SHIFT;
+		locked = atomic64_read(&mm->locked_vm) << PAGE_SHIFT;
 		lock_limit = rlimit(RLIMIT_MEMLOCK);
 		locked += new_len - old_len;
 		if (locked > lock_limit && !capable(CAP_IPC_LOCK))
@@ -679,7 +679,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 
 			vm_stat_account(mm, vma->vm_flags, pages);
 			if (vma->vm_flags & VM_LOCKED) {
-				mm->locked_vm += pages;
+				atomic64_add(pages, &mm->locked_vm);
 				locked = true;
 				new_addr = addr;
 			}
-- 
2.21.0


* [PATCH 2/6] vfio/type1: drop mmap_sem now that locked_vm is atomic
From: Daniel Jordan @ 2019-04-02 20:41 UTC
  To: akpm
  Cc: daniel.m.jordan, Alex Williamson, Christoph Lameter,
	Davidlohr Bueso, linux-mm, kvm, linux-kernel

With locked_vm now an atomic, there is no need to take mmap_sem as
writer.  Delete and refactor accordingly.
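
The resulting check uses a speculative add-then-undo; roughly (a sketch
of the logic in the diff below, not additional code):

  locked_vm = atomic64_add_return(npage, &mm->locked_vm);
  if (npage > 0 && !dma->lock_cap && locked_vm > limit) {
          atomic64_sub(npage, &mm->locked_vm);
          ret = -ENOMEM;
  }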

Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: <linux-mm@kvack.org>
Cc: <kvm@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>
---
 drivers/vfio/vfio_iommu_type1.c | 27 +++++++++------------------
 1 file changed, 9 insertions(+), 18 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 5b2878697286..a227de6d9c4c 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -257,7 +257,8 @@ static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, struct vfio_pfn *vpfn)
 static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
 {
 	struct mm_struct *mm;
-	int ret;
+	s64 locked_vm;
+	int ret = 0;
 
 	if (!npage)
 		return 0;
@@ -266,25 +267,15 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
 	if (!mm)
 		return -ESRCH; /* process exited */
 
-	ret = down_write_killable(&mm->mmap_sem);
-	if (!ret) {
-		if (npage > 0) {
-			if (!dma->lock_cap) {
-				s64 locked_vm = atomic64_read(&mm->locked_vm);
-				unsigned long limit;
-
-				limit = task_rlimit(dma->task,
-						RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+	locked_vm = atomic64_add_return(npage, &mm->locked_vm);
 
-				if (locked_vm + npage > limit)
-					ret = -ENOMEM;
-			}
+	if (npage > 0 && !dma->lock_cap) {
+		unsigned long limit = task_rlimit(dma->task, RLIMIT_MEMLOCK) >>
+								   PAGE_SHIFT;
+		if (locked_vm > limit) {
+			atomic64_sub(npage, &mm->locked_vm);
+			ret = -ENOMEM;
 		}
-
-		if (!ret)
-			atomic64_add(npage, &mm->locked_vm);
-
-		up_write(&mm->mmap_sem);
 	}
 
 	if (async)
-- 
2.21.0


* [PATCH 3/6] vfio/spapr_tce: drop mmap_sem now that locked_vm is atomic
From: Daniel Jordan @ 2019-04-02 20:41 UTC
  To: akpm
  Cc: daniel.m.jordan, Alexey Kardashevskiy, Alex Williamson,
	Christoph Lameter, Davidlohr Bueso, linux-mm, kvm, linux-kernel

With locked_vm now an atomic, there is no need to take mmap_sem as
writer.  Delete and refactor accordingly.

Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: <linux-mm@kvack.org>
Cc: <kvm@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>
---
 drivers/vfio/vfio_iommu_spapr_tce.c | 36 ++++++++++++-----------------
 1 file changed, 15 insertions(+), 21 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index e7d787e5d839..7675a3b28410 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -36,8 +36,9 @@ static void tce_iommu_detach_group(void *iommu_data,
 
 static long try_increment_locked_vm(struct mm_struct *mm, long npages)
 {
-	long ret = 0, lock_limit;
+	long ret = 0;
 	s64 locked;
+	unsigned long lock_limit;
 
 	if (WARN_ON_ONCE(!mm))
 		return -EPERM;
@@ -45,39 +46,32 @@ static long try_increment_locked_vm(struct mm_struct *mm, long npages)
 	if (!npages)
 		return 0;
 
-	down_write(&mm->mmap_sem);
-	locked = atomic64_read(&mm->locked_vm) + npages;
+	locked = atomic64_add_return(npages, &mm->locked_vm);
 	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-	if (locked > lock_limit && !capable(CAP_IPC_LOCK))
+	if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
 		ret = -ENOMEM;
-	else
-		atomic64_add(npages, &mm->locked_vm);
-
-	pr_debug("[%d] RLIMIT_MEMLOCK +%ld %ld/%ld%s\n", current->pid,
-			npages << PAGE_SHIFT,
-			atomic64_read(&mm->locked_vm) << PAGE_SHIFT,
-			rlimit(RLIMIT_MEMLOCK),
-			ret ? " - exceeded" : "");
+		atomic64_sub(npages, &mm->locked_vm);
+	}
 
-	up_write(&mm->mmap_sem);
+	pr_debug("[%d] RLIMIT_MEMLOCK +%ld %lld/%lu%s\n", current->pid,
+			npages << PAGE_SHIFT, locked << PAGE_SHIFT,
+			lock_limit, ret ? " - exceeded" : "");
 
 	return ret;
 }
 
 static void decrement_locked_vm(struct mm_struct *mm, long npages)
 {
+	s64 locked;
+
 	if (!mm || !npages)
 		return;
 
-	down_write(&mm->mmap_sem);
-	if (WARN_ON_ONCE(npages > atomic64_read(&mm->locked_vm)))
-		npages = atomic64_read(&mm->locked_vm);
-	atomic64_sub(npages, &mm->locked_vm);
-	pr_debug("[%d] RLIMIT_MEMLOCK -%ld %ld/%ld\n", current->pid,
-			npages << PAGE_SHIFT,
-			atomic64_read(&mm->locked_vm) << PAGE_SHIFT,
+	locked = atomic64_sub_return(npages, &mm->locked_vm);
+	WARN_ON_ONCE(locked < 0);
+	pr_debug("[%d] RLIMIT_MEMLOCK -%ld %lld/%lu\n", current->pid,
+			npages << PAGE_SHIFT, locked << PAGE_SHIFT,
 			rlimit(RLIMIT_MEMLOCK));
-	up_write(&mm->mmap_sem);
 }
 
 /*
-- 
2.21.0


* [PATCH 4/6] fpga/dfl/afu: drop mmap_sem now that locked_vm is atomic
From: Daniel Jordan @ 2019-04-02 20:41 UTC
  To: akpm
  Cc: daniel.m.jordan, Alan Tull, Christoph Lameter, Davidlohr Bueso,
	Moritz Fischer, Wu Hao, linux-mm, linux-fpga, linux-kernel

With locked_vm now an atomic, there is no need to take mmap_sem as
writer.  Delete and refactor accordingly.
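
The incr flag is also folded into a signed page count, so callers now
pass a delta; roughly (mirroring the call sites in the diff below):

  afu_dma_adjust_locked_vm(dev, npages);   /* account npages */
  afu_dma_adjust_locked_vm(dev, -npages);  /* unaccount npages */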

Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Alan Tull <atull@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Moritz Fischer <mdf@kernel.org>
Cc: Wu Hao <hao.wu@intel.com>
Cc: <linux-mm@kvack.org>
Cc: <linux-fpga@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>
---
 drivers/fpga/dfl-afu-dma-region.c | 40 ++++++++++++-------------------
 1 file changed, 15 insertions(+), 25 deletions(-)

diff --git a/drivers/fpga/dfl-afu-dma-region.c b/drivers/fpga/dfl-afu-dma-region.c
index 08132fd9b6b7..81e3e3a71758 100644
--- a/drivers/fpga/dfl-afu-dma-region.c
+++ b/drivers/fpga/dfl-afu-dma-region.c
@@ -35,46 +35,36 @@ void afu_dma_region_init(struct dfl_feature_platform_data *pdata)
  * afu_dma_adjust_locked_vm - adjust locked memory
  * @dev: port device
  * @npages: number of pages
- * @incr: increase or decrease locked memory
  *
  * Increase or decrease the locked memory size with npages input.
  *
  * Return 0 on success.
  * Return -ENOMEM if locked memory size is over the limit and no CAP_IPC_LOCK.
  */
-static int afu_dma_adjust_locked_vm(struct device *dev, long npages, bool incr)
+static int afu_dma_adjust_locked_vm(struct device *dev, long pages)
 {
-	unsigned long locked, lock_limit;
+	unsigned long lock_limit;
 	s64 locked_vm;
 	int ret = 0;
 
 	/* the task is exiting. */
-	if (!current->mm)
+	if (!current->mm || !pages)
 		return 0;
 
-	down_write(&current->mm->mmap_sem);
-
-	locked_vm = atomic64_read(&current->mm->locked_vm);
-	if (incr) {
-		locked = locked_vm + npages;
+	locked_vm = atomic64_add_return(pages, &current->mm->locked_vm);
+	WARN_ON_ONCE(locked_vm < 0);
+	if (pages > 0 && !capable(CAP_IPC_LOCK)) {
 		lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-
-		if (locked > lock_limit && !capable(CAP_IPC_LOCK))
+		if (locked_vm > lock_limit) {
 			ret = -ENOMEM;
-		else
-			atomic64_add(npages, &current->mm->locked_vm);
-	} else {
-		if (WARN_ON_ONCE(npages > locked_vm))
-			npages = locked_vm;
-		atomic64_sub(npages, &current->mm->locked_vm);
+			atomic64_sub(pages, &current->mm->locked_vm);
+		}
 	}
 
 	dev_dbg(dev, "[%d] RLIMIT_MEMLOCK %c%ld %lld/%lu%s\n", current->pid,
-		incr ? '+' : '-', npages << PAGE_SHIFT,
-		(s64)atomic64_read(&current->mm->locked_vm) << PAGE_SHIFT,
-		rlimit(RLIMIT_MEMLOCK), ret ? "- exceeded" : "");
-
-	up_write(&current->mm->mmap_sem);
+		(pages > 0) ? '+' : '-', pages << PAGE_SHIFT,
+		locked_vm << PAGE_SHIFT, rlimit(RLIMIT_MEMLOCK),
+		ret ? "- exceeded" : "");
 
 	return ret;
 }
@@ -94,7 +84,7 @@ static int afu_dma_pin_pages(struct dfl_feature_platform_data *pdata,
 	struct device *dev = &pdata->dev->dev;
 	int ret, pinned;
 
-	ret = afu_dma_adjust_locked_vm(dev, npages, true);
+	ret = afu_dma_adjust_locked_vm(dev, npages);
 	if (ret)
 		return ret;
 
@@ -123,7 +113,7 @@ static int afu_dma_pin_pages(struct dfl_feature_platform_data *pdata,
 free_pages:
 	kfree(region->pages);
 unlock_vm:
-	afu_dma_adjust_locked_vm(dev, npages, false);
+	afu_dma_adjust_locked_vm(dev, -npages);
 	return ret;
 }
 
@@ -143,7 +133,7 @@ static void afu_dma_unpin_pages(struct dfl_feature_platform_data *pdata,
 
 	put_all_pages(region->pages, npages);
 	kfree(region->pages);
-	afu_dma_adjust_locked_vm(dev, npages, false);
+	afu_dma_adjust_locked_vm(dev, -npages);
 
 	dev_dbg(dev, "%ld pages unpinned\n", npages);
 }
-- 
2.21.0


* [PATCH 5/6] powerpc/mmu: drop mmap_sem now that locked_vm is atomic
From: Daniel Jordan @ 2019-04-02 20:41 UTC
  To: akpm
  Cc: daniel.m.jordan, Alexey Kardashevskiy, Benjamin Herrenschmidt,
	Christoph Lameter, Davidlohr Bueso, Michael Ellerman,
	Paul Mackerras, linux-mm, linuxppc-dev, linux-kernel

With locked_vm now an atomic, there is no need to take mmap_sem as
writer.  Delete and refactor accordingly.

Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: <linux-mm@kvack.org>
Cc: <linuxppc-dev@lists.ozlabs.org>
Cc: <linux-kernel@vger.kernel.org>
---
 arch/powerpc/mm/mmu_context_iommu.c | 27 +++++++++++----------------
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index 8038ac24a312..a4ef22b67c07 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -54,34 +54,29 @@ struct mm_iommu_table_group_mem_t {
 static long mm_iommu_adjust_locked_vm(struct mm_struct *mm,
 		unsigned long npages, bool incr)
 {
-	long ret = 0, locked, lock_limit;
+	long ret = 0;
+	unsigned long lock_limit;
 	s64 locked_vm;
 
 	if (!npages)
 		return 0;
 
-	down_write(&mm->mmap_sem);
-	locked_vm = atomic64_read(&mm->locked_vm);
 	if (incr) {
-		locked = locked_vm + npages;
 		lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-		if (locked > lock_limit && !capable(CAP_IPC_LOCK))
+		locked_vm = atomic64_add_return(npages, &mm->locked_vm);
+		if (locked_vm > lock_limit && !capable(CAP_IPC_LOCK)) {
 			ret = -ENOMEM;
-		else
-			atomic64_add(npages, &mm->locked_vm);
+			atomic64_sub(npages, &mm->locked_vm);
+		}
 	} else {
-		if (WARN_ON_ONCE(npages > locked_vm))
-			npages = locked_vm;
-		atomic64_sub(npages, &mm->locked_vm);
+		locked_vm = atomic64_sub_return(npages, &mm->locked_vm);
+		WARN_ON_ONCE(locked_vm < 0);
 	}
 
-	pr_debug("[%d] RLIMIT_MEMLOCK HASH64 %c%ld %ld/%ld\n",
-			current ? current->pid : 0,
-			incr ? '+' : '-',
-			npages << PAGE_SHIFT,
-			atomic64_read(&mm->locked_vm) << PAGE_SHIFT,
+	pr_debug("[%d] RLIMIT_MEMLOCK HASH64 %c%lu %lld/%lu\n",
+			current ? current->pid : 0, incr ? '+' : '-',
+			npages << PAGE_SHIFT, locked_vm << PAGE_SHIFT,
 			rlimit(RLIMIT_MEMLOCK));
-	up_write(&mm->mmap_sem);
 
 	return ret;
 }
-- 
2.21.0


* [PATCH 6/6] kvm/book3s: drop mmap_sem now that locked_vm is atomic
From: Daniel Jordan @ 2019-04-02 20:41 UTC
  To: akpm
  Cc: daniel.m.jordan, Alexey Kardashevskiy, Benjamin Herrenschmidt,
	Christoph Lameter, Davidlohr Bueso, Michael Ellerman,
	Paul Mackerras, linux-mm, kvm-ppc, linuxppc-dev, linux-kernel

With locked_vm now an atomic, there is no need to take mmap_sem as
writer.  Delete and refactor accordingly.

Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: <linux-mm@kvack.org>
Cc: <kvm-ppc@vger.kernel.org>
Cc: <linuxppc-dev@lists.ozlabs.org>
Cc: <linux-kernel@vger.kernel.org>
---
 arch/powerpc/kvm/book3s_64_vio.c | 34 +++++++++++---------------------
 1 file changed, 12 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index e7fdb6d10eeb..8e034c3a5d25 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -56,7 +56,7 @@ static unsigned long kvmppc_stt_pages(unsigned long tce_pages)
 	return tce_pages + ALIGN(stt_bytes, PAGE_SIZE) / PAGE_SIZE;
 }
 
-static long kvmppc_account_memlimit(unsigned long stt_pages, bool inc)
+static long kvmppc_account_memlimit(unsigned long pages, bool inc)
 {
 	long ret = 0;
 	s64 locked_vm;
@@ -64,33 +64,23 @@ static long kvmppc_account_memlimit(unsigned long stt_pages, bool inc)
 	if (!current || !current->mm)
 		return ret; /* process exited */
 
-	down_write(&current->mm->mmap_sem);
-
-	locked_vm = atomic64_read(&current->mm->locked_vm);
 	if (inc) {
-		unsigned long locked, lock_limit;
+		unsigned long lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 
-		locked = locked_vm + stt_pages;
-		lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-		if (locked > lock_limit && !capable(CAP_IPC_LOCK))
+		locked_vm = atomic64_add_return(pages, &current->mm->locked_vm);
+		if (locked_vm > lock_limit && !capable(CAP_IPC_LOCK)) {
 			ret = -ENOMEM;
-		else
-			atomic64_add(stt_pages, &current->mm->locked_vm);
+			atomic64_sub(pages, &current->mm->locked_vm);
+		}
 	} else {
-		if (WARN_ON_ONCE(stt_pages > locked_vm))
-			stt_pages = locked_vm;
-
-		atomic64_sub(stt_pages, &current->mm->locked_vm);
+		locked_vm = atomic64_sub_return(pages, &current->mm->locked_vm);
+		WARN_ON_ONCE(locked_vm < 0);
 	}
 
-	pr_debug("[%d] RLIMIT_MEMLOCK KVM %c%ld %ld/%ld%s\n", current->pid,
-			inc ? '+' : '-',
-			stt_pages << PAGE_SHIFT,
-			atomic64_read(&current->mm->locked_vm) << PAGE_SHIFT,
-			rlimit(RLIMIT_MEMLOCK),
-			ret ? " - exceeded" : "");
-
-	up_write(&current->mm->mmap_sem);
+	pr_debug("[%d] RLIMIT_MEMLOCK KVM %c%lu %lld/%lu%s\n", current->pid,
+			inc ? '+' : '-', pages << PAGE_SHIFT,
+			locked_vm << PAGE_SHIFT,
+			rlimit(RLIMIT_MEMLOCK), ret ? " - exceeded" : "");
 
 	return ret;
 }
-- 
2.21.0


* Re: [PATCH 1/6] mm: change locked_vm's type from unsigned long to atomic64_t
From: Andrew Morton @ 2019-04-02 22:04 UTC
  To: Daniel Jordan
  Cc: Alan Tull, Alexey Kardashevskiy, Alex Williamson,
	Benjamin Herrenschmidt, Christoph Lameter, Davidlohr Bueso,
	Michael Ellerman, Moritz Fischer, Paul Mackerras, Wu Hao,
	linux-mm, kvm, kvm-ppc, linuxppc-dev, linux-fpga, linux-kernel

On Tue,  2 Apr 2019 16:41:53 -0400 Daniel Jordan <daniel.m.jordan@oracle.com> wrote:

> Taking and dropping mmap_sem to modify a single counter, locked_vm, is
> overkill when the counter could be synchronized separately.
> 
> Make mmap_sem a little less coarse by changing locked_vm to an atomic,
> the 64-bit variety to avoid issues with overflow on 32-bit systems.
> 
> ...
>
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -59,32 +59,34 @@ static unsigned long kvmppc_stt_pages(unsigned long tce_pages)
>  static long kvmppc_account_memlimit(unsigned long stt_pages, bool inc)
>  {
>  	long ret = 0;
> +	s64 locked_vm;
>  
>  	if (!current || !current->mm)
>  		return ret; /* process exited */
>  
>  	down_write(&current->mm->mmap_sem);
>  
> +	locked_vm = atomic64_read(&current->mm->locked_vm);
>  	if (inc) {
>  		unsigned long locked, lock_limit;
>  
> -		locked = current->mm->locked_vm + stt_pages;
> +		locked = locked_vm + stt_pages;
>  		lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
>  		if (locked > lock_limit && !capable(CAP_IPC_LOCK))
>  			ret = -ENOMEM;
>  		else
> -			current->mm->locked_vm += stt_pages;
> +			atomic64_add(stt_pages, &current->mm->locked_vm);
>  	} else {
> -		if (WARN_ON_ONCE(stt_pages > current->mm->locked_vm))
> -			stt_pages = current->mm->locked_vm;
> +		if (WARN_ON_ONCE(stt_pages > locked_vm))
> +			stt_pages = locked_vm;
>  
> -		current->mm->locked_vm -= stt_pages;
> +		atomic64_sub(stt_pages, &current->mm->locked_vm);
>  	}

With the current code, current->mm->locked_vm cannot go negative.
After the patch, it can go negative if someone else decreases
current->mm->locked_vm between this function's atomic64_read() and
atomic64_sub().
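
For example (an illustrative interleaving, not taken from the patch;
tasks A and B share the same mm):

  /* locked_vm == N */
  A: locked_vm = atomic64_read(&mm->locked_vm);  /* sees N */
  B: atomic64_sub(N, &mm->locked_vm);            /* locked_vm now 0 */
  A: WARN_ON_ONCE(stt_pages > locked_vm);        /* passes: stt_pages <= N */
  A: atomic64_sub(stt_pages, &mm->locked_vm);    /* locked_vm goes negative */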

I guess this is a can't-happen in this case because the racing code
which performed the modification would have taken it negative anyway.

But this all makes me rather queasy.


Also, we didn't remove any down_write(mmap_sem)s from core code so I'm
thinking that the benefit of removing a few mmap_sem-takings from a few
obscure drivers (sorry ;)) is pretty small.


Also, the argument for switching 32-bit arches to a 64-bit counter was
suspiciously vague.  What overflow issues?  Or are we just being lazy?


* Re: [PATCH 1/6] mm: change locked_vm's type from unsigned long to atomic64_t
From: Davidlohr Bueso @ 2019-04-02 23:43 UTC
  To: Andrew Morton
  Cc: Daniel Jordan, Alan Tull, Alexey Kardashevskiy, Alex Williamson,
	Benjamin Herrenschmidt, Christoph Lameter, Michael Ellerman,
	Moritz Fischer, Paul Mackerras, Wu Hao, linux-mm, kvm, kvm-ppc,
	linuxppc-dev, linux-fpga, linux-kernel

On Tue, 02 Apr 2019, Andrew Morton wrote:

>Also, we didn't remove any down_write(mmap_sem)s from core code so I'm
>thinking that the benefit of removing a few mmap_sem-takings from a few
>obscure drivers (sorry ;)) is pretty small.

afaik porting the remaining incorrect users of locked_vm over to
pinned_vm was the next step before this one, which would make converting
locked_vm to atomic hardly worth it. Daniel?

Thanks,
Davidlohr

* Re: [PATCH 1/6] mm: change locked_vm's type from unsigned long to atomic64_t
From: Christophe Leroy @ 2019-04-03  4:46 UTC
  To: Daniel Jordan, akpm
  Cc: Davidlohr Bueso, kvm, Alan Tull, Alexey Kardashevskiy,
	linux-fpga, linux-kernel, kvm-ppc, linux-mm, Alex Williamson,
	Moritz Fischer, Christoph Lameter, linuxppc-dev, Wu Hao



On 02/04/2019 at 22:41, Daniel Jordan wrote:
> Taking and dropping mmap_sem to modify a single counter, locked_vm, is
> overkill when the counter could be synchronized separately.
> 
> Make mmap_sem a little less coarse by changing locked_vm to an atomic,
> the 64-bit variety to avoid issues with overflow on 32-bit systems.

Can you elaborate on the above?  Previously it was 'unsigned long'; what
were the issues?  If there were such issues, shouldn't there be a first
patch moving it from unsigned long to u64 before this atomic64_t change?
Or at least it should be clearly explained here what the issues are and
how switching to a 64-bit counter fixes them.

Christophe

>   	}
>   
> -	if (mm->locked_vm) {
> +	if (atomic64_read(&mm->locked_vm)) {
>   		vma = mm->mmap;
>   		while (vma) {
>   			if (vma->vm_flags & VM_LOCKED)
> diff --git a/mm/mremap.c b/mm/mremap.c
> index e3edef6b7a12..9a4046bb2875 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -422,7 +422,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
>   	}
>   
>   	if (vm_flags & VM_LOCKED) {
> -		mm->locked_vm += new_len >> PAGE_SHIFT;
> +		atomic64_add(new_len >> PAGE_SHIFT, &mm->locked_vm);
>   		*locked = true;
>   	}
>   
> @@ -473,7 +473,7 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
>   
>   	if (vma->vm_flags & VM_LOCKED) {
>   		unsigned long locked, lock_limit;
> -		locked = mm->locked_vm << PAGE_SHIFT;
> +		locked = atomic64_read(&mm->locked_vm) << PAGE_SHIFT;
>   		lock_limit = rlimit(RLIMIT_MEMLOCK);
>   		locked += new_len - old_len;
>   		if (locked > lock_limit && !capable(CAP_IPC_LOCK))
> @@ -679,7 +679,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
>   
>   			vm_stat_account(mm, vma->vm_flags, pages);
>   			if (vma->vm_flags & VM_LOCKED) {
> -				mm->locked_vm += pages;
> +				atomic64_add(pages, &mm->locked_vm);
>   				locked = true;
>   				new_addr = addr;
>   			}
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/6] powerpc/mmu: drop mmap_sem now that locked_vm is atomic
  2019-04-02 20:41 ` [PATCH 5/6] powerpc/mmu: " Daniel Jordan
@ 2019-04-03  4:58   ` Christophe Leroy
  2019-04-03 16:40     ` Daniel Jordan
  0 siblings, 1 reply; 26+ messages in thread
From: Christophe Leroy @ 2019-04-03  4:58 UTC (permalink / raw)
  To: Daniel Jordan, akpm
  Cc: Davidlohr Bueso, Alexey Kardashevskiy, linux-kernel, linux-mm,
	Paul Mackerras, Christoph Lameter, linuxppc-dev



Le 02/04/2019 à 22:41, Daniel Jordan a écrit :
> With locked_vm now an atomic, there is no need to take mmap_sem as
> writer.  Delete and refactor accordingly.

Could you please detail the change? It looks like this is not the only
change. I'm wondering what the consequences are.

Before we did:
- lock
- calculate future value
- check the future value is acceptable
- update value if future value acceptable
- return error if future value non acceptable
- unlock

Now we do:
- atomic update with future (possibly too high) value
- check the new value is acceptable
- atomic update back with older value if new value not acceptable and 
return error

So if a concurrent action wants to increase locked_vm with an acceptable 
step while another one has temporarily set it too high, it will now fail.

I think we should keep the previous approach and do a cmpxchg after 
validating the new value.
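
Something like this, as an untested sketch (npages and lock_limit as in
the existing code):

	s64 current_locked, new_locked;

	do {
		current_locked = atomic64_read(&mm->locked_vm);
		new_locked = current_locked + npages;
		if (new_locked > lock_limit && !capable(CAP_IPC_LOCK))
			return -ENOMEM;
	} while (atomic64_cmpxchg(&mm->locked_vm, current_locked,
				  new_locked) != current_locked);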

Christophe

> 
> Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: <linux-mm@kvack.org>
> Cc: <linuxppc-dev@lists.ozlabs.org>
> Cc: <linux-kernel@vger.kernel.org>
> ---
>   arch/powerpc/mm/mmu_context_iommu.c | 27 +++++++++++----------------
>   1 file changed, 11 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
> index 8038ac24a312..a4ef22b67c07 100644
> --- a/arch/powerpc/mm/mmu_context_iommu.c
> +++ b/arch/powerpc/mm/mmu_context_iommu.c
> @@ -54,34 +54,29 @@ struct mm_iommu_table_group_mem_t {
>   static long mm_iommu_adjust_locked_vm(struct mm_struct *mm,
>   		unsigned long npages, bool incr)
>   {
> -	long ret = 0, locked, lock_limit;
> +	long ret = 0;
> +	unsigned long lock_limit;
>   	s64 locked_vm;
>   
>   	if (!npages)
>   		return 0;
>   
> -	down_write(&mm->mmap_sem);
> -	locked_vm = atomic64_read(&mm->locked_vm);
>   	if (incr) {
> -		locked = locked_vm + npages;
>   		lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> -		if (locked > lock_limit && !capable(CAP_IPC_LOCK))
> +		locked_vm = atomic64_add_return(npages, &mm->locked_vm);
> +		if (locked_vm > lock_limit && !capable(CAP_IPC_LOCK)) {
>   			ret = -ENOMEM;
> -		else
> -			atomic64_add(npages, &mm->locked_vm);
> +			atomic64_sub(npages, &mm->locked_vm);
> +		}
>   	} else {
> -		if (WARN_ON_ONCE(npages > locked_vm))
> -			npages = locked_vm;
> -		atomic64_sub(npages, &mm->locked_vm);
> +		locked_vm = atomic64_sub_return(npages, &mm->locked_vm);
> +		WARN_ON_ONCE(locked_vm < 0);
>   	}
>   
> -	pr_debug("[%d] RLIMIT_MEMLOCK HASH64 %c%ld %ld/%ld\n",
> -			current ? current->pid : 0,
> -			incr ? '+' : '-',
> -			npages << PAGE_SHIFT,
> -			atomic64_read(&mm->locked_vm) << PAGE_SHIFT,
> +	pr_debug("[%d] RLIMIT_MEMLOCK HASH64 %c%lu %lld/%lu\n",
> +			current ? current->pid : 0, incr ? '+' : '-',
> +			npages << PAGE_SHIFT, locked_vm << PAGE_SHIFT,
>   			rlimit(RLIMIT_MEMLOCK));
> -	up_write(&mm->mmap_sem);
>   
>   	return ret;
>   }
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/6] convert locked_vm from unsigned long to atomic64_t
  2019-04-02 20:41 [PATCH 0/6] convert locked_vm from unsigned long to atomic64_t Daniel Jordan
                   ` (5 preceding siblings ...)
  2019-04-02 20:41 ` [PATCH 6/6] kvm/book3s: " Daniel Jordan
@ 2019-04-03 12:51 ` Steven Sistare
  2019-04-03 16:52   ` Daniel Jordan
  6 siblings, 1 reply; 26+ messages in thread
From: Steven Sistare @ 2019-04-03 12:51 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: akpm, linux_lkml_grp, Alan Tull, Alexey Kardashevskiy,
	Alex Williamson, Benjamin Herrenschmidt, Christoph Lameter,
	Davidlohr Bueso, Michael Ellerman, Moritz Fischer,
	Paul Mackerras, Wu Hao, linux-mm, kvm, kvm-ppc, linuxppc-dev,
	linux-fpga, linux-kernel

On 4/2/2019 4:41 PM, Daniel Jordan wrote:
> Hi,
> 
> From patch 1:
> 
>   Taking and dropping mmap_sem to modify a single counter, locked_vm, is
>   overkill when the counter could be synchronized separately.
>   
>   Make mmap_sem a little less coarse by changing locked_vm to an atomic,
>   the 64-bit variety to avoid issues with overflow on 32-bit systems.
> 
> This is a more conservative alternative to [1] with no user-visible
> effects.  Thanks to Alexey Kardashevskiy for pointing out the racy
> atomics and to Alex Williamson, Christoph Lameter, Ira Weiny, and Jason
> Gunthorpe for their comments on [1].
> 
> Davidlohr Bueso recently did a similar conversion for pinned_vm[2].
> 
> Testing
>  1. passes LTP mlock[all], munlock[all], fork, mmap, and mremap tests in an
>     x86 kvm guest
>  2. a VFIO-enabled x86 kvm guest shows the same VmLck in
>     /proc/pid/status before and after this change
>  3. cross-compiles on powerpc
> 
> The series is based on v5.1-rc3.  Please consider for 5.2.
> 
> Daniel
> 
> [1] https://lore.kernel.org/linux-mm/20190211224437.25267-1-daniel.m.jordan@oracle.com/
> [2] https://lore.kernel.org/linux-mm/20190206175920.31082-1-dave@stgolabs.net/
> 
> Daniel Jordan (6):
>   mm: change locked_vm's type from unsigned long to atomic64_t
>   vfio/type1: drop mmap_sem now that locked_vm is atomic
>   vfio/spapr_tce: drop mmap_sem now that locked_vm is atomic
>   fpga/dlf/afu: drop mmap_sem now that locked_vm is atomic
>   powerpc/mmu: drop mmap_sem now that locked_vm is atomic
>   kvm/book3s: drop mmap_sem now that locked_vm is atomic
> 
>  arch/powerpc/kvm/book3s_64_vio.c    | 34 ++++++++++--------------
>  arch/powerpc/mm/mmu_context_iommu.c | 28 +++++++++-----------
>  drivers/fpga/dfl-afu-dma-region.c   | 40 ++++++++++++-----------------
>  drivers/vfio/vfio_iommu_spapr_tce.c | 37 ++++++++++++--------------
>  drivers/vfio/vfio_iommu_type1.c     | 31 +++++++++-------------
>  fs/proc/task_mmu.c                  |  2 +-
>  include/linux/mm_types.h            |  2 +-
>  kernel/fork.c                       |  2 +-
>  mm/debug.c                          |  5 ++--
>  mm/mlock.c                          |  4 +--
>  mm/mmap.c                           | 18 ++++++-------
>  mm/mremap.c                         |  6 ++---
>  12 files changed, 89 insertions(+), 120 deletions(-)
> 
> base-commit: 79a3aaa7b82e3106be97842dedfd8429248896e6

Hi Daniel,
  You could clean all 6 patches up nicely with a common subroutine that
increases locked_vm subject to the rlimit.  Pass a bool arg that is true if
the limit should be enforced, !dma->lock_cap for one call site, and
!capable(CAP_IPC_LOCK) for the rest.  Push the warnings and debug statements
to the subroutine as well.  One patch could refactor, and a second could
change the locking method.
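
A rough, untested sketch of such a subroutine (the name and exact signature
are invented here), keeping the add-then-unwind style of the current patches:

	static int mm_charge_locked_vm(struct mm_struct *mm,
				       unsigned long npages, bool enforce)
	{
		s64 locked_vm, limit;

		limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
		/* speculatively charge, then unwind on failure */
		locked_vm = atomic64_add_return(npages, &mm->locked_vm);
		if (enforce && locked_vm > limit) {
			atomic64_sub(npages, &mm->locked_vm);
			return -ENOMEM;
		}
		return 0;
	}

Callers would pass enforce = !dma->lock_cap or !capable(CAP_IPC_LOCK), and
the uncharge path would call atomic64_sub() directly.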

- Steve

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] mm: change locked_vm's type from unsigned long to atomic64_t
  2019-04-02 22:04   ` Andrew Morton
  2019-04-02 23:43     ` Davidlohr Bueso
@ 2019-04-03 15:58     ` Daniel Jordan
  1 sibling, 0 replies; 26+ messages in thread
From: Daniel Jordan @ 2019-04-03 15:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Daniel Jordan, Alan Tull, Alexey Kardashevskiy, Alex Williamson,
	Benjamin Herrenschmidt, Christoph Lameter, Davidlohr Bueso,
	Michael Ellerman, Moritz Fischer, Paul Mackerras, Wu Hao,
	linux-mm, kvm, kvm-ppc, linuxppc-dev, linux-fpga, linux-kernel

On Tue, Apr 02, 2019 at 03:04:24PM -0700, Andrew Morton wrote:
> On Tue,  2 Apr 2019 16:41:53 -0400 Daniel Jordan <daniel.m.jordan@oracle.com> wrote:
> >  static long kvmppc_account_memlimit(unsigned long stt_pages, bool inc)
> >  {
> >  	long ret = 0;
> > +	s64 locked_vm;
> >  
> >  	if (!current || !current->mm)
> >  		return ret; /* process exited */
> >  
> >  	down_write(&current->mm->mmap_sem);
> >  
> > +	locked_vm = atomic64_read(&current->mm->locked_vm);
> >  	if (inc) {
> >  		unsigned long locked, lock_limit;
> >  
> > -		locked = current->mm->locked_vm + stt_pages;
> > +		locked = locked_vm + stt_pages;
> >  		lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> >  		if (locked > lock_limit && !capable(CAP_IPC_LOCK))
> >  			ret = -ENOMEM;
> >  		else
> > -			current->mm->locked_vm += stt_pages;
> > +			atomic64_add(stt_pages, &current->mm->locked_vm);
> >  	} else {
> > -		if (WARN_ON_ONCE(stt_pages > current->mm->locked_vm))
> > -			stt_pages = current->mm->locked_vm;
> > +		if (WARN_ON_ONCE(stt_pages > locked_vm))
> > +			stt_pages = locked_vm;
> >  
> > -		current->mm->locked_vm -= stt_pages;
> > +		atomic64_sub(stt_pages, &current->mm->locked_vm);
> >  	}
> 
> With the current code, current->mm->locked_vm cannot go negative. 
> After the patch, it can go negative if someone else decreased
> current->mm->locked_vm between this function's atomic64_read() and
> atomic64_sub().
> 
> I guess this is a can't-happen in this case because the racing code
> which performed the modification would have taken it negative anyway.
> 
> But this all makes me rather queasy.

mmap_sem is still held in this patch, so updates to locked_vm are still
serialized and I don't think what you describe can happen.  A later patch
removes mmap_sem, of course, but it also rewrites the code to do something
different.  This first patch is just a mechanical type change from unsigned
long to atomic64_t.

So...does this alleviate your symptoms?

> Also, we didn't remove any down_write(mmap_sem)s from core code so I'm
> thinking that the benefit of removing a few mmap_sem-takings from a few
> obscure drivers (sorry ;)) is pretty small.

Not sure about the other drivers, but vfio type1 isn't obscure.  We use it
extensively in our cloud, and from Andrea's __GFP_THISNODE thread a few months
back it seems Red Hat also uses it:

  https://lore.kernel.org/linux-mm/20180820032204.9591-3-aarcange@redhat.com/

> Also, the argument for switching 32-bit arches to a 64-bit counter was
> suspiciously vague.  What overflow issues?  Or are we just being lazy?

If user-controlled values are used to increase locked_vm, multiple threads
doing it at once on a 32-bit system could theoretically cause overflow, so in
the absence of atomic overflow checking, the 64-bit counter on 32-bit systems
is defensive programming.
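
(With 4 KiB pages, a 32-bit page counter wraps once 2^32 pages have been
charged, i.e. 2^32 * 2^12 bytes = 16 TiB of locked memory.)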

I wouldn't have thought to do it, but Jason Gunthorpe raised the same issue in
the pinned_vm series:

  https://lore.kernel.org/linux-mm/20190115205311.GD22031@mellanox.com/

I'm fine with changing it to atomic_long_t if the scenario is too theoretical
for people.


Anyway, thanks for looking at this.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] mm: change locked_vm's type from unsigned long to atomic64_t
  2019-04-02 23:43     ` Davidlohr Bueso
@ 2019-04-03 16:07       ` Daniel Jordan
  0 siblings, 0 replies; 26+ messages in thread
From: Daniel Jordan @ 2019-04-03 16:07 UTC (permalink / raw)
  To: Andrew Morton, Daniel Jordan, Alan Tull, Alexey Kardashevskiy,
	Alex Williamson, Benjamin Herrenschmidt, Christoph Lameter,
	Michael Ellerman, Moritz Fischer, Paul Mackerras, Wu Hao,
	linux-mm, kvm, kvm-ppc, linuxppc-dev, linux-fpga, linux-kernel

On Tue, Apr 02, 2019 at 04:43:57PM -0700, Davidlohr Bueso wrote:
> On Tue, 02 Apr 2019, Andrew Morton wrote:
> 
> > Also, we didn't remove any down_write(mmap_sem)s from core code so I'm
> > thinking that the benefit of removing a few mmap_sem-takings from a few
> > obscure drivers (sorry ;)) is pretty small.
> 
> afaik porting the remaining incorrect users of locked_vm to pinned_vm was
> the next step before this one, which made converting locked_vm to atomic
> hardly worth it. Daniel?
 
Right, as you know I tried those incorrect users first, but there were concerns
about user-visible changes regarding RLIMIT_MEMLOCK and pinned_vm/locked_vm
without the accounting problem between all three being solved.

To my knowledge no one has a solution for that, so in the meantime I'm taking
the incremental step of getting rid of mmap_sem for locked_vm users.  The
locked_vm -> pinned_vm conversion can happen later.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] mm: change locked_vm's type from unsigned long to atomic64_t
  2019-04-03  4:46   ` Christophe Leroy
@ 2019-04-03 16:09     ` Daniel Jordan
  0 siblings, 0 replies; 26+ messages in thread
From: Daniel Jordan @ 2019-04-03 16:09 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Daniel Jordan, akpm, Davidlohr Bueso, kvm, Alan Tull,
	Alexey Kardashevskiy, linux-fpga, linux-kernel, kvm-ppc,
	linux-mm, Alex Williamson, Moritz Fischer, Christoph Lameter,
	linuxppc-dev, Wu Hao

On Wed, Apr 03, 2019 at 06:46:07AM +0200, Christophe Leroy wrote:
> 
> 
> Le 02/04/2019 à 22:41, Daniel Jordan a écrit :
> > Taking and dropping mmap_sem to modify a single counter, locked_vm, is
> > overkill when the counter could be synchronized separately.
> > 
> > Make mmap_sem a little less coarse by changing locked_vm to an atomic,
> > the 64-bit variety to avoid issues with overflow on 32-bit systems.
> 
> Can you elaborate on the above? Previously it was 'unsigned long'; what
> were the issues?

Sure, I responded to this in another thread from this series.

> If there were such issues, shouldn't there be a first patch
> moving it from unsigned long to u64 before this atomic64_t change? Or at
> least it should be clearly explained here what the issues are and how
> switching to a 64-bit counter fixes them.

Yes, I can explain the motivation in the next version.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/6] powerpc/mmu: drop mmap_sem now that locked_vm is atomic
  2019-04-03  4:58   ` Christophe Leroy
@ 2019-04-03 16:40     ` Daniel Jordan
  2019-04-24  2:15       ` Davidlohr Bueso
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel Jordan @ 2019-04-03 16:40 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Daniel Jordan, akpm, Davidlohr Bueso, Alexey Kardashevskiy,
	linux-kernel, linux-mm, Paul Mackerras, Christoph Lameter,
	linuxppc-dev

On Wed, Apr 03, 2019 at 06:58:45AM +0200, Christophe Leroy wrote:
> Le 02/04/2019 à 22:41, Daniel Jordan a écrit :
> > With locked_vm now an atomic, there is no need to take mmap_sem as
> > writer.  Delete and refactor accordingly.
> 
> Could you please detail the change?

Ok, I'll be more specific in the next version, using some of your language in
fact.  :)

> It looks like this is not the only
> change. I'm wondering what the consequences are.
> 
> Before we did:
> - lock
> - calculate future value
> - check the future value is acceptable
> - update value if future value acceptable
> - return error if future value non acceptable
> - unlock
> 
> Now we do:
> - atomic update with future (possibly too high) value
> - check the new value is acceptable
> - atomic update back with older value if new value not acceptable and return
> error
> 
> So if a concurrent action wants to increase locked_vm with an acceptable
> step while another one has temporarily set it too high, it will now fail.
> 
> I think we should keep the previous approach and do a cmpxchg after
> validating the new value.

That's a good idea, and especially worth doing considering that an arbitrary
number of threads that charge a low amount of locked_vm can fail just because
one thread charges lots of it.

pinned_vm appears to be broken the same way, so I can fix it too unless someone
beats me to it.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/6] convert locked_vm from unsigned long to atomic64_t
  2019-04-03 12:51 ` [PATCH 0/6] convert locked_vm from unsigned long to atomic64_t Steven Sistare
@ 2019-04-03 16:52   ` Daniel Jordan
  0 siblings, 0 replies; 26+ messages in thread
From: Daniel Jordan @ 2019-04-03 16:52 UTC (permalink / raw)
  To: Steven Sistare
  Cc: Daniel Jordan, akpm, linux_lkml_grp, Alan Tull,
	Alexey Kardashevskiy, Alex Williamson, Benjamin Herrenschmidt,
	Christoph Lameter, Davidlohr Bueso, Michael Ellerman,
	Moritz Fischer, Paul Mackerras, Wu Hao, linux-mm, kvm, kvm-ppc,
	linuxppc-dev, linux-fpga, linux-kernel

On Wed, Apr 03, 2019 at 08:51:13AM -0400, Steven Sistare wrote:
> On 4/2/2019 4:41 PM, Daniel Jordan wrote:
> > [1] https://lore.kernel.org/linux-mm/20190211224437.25267-1-daniel.m.jordan@oracle.com/
> 
>   You could clean all 6 patches up nicely with a common subroutine that
> increases locked_vm subject to the rlimit.  Pass a bool arg that is true if
> the limit should be enforced, !dma->lock_cap for one call site, and
> !capable(CAP_IPC_LOCK) for the rest.  Push the warnings and debug statements
> to the subroutine as well.  One patch could refactor, and a second could
> change the locking method.

Yes, I tried writing, but didn't end up including, such a subroutine for [1].
The devil was in the details, but with the cmpxchg business, it's more
worthwhile to iron all those out.  I'll give it a try.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] mm: change locked_vm's type from unsigned long to atomic64_t
  2019-04-02 20:41 ` [PATCH 1/6] mm: change locked_vm's type " Daniel Jordan
  2019-04-02 22:04   ` Andrew Morton
  2019-04-03  4:46   ` Christophe Leroy
@ 2019-04-11  4:22   ` Alexey Kardashevskiy
  2019-04-11  9:55     ` Mark Rutland
  2 siblings, 1 reply; 26+ messages in thread
From: Alexey Kardashevskiy @ 2019-04-11  4:22 UTC (permalink / raw)
  To: Daniel Jordan, akpm
  Cc: Alan Tull, Alex Williamson, Benjamin Herrenschmidt,
	Christoph Lameter, Davidlohr Bueso, Michael Ellerman,
	Moritz Fischer, Paul Mackerras, Wu Hao, linux-mm, kvm, kvm-ppc,
	linuxppc-dev, linux-fpga, linux-kernel



On 03/04/2019 07:41, Daniel Jordan wrote:
> Taking and dropping mmap_sem to modify a single counter, locked_vm, is
> overkill when the counter could be synchronized separately.
> 
> Make mmap_sem a little less coarse by changing locked_vm to an atomic,
> the 64-bit variety to avoid issues with overflow on 32-bit systems.
> 
> Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
> Cc: Alan Tull <atull@kernel.org>
> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Moritz Fischer <mdf@kernel.org>
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Cc: Wu Hao <hao.wu@intel.com>
> Cc: <linux-mm@kvack.org>
> Cc: <kvm@vger.kernel.org>
> Cc: <kvm-ppc@vger.kernel.org>
> Cc: <linuxppc-dev@lists.ozlabs.org>
> Cc: <linux-fpga@vger.kernel.org>
> Cc: <linux-kernel@vger.kernel.org>
> ---
>  arch/powerpc/kvm/book3s_64_vio.c    | 14 ++++++++------
>  arch/powerpc/mm/mmu_context_iommu.c | 15 ++++++++-------
>  drivers/fpga/dfl-afu-dma-region.c   | 18 ++++++++++--------
>  drivers/vfio/vfio_iommu_spapr_tce.c | 17 +++++++++--------
>  drivers/vfio/vfio_iommu_type1.c     | 10 ++++++----
>  fs/proc/task_mmu.c                  |  2 +-
>  include/linux/mm_types.h            |  2 +-
>  kernel/fork.c                       |  2 +-
>  mm/debug.c                          |  5 +++--
>  mm/mlock.c                          |  4 ++--
>  mm/mmap.c                           | 18 +++++++++---------
>  mm/mremap.c                         |  6 +++---
>  12 files changed, 61 insertions(+), 52 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index f02b04973710..e7fdb6d10eeb 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -59,32 +59,34 @@ static unsigned long kvmppc_stt_pages(unsigned long tce_pages)
>  static long kvmppc_account_memlimit(unsigned long stt_pages, bool inc)
>  {
>  	long ret = 0;
> +	s64 locked_vm;
>  
>  	if (!current || !current->mm)
>  		return ret; /* process exited */
>  
>  	down_write(&current->mm->mmap_sem);
>  
> +	locked_vm = atomic64_read(&current->mm->locked_vm);
>  	if (inc) {
>  		unsigned long locked, lock_limit;
>  
> -		locked = current->mm->locked_vm + stt_pages;
> +		locked = locked_vm + stt_pages;
>  		lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
>  		if (locked > lock_limit && !capable(CAP_IPC_LOCK))
>  			ret = -ENOMEM;
>  		else
> -			current->mm->locked_vm += stt_pages;
> +			atomic64_add(stt_pages, &current->mm->locked_vm);
>  	} else {
> -		if (WARN_ON_ONCE(stt_pages > current->mm->locked_vm))
> -			stt_pages = current->mm->locked_vm;
> +		if (WARN_ON_ONCE(stt_pages > locked_vm))
> +			stt_pages = locked_vm;
>  
> -		current->mm->locked_vm -= stt_pages;
> +		atomic64_sub(stt_pages, &current->mm->locked_vm);
>  	}
>  
>  	pr_debug("[%d] RLIMIT_MEMLOCK KVM %c%ld %ld/%ld%s\n", current->pid,
>  			inc ? '+' : '-',
>  			stt_pages << PAGE_SHIFT,
> -			current->mm->locked_vm << PAGE_SHIFT,
> +			atomic64_read(&current->mm->locked_vm) << PAGE_SHIFT,
>  			rlimit(RLIMIT_MEMLOCK),
>  			ret ? " - exceeded" : "");
>  
> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
> index e7a9c4f6bfca..8038ac24a312 100644
> --- a/arch/powerpc/mm/mmu_context_iommu.c
> +++ b/arch/powerpc/mm/mmu_context_iommu.c
> @@ -55,30 +55,31 @@ static long mm_iommu_adjust_locked_vm(struct mm_struct *mm,
>  		unsigned long npages, bool incr)
>  {
>  	long ret = 0, locked, lock_limit;
> +	s64 locked_vm;
>  
>  	if (!npages)
>  		return 0;
>  
>  	down_write(&mm->mmap_sem);
> -
> +	locked_vm = atomic64_read(&mm->locked_vm);
>  	if (incr) {
> -		locked = mm->locked_vm + npages;
> +		locked = locked_vm + npages;
>  		lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
>  		if (locked > lock_limit && !capable(CAP_IPC_LOCK))
>  			ret = -ENOMEM;
>  		else
> -			mm->locked_vm += npages;
> +			atomic64_add(npages, &mm->locked_vm);
>  	} else {
> -		if (WARN_ON_ONCE(npages > mm->locked_vm))
> -			npages = mm->locked_vm;
> -		mm->locked_vm -= npages;
> +		if (WARN_ON_ONCE(npages > locked_vm))
> +			npages = locked_vm;
> +		atomic64_sub(npages, &mm->locked_vm);
>  	}
>  
>  	pr_debug("[%d] RLIMIT_MEMLOCK HASH64 %c%ld %ld/%ld\n",
>  			current ? current->pid : 0,
>  			incr ? '+' : '-',
>  			npages << PAGE_SHIFT,
> -			mm->locked_vm << PAGE_SHIFT,
> +			atomic64_read(&mm->locked_vm) << PAGE_SHIFT,
>  			rlimit(RLIMIT_MEMLOCK));
>  	up_write(&mm->mmap_sem);
>  
> diff --git a/drivers/fpga/dfl-afu-dma-region.c b/drivers/fpga/dfl-afu-dma-region.c
> index e18a786fc943..08132fd9b6b7 100644
> --- a/drivers/fpga/dfl-afu-dma-region.c
> +++ b/drivers/fpga/dfl-afu-dma-region.c
> @@ -45,6 +45,7 @@ void afu_dma_region_init(struct dfl_feature_platform_data *pdata)
>  static int afu_dma_adjust_locked_vm(struct device *dev, long npages, bool incr)
>  {
>  	unsigned long locked, lock_limit;
> +	s64 locked_vm;
>  	int ret = 0;
>  
>  	/* the task is exiting. */
> @@ -53,24 +54,25 @@ static int afu_dma_adjust_locked_vm(struct device *dev, long npages, bool incr)
>  
>  	down_write(&current->mm->mmap_sem);
>  
> +	locked_vm = atomic64_read(&current->mm->locked_vm);
>  	if (incr) {
> -		locked = current->mm->locked_vm + npages;
> +		locked = locked_vm + npages;
>  		lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
>  
>  		if (locked > lock_limit && !capable(CAP_IPC_LOCK))
>  			ret = -ENOMEM;
>  		else
> -			current->mm->locked_vm += npages;
> +			atomic64_add(npages, &current->mm->locked_vm);
>  	} else {
> -		if (WARN_ON_ONCE(npages > current->mm->locked_vm))
> -			npages = current->mm->locked_vm;
> -		current->mm->locked_vm -= npages;
> +		if (WARN_ON_ONCE(npages > locked_vm))
> +			npages = locked_vm;
> +		atomic64_sub(npages, &current->mm->locked_vm);
>  	}
>  
> -	dev_dbg(dev, "[%d] RLIMIT_MEMLOCK %c%ld %ld/%ld%s\n", current->pid,
> +	dev_dbg(dev, "[%d] RLIMIT_MEMLOCK %c%ld %lld/%lu%s\n", current->pid,
>  		incr ? '+' : '-', npages << PAGE_SHIFT,
> -		current->mm->locked_vm << PAGE_SHIFT, rlimit(RLIMIT_MEMLOCK),
> -		ret ? "- exceeded" : "");
> +		(s64)atomic64_read(&current->mm->locked_vm) << PAGE_SHIFT,
> +		rlimit(RLIMIT_MEMLOCK), ret ? "- exceeded" : "");



atomic64_read() returns "long", which matches "%ld", so why this change (and
similar below)? You did not do this in the two pr_debug()s above anyway.


-- 
Alexey

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] mm: change locked_vm's type from unsigned long to atomic64_t
  2019-04-11  4:22   ` Alexey Kardashevskiy
@ 2019-04-11  9:55     ` Mark Rutland
  2019-04-11 20:28       ` Daniel Jordan
  0 siblings, 1 reply; 26+ messages in thread
From: Mark Rutland @ 2019-04-11  9:55 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Daniel Jordan, akpm, Alan Tull, Alex Williamson,
	Benjamin Herrenschmidt, Christoph Lameter, Davidlohr Bueso,
	Michael Ellerman, Moritz Fischer, Paul Mackerras, Wu Hao,
	linux-mm, kvm, kvm-ppc, linuxppc-dev, linux-fpga, linux-kernel

On Thu, Apr 11, 2019 at 02:22:23PM +1000, Alexey Kardashevskiy wrote:
> On 03/04/2019 07:41, Daniel Jordan wrote:

> > -	dev_dbg(dev, "[%d] RLIMIT_MEMLOCK %c%ld %ld/%ld%s\n", current->pid,
> > +	dev_dbg(dev, "[%d] RLIMIT_MEMLOCK %c%ld %lld/%lu%s\n", current->pid,
> >  		incr ? '+' : '-', npages << PAGE_SHIFT,
> > -		current->mm->locked_vm << PAGE_SHIFT, rlimit(RLIMIT_MEMLOCK),
> > -		ret ? "- exceeded" : "");
> > +		(s64)atomic64_read(&current->mm->locked_vm) << PAGE_SHIFT,
> > +		rlimit(RLIMIT_MEMLOCK), ret ? "- exceeded" : "");
> 
> 
> 
> atomic64_read() returns "long", which matches "%ld", so why this change (and
> similar below)? You did not do this in the two pr_debug()s above anyway.

Unfortunately, architectures return inconsistent types for atomic64 ops.

Some return long (e.g. powerpc), some return long long (e.g. arc), and
some return s64 (e.g. x86).

I'm currently trying to clean things up so that all use s64 [1], but in
the meantime it's necessary for generic code to use a cast or a temporary
variable to ensure a consistent type. Once that's cleaned up, we can
remove the redundant casts.
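
For example, either a cast or a temporary gives a consistent s64 today:

	printk("%lld\n", (s64)atomic64_read(&mm->locked_vm));

or:

	s64 locked_vm = atomic64_read(&mm->locked_vm);

	printk("%lld\n", locked_vm);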

Thanks,
Mark.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=atomics/type-cleanup

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] mm: change locked_vm's type from unsigned long to atomic64_t
  2019-04-11  9:55     ` Mark Rutland
@ 2019-04-11 20:28       ` Daniel Jordan
  2019-04-16 23:33         ` Andrew Morton
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel Jordan @ 2019-04-11 20:28 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Alexey Kardashevskiy, Daniel Jordan, akpm, Alan Tull,
	Alex Williamson, Benjamin Herrenschmidt, Christoph Lameter,
	Davidlohr Bueso, Michael Ellerman, Moritz Fischer,
	Paul Mackerras, Wu Hao, linux-mm, kvm, kvm-ppc, linuxppc-dev,
	linux-fpga, linux-kernel

On Thu, Apr 11, 2019 at 10:55:43AM +0100, Mark Rutland wrote:
> On Thu, Apr 11, 2019 at 02:22:23PM +1000, Alexey Kardashevskiy wrote:
> > On 03/04/2019 07:41, Daniel Jordan wrote:
> 
> > > -	dev_dbg(dev, "[%d] RLIMIT_MEMLOCK %c%ld %ld/%ld%s\n", current->pid,
> > > +	dev_dbg(dev, "[%d] RLIMIT_MEMLOCK %c%ld %lld/%lu%s\n", current->pid,
> > >  		incr ? '+' : '-', npages << PAGE_SHIFT,
> > > -		current->mm->locked_vm << PAGE_SHIFT, rlimit(RLIMIT_MEMLOCK),
> > > -		ret ? "- exceeded" : "");
> > > +		(s64)atomic64_read(&current->mm->locked_vm) << PAGE_SHIFT,
> > > +		rlimit(RLIMIT_MEMLOCK), ret ? "- exceeded" : "");
> > 
> > 
> > 
> > atomic64_read() returns "long", which matches "%ld", so why this change (and
> > similar below)? You did not do this in the two pr_debug()s above anyway.
> 
> Unfortunately, architectures return inconsistent types for atomic64 ops.
> 
> Some return long (e..g. powerpc), some return long long (e.g. arc), and
> some return s64 (e.g. x86).

Yes, Mark said it all, I'm just chiming in to confirm that's why I added the
cast.

Btw, thanks for doing this, Mark.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] mm: change locked_vm's type from unsigned long to atomic64_t
  2019-04-11 20:28       ` Daniel Jordan
@ 2019-04-16 23:33         ` Andrew Morton
  2019-04-22 15:54           ` Daniel Jordan
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2019-04-16 23:33 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Mark Rutland, Alexey Kardashevskiy, Alan Tull, Alex Williamson,
	Benjamin Herrenschmidt, Christoph Lameter, Davidlohr Bueso,
	Michael Ellerman, Moritz Fischer, Paul Mackerras, Wu Hao,
	linux-mm, kvm, kvm-ppc, linuxppc-dev, linux-fpga, linux-kernel

On Thu, 11 Apr 2019 16:28:07 -0400 Daniel Jordan <daniel.m.jordan@oracle.com> wrote:

> On Thu, Apr 11, 2019 at 10:55:43AM +0100, Mark Rutland wrote:
> > On Thu, Apr 11, 2019 at 02:22:23PM +1000, Alexey Kardashevskiy wrote:
> > > On 03/04/2019 07:41, Daniel Jordan wrote:
> > 
> > > > -	dev_dbg(dev, "[%d] RLIMIT_MEMLOCK %c%ld %ld/%ld%s\n", current->pid,
> > > > +	dev_dbg(dev, "[%d] RLIMIT_MEMLOCK %c%ld %lld/%lu%s\n", current->pid,
> > > >  		incr ? '+' : '-', npages << PAGE_SHIFT,
> > > > -		current->mm->locked_vm << PAGE_SHIFT, rlimit(RLIMIT_MEMLOCK),
> > > > -		ret ? "- exceeded" : "");
> > > > +		(s64)atomic64_read(&current->mm->locked_vm) << PAGE_SHIFT,
> > > > +		rlimit(RLIMIT_MEMLOCK), ret ? "- exceeded" : "");
> > > 
> > > 
> > > 
> > > atomic64_read() returns "long", which matches "%ld", so why this change (and
> > > similar below)? You did not do this in the two pr_debug()s above anyway.
> > 
> > Unfortunately, architectures return inconsistent types for atomic64 ops.
> > 
> > Some return long (e..g. powerpc), some return long long (e.g. arc), and
> > some return s64 (e.g. x86).
> 
> Yes, Mark said it all, I'm just chiming in to confirm that's why I added the
> cast.
> 
> Btw, thanks for doing this, Mark.

What's the status of this patchset, btw?

I have a note here that
powerpc-mmu-drop-mmap_sem-now-that-locked_vm-is-atomic.patch is to be
updated.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] mm: change locked_vm's type from unsigned long to atomic64_t
  2019-04-16 23:33         ` Andrew Morton
@ 2019-04-22 15:54           ` Daniel Jordan
  0 siblings, 0 replies; 26+ messages in thread
From: Daniel Jordan @ 2019-04-22 15:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Daniel Jordan, Mark Rutland, Alexey Kardashevskiy, Alan Tull,
	Alex Williamson, Benjamin Herrenschmidt, Christoph Lameter,
	Davidlohr Bueso, Michael Ellerman, Moritz Fischer,
	Paul Mackerras, Wu Hao, linux-mm, kvm, kvm-ppc, linuxppc-dev,
	linux-fpga, linux-kernel

On Tue, Apr 16, 2019 at 04:33:51PM -0700, Andrew Morton wrote:

Sorry for the delay, I was on vacation all last week.

> What's the status of this patchset, btw?
> 
> I have a note here that
> powerpc-mmu-drop-mmap_sem-now-that-locked_vm-is-atomic.patch is to be
> updated.

Yes, the series needs a few updates.  v2 should appear in the next day or two.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/6] powerpc/mmu: drop mmap_sem now that locked_vm is atomic
  2019-04-03 16:40     ` Daniel Jordan
@ 2019-04-24  2:15       ` Davidlohr Bueso
  2019-04-24  2:31         ` Davidlohr Bueso
  2019-04-24 11:10         ` Jason Gunthorpe
  0 siblings, 2 replies; 26+ messages in thread
From: Davidlohr Bueso @ 2019-04-24  2:15 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Christophe Leroy, akpm, Alexey Kardashevskiy, linux-kernel,
	linux-mm, Paul Mackerras, Christoph Lameter, linuxppc-dev, jgg

On Wed, 03 Apr 2019, Daniel Jordan wrote:

>On Wed, Apr 03, 2019 at 06:58:45AM +0200, Christophe Leroy wrote:
>> Le 02/04/2019 à 22:41, Daniel Jordan a écrit :
>> > With locked_vm now an atomic, there is no need to take mmap_sem as
>> > writer.  Delete and refactor accordingly.
>>
>> Could you please detail the change?
>
>Ok, I'll be more specific in the next version, using some of your language in
>fact.  :)
>
>> It looks like this is not the only
>> change. I'm wondering what the consequences are.
>>
>> Before we did:
>> - lock
>> - calculate future value
>> - check the future value is acceptable
>> - update value if future value acceptable
>> - return error if future value non acceptable
>> - unlock
>>
>> Now we do:
>> - atomic update with future (possibly too high) value
>> - check the new value is acceptable
>> - atomic update back with older value if new value not acceptable and return
>> error
>>
>> So if a concurrent action wants to increase locked_vm with an acceptable
>> step while another one has temporarily set it too high, it will now fail.
>>
>> I think we should keep the previous approach and do a cmpxchg after
>> validating the new value.

Wouldn't the cmpxchg alternative also be exposed to locked_vm changing between
validating the new value and the cmpxchg()? We'd bogusly fail even when there
is still room, just because the value changed (I'm assuming we don't hold any
locks, otherwise all this is pointless).

   current_locked = atomic_read(&mm->locked_vm);
   new_locked = current_locked + npages;
   if (new_locked < lock_limit)
      if (cmpxchg(&mm->locked_vm, current_locked, new_locked) == current_locked)
      	 /* ENOMEM */

>
>That's a good idea, and especially worth doing considering that an arbitrary
>number of threads that charge a low amount of locked_vm can fail just because
>one thread charges lots of it.

Yeah but the window for this is quite small, I doubt it would be a real issue.

What if, before doing the atomic_add_return(), we first did the racy new_locked
check for ENOMEM, then did the speculative add and cleanup if necessary? This
would further reduce the scope of the window where false ENOMEM can occur.
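
Something like this untested sketch (ignoring CAP_IPC_LOCK, with lock_limit
in pages as in the patches):

   if (atomic64_read(&mm->locked_vm) + npages > lock_limit)
      return -ENOMEM;	/* racy early check */
   locked_vm = atomic64_add_return(npages, &mm->locked_vm);
   if (locked_vm > lock_limit) {
      atomic64_sub(npages, &mm->locked_vm);	/* unwind speculative add */
      return -ENOMEM;
   }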

>pinned_vm appears to be broken the same way, so I can fix it too unless someone
>beats me to it.

This should not be a surprise for the rdma folks. Cc'ing Jason nonetheless.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/6] powerpc/mmu: drop mmap_sem now that locked_vm is atomic
  2019-04-24  2:15       ` Davidlohr Bueso
@ 2019-04-24  2:31         ` Davidlohr Bueso
  2019-04-24 11:10         ` Jason Gunthorpe
  1 sibling, 0 replies; 26+ messages in thread
From: Davidlohr Bueso @ 2019-04-24  2:31 UTC (permalink / raw)
  To: Daniel Jordan, Christophe Leroy, akpm, Alexey Kardashevskiy,
	linux-kernel, linux-mm, Paul Mackerras, Christoph Lameter,
	linuxppc-dev, jgg

On Tue, 23 Apr 2019, Bueso wrote:

>On Wed, 03 Apr 2019, Daniel Jordan wrote:
>
>>On Wed, Apr 03, 2019 at 06:58:45AM +0200, Christophe Leroy wrote:
>>>Le 02/04/2019 à 22:41, Daniel Jordan a écrit :
>>>> With locked_vm now an atomic, there is no need to take mmap_sem as
>>>> writer.  Delete and refactor accordingly.
>>>
>>>Could you please detail the change?
>>
>>Ok, I'll be more specific in the next version, using some of your language in
>>fact.  :)
>>
>>>It looks like this is not the only
>>>change. I'm wondering what the consequences are.
>>>
>>>Before we did:
>>>- lock
>>>- calculate future value
>>>- check the future value is acceptable
>>>- update value if future value acceptable
>>>- return error if future value non acceptable
>>>- unlock
>>>
>>>Now we do:
>>>- atomic update with future (possibly too high) value
>>>- check the new value is acceptable
>>>- atomic update back with older value if new value not acceptable and return
>>>error
>>>
>>>So if a concurrent action wants to increase locked_vm with an acceptable
>>>step while another one has temporarily set it too high, it will now fail.
>>>
>>>I think we should keep the previous approach and do a cmpxchg after
>>>validating the new value.
>
>Wouldn't the cmpxchg alternative also be exposed to locked_vm changing between
>validating the new value and the cmpxchg()? We'd bogusly fail even when there
>is still room, just because the value changed (I'm assuming we don't hold any
>locks, otherwise all this is pointless).
>
>  current_locked = atomic_read(&mm->locked_vm);
>  new_locked = current_locked + npages;
>  if (new_locked < lock_limit)
>     if (cmpxchg(&mm->locked_vm, current_locked, new_locked) == current_locked)

Err, this being != of course.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/6] powerpc/mmu: drop mmap_sem now that locked_vm is atomic
  2019-04-24  2:15       ` Davidlohr Bueso
  2019-04-24  2:31         ` Davidlohr Bueso
@ 2019-04-24 11:10         ` Jason Gunthorpe
  2019-04-25  1:47           ` Daniel Jordan
  1 sibling, 1 reply; 26+ messages in thread
From: Jason Gunthorpe @ 2019-04-24 11:10 UTC (permalink / raw)
  To: Daniel Jordan, Christophe Leroy, akpm, Alexey Kardashevskiy,
	linux-kernel, linux-mm, Paul Mackerras, Christoph Lameter,
	linuxppc-dev

On Tue, Apr 23, 2019 at 07:15:44PM -0700, Davidlohr Bueso wrote:
> On Wed, 03 Apr 2019, Daniel Jordan wrote:
> 
> > On Wed, Apr 03, 2019 at 06:58:45AM +0200, Christophe Leroy wrote:
> > > Le 02/04/2019 à 22:41, Daniel Jordan a écrit :
> > > > With locked_vm now an atomic, there is no need to take mmap_sem as
> > > > writer.  Delete and refactor accordingly.
> > > 
> > > Could you please detail the change?
> > 
> > Ok, I'll be more specific in the next version, using some of your language in
> > fact.  :)
> > 
> > > It looks like this is not the only
> > > change. I'm wondering what the consequences are.
> > > 
> > > Before we did:
> > > - lock
> > > - calculate future value
> > > - check the future value is acceptable
> > > - update value if future value acceptable
> > > - return error if future value non acceptable
> > > - unlock
> > > 
> > > Now we do:
> > > - atomic update with future (possibly too high) value
> > > - check the new value is acceptable
> > > - atomic update back with older value if new value not acceptable and return
> > > error
> > > 
> > > So if a concurrent action wants to increase locked_vm with an acceptable
> > > step while another one has temporarily set it too high, it will now fail.
> > > 
> > > I think we should keep the previous approach and do a cmpxchg after
> > > validating the new value.
> 
> Wouldn't the cmpxchg alternative also be exposed to locked_vm changing between
> validating the new value and the cmpxchg()? We'd bogusly fail even when there
> is still room, just because the value changed (I'm assuming we don't hold any
> locks, otherwise all this is pointless).
> 
>   current_locked = atomic_read(&mm->locked_vm);
>   new_locked = current_locked + npages;
>   if (new_locked < lock_limit)
>      if (cmpxchg(&mm->locked_vm, current_locked, new_locked) == current_locked)
>      	 /* ENOMEM */

Well it needs a loop..

again:
   current_locked = atomic_read(&mm->locked_vm);
   new_locked = current_locked + npages;
   if (new_locked < lock_limit)
      if (cmpxchg(&mm->locked_vm, current_locked, new_locked) != current_locked)
            goto again;

So it won't have bogus failures as there is no unwind after
error. Basically this is a load locked/store conditional style of
locking pattern.

> > That's a good idea, and especially worth doing considering that an arbitrary
> > number of threads that charge a low amount of locked_vm can fail just because
> > one thread charges lots of it.
> 
> Yeah but the window for this is quite small, I doubt it would be a real issue.
> 
> What if, before doing the atomic_add_return(), we first did the racy new_locked
> check for ENOMEM, then did the speculative add and cleanup if necessary? This
> would further reduce the scope of the window where false ENOMEM can occur.
> 
> > pinned_vm appears to be broken the same way, so I can fix it too unless someone
> > beats me to it.
> 
> This should not be a surprise for the rdma folks. Cc'ing Jason nonetheless.

I think we accepted this tiny race as a side effect of removing the
lock, which was very beneficial. Really the time window between the
atomic failing and unwind is very small, and there are enough other
ways a hostile user could DOS locked_vm that I don't think it really
matters in practice..

However, the cmpxchg seems better, so a helper to implement that would
probably be the best thing to do.

Jason

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/6] powerpc/mmu: drop mmap_sem now that locked_vm is atomic
  2019-04-24 11:10         ` Jason Gunthorpe
@ 2019-04-25  1:47           ` Daniel Jordan
  0 siblings, 0 replies; 26+ messages in thread
From: Daniel Jordan @ 2019-04-25  1:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Daniel Jordan, Christophe Leroy, akpm, Alexey Kardashevskiy,
	linux-kernel, linux-mm, Paul Mackerras, Christoph Lameter,
	linuxppc-dev

On Wed, Apr 24, 2019 at 11:10:24AM +0000, Jason Gunthorpe wrote:
> On Tue, Apr 23, 2019 at 07:15:44PM -0700, Davidlohr Bueso wrote:
> > Wouldn't the cmpxchg alternative also be exposed to locked_vm changing between
> > validating the new value and the cmpxchg()? We'd bogusly fail even when there
> > is still room, just because the value changed (I'm assuming we don't hold any
> > locks, otherwise all this is pointless).

That's true, I hadn't considered that we could retry even when there's enough
locked_vm.  Seems like another one is that RLIMIT_MEMLOCK could change after
it's read.  I guess nothing's going to be perfect.  :/

> Well it needs a loop..
> 
> again:
>    current_locked = atomic_read(&mm->locked_vm);
>    new_locked = current_locked + npages;
>    if (new_locked < lock_limit)
>       if (cmpxchg(&mm->locked_vm, current_locked, new_locked) != current_locked)
>             goto again;
> 
> So it won't have bogus failures as there is no unwind after
> error. Basically this is a load locked/store conditional style of
> locking pattern.

This is basically what I have so far.

> > > That's a good idea, and especially worth doing considering that an arbitrary
> > > number of threads that charge a low amount of locked_vm can fail just because
> > > one thread charges lots of it.
> > 
> > Yeah but the window for this is quite small, I doubt it would be a real issue.
>
> > What if, before doing the atomic_add_return(), we first did the racy new_locked
> > check for ENOMEM, then did the speculative add and cleanup if necessary? This
> > would further reduce the scope of the window where false ENOMEM can occur.

So the upside of this is that there's no retry loop so tasks don't spin under
heavy contention?  Seems better to always guard against false ENOMEM, at least
from the locked_vm side if not from the rlimit changing.

> > > pinned_vm appears to be broken the same way, so I can fix it too unless someone
> > > beats me to it.
> > 
> > This should not be a surprise for the rdma folks. Cc'ing Jason nonetheless.
> 
> I think we accepted this tiny race as a side effect of removing the
> lock, which was very beneficial. Really the time window between the
> atomic failing and unwind is very small, and there are enough other
> ways a hostile user could DOS locked_vm that I don't think it really
> matters in practice..
> 
> However, the cmpxchg seems better, so a helper to implement that would
> probably be the best thing to do.

I've collapsed all the locked_vm users into such a helper and am now working on
converting the pinned_vm users to the same helper.  Taking longer than I
thought.

^ permalink raw reply	[flat|nested] 26+ messages in thread
