linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] mremap: move_vma() fixes
@ 2020-10-13  1:34 Dmitry Safonov
  2020-10-13  1:34 ` [PATCH 1/6] mm/mremap: Account memory on do_munmap() failure Dmitry Safonov
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Dmitry Safonov @ 2020-10-13  1:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Dmitry Safonov, Alexander Viro, Andrew Morton,
	Andy Lutomirski, Brian Geffon, Catalin Marinas, Dan Williams,
	Dave Jiang, Hugh Dickins, Ingo Molnar, Kirill A. Shutemov,
	Mike Kravetz, Minchan Kim, Russell King, Thomas Bogendoerfer,
	Thomas Gleixner, Vishal Verma, Vlastimil Babka, Will Deacon,
	linux-aio, linux-fsdevel, linux-mm

1 - seems to be historical issue on a rarely taken path
2,3 - fixes related to the new mremap() flag
5 - dax device/hugetlbfs possible issue

4,6 - refactoring

As those seems to be actual issues, sending this during the merge-window.

(Changes to architecture code are in the 6 patch, but Cc'ing
maintainers on cover for the context, I hope it's fine).

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>

Cc: linux-aio@kvack.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org

Dmitry Safonov (6):
  mm/mremap: Account memory on do_munmap() failure
  mm/mremap: For MREMAP_DONTUNMAP check security_vm_enough_memory_mm()
  mremap: Don't allow MREMAP_DONTUNMAP on special_mappings and aio
  vm_ops: Rename .split() callback to .may_split()
  mremap: Check if it's possible to split original vma
  mm: Forbid splitting special mappings

 arch/arm/kernel/vdso.c                    |  9 ----
 arch/arm64/kernel/vdso.c                  | 41 ++-----------------
 arch/mips/vdso/genvdso.c                  |  4 --
 arch/s390/kernel/vdso.c                   | 11 +----
 arch/x86/entry/vdso/vma.c                 | 17 --------
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c |  2 +-
 drivers/dax/device.c                      |  4 +-
 fs/aio.c                                  |  5 ++-
 include/linux/mm.h                        |  5 ++-
 ipc/shm.c                                 |  8 ++--
 mm/hugetlb.c                              |  2 +-
 mm/mmap.c                                 | 22 ++++++++--
 mm/mremap.c                               | 50 +++++++++++------------
 13 files changed, 63 insertions(+), 117 deletions(-)


base-commit: bbf5c979011a099af5dc76498918ed7df445635b
-- 
2.28.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/6] mm/mremap: Account memory on do_munmap() failure
  2020-10-13  1:34 [PATCH 0/6] mremap: move_vma() fixes Dmitry Safonov
@ 2020-10-13  1:34 ` Dmitry Safonov
  2020-10-13  1:34 ` [PATCH 2/6] mm/mremap: For MREMAP_DONTUNMAP check security_vm_enough_memory_mm() Dmitry Safonov
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Dmitry Safonov @ 2020-10-13  1:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Dmitry Safonov, Alexander Viro, Andrew Morton,
	Andy Lutomirski, Brian Geffon, Catalin Marinas, Dan Williams,
	Dave Jiang, Hugh Dickins, Ingo Molnar, Kirill A. Shutemov,
	Mike Kravetz, Minchan Kim, Russell King, Thomas Bogendoerfer,
	Thomas Gleixner, Vishal Verma, Vlastimil Babka, Will Deacon,
	linux-aio, linux-fsdevel, linux-mm

move_vma() copies VMA without adding it to account, then unmaps old part
of VMA. On failure it unmaps the new VMA. With hacks accounting in
munmap is disabled as it's a copy of existing VMA.

Account the memory on munmap() failure which was previously copied into
a new VMA.

Fixes: commit e2ea83742133 ("[PATCH] mremap: move_vma fixes and cleanup")
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 mm/mremap.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index 138abbae4f75..03d31a0d4c67 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -450,7 +450,8 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 
 	if (do_munmap(mm, old_addr, old_len, uf_unmap) < 0) {
 		/* OOM: unable to split vma, just get accounts right */
-		vm_unacct_memory(excess >> PAGE_SHIFT);
+		if (vm_flags & VM_ACCOUNT)
+			vm_acct_memory(new_len >> PAGE_SHIFT);
 		excess = 0;
 	}
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/6] mm/mremap: For MREMAP_DONTUNMAP check security_vm_enough_memory_mm()
  2020-10-13  1:34 [PATCH 0/6] mremap: move_vma() fixes Dmitry Safonov
  2020-10-13  1:34 ` [PATCH 1/6] mm/mremap: Account memory on do_munmap() failure Dmitry Safonov
@ 2020-10-13  1:34 ` Dmitry Safonov
  2020-12-28 18:21   ` Brian Geffon
  2020-10-13  1:34 ` [PATCH 3/6] mremap: Don't allow MREMAP_DONTUNMAP on special_mappings and aio Dmitry Safonov
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Dmitry Safonov @ 2020-10-13  1:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Dmitry Safonov, Alexander Viro, Andrew Morton,
	Andy Lutomirski, Brian Geffon, Catalin Marinas, Dan Williams,
	Dave Jiang, Hugh Dickins, Ingo Molnar, Kirill A. Shutemov,
	Mike Kravetz, Minchan Kim, Russell King, Thomas Bogendoerfer,
	Thomas Gleixner, Vishal Verma, Vlastimil Babka, Will Deacon,
	linux-aio, linux-fsdevel, linux-mm

Currently memory is accounted post-mremap() with MREMAP_DONTUNMAP, which
may break overcommit policy. So, check if there's enough memory before
doing actual VMA copy.

Don't unset VM_ACCOUNT on MREMAP_DONTUNMAP. By semantics, such mremap()
is actually a memory allocation. That also simplifies the error-path a
little.

Also, as it's memory allocation on success don't reset hiwater_vm value.

Fixes: commit e346b3813067 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()")
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 mm/mremap.c | 36 +++++++++++++-----------------------
 1 file changed, 13 insertions(+), 23 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index 03d31a0d4c67..c248f9a52125 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -365,11 +365,19 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 	if (err)
 		return err;
 
+	if (unlikely(flags & MREMAP_DONTUNMAP && vm_flags & VM_ACCOUNT)) {
+		if (security_vm_enough_memory_mm(mm, new_len >> PAGE_SHIFT))
+			return -ENOMEM;
+	}
+
 	new_pgoff = vma->vm_pgoff + ((old_addr - vma->vm_start) >> PAGE_SHIFT);
 	new_vma = copy_vma(&vma, new_addr, new_len, new_pgoff,
 			   &need_rmap_locks);
-	if (!new_vma)
+	if (!new_vma) {
+		if (unlikely(flags & MREMAP_DONTUNMAP && vm_flags & VM_ACCOUNT))
+			vm_unacct_memory(new_len >> PAGE_SHIFT);
 		return -ENOMEM;
+	}
 
 	moved_len = move_page_tables(vma, old_addr, new_vma, new_addr, old_len,
 				     need_rmap_locks);
@@ -398,7 +406,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 	}
 
 	/* Conceal VM_ACCOUNT so old reservation is not undone */
-	if (vm_flags & VM_ACCOUNT) {
+	if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP)) {
 		vma->vm_flags &= ~VM_ACCOUNT;
 		excess = vma->vm_end - vma->vm_start - old_len;
 		if (old_addr > vma->vm_start &&
@@ -423,34 +431,16 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 		untrack_pfn_moved(vma);
 
 	if (unlikely(!err && (flags & MREMAP_DONTUNMAP))) {
-		if (vm_flags & VM_ACCOUNT) {
-			/* Always put back VM_ACCOUNT since we won't unmap */
-			vma->vm_flags |= VM_ACCOUNT;
-
-			vm_acct_memory(new_len >> PAGE_SHIFT);
-		}
-
-		/*
-		 * VMAs can actually be merged back together in copy_vma
-		 * calling merge_vma. This can happen with anonymous vmas
-		 * which have not yet been faulted, so if we were to consider
-		 * this VMA split we'll end up adding VM_ACCOUNT on the
-		 * next VMA, which is completely unrelated if this VMA
-		 * was re-merged.
-		 */
-		if (split && new_vma == vma)
-			split = 0;
-
 		/* We always clear VM_LOCKED[ONFAULT] on the old vma */
 		vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
 
 		/* Because we won't unmap we don't need to touch locked_vm */
-		goto out;
+		return new_addr;
 	}
 
 	if (do_munmap(mm, old_addr, old_len, uf_unmap) < 0) {
 		/* OOM: unable to split vma, just get accounts right */
-		if (vm_flags & VM_ACCOUNT)
+		if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP))
 			vm_acct_memory(new_len >> PAGE_SHIFT);
 		excess = 0;
 	}
@@ -459,7 +449,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 		mm->locked_vm += new_len >> PAGE_SHIFT;
 		*locked = true;
 	}
-out:
+
 	mm->hiwater_vm = hiwater_vm;
 
 	/* Restore VM_ACCOUNT if one or two pieces of vma left */
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/6] mremap: Don't allow MREMAP_DONTUNMAP on special_mappings and aio
  2020-10-13  1:34 [PATCH 0/6] mremap: move_vma() fixes Dmitry Safonov
  2020-10-13  1:34 ` [PATCH 1/6] mm/mremap: Account memory on do_munmap() failure Dmitry Safonov
  2020-10-13  1:34 ` [PATCH 2/6] mm/mremap: For MREMAP_DONTUNMAP check security_vm_enough_memory_mm() Dmitry Safonov
@ 2020-10-13  1:34 ` Dmitry Safonov
  2020-12-28 18:03   ` Brian Geffon
  2020-10-13  1:34 ` [PATCH 4/6] vm_ops: Rename .split() callback to .may_split() Dmitry Safonov
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Dmitry Safonov @ 2020-10-13  1:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Dmitry Safonov, Alexander Viro, Andrew Morton,
	Andy Lutomirski, Brian Geffon, Catalin Marinas, Dan Williams,
	Dave Jiang, Hugh Dickins, Ingo Molnar, Kirill A. Shutemov,
	Mike Kravetz, Minchan Kim, Russell King, Thomas Bogendoerfer,
	Thomas Gleixner, Vishal Verma, Vlastimil Babka, Will Deacon,
	linux-aio, linux-fsdevel, linux-mm

As kernel expect to see only one of such mappings, any further
operations on the VMA-copy may be unexpected by the kernel.
Maybe it's being on the safe side, but there doesn't seem to be any
expected use-case for this, so restrict it now.

Fixes: commit e346b3813067 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()")
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2 +-
 fs/aio.c                                  | 5 ++++-
 include/linux/mm.h                        | 2 +-
 mm/mmap.c                                 | 6 +++++-
 mm/mremap.c                               | 2 +-
 5 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 0daf2f1cf7a8..e916646adc69 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -1458,7 +1458,7 @@ static int pseudo_lock_dev_release(struct inode *inode, struct file *filp)
 	return 0;
 }
 
-static int pseudo_lock_dev_mremap(struct vm_area_struct *area)
+static int pseudo_lock_dev_mremap(struct vm_area_struct *area, unsigned long flags)
 {
 	/* Not supported */
 	return -EINVAL;
diff --git a/fs/aio.c b/fs/aio.c
index d5ec30385566..3be3c0f77548 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -324,13 +324,16 @@ static void aio_free_ring(struct kioctx *ctx)
 	}
 }
 
-static int aio_ring_mremap(struct vm_area_struct *vma)
+static int aio_ring_mremap(struct vm_area_struct *vma, unsigned long flags)
 {
 	struct file *file = vma->vm_file;
 	struct mm_struct *mm = vma->vm_mm;
 	struct kioctx_table *table;
 	int i, res = -EINVAL;
 
+	if (flags & MREMAP_DONTUNMAP)
+		return -EINVAL;
+
 	spin_lock(&mm->ioctx_lock);
 	rcu_read_lock();
 	table = rcu_dereference(mm->ioctx_table);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 16b799a0522c..fd51a4a1f722 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -550,7 +550,7 @@ struct vm_operations_struct {
 	void (*open)(struct vm_area_struct * area);
 	void (*close)(struct vm_area_struct * area);
 	int (*split)(struct vm_area_struct * area, unsigned long addr);
-	int (*mremap)(struct vm_area_struct * area);
+	int (*mremap)(struct vm_area_struct *area, unsigned long flags);
 	vm_fault_t (*fault)(struct vm_fault *vmf);
 	vm_fault_t (*huge_fault)(struct vm_fault *vmf,
 			enum page_entry_size pe_size);
diff --git a/mm/mmap.c b/mm/mmap.c
index bdd19f5b994e..50f853b0ec39 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3372,10 +3372,14 @@ static const char *special_mapping_name(struct vm_area_struct *vma)
 	return ((struct vm_special_mapping *)vma->vm_private_data)->name;
 }
 
-static int special_mapping_mremap(struct vm_area_struct *new_vma)
+static int special_mapping_mremap(struct vm_area_struct *new_vma,
+				  unsigned long flags)
 {
 	struct vm_special_mapping *sm = new_vma->vm_private_data;
 
+	if (flags & MREMAP_DONTUNMAP)
+		return -EINVAL;
+
 	if (WARN_ON_ONCE(current->mm != new_vma->vm_mm))
 		return -EFAULT;
 
diff --git a/mm/mremap.c b/mm/mremap.c
index c248f9a52125..898e9818ba6d 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -384,7 +384,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 	if (moved_len < old_len) {
 		err = -ENOMEM;
 	} else if (vma->vm_ops && vma->vm_ops->mremap) {
-		err = vma->vm_ops->mremap(new_vma);
+		err = vma->vm_ops->mremap(new_vma, flags);
 	}
 
 	if (unlikely(err)) {
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/6] vm_ops: Rename .split() callback to .may_split()
  2020-10-13  1:34 [PATCH 0/6] mremap: move_vma() fixes Dmitry Safonov
                   ` (2 preceding siblings ...)
  2020-10-13  1:34 ` [PATCH 3/6] mremap: Don't allow MREMAP_DONTUNMAP on special_mappings and aio Dmitry Safonov
@ 2020-10-13  1:34 ` Dmitry Safonov
  2020-10-13  1:34 ` [PATCH 5/6] mremap: Check if it's possible to split original vma Dmitry Safonov
  2020-10-13  1:34 ` [PATCH 6/6] mm: Forbid splitting special mappings Dmitry Safonov
  5 siblings, 0 replies; 16+ messages in thread
From: Dmitry Safonov @ 2020-10-13  1:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Dmitry Safonov, Alexander Viro, Andrew Morton,
	Andy Lutomirski, Brian Geffon, Catalin Marinas, Dan Williams,
	Dave Jiang, Hugh Dickins, Ingo Molnar, Kirill A. Shutemov,
	Mike Kravetz, Minchan Kim, Russell King, Thomas Bogendoerfer,
	Thomas Gleixner, Vishal Verma, Vlastimil Babka, Will Deacon,
	linux-aio, linux-fsdevel, linux-mm

Rename the callback to reflect that it's not called *on* or *after*
split, but rather some time before the splitting to check if it's
possible.

Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 drivers/dax/device.c | 4 ++--
 include/linux/mm.h   | 3 ++-
 ipc/shm.c            | 8 ++++----
 mm/hugetlb.c         | 2 +-
 mm/mmap.c            | 4 ++--
 5 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 1e89513f3c59..223dd1d13d5c 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -276,7 +276,7 @@ static vm_fault_t dev_dax_fault(struct vm_fault *vmf)
 	return dev_dax_huge_fault(vmf, PE_SIZE_PTE);
 }
 
-static int dev_dax_split(struct vm_area_struct *vma, unsigned long addr)
+static int dev_dax_may_split(struct vm_area_struct *vma, unsigned long addr)
 {
 	struct file *filp = vma->vm_file;
 	struct dev_dax *dev_dax = filp->private_data;
@@ -299,7 +299,7 @@ static unsigned long dev_dax_pagesize(struct vm_area_struct *vma)
 static const struct vm_operations_struct dax_vm_ops = {
 	.fault = dev_dax_fault,
 	.huge_fault = dev_dax_huge_fault,
-	.split = dev_dax_split,
+	.may_split = dev_dax_may_split,
 	.pagesize = dev_dax_pagesize,
 };
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index fd51a4a1f722..90887661ef44 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -549,7 +549,8 @@ enum page_entry_size {
 struct vm_operations_struct {
 	void (*open)(struct vm_area_struct * area);
 	void (*close)(struct vm_area_struct * area);
-	int (*split)(struct vm_area_struct * area, unsigned long addr);
+	/* Called any time before splitting to check if it's allowed */
+	int (*may_split)(struct vm_area_struct *area, unsigned long addr);
 	int (*mremap)(struct vm_area_struct *area, unsigned long flags);
 	vm_fault_t (*fault)(struct vm_fault *vmf);
 	vm_fault_t (*huge_fault)(struct vm_fault *vmf,
diff --git a/ipc/shm.c b/ipc/shm.c
index e25c7c6106bc..febd88daba8c 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -434,13 +434,13 @@ static vm_fault_t shm_fault(struct vm_fault *vmf)
 	return sfd->vm_ops->fault(vmf);
 }
 
-static int shm_split(struct vm_area_struct *vma, unsigned long addr)
+static int shm_may_split(struct vm_area_struct *vma, unsigned long addr)
 {
 	struct file *file = vma->vm_file;
 	struct shm_file_data *sfd = shm_file_data(file);
 
-	if (sfd->vm_ops->split)
-		return sfd->vm_ops->split(vma, addr);
+	if (sfd->vm_ops->may_split)
+		return sfd->vm_ops->may_split(vma, addr);
 
 	return 0;
 }
@@ -582,7 +582,7 @@ static const struct vm_operations_struct shm_vm_ops = {
 	.open	= shm_open,	/* callback for a new vm-area open */
 	.close	= shm_close,	/* callback for when the vm-area is released */
 	.fault	= shm_fault,
-	.split	= shm_split,
+	.may_split = shm_may_split,
 	.pagesize = shm_pagesize,
 #if defined(CONFIG_NUMA)
 	.set_policy = shm_set_policy,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 67fc6383995b..c4dc0e453be1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3759,7 +3759,7 @@ const struct vm_operations_struct hugetlb_vm_ops = {
 	.fault = hugetlb_vm_op_fault,
 	.open = hugetlb_vm_op_open,
 	.close = hugetlb_vm_op_close,
-	.split = hugetlb_vm_op_split,
+	.may_split = hugetlb_vm_op_split,
 	.pagesize = hugetlb_vm_op_pagesize,
 };
 
diff --git a/mm/mmap.c b/mm/mmap.c
index 50f853b0ec39..a62cb3ccafce 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2693,8 +2693,8 @@ int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
 	struct vm_area_struct *new;
 	int err;
 
-	if (vma->vm_ops && vma->vm_ops->split) {
-		err = vma->vm_ops->split(vma, addr);
+	if (vma->vm_ops && vma->vm_ops->may_split) {
+		err = vma->vm_ops->may_split(vma, addr);
 		if (err)
 			return err;
 	}
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 5/6] mremap: Check if it's possible to split original vma
  2020-10-13  1:34 [PATCH 0/6] mremap: move_vma() fixes Dmitry Safonov
                   ` (3 preceding siblings ...)
  2020-10-13  1:34 ` [PATCH 4/6] vm_ops: Rename .split() callback to .may_split() Dmitry Safonov
@ 2020-10-13  1:34 ` Dmitry Safonov
  2020-10-13  1:34 ` [PATCH 6/6] mm: Forbid splitting special mappings Dmitry Safonov
  5 siblings, 0 replies; 16+ messages in thread
From: Dmitry Safonov @ 2020-10-13  1:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Dmitry Safonov, Alexander Viro, Andrew Morton,
	Andy Lutomirski, Brian Geffon, Catalin Marinas, Dan Williams,
	Dave Jiang, Hugh Dickins, Ingo Molnar, Kirill A. Shutemov,
	Mike Kravetz, Minchan Kim, Russell King, Thomas Bogendoerfer,
	Thomas Gleixner, Vishal Verma, Vlastimil Babka, Will Deacon,
	linux-aio, linux-fsdevel, linux-mm

If original VMA can't be split at the desired address, do_munmap() will
fail and leave both new-copied VMA and old VMA. De-facto it's
MREMAP_DONTUNMAP behaviour, which is unexpected.

Currently, it may fail such way for hugetlbfs and dax device mappings.

Minimize such unpleasant situations to OOM by checking .may_split()
before attempting to create a VMA copy.

Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 mm/mremap.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index 898e9818ba6d..3c4047c23673 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -343,7 +343,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 	unsigned long excess = 0;
 	unsigned long hiwater_vm;
 	int split = 0;
-	int err;
+	int err = 0;
 	bool need_rmap_locks;
 
 	/*
@@ -353,6 +353,15 @@ static unsigned long move_vma(struct vm_area_struct *vma,
 	if (mm->map_count >= sysctl_max_map_count - 3)
 		return -ENOMEM;
 
+	if (vma->vm_ops && vma->vm_ops->may_split) {
+		if (vma->vm_start != old_addr)
+			err = vma->vm_ops->may_split(vma, old_addr);
+		if (!err && vma->vm_end != old_addr + old_len)
+			err = vma->vm_ops->may_split(vma, old_addr + old_len);
+		if (err)
+			return err;
+	}
+
 	/*
 	 * Advise KSM to break any KSM pages in the area to be moved:
 	 * it would be confusing if they were to turn up at the new
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 6/6] mm: Forbid splitting special mappings
  2020-10-13  1:34 [PATCH 0/6] mremap: move_vma() fixes Dmitry Safonov
                   ` (4 preceding siblings ...)
  2020-10-13  1:34 ` [PATCH 5/6] mremap: Check if it's possible to split original vma Dmitry Safonov
@ 2020-10-13  1:34 ` Dmitry Safonov
  2021-01-22 12:58   ` Will Deacon
  5 siblings, 1 reply; 16+ messages in thread
From: Dmitry Safonov @ 2020-10-13  1:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Dmitry Safonov, Alexander Viro, Andrew Morton,
	Andy Lutomirski, Brian Geffon, Catalin Marinas, Dan Williams,
	Dave Jiang, Hugh Dickins, Ingo Molnar, Kirill A. Shutemov,
	Mike Kravetz, Minchan Kim, Russell King, Thomas Bogendoerfer,
	Thomas Gleixner, Vishal Verma, Vlastimil Babka, Will Deacon,
	linux-aio, linux-fsdevel, linux-mm

Don't allow splitting of vm_special_mapping's.
It affects vdso/vvar areas. Uprobes have only one page in xol_area so
they aren't affected.

Those restrictions were enforced by checks in .mremap() callbacks.
Restrict resizing with generic .split() callback.

Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 arch/arm/kernel/vdso.c    |  9 ---------
 arch/arm64/kernel/vdso.c  | 41 +++------------------------------------
 arch/mips/vdso/genvdso.c  |  4 ----
 arch/s390/kernel/vdso.c   | 11 +----------
 arch/x86/entry/vdso/vma.c | 17 ----------------
 mm/mmap.c                 | 12 ++++++++++++
 6 files changed, 16 insertions(+), 78 deletions(-)

diff --git a/arch/arm/kernel/vdso.c b/arch/arm/kernel/vdso.c
index fddd08a6e063..3408269d19c7 100644
--- a/arch/arm/kernel/vdso.c
+++ b/arch/arm/kernel/vdso.c
@@ -50,15 +50,6 @@ static const struct vm_special_mapping vdso_data_mapping = {
 static int vdso_mremap(const struct vm_special_mapping *sm,
 		struct vm_area_struct *new_vma)
 {
-	unsigned long new_size = new_vma->vm_end - new_vma->vm_start;
-	unsigned long vdso_size;
-
-	/* without VVAR page */
-	vdso_size = (vdso_total_pages - 1) << PAGE_SHIFT;
-
-	if (vdso_size != new_size)
-		return -EINVAL;
-
 	current->mm->context.vdso = new_vma->vm_start;
 
 	return 0;
diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index d4202a32abc9..a1a4220a405b 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -82,17 +82,9 @@ static union {
 } vdso_data_store __page_aligned_data;
 struct vdso_data *vdso_data = vdso_data_store.data;
 
-static int __vdso_remap(enum vdso_abi abi,
-			const struct vm_special_mapping *sm,
-			struct vm_area_struct *new_vma)
+static int vdso_mremap(const struct vm_special_mapping *sm,
+		struct vm_area_struct *new_vma)
 {
-	unsigned long new_size = new_vma->vm_end - new_vma->vm_start;
-	unsigned long vdso_size = vdso_info[abi].vdso_code_end -
-				  vdso_info[abi].vdso_code_start;
-
-	if (vdso_size != new_size)
-		return -EINVAL;
-
 	current->mm->context.vdso = (void *)new_vma->vm_start;
 
 	return 0;
@@ -223,17 +215,6 @@ static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
 	return vmf_insert_pfn(vma, vmf->address, pfn);
 }
 
-static int vvar_mremap(const struct vm_special_mapping *sm,
-		       struct vm_area_struct *new_vma)
-{
-	unsigned long new_size = new_vma->vm_end - new_vma->vm_start;
-
-	if (new_size != VVAR_NR_PAGES * PAGE_SIZE)
-		return -EINVAL;
-
-	return 0;
-}
-
 static int __setup_additional_pages(enum vdso_abi abi,
 				    struct mm_struct *mm,
 				    struct linux_binprm *bprm,
@@ -284,14 +265,6 @@ static int __setup_additional_pages(enum vdso_abi abi,
 /*
  * Create and map the vectors page for AArch32 tasks.
  */
-#ifdef CONFIG_COMPAT_VDSO
-static int aarch32_vdso_mremap(const struct vm_special_mapping *sm,
-		struct vm_area_struct *new_vma)
-{
-	return __vdso_remap(VDSO_ABI_AA32, sm, new_vma);
-}
-#endif /* CONFIG_COMPAT_VDSO */
-
 enum aarch32_map {
 	AA32_MAP_VECTORS, /* kuser helpers */
 #ifdef CONFIG_COMPAT_VDSO
@@ -313,11 +286,10 @@ static struct vm_special_mapping aarch32_vdso_maps[] = {
 	[AA32_MAP_VVAR] = {
 		.name = "[vvar]",
 		.fault = vvar_fault,
-		.mremap = vvar_mremap,
 	},
 	[AA32_MAP_VDSO] = {
 		.name = "[vdso]",
-		.mremap = aarch32_vdso_mremap,
+		.mremap = vdso_mremap,
 	},
 #endif /* CONFIG_COMPAT_VDSO */
 	[AA32_MAP_SIGPAGE] = {
@@ -465,12 +437,6 @@ int aarch32_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 }
 #endif /* CONFIG_COMPAT */
 
-static int vdso_mremap(const struct vm_special_mapping *sm,
-		struct vm_area_struct *new_vma)
-{
-	return __vdso_remap(VDSO_ABI_AA64, sm, new_vma);
-}
-
 enum aarch64_map {
 	AA64_MAP_VVAR,
 	AA64_MAP_VDSO,
@@ -480,7 +446,6 @@ static struct vm_special_mapping aarch64_vdso_maps[] __ro_after_init = {
 	[AA64_MAP_VVAR] = {
 		.name	= "[vvar]",
 		.fault = vvar_fault,
-		.mremap = vvar_mremap,
 	},
 	[AA64_MAP_VDSO] = {
 		.name	= "[vdso]",
diff --git a/arch/mips/vdso/genvdso.c b/arch/mips/vdso/genvdso.c
index abb06ae04b40..09e30eb4be86 100644
--- a/arch/mips/vdso/genvdso.c
+++ b/arch/mips/vdso/genvdso.c
@@ -263,10 +263,6 @@ int main(int argc, char **argv)
 	fprintf(out_file, "	const struct vm_special_mapping *sm,\n");
 	fprintf(out_file, "	struct vm_area_struct *new_vma)\n");
 	fprintf(out_file, "{\n");
-	fprintf(out_file, "	unsigned long new_size =\n");
-	fprintf(out_file, "	new_vma->vm_end - new_vma->vm_start;\n");
-	fprintf(out_file, "	if (vdso_image.size != new_size)\n");
-	fprintf(out_file, "		return -EINVAL;\n");
 	fprintf(out_file, "	current->mm->context.vdso =\n");
 	fprintf(out_file, "	(void *)(new_vma->vm_start);\n");
 	fprintf(out_file, "	return 0;\n");
diff --git a/arch/s390/kernel/vdso.c b/arch/s390/kernel/vdso.c
index c4baefaa6e34..291ead792d64 100644
--- a/arch/s390/kernel/vdso.c
+++ b/arch/s390/kernel/vdso.c
@@ -59,17 +59,8 @@ static vm_fault_t vdso_fault(const struct vm_special_mapping *sm,
 static int vdso_mremap(const struct vm_special_mapping *sm,
 		       struct vm_area_struct *vma)
 {
-	unsigned long vdso_pages;
-
-	vdso_pages = vdso64_pages;
-
-	if ((vdso_pages << PAGE_SHIFT) != vma->vm_end - vma->vm_start)
-		return -EINVAL;
-
-	if (WARN_ON_ONCE(current->mm != vma->vm_mm))
-		return -EFAULT;
-
 	current->mm->context.vdso_base = vma->vm_start;
+
 	return 0;
 }
 
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 9185cb1d13b9..0c942b31825d 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -89,30 +89,14 @@ static void vdso_fix_landing(const struct vdso_image *image,
 static int vdso_mremap(const struct vm_special_mapping *sm,
 		struct vm_area_struct *new_vma)
 {
-	unsigned long new_size = new_vma->vm_end - new_vma->vm_start;
 	const struct vdso_image *image = current->mm->context.vdso_image;
 
-	if (image->size != new_size)
-		return -EINVAL;
-
 	vdso_fix_landing(image, new_vma);
 	current->mm->context.vdso = (void __user *)new_vma->vm_start;
 
 	return 0;
 }
 
-static int vvar_mremap(const struct vm_special_mapping *sm,
-		struct vm_area_struct *new_vma)
-{
-	const struct vdso_image *image = new_vma->vm_mm->context.vdso_image;
-	unsigned long new_size = new_vma->vm_end - new_vma->vm_start;
-
-	if (new_size != -image->sym_vvar_start)
-		return -EINVAL;
-
-	return 0;
-}
-
 #ifdef CONFIG_TIME_NS
 static struct page *find_timens_vvar_page(struct vm_area_struct *vma)
 {
@@ -252,7 +236,6 @@ static const struct vm_special_mapping vdso_mapping = {
 static const struct vm_special_mapping vvar_mapping = {
 	.name = "[vvar]",
 	.fault = vvar_fault,
-	.mremap = vvar_mremap,
 };
 
 /*
diff --git a/mm/mmap.c b/mm/mmap.c
index a62cb3ccafce..41100f6505ff 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3389,6 +3389,17 @@ static int special_mapping_mremap(struct vm_area_struct *new_vma,
 	return 0;
 }
 
+static int special_mapping_split(struct vm_area_struct *vma, unsigned long addr)
+{
+	/*
+	 * Forbid splitting special mappings - kernel has expectations over
+	 * the number of pages in mapping. Together with VM_DONTEXPAND
+	 * the size of vma should stay the same over the special mapping's
+	 * lifetime.
+	 */
+	return -EINVAL;
+}
+
 static const struct vm_operations_struct special_mapping_vmops = {
 	.close = special_mapping_close,
 	.fault = special_mapping_fault,
@@ -3396,6 +3407,7 @@ static const struct vm_operations_struct special_mapping_vmops = {
 	.name = special_mapping_name,
 	/* vDSO code relies that VVAR can't be accessed remotely */
 	.access = NULL,
+	.may_split = special_mapping_split,
 };
 
 static const struct vm_operations_struct legacy_special_mapping_vmops = {
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/6] mremap: Don't allow MREMAP_DONTUNMAP on special_mappings and aio
  2020-10-13  1:34 ` [PATCH 3/6] mremap: Don't allow MREMAP_DONTUNMAP on special_mappings and aio Dmitry Safonov
@ 2020-12-28 18:03   ` Brian Geffon
  2020-12-28 19:33     ` Dmitry Safonov
  0 siblings, 1 reply; 16+ messages in thread
From: Brian Geffon @ 2020-12-28 18:03 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: LKML, Dmitry Safonov, Alexander Viro, Andrew Morton,
	Andy Lutomirski, Catalin Marinas, Dan Williams, Dave Jiang,
	Hugh Dickins, Ingo Molnar, Kirill A. Shutemov, Mike Kravetz,
	Minchan Kim, Russell King, Thomas Bogendoerfer, Thomas Gleixner,
	Vishal Verma, Vlastimil Babka, Will Deacon, linux-aio,
	linux-fsdevel, linux-mm

I don't think this situation can ever happen MREMAP_DONTUNMAP is
already restricted to anonymous mappings (defined as not having
vm_ops) and vma_to_resize checks that the mapping is anonymous before
move_vma is called.



On Mon, Oct 12, 2020 at 6:34 PM Dmitry Safonov <dima@arista.com> wrote:
>
> As kernel expect to see only one of such mappings, any further
> operations on the VMA-copy may be unexpected by the kernel.
> Maybe it's being on the safe side, but there doesn't seem to be any
> expected use-case for this, so restrict it now.
>
> Fixes: commit e346b3813067 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()")
> Signed-off-by: Dmitry Safonov <dima@arista.com>
> ---
>  arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2 +-
>  fs/aio.c                                  | 5 ++++-
>  include/linux/mm.h                        | 2 +-
>  mm/mmap.c                                 | 6 +++++-
>  mm/mremap.c                               | 2 +-
>  5 files changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
> index 0daf2f1cf7a8..e916646adc69 100644
> --- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
> +++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
> @@ -1458,7 +1458,7 @@ static int pseudo_lock_dev_release(struct inode *inode, struct file *filp)
>         return 0;
>  }
>
> -static int pseudo_lock_dev_mremap(struct vm_area_struct *area)
> +static int pseudo_lock_dev_mremap(struct vm_area_struct *area, unsigned long flags)
>  {
>         /* Not supported */
>         return -EINVAL;
> diff --git a/fs/aio.c b/fs/aio.c
> index d5ec30385566..3be3c0f77548 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -324,13 +324,16 @@ static void aio_free_ring(struct kioctx *ctx)
>         }
>  }
>
> -static int aio_ring_mremap(struct vm_area_struct *vma)
> +static int aio_ring_mremap(struct vm_area_struct *vma, unsigned long flags)
>  {
>         struct file *file = vma->vm_file;
>         struct mm_struct *mm = vma->vm_mm;
>         struct kioctx_table *table;
>         int i, res = -EINVAL;
>
> +       if (flags & MREMAP_DONTUNMAP)
> +               return -EINVAL;
> +
>         spin_lock(&mm->ioctx_lock);
>         rcu_read_lock();
>         table = rcu_dereference(mm->ioctx_table);
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 16b799a0522c..fd51a4a1f722 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -550,7 +550,7 @@ struct vm_operations_struct {
>         void (*open)(struct vm_area_struct * area);
>         void (*close)(struct vm_area_struct * area);
>         int (*split)(struct vm_area_struct * area, unsigned long addr);
> -       int (*mremap)(struct vm_area_struct * area);
> +       int (*mremap)(struct vm_area_struct *area, unsigned long flags);
>         vm_fault_t (*fault)(struct vm_fault *vmf);
>         vm_fault_t (*huge_fault)(struct vm_fault *vmf,
>                         enum page_entry_size pe_size);
> diff --git a/mm/mmap.c b/mm/mmap.c
> index bdd19f5b994e..50f853b0ec39 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -3372,10 +3372,14 @@ static const char *special_mapping_name(struct vm_area_struct *vma)
>         return ((struct vm_special_mapping *)vma->vm_private_data)->name;
>  }
>
> -static int special_mapping_mremap(struct vm_area_struct *new_vma)
> +static int special_mapping_mremap(struct vm_area_struct *new_vma,
> +                                 unsigned long flags)
>  {
>         struct vm_special_mapping *sm = new_vma->vm_private_data;
>
> +       if (flags & MREMAP_DONTUNMAP)
> +               return -EINVAL;
> +
>         if (WARN_ON_ONCE(current->mm != new_vma->vm_mm))
>                 return -EFAULT;
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index c248f9a52125..898e9818ba6d 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -384,7 +384,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
>         if (moved_len < old_len) {
>                 err = -ENOMEM;
>         } else if (vma->vm_ops && vma->vm_ops->mremap) {
> -               err = vma->vm_ops->mremap(new_vma);
> +               err = vma->vm_ops->mremap(new_vma, flags);
>         }
>
>         if (unlikely(err)) {
> --
> 2.28.0
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/6] mm/mremap: For MREMAP_DONTUNMAP check security_vm_enough_memory_mm()
  2020-10-13  1:34 ` [PATCH 2/6] mm/mremap: For MREMAP_DONTUNMAP check security_vm_enough_memory_mm() Dmitry Safonov
@ 2020-12-28 18:21   ` Brian Geffon
  2020-12-28 19:12     ` Dmitry Safonov
  0 siblings, 1 reply; 16+ messages in thread
From: Brian Geffon @ 2020-12-28 18:21 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: LKML, Dmitry Safonov, Alexander Viro, Andrew Morton,
	Andy Lutomirski, Catalin Marinas, Dan Williams, Dave Jiang,
	Hugh Dickins, Ingo Molnar, Kirill A. Shutemov, Mike Kravetz,
	Minchan Kim, Russell King, Thomas Bogendoerfer, Thomas Gleixner,
	Vishal Verma, Vlastimil Babka, Will Deacon, linux-aio,
	linux-fsdevel, linux-mm

This looks good to me with a small comment.

>         if (do_munmap(mm, old_addr, old_len, uf_unmap) < 0) {
>                 /* OOM: unable to split vma, just get accounts right */
> -               if (vm_flags & VM_ACCOUNT)
> +               if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP))
>                         vm_acct_memory(new_len >> PAGE_SHIFT);

Checking MREMAP_DONTUNMAP in the do_munmap path is unnecessary as
MREMAP_DONTUNMAP will have already returned by this point.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/6] mm/mremap: For MREMAP_DONTUNMAP check security_vm_enough_memory_mm()
  2020-12-28 18:21   ` Brian Geffon
@ 2020-12-28 19:12     ` Dmitry Safonov
  2020-12-30 15:43       ` Brian Geffon
  0 siblings, 1 reply; 16+ messages in thread
From: Dmitry Safonov @ 2020-12-28 19:12 UTC (permalink / raw)
  To: Brian Geffon
  Cc: LKML, Dmitry Safonov, Alexander Viro, Andrew Morton,
	Andy Lutomirski, Catalin Marinas, Dan Williams, Dave Jiang,
	Hugh Dickins, Ingo Molnar, Kirill A. Shutemov, Mike Kravetz,
	Minchan Kim, Russell King, Thomas Bogendoerfer, Thomas Gleixner,
	Vishal Verma, Vlastimil Babka, Will Deacon, linux-aio,
	linux-fsdevel, linux-mm

On 12/28/20 6:21 PM, Brian Geffon wrote:
> This looks good to me with a small comment.
> 
>>         if (do_munmap(mm, old_addr, old_len, uf_unmap) < 0) {
>>                 /* OOM: unable to split vma, just get accounts right */
>> -               if (vm_flags & VM_ACCOUNT)
>> +               if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP))
>>                         vm_acct_memory(new_len >> PAGE_SHIFT);
> 
> Checking MREMAP_DONTUNMAP in the do_munmap path is unnecessary as
> MREMAP_DONTUNMAP will have already returned by this point.

In this code it is also used as err-path. In case move_page_tables()
fails to move all page tables or .mremap() callback fails, the new VMA
is unmapped.

IOW, MREMAP_DONTUNMAP returns under:
:	if (unlikely(!err && (flags & MREMAP_DONTUNMAP))) {

-- 
          Dima

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/6] mremap: Don't allow MREMAP_DONTUNMAP on special_mappings and aio
  2020-12-28 18:03   ` Brian Geffon
@ 2020-12-28 19:33     ` Dmitry Safonov
  2020-12-28 19:43       ` Dmitry Safonov
  0 siblings, 1 reply; 16+ messages in thread
From: Dmitry Safonov @ 2020-12-28 19:33 UTC (permalink / raw)
  To: Brian Geffon
  Cc: LKML, Dmitry Safonov, Alexander Viro, Andrew Morton,
	Andy Lutomirski, Catalin Marinas, Dan Williams, Dave Jiang,
	Hugh Dickins, Ingo Molnar, Kirill A. Shutemov, Mike Kravetz,
	Minchan Kim, Russell King, Thomas Bogendoerfer, Thomas Gleixner,
	Vishal Verma, Vlastimil Babka, Will Deacon, linux-aio,
	linux-fsdevel, linux-mm

[I moved your reply to avoid top-posting]

On 12/28/20 6:03 PM, Brian Geffon wrote:
> On Mon, Oct 12, 2020 at 6:34 PM Dmitry Safonov <dima@arista.com> wrote:
>>
>> As kernel expect to see only one of such mappings, any further
>> operations on the VMA-copy may be unexpected by the kernel.
>> Maybe it's being on the safe side, but there doesn't seem to be any
>> expected use-case for this, so restrict it now.
>>
>> Fixes: commit e346b3813067 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()")
>> Signed-off-by: Dmitry Safonov <dima@arista.com>
>
> I don't think this situation can ever happen MREMAP_DONTUNMAP is
> already restricted to anonymous mappings (defined as not having
> vm_ops) and vma_to_resize checks that the mapping is anonymous before
> move_vma is called.

I've looked again now, I think it is possible. One can call
MREMAP_DONTUNMAP without MREMAP_FIXED and without resizing. So that the
old VMA is copied at some free address.

The calltrace would be: mremap()=>move_vma()
[under if (flags & MREMAP_MAYMOVE)].

On the other side I agree with you that the fix could have been better
if I realized the semantics that MREMAP_DONTUNMAP should only work with
anonymous mappings.

Probably, a better fix would be to move
:       if (flags & MREMAP_DONTUNMAP && (!vma_is_anonymous(vma) ||
:                       vma->vm_flags & VM_SHARED))
:               return ERR_PTR(-EINVAL);

from vma_to_resize() into the mremap() syscall directly.
What do you think?

-- 
          Dima

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/6] mremap: Don't allow MREMAP_DONTUNMAP on special_mappings and aio
  2020-12-28 19:33     ` Dmitry Safonov
@ 2020-12-28 19:43       ` Dmitry Safonov
  0 siblings, 0 replies; 16+ messages in thread
From: Dmitry Safonov @ 2020-12-28 19:43 UTC (permalink / raw)
  To: Brian Geffon
  Cc: LKML, Dmitry Safonov, Alexander Viro, Andrew Morton,
	Andy Lutomirski, Catalin Marinas, Dan Williams, Dave Jiang,
	Hugh Dickins, Ingo Molnar, Kirill A. Shutemov, Mike Kravetz,
	Minchan Kim, Russell King, Thomas Bogendoerfer, Thomas Gleixner,
	Vishal Verma, Vlastimil Babka, Will Deacon, linux-aio,
	linux-fsdevel, linux-mm

On 12/28/20 7:33 PM, Dmitry Safonov wrote:
> [I moved your reply to avoid top-posting]
> 
> On 12/28/20 6:03 PM, Brian Geffon wrote:
>> On Mon, Oct 12, 2020 at 6:34 PM Dmitry Safonov <dima@arista.com> wrote:
>>>
>>> As kernel expect to see only one of such mappings, any further
>>> operations on the VMA-copy may be unexpected by the kernel.
>>> Maybe it's being on the safe side, but there doesn't seem to be any
>>> expected use-case for this, so restrict it now.
>>>
>>> Fixes: commit e346b3813067 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()")
>>> Signed-off-by: Dmitry Safonov <dima@arista.com>
>>
>> I don't think this situation can ever happen MREMAP_DONTUNMAP is
>> already restricted to anonymous mappings (defined as not having
>> vm_ops) and vma_to_resize checks that the mapping is anonymous before
>> move_vma is called.
> 
> I've looked again now, I think it is possible. One can call
> MREMAP_DONTUNMAP without MREMAP_FIXED and without resizing. So that the
> old VMA is copied at some free address.
> 
> The calltrace would be: mremap()=>move_vma()
> [under if (flags & MREMAP_MAYMOVE)].
> 
> On the other side I agree with you that the fix could have been better
> if I realized the semantics that MREMAP_DONTUNMAP should only work with
> anonymous mappings.
> 
> Probably, a better fix would be to move
> :       if (flags & MREMAP_DONTUNMAP && (!vma_is_anonymous(vma) ||
> :                       vma->vm_flags & VM_SHARED))
> :               return ERR_PTR(-EINVAL);
> 
> from vma_to_resize() into the mremap() syscall directly.
> What do you think?

Ok, I've misread the code now, it checks vma_to_resize() before.
I'll send a revert to this one.

Thanks for noticing,
          Dima

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/6] mm/mremap: For MREMAP_DONTUNMAP check security_vm_enough_memory_mm()
  2020-12-28 19:12     ` Dmitry Safonov
@ 2020-12-30 15:43       ` Brian Geffon
  0 siblings, 0 replies; 16+ messages in thread
From: Brian Geffon @ 2020-12-30 15:43 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: LKML, Dmitry Safonov, Alexander Viro, Andrew Morton,
	Andy Lutomirski, Catalin Marinas, Dan Williams, Dave Jiang,
	Hugh Dickins, Ingo Molnar, Kirill A. Shutemov, Mike Kravetz,
	Minchan Kim, Russell King, Thomas Bogendoerfer, Thomas Gleixner,
	Vishal Verma, Vlastimil Babka, Will Deacon, linux-aio,
	linux-fsdevel, linux-mm

Ah, you're right. This individual patch looks good to me.

Brian

On Mon, Dec 28, 2020 at 11:12 AM Dmitry Safonov <dima@arista.com> wrote:
>
> On 12/28/20 6:21 PM, Brian Geffon wrote:
> > This looks good to me with a small comment.
> >
> >>         if (do_munmap(mm, old_addr, old_len, uf_unmap) < 0) {
> >>                 /* OOM: unable to split vma, just get accounts right */
> >> -               if (vm_flags & VM_ACCOUNT)
> >> +               if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP))
> >>                         vm_acct_memory(new_len >> PAGE_SHIFT);
> >
> > Checking MREMAP_DONTUNMAP in the do_munmap path is unnecessary as
> > MREMAP_DONTUNMAP will have already returned by this point.
>
> In this code it is also used as err-path. In case move_page_tables()
> fails to move all page tables or .mremap() callback fails, the new VMA
> is unmapped.
>
> IOW, MREMAP_DONTUNMAP returns under:
> :       if (unlikely(!err && (flags & MREMAP_DONTUNMAP))) {
>
> --
>           Dima

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 6/6] mm: Forbid splitting special mappings
  2020-10-13  1:34 ` [PATCH 6/6] mm: Forbid splitting special mappings Dmitry Safonov
@ 2021-01-22 12:58   ` Will Deacon
  2021-01-22 13:00     ` Will Deacon
  0 siblings, 1 reply; 16+ messages in thread
From: Will Deacon @ 2021-01-22 12:58 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, Dmitry Safonov, Alexander Viro, Andrew Morton,
	Andy Lutomirski, Brian Geffon, Catalin Marinas, Dan Williams,
	Dave Jiang, Hugh Dickins, Ingo Molnar, Kirill A. Shutemov,
	Mike Kravetz, Minchan Kim, Russell King, Thomas Bogendoerfer,
	Thomas Gleixner, Vishal Verma, Vlastimil Babka, linux-aio,
	linux-fsdevel, linux-mm

On Tue, Oct 13, 2020 at 02:34:16AM +0100, Dmitry Safonov wrote:
> Don't allow splitting of vm_special_mapping's.
> It affects vdso/vvar areas. Uprobes have only one page in xol_area so
> they aren't affected.
> 
> Those restrictions were enforced by checks in .mremap() callbacks.
> Restrict resizing with generic .split() callback.
> 
> Signed-off-by: Dmitry Safonov <dima@arista.com>
> ---
>  arch/arm/kernel/vdso.c    |  9 ---------
>  arch/arm64/kernel/vdso.c  | 41 +++------------------------------------
>  arch/mips/vdso/genvdso.c  |  4 ----
>  arch/s390/kernel/vdso.c   | 11 +----------
>  arch/x86/entry/vdso/vma.c | 17 ----------------
>  mm/mmap.c                 | 12 ++++++++++++
>  6 files changed, 16 insertions(+), 78 deletions(-)

For arm64:

Acked-by: Will Deacon <will@kernel.org>

Will

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 6/6] mm: Forbid splitting special mappings
  2021-01-22 12:58   ` Will Deacon
@ 2021-01-22 13:00     ` Will Deacon
  2021-01-22 19:52       ` Dmitry Safonov
  0 siblings, 1 reply; 16+ messages in thread
From: Will Deacon @ 2021-01-22 13:00 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, Dmitry Safonov, Alexander Viro, Andrew Morton,
	Andy Lutomirski, Brian Geffon, Catalin Marinas, Dan Williams,
	Dave Jiang, Hugh Dickins, Ingo Molnar, Kirill A. Shutemov,
	Mike Kravetz, Minchan Kim, Russell King, Thomas Bogendoerfer,
	Thomas Gleixner, Vishal Verma, Vlastimil Babka, linux-aio,
	linux-fsdevel, linux-mm

On Fri, Jan 22, 2021 at 12:58:58PM +0000, Will Deacon wrote:
> On Tue, Oct 13, 2020 at 02:34:16AM +0100, Dmitry Safonov wrote:
> > Don't allow splitting of vm_special_mapping's.
> > It affects vdso/vvar areas. Uprobes have only one page in xol_area so
> > they aren't affected.
> > 
> > Those restrictions were enforced by checks in .mremap() callbacks.
> > Restrict resizing with generic .split() callback.
> > 
> > Signed-off-by: Dmitry Safonov <dima@arista.com>
> > ---
> >  arch/arm/kernel/vdso.c    |  9 ---------
> >  arch/arm64/kernel/vdso.c  | 41 +++------------------------------------
> >  arch/mips/vdso/genvdso.c  |  4 ----
> >  arch/s390/kernel/vdso.c   | 11 +----------
> >  arch/x86/entry/vdso/vma.c | 17 ----------------
> >  mm/mmap.c                 | 12 ++++++++++++
> >  6 files changed, 16 insertions(+), 78 deletions(-)
> 
> For arm64:
> 
> Acked-by: Will Deacon <will@kernel.org>

Wait -- this got merged ages ago! Why am I reading such old email?

Will

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 6/6] mm: Forbid splitting special mappings
  2021-01-22 13:00     ` Will Deacon
@ 2021-01-22 19:52       ` Dmitry Safonov
  0 siblings, 0 replies; 16+ messages in thread
From: Dmitry Safonov @ 2021-01-22 19:52 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel, Dmitry Safonov, Alexander Viro, Andrew Morton,
	Andy Lutomirski, Brian Geffon, Catalin Marinas, Dan Williams,
	Dave Jiang, Hugh Dickins, Ingo Molnar, Kirill A. Shutemov,
	Mike Kravetz, Minchan Kim, Russell King, Thomas Bogendoerfer,
	Thomas Gleixner, Vishal Verma, Vlastimil Babka, linux-aio,
	linux-fsdevel, linux-mm

On 1/22/21 1:00 PM, Will Deacon wrote:
> On Fri, Jan 22, 2021 at 12:58:58PM +0000, Will Deacon wrote:
>> On Tue, Oct 13, 2020 at 02:34:16AM +0100, Dmitry Safonov wrote:
>>> Don't allow splitting of vm_special_mapping's.
>>> It affects vdso/vvar areas. Uprobes have only one page in xol_area so
>>> they aren't affected.
>>>
>>> Those restrictions were enforced by checks in .mremap() callbacks.
>>> Restrict resizing with generic .split() callback.
>>>
>>> Signed-off-by: Dmitry Safonov <dima@arista.com>
>>> ---
>>>  arch/arm/kernel/vdso.c    |  9 ---------
>>>  arch/arm64/kernel/vdso.c  | 41 +++------------------------------------
>>>  arch/mips/vdso/genvdso.c  |  4 ----
>>>  arch/s390/kernel/vdso.c   | 11 +----------
>>>  arch/x86/entry/vdso/vma.c | 17 ----------------
>>>  mm/mmap.c                 | 12 ++++++++++++
>>>  6 files changed, 16 insertions(+), 78 deletions(-)
>>
>> For arm64:
>>
>> Acked-by: Will Deacon <will@kernel.org>
> 
> Wait -- this got merged ages ago! Why am I reading such old email?

It's alright, thanks for looking at this again!

-- 
          Dmitry

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-01-22 22:27 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-13  1:34 [PATCH 0/6] mremap: move_vma() fixes Dmitry Safonov
2020-10-13  1:34 ` [PATCH 1/6] mm/mremap: Account memory on do_munmap() failure Dmitry Safonov
2020-10-13  1:34 ` [PATCH 2/6] mm/mremap: For MREMAP_DONTUNMAP check security_vm_enough_memory_mm() Dmitry Safonov
2020-12-28 18:21   ` Brian Geffon
2020-12-28 19:12     ` Dmitry Safonov
2020-12-30 15:43       ` Brian Geffon
2020-10-13  1:34 ` [PATCH 3/6] mremap: Don't allow MREMAP_DONTUNMAP on special_mappings and aio Dmitry Safonov
2020-12-28 18:03   ` Brian Geffon
2020-12-28 19:33     ` Dmitry Safonov
2020-12-28 19:43       ` Dmitry Safonov
2020-10-13  1:34 ` [PATCH 4/6] vm_ops: Rename .split() callback to .may_split() Dmitry Safonov
2020-10-13  1:34 ` [PATCH 5/6] mremap: Check if it's possible to split original vma Dmitry Safonov
2020-10-13  1:34 ` [PATCH 6/6] mm: Forbid splitting special mappings Dmitry Safonov
2021-01-22 12:58   ` Will Deacon
2021-01-22 13:00     ` Will Deacon
2021-01-22 19:52       ` Dmitry Safonov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).