All of lore.kernel.org
 help / color / mirror / Atom feed
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: mhocko@kernel.org, mhocko@suse.com, rientjes@google.com,
	willy@infradead.org, hannes@cmpxchg.org, guro@fb.com,
	riel@surriel.com, minchan@kernel.org, kirill@shutemov.name,
	aarcange@redhat.com, christian@brauner.io, hch@infradead.org,
	oleg@redhat.com, david@redhat.com, jannh@google.com,
	shakeelb@google.com, luto@kernel.org,
	christian.brauner@ubuntu.com, fweimer@redhat.com,
	jengelh@inai.de, timmurray@google.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, kernel-team@android.com,
	surenb@google.com
Subject: [PATCH v4 1/3] mm: protect free_pgtables with mmap_lock write lock in exit_mmap
Date: Wed,  8 Dec 2021 13:22:09 -0800	[thread overview]
Message-ID: <20211208212211.2860249-1-surenb@google.com> (raw)

oom-reaper and process_mrelease system call should protect against
races with exit_mmap which can destroy page tables while they
walk the VMA tree. oom-reaper protects from that race by setting
MMF_OOM_VICTIM and by relying on exit_mmap to set MMF_OOM_SKIP
before taking and releasing mmap_write_lock. process_mrelease has
to elevate mm->mm_users to prevent such race. Both oom-reaper and
process_mrelease hold mmap_read_lock when walking the VMA tree.
The locking rules and mechanisms could be simpler if exit_mmap takes
mmap_write_lock while executing destructive operations such as
free_pgtables.
Change exit_mmap to hold the mmap_write_lock when calling
free_pgtables and remove_vma. Operations like unmap_vmas and
unlock_range are not destructive and could run under mmap_read_lock
but for simplicity we take one mmap_write_lock during almost the entire
operation. Note also that because oom-reaper checks VM_LOCKED flag,
unlock_range() should not be allowed to race with it.
Before this patch, remove_vma used to be called with no locks held,
however with fput being executed asynchronously and vm_ops->close
not being allowed to hold mmap_lock (it is called from __split_vma
with mmap_sem held for write), changing that should be fine.
In most cases this lock should be uncontended. Previously, Kirill
reported ~4% regression caused by a similar change [1]. We reran the
same test and although the individual results are quite noisy, the
percentiles show lower regression with 1.6% being the worst case [2].
The change allows oom-reaper and process_mrelease to execute safely
under mmap_read_lock without worries that exit_mmap might destroy page
tables from under them.

[1] https://lore.kernel.org/all/20170725141723.ivukwhddk2voyhuc@node.shutemov.name/
[2] https://lore.kernel.org/all/CAJuCfpGC9-c9P40x7oy=jy5SphMcd0o0G_6U1-+JAziGKG6dGA@mail.gmail.com/

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
changes in v4
- Separated comments describing vm_operations_struct::close locking
requirements into a separate patch, per Matthew Wilcox

 mm/mmap.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index bfb0ea164a90..f4e09d390a07 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3142,25 +3142,27 @@ void exit_mmap(struct mm_struct *mm)
 		 * to mmu_notifier_release(mm) ensures mmu notifier callbacks in
 		 * __oom_reap_task_mm() will not block.
 		 *
-		 * This needs to be done before calling munlock_vma_pages_all(),
+		 * This needs to be done before calling unlock_range(),
 		 * which clears VM_LOCKED, otherwise the oom reaper cannot
 		 * reliably test it.
 		 */
 		(void)__oom_reap_task_mm(mm);
 
 		set_bit(MMF_OOM_SKIP, &mm->flags);
-		mmap_write_lock(mm);
-		mmap_write_unlock(mm);
 	}
 
+	mmap_write_lock(mm);
 	if (mm->locked_vm)
 		unlock_range(mm->mmap, ULONG_MAX);
 
 	arch_exit_mmap(mm);
 
 	vma = mm->mmap;
-	if (!vma)	/* Can happen if dup_mmap() received an OOM */
+	if (!vma) {
+		/* Can happen if dup_mmap() received an OOM */
+		mmap_write_unlock(mm);
 		return;
+	}
 
 	lru_add_drain();
 	flush_cache_mm(mm);
@@ -3171,16 +3173,14 @@ void exit_mmap(struct mm_struct *mm)
 	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
 	tlb_finish_mmu(&tlb);
 
-	/*
-	 * Walk the list again, actually closing and freeing it,
-	 * with preemption enabled, without holding any MM locks.
-	 */
+	/* Walk the list again, actually closing and freeing it. */
 	while (vma) {
 		if (vma->vm_flags & VM_ACCOUNT)
 			nr_accounted += vma_pages(vma);
 		vma = remove_vma(vma);
 		cond_resched();
 	}
+	mmap_write_unlock(mm);
 	vm_unacct_memory(nr_accounted);
 }
 
-- 
2.34.1.400.ga245620fadb-goog


             reply	other threads:[~2021-12-08 21:22 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-08 21:22 Suren Baghdasaryan [this message]
2021-12-08 21:22 ` [PATCH v4 2/3] mm: document locking restrictions for vm_operations_struct::close Suren Baghdasaryan
2021-12-09  8:55   ` Michal Hocko
2021-12-08 21:22 ` [PATCH v4 3/3] mm/oom_kill: allow process_mrelease to run under mmap_lock protection Suren Baghdasaryan
2021-12-09  8:59   ` Michal Hocko
2021-12-09 19:03     ` Suren Baghdasaryan
2021-12-09  8:55 ` [PATCH v4 1/3] mm: protect free_pgtables with mmap_lock write lock in exit_mmap Michal Hocko
2021-12-09 19:03   ` Suren Baghdasaryan
2021-12-10  9:20     ` Michal Hocko
2021-12-09  9:12 ` [PATCH 4/3] mm: drop MMF_OOM_SKIP from exit_mmap Michal Hocko
2021-12-09 16:24   ` Suren Baghdasaryan
2021-12-09 16:47     ` Michal Hocko
2021-12-09 17:06       ` Suren Baghdasaryan
2021-12-16  2:26         ` Suren Baghdasaryan
2021-12-16 11:49           ` Johannes Weiner
2021-12-16 17:23             ` Suren Baghdasaryan
2021-12-30  5:59               ` Suren Baghdasaryan
2021-12-30  8:24                 ` Michal Hocko
2021-12-30 17:29                   ` Suren Baghdasaryan
2022-01-03 12:11                     ` Michal Hocko
2022-01-03 21:16                       ` Hugh Dickins
2022-01-04 22:24                         ` Suren Baghdasaryan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211208212211.2860249-1-surenb@google.com \
    --to=surenb@google.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=christian.brauner@ubuntu.com \
    --cc=christian@brauner.io \
    --cc=david@redhat.com \
    --cc=fweimer@redhat.com \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=hch@infradead.org \
    --cc=jannh@google.com \
    --cc=jengelh@inai.de \
    --cc=kernel-team@android.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mhocko@kernel.org \
    --cc=mhocko@suse.com \
    --cc=minchan@kernel.org \
    --cc=oleg@redhat.com \
    --cc=riel@surriel.com \
    --cc=rientjes@google.com \
    --cc=shakeelb@google.com \
    --cc=timmurray@google.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.