All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, David Rientjes <rientjes@google.com>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Michal Hocko <mhocko@suse.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 4.14 32/62] mm, oom: fix concurrent munlock and oom reaper unmap, v3
Date: Mon, 14 May 2018 08:48:48 +0200	[thread overview]
Message-ID: <20180514064818.131463728@linuxfoundation.org> (raw)
In-Reply-To: <20180514064816.436958006@linuxfoundation.org>

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: David Rientjes <rientjes@google.com>

commit 27ae357fa82be5ab73b2ef8d39dcb8ca2563483a upstream.

Since exit_mmap() is done without the protection of mm->mmap_sem, it is
possible for the oom reaper to concurrently operate on an mm until
MMF_OOM_SKIP is set.

This allows munlock_vma_pages_all() to concurrently run while the oom
reaper is operating on a vma.  Since munlock_vma_pages_range() depends
on clearing VM_LOCKED from vm_flags before actually doing the munlock to
determine if any other vmas are locking the same memory, the check for
VM_LOCKED in the oom reaper is racy.

This is especially noticeable on architectures such as powerpc where
clearing a huge pmd requires serialize_against_pte_lookup().  If the pmd
is zapped by the oom reaper during follow_page_mask() after the check
for pmd_none() is bypassed, this ends up deferencing a NULL ptl or a
kernel oops.

Fix this by manually freeing all possible memory from the mm before
doing the munlock and then setting MMF_OOM_SKIP.  The oom reaper can not
run on the mm anymore so the munlock is safe to do in exit_mmap().  It
also matches the logic that the oom reaper currently uses for
determining when to set MMF_OOM_SKIP itself, so there's no new risk of
excessive oom killing.

This issue fixes CVE-2018-1000200.

Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1804241526320.238665@chino.kir.corp.google.com
Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
Signed-off-by: David Rientjes <rientjes@google.com>
Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>	[4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/oom.h |    2 +
 mm/mmap.c           |   44 ++++++++++++++++++------------
 mm/oom_kill.c       |   74 ++++++++++++++++++++++++++++------------------------
 3 files changed, 68 insertions(+), 52 deletions(-)

--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -95,6 +95,8 @@ static inline int check_stable_address_s
 	return 0;
 }
 
+void __oom_reap_task_mm(struct mm_struct *mm);
+
 extern unsigned long oom_badness(struct task_struct *p,
 		struct mem_cgroup *memcg, const nodemask_t *nodemask,
 		unsigned long totalpages);
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2982,6 +2982,32 @@ void exit_mmap(struct mm_struct *mm)
 	/* mm's last user has gone, and its about to be pulled down */
 	mmu_notifier_release(mm);
 
+	if (unlikely(mm_is_oom_victim(mm))) {
+		/*
+		 * Manually reap the mm to free as much memory as possible.
+		 * Then, as the oom reaper does, set MMF_OOM_SKIP to disregard
+		 * this mm from further consideration.  Taking mm->mmap_sem for
+		 * write after setting MMF_OOM_SKIP will guarantee that the oom
+		 * reaper will not run on this mm again after mmap_sem is
+		 * dropped.
+		 *
+		 * Nothing can be holding mm->mmap_sem here and the above call
+		 * to mmu_notifier_release(mm) ensures mmu notifier callbacks in
+		 * __oom_reap_task_mm() will not block.
+		 *
+		 * This needs to be done before calling munlock_vma_pages_all(),
+		 * which clears VM_LOCKED, otherwise the oom reaper cannot
+		 * reliably test it.
+		 */
+		mutex_lock(&oom_lock);
+		__oom_reap_task_mm(mm);
+		mutex_unlock(&oom_lock);
+
+		set_bit(MMF_OOM_SKIP, &mm->flags);
+		down_write(&mm->mmap_sem);
+		up_write(&mm->mmap_sem);
+	}
+
 	if (mm->locked_vm) {
 		vma = mm->mmap;
 		while (vma) {
@@ -3003,24 +3029,6 @@ void exit_mmap(struct mm_struct *mm)
 	/* update_hiwater_rss(mm) here? but nobody should be looking */
 	/* Use -1 here to ensure all VMAs in the mm are unmapped */
 	unmap_vmas(&tlb, vma, 0, -1);
-
-	if (unlikely(mm_is_oom_victim(mm))) {
-		/*
-		 * Wait for oom_reap_task() to stop working on this
-		 * mm. Because MMF_OOM_SKIP is already set before
-		 * calling down_read(), oom_reap_task() will not run
-		 * on this "mm" post up_write().
-		 *
-		 * mm_is_oom_victim() cannot be set from under us
-		 * either because victim->mm is already set to NULL
-		 * under task_lock before calling mmput and oom_mm is
-		 * set not NULL by the OOM killer only if victim->mm
-		 * is found not NULL while holding the task_lock.
-		 */
-		set_bit(MMF_OOM_SKIP, &mm->flags);
-		down_write(&mm->mmap_sem);
-		up_write(&mm->mmap_sem);
-	}
 	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
 	tlb_finish_mmu(&tlb, 0, -1);
 
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -456,7 +456,6 @@ bool process_shares_mm(struct task_struc
 	return false;
 }
 
-
 #ifdef CONFIG_MMU
 /*
  * OOM Reaper kernel thread which tries to reap the memory used by the OOM
@@ -467,16 +466,51 @@ static DECLARE_WAIT_QUEUE_HEAD(oom_reape
 static struct task_struct *oom_reaper_list;
 static DEFINE_SPINLOCK(oom_reaper_lock);
 
-static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
+void __oom_reap_task_mm(struct mm_struct *mm)
 {
-	struct mmu_gather tlb;
 	struct vm_area_struct *vma;
+
+	/*
+	 * Tell all users of get_user/copy_from_user etc... that the content
+	 * is no longer stable. No barriers really needed because unmapping
+	 * should imply barriers already and the reader would hit a page fault
+	 * if it stumbled over a reaped memory.
+	 */
+	set_bit(MMF_UNSTABLE, &mm->flags);
+
+	for (vma = mm->mmap ; vma; vma = vma->vm_next) {
+		if (!can_madv_dontneed_vma(vma))
+			continue;
+
+		/*
+		 * Only anonymous pages have a good chance to be dropped
+		 * without additional steps which we cannot afford as we
+		 * are OOM already.
+		 *
+		 * We do not even care about fs backed pages because all
+		 * which are reclaimable have already been reclaimed and
+		 * we do not want to block exit_mmap by keeping mm ref
+		 * count elevated without a good reason.
+		 */
+		if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED)) {
+			struct mmu_gather tlb;
+
+			tlb_gather_mmu(&tlb, mm, vma->vm_start, vma->vm_end);
+			unmap_page_range(&tlb, vma, vma->vm_start, vma->vm_end,
+					 NULL);
+			tlb_finish_mmu(&tlb, vma->vm_start, vma->vm_end);
+		}
+	}
+}
+
+static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
+{
 	bool ret = true;
 
 	/*
 	 * We have to make sure to not race with the victim exit path
 	 * and cause premature new oom victim selection:
-	 * __oom_reap_task_mm		exit_mm
+	 * oom_reap_task_mm		exit_mm
 	 *   mmget_not_zero
 	 *				  mmput
 	 *				    atomic_dec_and_test
@@ -524,35 +558,8 @@ static bool __oom_reap_task_mm(struct ta
 
 	trace_start_task_reaping(tsk->pid);
 
-	/*
-	 * Tell all users of get_user/copy_from_user etc... that the content
-	 * is no longer stable. No barriers really needed because unmapping
-	 * should imply barriers already and the reader would hit a page fault
-	 * if it stumbled over a reaped memory.
-	 */
-	set_bit(MMF_UNSTABLE, &mm->flags);
-
-	for (vma = mm->mmap ; vma; vma = vma->vm_next) {
-		if (!can_madv_dontneed_vma(vma))
-			continue;
+	__oom_reap_task_mm(mm);
 
-		/*
-		 * Only anonymous pages have a good chance to be dropped
-		 * without additional steps which we cannot afford as we
-		 * are OOM already.
-		 *
-		 * We do not even care about fs backed pages because all
-		 * which are reclaimable have already been reclaimed and
-		 * we do not want to block exit_mmap by keeping mm ref
-		 * count elevated without a good reason.
-		 */
-		if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED)) {
-			tlb_gather_mmu(&tlb, mm, vma->vm_start, vma->vm_end);
-			unmap_page_range(&tlb, vma, vma->vm_start, vma->vm_end,
-					 NULL);
-			tlb_finish_mmu(&tlb, vma->vm_start, vma->vm_end);
-		}
-	}
 	pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
 			task_pid_nr(tsk), tsk->comm,
 			K(get_mm_counter(mm, MM_ANONPAGES)),
@@ -573,13 +580,12 @@ static void oom_reap_task(struct task_st
 	struct mm_struct *mm = tsk->signal->oom_mm;
 
 	/* Retry the down_read_trylock(mmap_sem) a few times */
-	while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm))
+	while (attempts++ < MAX_OOM_REAP_RETRIES && !oom_reap_task_mm(tsk, mm))
 		schedule_timeout_idle(HZ/10);
 
 	if (attempts <= MAX_OOM_REAP_RETRIES)
 		goto done;
 
-
 	pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
 		task_pid_nr(tsk), tsk->comm);
 	debug_show_all_locks();

  parent reply	other threads:[~2018-05-14  6:57 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-14  6:48 [PATCH 4.14 00/62] 4.14.41-stable review Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 01/62] ipvs: fix rtnl_lock lockups caused by start_sync_thread Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 02/62] netfilter: ebtables: dont attempt to allocate 0-sized compat array Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 03/62] kcm: Call strp_stop before strp_done in kcm_attach Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 04/62] crypto: af_alg - fix possible uninit-value in alg_bind() Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 05/62] netlink: fix uninit-value in netlink_sendmsg Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 06/62] net: fix rtnh_ok() Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 07/62] net: initialize skb->peeked when cloning Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 08/62] net: fix uninit-value in __hw_addr_add_ex() Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 09/62] dccp: initialize ireq->ir_mark Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 10/62] ipv4: fix uninit-value in ip_route_output_key_hash_rcu() Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 11/62] soreuseport: initialise timewait reuseport field Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 12/62] inetpeer: fix uninit-value in inet_getpeer Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 13/62] memcg: fix per_node_info cleanup Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 14/62] perf: Remove superfluous allocation error check Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 15/62] tcp: fix TCP_REPAIR_QUEUE bound checking Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 16/62] bdi: wake up concurrent wb_shutdown() callers Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 17/62] bdi: Fix oops in wb_workfn() Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 18/62] KVM: PPC: Book3S HV: Fix trap number return from __kvmppc_vcore_entry Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 19/62] KVM: PPC: Book3S HV: Fix guest time accounting with VIRT_CPU_ACCOUNTING_GEN Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 20/62] KVM: PPC: Book3S HV: Fix VRMA initialization with 2MB or 1GB memory backing Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 21/62] arm64: Add work around for Arm Cortex-A55 Erratum 1024718 Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 22/62] compat: fix 4-byte infoleak via uninitialized struct field Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 23/62] gpioib: do not free unrequested descriptors Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 24/62] gpio: fix aspeed_gpio unmask irq Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 25/62] gpio: fix error path in lineevent_create Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 26/62] rfkill: gpio: fix memory leak in probe error path Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 27/62] libata: Apply NOLPM quirk for SanDisk SD7UB3Q*G1001 SSDs Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 28/62] dm integrity: use kvfree for kvmallocd memory Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 29/62] tracing: Fix regex_match_front() to not over compare the test string Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 30/62] z3fold: fix reclaim lock-ups Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 31/62] mm: sections are not offlined during memory hotremove Greg Kroah-Hartman
2018-05-14  6:48 ` Greg Kroah-Hartman [this message]
2018-05-14  6:48 ` [PATCH 4.14 33/62] ceph: fix rsize/wsize capping in ceph_direct_read_write() Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 34/62] can: kvaser_usb: Increase correct stats counter in kvaser_usb_rx_can_msg() Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 35/62] can: hi311x: Acquire SPI lock on ->do_get_berr_counter Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 36/62] can: hi311x: Work around TX complete interrupt erratum Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 37/62] drm/vc4: Fix scaling of uni-planar formats Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 38/62] drm/i915: Fix drm:intel_enable_lvds ERROR message in kernel log Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 39/62] drm/nouveau: Fix deadlock in nv50_mstm_register_connector() Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 40/62] drm/atomic: Clean old_state/new_state in drm_atomic_state_default_clear() Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 41/62] drm/atomic: Clean private obj " Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 42/62] net: atm: Fix potential Spectre v1 Greg Kroah-Hartman
2018-05-14  6:48 ` [PATCH 4.14 43/62] atm: zatm: " Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 44/62] PCI / PM: Always check PME wakeup capability for runtime wakeup support Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 45/62] PCI / PM: Check device_may_wakeup() in pci_enable_wake() Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 46/62] cpufreq: schedutil: Avoid using invalid next_freq Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 47/62] Revert "Bluetooth: btusb: Fix quirk for Atheros 1525/QCA6174" Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 48/62] Bluetooth: btusb: Add Dell XPS 13 9360 to btusb_needs_reset_resume_table Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 49/62] Bluetooth: btusb: Only check needs_reset_resume DMI table for QCA rome chipsets Greg Kroah-Hartman
2018-05-15 17:07   ` Brian Norris
2018-05-15 19:42     ` Guenter Roeck
2018-05-16  7:57       ` gregkh
2018-05-14  6:49 ` [PATCH 4.14 50/62] thermal: exynos: Reading temperature makes sense only when TMU is turned on Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 51/62] thermal: exynos: Propagate error value from tmu_read() Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 52/62] nvme: add quirk to force medium priority for SQ creation Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 53/62] smb3: directory sync should not return an error Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 54/62] sched/autogroup: Fix possible Spectre-v1 indexing for sched_prio_to_weight[] Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 55/62] tracing/uprobe_event: Fix strncpy corner case Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 56/62] perf/x86: Fix possible Spectre-v1 indexing for hw_perf_event cache_* Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 57/62] perf/x86/cstate: Fix possible Spectre-v1 indexing for pkg_msr Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 58/62] perf/x86/msr: Fix possible Spectre-v1 indexing in the MSR driver Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 59/62] perf/core: Fix possible Spectre-v1 indexing for ->aux_pages[] Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 60/62] perf/x86: Fix possible Spectre-v1 indexing for x86_pmu::event_map() Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 61/62] KVM: PPC: Book3S HV: Fix handling of large pages in radix page fault handler Greg Kroah-Hartman
2018-05-14  6:49 ` [PATCH 4.14 62/62] KVM: x86: remove APIC Timer periodic/oneshot spikes Greg Kroah-Hartman
2018-05-14  8:28 ` [PATCH 4.14 00/62] 4.14.41-stable review Nathan Chancellor
2018-05-14 12:45 ` kernelci.org bot
2018-05-14 16:21 ` Guenter Roeck
2018-05-14 22:02 ` Shuah Khan
2018-05-15  5:36 ` Naresh Kamboju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180514064818.131463728@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@suse.com \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=rientjes@google.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.