From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, David Rientjes <rientjes@google.com>,
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
Michal Hocko <mhocko@suse.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 4.16 33/72] mm, oom: fix concurrent munlock and oom reaper unmap, v3
Date: Mon, 14 May 2018 08:48:50 +0200
Message-ID: <20180514064824.534798031@linuxfoundation.org>
In-Reply-To: <20180514064823.033169170@linuxfoundation.org>
4.16-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Rientjes <rientjes@google.com>
commit 27ae357fa82be5ab73b2ef8d39dcb8ca2563483a upstream.
Since exit_mmap() is done without the protection of mm->mmap_sem, it is
possible for the oom reaper to concurrently operate on an mm until
MMF_OOM_SKIP is set.
This allows munlock_vma_pages_all() to concurrently run while the oom
reaper is operating on a vma. Since munlock_vma_pages_range() depends
on clearing VM_LOCKED from vm_flags before actually doing the munlock to
determine if any other vmas are locking the same memory, the check for
VM_LOCKED in the oom reaper is racy.
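
The window described above can be pictured with a small userspace model.
This is only a sketch: struct model_vma and every model_* helper are
hypothetical names invented for illustration, not kernel code, and the
model assumes the reaper's decision depends on nothing but the VM_LOCKED
flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one mlocked vma; not a kernel structure. */
struct model_vma {
	bool vm_locked;        /* models VM_LOCKED in vma->vm_flags */
	bool munlock_walking;  /* true while the munlock page walk runs */
};

/* Models the start of munlock: the flag is cleared before any page is touched. */
static void model_munlock_begin(struct model_vma *vma)
{
	vma->vm_locked = false;
	vma->munlock_walking = true;
}

/* Models the end of the munlock page walk. */
static void model_munlock_end(struct model_vma *vma)
{
	vma->munlock_walking = false;
}

/* Models the reaper's decision (assumption: it only looks at the flag). */
static bool model_reaper_may_unmap(const struct model_vma *vma)
{
	return !vma->vm_locked;
}

/* True when the reaper would unmap pages underneath a munlock walk. */
static bool model_in_race_window(const struct model_vma *vma)
{
	return model_reaper_may_unmap(vma) && vma->munlock_walking;
}
```

In this model the window opens as soon as model_munlock_begin() clears
the flag and closes only when the walk ends, which is why testing the
flag alone cannot protect the reaper.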
This is especially noticeable on architectures such as powerpc where
clearing a huge pmd requires serialize_against_pte_lookup(). If the pmd
is zapped by the oom reaper during follow_page_mask() after the check
for pmd_none() is bypassed, this ends up dereferencing a NULL ptl or
causing a kernel oops.
Fix this by manually freeing all possible memory from the mm and setting
MMF_OOM_SKIP before doing the munlock. After that, the oom reaper cannot
run on the mm anymore, so the munlock is safe to do in exit_mmap(). It
also matches the logic that the oom reaper currently uses for
determining when to set MMF_OOM_SKIP itself, so there's no new risk of
excessive oom killing.
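
The ordering that the fix relies on can be sketched in userspace as
follows. Everything here is a simplified assumption: struct model_mm and
the model_* functions are hypothetical, mapped_pages counts only memory
the reaper would be allowed to free, and the down_write()/up_write()
pair on mmap_sem is represented by a comment because the model is
single-threaded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the mm_struct state involved. */
struct model_mm {
	bool oom_victim;   /* models mm_is_oom_victim(mm) */
	bool oom_skip;     /* models MMF_OOM_SKIP in mm->flags */
	bool vm_locked;    /* models VM_LOCKED on the mm's vmas */
	int  mapped_pages; /* pages that reaping could still free */
};

/* Models __oom_reap_task_mm(): free what can be freed, return the count. */
static int model_reap(struct model_mm *mm)
{
	int freed = mm->mapped_pages;

	mm->mapped_pages = 0;
	return freed;
}

/* Models one oom reaper pass: it backs off once MMF_OOM_SKIP is set. */
static int model_reaper_pass(struct model_mm *mm)
{
	if (mm->oom_skip)
		return 0;
	return model_reap(mm);
}

/* Models munlock_vma_pages_all(), which clears VM_LOCKED. */
static void model_munlock(struct model_mm *mm)
{
	/* Invariant the patch aims for: the reaper is already excluded. */
	assert(!mm->oom_victim || mm->oom_skip);
	mm->vm_locked = false;
}

/* Models the ordering of the patched exit_mmap(). */
static void model_exit_mmap(struct model_mm *mm)
{
	if (mm->oom_victim) {
		model_reap(mm);       /* 1. free memory ourselves */
		mm->oom_skip = true;  /* 2. tell the reaper to stay away */
		/* 3. down_write()/up_write() would wait out a reaper already inside */
	}
	model_munlock(mm);            /* 4. only now clear VM_LOCKED */
}
```

A reaper pass that starts after model_exit_mmap() sees oom_skip and
frees nothing, so in this model VM_LOCKED is never cleared while the
reaper may still act on the mm.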
This fixes CVE-2018-1000200.
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1804241526320.238665@chino.kir.corp.google.com
Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
Signed-off-by: David Rientjes <rientjes@google.com>
Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
include/linux/oom.h | 2 +
mm/mmap.c | 44 ++++++++++++++++------------
mm/oom_kill.c | 81 +++++++++++++++++++++++++++-------------------------
3 files changed, 71 insertions(+), 56 deletions(-)
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -95,6 +95,8 @@ static inline int check_stable_address_s
return 0;
}
+void __oom_reap_task_mm(struct mm_struct *mm);
+
extern unsigned long oom_badness(struct task_struct *p,
struct mem_cgroup *memcg, const nodemask_t *nodemask,
unsigned long totalpages);
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2997,6 +2997,32 @@ void exit_mmap(struct mm_struct *mm)
/* mm's last user has gone, and its about to be pulled down */
mmu_notifier_release(mm);
+ if (unlikely(mm_is_oom_victim(mm))) {
+ /*
+ * Manually reap the mm to free as much memory as possible.
+ * Then, as the oom reaper does, set MMF_OOM_SKIP to disregard
+ * this mm from further consideration. Taking mm->mmap_sem for
+ * write after setting MMF_OOM_SKIP will guarantee that the oom
+ * reaper will not run on this mm again after mmap_sem is
+ * dropped.
+ *
+ * Nothing can be holding mm->mmap_sem here and the above call
+ * to mmu_notifier_release(mm) ensures mmu notifier callbacks in
+ * __oom_reap_task_mm() will not block.
+ *
+ * This needs to be done before calling munlock_vma_pages_all(),
+ * which clears VM_LOCKED, otherwise the oom reaper cannot
+ * reliably test it.
+ */
+ mutex_lock(&oom_lock);
+ __oom_reap_task_mm(mm);
+ mutex_unlock(&oom_lock);
+
+ set_bit(MMF_OOM_SKIP, &mm->flags);
+ down_write(&mm->mmap_sem);
+ up_write(&mm->mmap_sem);
+ }
+
if (mm->locked_vm) {
vma = mm->mmap;
while (vma) {
@@ -3018,24 +3044,6 @@ void exit_mmap(struct mm_struct *mm)
/* update_hiwater_rss(mm) here? but nobody should be looking */
/* Use -1 here to ensure all VMAs in the mm are unmapped */
unmap_vmas(&tlb, vma, 0, -1);
-
- if (unlikely(mm_is_oom_victim(mm))) {
- /*
- * Wait for oom_reap_task() to stop working on this
- * mm. Because MMF_OOM_SKIP is already set before
- * calling down_read(), oom_reap_task() will not run
- * on this "mm" post up_write().
- *
- * mm_is_oom_victim() cannot be set from under us
- * either because victim->mm is already set to NULL
- * under task_lock before calling mmput and oom_mm is
- * set not NULL by the OOM killer only if victim->mm
- * is found not NULL while holding the task_lock.
- */
- set_bit(MMF_OOM_SKIP, &mm->flags);
- down_write(&mm->mmap_sem);
- up_write(&mm->mmap_sem);
- }
free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
tlb_finish_mmu(&tlb, 0, -1);
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -474,7 +474,6 @@ bool process_shares_mm(struct task_struc
return false;
}
-
#ifdef CONFIG_MMU
/*
* OOM Reaper kernel thread which tries to reap the memory used by the OOM
@@ -485,16 +484,54 @@ static DECLARE_WAIT_QUEUE_HEAD(oom_reape
static struct task_struct *oom_reaper_list;
static DEFINE_SPINLOCK(oom_reaper_lock);
-static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
+void __oom_reap_task_mm(struct mm_struct *mm)
{
- struct mmu_gather tlb;
struct vm_area_struct *vma;
+
+ /*
+ * Tell all users of get_user/copy_from_user etc... that the content
+ * is no longer stable. No barriers really needed because unmapping
+ * should imply barriers already and the reader would hit a page fault
+ * if it stumbled over a reaped memory.
+ */
+ set_bit(MMF_UNSTABLE, &mm->flags);
+
+ for (vma = mm->mmap ; vma; vma = vma->vm_next) {
+ if (!can_madv_dontneed_vma(vma))
+ continue;
+
+ /*
+ * Only anonymous pages have a good chance to be dropped
+ * without additional steps which we cannot afford as we
+ * are OOM already.
+ *
+ * We do not even care about fs backed pages because all
+ * which are reclaimable have already been reclaimed and
+ * we do not want to block exit_mmap by keeping mm ref
+ * count elevated without a good reason.
+ */
+ if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED)) {
+ const unsigned long start = vma->vm_start;
+ const unsigned long end = vma->vm_end;
+ struct mmu_gather tlb;
+
+ tlb_gather_mmu(&tlb, mm, start, end);
+ mmu_notifier_invalidate_range_start(mm, start, end);
+ unmap_page_range(&tlb, vma, start, end, NULL);
+ mmu_notifier_invalidate_range_end(mm, start, end);
+ tlb_finish_mmu(&tlb, start, end);
+ }
+ }
+}
+
+static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
+{
bool ret = true;
/*
* We have to make sure to not race with the victim exit path
* and cause premature new oom victim selection:
- * __oom_reap_task_mm exit_mm
+ * oom_reap_task_mm exit_mm
* mmget_not_zero
* mmput
* atomic_dec_and_test
@@ -539,39 +576,8 @@ static bool __oom_reap_task_mm(struct ta
trace_start_task_reaping(tsk->pid);
- /*
- * Tell all users of get_user/copy_from_user etc... that the content
- * is no longer stable. No barriers really needed because unmapping
- * should imply barriers already and the reader would hit a page fault
- * if it stumbled over a reaped memory.
- */
- set_bit(MMF_UNSTABLE, &mm->flags);
-
- for (vma = mm->mmap ; vma; vma = vma->vm_next) {
- if (!can_madv_dontneed_vma(vma))
- continue;
+ __oom_reap_task_mm(mm);
- /*
- * Only anonymous pages have a good chance to be dropped
- * without additional steps which we cannot afford as we
- * are OOM already.
- *
- * We do not even care about fs backed pages because all
- * which are reclaimable have already been reclaimed and
- * we do not want to block exit_mmap by keeping mm ref
- * count elevated without a good reason.
- */
- if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED)) {
- const unsigned long start = vma->vm_start;
- const unsigned long end = vma->vm_end;
-
- tlb_gather_mmu(&tlb, mm, start, end);
- mmu_notifier_invalidate_range_start(mm, start, end);
- unmap_page_range(&tlb, vma, start, end, NULL);
- mmu_notifier_invalidate_range_end(mm, start, end);
- tlb_finish_mmu(&tlb, start, end);
- }
- }
pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
task_pid_nr(tsk), tsk->comm,
K(get_mm_counter(mm, MM_ANONPAGES)),
@@ -592,13 +598,12 @@ static void oom_reap_task(struct task_st
struct mm_struct *mm = tsk->signal->oom_mm;
/* Retry the down_read_trylock(mmap_sem) a few times */
- while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm))
+ while (attempts++ < MAX_OOM_REAP_RETRIES && !oom_reap_task_mm(tsk, mm))
schedule_timeout_idle(HZ/10);
if (attempts <= MAX_OOM_REAP_RETRIES)
goto done;
-
pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
task_pid_nr(tsk), tsk->comm);
debug_show_all_locks();