From: Jiri Slaby <jslaby@suse.cz>
To: stable@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Michal Hocko <mhocko@suse.cz>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Jiri Slaby <jslaby@suse.cz>
Subject: [PATCH 3.12 20/98] memcg: do not hang on OOM when killed by userspace OOM access to memory reserves
Date: Mon, 11 Apr 2016 15:22:22 +0200
Message-ID: <1f7e1e7f0018706fa29c752fd88a919c7e25b456.1460380917.git.jslaby@suse.cz>
In-Reply-To: <def91dea48a3acb14c2379b9461ae659d95616a2.1460380917.git.jslaby@suse.cz>
In-Reply-To: <cover.1460380917.git.jslaby@suse.cz>

From: Michal Hocko <mhocko@suse.cz>

3.12-stable review patch.  If anyone has any objections, please let me know.

===============

commit d8dc595ce3909fbc131bdf5ab8c9808fe624b18d upstream.

Eric has reported that he can see task(s) stuck in memcg OOM handler
regularly.  The only way out is to

	echo 0 > $GROUP/memory.oom_control

His use case is:

- Set up a hierarchy with memory and the freezer (disable kernel oom and
  have a process watch for oom).

- In that memory cgroup add a process with one thread per cpu.

- In one thread, slowly allocate memory (once per second, I think it is
  16M of RAM), then mlock and dirty it (just to force the pages into RAM
  and keep them there).

- When oom is achieved loop:
  * attempt to freeze all of the tasks.
  * if frozen send every task SIGKILL, unfreeze, remove the directory in
    cgroupfs.

Eric has then pinpointed the issue to be memcg specific.

All tasks are sitting on the memcg_oom_waitq when memcg oom is disabled.
Those that have received a fatal signal will bypass the charge and should
continue on their way out.  The tricky part is that the exit path might
trigger a page fault (e.g. in exit_robust_list), and thus a memcg charge,
while its memcg is still under OOM because nobody has released any charges
yet.

Unlike with the in-kernel OOM handler, the exiting task doesn't get
TIF_MEMDIE set, so further charges of the killed task are not shortcut.
The task falls into the memcg OOM again with no way out of it, as there
are no fatal signals pending anymore.

This patch fixes the issue by checking PF_EXITING early in
mem_cgroup_try_charge and bypassing the charge, the same as if the task
had a fatal signal pending or TIF_MEMDIE set.
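
As an illustration only, the condition before and after the change can be
modelled in plain userspace C.  This is a sketch, not kernel code: struct
task_model and the two helper functions are hypothetical names invented
for this example, and the booleans merely stand in for
test_thread_flag(TIF_MEMDIE) and fatal_signal_pending(current).  The
PF_EXITING value is the one defined in include/linux/sched.h.

```c
#include <stdbool.h>

/* Same value as PF_EXITING in include/linux/sched.h. */
#define PF_EXITING 0x00000004u

/* Hypothetical stand-in for the task state that the check consults. */
struct task_model {
	bool memdie;		/* models test_thread_flag(TIF_MEMDIE) */
	bool fatal_signal;	/* models fatal_signal_pending(current) */
	unsigned int flags;	/* models current->flags */
};

/* Bypass condition as it was before the patch. */
bool bypass_before(const struct task_model *t)
{
	return t->memdie || t->fatal_signal;
}

/*
 * Bypass condition with the patch applied.  The patch writes the new
 * term without parentheses; '&' binds tighter than '||' in C, so the
 * parenthesised form here is equivalent.
 */
bool bypass_after(const struct task_model *t)
{
	return t->memdie || t->fatal_signal || (t->flags & PF_EXITING);
}
```

For a modelled task that has already consumed its fatal signal and has
only PF_EXITING set, bypass_before() returns false and bypass_after()
returns true, which is the case described above.  A task with none of
the three conditions set is not bypassed by either version.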

Tasks that exit normally (i.e. not killed) will now bypass the charge as
well, but this should be OK: the task is leaving and will release its
memory, and increasing the memory pressure just to release it a moment
later seems a dubious waste of cycles.  Besides that, charges after
exit_signals should be rare.

I am bringing this patch up again (rebased on the current mmotm tree). I
hope we can finally move forward. If there is still opposition, I would
really appreciate a competing approach so that we can discuss
alternatives.

http://comments.gmane.org/gmane.linux.kernel.stable/77650 is a reference
to the follow-up discussion from when the patch was last dropped from
mmotm.

Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 mm/memcontrol.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5904fc833523..4a1559d8739f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2710,7 +2710,8 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,
 	 * MEMDIE process.
 	 */
 	if (unlikely(test_thread_flag(TIF_MEMDIE)
-		     || fatal_signal_pending(current)))
+		     || fatal_signal_pending(current)
+		     || current->flags & PF_EXITING))
 		goto bypass;
 
 	if (unlikely(task_in_memcg_oom(current)))
-- 
2.8.1
