linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	"yuwang.yuwang" <yuwang.yuwang@alibaba-inc.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@suse.com>, Vlastimil Babka <vbabka@suse.cz>,
	Mel Gorman <mgorman@suse.de>, Dave Hansen <dave.hansen@intel.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Petr Mladek <pmladek@suse.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.14 30/67] mm: dont warn about allocations which stall for too long
Date: Tue, 11 Dec 2018 16:41:30 +0100	[thread overview]
Message-ID: <20181211151631.910085422@linuxfoundation.org> (raw)
In-Reply-To: <20181211151630.378216233@linuxfoundation.org>

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit 400e22499dd92613821374c8c6c88c7225359980 ]

Commit 63f53dea0c98 ("mm: warn about allocations which stall for too
long") was a great step for reducing possibility of silent hang up
problem caused by memory allocation stalls.  But this commit reverts it,
for it is possible to trigger OOM lockup and/or soft lockups when many
threads concurrently called warn_alloc() (in order to warn about memory
allocation stalls) due to current implementation of printk(), and it is
difficult to obtain useful information due to limitation of synchronous
warning approach.

Current printk() implementation flushes all pending logs using the
context of a thread which called console_unlock().  printk() should be
able to flush all pending logs eventually unless somebody continues
appending to printk() buffer.

Since warn_alloc() started appending to printk() buffer while waiting
for oom_kill_process() to make forward progress when oom_kill_process()
is processing pending logs, it became possible for warn_alloc() to force
oom_kill_process() loop inside printk().  As a result, warn_alloc()
significantly increased possibility of preventing oom_kill_process()
from making forward progress.

---------- Pseudo code start ----------
Before warn_alloc() was introduced:

  retry:
    if (mutex_trylock(&oom_lock)) {
      while (atomic_read(&printk_pending_logs) > 0) {
        atomic_dec(&printk_pending_logs);
        print_one_log();
      }
      // Send SIGKILL here.
      mutex_unlock(&oom_lock)
    }
    goto retry;

After warn_alloc() was introduced:

  retry:
    if (mutex_trylock(&oom_lock)) {
      while (atomic_read(&printk_pending_logs) > 0) {
        atomic_dec(&printk_pending_logs);
        print_one_log();
      }
      // Send SIGKILL here.
      mutex_unlock(&oom_lock)
    } else if (waited_for_10seconds()) {
      atomic_inc(&printk_pending_logs);
    }
    goto retry;
---------- Pseudo code end ----------

Although waited_for_10seconds() becomes true once per 10 seconds,
unbounded number of threads can call waited_for_10seconds() at the same
time.  Also, since threads doing waited_for_10seconds() keep doing
almost busy loop, the thread doing print_one_log() can use little CPU
resource.  Therefore, this situation can be simplified like

---------- Pseudo code start ----------
  retry:
    if (mutex_trylock(&oom_lock)) {
      while (atomic_read(&printk_pending_logs) > 0) {
        atomic_dec(&printk_pending_logs);
        print_one_log();
      }
      // Send SIGKILL here.
      mutex_unlock(&oom_lock)
    } else {
      atomic_inc(&printk_pending_logs);
    }
    goto retry;
---------- Pseudo code end ----------

when printk() is called faster than print_one_log() can process a log.

One of possible mitigation would be to introduce a new lock in order to
make sure that no other series of printk() (either oom_kill_process() or
warn_alloc()) can append to printk() buffer when one series of printk()
(either oom_kill_process() or warn_alloc()) is already in progress.

Such serialization will also help obtaining kernel messages in readable
form.

---------- Pseudo code start ----------
  retry:
    if (mutex_trylock(&oom_lock)) {
      mutex_lock(&oom_printk_lock);
      while (atomic_read(&printk_pending_logs) > 0) {
        atomic_dec(&printk_pending_logs);
        print_one_log();
      }
      // Send SIGKILL here.
      mutex_unlock(&oom_printk_lock);
      mutex_unlock(&oom_lock)
    } else {
      if (mutex_trylock(&oom_printk_lock)) {
        atomic_inc(&printk_pending_logs);
        mutex_unlock(&oom_printk_lock);
      }
    }
    goto retry;
---------- Pseudo code end ----------

But this commit does not go that direction, for we don't want to
introduce a new lock dependency, and we unlikely be able to obtain
useful information even if we serialized oom_kill_process() and
warn_alloc().

Synchronous approach is prone to unexpected results (e.g.  too late [1],
too frequent [2], overlooked [3]).  As far as I know, warn_alloc() never
helped with providing information other than "something is going wrong".
I want to consider asynchronous approach which can obtain information
during stalls with possibly relevant threads (e.g.  the owner of
oom_lock and kswapd-like threads) and serve as a trigger for actions
(e.g.  turn on/off tracepoints, ask libvirt daemon to take a memory dump
of stalling KVM guest for diagnostic purpose).

This commit temporarily loses ability to report e.g.  OOM lockup due to
unable to invoke the OOM killer due to !__GFP_FS allocation request.
But asynchronous approach will be able to detect such situation and emit
warning.  Thus, let's remove warn_alloc().

[1] https://bugzilla.kernel.org/show_bug.cgi?id=192981
[2] http://lkml.kernel.org/r/CAM_iQpWuPVGc2ky8M-9yukECtS+zKjiDasNymX7rMcBjBFyM_A@mail.gmail.com
[3] commit db73ee0d46379922 ("mm, vmscan: do not loop on too_many_isolated for ever"))

Link: http://lkml.kernel.org/r/1509017339-4802-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Reported-by: yuwang.yuwang <yuwang.yuwang@alibaba-inc.com>
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 mm/page_alloc.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2074f424dabf..6be91a1a00d9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3862,8 +3862,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	enum compact_result compact_result;
 	int compaction_retries;
 	int no_progress_loops;
-	unsigned long alloc_start = jiffies;
-	unsigned int stall_timeout = 10 * HZ;
 	unsigned int cpuset_mems_cookie;
 	int reserve_flags;
 
@@ -3983,14 +3981,6 @@ retry:
 	if (!can_direct_reclaim)
 		goto nopage;
 
-	/* Make sure we know about allocations which stall for too long */
-	if (time_after(jiffies, alloc_start + stall_timeout)) {
-		warn_alloc(gfp_mask & ~__GFP_NOWARN, ac->nodemask,
-			"page allocation stalls for %ums, order:%u",
-			jiffies_to_msecs(jiffies-alloc_start), order);
-		stall_timeout += 10 * HZ;
-	}
-
 	/* Avoid recursion of direct reclaim */
 	if (current->flags & PF_MEMALLOC)
 		goto nopage;
-- 
2.19.1




  parent reply	other threads:[~2018-12-11 15:52 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-11 15:41 [PATCH 4.14 00/67] 4.14.88-stable review Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 01/67] media: omap3isp: Unregister media device as first Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 02/67] iommu/vt-d: Fix NULL pointer dereference in prq_event_thread() Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 03/67] brcmutil: really fix decoding channel info for 160 MHz bandwidth Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 04/67] iommu/ipmmu-vmsa: Fix crash on early domain free Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 05/67] can: rcar_can: Fix erroneous registration Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 06/67] test_firmware: fix error return getting clobbered Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 07/67] HID: input: Ignore battery reported by Symbol DS4308 Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 08/67] batman-adv: Use explicit tvlv padding for ELP packets Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 09/67] batman-adv: Expand merged fragment buffer for full packet Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 10/67] amd/iommu: Fix Guest Virtual APIC Log Tail Address Register Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 11/67] bnx2x: Assign unique DMAE channel number for FW DMAE transactions Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 12/67] qed: Fix PTT leak in qed_drain() Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 13/67] qed: Fix reading wrong value in loop condition Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 14/67] Revert "usb: gadget: ffs: Fix BUG when userland exits with submitted AIO transfers" Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 15/67] net/mlx4_core: Zero out lkey field in SW2HW_MPT fw command Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 16/67] net/mlx4_core: Fix uninitialized variable compilation warning Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 17/67] net/mlx4: Fix UBSAN warning of signed integer overflow Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 18/67] gpio: mockup: fix indicated direction Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 19/67] mtd: rawnand: qcom: Namespace prefix some commands Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 20/67] HID: multitouch: Add pointstick support for Cirque Touchpad Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 21/67] mtd: spi-nor: Fix Cadence QSPI page fault kernel panic Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 22/67] qed: Fix bitmap_weight() check Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 23/67] qed: Fix QM getters to always return a valid pq Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 24/67] net: faraday: ftmac100: remove netif_running(netdev) check before disabling interrupts Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 25/67] iommu/vt-d: Use memunmap to free memremap Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 26/67] flexfiles: use per-mirror specified stateid for IO Greg Kroah-Hartman
2018-12-11 18:48   ` Mkrtchyan, Tigran
2018-12-11 15:41 ` [PATCH 4.14 27/67] ibmvnic: Fix RX queue buffer cleanup Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 28/67] team: no need to do team_notify_peers or team_mcast_rejoin when disabling port Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 29/67] net: amd: add missing of_node_put() Greg Kroah-Hartman
2018-12-11 15:41 ` Greg Kroah-Hartman [this message]
2018-12-11 15:41 ` [PATCH 4.14 31/67] usb: quirk: add no-LPM quirk on SanDisk Ultra Flair device Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 32/67] usb: appledisplay: Add 27" Apple Cinema Display Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 33/67] USB: check usb_get_extra_descriptor for proper size Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 34/67] ALSA: usb-audio: Fix UAF decrement if card has no live interfaces in card.c Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 35/67] ALSA: hda: Add support for AMD Stoney Ridge Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 36/67] ALSA: pcm: Fix starvation on down_write_nonblock() Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 37/67] ALSA: pcm: Call snd_pcm_unlink() conditionally at closing Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 38/67] ALSA: pcm: Fix interval evaluation with openmin/max Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 39/67] ALSA: hda/realtek - Fix speaker output regression on Thinkpad T570 Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 40/67] virtio/s390: avoid race on vcdev->config Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 41/67] virtio/s390: fix race in ccw_io_helper() Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 42/67] vhost/vsock: fix use-after-free in network stack callers Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 43/67] SUNRPC: Fix leak of krb5p encode pages Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 44/67] dmaengine: dw: Fix FIFO size for Intel Merrifield Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 45/67] dmaengine: cppi41: delete channel from pending list when stop channel Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 46/67] ARM: 8806/1: kprobes: Fix false positive with FORTIFY_SOURCE Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 47/67] xhci: workaround CSS timeout on AMD SNPS 3.0 xHC Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 48/67] xhci: Prevent U1/U2 link pm states if exit latency is too long Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 49/67] f2fs: fix to do sanity check with block address in main area v2 Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 50/67] swiotlb: clean up reporting Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 51/67] Staging: lustre: remove two build warnings Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 52/67] staging: atomisp: remove "fun" strncpy warning Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 53/67] cifs: Fix separator when building path from dentry Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 54/67] staging: rtl8712: Fix possible buffer overrun Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 55/67] Revert commit ef9209b642f "staging: rtl8723bs: Fix indenting errors and an off-by-one mistake in core/rtw_mlme_ext.c" Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 56/67] drm/amdgpu: update mc firmware image for polaris12 variants Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 57/67] drm/amdgpu/gmc8: update MC firmware for polaris Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 58/67] Drivers: hv: vmbus: Offload the handling of channels to two workqueues Greg Kroah-Hartman
2018-12-11 15:41 ` [PATCH 4.14 59/67] tty: serial: 8250_mtk: always resume the device in probe Greg Kroah-Hartman
2018-12-11 15:42 ` [PATCH 4.14 60/67] tty: do not set TTY_IO_ERROR flag if console port Greg Kroah-Hartman
2018-12-11 15:42 ` [PATCH 4.14 61/67] kgdboc: fix KASAN global-out-of-bounds bug in param_set_kgdboc_var() Greg Kroah-Hartman
2018-12-11 15:42 ` [PATCH 4.14 62/67] libnvdimm, pfn: Pad pfn namespaces relative to other regions Greg Kroah-Hartman
2018-12-11 15:42 ` [PATCH 4.14 63/67] mac80211_hwsim: Timer should be initialized before device registered Greg Kroah-Hartman
2018-12-11 15:42 ` [PATCH 4.14 64/67] mac80211: Clear beacon_int in ieee80211_do_stop Greg Kroah-Hartman
2018-12-11 15:42 ` [PATCH 4.14 65/67] mac80211: ignore tx status for PS stations in ieee80211_tx_status_ext Greg Kroah-Hartman
2018-12-11 15:42 ` [PATCH 4.14 66/67] mac80211: fix reordering of buffered broadcast packets Greg Kroah-Hartman
2018-12-11 15:42 ` [PATCH 4.14 67/67] mac80211: ignore NullFunc frames in the duplicate detection Greg Kroah-Hartman
2018-12-11 21:33 ` [PATCH 4.14 00/67] 4.14.88-stable review kernelci.org bot
2018-12-11 23:59 ` shuah
2018-12-12  6:37 ` Naresh Kamboju
2018-12-12 18:50 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181211151631.910085422@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=dave.hansen@intel.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.com \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=sashal@kernel.org \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=xiyou.wangcong@gmail.com \
    --cc=yuwang.yuwang@alibaba-inc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).