linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] mm: silence soft lockups from unlock_page
@ 2020-07-21  6:32 Michal Hocko
  2020-07-21 11:10 ` Qian Cai
                   ` (2 more replies)
  0 siblings, 3 replies; 58+ messages in thread
From: Michal Hocko @ 2020-07-21  6:32 UTC (permalink / raw)
  To: linux-mm; +Cc: LKML, Andrew Morton, Linus Torvalds, Tim Chen, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

We have seen a bug report with huge number of soft lockups during the
system boot on !PREEMPT kernel
NMI watchdog: BUG: soft lockup - CPU#1291 stuck for 22s! [systemd-udevd:43283]
[...]
NIP [c00000000094e66c] _raw_spin_lock_irqsave+0xac/0x100
LR [c00000000094e654] _raw_spin_lock_irqsave+0x94/0x100
Call Trace:
[c00002293d883b30] [c0000000012cdee8] __pmd_index_size+0x0/0x8 (unreliable)
[c00002293d883b70] [c0000000002b7490] wake_up_page_bit+0xc0/0x150
[c00002293d883bf0] [c00000000030ceb8] do_fault+0x448/0x870
[c00002293d883c40] [c000000000310080] __handle_mm_fault+0x880/0x16b0
[c00002293d883d10] [c000000000311018] handle_mm_fault+0x168/0x250
[c00002293d883d50] [c00000000006c488] do_page_fault+0x568/0x8d0
[c00002293d883e30] [c00000000000a534] handle_page_fault+0x18/0x38

on a large ppc machine. The very likely cause is a suboptimal
configuration when systed-udev spawns way too many workders to bring the
system up.

The lockup is in page_unlock in do_read_fault and I suspect that this is
yet another effect of a very long waitqueue chain which has been
addresses by 11a19c7b099f ("sched/wait: Introduce wakeup boomark in
wake_up_page_bit") previously. The commit primarily aimed at hard lockup
prevention but it doesn't really help !PREEMPT case which still has to
process all the work without any rescheduling point. This is however not
really trivial because page_unlock is called from many contexts many of
which are likely called from an atomic context.

Introducing page_unlock_sleepable is certainly an option but it seems
like a hard to maintain option which doesn't really fix the underlying
problem as the same might happen from other unlock_page callers. This
patch doesn't address the underlying problem but it reduces the visible
effect. Tell the soft lockup to shut up when retrying batches on queued
waiters. This will also allow systems configured to panic on warning to
proceed with the boot.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/filemap.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/filemap.c b/mm/filemap.c
index 385759c4ce4b..74681c40a6e5 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -41,6 +41,7 @@
 #include <linux/delayacct.h>
 #include <linux/psi.h>
 #include <linux/ramfs.h>
+#include <linux/nmi.h>
 #include "internal.h"
 
 #define CREATE_TRACE_POINTS
@@ -1055,6 +1056,7 @@ static void wake_up_page_bit(struct page *page, int bit_nr)
 		 */
 		spin_unlock_irqrestore(&q->lock, flags);
 		cpu_relax();
+		touch_softlockup_watchdog();
 		spin_lock_irqsave(&q->lock, flags);
 		__wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, &bookmark);
 	}
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2020-08-18 13:50 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-21  6:32 [RFC PATCH] mm: silence soft lockups from unlock_page Michal Hocko
2020-07-21 11:10 ` Qian Cai
2020-07-21 11:25   ` Michal Hocko
2020-07-21 11:44     ` Qian Cai
2020-07-21 12:17       ` Michal Hocko
2020-07-21 13:23         ` Qian Cai
2020-07-21 13:38           ` Michal Hocko
2020-07-21 14:15             ` Qian Cai
2020-07-21 14:17 ` Chris Down
2020-07-21 15:00   ` Michal Hocko
2020-07-21 15:33 ` Linus Torvalds
2020-07-21 15:49   ` Michal Hocko
2020-07-22 18:29   ` Linus Torvalds
2020-07-22 21:29     ` Hugh Dickins
2020-07-22 22:10       ` Linus Torvalds
2020-07-22 23:42         ` Linus Torvalds
2020-07-23  0:23           ` Linus Torvalds
2020-07-23 12:47           ` Oleg Nesterov
2020-07-23 17:32             ` Linus Torvalds
2020-07-23 18:01               ` Oleg Nesterov
2020-07-23 18:22                 ` Linus Torvalds
2020-07-23 19:03                   ` Linus Torvalds
2020-07-24 14:45                     ` Oleg Nesterov
2020-07-23 20:03               ` Linus Torvalds
2020-07-23 23:11                 ` Hugh Dickins
2020-07-23 23:43                   ` Linus Torvalds
2020-07-24  0:07                     ` Hugh Dickins
2020-07-24  0:46                       ` Linus Torvalds
2020-07-24  3:45                         ` Hugh Dickins
2020-07-24 15:24                     ` Oleg Nesterov
2020-07-24 17:32                       ` Linus Torvalds
2020-07-24 23:25                         ` Linus Torvalds
2020-07-25  2:08                           ` Hugh Dickins
2020-07-25  2:46                             ` Linus Torvalds
2020-07-25 10:14                           ` Oleg Nesterov
2020-07-25 18:48                             ` Linus Torvalds
2020-07-25 19:27                               ` Oleg Nesterov
2020-07-25 19:51                                 ` Linus Torvalds
2020-07-26 13:57                                   ` Oleg Nesterov
2020-07-25 21:19                               ` Hugh Dickins
2020-07-26  4:22                                 ` Hugh Dickins
2020-07-26 20:30                                   ` Hugh Dickins
2020-07-26 20:41                                     ` Linus Torvalds
2020-07-26 22:09                                       ` Hugh Dickins
2020-07-27 19:35                                     ` Greg KH
2020-08-06  5:46                                       ` Hugh Dickins
2020-08-18 13:50                                         ` Greg KH
2020-08-06  5:21                                     ` Hugh Dickins
2020-08-06 17:07                                       ` Linus Torvalds
2020-08-06 18:00                                         ` Matthew Wilcox
2020-08-06 18:32                                           ` Linus Torvalds
2020-08-07 18:41                                             ` Hugh Dickins
2020-08-07 19:07                                               ` Linus Torvalds
2020-08-07 19:35                                               ` Matthew Wilcox
2020-08-03 13:14                           ` Michal Hocko
2020-08-03 17:56                             ` Linus Torvalds
2020-07-25  9:39                         ` Oleg Nesterov
2020-07-23  8:03     ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).