From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1750898Ab2KVS02 (ORCPT ); Thu, 22 Nov 2012 13:26:28 -0500
Received: from science.horizon.com ([71.41.210.146]:47717 "HELO
	science.horizon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with SMTP id S1750750Ab2KVS01 (ORCPT ); Thu, 22 Nov 2012 13:26:27 -0500
Date: 22 Nov 2012 12:58:24 -0500
Message-ID: <20121122175824.19604.qmail@science.horizon.com>
From: "George Spelvin"
To: linux-kernel@vger.kernel.org
Subject: 3.7-rc6 soft lockup in kswapd0
Cc: linux@horizon.com
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

I'm having an interesting issue with a uniprocessor Pentium 4 machine
locking up overnight. 3.6.5 didn't do that, but 3.7-rc6 is not doing
so well.

It's kind of a funny lockup. Some things work:
- TCP SYN handshake
- Alt-SysRq
And others don't:
- Caps lock
- Shift-PgUp
- Alt-Fn
- Screen unblanking
- Actually talking to a daemon

This is a "headless" machine that boots to a text console and has zero
console activity until the lockup.

This has happened overnight, three nights in a row. I had to turn
screen blanking off to see anything on the screen. Running the daily
cron jobs manually just now didn't trigger it, so I haven't found a
proximate cause.

The *first* error has scrolled off the screen, but what I can see is an
infinite stream (at about 20s intervals) of:

BUG: soft lockup - CPU#0 stuck for 22s! [kswapd0:317]
Pid: 317, comm: kswapd0 Not tainted 3.7.0-RC6 #224 HP Pavilion 04 P6319A-ABA 750N/P4B266LA
EIP: 0060:[] EFLAGS: 00000202 CPU: 0
EIP is at __zone_watermark_ok+0x5f/7e, 0x67/7e, 0x6e/0x7e, or 0x74/7e
(Didn't type registers & stack)
Call Trace:
 [] ? zone_watermark_ok_safe+0x34/0x3a
 [] ? kswapd+0x2fa/0x6f6
 [] ? try_to_free_pages+0x4b8/0x4b8
 [] ? kthread+0x67/0x6c
 [] ? ret_from_kernel_thread+0x1b/0x28
 [] ? __kthread_parkme+0x4c/0x4c
Code: (didn't type in first line)
 5f 67 6e 74 7e c9 39 d6 7f 14 eb 1c 6b c1 2c <8b> 44 05 60 d3 e0 29 c6 fb 39 de 7e 09 41 <39> f9 7c ea b0 01 02 31 c0 5a 5b 5e 5f 5d c3 01 14 85 7c 16

The lack of scrollback limits me to 49 lines of SysRq output, and
usually the most interesting part disappears off the screen.

Two things I can see:
- SysRq-W shows no blocked tasks
- SysRq-M shows zero swap in use, and apparently adequate free memory

DMA: = 9048kB
Normal: = 116312kB
HighMem: = 41660kB
416557 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 4883724kB
Total swap = 4883724kB
524260 pages RAM
296958 pages HighMem
5221 pages reserved
406417 pages shared
351419 pages non-shared

Does anyone have any debugging suggestions? Waiting overnight to make
a good/bad decision makes bisecting pretty slow...
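
For scale, here is a rough sketch that converts the SysRq-M figures quoted above into MiB. It assumes 4 KiB pages, which is the usual size on 32-bit x86 but is not stated anywhere in the output; the helper names are made up for illustration.

```python
# Convert the SysRq-M numbers quoted above into MiB.
# ASSUMPTION: 4 KiB pages (typical for 32-bit x86); the SysRq output
# does not state the page size, so treat the results as approximate.
PAGE_KIB = 4


def pages_to_mib(pages: int) -> float:
    """Convert a page count to MiB under the assumed page size."""
    return pages * PAGE_KIB / 1024


def kib_to_mib(kib: int) -> float:
    """Convert a kB figure as printed by SysRq-M to MiB."""
    return kib / 1024


# Per-zone free memory, in kB, copied from the report.
free_kib = {"DMA": 9048, "Normal": 116312, "HighMem": 41660}

# Page counts copied from the report.
page_counts = {
    "RAM": 524260,
    "HighMem": 296958,
    "pagecache": 416557,
    "reserved": 5221,
}

total_free_mib = kib_to_mib(sum(free_kib.values()))
sizes_mib = {name: pages_to_mib(count) for name, count in page_counts.items()}

if __name__ == "__main__":
    print(f"free (all zones): {total_free_mib:8.1f} MiB")
    for name, mib in sizes_mib.items():
        print(f"{name:>16}: {mib:8.1f} MiB")
```

Under that assumption the machine has roughly 2 GiB of RAM, about 1.6 GiB of it in page cache, and about 163 MiB free across the three zones.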