From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1757922Ab3BRGS3 (ORCPT ); Mon, 18 Feb 2013 01:18:29 -0500
Received: from mail-da0-f42.google.com ([209.85.210.42]:36953 "EHLO
        mail-da0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1753372Ab3BRGS1 (ORCPT );
        Mon, 18 Feb 2013 01:18:27 -0500
Message-ID: <5121C7AF.2090803@numascale-asia.com>
Date: Mon, 18 Feb 2013 14:18:23 +0800
From: Daniel J Blueman
Organization: Numascale Asia
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130106 Thunderbird/17.0.2
MIME-Version: 1.0
To: Jiri Slaby
CC: "Linux Kernel", "Steffen Persvold"
Subject: Re: kswapd craziness round 2
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Monday, 18 February 2013 06:10:02 UTC+8, Jiri Slaby wrote:
> Hi,
>
> You still feel the sour taste of the "kswapd craziness in v3.7" thread,
> right? Welcome to the hell, part two :{.
>
> I believe this started happening after update from
> 3.8.0-rc4-next-20130125 to 3.8.0-rc7-next-20130211. The same as before,
> many hours of uptime are needed and perhaps some suspend/resume cycles
> too. Memory pressure is not high, plenty of I/O cache:
> # free
>              total       used       free     shared    buffers     cached
> Mem:       6026692    5571184     455508          0     351252    2016648
> -/+ buffers/cache:    3203284    2823408
> Swap:            0          0          0
>
> kswap is working very toughly though:
> root       580  0.6  0.0      0     0 ?        S    úno12  46:21 [kswapd0]
>
> This happens on I/O activity right now. For example by updatedb or find
> /.
> This is what the stack trace of kswapd0 looks like:
> [] shrink_slab+0xa1/0x2d0
> [] kswapd+0x541/0x930
> [] kthread+0xc0/0xd0
> [] ret_from_fork+0x7c/0xb0
> [] 0xffffffffffffffff

Likewise with 3.8-rc, I've been able to reproduce [1] a livelock scenario
which hoses the box, and RCU stalls are observed [2].

There may be a connection; I'll do a bit more debugging in the next few
days.

Daniel

--- [1]

1. live-booted image using ramdisk
2. boot 3.8-rc with <16GB memory and without swap
3. run OpenMP NAS Parallel Benchmark dc.B against local disk (i.e. not
   ramdisk)
4. observe hang O(30) mins later

--- [2]

[ 2675.587878] INFO: rcu_sched self-detected stall on CPU { 5} (t=24000 jiffies g=6313 c=6312 q=68)
-- 
Daniel J Blueman
Principal Software Engineer, Numascale Asia
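
To catch stall reports like the one in [2] while the reproducer in [1] runs, a small filter over kernel log lines may help. What follows is only a minimal sketch in Python: the function name parse_rcu_stall and the regular expression are assumptions modeled on the single line quoted in [2], and other kernel versions may word the message differently.

```python
import re
from typing import Optional

# Pattern modeled on the one log line quoted in [2]. This is an assumption,
# not a stable kernel format; adjust it for other kernel versions.
STALL_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: (?P<flavor>\w+) self-detected stall "
    r"on CPU \{(?P<cpus>[^}]*)\}\s*\((?P<fields>[^)]*)\)"
)


def parse_rcu_stall(line: str) -> Optional[dict]:
    """Return the parts of an RCU self-detected stall line, or None."""
    match = STALL_RE.search(line)
    if match is None:
        return None

    # The parenthesised tail holds key=value tokens (t, g, c, q) plus the
    # bare word "jiffies"; keep only tokens with a numeric value.
    fields = {}
    for token in match.group("fields").split():
        key, _, value = token.partition("=")
        if value.isdigit():
            fields[key] = int(value)

    return {
        "timestamp": float(match.group("ts")),   # seconds since boot
        "flavor": match.group("flavor"),         # e.g. rcu_sched
        "cpus": [int(c) for c in match.group("cpus").split()],
        "fields": fields,
    }
```

Feeding it each line of the kernel log (for example, lines read from a saved dmesg capture) and keeping the non-None results gives the stalled CPUs and the boot-relative timestamps, which can then be compared against the time of the hang in step 4 of [1].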