From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1760888Ab1D2Tfk (ORCPT ); Fri, 29 Apr 2011 15:35:40 -0400
Received: from legolas.restena.lu ([158.64.1.34]:42304 "EHLO legolas.restena.lu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1760876Ab1D2Tfh convert rfc822-to-8bit (ORCPT ); Fri, 29 Apr 2011 15:35:37 -0400
Date: Fri, 29 Apr 2011 21:35:24 +0200
From: Bruno Prémont
To: Markus Trippelsdorf
Cc: Dave Chinner, xfs-masters@oss.sgi.com, xfs@oss.sgi.com, Christoph Hellwig, Alex Elder, Dave Chinner, linux-kernel@vger.kernel.org, James Bottomley
Subject: Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38
Message-ID: <20110429213524.449e003b@neptune.home>
In-Reply-To: <20110429151841.GA893@x4.trippels.de>
References: <20110423224403.5fd1136a@neptune.home> <20110427050850.GG12436@dastard> <20110427182622.05a068a2@neptune.home> <20110428194528.GA1627@x4.trippels.de> <20110429011929.GA13542@dastard> <20110429151841.GA893@x4.trippels.de>
X-Mailer: Claws Mail 3.7.8 (GTK+ 2.22.1; i686-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8BIT
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 29 April 2011 Markus Trippelsdorf wrote:
> On 2011.04.29 at 11:19 +1000, Dave Chinner wrote:
> > On Thu, Apr 28, 2011 at 09:45:28PM +0200, Markus Trippelsdorf wrote:
> > > On 2011.04.27 at 18:26 +0200, Bruno Prémont wrote:
> > > > On Wed, 27 April 2011 Dave Chinner wrote:
> > > > > On Sat, Apr 23, 2011 at 10:44:03PM +0200, Bruno Prémont wrote:
> > > > > > Running 2.6.39-rc3+ and now again on 2.6.39-rc4+ (I've not tested -rc1
> > > > > > or -rc2) I've hit a "dying machine" where processes writing to disk end
> > > > > > up in D state.
> > > > > > From occurrence with -rc3+ I don't have logs as those never hit the disk,
> > > > > > for -rc4+ I have the following (sysrq+t was too big, what I have of it
> > > > > > misses a dozen of kernel tasks - if needed, please ask):
> > > > > >
> > > > > > The -rc4 kernel is at commit 584f79046780e10cb24367a691f8c28398a00e84
> > > > > > (+ 1 patch of mine to stop disk on reboot),
> > > > > > full dmesg available if needed; kernel config attached (only selected
> > > > > > options). In case there is something I should do at next occurrence
> > > > > > please tell. Unfortunately I have no trigger for it and it does not
> > > > > > happen very often.
> > > > > >
> > > > > > [32040.120055] INFO: task flush-8:0:1665 blocked for more than 120 seconds.
> > > > > > [32040.120068] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > > > [32040.120077] flush-8:0       D 00000000  4908  1665      2 0x00000000
> > > > > > [32040.120099]  f55efb5c 00000046 00000000 00000000 00000000 00000001 e0382924 00000000
> > > > > > [32040.120118]  f55efb0c f55efb5c 00000004 f629ba70 572f01a2 00001cfe f629ba70 ffffffc0
> > > > > > [32040.120135]  f55efc68 f55efb30 f889d7f8 f55efb20 00000000 f55efc68 e0382900 f55efc94
> > > > > > [32040.120153] Call Trace:
> > > > > > [32040.120220]  [<f889d7f8>] ? xfs_bmap_search_multi_extents+0x88/0xe0 [xfs]
> > > > > > [32040.120239]  [<c109ce1d>] ? kmem_cache_alloc+0x2d/0x110
> > > > > > [32040.120294]  [<f88c88ca>] ? xlog_space_left+0x2a/0xc0 [xfs]
> > > > > > [32040.120346]  [<f88c85cb>] xlog_wait+0x4b/0x70 [xfs]
> > > > > > [32040.120359]  [<c102ca00>] ? try_to_wake_up+0xc0/0xc0
> > > > > > [32040.120411]  [<f88c948b>] xlog_grant_log_space+0x8b/0x240 [xfs]
> > > > > > [32040.120464]  [<f88c936e>] ? xlog_grant_push_ail+0xbe/0xf0 [xfs]
> > > > > > [32040.120516]  [<f88c99db>] xfs_log_reserve+0xab/0xb0 [xfs]
> > > > > > [32040.120571]  [<f88d6dc8>] xfs_trans_reserve+0x78/0x1f0 [xfs]
> > > > >
> > > > > Hmmmmm. That may be caused by the conversion of the xfsaild to a
> > > > > work queue. Can you post the output of "xfs_info <mntpt>" and the
> > > > > mount options (/proc/mounts) used on you system?
> > >
> > > I may have hit the same problem today and managed to capture some sysrq-l
> > > and sysrq-w output.
> > >
> > > The system was largely unusable during this incident. I could still
> > > switch between X and the console (and press the sysrq key-combination),
> > > but I couldn't run any commands in the terminal.
> >
> > OK, so the common elements here appears to be root filesystems
> > with small log sizes, which means they are tail pushing all the
> > time metadata operations are in progress. Definitely seems like a
> > race in the AIL workqueue trigger mechanism. I'll see if I can
> > reproduce this and cook up a patch to fix it.
>
> Hmm, I'm wondering if this issue is somehow related to the hrtimer bug,
> that Thomas Gleixner fixed yesterday:
> http://git.us.kernel.org/?p=linux/kernel/git/tip/linux-2.6-tip.git;a=commit;h=ce31332d3c77532d6ea97ddcb475a2b02dd358b4
> http://thread.gmane.org/gmane.linux.kernel.mm/61909/
>
> It also looks similar to the issue that James Bottomley reported
> earlier: http://thread.gmane.org/gmane.linux.kernel.mm/62185/

I'm going to find out: I've applied Thomas' fix on the box seeing the XFS
freeze (without other changes to the kernel).
I'm going to run that kernel over the weekend, and beyond if it survives,
to see what happens.

Bruno
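
A minimal sketch, not part of the original message, of how the information
requested earlier in the thread (the XFS entries from /proc/mounts and the
matching xfs_info output) could be gathered in one go. The function names
xfs_mounts and collect_report are hypothetical, and xfs_info is assumed to be
installed and on PATH.

```python
import subprocess


def xfs_mounts(mounts_text):
    """Return (device, mountpoint, options) for each XFS entry in /proc/mounts text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device mountpoint fstype options dump pass
        if len(fields) >= 4 and fields[2] == "xfs":
            entries.append((fields[0], fields[1], fields[3]))
    return entries


def collect_report(mounts_path="/proc/mounts"):
    """Print mount options and xfs_info output for every mounted XFS filesystem."""
    with open(mounts_path) as fh:
        mounts = xfs_mounts(fh.read())
    for device, mountpoint, options in mounts:
        print(f"{device} on {mountpoint} ({options})")
        # Assumption: the mountpoint contains no characters that /proc/mounts
        # escapes (such as spaces, written there as \040).
        subprocess.run(["xfs_info", mountpoint], check=False)
```

Calling collect_report() as root on the affected machine prints one block per
XFS filesystem; the log section of the xfs_info output shows the log size that
the thread suspects is relevant.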