From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752626Ab1EIF5L (ORCPT ); Mon, 9 May 2011 01:57:11 -0400
Received: from smtprelay.restena.lu ([158.64.1.62]:44041 "EHLO smtprelay.restena.lu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752523Ab1EIF5J convert rfc822-to-8bit (ORCPT ); Mon, 9 May 2011 01:57:09 -0400
Date: Mon, 9 May 2011 07:57:09 +0200
From: Bruno =?UTF-8?B?UHLDqW1vbnQ=?=
To: Dave Chinner
Cc: linux-kernel@vger.kernel.org, Markus Trippelsdorf , xfs-masters@oss.sgi.com, xfs@oss.sgi.com, Christoph Hellwig , Alex Elder , Dave Chinner
Subject: Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38
Message-ID: <20110509075709.3c527fd2@pluto.restena.lu>
In-Reply-To: <20110505223513.3654c041@neptune.home>
References: <20110423224403.5fd1136a@neptune.home> <20110427050850.GG12436@dastard> <20110427182622.05a068a2@neptune.home> <20110428194528.GA1627@x4.trippels.de> <20110429011929.GA13542@dastard> <20110504005736.GA2958@cucamonga.audible.transient.net> <20110505002126.GA26797@dastard> <20110505022613.GA26837@dastard> <20110505122117.GB26837@dastard> <20110505223513.3654c041@neptune.home>
X-Mailer: Claws Mail 3.7.8 (GTK+ 2.22.1; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8BIT
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 5 May 2011 22:35:13 Bruno Prémont wrote:
> On Thu, 05 May 2011 Dave Chinner wrote:
> > On Thu, May 05, 2011 at 12:26:13PM +1000, Dave Chinner wrote:
> > > On Thu, May 05, 2011 at 10:21:26AM +1000, Dave Chinner wrote:
> > > > On Wed, May 04, 2011 at 12:57:36AM +0000, Jamie Heilman wrote:
> > > > > Dave Chinner wrote:
> > > > > > OK, so the common elements here appears to be root filesystems
> > > > > > with small log sizes, which means they are tail pushing all the
> > > > > > time metadata operations are in progress. Definitely seems like a
> > > > > > race in the AIL workqueue trigger mechanism. I'll see if I can
> > > > > > reproduce this and cook up a patch to fix it.
> > > > >
> > > > > Is there value in continuing to post sysrq-w, sysrq-l, xfs_info, and
> > > > > other assorted feedback wrt this issue? I've had it happen twice now
> > > > > myself in the past week or so, though I have no reliable reproduction
> > > > > technique. Just wondering if more data points will help isolate the
> > > > > cause, and if so, how to be prepared to get them.
> > > > >
> > > > > For whatever its worth, my last lockup was while running
> > > > > 2.6.39-rc5-00127-g1be6a1f with a preempt config without cgroups.
> > > >
> > > > Can you all try the patch below? I've managed to trigger a couple of
> > > > xlog_wait() lockups in some controlled load tests. The lockups don't
> > > > appear to occur with the following patch to he race condition in
> > > > the AIL workqueue trigger.
> > >
> > > They are still there, just harder to hit.
> > >
> > > FWIW, I've also discovered that "echo 2 > /proc/sys/vm/drop_caches"
> > > gets the system moving again because that changes the push target.
> > >
> > > I've found two more bugs, and now my test case is now reliably
> > > reproducably a 5-10s pause at ~1M created 1byte files and then
> > > hanging at about 1.25M files. So there's yet another problem lurking
> > > that I need to get to the bottom of.
> >
> > Which, of course, was the real regression. The patch below has
> > survived a couple of hours of testing, which fixes all 4 of the
> > problems I found. Please test.
>
> Successfully survives my 2-hours session of today. Will continue testing
> during week-end and see if it also survives the longer whole-day sessions.
>
> Will report results at end of week-end (or earlier in case of trouble).

It also survived the whole weekend (at least two 10-hour sessions) with
normal desktop work as well as a few hours of software compilation.
(without the patch it would probably have frozen at least twice a day)

So looks really good!

Thanks,
Bruno