From: Minchan Kim
Date: Wed, 6 Jul 2011 15:47:02 +0900
Subject: Re: [PATCH 03/27] xfs: use write_cache_pages for writeback clustering
To: Wu Fengguang
Cc: Dave Chinner, Christoph Hellwig, Mel Gorman, Johannes Weiner,
    "xfs@oss.sgi.com", "linux-mm@kvack.org"
In-Reply-To: <20110706045301.GA11604@localhost>
References: <20110629140109.003209430@bombadil.infradead.org>
    <20110629140336.950805096@bombadil.infradead.org>
    <20110701022248.GM561@dastard>
    <20110701041851.GN561@dastard>
    <20110701093305.GA28531@infradead.org>
    <20110701154136.GA17881@localhost>
    <20110704032534.GD1026@dastard>
    <20110706045301.GA11604@localhost>

On Wed, Jul 6, 2011 at 1:53 PM, Wu Fengguang wrote:
> On Mon, Jul 04, 2011 at 11:25:34AM +0800, Dave Chinner wrote:
>> On Fri, Jul 01, 2011 at 11:41:36PM +0800, Wu Fengguang wrote:
>> > Christoph,
>> >
>> > On Fri, Jul 01, 2011 at 05:33:05PM +0800, Christoph Hellwig wrote:
>> > > Johannes, Mel, Wu,
>> > >
>> > > Dave has been stressing some XFS patches of mine that remove the XFS
>> > > internal writeback clustering in favour of using write_cache_pages.
>> > >
>> > > As part of investigating the behaviour he found out that we're still
>> > > doing lots of I/O from the end of the LRU in kswapd.  Not only is that
>> > > pretty bad behaviour in general, but it also means we really can't
>> > > just remove the writeback clustering in writepage given how much
>> > > I/O is still done through that.
>> > >
>> > > Any chance we could the writeback vs kswap behaviour sorted out a bit
>> > > better finally?
>> >
>> > I once tried this approach:
>> >
>> > http://www.spinics.net/lists/linux-mm/msg09202.html
>> >
>> > It used a list structure that is not linearly scalable, however that
>> > part should be independently improvable when necessary.
>>
>> I don't think that handing random writeback to the flusher thread is
>> much better than doing random writeback directly.  Yes, you added
>> some clustering, but I'm still don't think writing specific pages is
>> the best solution.
>
> I agree that the VM should avoid writing specific pages as much as
> possible. Mostly often, it's indeed OK to just skip sporadically
> encountered dirty page and reclaim the clean pages presumably not
> far away in the LRU list. So your 2-liner patch is all good if
> constraining it to low scan pressure, which will look like
>
>        if (priority == DEF_PRIORITY)
>                tag PG_reclaim on encountered dirty pages and
>                skip writing it
>
> However the VM in general does need the ability to write specific
> pages, such as when reclaiming from specific zone/memcg. So I'll still
> propose to do bdi_start_inode_writeback().
>
> Below is the patch rebased to linux-next. It's good enough for testing
> purpose, and I guess even with the ->nr_pages work issue, it's
> complete enough to get roughly the same performance as your 2-liner
> patch.
>
>> > The real problem was, it seem to not very effective in my test runs.
>> > I found many ->nr_pages works queued before the ->inode works, which
>> > effectively makes the flusher working on more dispersed pages rather
>> > than focusing on the dirty pages encountered in LRU reclaim.
>>
>> But that's really just an implementation issue related to how you
>> tried to solve the problem. That could be addressed.
>>
>> However, what I'm questioning is whether we should even care what
>> page memory reclaim wants to write - it seems to make fundamentally
>> bad decisions from an IO persepctive.
>>
>> We have to remember that memory reclaim is doing LRU reclaim and the
>> flusher threads are doing "oldest first" writeback. IOWs, both are trying
>> to operate in the same direction (oldest to youngest) for the same
>> purpose.  The fundamental problem that occurs when memory reclaim
>> starts writing pages back from the LRU is this:
>>
>>       - memory reclaim has run ahead of IO writeback -
>>
>> The LRU usually looks like this:
>>
>>       oldest                                  youngest
>>       +---------------+---------------+--------------+
>>       clean           writeback       dirty
>>                       ^               ^
>>                       |               |
>>                       |               Where flusher will next work from
>>                       |               Where kswapd is working from
>>                       |
>>                       IO submitted by flusher, waiting on completion
>>
>>
>> If memory reclaim is hitting dirty pages on the LRU, it means it has
>> got ahead of writeback without being throttled - it's passed over
>> all the pages currently under writeback and is trying to write back
>> pages that are *newer* than what writeback is working on. IOWs, it
>> starts trying to do the job of the flusher threads, and it does that
>> very badly.
>>
>> The $100 question is *why is it getting ahead of writeback*?
>
> The most important case is: faster reader + relatively slow writer.
>
> Assume for every 10 pages read, 1 page is dirtied, and the dirty speed
> is fast enough to trigger the 20% dirty ratio and hence dirty balancing.
>
> That pattern is able to evenly distribute dirty pages all over the LRU
> list and hence trigger lots of pageout()s. The "skip reclaim writes on
> low pressure" approach can fix this case.
>
> Thanks,
> Fengguang
> ---
> Subject: writeback: introduce bdi_start_inode_writeback()
> Date: Thu Jul 29 14:41:19 CST 2010
>
> This relays ASYNC file writeback IOs to the flusher threads.
>
> pageout() will continue to serve the SYNC file page writes for necessary
> throttling for preventing OOM, which may happen if the LRU list is small
> and/or the storage is slow, so that the flusher cannot clean enough
> pages before the LRU is full scanned.
>
> Only ASYNC pageout() is relayed to the flusher threads, the less
> frequent SYNC pageout()s will work as before as a last resort.
> This helps to avoid OOM when the LRU list is small and/or the storage is
> slow, and the flusher cannot clean enough pages before the LRU is
> full scanned.
>
> The flusher will piggy back more dirty pages for IO
> - it's more IO efficient
> - it helps clean more pages, a good number of them may sit in the same
>  LRU list that is being scanned.
>
> To avoid memory allocations at page reclaim, a mempool is created.
>
> Background/periodic works will quit automatically (as done in another
> patch), so as to clean the pages under reclaim ASAP. However for now the
> sync work can still block us for long time.
>
> Jan Kara: limit the search scope.
>
> CC: Jan Kara
> CC: Rik van Riel
> CC: Mel Gorman
> CC: Minchan Kim
> Signed-off-by: Wu Fengguang

This seems to be an enhanced version of what Mel did earlier.
I support this approach :) but I have some questions.

> ---
>  fs/fs-writeback.c                |  156 ++++++++++++++++++++++++++++-
>  include/linux/backing-dev.h      |    1
>  include/trace/events/writeback.h |   15 ++
>  mm/vmscan.c                      |    8 +
>  4 files changed, 174 insertions(+), 6 deletions(-)
>
> --- linux-next.orig/mm/vmscan.c 2011-06-29 20:43:10.000000000 -0700
> +++ linux-next/mm/vmscan.c      2011-07-05 18:30:19.000000000 -0700
> @@ -825,6 +825,14 @@ static unsigned long shrink_page_list(st
>                if (PageDirty(page)) {
>                        nr_dirty++;
>
> +                       if (page_is_file_cache(page) && mapping &&
> +                           sc->reclaim_mode != RECLAIM_MODE_SYNC) {
> +                               if (flush_inode_page(page, mapping) >= 0) {
> +                                       SetPageReclaim(page);
> +                                       goto keep_locked;

Jumping to keep_locked changes the old behavior.
Normally, in async mode, we go to keep_lumpy (i.e., we don't reset
reclaim_mode), but now you always reset reclaim_mode, so the sync call
of shrink_page_list never happens if flush_inode_page is successful.
Is that your intention?


> +                               }
> +                       }
> +

If flush_inode_page fails (i.e., the page isn't near the current work's
writeback range), we still do pageout even though it's async mode. Is that
your intention?

-- 
Kind regards,
Minchan Kim